EAD2002 to EAD3 Walkthrough via GitHub

At the unveiling of EAD3 at the 2015 Society of American Archivists annual meeting, the Encoded Archival Description Roundtable issued a call for resources supporting the new standard. Here at the New York Public Library, we were excited to begin exploring EAD3's approach to archival description and how we could implement it locally. To that end, we set out to create a step-by-step migration walkthrough as a real-world demonstration of the changes to the standard.

For our walkthrough, we wanted to provide line-by-line documentation and a rationale for each step in the transformation. The natural choice for delivery was GitHub, a platform for managing and sharing code. GitHub is built on Git, a version control tool that tracks changes to documents line by line; in other words, we could provide granular side-by-side comparisons of EAD2002 and EAD3, along with documentation of each step. In addition, GitHub repositories can be made publicly accessible on the Web, so the process and its documentation can be shared openly.
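GitHub computes these comparisons for us, but the idea of a line-by-line diff is easy to see in miniature. The sketch below uses Python's standard difflib module rather than Git itself; the two XML fragments and the labels ead2002.xml and ead3.xml are illustrative assumptions, not files from our repository.

```python
import difflib

# Illustrative fragments only; the file names passed below are hypothetical labels.
EAD2002_SNIPPET = """<physdesc>
  <extent unit="linear feet">8.3 linear feet</extent>
</physdesc>
"""

EAD3_SNIPPET = """<physdescstructured coverage="whole" physdescstructuredtype="spaceoccupied">
  <quantity>8.3</quantity>
  <unittype>linear feet</unittype>
</physdescstructured>
"""


def line_diff(before, after, fromfile="ead2002.xml", tofile="ead3.xml"):
    """Return a unified, line-by-line diff of two text documents.

    Lines prefixed with '-' exist only in `before`; lines prefixed with
    '+' exist only in `after`. This is similar to the view GitHub shows
    for each commit, although GitHub uses Git rather than difflib.
    """
    return list(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=fromfile,
            tofile=tofile,
        )
    )


if __name__ == "__main__":
    print("".join(line_diff(EAD2002_SNIPPET, EAD3_SNIPPET)), end="")
```

Identical inputs produce an empty diff, which is why each commit in the walkthrough only highlights the lines that actually changed.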

Figure: A line-by-line comparison of the changes in GitHub

A reverse-chronological list of the changes made in the transformation is available under the "History" tab on the repository page. Starting from the bottom of the list, click each change's title to open a line-by-line EAD2002/EAD3 comparison, along with a description of why the change was made. The final EAD3 record is available on the main repository page, or via this link.

Figure: A list of changes made during the conversion

Creating the walkthrough (and needing to show our work!) taught us a great deal about the philosophy and design of EAD3. The driving force behind the new standard, and one we were thrilled to see, is a greater emphasis on the machine-readability of archival description. Certain EAD2002 elements, such as <physdesc> and <unitdate>, accommodated natural-language statements that were difficult to encode in a machine-readable way. For example, a complex <physdesc> such as:

<physdesc>
  <extent unit="linear feet">8.3 linear feet</extent>
  <extent unit="containers">21 boxes</extent>
  <extent>679 kb (11 computer files)</extent>
  <extent>19 audio files</extent>
</physdesc>

relies on the reader's intuition to interpret the complete extent (do the boxes and the linear feet describe the same material? Do the computer files and the audio files overlap?). EAD3 removes these ambiguities with the new <physdescstructured> element, which requires the archivist to state exactly what each extent measures and how; related statements are grouped within a <physdescset>:

<physdescset parallel="true" coverage="part">
  <physdescstructured coverage="whole" physdescstructuredtype="spaceoccupied">
    <quantity>8.3</quantity>
    <unittype>linear feet</unittype>
  </physdescstructured>
  ...
</physdescset>
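To make the machine-readability point concrete, here is a minimal Python sketch using the standard xml.etree.ElementTree module. It reads the EAD2002 extents and the first EAD3 statement shown above. Namespaces are omitted for brevity (a real EAD3 file declares the EAD3 namespace, which the element names would then need to include), and the remaining <physdescstructured> elements elided in the example are not reconstructed.

```python
import xml.etree.ElementTree as ET

# EAD2002 extents from the example above (namespace omitted for brevity).
EAD2002_PHYSDESC = """
<physdesc>
  <extent unit="linear feet">8.3 linear feet</extent>
  <extent unit="containers">21 boxes</extent>
  <extent>679 kb (11 computer files)</extent>
  <extent>19 audio files</extent>
</physdesc>
"""

# Only the first EAD3 statement from the example; its siblings are elided there too.
EAD3_PHYSDESCSET = """
<physdescset parallel="true" coverage="part">
  <physdescstructured coverage="whole" physdescstructuredtype="spaceoccupied">
    <quantity>8.3</quantity>
    <unittype>linear feet</unittype>
  </physdescstructured>
</physdescset>
"""


def ead2002_extents(xml_text):
    """Return (unit attribute or None, raw text) for each EAD2002 <extent>.

    The number, the unit, and how extents relate to one another remain
    inside free text, so a program can only guess at them.
    """
    root = ET.fromstring(xml_text.strip())
    return [(e.get("unit"), e.text) for e in root.iter("extent")]


def ead3_extents(xml_text):
    """Return one dict per EAD3 <physdescstructured>, with typed fields."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for pds in root.iter("physdescstructured"):
        results.append({
            "coverage": pds.get("coverage"),
            "type": pds.get("physdescstructuredtype"),
            "quantity": float(pds.findtext("quantity")),  # a number, not prose
            "unittype": pds.findtext("unittype"),
        })
    return results


if __name__ == "__main__":
    for unit, text in ead2002_extents(EAD2002_PHYSDESC):
        print(unit, "|", text)
    for extent in ead3_extents(EAD3_PHYSDESCSET):
        print(extent)
```

In the EAD2002 case, a program would still have to parse strings like "679 kb (11 computer files)", and nothing states whether the extents overlap. In the EAD3 case, the quantity is directly usable as a number, and the coverage and type attributes say what it measures.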

This design philosophy neatly resolves many of EAD2002's limitations and lets archivists make more precise, more powerful statements about collections. Another example is how EAD3 makes identifiers, historically relegated to <unitid> elements and the obscure authfilenumber attribute, more powerful and prominent across the standard. For instance, container elements can now carry barcodes and parent/child relationships directly in the finding aid.

Similarly, controlled access terms, which were previously difficult to subdivide, can now be encoded more granularly using the new <part> element. <part> (an element inspired by EAC-CPF) can encompass the different parts of a controlled term; for example, names and dates can be separated out within the EAD3 record. Most compellingly, <part> allows subdivisions to be encoded properly, so that an authority URI can be assigned to each term. For example, the following EAD2002 subject:

<subject source="lcsh">Crimean Tatars -- Civil Rights -- Soviet Union</subject>

could never be assigned proper id.loc.gov URIs. In EAD3, however, the same subject:

<subject source="lcsh">
  <part source="lcsh" identifier="http://id.loc.gov/authorities/subjects/sh87003883">Crimean Tatars</part>
  <part source="lcsh" identifier="http://id.loc.gov/authorities/subjects/sh99004998">Civil rights</part>
  <part source="naf" identifier="http://id.loc.gov/authorities/names/n80126312">Soviet Union</part>
</subject>

now carries a URI for each component term of the subject. This opens the door to powerful linked data applications and stronger vocabulary control.
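The subject migration step can be sketched the same way. This hypothetical helper, subject_to_ead3, splits a pre-coordinated EAD2002 heading on its subdivision separator and builds an EAD3 <subject> with one <part> per term. The AUTHORITY table is an assumption standing in for a real lookup against id.loc.gov; its three entries are simply the URIs from the example above.

```python
import re
import xml.etree.ElementTree as ET

# HYPOTHETICAL stand-in for an authority lookup (e.g. against id.loc.gov).
# Keys are case-folded terms; values are (authorized form, source, URI),
# copied from the EAD3 example above.
AUTHORITY = {
    "crimean tatars": ("Crimean Tatars", "lcsh",
                       "http://id.loc.gov/authorities/subjects/sh87003883"),
    "civil rights": ("Civil rights", "lcsh",
                     "http://id.loc.gov/authorities/subjects/sh99004998"),
    "soviet union": ("Soviet Union", "naf",
                     "http://id.loc.gov/authorities/names/n80126312"),
}

# Subdivisions are separated by "--", or by an en dash in some copied text.
SEPARATOR = re.compile(r"\s+(?:--|\u2013)\s+")


def subject_to_ead3(heading, source="lcsh"):
    """Convert an EAD2002 subject string into an EAD3 <subject> element.

    Each subdivision becomes a <part>. Terms found in AUTHORITY get their
    authorized form, source, and identifier; unknown terms keep their
    original text and no identifier, so they can be reconciled later.
    """
    subject = ET.Element("subject", source=source)
    for term in SEPARATOR.split(heading.strip()):
        part = ET.SubElement(subject, "part")
        match = AUTHORITY.get(term.casefold())
        if match is None:
            part.text = term
        else:
            label, part_source, uri = match
            part.text = label
            part.set("source", part_source)
            part.set("identifier", uri)
    return subject


def part_identifiers(subject):
    """Map each <part>'s text to its identifier URI (None if missing)."""
    return {p.text: p.get("identifier") for p in subject.iter("part")}


if __name__ == "__main__":
    ead3 = subject_to_ead3("Crimean Tatars -- Civil Rights -- Soviet Union")
    print(ET.tostring(ead3, encoding="unicode"))
    for term, uri in part_identifiers(ead3).items():
        print(term, "->", uri)
```

Unmatched terms are kept without an identifier rather than dropped, so a converted heading never loses information; it simply shows which parts still need reconciliation.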

Our next steps at NYPL are to generate EAD3 from the archival descriptions in our Archives Portal and to continue using EAD3 to guide our encoding practice. We also look forward to EAD3 implementation in ArchivesSpace and to building powerful new features on the standard. Finally, we are excited to see how the archival community adopts EAD3 and builds upon it.