Access a copy of these annual meeting notes on Google Drive
SAA Metadata and Digital Object Roundtable Annual Meeting
Wednesday, August 13, 5:15pm - 7:15pm
2014 COSA-NAGARA-SAA Joint Annual Meeting
Welcome! This document will be updated in real-time with notes from the 2014 MDOR meeting. Feel free to contribute content or ask questions. We’ll have dedicated MDOR Steering Committee Members adding content and reviewing the document for questions during and after the meeting. You can also post questions to the MDOR listserv at metadata@forums.archivists.org.
For now, we’ve included a brief agenda for the meeting below. More details, including presentation abstracts, can be found at http://www2.archivists.org/groups/metadata-and-digital-object-roundtable/2014-annual-meeting-agenda.
Business meeting 5:15-5:45
SAA 2015 Program Committee report (Kim Sims)
No theme
Sections and roundtables will not be endorsing session proposals
All sessions will be 60-75 minutes (no more 90 min sessions)
Council liaison comments (Helen Wong Smith)
Guidelines for A&A listserv participation will be issued by SAA (any problems should be referred to listserv admin)
COPA: asking each component group to share your advocacy activities
Note on reception: Library of Congress has opened up several floors – go up to explore!
OCLC Research update (Jackie Dooley)
Faceted Application of Subject Terminology – “FAST”
Simplified version of LCSH that is linked-data friendly
Took all the terms, pulled them apart, and update the database regularly
Faceted indexing instead of the more complicated strings of terms
Simplified interface
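The idea of pulling pre-coordinated strings apart into facets can be sketched roughly as follows. This is an illustration only: the lookup tables and the function name `split_heading` are hypothetical, and this is not OCLC's actual FAST data or algorithm.

```python
# Illustrative only: tiny hypothetical lookup tables, not real FAST data.
GEOGRAPHIC = {"United States", "Kentucky"}
FORM = {"Periodicals", "Maps"}


def split_heading(heading):
    """Split a pre-coordinated 'A--B--C' heading string into separate facets."""
    facets = {"topical": [], "geographic": [], "chronological": [], "form": []}
    for part in (p.strip() for p in heading.split("--")):
        if part in GEOGRAPHIC:
            facets["geographic"].append(part)
        elif part in FORM:
            facets["form"].append(part)
        elif any(ch.isdigit() for ch in part):
            # Crude assumption: anything containing a digit is a date or period.
            facets["chronological"].append(part)
        else:
            facets["topical"].append(part)
    return facets


result = split_heading(
    "Public opinion--United States--History--20th century--Periodicals"
)
```

Each facet can then be indexed and searched on its own instead of as one long string.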
ArchiveGrid database
Used to be a subscription product, now free; OCLC Research actively working on it
Bruce Washburn will be in the exhibits hall
Election results (Sarah Dorpinghaus)
Thank you to members ending their terms:
July Seifert (intern)
Jacqie Ferry (committee member)
Cristela Garcia-Spitz (committee member)
Jody DeRidder (co-chair)
Welcome, new members!
Jasmine Burns (intern)
Rebecca Goldman (committee member)
Arcadia Falcone (committee member)
Kari Smith (co-chair)
Review of bylaws change (Sarah Dorpinghaus)
Sarah announced that MDOR membership approved three changes to by-laws and that the by-laws will be sent to Council for final approval.
Co-Chairs may be nominated from the general membership or the steering committee.
Co-Chairs and Steering Committee members are to be elected annually by the membership in an electronic election.
Leadership roles shall have one-year (renewable) terms, as decided by a majority vote of the Steering Committee.
Review of goals for 2014-2015 year (Sarah Dorpinghaus)
MDOR steering committee spent a significant amount of time on social media this past year
Steering committee is interested in exploring other communication channels, possibly a newsletter
Considering free MDOR-sponsored webinars led by MDOR members
New on the website: A Google calendar showcasing upcoming events of interest throughout the community
Metadata presentations 5:50-6:30
"What are We Thinking? Using Faceted Classification and Tagging to Enhance Subject Access to the Public Mind" by Elise Dunham, Metadata Production Specialist, Roper Center for Public Opinion Research, University of Connecticut
Overview of the Roper Center: data sets and polls from think tanks, media outlets, etc.
Homegrown systems for backup and management
Roper Express: primary goal is to acquire the data sets from public opinion surveys
ASCII downloads
iPOLL database:
Question text, response categories, and marginals – from secondary sources, after press releases hit the web about a poll
Now they are publishing their own secondary research material “The Power of iPOLL”
All of this content in the three systems historically managed/cataloged by separate teams with different descriptive practices
Free keywording
Static lists of topics with terms strung together
Clear connections between content but lack of descriptive consistency
Goals:
Develop system for concept-based classification of manageable content
Implement workflows for identifying links between content at the point of acquisition/creation
Benefits of classification and tagging
Flexible, agile – be responsive to events
Indexer AND end-user friendly: student workers are familiar with tagging
Post-coordinated systems are quicker – but it is “quick” relative to other things
Iterative project
Had no formal taxonomy training previously so had to read up
“The Accidental Taxonomist” by Heather Hedden
Controlled vocabularies by other adjacent fields like NYTimes topics vocabulary
Conceptual: LC FAST, etc.
Develop an “aboutness” model to develop facets
Controlled set of tags by analyzing various data sources
Looking into infrastructure
Giant step: assigning tags!
Aboutness model
Roper Center expanding thinking of aboutness beyond “topics”
Challenges
Content is controversial. Pollsters ask us things that divide us – sensitive to our own biases
When to sacrifice theoretical purity for implementation? E.g. “etc.” facet
Moving forward
Tag-at-thon – give students controlled tags and pizza; assess the results for accuracy, consistency
Peer review – call on fields from social sciences, archives, etc.
Formal user testing and analysis (once site comes together)
Priority to “be ready” for linked data; have relationships standardized and manageable
"What do Users Want: Enhancing Metadata Using Google Analytics" by Jackie Couture, Eastern Kentucky University Special Collections and Archives
Just got off the plane from Kentucky! She made it!!
Context
Medium-sized regional university; 16,000 students and 3 FTE in archives
1,000 accessions and 300 processed collections
4,000 cu ft of materials
2012 – all finding aids into Archon; moved everything from flat data to structured data
Google Analytics added for the first time
We can see what people are searching for and finding in our collections!
Archon also generates pages every time people make a search
Take the spreadsheets from Google Analytics and analyze the data
Frustration with Google Analytics: changes every day!
Findings - Personal names, organizational names, place names
Took one of the most popular search terms and started analyzing the matches when doing searches in the database; example was users using an acronym and not finding subject headings
Tried to make the records more findable, regardless of the shorthand terms used
Added variant names based on web statistics findings; eventually would like to do full EAC-CPF records for these
Findings – subjects
WWI is called: WWI, WW1, World War I, World War One, etc….
Not sure about the solution for this
Hoping there will be more variation and nuance in the terms for future versions of Archon (ArchivesSpace)
Next:
Add more organization acronyms
Add more name variants, creator records
Add topical subject headings
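One way to picture the variant-name approach is a lookup that maps whatever users type to a single heading. The sketch below is an assumption-laden illustration, not how Archon or Eastern Kentucky's records work: the `VARIANTS` table and the function names are hypothetical, and in practice the variants would come from the search-term reports.

```python
import re

# Hypothetical variant table: authorized heading -> forms seen in search logs.
VARIANTS = {
    "World War, 1914-1918": ["WWI", "WW1", "World War I", "World War One"],
}


def normalize(term):
    """Collapse whitespace and case so near-identical searches compare equal."""
    return re.sub(r"\s+", " ", term).strip().lower()


def build_index(variants):
    """Map every normalized variant (and the heading itself) to its heading."""
    index = {}
    for heading, names in variants.items():
        index[normalize(heading)] = heading
        for name in names:
            index[normalize(name)] = heading
    return index


INDEX = build_index(VARIANTS)


def resolve(search_term):
    """Return the authorized heading for a search term, or None if unknown."""
    return INDEX.get(normalize(search_term))
```

Search terms that resolve to None are candidates for new variants to add.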
"Encoded Archival Context – Challenges, Possibilities, and Future (EAC-CPF)" by Iris Lee, Project Metadata Analyst, American Museum of Natural History Library & Nick Krabbenhoeft, Project Data Specialist, American Museum of Natural History Library
CLIR Hidden Collections project – awarded 2012
3-year grant to describe Expeditionary Field Work for archival docs in the science dept
Finding aids
Entity records
Challenge #1:
Traditionally sent people throughout the field
Now, in the museum there are different divisions administered independently
Sometimes people throughout the museum are unaware of related collections
Inventory a few years back showed lots of connections that could happen virtually
Possibility #1: EAC-CPF
Descriptive standard for the creators of archival material
A great way to reconnect collections and increase resource discovery
Plus we could re-use the bioghist notes across finding aids!
Vision: 5 finding aids all linking to one EAC record
User could go to vertebrate zoology finding aid related to an EAC record on the leader of the expedition
Then if they wanted to find related material, by looking at the EAC record they might find more
Another way EAC leads to resource discovery
<relation> and <date> fields of an EAC record can continue to build more context for collections and specific items
Looked at best practices of other people using EAC
Challenge 2: incorporate linked data
Solution #2: adding links
Define controlled vocabularies
Using controlled headings: Getty, LC, VIAF
Challenge #3: create XML
EAC is easily expressed in XML, but there is a lot of data
After a lot of assessment of best system (oXygen, etc.), ended up using Excel
Macro: finds a parent node, child node, finds all data – it figured it all out
Then goes row-by-row and codes all data with tags
200+ files now, 2,000+ by the end
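The presenters used an Excel macro for this step. As a rough equivalent, the row-by-row tagging could be sketched in Python as below. The column names (`record_id`, `entity_type`, `name`) and the sample row are hypothetical, and the output is deliberately simplified: a schema-valid EAC-CPF record needs a namespace and further required elements under `control`.

```python
import xml.etree.ElementTree as ET

# Hypothetical spreadsheet rows; the column names are assumptions for this sketch.
ROWS = [
    {"record_id": "example-0001", "entity_type": "person", "name": "Example, Person"},
]


def row_to_eac(row):
    """Wrap one row of data in a minimal EAC-CPF-style element tree."""
    root = ET.Element("eac-cpf")
    control = ET.SubElement(root, "control")
    ET.SubElement(control, "recordId").text = row["record_id"]
    description = ET.SubElement(root, "cpfDescription")
    identity = ET.SubElement(description, "identity")
    ET.SubElement(identity, "entityType").text = row["entity_type"]
    name_entry = ET.SubElement(identity, "nameEntry")
    ET.SubElement(name_entry, "part").text = row["name"]
    return root


def rows_to_xml(rows):
    """Return a dict of record id -> serialized XML string, one per row."""
    return {
        row["record_id"]: ET.tostring(row_to_eac(row), encoding="unicode")
        for row in rows
    }


documents = rows_to_xml(ROWS)
```

Each string in `documents` would be written out as its own file, which is where the file-management question in Challenge #4 comes from.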
Challenge #4:
Lots and lots of files on their server
Single EAC files linked together, plus there will be new ones coming along; how to manage?
Ideally want to manage them all in a database
Future goals – not necessarily in project timeline
Use xEAC (pronounced “zeke”) – editor and system they’re testing right now
Connect EAC to the library – DSpace, etc.
Connect EAC to the World – public access points through website, authority cooperatives
Q&A
Any info or instructional docs on how to use Google Analytics for archives, in way described here?
Not really - hard because it’s always changing, and not a lot out there in the way of resources
Great idea for an MDOR webinar!
Management Presentations 6:30-7:15
"The Blue Devil is in the Details: Digital Collections Workflows at Duke University Libraries" by Molly Bragg, Digital Collections Program Manager, Duke University Libraries
High level details of the digital collections workflow
“Digital collections” are the stuff they scan and put online
Digitization at Duke “cooking with gas” in 2005 when center was established
2007 built their own interface called “tripod”: amazing collections on their site – you won’t be let down!
Cast of characters
Most projects run out of the production services department, part of IT
2 developers and other staff
Do all the user interfaces for all aspects of the library website, finding aids, anything else “digital”
Not just digital collections focused
Stakeholders:
Rare Book and manuscript library, conservation, etc
Project management Chapter 1: inspiration strikes
Anyone in the library can propose a collection
Go over the project with digital production services
Draft a proposal
Chapter 2:
Vet the proposal
Advisory council vets and approves, or returns it with corrections, or it’s not the right time
A project champion later becomes an advisor and consultant
Chapter 3: implementation
A team of people; but only one full-time
Basecamp used
Consult with project champion
(Before digitization the collection is processed and described)
Derivatives and preservation copies created by the program and sent to their respective servers
Chapter 4: Promotion! Yay!
but that’s not the end
Chapter 5: the aftermath
“post-mortem” of sorts
Time tracking
Project summary report – mainly for internal purposes, don’t make the same mistakes next time
Does it work? Mostly
Proposal process can be complicated
Sharing resources between departments can be hard; conflicting priorities
Requires persistent management
Legacy projects
Benefits
Lots of stakeholders have a lot of energy and resources invested, and they get a voice
An opportunity for anyone in the community to bring out an idea
Successful outcomes
Promotion is great!
Next up
Integrating with a digital repository and things will change
Streamlining publication process
Making the case to add more resources
Helping the legacy projects
"Versioning in Digital Archives: A Workflow" by Laura Alagna, Digital Accessions Specialist, University of Chicago Library
Briefly discuss the complications that arise from versioning of digital objects
“LDR”: dark archive, permanent, long-term; 20 TB
Issues from versioning:
How do you replace digital objects?
What happens to the obsolete version?
How do you communicate changes to the version?
Example: The Chicagoan
(This sounds like an awesome magazine! Check it out!)
They have the most complete run, but not all the issues and some have preservation issues (e.g. portions torn or missing)
NYPL had some more complete versions, but they had already scanned the old ones
Came up with a workflow:
Quarantine the obsolete versions! Create a mirrored directory
Created a plain-text file including all info about the versioning event
Example: showed all the background info about old Chicagoan issue and where the new one was coming from
Copy the record and move it to the mirror directory
Copy the new version files to the directory
Update other associated metadata
So, they save the old versions along with the plain text doc
The repository points to the “old” directory to alert users to the change
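The file-handling part of this workflow can be sketched as below. This is a minimal illustration under assumptions, not the University of Chicago's actual tooling: the function name, directory layout, and note filename are hypothetical, and updating associated metadata is left out. The demo runs in a temporary directory so nothing real is touched.

```python
import shutil
import tempfile
from pathlib import Path


def replace_with_new_version(live_dir, mirror_dir, filename, new_file, note):
    """Quarantine the obsolete file in a mirrored directory, then install the new one."""
    live_dir, mirror_dir = Path(live_dir), Path(mirror_dir)
    mirror_dir.mkdir(parents=True, exist_ok=True)
    # Plain-text record of the versioning event, kept beside the obsolete file.
    (mirror_dir / (filename + ".versioning.txt")).write_text(note, encoding="utf-8")
    # Move the obsolete version out of the live directory.
    shutil.move(str(live_dir / filename), str(mirror_dir / filename))
    # Copy the new version into the live directory under the same name.
    shutil.copy2(str(new_file), str(live_dir / filename))


# Demo with throwaway files standing in for scanned issues.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    live, mirror, incoming = root / "live", root / "obsolete", root / "incoming"
    live.mkdir()
    incoming.mkdir()
    (live / "issue.tif").write_text("old scan")
    (incoming / "issue.tif").write_text("new scan")
    replace_with_new_version(
        live, mirror, "issue.tif", incoming / "issue.tif",
        "Replaced with a more complete copy.",
    )
    live_content = (live / "issue.tif").read_text()
    quarantined_content = (mirror / "issue.tif").read_text()
    note_exists = (mirror / "issue.tif.versioning.txt").exists()
```

Writing the note before moving anything means the explanation exists even if a later step fails.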
"An Institution-Wide Approach to Digital Preservation" by Edward M. Corrado, Director of Library Technology, Binghamton University Libraries & Rachel Jaffe, Binghamton University Libraries
Unlike previous presentations, at Binghamton, use the same repository for preservation and access
Rosetta – a preservation program from Ex Libris
Previously used CONTENTdm for collections and a partner with ContentPro
Used these as “learning experience”
Decided they needed a full-blown preservation program and to “get serious” about their workflows
Needed to collaborate more across the libraries and the university
In CONTENTdm land, didn’t involve anyone else outside of the library, especially special collections and university archives
Once they moved to new system, this had to change
Challenges:
Setting priorities for metadata
Determining who’s responsible for each task
Keeping in mind long-term sustainability
First step is deciding what to digitally preserve
Reviewed by subject specialists and department heads to determine if it meets collection development criteria
If determined to be appropriate: IP rights
Copyright, access, use
Sensitive materials
Access to objects and metadata for some collections must be tightly controlled and/or restricted
Enough value to libraries to be worth the effort?
Technical requirements at this stage: size of files, formats, types of objects, etc.
Step 3: decision making
Proposal goes forward to the administrators
Along with the dean, will make the decision to move the project forward
Staff and tasks discussed
Step 4: MOU for all parties
Work with folks to create one and get it signed
Typically outlines details like licenses, rights, access policies, responsibilities, funding, other expectations
Step 5: digitization
Also look into grants at this time
Step 6: metadata
Metadata librarian works with other stakeholders
Does other metadata exist already? What should go in?
Which of the library’s Dublin core fields should be used?
In many projects, metadata librarian does not actually create metadata but gives the stakeholders the skills they need to do so.
Step 7: loading SIPs into test instance
Step 8: check data in the test instance
System is great at digital preservation, but not as great for DAMS
Can be hard to get stuff out because of this
Step 9: check display
Web services librarian checks the front-end, identifies issues
Step 10: Final actions
Load metadata and digital objects into production server
Another test, but hopefully no changes needed!
Digital preservation is never complete; can’t just put it there and forget about it.
Must continually check for file fixity, obsolescence, usefulness, etc.
Maybe you want to weed a digital collection
2 takeaways:
Digital preservation is institution-wide
Digital preservation is ongoing
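For readers unfamiliar with file fixity checks, the basic idea is to record a checksum for each file and recompute it later to detect changes. The sketch below is a generic illustration using SHA-256, not a description of how Rosetta does it; the function names and the manifest layout are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 checksum, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest):
    """Return the paths whose current checksum differs from the recorded one."""
    return [path for path, recorded in manifest.items() if sha256_of(path) != recorded]


# Demo: record a checksum, verify it, then alter the file and verify again.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "object.bin"
    sample.write_bytes(b"original bytes")
    manifest = {str(sample): sha256_of(sample)}
    unchanged = check_fixity(manifest)
    sample.write_bytes(b"altered bytes")
    changed = check_fixity(manifest)
```

Running a check like this on a schedule is one way to act on the point that preservation is ongoing.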
Q&A
Molly: who is the project champion?
Usually from special collections
Great if they can provide some metadata
Molly: Elaborate on CONTENTdm for creating metadata?
Previously the work wasn’t scalable: it was hard to divide work among folks or go back to metadata to update it, and there was no OCR display
CONTENTdm allows them to do these things, they have an internal place to see this and do all the internal work they need, but then they spit out the metadata and put that in their display system
Kinda funky but has worked well
Molly, Rachel, Edward: resistance from someone who doesn’t respect processes, committees, etc.?
When Binghamton had CONTENTdm, special collections ran it by themselves and could do lots of specific things
Have been some issues standardizing things and rolling it out more broadly, but seem to have smoothed things out and the benefits are more apparent
Can’t see it as “Edward’s system”; needs to be seen as the library’s system
At Duke: when Molly came on board there was a project management tool she didn’t like much; with this and other things she tried to show the benefits of other options. E.g. “too much email,” “too many meetings,” etc.
Program management for digitization management: is there a frame for that at your institutions?
Laura: importance of goals for a particular project, but also the broader program
Edward: really successful with external projects for the long-term goal. Also for born-digital objects where there may be some more pushback for long-term preservation of digitized objects (since we’re not tossing the physical materials after digitization)
Example of a faculty member that lost a letter and had the digital copy as a backup
Projects that are successful haven’t just followed the library’s strategic plan, but also the university’s strategic plan
Repetition also helps, and documenting the process and making people consciously aware of the different phases of a process
Identifying the different types of projects that come through the door and saying “you’re coming through the preservation program now” is also helpful
Molly had mentioned a digitization guide and audience wanted to know more
This is created by digitization specialists. Sometimes starts with itemized spreadsheet by special collections
It’s an item-by-item list of what was digitized, what format, who digitized it, when, etc.; plus columns for QC
This guide is saved as part of the record-keeping
They put the guide in the file system in a folder for each project; eventually that kind of thing would go in the repository
Why maintain the quarantine/obsolete items, Laura?
Suspects that if they were taking up a lot of space, they would revisit that policy
So far has required only 4-5 events and resources are not large enough to cause issues
When developing the workflow, decision that for the long-run a good idea to keep them just in case
Instances (outside of the Chicagoan example) where there may be a need to refer back to the obsolete version