2014 MDOR Annual Meeting Minutes

Access a copy of these annual meeting notes on Google Drive


SAA Metadata and Digital Object Roundtable Annual Meeting

Wednesday, August 13, 5:15pm - 7:15pm

2014 COSA-NAGARA-SAA Joint Annual Meeting

 

Welcome! This document will be updated in real-time with notes from the 2014 MDOR meeting. Feel free to contribute content or ask questions. We’ll have dedicated MDOR Steering Committee Members adding content and reviewing the document for questions during and after the meeting. You can also post questions to the MDOR listserv at metadata@forums.archivists.org


For now, we’ve included a brief agenda for the meeting below. More details, including presentation abstracts, can be found at http://www2.archivists.org/groups/metadata-and-digital-object-roundtable/2014-annual-meeting-agenda.  



Business meeting 5:15-5:45

 

SAA 2015 Program Committee report (Kim Sims)

No theme

Sections and roundtables will not be endorsing session proposals

All sessions will be 60-75 minutes (no more 90 min sessions)


Council liaison comments (Helen Wong Smith)

Guidelines for A&A listserv participation will be issued by SAA (any problems should be referred to listserv admin)

COPA: asking each component group to share its advocacy activities

Note on reception: Library of Congress has opened up several floors – go up to explore!

OCLC Research update (Jackie Dooley)

Faceted Application of Subject Terminology – “FAST”

Simplified version of LCSH that is linked-data friendly

Took all the terms, pulled them apart; the database is updated regularly

Faceted indexing instead of the more complicated strings of terms

Simplified interface
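As a rough illustration (not from the presentation), here is how a pre-coordinated LCSH string might pull apart into FAST-style facets; the facet values below are approximations, not verified FAST records.

# Pre-coordinated LCSH string: one long subject heading
lcsh_heading = "World War, 1939-1945--Campaigns--France--Normandy"

# FAST-style post-coordinated facets: the same concepts pulled apart,
# each assignable and indexable on its own (values are illustrative only)
fast_facets = {
    "Event": "World War (1939-1945)",
    "Topical": "Military campaigns",
    "Geographic": ["France", "Normandy (France)"],
}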

ArchiveGrid database

Used to be a subscription product, now free; OCLC Research actively working on it

Bruce Washburn will be in the exhibits hall

Election results (Sarah Dorpinghaus)

Thank you to members ending their terms:

July Seifert (intern)

Jacqie Ferry (committee member)

Cristela Garcia-Spitz (committee member)

Jody DeRidder (co-chair)

Welcome, new members!

Jasmine Burns (intern)

Rebecca Goldman (committee member)

Arcadia Falcone (committee member)

Kari Smith (co-chair)

Review of bylaws change (Sarah Dorpinghaus)

Sarah announced that the MDOR membership approved three changes to the bylaws, which will be sent to Council for final approval.


Co-Chairs may be nominated from the general membership or the steering committee.

Co-Chairs and Steering Committee members are to be elected annually by the membership in an electronic election.

Leadership roles shall have one-year (renewable) terms, as decided by a majority vote of the Steering Committee.

Review of goals for 2014-2015 year (Sarah Dorpinghaus)

MDOR steering committee spent a significant amount of time on social media this past year

Steering committee is interested in exploring other communication channels, possibly a newsletter

Considering free MDOR-sponsored webinars led by MDOR members

New on the website: A Google calendar showcasing upcoming events of interest throughout the community

Metadata presentations 5:50-6:30

"What are We Thinking? Using Faceted Classification and Tagging to Enhance Subject Access to the Public Mind" by Elise Dunham, Metadata Production Specialist, Roper Center for Public Opinion Research, University of Connecticut

Overview of the Roper center: data sets and polls from thinktanks, media outlets, etc.

Homegrown systems for backup and management

Roper Express: primary goal is to acquire the data sets from public opinion surveys

ASCII downloads

iPOLL database:

Question text, response categories, and marginals – from secondary sources, after press releases hit the web about a poll

Now they are publishing their own secondary research material “The Power of iPOLL”

All of this content in the three systems historically managed/cataloged by separate teams with different descriptive practices

Free keywording

Static lists of topics with terms strung together

Clear connections between content but lack of descriptive consistency

Goals:

Develop system for concept-based classification of manageable content

Implement workflows for identifying links between content at the point of acquisition/creation

Benefits of classification and tagging

Flexible, agile – be responsive to events

Indexer AND end-user friendly: student workers are familiar with tagging

Post-coordinated systems are quicker – though “quick” is relative to other approaches

Iterative project

Had no formal taxonomy training previously so had to read up

“The Accidental Taxonomist” by Heather Hedden

Controlled vocabularies from adjacent fields, like the NYTimes topics vocabulary

Conceptual: LC FAST, etc.

Develop an “aboutness” model to derive facets

Controlled set of tags by analyzing various data sources

Looking into infrastructure

Giant step: assigning tags!

Aboutness model

Roper Center is expanding its thinking of aboutness beyond “topics”

Challenges

Content is controversial. Pollsters ask us things that divide us – we have to be sensitive to our own biases

When to sacrifice theoretical purity for implementation? E.g. “etc.” facet

Moving forward

Tag-at-thon – give students controlled tags and pizza; assess the results for accuracy, consistency

Peer review – call on expertise from the social sciences, archives, etc.

Formal user testing and analysis (once the site comes together)

Priority to “be ready” for linked data; have relationships standardized and manageable
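A rough sketch of the post-coordinated idea discussed above: each record gets independent tags under several facets, and retrieval simply intersects them rather than matching one long pre-coordinated string. The facet names and tag values below are invented for illustration.

# Hypothetical post-coordinated retrieval: intersect independently assigned tags
records = [
    {"id": "Q1", "tags": {"topic": {"health care"}, "population": {"adults"},
                          "geography": {"United States"}}},
    {"id": "Q2", "tags": {"topic": {"health care", "costs"},
                          "population": {"seniors"}, "geography": {"United States"}}},
]

def search(records, **wanted):
    """Return ids of records whose tags contain every requested facet value."""
    hits = []
    for rec in records:
        if all(value in rec["tags"].get(facet, set()) for facet, value in wanted.items()):
            hits.append(rec["id"])
    return hits

print(search(records, topic="health care", population="seniors"))  # -> ['Q2']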

"What do Users Want: Enhancing Metadata Using Google Analytics" by Jackie Couture, Eastern Kentucky University Special Collections and Archives

Just got off the plane from Kentucky! She made it!!

Context

Medium-sized regional university; 16,000 students and 3 FTE in archives

1,000 accessions and 300 processed collections

4,000 cu ft of materials

2012 – all finding aids into Archon; moved everything from flat data to structured data

Google Analytics added for the first time

 We can see what people are searching for and finding in our collections!

Archon also generates pages every time people make a search

Take the spreadsheets from Google Analytics and analyze the data

Frustration with Google Analytics: changes every day!

Findings - Personal names, organizational names, place names

Took one of the most popular search terms and analyzed the matches when searching the database; for example, users searching with an acronym were not finding the relevant subject headings

Tried to make the records more findable, regardless of the shorthand terms used

Added variant names based on web statistics findings; eventually would like to do full EAC-CPF records for these

Findings – subjects

WWI is called: WWI, WW1, World War I, World War One, etc….

Not sure about the solution for this

Hoping there will be more variation and nuance in the terms for future versions of Archon (ArchivesSpace)

Next:

Add more organization acronyms

Add more name variants, creator records

Add topical subject headings
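A rough Python sketch of the kind of analysis described above, using an exported site-search spreadsheet from Google Analytics; the column names are assumptions (the export layout changes between Analytics versions).

import csv
from collections import Counter

counts = Counter()
raw_forms = set()
with open("site_search_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        raw = row["Search Term"].strip()          # assumed column name
        raw_forms.add(raw)
        counts[raw.lower()] += int(row["Total Unique Searches"])  # assumed column name

# Most frequent queries first: candidates for added variant names or subject terms
for term, n in counts.most_common(20):
    print(f"{n:5d}  {term}")

# Short all-caps queries are often organizational acronyms worth mapping to full names
acronyms = sorted(t for t in raw_forms if t.isalpha() and t.isupper() and len(t) <= 6)
print("Possible acronyms to add as variants:", acronyms)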

"Encoded Archival Context – Challenges, Possibilities, and Future (EAC-CPF)" by Iris Lee, Project Metadata Analyst, American Museum of Natural History Library & Nick Krabbenhoeft, Project Data Specialist, American Museum of Natural History Library

CLIR Hidden Collections project – awarded 2012

3-year grant to describe expeditionary field work archival docs in the science departments

Finding aids

Entity records

Challenge #1:

Traditionally the museum sent people out into the field

Now, in the museum there are different divisions administered independently

Sometimes people throughout the museum are unaware of related collections

Inventory a few years back showed lots of connections that could happen virtually

Possibility #1: EAC-CPF

Descriptive standard for the creators of archival material

A great way to reconnect collections and increase resource discovery

Plus we could re-use the bioghist notes across finding aids!

Vision: 5 finding aids all linking to one EAC record

User could go to vertebrate zoology finding aid related to an EAC record on the leader of the expedition

Then if they wanted to find related material, by looking at the EAC record they might find more

 Another way EAC leads to resource discovery

<relation> and <date> fields of an EAC record can continue to build more context for collections and specific items

Looked at best practices of other people using EAC

Challenge #2: incorporate linked data

Solution #2: adding links

Define controlled vocabularies

Using controlled headings: Getty, LC, VIAF

Challenge #3: create XML

EAC is easily expressed in XML, but there is a lot of data

After a lot of assessment of the best system (oXygen, etc.), ended up using Excel

Macro: finds the parent node and child nodes, finds all the data – it figures it all out

Then goes row-by-row and codes all data with tags

200+ files now, 2,000+ by the end
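The presenters used an Excel macro; as an illustration of the same idea, here is a Python sketch that turns one spreadsheet row per entity into a minimal EAC-CPF record with a VIAF link. The column names and sample values are invented, and the output is a sketch rather than a validated record.

import xml.etree.ElementTree as ET

NS = "urn:isbn:1-931666-33-4"            # EAC-CPF 2010 namespace
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("", NS)
ET.register_namespace("xlink", XLINK)

# One hypothetical spreadsheet row describing an expedition leader
row = {"record_id": "amnh_c_0001", "name": "Andrews, Roy Chapman",
       "exist_from": "1884", "exist_to": "1960",
       "viaf_uri": "http://viaf.org/viaf/000000"}   # placeholder URI

eac = ET.Element(f"{{{NS}}}eac-cpf")
control = ET.SubElement(eac, f"{{{NS}}}control")
ET.SubElement(control, f"{{{NS}}}recordId").text = row["record_id"]

desc = ET.SubElement(eac, f"{{{NS}}}cpfDescription")
identity = ET.SubElement(desc, f"{{{NS}}}identity")
ET.SubElement(identity, f"{{{NS}}}entityType").text = "person"
name = ET.SubElement(identity, f"{{{NS}}}nameEntry")
ET.SubElement(name, f"{{{NS}}}part").text = row["name"]

description = ET.SubElement(desc, f"{{{NS}}}description")
dates = ET.SubElement(ET.SubElement(description, f"{{{NS}}}existDates"),
                      f"{{{NS}}}dateRange")
ET.SubElement(dates, f"{{{NS}}}fromDate").text = row["exist_from"]
ET.SubElement(dates, f"{{{NS}}}toDate").text = row["exist_to"]

# Controlled heading link (e.g., VIAF) expressed as a cpfRelation
relations = ET.SubElement(desc, f"{{{NS}}}relations")
rel = ET.SubElement(relations, f"{{{NS}}}cpfRelation",
                    {f"{{{XLINK}}}href": row["viaf_uri"]})
ET.SubElement(rel, f"{{{NS}}}relationEntry").text = "VIAF identity"

ET.ElementTree(eac).write(f"{row['record_id']}.xml",
                          xml_declaration=True, encoding="utf-8")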

Challenge #4:

Lots and lots of files on their server

Single EAC files linked together, plus there will be new ones coming along; how to manage them?

Ideally want to manage them all in a database

Future goals – not necessarily in project timeline

Use xEAC (pronounced “zeke”) – editor and system they’re testing right now

Connect EAC to the library – DSpace, etc.

Connect EAC to the World – public access points through website, authority cooperatives

Q&A

Any info or instructional docs on how to use Google Analytics for archives, in the way described here?

Not really - hard because it’s always changing, and not a lot out there in the way of resources

Great idea for an MDOR webinar!

Management Presentations 6:30-7:15

"The Blue Devil is in the Details: Digital Collections Workflows at Duke University Libraries" by Molly Bragg, Digital Collections Program Manager, Duke University Libraries

High level details of the digital collections workflow

“Digital collections” are the stuff they scan and put online

Digitization at Duke was “cooking with gas” by 2005, when the center was established

In 2007 they built their own interface, called “tripod”: amazing collections on their site – you won’t be let down!

Cast of characters

Most projects run out of the production services department, part of IT

2 developers and other staff

Do all the user interfaces for all aspects of the library website, finding aids, anything else “digital”

Not just digital collections focused

Stakeholders:

Rare Book and Manuscript Library, Conservation, etc.

Project management Chapter 1: inspiration strikes

Anyone in the library can propose a collection

Go over the project with digital production services

Draft a proposal

Chapter 2:

Vet the proposal

Advisory council vets and approves, or returns it with corrections, or it’s not the right time

A project champion later becomes an advisor and consultant

Chapter 3: implementation

A team of people; but only one full-time

Basecamp used

Consult with project champion

(Before digitization the collection is processed and described)

Derivatives and preservation copies created by the program and sent to their respective servers

Chapter 4: Promotion! Yay!

but that’s not the end

Chapter 5: the aftermath

“post-mortem” of sorts

Time tracking

Project summary report – mainly for internal purposes, don’t make the same mistakes next time

Does it work? Mostly

Proposal process can be complicated

Sharing resources between departments can be hard; conflicting priorities

Requires persistent management

Legacy projects

Benefits

Lots of stakeholders have a lot of energy and resources invested, and they get a voice

An opportunity for anyone in the community to bring out an idea

Successful outcomes

Promotion is great!

Next up

Integrating with a digital repository and things will change

Streamlining publication process

Making the case to add more resources

Helping the legacy projects

"Versioning in Digital Archives: A Workflow" by Laura Alagna, Digital Accessions Specialist, University of Chicago Library

Briefly discuss the complications that arise from versioning of digital objects

“LDR”: dark archive, permanent, long-term; 20 TB

Issues from versioning:

How do you replace digital objects?

What happens to the obsolete version?

How do you communicate changes to the version?

Example: The Chicagoan

(This sounds like an awesome magazine! Check it out!)

They have the most complete run, but not all the issues and some have preservation issues (e.g. portions torn or missing)

NYPL had some more complete versions, but they had already scanned the old ones

Came up with a workflow:

Quarantine the obsolete versions! Create a mirrored directory

Created a plain-text file including all info about the versioning event

Example: showed all the background info about old Chicagoan issue and where the new one was coming from

Copy the record and move it to the mirror directory

Copy the new version files to the directory

Update other associated metadata

So, they save the old versions along with the plain text doc

The repository points to the “old” directory to alert users to the change
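A rough Python sketch of the quarantine-and-replace workflow described above; all paths and the note fields are invented for illustration.

import shutil
from datetime import date
from pathlib import Path

repo = Path("ldr/chicagoan/issue_1927_06")              # live object directory
mirror = Path("ldr_obsolete/chicagoan/issue_1927_06")   # mirrored quarantine directory
incoming = Path("incoming/nypl_scans/issue_1927_06")    # replacement files

mirror.mkdir(parents=True, exist_ok=True)

# 1. Record the versioning event in a plain-text note kept with the old files
note = (
    f"Versioning event: {date.today().isoformat()}\n"
    "Reason: replaced in-house scans with more complete NYPL scans\n"
    f"Obsolete files moved from: {repo}\n"
    f"New files copied from: {incoming}\n"
)
(mirror / "versioning_note.txt").write_text(note, encoding="utf-8")

# 2. Quarantine the obsolete versions in the mirrored directory
for f in repo.glob("*.tif"):
    shutil.move(str(f), mirror / f.name)

# 3. Copy the new version files into the live directory
for f in incoming.glob("*.tif"):
    shutil.copy2(f, repo / f.name)

# 4. (Then update associated metadata so the repository points users to the change)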

"An Institution-Wide Approach to Digital Preservation" by Edward M. Corrado, Director of Library Technology, Binghamton University Libraries & Rachel Jaffe, Binghamton University Libraries

Unlike previous presentations, at Binghamton, use the same repository for preservation and access

Rosetta – a preservation program from Ex Libris

Previously used CONTENTdm for collections and, with a partner, ContentPro

Used these as a “learning experience”

Decided they needed a full-blown preservation program and to “get serious” about their workflows

Needed to collaborate more across the libraries and the university

In CONTENTdm land, the work didn’t involve anyone outside of special collections and university archives, let alone anyone else in the library

Once they moved to the new system, this had to change

Challenges:

Setting priorities for metadata

Determining who’s responsible for each task

Keeping in mind long-term sustainability

First step is deciding what to digitally preserve

Reviewed by subject specialists and department heads to determine if it meets collection development criteria

If determined to be appropriate: IP rights

Copyright, access, use

Sensitive materials

Access to objects and metadata for some collections must be tightly controlled and/or restricted

Enough value to libraries to be worth the effort?

Technical requirements at this stage: size of files, formats, types of objects, etc.

Step 3: decision making

Proposal goes forward to the administrators

Along with the dean, will make the decision to move the project forward

Staff and tasks discussed

Step 4: MOU for all parties

Work with folks to create one and get it signed

Typically outlines details like licenses, rights, access policies, responsibilities, funding, other expectations

Step 5: digitization

Also look into grants at this time

Step 6: metadata

Metadata librarian works with other stakeholders

Does other metadata exist already? What should go in?

Which of the library’s Dublin Core fields should be used?

In many projects, metadata librarian does not actually create metadata but gives the stakeholders the skills they need to do so.
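For illustration only, the kind of minimal Dublin Core record stakeholders might be asked to produce; the field selection and values here are invented, not Binghamton’s actual profile.

# Hypothetical Dublin Core record a stakeholder might fill in for one object
record = {
    "dc:title":      "Commencement program, 1965",
    "dc:creator":    "Harpur College",
    "dc:date":       "1965",
    "dc:type":       "Text",
    "dc:format":     "application/pdf",
    "dc:rights":     "Copyright undetermined; contact the library",
    "dc:identifier": "bu-arch-0001",
}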

Step 7: loading SIPs into test instance

Step 8: check data in the test instance

The system is great at digital preservation, but not as great as a DAMS

Can be hard to get stuff out because of this

Step 9: check display

Web services librarian checks the front-end and identifies issues

Step 10: Final actions

Load metadata and digital objects into production server

Another test, but hopefully no changes needed!

Digital preservation is never complete; can’t just put it there and forget about it.

Must continually check for file fixity, obsolescence, usefulness, etc.

Maybe you want to weed a digital collection
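A minimal sketch of the ongoing fixity checking mentioned above, comparing current SHA-256 checksums against a stored manifest; the manifest format (path, tab, checksum per line) is an assumption for illustration.

import hashlib
from pathlib import Path

def sha256(path, chunk=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def check_fixity(manifest="checksums.tsv"):
    """Compare current checksums against the stored manifest; report problems."""
    problems = []
    for line in Path(manifest).read_text(encoding="utf-8").splitlines():
        path, expected = line.split("\t")
        p = Path(path)
        if not p.exists():
            problems.append((path, "missing"))
        elif sha256(p) != expected:
            problems.append((path, "checksum mismatch"))
    return problems

for path, issue in check_fixity():
    print(f"FIXITY PROBLEM: {path}: {issue}")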

2 takeaways:

Digital preservation is institution-wide

Digital preservation is ongoing

Q&A

Molly: who is the project champion?

Usually from special collections

Great if they can provide some metadata

Molly: Elaborate on CONTENTdm for creating metadata?

Previously the work wasn’t scalable – they couldn’t divide work among folks, go back to the metadata to update it, or display OCR

CONTENTdm allows them to do these things: they have an internal place to see this and do all the internal work they need, then they export the metadata and put it into their display system

Kinda funky but has worked well

Molly, Rachel, Edward: resistance from someone who doesn’t respect processes, committees, etc.?

When Binghamton had CONTENTdm, special collections ran it by themselves and could do lots of specific things

Have been some issues standardizing things and unrolling it more broadly, but seem to have smoothed things out and the benefits are more apparent

Can’t see it as “Edward’s system”; needs to be seen as the library’s system

At Duke: when Molly came on board there was a project management tool she didn’t like much; with this and other things she tried to show the benefits of other options. E.g. “too much email,” “too many meetings,” etc.

Program management for digitization management: is there a framework for that at your institutions?

Laura: importance of goals for a particular project, but also the broader program

Edward: this has been really successful with external projects when framed around the long-term goal, and also for born-digital objects; there may be more pushback on long-term preservation of digitized objects (since we’re not tossing the physical materials after digitization)

Example of a faculty member that lost a letter and had the digital copy as a backup

Projects that are successful haven’t just followed the library’s strategic plan, but also the university’s strategic plan

Repetition also helps, and documenting the process and making people consciously aware of the different phases of a process

Identifying the different types of projects that come through the door and saying “you’re coming through the preservation program now” is also helpful

 Molly had mentioned a digitization guide and audience wanted to know more

This is created by digitization specialists; it sometimes starts with an itemized spreadsheet from special collections

It’s an item-by-item list of what was digitized, what format, who digitized it, when, etc.; plus columns for QC

This guide is saved as part of the record-keeping

They put the guide in the file system in a folder for each project; eventually that kind of thing would go in the repository
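For illustration, a hypothetical set of columns for such a digitization guide (the exact fields Duke uses were not specified).

import csv

# Invented column names for an item-by-item digitization guide with QC columns
columns = ["item_id", "title", "format", "digitized_by", "date_digitized",
           "file_name", "qc_reviewed_by", "qc_date", "qc_notes"]

with open("digitization_guide.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerow(columns)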

Why maintain the quarantine/obsolete items, Laura?

Suspects that if they were taking up a lot of space, they would revisit that policy

So far has required only 4-5 events and resources are not large enough to cause issues

When developing the workflow, they decided that for the long run it was a good idea to keep them just in case

Instances (outside of the Chicagoan example) where there may be a need to refer back to the obsolete version