CIL 2010: Conversations with the Archivist of the United States

Speakers: “Collector in Chief” David Ferriero interviewed by Paul Holdengräber

Many people don’t know what the archivist does. They often think that the National Archives are a part of the Library of Congress. In fact, the agency is separate.

Ferriero is the highest ranking librarian in the administration. The archivist is usually a historian or someone with connections to the administration. He was surprised to get the appointment, and had been expecting to head the IMLS instead.

He is working to create a community around the records and how they are being used. His blog talks about creating citizen archivists. In addition, he is working to declassify 100 million documents a year. There is an enormous backlog of these documents going back to WWII. Each record must be reviewed by the agency that initially classified it, and there are 2400 classification guides that are supposed to be reviewed every five years, but around 50% of them have not been.

You can’t have an open government if you don’t have good records. When records are created, they need to be ready to migrate formats as needed. There will be a meeting between the chief information officers and the record managers to talk about how to tackle this problem. These two groups have historically not communicated very well.

He’s also working to open up the archives to groups that we don’t often think of as archive users. There will be programs for grade school groups, and more than just tours.

Large digitization projects with commercial entities lock up content for periods of time, including national archives. He recognizes the value that commercial entities bring to the content, but he’s concerned about the access limitations. This may be a factor in what is decided when the contract with is up.

“It’s nice having a boss down the street, but not, you know, in my face.” (on having not yet met President Obama)

Ferriero thinks we need to save smarter and preserve more digital content.

ER&L 2010: We’ve Got Issues! Discovering the right tool for the job

Speaker: Erin Thomas

The speaker is from a digital repository, so the workflow and needs may be different from yours. Their collections are very old and spread out among several libraries, but are still highly relevant to current research. They have around 15 people who are involved in the process of maintaining the digital collection, and email got to be too inefficient to handle all of the problems.

The member libraries created the repository because they have content that needed to be shared. They started with the physical collections, and broke up the work of scanning among the holding libraries, attempting to eliminate duplications. Even so, they had some duplication, so they run de-duplication algorithms that check the citations. The Internet Archive is actually responsible for doing the scanning, once the library has determined if the quality of the original document is appropriate.
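The talk did not describe how the de-duplication works, so the following is only a minimal sketch of one way a citation check could be done: normalize each citation to a key and group the records that share it. The field names (title, volume, year, library) and the sample records are hypothetical and are not taken from the repository's actual data.

```python
import re
from collections import defaultdict


def citation_key(title, volume, year):
    """Build a normalized key so trivially different citations compare equal."""
    # Lowercase, drop punctuation, and collapse runs of whitespace in the title.
    normalized_title = re.sub(r"[^a-z0-9 ]", "", title.lower())
    normalized_title = " ".join(normalized_title.split())
    return (normalized_title, str(volume).strip(), str(year).strip())


def find_duplicates(records):
    """Return {citation key: [libraries]} for keys claimed by more than one record."""
    groups = defaultdict(list)
    for record in records:
        key = citation_key(record["title"], record["volume"], record["year"])
        groups[key].append(record["library"])
    return {key: libs for key, libs in groups.items() if len(libs) > 1}


# Hypothetical scan records from two holding libraries.
records = [
    {"title": "Example Journal of Botany", "volume": 3, "year": 1865, "library": "Library A"},
    {"title": "Example journal of  botany.", "volume": "3", "year": "1865", "library": "Library B"},
    {"title": "Example Annals of Zoology", "volume": 1, "year": 1838, "library": "Library A"},
]

duplicates = find_duplicates(records)
for key, libraries in duplicates.items():
    print(key, "is held by", libraries)
```

Exact matching on a normalized key will miss citations that differ in more than punctuation or case, so a real workflow would probably still need a person to review near matches.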

The low-cost model they are using does not produce preservation-level scans; they’re focusing on access. The user interface for a digital collection can be more difficult to browse than the physical collection, so libraries have to do more and different kinds of training and support.

This is great, but it caused more workflow problems than they expected. So, they looked at issue tracking systems. Their development staff already had access to Gemini, so they went with that.

The issues they receive can be assigned types and specific components for each problem. Some types already existed, and they were able to add more. The components were entirely customized. Tasks are tracked from beginning to end, and they can add notes, have multiple user responses, and look back at the history of related issues.

But they needed a more flexible system that allowed them to drill down to sub-issues, choose email v. no email, and have a better user interface. There were many other options out there, so they did a needs assessment and an environmental scan. They developed a survey to ask the users (library staff) what they wanted, and hosted demos of options. And, in the end, Gemini was the best system available for what they needed.

Ithaka’s What to Withdraw tool

Have you seen the tool that Ithaka developed to determine what print scholarly journals you could withdraw (discard/store) that are already in your digital collections? It’s pretty nifty for a spreadsheet. About 10-15 minutes of playing with it and a list of our print holdings gave me a list of around 200 actionable titles in our collection, which I passed on to our subject liaison librarians.

The guys who designed it are giving some webinar sessions, and I just attended one. Here are my notes, for what it’s worth. I suggest you participate in a webinar if you’re interested in it. The next one is tomorrow and there’s one on February 10th as well.


  • They have an organizational commitment to preservation: JSTOR, Portico, and Ithaka S+R
  • Libraries are under pressure to both decrease their print collections and to maintain some print copies for the library community as a whole
  • Individual libraries are often unable to identify materials that are sufficiently well-preserved elsewhere
  • The What to Withdraw framework is for general collections of scholarly journals, not monographs, rare books, newspapers, etc.
  • The report/framework is not meant to replace the local decision-making process

What to Withdraw Framework

  • Why do we need to preserve the print materials once we have a digital version?
    • Fix errors in the digital versions
    • Replace poor quality scans or formats
    • Inadequate preservation of the digital content
    • Unreliable access to the digital content
    • Also, local politics or research needs might require access to or preservation of the print
  • Once they developed the rationales, they created specific preservation goals for each category of preservation and then determined the level of preservation needed for each goal.
    • Importance of images in journals (the digitization standards for text are not the same as for images, particularly color images)
    • Quality of the digitization process
    • Ongoing quality assurance processes to fix errors
    • Reliability of digital access (business model, terms & conditions)
    • Digital preservation
  • Commissioned Candace Yano (operations researcher at UC Berkeley) to develop a model for the number of copies needed to meet preservation goals, assuming an annual loss rate of 0.1% for a dark archive.
    • As a result, they found they needed only two copies to have a >99% confidence that they will still have copies remaining in twenty years.
    • As a community, this means we need to be retaining at least two copies, if not more.
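The webinar did not go into the details of Yano's model, so what follows is only a back-of-the-envelope check of the two-copy figure. It assumes each copy is lost independently at a constant 0.1% per year, which is a simplification of my own and not necessarily how the commissioned model works. The function names are made up for the illustration.

```python
def survival_probability(copies, annual_loss_rate, years):
    """Chance that at least one copy remains, assuming independent losses."""
    # Probability that a single copy survives every one of the years.
    copy_survives = (1.0 - annual_loss_rate) ** years
    # All copies must be lost for the title to be gone.
    return 1.0 - (1.0 - copy_survives) ** copies


def copies_needed(target, annual_loss_rate, years):
    """Smallest number of copies whose survival probability exceeds the target."""
    if not 0.0 <= annual_loss_rate < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("loss rate must be in [0, 1) and target in (0, 1)")
    copies = 1
    while survival_probability(copies, annual_loss_rate, years) <= target:
        copies += 1
    return copies


for n in (1, 2, 3):
    print(n, "copies:", round(survival_probability(n, 0.001, 20), 5))

print("copies needed for >99%:", copies_needed(0.99, 0.001, 20))
```

Under these assumptions a single copy has roughly a 98% chance of lasting twenty years, which falls short of the 99% target, while two copies clear it comfortably. That lines up with the two-copy finding reported in the webinar.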

Decision-Support Tool (proof of concept)

  • JSTOR is an easy first step because many libraries have this resource, many own print copies of the titles in the collections, and Harvard & UC already have dim/dark archives of JSTOR titles
  • The tool gives libraries the information to identify titles that are held by the Harvard & UC libraries and that also have relatively few images
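The actual tool is a spreadsheet, and I don't know its internals, but the logic described above can be sketched roughly as follows. The record layout, the ISSN values, and the image-count cutoff are all hypothetical: the webinar only said "relatively few images" and did not give a number.

```python
def withdrawal_candidates(local_holdings, archive_titles, max_images):
    """List local print titles held by both archives that have few images.

    local_holdings: iterable of ISSNs the library owns in print.
    archive_titles: dict mapping ISSN to a record with 'title',
        'held_by' (set of archive names), and 'image_count'.
    max_images: hypothetical cutoff standing in for "relatively few images".
    """
    required_archives = {"Harvard", "UC"}
    candidates = []
    for issn in local_holdings:
        record = archive_titles.get(issn)
        if record is None:
            continue  # Not covered by the tool's data, so leave it alone.
        held_by_both = required_archives <= record["held_by"]
        few_images = record["image_count"] <= max_images
        if held_by_both and few_images:
            candidates.append(record["title"])
    return sorted(candidates)


# Made-up sample data for illustration only.
archive_titles = {
    "0000-0001": {"title": "Example Review of History", "held_by": {"Harvard", "UC"}, "image_count": 12},
    "0000-0002": {"title": "Example Journal of Art", "held_by": {"Harvard", "UC"}, "image_count": 4800},
    "0000-0003": {"title": "Example Quarterly of Law", "held_by": {"Harvard"}, "image_count": 3},
}
local_holdings = ["0000-0001", "0000-0002", "0000-0003", "0000-0004"]

print(withdrawal_candidates(local_holdings, archive_titles, max_images=100))
```

Whatever comes out is a starting list for the local decision-making process, not a replacement for it.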

Future Plans

  • Would like to apply the tool to other digital collections and dark/dim archives, and they are looking for partners in this
  • Would also like to incorporate information from other JSTOR repositories (such as Orbis-Cascade)