Digital preservation – eclectic librarian

Plenary Sessions of the Charleston Conference at the Gaillard Center (Charleston, South Carolina) - November 3, 2016 — Anja Smit at Charleston Conference

Speaker: Anja Smit, Utrecht University

Ancient scholars would not recognize our modern libraries. There are new services (via the internet) that replace some of the services of the library, and we need to continually re-evaluate what value we are adding.

For example, we are putting a lot of effort into locally managed discovery services, and yet most referrals of users to content come from Google and Google Scholar. For some disciplines, the library plays a very small role in discovery of content, so the Dutch have focused on providing access to content over discovery.

But, what if OA becomes the publication model of the future? What if Google does digitize all the books? What if users organize access themselves?

The Dutch consortium is flipping some pricing models. In two of the licenses they currently hold, they are paying for the cost of publication rather than the rights for access, and they are making the Dutch scholarly work OA globally. However, they have found perpetual access, or preservation, has not been an easy thing to negotiate or prioritize.

Librarians have been trying to find a solution for long-term preservation since the dawn of digital publication. There are some promising initiatives.

France has built a repository that includes access (not just a dark archive). How do we scale this kind of thing globally? Funding is local. We will never have a global system, so we need local systems based on a standard that will connect them.

Libraries do not own the digital content. We can collect it, but we tend to collect what our community needs rather than the output of our researchers.

Libraries can put things on the agenda of other stakeholders. OA and Open Science are on the agenda of politicians and governments because of libraries.

To-do:

Make perpetual access to knowledge the top priority on our agenda.
Get perpetual access to knowledge on the agenda of relevant stakeholders as quickly as possible. Collectively.
Find partners to develop longer term preservation infrastructure.

We can leave the rest to Google.

Q&A

Q: Dutch presidency of EU and Dutch proposals for OA – what do you think of the Dutch policies in this area?
A: We are all trying to find solutions to further and advance access to knowledge. That is our common goal. This is such a complicated issue — all the stakeholders have to work together to do this.

Q: Libraries have not done as good a job of preserving media. Not as concerned about the availability of scholarly journals and books in the future — what happens to the emails and other media forms that are getting lost?
A: Documented knowledge is at the core of libraries. The other areas have much bigger problems. That is such a huge area that she would not presume to have ideas or suggestions for solutions.

Q: Libraries are being pressured to collect and manage raw faculty research, without additional support, so it’s taking away from collecting in traditional areas.
A: Some say that this will become the new knowledge — data will trump publication. Libraries are best positioned to help researchers manage their data in a consultancy role, and let IT handle the storage of the data. We could spend a little less on collection development to do this.

Q: What will happen when Google is no longer freely accessible and there’s a cost?
A: It doesn’t help if we keep pointing people to local collections. Our users use Google, so we need to help them find what they are not able to find there themselves.

Have you seen the tool that Ithaka developed to determine what print scholarly journals you could withdraw (discard/store) that are already in your digital collections? It’s pretty nifty for a spreadsheet. About 10-15 minutes of playing with it and a list of our print holdings resulted in giving me a list of around 200 or so actionable titles in our collection, which I passed on to our subject liaison librarians.

The guys who designed it are giving some webinar sessions, and I just attended one. Here are my notes, for what it’s worth. I suggest you participate in a webinar if you’re interested in it. The next one is tomorrow and there’s one on February 10th as well.

Background

They have an organizational commitment to preservation: JSTOR, Portico, and Ithaka S+R
Libraries are under pressure to both decrease their print collections and to maintain some print copies for the library community as a whole
Individual libraries are often unable to identify materials that are sufficiently well-preserved elsewhere
The What to Withdraw framework is for general collections of scholarly journals, not monographs, rare books, newspapers, etc.
The report/framework is not meant to replace the local decision-making process

What to Withdraw Framework

Why do we need to preserve the print materials once we have a digital version?
- Fix errors in the digital versions
- Replace poor quality scans or formats
- Inadequate preservation of the digital content
- Unreliable access to the digital content
- Also, local politics or research needs might require access to or preservation of the print
Once they developed the rationales, they created specific preservation goals for each category of preservation and then determined the level of preservation needed for each goal.
- Importance of images in journals (the digitization standards for text are not the same as for images, particularly color images)
- Quality of the digitization process
- Ongoing quality assurance processes to fix errors
- Reliability of digital access (business model, terms & conditions)
- Digital preservation
Commissioned Candace Yano (operations researcher at UC Berkeley) to develop a model for the number of copies needed to meet preservation goals, assuming an annual loss rate of 0.1% for a dark archive.
- As a result, they found they needed only two copies to have >99% confidence that copies will still remain in twenty years.
- As a community, this means we need to be retaining at least two copies, if not more.
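
To get a feel for those numbers, here is a rough back-of-the-envelope sketch. It is not Yano's actual model, which was not described in detail in the webinar. It assumes each copy is lost independently at a constant 0.1% per year, and the function name is my own.

```python
def survival_probability(copies: int, annual_loss_rate: float, years: int) -> float:
    """Chance that at least one of `copies` still exists after `years`.

    Assumption (mine, not from the report): every copy is lost
    independently, at the same constant annual rate.
    """
    # Probability that one given copy is lost at some point in the period.
    p_single_lost = 1.0 - (1.0 - annual_loss_rate) ** years
    # All copies must be lost for nothing to remain.
    return 1.0 - p_single_lost ** copies


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, survival_probability(n, annual_loss_rate=0.001, years=20))
```

Under these simplified assumptions a single copy has roughly a 98% chance of surviving twenty years, while two copies push that above 99.9%, which is at least consistent with the "two copies for >99% confidence" finding.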

Decision-Support Tool (proof of concept)

JSTOR is an easy first step because many libraries have this resource, many own print copies of the titles in the collections, and Harvard & UC already have dim/dark archives of JSTOR titles
The tool provides libraries information to identify titles held by Harvard & UC libraries which also have relatively few images
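
The actual tool is a spreadsheet, but the matching idea can be sketched in a few lines. Everything here is hypothetical: the field names, the image threshold, and the reading of "held by Harvard & UC" as "held by both" are my assumptions, not details from Ithaka.

```python
from dataclasses import dataclass


@dataclass
class TitleRecord:
    title: str
    held_by_harvard: bool   # hypothetical flag: in Harvard's print archive
    held_by_uc: bool        # hypothetical flag: in UC's print archive
    image_share: float      # hypothetical measure: fraction of pages with images


def withdrawal_candidates(local_holdings, records, max_image_share=0.05):
    """Return local print titles that are archived elsewhere and image-light.

    `max_image_share` is an illustrative cutoff, not a value from the tool.
    """
    by_title = {record.title: record for record in records}
    candidates = []
    for title in local_holdings:
        record = by_title.get(title)
        if record is None:
            continue  # title is not covered by the tool's data
        archived = record.held_by_harvard and record.held_by_uc
        if archived and record.image_share <= max_image_share:
            candidates.append(title)
    return candidates


if __name__ == "__main__":
    records = [
        TitleRecord("Journal A", True, True, 0.01),
        TitleRecord("Journal B", True, False, 0.01),
        TitleRecord("Journal C", True, True, 0.40),
    ]
    print(withdrawal_candidates(["Journal A", "Journal B", "Journal C"], records))
```

The output is only a list of titles to review; as the framework itself says, it is not meant to replace the local decision-making process.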

Future Plans

Would like to apply the tool to other digital collections and dark/dim archives, and they are looking for partners in this
Would also like to incorporate information from other JSTOR repositories (such as Orbis-Cascade)

Tag: Digital preservation

Charleston 2016: You Can’t Preserve What You Don’t Have – Or Can You? Libraries as Infrastructure for Perpetual Access to Intellectual Output

Ithaka’s What to Withdraw tool