NASIG 2015 – Building a Social Compact for Preserving E-Journals

LOCKSS Cat

Speaker: Anne Kenney, University Librarian, Cornell University

Thirty years ago, when NASIG began, we wouldn’t have been worrying about the preservation of ejournals. Now we do. The digital-first ecology disrupts traditional roles and responsibilities, with publishers, rather than libraries, now being responsible for preserving the journal content. We’re still trying to figure out how to manage the preservation and access.

60% of Cornell’s collections expenditures go to online resources. An Ithaka survey shows that most institution types spend more on online and ongoing resources than on any other collection format. The same survey found that Doctoral institutions are far more interested in preservation of online materials than Masters and Baccalaureate schools.

A study of library directors identified several issues for libraries:

  • sense of urgency
  • need for trusted independent archiving
  • content coverage and access conditions
  • resource commitment and competing priorities
  • need for collective response

There was a need for a non-profit preservation program separate from publisher projects, with a focus on scholarly journals. Portico, Scholars Portal, and CLOCKSS are the three main programs still existing that meet the needs of ejournal preservation. They are being supported to varying degrees by ARLs.

The coverage in these three programs is uneven and it’s difficult to create a definitive list. The major publishers are represented, and there is significant duplication across the services. She’s not losing sleep over the preservation of Elsevier journals, for example. STM literature in English is very well preserved.

The Keepers Registry attempts to gather and maintain digital content information from repositories archiving ejournals. KBART could be useful for keeping this data clean and updated.

2CUL did a study in 2011 to see how well their content was being preserved in LOCKSS and/or Portico, and only 13-16% of their titles were preserved. Most are those that have ISSNs or eISSNs, which is only about half of the titles held by the schools. They expanded to include Duke in 2012 and looked at all the preservation sources in the Keepers Registry. Only 23-27% of the ejournals with e/ISSNs were preserved, and there was considerable redundancy across the preservation programs.
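A comparison like the one described above can be sketched in a few lines of Python. This is only a rough illustration, not the method 2CUL used: the record layout (`title`, `issns`), the function names, and the idea of starting from a flat list of preserved ISSNs exported from a registry are all assumptions, and the ISSNs in the usage note are made up.

```python
def normalize_issn(raw):
    """Return an ISSN as 'NNNN-NNNC', or None if it does not look like one."""
    digits = raw.replace("-", "").strip().upper()
    looks_valid = (
        len(digits) == 8
        and digits[:7].isdigit()
        and (digits[7].isdigit() or digits[7] == "X")
    )
    return f"{digits[:4]}-{digits[4:]}" if looks_valid else None


def coverage(holdings, preserved_issns):
    """Count how many held titles match at least one preserved ISSN.

    holdings: list of dicts with 'title' and 'issns' (hypothetical layout).
    preserved_issns: ISSNs reported by preservation programs (hypothetical export).
    """
    preserved = {n for n in map(normalize_issn, preserved_issns) if n}
    with_issn = 0
    matched = 0
    for record in holdings:
        issns = {n for n in map(normalize_issn, record.get("issns", [])) if n}
        if not issns:
            continue  # titles without any ISSN cannot be matched this way
        with_issn += 1
        if issns & preserved:
            matched += 1
    return {
        "titles_total": len(holdings),
        "titles_with_issn": with_issn,
        "titles_preserved": matched,
        "share_of_issn_titles": matched / with_issn if with_issn else 0.0,
    }


# Made-up sample data for illustration only.
sample_holdings = [
    {"title": "A", "issns": ["1234-5678"]},
    {"title": "B", "issns": ["2222333X"]},
    {"title": "C", "issns": []},
    {"title": "D", "issns": ["9999-0000", "1111-2222"]},
]
sample_preserved = ["12345678", "1111-2222"]
sample_result = coverage(sample_holdings, sample_preserved)
```

Matching on ISSN alone is what leaves the titles without identifiers out of the count, which is why the share is reported against titles with ISSNs rather than against everything held.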

Vulnerable content that is not being preserved well includes third-party content, aggregator content, small publishers, open access titles, and historical publications. They are looking to create some best practices for OA journal preservation.

The preservation programs need better coordination to identify what redundancy is necessary and how to incorporate more unique content. Right now, they see themselves more as competitors than collaborators, and that needs to change.

All of the scholarly record must be preserved, and it’s the responsibility of everyone in the scholarly communication world, not just libraries. Much of the content is at risk and no one can do this alone. No single archiving program will meet all needs, and they need more transparency and cooperation. License terms are inadequate for most preservation needs, and maybe we need legislation to cover legal deposits. We need clearer and broader triggers for when we can access the preserved content (there is a concern for the long-term financial sustainability of a dark archive).

Libraries need to acknowledge this is a shared responsibility, regardless of size and type of library. Publishers are not the enemies in this domain. Participate in at least one initiative. Move beyond a registry of archived publications to identify at-risk materials critical to scholarship.

Publishers need to enter into relationships with one or more ejournal archiving programs. Provide adequate information and data to archivers on coverage. Extend liberal archiving rights in license agreements, and consider new terms for access.

Archiving programs need to expand coverage to include vulnerable materials. Be explicit about coverage, and secure access rights.

NASIG can raise awareness of this issue. Endorse power of collective action. Consider a set of principles and actions, such as the KBART standard and revising SERU to include better terms for archiving. Foment international cooperation with other organizations and funding bodies.

NASIG 2015 – Somewhere To Run To, Nowhere To Hide

information wants to be free?

Speaker: Stephen Rhind-Tutt, President, Alexander Street Press

His perspective is primary source collections, mostly video and audio, created by a small company of 100 or so people.

There are billions and trillions of photos, videos, and audio files being added to the Internet every year, and it’s growing year over year. We’re going to need a bigger boat.

He reviewed past presentations at NASIG, and there are recurring nightmares of OA replacing publishers, Wikipedia replacing reference sources, vendors bypassing libraries and going direct to faculty, online learning replacing universities, etc.

All technologies evolve and die. Many worry about the future, many hold onto the past, and we’re not responding quickly enough to the user. Dispense with the things that are less relevant. Users don’t want to search, they want to find.

You can project the future, and not just by guessing. You don’t have to know how it’s going to happen, but you can look at what people want and project from that.

Even decades after the motor car was developed, we were still framing it within the context and limitations of the horse-drawn carriage. We’re doing that with our ebooks and ejournals today. If we look to the leaders in the consumer space, we can guess where the information industry is heading.

If we understand the medium, we can understand how best to use it. Louis Kahn says, “Honor the material you use.” The medium of electronic publications favors small pieces (articles, clips) and is infinitely pliable, which means it can be layered and made more complex. Everything is interconnected with links, and the links are more important than the destination. We are fighting against the medium when we put DRM on content, limit the simultaneous use, and hide the metadata.

“I don’t know how long it will take, but I truly believe information will become free.”

Video is a terrible medium for information if you want it fast: the content of 30 minutes of video can be read in 5 minutes. ASP has noticed that the use of the text content is on par with the use of the associated video content.

Mobile is becoming very important.

Linking — needs to work going out and coming in. The metadata for linking must be made free so that it can be used broadly and lead users to the content.

The researcher wants every piece of information created on every topic for free. From where he is as a publisher, he’s seeing better content moving more and more to open access. And, as a result of that, ASP is developing an open music library that will point to both fee and free content, to make it shareable with other researchers.

In the near future, publishers will be able to make far more money developing the research process ecosystem than by selling one journal.

NASIG 2015 – Ain’t Nobody’s Business If I Do (Read Serials)

Speaker: Dorothea Salo, Faculty Associate, University of Wisconsin – Madison

Publishers and providers are collecting massive amounts of user data, and Salo is not happy about this. The ALA Code of Ethics is not happy about this, either.

Why does privacy matter?

The gizmos that have ticked along for ages without being connected are now connected to the internet. It can be very handy, like smart thermostats, or a little too snoopy like the smart TV that listens in on your conversations. The FTC is scrutinizing the Internet of Things very closely, because it’s easy to cause some real harm with the data from these devices.

Thermostat data, for example, tells you a lot about when someone is at home or not, which can be useful for thieves, law enforcement, and marketers. And this is information that wasn’t available when the thermostat was offline.

Eresource use is being snooped on, too. Adobe is collecting reader behavior information from Adobe Digital Editions, even when it’s coming from library sources. They got caught because they were transmitting that information unencrypted. They fixed the encryption, but they haven’t stopped collecting the data.

Readers cannot trust content providers. Librarians cannot trust content providers. We have to assume you’re behaving like Adobe, until you prove otherwise. It’s easy, then, to lump eresources into the Internet of Things. Back in the day, journals and books weren’t online, but now they are ways to collect data on reader behavior.

Generally speaking, content providers have very little out there in a code of practice for reader privacy, including the relevant associations. Not even the open access publications and associations. Most journal privacy policies do not measure up to library standards, including those that are OA. 16 of the top 20 research journals let ad networks track readers.

There’s no conspiracy theory here. It’s mostly accidental. In the age of print, reader privacy wasn’t an issue. Readers could do whatever they wanted with the content. Content providers need to address this now that they are capable of collecting and using all sorts of data they couldn’t before.

NISO is working on a framework for this, and the NASIG community needs to be engaged.

The ALA Code of Ethics doesn’t say that it’s fine to collect data when it’s convenient; there are no exceptions. The same goes for “improving services”.

The question, “Would we do this in a physical space with people around us?” is a useful gauge of the creep factor. Physical library users and digital library users should have the same privacy rights.

It’s easy to feel helpless in this. It’s easy to give up and think no user cares about their privacy. Just because it’s easy and convenient to ignore privacy, that doesn’t make it right.

Libraries and content providers need to live up to Article III of the ALA Code of Ethics: “…protect each reader’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”

How do we do this? Understand the risks and mitigate them. Risks: personally identifying information (sometimes used as a smoke screen to hide what else is being collected that is not PII), long tail information (uncommon enough to identify individuals, even without PII), and behavior trails (highly specific time stamps, etc.). Libraries deal with this by tracking the stuff instead of the people. Libraries keep proxy server logs only long enough to identify use that violates TOS.
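As a rough illustration of "tracking the stuff instead of the people," here is a minimal Python sketch. Everything in it is an assumption made for the example: the log entry layout (`user`, `resource`, `timestamp`), the function names, and the 14-day retention window are hypothetical, not anything Salo or a particular proxy product prescribes.

```python
from collections import Counter
from datetime import datetime, timedelta

RETENTION_DAYS = 14  # assumed window; set by local policy, not by any standard


def purge_old(entries, now, retention_days=RETENTION_DAYS):
    """Keep only raw log entries that are still inside the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [e for e in entries if e["timestamp"] >= cutoff]


def usage_by_resource(entries):
    """Count uses per resource per day, dropping users and exact times."""
    counts = Counter()
    for e in entries:
        # Only the resource and the calendar date survive aggregation.
        counts[(e["resource"], e["timestamp"].date())] += 1
    return counts


# Made-up log entries for illustration only.
sample_log = [
    {"user": "u1", "resource": "journal-a", "timestamp": datetime(2015, 5, 30, 9, 15)},
    {"user": "u2", "resource": "journal-a", "timestamp": datetime(2015, 5, 30, 14, 2)},
    {"user": "u1", "resource": "journal-b", "timestamp": datetime(2015, 5, 1, 11, 45)},
]
sample_now = datetime(2015, 6, 1)

sample_counts = usage_by_resource(sample_log)       # long-term statistics, no users
sample_raw = purge_old(sample_log, now=sample_now)  # short-lived raw log for TOS checks
```

The aggregated counts answer usage questions without identifying anyone, while the raw entries that do identify people age out after the retention window. Daily counts for rarely used resources can still be long tail information, so a coarser time bucket may be the safer choice for those.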

Determine who wants to know and why: data omnivores (NSA, Google, Facebook), data opportunists (academic researchers, usability wonks, assessment experts, readers who want to reuse their own data), and data paparazzi (doxxers, stalkers, politicians). Worry less about the opportunists and omnivores, worry a lot about the paparazzi.

What should we do or not do? No ostriching: heads out of the sand, please. The Library Freedom Project has lots of resources. Industry-level advocacy is needed, because those who take the high road on privacy are afraid of being out-competed by those who don’t.

We’re not helpless. Don’t give up. License negotiation time is when we can ask the hard questions — use our Benjamins wisely. Assess mindfully, being aware of data leakage and compromised privacy.

Not even the greediest data omnivore, the most clueless data opportunist, or the most evil data paparazzi can abuse data that isn’t there. Don’t collect reader data unless there is a clear and reasonable reason to do it.

SSP/NASIG – What Do All of these Changes Mean for Vendors?

data sharing

Speaker: Caitlin Trasande, Head of Research Policy, Digital Science

Social impact is the emerging bacon.

Digital Science supports and funds startups that build software for research. The scope is the full life cycle of research, ranging from reading literature to planning and conducting experiments to publishing and sharing the data. The disgrunterati are those who decided to be the last to complain about broken processes and build better products and models.

[insert overview of several projects funded by Digital Science]

Information may want to be free, but it needs to be accessible and understandable.

SSP/NASIG – Data Wranglers in LibraryLand—Finding Opportunities in the Changing Policy Landscape

All You Can Eat Bacon!
all you can eat…data?

Speaker: T. Scott Plutchak, Director of Digital Data Curation Strategies, The University of Alabama at Birmingham

Data is the new bacon. Data is the hot buzzword in scholarly publishing. He is working on the infrastructure, services, and policies needed to manage data on an institutional level.

Concern about data has been around for a long time. NIH developed their first policy in 2003, but it was pretty weak. Things got serious when the public access policy became mandatory in 2009. NSF developed a data management policy in 2011, which got a little more attention.

A scholarly publishing roundtable made up of university administrators, librarians, publishers, and researchers was created in 2009 and reported in 2010. They recommended flexible policies for each agency, developed in collaboration with their constituencies.

Libraries should be thinking about how and where and what kinds of data they should store and manage.

My small liberal arts university probably will have to do some things with this, but not to the extent he’s talking about. This is an R1 library problem, not a library problem at large. Yet.