NASIG 2015 – Ain’t Nobody’s Business If I Do (Read Serials)

Speaker: Dorothea Salo, Faculty Associate, University of Wisconsin – Madison

Publishers and providers are collecting massive amounts of user data, and Salo is not happy about this. Neither is the ALA Code of Ethics.

Why does privacy matter?

The gizmos that have ticked along for ages without being connected are now connected to the internet. It can be very handy, like smart thermostats, or a little too snoopy like the smart TV that listens in on your conversations. The FTC is scrutinizing the Internet of Things very closely, because it’s easy to cause some real harm with the data from these devices.

Thermostat data, for example, tells you a lot about when someone is at home or not, which can be useful for thieves, law enforcement, and marketers. And this is information that wasn’t available when the thermostat was offline.

Eresource use is being snooped on, too. Adobe is collecting reader behavior information from Adobe Digital Editions, even when the content comes from library sources. They got caught because they were transmitting that information unencrypted. They fixed the encryption, but they haven't stopped collecting the data.

Readers cannot trust content providers. Librarians cannot trust content providers. We have to assume providers are behaving like Adobe until they prove otherwise. It's easy, then, to lump eresources into the Internet of Things. Back in the day, journals and books weren't online; now they are ways to collect data on reader behavior.

Generally speaking, content providers, including the relevant trade associations, have published very little in the way of a code of practice for reader privacy; not even the open access publishers and associations have done so. Most journal privacy policies, OA titles included, do not measure up to library standards. Sixteen of the top 20 research journals let ad networks track readers.

There’s no conspiracy theory here. It’s mostly accidental. In the age of print, reader privacy wasn’t an issue. Readers could do whatever they wanted with the content. Content providers need to address this now that they are capable of collecting and using all sorts of data they couldn’t before.

NISO is working on a framework for this, and the NASIG community needs to be engaged.

The ALA Code of Ethics makes no exception for collecting data when it's convenient, and none for "improving services" either.

The question, "Would we do this in a physical space with people around us?" is a useful gauge of the creep factor. Physical library users and digital library users should have the same privacy rights.

It’s easy to feel helpless in this. It’s easy to give up and think no user cares about their privacy. Just because it’s easy and convenient to ignore privacy, that doesn’t make it right.

Libraries and content providers need to live up to Article III of the ALA Code of Ethics: “…protect each reader’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”

How do we do this? Understand the risks and mitigate them. The risks: personally identifiable information (sometimes "no PII" is used as a smoke screen to hide what actually is being collected), long-tail information (uncommon enough to identify individuals, even without PII), and behavior trails (highly specific timestamps, etc.). Libraries deal with this by tracking the stuff instead of the people, and by keeping proxy server logs only long enough to identify use that violates terms of service.
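The "track the stuff, not the people" approach can be sketched in code. The following is a minimal, hypothetical example (the field names, the salt handling, and the retention scheme are my assumptions, not any library's actual practice): before storing a proxy-log entry, replace the patron identifier with a salted hash and round the timestamp down to the hour, so usage can be counted without accumulating a behavior trail. Discarding the salt at the end of each retention period makes the stored hashes unlinkable to real patrons.

```python
import hashlib
import secrets
from datetime import datetime, timezone

# A per-retention-period salt: when the period ends, the salt is discarded
# and the stored hashes can no longer be linked back to individual patrons.
PERIOD_SALT = secrets.token_bytes(16)

def anonymize_entry(patron_id: str, resource: str, when: datetime) -> dict:
    """Reduce a raw proxy-log entry to what usage counting needs:
    a salted hash in place of the patron ID, and an hour-granularity
    timestamp in place of a precise behavior trail."""
    digest = hashlib.sha256(PERIOD_SALT + patron_id.encode()).hexdigest()
    coarse = when.replace(minute=0, second=0, microsecond=0)
    return {
        "user_hash": digest[:16],    # enough to count distinct users per period
        "resource": resource,        # track the stuff...
        "hour": coarse.isoformat(),  # ...not the person's exact movements
    }

entry = anonymize_entry(
    "patron-12345",
    "doi:10.1000/example",  # hypothetical resource identifier
    datetime(2015, 5, 28, 14, 37, 52, tzinfo=timezone.utc),
)
print(entry["hour"])  # 2015-05-28T14:00:00+00:00
```

Note that this still permits the TOS-enforcement use mentioned above: within the current period, abusive use maps to a stable hash that can be investigated, but once the salt is gone, so is the link to the person.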

Determine who wants to know and why: data omnivores (NSA, Google, Facebook), data opportunists (academic researchers, usability wonks, assessment experts, readers who want to reuse their own data), and data paparazzi (doxxers, stalkers, politicians). Worry less about the opportunists and omnivores, worry a lot about the paparazzi.

What should we do or not do? No ostriching: heads out of the sand, please. The Library Freedom Project has lots of resources. Industry-level advocacy is needed, because those who take the high road on privacy are afraid of being out-competed by those who don't.

We’re not helpless. Don’t give up. License negotiation time is when we can ask the hard questions — use our Benjamins wisely. Assess mindfully, being aware of data leakage and compromised privacy.

Not even the greediest data omnivore, the most clueless data opportunist, or the most evil data paparazzi can abuse data that isn’t there. Don’t collect reader data unless there is a clear and reasonable reason to do it.

NASIG 2009: What Color Is Your Paratext?

Presenter: Geoffrey Bilder, CrossRef

The title plays on What Color Is Your Parachute?, a book geared towards preparing for a job search or a career change, which is relevant to what the serials world is facing, in both personnel and content. Paratext is added content that prepares the audience/reader for the meat of the document. We are very good at controlling and evaluating credibility, which is important when conveying information via paratext.

The internet is fraught with false information, which undermines credibility. The publisher’s value is being questioned because so much of their work can be done online at little or no cost, and what can’t be done cheaply is being questioned. Branding is increasingly being hidden by layers like Google which provide content without indicating the source. The librarian’s problem is similar to the publisher’s. Our value is being questioned when the digital world is capable of managing some of our work through distributed organizational structures.

“Internet Trust Anti-Pattern” — a system starts out as being a self-selected core of users with an understanding of trust, but as it grows, that can break down unless there is a structure or pervasive culture that maintains the trust and authority.

Local trust is that which is achieved through personal acquaintance and is sometimes transitive. Global trust extends through proxy, which transitively extends trust to “strangers.” Local is limited and hard to expand, and global increases systemic risk.

Horizontal trust occurs among equals with little possibility of coercion. Vertical trust occurs within a hierarchy, and coercion can be used to enforce behavior, which could lead to abuse.

Internet trust is in the local and horizontal quadrant. Scholarly trust falls in the vertical and global quadrant. It’s no wonder we’re having trouble figuring out how to do scholarship online!

Researchers have more to read and less time to read it, and the volume is increasing rapidly. We need to remember that authors and readers are the same people. The amazing ways that technology has opened up communication are also causing the overload. We need something to help identify credible information.

Dorothea Salo wrote that for people who put so much stock in authoritative information, we don't do a very good job of identifying it. She blames librarians, but publishers have a responsibility, too. Heuristics are important in knowing who the intended audience is.

If you find a book at a bargain store, the implication is that it is going to be substantially less authoritative than a book from a grand, old library. (There are commercial entities selling leather-bound books by the yard for buyers to use to add gravitas to their offices and personal libraries.) Scholarly journals are dull and magazines are flashy & bright. Books are traditionally organized with all sorts of content that tells academics whether or not they need to read them (table of contents, index, blurbs, preface, bibliography, etc.).

If you were to black out the text of a scholarly document, you would still be able to identify the parts displayed. You can’t do that very well with a webpage.

When we evaluate online content, we look at things like the structure of the URL and where it is linked from. In the print world, citations and footnotes were essential clues to following conversations between scholars. Linking can do that now, but the convention is still more formal. Logos can also tell us whether or not to put trust in content.

Back in the day, authors were linked to printers, but that led to credibility problems, so publishers stepped in. Authors and readers could trust that the content was accurate and properly presented. Now it's not just publishers: titles have become brands. A journal's reputation is almost more important than who publishes it.

How do we help people learn and understand the heuristics for identifying scholarly information? The processes for putting out credible information are partially hidden: the reader or librarian doesn't know or see the steps involved. We used to not want to know, but now we do, particularly since it allows us to differentiate between the good players and the bad players.

The idea of the final version of a document needs to be buried. Even in the print world (with errata and addenda) we were deluding ourselves in thinking that any document was truly finished.

Why don’t we have a peer reviewed logo? Why don’t we have something that assures the reader that the document is credible? Peer review isn’t necessarily perfect or the only way.

How about a Version of Record record? Show us what was done to a document to get it to where it is now. Look at Creative Commons, for example: its logo indicates something about the process of creating the document and leads to machine-readable coding. How about a CrossMark that indicates what a publisher has done with a document, much like what a CC logo leads to? One outfit created a Firefox plugin that monitors content and provides icons flagging companies and websites for different reasons. Oncode is a way of identifying organizations that have signed a code of conduct. We could do this for scholarly content.

Tim Berners-Lee is actively advocating for ways to overlay trust measures on the internet. The web was originally designed by academics who didn't need them, but, as in the internet trust anti-pattern, the "unwashed masses" have corrupted that trust.

What can librarians and publishers do to recreate the heuristics that have been effective in print? We are still making facsimiles of print in electronic format. How are we going to create the tools that will help people evaluate digital information?

I got skillz and I know how to use them

Recently, Dorothea Salo was bemoaning the lack of technology skills among librarians. I hear her, and I agree, but I don't think that the library science programs deserve as much blame as she wants to assign to them.

Librarianship has created an immense Somebody Else’s Problem field around computers. Unlike reference work, unlike cataloguing, unlike management, systems is all too often not considered a librarian specialization. It is therefore not taught at a basic level in some library schools, not offered as a clear specialization track, and not recruited for as it needs to be. And it is not often addressed in a systematic fashion by continuing-education programs in librarianship.

I guess my program, eight years ago, was not one of those library schools that don't teach basic computer technology. Considering that my program was not highly ranked, nor known for being techie, I'm surprised to learn that we had a leg up on some other library science programs. Not only were there several library tech (and basic tech) courses available, but everyone was also required to take at least one computer course covering hardware and software basics, as well as rudimentary HTML.

That being said, I suspect that the root of Salo's ire is what librarians have done with the tech knowledge they were taught. In many cases, they have done nothing, letting those who are interested or have greater aptitude take over the role of tech guru in their libraries. Those of us who are interested in tech in general, and library tech in particular, have gone on to make use of what we were taught, and have added to our arsenal of skills.

My complaint, and one shared by Salo, is that we are not given very many options for learning more through professional continuing education venues that cover areas considered to be traditional librarian skills. What I wouldn’t give for a pre-conference workshop on XML or SQL or some programming language that I could apply to my daily work!