NASIG 2015 – Ain’t Nobody’s Business If I Do (Read Serials)

Speaker: Dorothea Salo, Faculty Associate, University of Wisconsin – Madison

Publishers and providers are collecting massive amounts of user data, and Salo is not happy about this. ALA code of ethics is not happy about this, either.

Why does privacy matter?

The gizmos that have ticked along for ages without being connected are now connected to the internet. It can be very handy, like smart thermostats, or a little too snoopy like the smart TV that listens in on your conversations. The FTC is scrutinizing the Internet of Things very closely, because it’s easy to cause some real harm with the data from these devices.

Thermostat data, for example, tells you a lot about when someone is at home or not, which can be useful for thieves, law enforcement, and marketers. And this is information that wasn’t available when the thermostat was offline.

Eresource use is being snooped on, too. Adobe is collecting reader behavior information from Adobe Digital Editions, even when it’s coming from library sources. They got caught because they were transmitting that information unencrypted, which they fixed, but they aren’t not doing it anymore.

Readers cannot trust content providers. Librarians cannot trust content providers. We have to assume you’re behaving like Adobe, until you prove otherwise. It’s easy, then, to lump eresources into the Internet of Things. Back in the day, journals and books weren’t online, but now they are ways to collect data on reader behavior.

Generally speaking, content providers have very little out there in a code of practice for reader privacy, including the relevant associations. Not even the open access publications and associations. Most journal privacy policies do not measure up to library standards, including those that are OA. 16 of the top 20 research journals let ad networks track readers.

There’s no conspiracy theory here. It’s mostly accidental. In the age of print, reader privacy wasn’t an issue. Readers could do whatever they wanted with the content. Content providers need to address this now that they are capable of collecting and using all sorts of data they couldn’t before.

NISO is working on a framework for this, and the NASIG community needs to be engaged.

The ALA code of ethics doesn’t say that you shouldn’t collect data when it’s convenient — there are no exceptions. Same goes for “improving services”.

The question, “Would we do this in a physical space with people around us?” is a useful gague of the creep factor. Physical library users and digital library users should have the same privacy rights.

It’s easy to feel helpless in this. It’s easy to give up and think no user cares about their privacy. Just because it’s easy and convenient to ignore privacy, that doesn’t make it right.

Libraries and content providers need to live up to Article III of the ALA Code of Ethics: “…protect each reader’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”

How do we do this? Understand the risks and mitigate them. Risks: personally identifying information (sometimes this is used as a smoke screen to hide what is being collected when this is not), long tail information (uncommon enough to identify individuals, even without PII), and behavior trails (highly specific time stamps, etc.). Libraries deal with this by tracking the stuff instead of the people. Libraries keep proxy server logs only long enough to identify use that violates TOS.

Determine who wants to know and why: data omnivores (NSA, Google, Facebook), data opportunists (academic researchers, usability wonks, assessment experts, readers who want to reuse their own data), and data paparazzi (doxxers, stalkers, politicians). Worry less about the opportunists and omnivores, worry a lot about the paparazzi.

What should we do or not do? No ostriching — heads out of the sand, please. The Library Freedom Project has lots of resources. Industry-level advocacy is needed — those who take the high road on privacy is afraid of being out-competed by those who don’t.

We’re not helpless. Don’t give up. License negotiation time is when we can ask the hard questions — use our Benjamins wisely. Assess mindfully, being aware of data leakage and compromised privacy.

Not even the greediest data omnivore, the most clueless data opportunist, or the most evil data paparazzi can abuse data that isn’t there. Don’t collect reader data unless there is a clear and reasonable reason to do it.

css.php