SSP/NASIG – What Do All of these Changes Mean for Vendors?

Data storage - old and new
data sharing

Speaker: Caitlin Trasande, Head of Research Policy, Digital Science

Social impact is the emerging bacon.

Digital Science supports and funds startups that build software for research. The scope is the full life cycle of research, ranging from reading literature to planning and conducting experiments to publishing and sharing the data. The disgrunterati are those who decided to be the last to complain about broken processes and build better products and models.

[insert overview of several projects funded by Digital Science]

Information may want to be free, but it needs to be accessible and understandable.

SSP/NASIG – Data Wranglers in LibraryLand—Finding Opportunities in the Changing Policy Landscape

All You Can Eat Bacon!
all you can eat…data?

Speaker: T. Scott Plutchak, Director of Digital Data Curation Strategies, The University of Alabama at Birmingham

Data is the new bacon. Data is the hot buzzword in scholarly publishing. He is working on the infrastructure, services, and policies needed to manage data on an institutional level.

Concern about data has been around for a long time. NIH developed their first policy in 2003, but it was pretty weak. Things got serious when the public access policy became mandatory in 2009. NSF developed a data management policy in 2011, which got a little more attention.

A scholarly publishing roundtable was created in 2009, reporting in 2010, made up of university administrators, librarians, publishers, and researchers. They recommended flexible policies for each agency, developed in collaboration with their constituencies.

Libraries should be thinking about how and where and what kinds of data they should store and manage.

My small liberal arts university probably will have to do some things with this, but not to the extent he’s talking about. This is an R1 library problem, not a library problem at large. Yet.

#ERcamp13 at George Washington University

“The law of two feet” by Deb Schultz

This is going to be long and not my usual style of conference notetaking. Because this was an unconference, there really wasn’t much in the way of prepared presentations, except for the lightning talks in the morning. What follows below the jump is what I captured from the conversations, often simply questions posed that were left open for anyone to answer, or at least consider.

One of the good aspects of the unconference style was the free-form nature of the discussions. We generally stayed on topic, but even when we didn’t, it was a relevant or important thing that led to the tangents, so there were still plenty of things to take away. However, this format also requires someone present who is prepared to seed the conversation if it lulls or dies and no one steps in to start a new topic.

Also, if a session is designed to be a conversation around a topic, it will fall flat if it becomes all about one person or the quirks of their own institution. I had to work pretty hard on that one during the session I led, particularly when it seemed that the problem I was hoping to discuss wasn’t an issue for several of the folks present because of how they handle the workflow.

Some of the best conversations I had were during the gathering/breakfast time as well as lunch, lending even more to the unconference ethos of learning from each other as peers.

Anyway, here are my notes.

NASIG 2013: Knowledge and Dignity in the Era of Big Data

CC BY 2.0 2013-06-10
“Big Data” by JD Hancock

Speaker: Siva Vaidhyanathan

Don’t try to write a book about fast moving subjects.

He was trying to capture the nature of our relationship to Google. It provides us with services that are easy to use, fairly dependable, and well designed. However, that level of success can breed hubris. He was interested in how this drives the company to its audacious goals.

It strikes him that what Google claims to be doing is what librarians have been doing for hundreds of years already. He found himself turning to the core practices of librarians as a guideline for assessing Google.

Why is Google interested in so much stuff? What is the payoff to organizing the world’s information and making it accessible?

Big data is not a phrase that they use much, but the notion is there. More and faster equals better. Google is in the prediction/advertising business. The Google books project is their attempt to reverse engineer the sentence. Knowing how sentences work, they can simulate how to interpret and create sentences, which would be a simulation of artificial intelligence.

The NSA’s deals that give them a backdoor to our data services create data insecurity, because if they can get in, so can the bad guys. Google keeps data about us (and has to turn it over when asked) because it benefits their business model, unlike libraries, which don’t keep patron records in order to protect patrons’ privacy.

Big data means more than a lot of data. It means that we have so many instruments to gather data: cheap, ubiquitous cameras and microphones, GPS devices that we carry with us, credit card records, and more. All of these ways of creating data feed into huge servers that can store it, with powerful algorithms that can analyze it. Despite all of this, there is no policy surrounding it, nor any conversation about the best ways to manage it in light of the impact on personal privacy. There is no incentive to curb big data activities.

Scientists are generally trained to understand that correlation is not causation. We seem to be happy enough to draw pictures with correlation and move on to the next one. With big data, it is far too easy to stop at correlation. This is a potentially dangerous way of understanding human phenomena. We are autonomous people.

The panopticon was supposed to keep prisoners from misbehaving because they assumed they were always being watched. Foucault described the modern state in the 1970s as the panopticon. However, at this point, it doesn’t quite match. We have a cryptopticon, because we aren’t allowed to know when we are being watched. It wants us to be on our worst behavior. How can we inject transparency and objectivism into this cryptopticon?

Those who can manipulate the system will, but those who don’t know how or that it is happening will be negatively impacted. If bad credit can get you on the no-fly list, what else may be happening to people who make poor choices in one aspect of their lives that they don’t know will impact other aspects? There is no longer anonymity in our stupidity. Everything we do, or nearly so, is online. Mistakes of teenagers will have an impact on their adult lives in ways we’ve never experienced before. Our inability to forget renders us incapable of looking at things in context.

Mo Data, Mo Problems

my presentation for Internet Librarian 2012

Apologies for the delay. It took longer than I expected to have the file and a stable internet connection at the same time. You’ll find the notes on the SlideShare page.

being a student is time-consuming

I need to find a happy medium between self-paced instruction and structured instruction.

“What have I done!?” by Miguel Angel

I signed up for a Coursera class on statistics for social science researchers because I wanted to learn how to better make use of library data and also how to use the open source program for statistical computing, R. The course information indicated I’d need to plan for 4-6 hours per week, which seemed doable, until I got into it.

The course consists of several lecture videos, most of which include a short “did you get the main concepts” multiple-choice quiz at the end. Each week there is an assignment and graded quiz, and of course a midterm and final.

It didn’t help that I started off behind, getting through only a lecture or two before the end of the first week, and missing the deadline for having the first assignment and quiz graded. I scrambled to catch up the second week, but once again couldn’t make it through the lectures in time.

That’s when I realized that it was going to take much longer than projected to keep up with this course. A 20-30 min lecture would take me 45-60 min to get through because I was constantly having to pause and write notes before the lecturer went on to the next concept. And since I was using Microsoft OneNote to keep and organize my notes, anything that involved a formula took longer to copy down.

By the end of the third week, I was still a few lectures away from finishing the second week, and I could see that it would take more time than I had to keep going, but I decided to go another week and do what I could.

That was this week, and I haven’t had time to make any more progress than where I was last week. With no prospect of catching up before the midterm deadline, I decided to withdraw from the course.

This leaves me disappointed both in myself and in the structure of the course. I hate quitting, and I really want to learn the stuff. But, as I fell further and further behind, it became easier to put it off and focus on other overdue items on my task list, thus compounding the problem.

The instructor for the course was easy to follow, and I like his lecture style, but when it came time to do the graded quiz and assignment, I realized I clearly had not understood everything, or he expected me to have more of a background in the field than a novice. It also seemed like the content was geared towards a 12-week course, and with this one being only 8 weeks, rather than reducing the content accordingly, he was cramming it all into those 8 weeks.

Having deadlines was a great motivation to keep up with the course, which I haven’t had when I’ve tried to learn on my own. It was the volume of content to absorb between those deadlines that tripped me up. I need to find a happy medium between self-paced instruction and structured instruction.

ER&L 2012: New ARL Best Practices in Fair Use

laptop lid comic relief + Dawn and Drew
photo by YayAdrian

Speaker: Brandon Butler (Peter Jaszi was absent)

The purpose of copyright is to promote the creation of culture. It is not to ensure that authors get a steady stream of income no matter what, or to pay them back for the hard work they do, or to show our respect for the value they add to society. It’s about getting the stuff into the culture, and giving the creators enough incentive to do it.

One way it does it is to give creators exclusive rights for a limited period of time. The limit encourages new makers to use and remix existing culture.

Fair use is the biggest balancing feature of copyright. It ensures that the rights provided to the creators don’t become oppressive to the users. Fair use is the legal, unauthorized use of copyrighted material… under some circumstances. And we’ve spent generations trying to figure out which circumstances apply.

Fair use is a space for creativity. It gives you the leeway to take the culture around you and incorporate it into your work. It allows you to quote other scholarship in your research. It allows you to incorporate art into new works.

There are four factors of fair use. Every judge should consider the reason for the use, the kind of work used, the amount used, and the effect on the market. But the law doesn’t tell judges how much weight to give each factor or which is most important. The good news is that judges love balancing features, and the Supreme Court has determined that fair use protects free speech. However, since copyright is automatically conferred as soon as the creation is fixed, judicial interpretations of fair use have shifted greatly since 1990 to favor users more in certain circumstances.

Without fair use, copyright would be in conflict with the 1st Amendment.

Judges want to know if the use is transformative (i.e. for a new purpose, context, audience, insight) and if you used the right amount in that transformative process. For example, parody is making fun of the original work, not just reusing it. An appropriate amount can refer to both the quantity of the original in the transformative work, and also the audience who received your transformative work. One example is the many photographic memes that take pictures and alter them to fit a theme, like One Tiny Hand.

Judges care about you and what you think is fair. There is a pattern of judges deferring to the well-articulated norms of a practice community.

Best practices codes are a logical outgrowth of the things the communities have articulated as their values and the things they would consider to be legitimate transformative works. Documentary filmmakers, scholars, media literacy teachers, online video, dance collections, open course ware, poets… many groups are creating best practices for fair use.

The documentary filmmakers have had a code of best practice for a long time. They realized that without it, they were limiting themselves too much in what they could create. Once they codified their values, more broadcast sources were willing to take films and new kinds of films were being made. Errors and omissions insurers were able to accept fair use claims, and lawyers use the Statement to build their own best practices in the relevant areas.

Keep in mind, though, that these are best practices and not guidelines. Principles, not rules. Limitations, not bans. Reasoning, not rote. The numerical limits we once followed are not the law, and we need to keep them fresh to be relevant.

Licensing is a different thing altogether. This means you may have fewer rights in some instances, and more rights in others, regardless of fair use.

For libraries, fair use enables our mission to serve knowledge past, present, and future. We have a duty to make copyrighted works real and accessible in the way people use things now. What will libraries be in the future? How will we stay relevant? We need to have some flexibility with the stuff we have in our collections.

Many librarians are discouraged. Insecurity and hesitation equal staff costs to hire someone to clear copyright questions. Fair use would help, but it’s underused. Risk aversion subsumes fair use analysis.

The ARL document took a lot of people from diverse institutions and many hours of discussion to create it, and it was reviewed by several legal experts. It’s not risk-free, since it would need to stand up in court first (and there are always lawsuit-happy people), but it seems okay based on past judgement.

They hope it will put legal risks into perspective, and will give librarians a tool to go to general counsels and administrations and let them know things are changing. It considered the views of librarians and their values, and they also hope that people will speak out publicly that they support the Code.

Fair use applies in these common situations:

  • course reserves — digital access to teaching materials for students and faculty, although it should be limited to access by only the appropriate audience
  • both physical and virtual exhibits — if it highlights a theme or commonality, you’re doing something new to help people understand what’s in your library
  • digitizing to preserve at-risk items — you’re not a publisher or scam artist, you’re a librarian making sure the things are accessible over time (like VHS tapes)
  • digitizing special collections and archives — you’re keeping it alive
  • access to research and teaching materials for disabled users — e.g. DAISY
  • institutional repositories
  • creation of search indexes
  • making topically-based collections of web-based materials

Practice makes practice. It won’t work if you don’t use it.

ejournal use by subject

A couple of weeks ago I blogged about an idea I had that involved combining subject data from SerialsSolutions with use data for our ejournals to get a broad picture of ejournal use by subject. It took a bit of tooling around with Access tables and queries, including making my first crosstab, but I’ve finally got the data put together in a useful way.

It’s not quite comprehensive, since it only covers ejournals for which SerialsSolutions has assigned a subject, which also have ISSNs, and are available through sources that provide COUNTER or similar use statistics. But, it’s better than nothing.
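For anyone who would rather script this than build Access queries, here is a minimal sketch of the same idea in Python. It is not what I actually did: the ISSNs, subjects, years, and counts below are invented sample data, and the function name `use_by_subject` is hypothetical. It joins subject assignments to use counts on ISSN, then totals the use by subject and year, which is roughly what the crosstab does.

```python
from collections import defaultdict

# Hypothetical sample data: the ISSNs, subjects, and counts are invented
# for illustration and do not come from any real report.
subjects = [
    # (ISSN, subject) pairs, as a subject export might list them
    ("1111-1111", "Biology"),
    ("2222-2222", "Biology"),
    ("2222-2222", "Chemistry"),   # a journal can carry more than one subject
    ("3333-3333", "History"),     # has a subject but no use data
]

usage = [
    # (ISSN, year, full-text requests), as a COUNTER-style report might list them
    ("1111-1111", 2010, 120),
    ("1111-1111", 2011, 150),
    ("2222-2222", 2010, 40),
    ("2222-2222", 2011, 60),
    ("4444-4444", 2011, 75),      # no subject assigned, so it drops out
]


def use_by_subject(subjects, usage):
    """Join use counts to subjects on ISSN and total them by subject and year."""
    subjects_by_issn = defaultdict(set)
    for issn, subject in subjects:
        subjects_by_issn[issn].add(subject)

    # crosstab[subject][year] -> total requests
    crosstab = defaultdict(lambda: defaultdict(int))
    for issn, year, requests in usage:
        # Journals without a subject are skipped, like an inner join.
        for subject in subjects_by_issn.get(issn, ()):
            crosstab[subject][year] += requests
    return {subject: dict(years) for subject, years in crosstab.items()}


if __name__ == "__main__":
    table = use_by_subject(subjects, usage)
    years = sorted({year for row in table.values() for year in row})
    print("Subject".ljust(12) + "".join(str(y).rjust(8) for y in years))
    for subject in sorted(table):
        cells = "".join(str(table[subject].get(y, 0)).rjust(8) for y in years)
        print(subject.ljust(12) + cells)
```

Note that a journal with two subjects is counted under both, so the subject totals will add up to more than the overall use. That seems fine for a broad picture, but it is a choice worth keeping in mind when reading the numbers.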