SSP/NASIG – What Do All of these Changes Mean for Vendors?

Data storage - old and new
data sharing

Speaker: Caitlin Trasande, Head of Research Policy, Digital Science

Social impact is the emerging bacon.

Digital Science supports and funds startups that build software for research. The scope is the full life cycle of research, ranging from reading literature to planning and conducting experiments to publishing and sharing the data. The disgrunterati are those who decided to be the last to complain about broken processes and build better products and models.

[insert overview of several projects funded by Digital Science]

Information may want to be free, but it needs to be accessible and understandable.

SSP/NASIG – Data Wranglers in LibraryLand—Finding Opportunities in the Changing Policy Landscape

All You Can Eat Bacon!
all you can eat…data?

Speaker: T. Scott Plutchak, Director of Digital Data Curation Strategies, The University of Alabama at Birmingham

Data is the new bacon. Data is the hot buzzword in scholarly publishing. He is working on the infrastructure, services, and policies needed to manage data on an institutional level.

Concern about data has been around for a long time. NIH developed their first policy in 2003, but it was pretty weak. Things got serious when the public access policy became mandatory in 2009. NSF developed a data management policy in 2011, which got a little more attention.

A scholarly publishing roundtable was created in 2009 and reported in 2010. It was made up of university administrators, librarians, publishers, and researchers. They recommended flexible policies for each agency, developed in collaboration with their constituencies.

Libraries should be thinking about how and where and what kinds of data they should store and manage.

My small liberal arts university probably will have to do some things with this, but not to the extent he’s talking about. This is an R1 library problem, not a library problem at large. Yet.

#ERcamp13 at George Washington University

“The law of two feet” by Deb Schultz

This is going to be long and not my usual style of conference notetaking. Because this was an unconference, there really wasn’t much in the way of prepared presentations, except for the lightning talks in the morning. What follows below the jump is what I captured from the conversations, often simply questions posed that were left open for anyone to answer, or at least consider.

One of the good aspects of the unconference style was the free-form nature of the discussions. We generally stayed on topic, but even when we didn’t, it was something relevant or important that led to the tangents, so there were still plenty of things to take away. However, this format also requires someone present who is prepared to seed the conversation if it lulls or dies and no one steps in to start a new topic.

Also, if a session is designed to be a conversation around a topic, it will fall flat if it becomes all about one person or the quirks of their own institution. I had to work pretty hard on that one during the session I led, particularly when it seemed that the problem I was hoping to discuss wasn’t an issue for several of the folks present because of how they handle the workflow.

Some of the best conversations I had were during the gathering/breakfast time as well as lunch, lending even more to the unconference ethos of learning from each other as peers.

Anyway, here are my notes.


NASIG 2013: Knowledge and Dignity in the Era of Big Data

CC BY 2.0 2013-06-10
“Big Data” by JD Hancock

Speaker: Siva Vaidhyanathan

Don’t try to write a book about fast moving subjects.

He was trying to capture the nature of our relationship to Google. It provides us with services that are easy to use, fairly dependable, and well designed. However, that level of success can breed hubris. He was interested in how this drives the company to its audacious goals.

It strikes him that what Google claims to be doing is what librarians have been doing for hundreds of years already. He found himself turning to the core practices of librarians as a guideline for assessing Google.

Why is Google interested in so much stuff? What is the payoff to organizing the world’s information and making it accessible?

Big data is not a phrase that they use much, but the notion is there. More and faster equals better. Google is in the prediction/advertising business. The Google books project is their attempt to reverse engineer the sentence. Knowing how sentences work, they can simulate how to interpret and create sentences, which would be a simulation of artificial intelligence.

The NSA’s deals that give them a backdoor to our data services create data insecurity, because if they can get in, so can the bad guys. Google keeps data about us (and has to turn it over when asked) because it benefits their business model, unlike libraries, which don’t keep patron records so that they can protect patrons’ privacy.

Big data means more than a lot of data. It means that we have so many instruments to gather data: cheap, ubiquitous cameras and microphones, GPS devices that we carry with us, credit card records, and more. All of these ways of creating data feed into huge servers that can store it, with powerful algorithms that can analyze it. Despite all of this, there is no policy surrounding it, nor are there conversations about the best ways to manage it in light of the impact on personal privacy. There is no incentive to curb big data activities.

Scientists are generally trained to understand that correlation is not causation. We seem to be happy enough to draw pictures with correlation and move on to the next one. With big data, it is far too easy to stop at correlation. This is a potentially dangerous way of understanding human phenomena. We are autonomous people.

The panopticon was supposed to keep prisoners from misbehaving because they assumed they were always being watched. Foucault described the modern state in the 1970s as the panopticon. However, at this point, it doesn’t quite match. We have a cryptopticon, because we aren’t allowed to know when we are being watched. It wants us to be on our worst behavior. How can we inject transparency and objectivism into this cryptopticon?

Those who can manipulate the system will, but those who don’t know how or that it is happening will be negatively impacted. If bad credit can get you on the no-fly list, what else may be happening to people who make poor choices in one aspect of their lives that they don’t know will impact other aspects? There is no longer anonymity in our stupidity. Everything we do, or nearly so, is online. Mistakes of teenagers will have an impact on their adult lives in ways we’ve never experienced before. Our inability to forget renders us incapable of looking at things in context.

Mo Data, Mo Problems