NASIG 2013: Knowledge and Dignity in the Era of Big Data

CC BY 2.0 2013-06-10
“Big Data” by JD Hancock

Speaker: Siva Vaidhyanathan

Don’t try to write a book about fast-moving subjects.

He was trying to capture the nature of our relationship to Google. It provides us with services that are easy to use, fairly dependable, and well designed. However, that level of success can breed hubris. He was interested in how this drives the company to its audacious goals.

It strikes him that what Google claims to be doing is what librarians have been doing for hundreds of years already. He found himself turning to the core practices of librarians as a guideline for assessing Google.

Why is Google interested in so much stuff? What is the payoff to organizing the world’s information and making it accessible?

Big data is not a phrase that they use much, but the notion is there. More and faster equals better. Google is in the prediction/advertising business. The Google Books project is their attempt to reverse engineer the sentence. Knowing how sentences work, they can simulate how to interpret and create sentences, which would be a simulation of artificial intelligence.

The NSA’s deals that give them a backdoor to our data services create data insecurity, because if they can get in, so can the bad guys. Google keeps data about us (and has to turn it over when asked) because it benefits their business model, unlike libraries, which avoid keeping patron records in order to protect patrons’ privacy.

Big data means more than a lot of data. It means that we have so many instruments to gather data: cheap, ubiquitous cameras and microphones, GPS devices that we carry with us, credit card records, and more. All of these ways of creating data feed into huge servers that can store it, with powerful algorithms that can analyze it. Despite all of this, there is no policy governing it, nor any conversation about the best ways to manage it in light of the impact on personal privacy. There is no incentive to curb big data activities.

Scientists are generally trained to understand that correlation is not causation. We seem to be happy enough to draw pictures with correlation and move on to the next one. With big data, it is far too easy to stop at correlation. This is a potentially dangerous way of understanding human phenomena. We are autonomous people.

The panopticon was supposed to keep prisoners from misbehaving because they assumed they were always being watched. Foucault described the modern state in the 1970s as the panopticon. However, at this point, it doesn’t quite match. We have a cryptopticon, because we aren’t allowed to know when we are being watched. It wants us to be on our worst behavior. How can we inject transparency and objectivism into this cryptopticon?

Those who can manipulate the system will, but those who don’t know how or that it is happening will be negatively impacted. If bad credit can get you on the no-fly list, what else may be happening to people who make poor choices in one aspect of their lives that they don’t know will impact other aspects? There is no longer anonymity in our stupidity. Everything we do, or nearly so, is online. Mistakes of teenagers will have an impact on their adult lives in ways we’ve never experienced before. Our inability to forget renders us incapable of looking at things in context.

Mo Data, Mo Problems

conference tweeting etiquette

“Tiny birds in my hand..” by ~Ilse

Conference season, or at least the part of it that appeals to my area of librarianship, is starting soon.

Up first for me is Computers in Libraries in DC, which I won’t be attending. Instead, I’ll be vacationing nearby (since it is so close) and visiting with colleagues and friends who will be attending. I’d go, but I already have funding this year for three conferences, and it didn’t seem fair to ask for another.

Next, I fly to Austin for the Electronic Resources & Libraries conference. From the venue to the content, this is becoming my favorite conference. I’ve had to actively introduce more diversity to the sessions I choose to attend, otherwise I would spend the whole conference geeking out about use data and spreadsheets and such.

Finally, I head to Buffalo for the conference that shaped me into the librarian I became: NASIG (North American Serials Interest Group). I like this one because I’ve known many of the attendees for the entirety of my relatively short career, and because it works very hard to not be just a librarian conference, but rather an industry-wide discussion of all things serial in libraryland.

It was in the context of thinking about these upcoming conferences that I read the latest ProfHacker blog post from The Chronicle of Higher Education. Ryan Cordell writes about his experiences with conference tweeting and the recent revelations he has had regarding the impact this can have on the presenters, whether they are active participants on Twitter or not. Many things he wrote resonated with me, and reminded me that Twitter, like other popular social media platforms, is no longer the private back-channel of a few techie friends, but is a global platform that can have a broader impact than any of us may know.

I suggest reading the whole article, but I would like to quote here the Principles of Conference Tweeting that Cordell offers, as something for us all to keep in mind:

  1. I will post praise generously, sharing what I find interesting about presentations.
  2. Likewise, I will share pertinent links to people and projects, in order to bring attention to my colleagues’ work.
  3. When posting questions or critiques, I will include the panelist’s username (an @ mention) whenever possible.
  4. If the panelist does not have a username—or if I cannot find it—I will do my best to alert them when I post questions or critiques, rather than leaving them to discover those engagements independently.
  5. I will not post questions to Twitter that I would not ask in the panel Q&A.
  6. I will not use a tone on Twitter that I would not use when speaking to the scholar in person.
  7. I will avoid “crosstalk”—joking exchanges only tangentially related to the talk—unless the presenter is explicitly involved in the chatter.
  8. I will refuse to post or engage with posts that comment on the presenter’s person, rather than the presenter’s ideas.

IL 2012: The Next Big Thing

“Moving on” by Craig Allen

Speakers: Dave Hesse & Brian Pichman

They used a Lazer Tag-like system to set up “Hunger Games” nights in the library. They also used a bunch of interactive tech toys for different kinds of game nights.

They’re mounting tablets as shelf labels that show the call number range in sleep mode, but when activated will display reviews and other information about books in the range, as well as other interactive multimedia.

Speaker: Sarah Houghton

Cutting stuff. Cutting lots of things out of the budget, services, etc. All of these things we learn about take time and money, and we can’t do all of them. She’s making everyone in her library earn their pet program. It has to show some sort of ROI (not specifically financial). Make business decisions about what we do and why.

Q: What did you cut that you didn’t want to?
A: Magnatune deal. She really wanted to do it, but didn’t have the staff time, and had a negative amount of money to dedicate to anything.

Speaker: Ben Bizzle

We are doing a really poor job of marketing ourselves to our communities, and we’re wasting money on old methods and tools to do it. There are more cost-effective ways to do this, particularly for public libraries. Facebook is a really cost-effective way to market to your community over and over again, and running ads to get people in your community to like your Facebook page has been shown to be very effective. Be part of the stream without being disruptive. Facebook event invitations are disruptive and ineffective.

Next big things from the audience:

  • Would like to have a better way to provide remote authentication for users from anywhere, regardless of the speed of the connection (e.g. 3G mobile phone or a hotel wireless connection).
  • Focusing on programming that brings the Spanish-speaking and English-speaking communities together.
  • Integrating local self-published creators’ content within the rest of the library’s electronic content.
  • Trying to find better metrics to measure success for ROI.
  • Developing community investors from FOL and active volunteers.
  • Giving up paper flyers/posters and moving to digital.
  • Moving social media effort to marketing department.
  • Looking at duplicate efforts and winnowing them down.
  • Learning how to code.
  • Hiring part-time and hiring non-librarians.
  • FRBR. RDA. Say no more.
  • Advocacy. Facetime with politicians and other sources of funding.
  • Would like to hear more from public libraries on ‘bring your own device’ initiatives that could be applied in the academic library setting.
  • Gamification of library resources and services.
  • Wikipedia – we should be creating more content there.
  • Better relationships with publishers.
  • The next level of life-long learning like Coursera and making the library a hub for it.
  • Downloadable database of music by local musicians.
  • Copyright, curation, folksonomies, and other issues of creating communities.
  • Podcasting.
  • Digitization projects that engage specific communities.
  • Keeping my head above water. Migrating to a more self-service model while maintaining a high level of service.
  • Moving to a new ILS. Proprietary or open source?
  • Reaching out to atypical non-users. Running ads in local for-sale magazines.
  • Lock-in gaming nights.

IL 2012: Discovery Systems

“Space Shuttle Discovery Landing At Washington DC” by Glyn Lowe

Speaker: Bob Fernekes

The Gang of Four: Google, Apple, Amazon, & Facebook

Google tends to acquire companies to grow its capabilities. We all know about Apple. Amazon sells more ebooks than print books now. Facebook is… yeah. That.

And then we jump to selecting a discovery service. You would do that in order to make the best use of the licensed content. This guy’s library did a soft launch in the past year of the discovery service they chose, and it’s had an impact on the instruction and tools (e.g. search boxes) he uses.

And I kind of lost track of what he was talking about, in part because he jumped from one thing to the next, without much of a transition or connection. I think there was something about usability studies after they implemented it, although they seemed to focus on more than just the discovery service.

Speaker: Alison Steinberg Gurganus

Why choose a discovery system? You probably already know. Students lack search skills, but they know how to search, so we need to give them something that will help them navigate the proprietary stuff we offer out on the web.

The problem with the discovery systems is that they are very proprietary. They don’t quite play fairly or nicely with competitors’ content yet.

Our users need to be able to evaluate, but they also need to find the stuff in the first place. A great discovery service should be self-explanatory, but we don’t have that yet.

We have students who understand Google, which connects them to all the information and media they want. We need something like that for our library resources.

When they were implementing the discovery tool, they wanted to make incremental changes to the website to direct users to it. They went from two columns, with the left column being text links to categories of library resources and services, to three columns, with the discovery search box in the middle column.

When they were customizing the look of the discovery search results, they changed the titles of items to red (from blue). She notes that users tend to ignore the outside columns because that’s where Google puts advertisements, so they are looking at ways to make that information more visible.

I also get the impression that she doesn’t really understand how a discovery service works or what it’s supposed to do.

Speaker: Athena Hoeppner

Hypothesis: discovery includes sufficient content of high enough quality, with full text, and …. (didn’t type fast enough).

Looked at final papers from a PhD-level course (34), specifically the methodology section and bibliography. Searched for each item in the discovery search as well as one general aggregator database and two subject-specific databases. The works cited were predominantly articles, with a significant number of web sources that were not available through library resources. She was able to find more citations in the discovery search than in Google Scholar or any of the other library databases.

Clearly the discovery search was sufficient for finding the content they needed. Then they used a satisfaction survey of the same students that covered familiarity and frequency of use for the subject indexes, discovery search, and Google Scholar. Ultimately, the students were satisfied and happy with the subject indexes, but there were too few respondents to get a sense of satisfaction with the discovery search or Google Scholar.

Conclusions: Students are unfamiliar with the discovery system, but it could support their research needs. However, we don’t know if they can find the things they are looking for in it (search skills), nor do we know if they will ultimately be happy with it.

NASIG 2012: Why the Internet is More Attractive Than the Library

Speaker: Dr. Lynn Silipigni Connaway, OCLC

Students, particularly undergraduates, find Google search results to make more sense than library database search results. In the past, these kinds of users had to work around our services, but now we need to make our resources fit their workflow.

Connaway has tried to compare 12 different user behavior studies in the UK and the US to draw some broad conclusions, and this has informed her talk today.

Convenience is number one, and it changes. Context and situation are very important, and we need to remember that when asking questions about our users. Sometimes they just want the answer, not instruction on how to do the research.

Most people power browse these days: scan small chunks of information, view first few pages, no real reading. They combine this with squirreling — short, basic searches and saving the content for later use.

Students prefer keyword searches. This is supported by looking at the kinds of terms used in the search. Experts use broad terms to cover all possible indexing; novices use specific terms. So why do we keep trying to get them to use the “advanced” search in our resources?

Students are confident with information discovery tools. They mainly use their common sense for determining the credibility of a site. If a site appears to have put some time into the presentation, then they are more likely to believe it.

Students are frustrated with navigating library websites and with the inconvenience of communicating with librarians face to face, and they tend to associate libraries only with books, not with other information. They don’t recognize that the library is what provides them with access to online content like JSTOR and the things they find in Google Scholar.

Students and faculty often don’t realize they can ask a question of a librarian in person because we look “busy” staring at our screens at the desk.

Researchers don’t understand copyright, or what they have signed away. They tend to be self-taught in discovery, picking up the same patterns as their graduate professors. Sometimes they rely on the students to tell them about newer ways of finding information.

Researchers get frustrated with the lack of access to electronic backfiles of journals, the difficulty of discovering non-English content, and unavailable content in search results (dead links, access limitations). Humanities researchers feel like there is a lack of good, specialized search engines for them (most are for the sciences). They get frustrated when they go to the library because of poor usability (e.g. signs) and a lack of integration between resources.

Access is more important than discovery. They want a seamless transition from discovery to access, without a bunch of authentication barriers.

We should be improving our OPACs. Take a look at Trove and Westerville Public Library. We need to think more like startups.

tl;dr – everything you’ve heard or read about what our users really do and really need, but we still haven’t addressed in the tools and services we offer to them