NASIG 2013: Knowledge and Dignity in the Era of Big Data

CC BY 2.0 2013-06-10
“Big Data” by JD Hancock

Speaker: Siva Vaidhyanathan

Don’t try to write a book about fast moving subjects.

He was trying to capture the nature of our relationship to Google. It provides us with services that are easy to use, fairly dependable, and well designed. However, that level of success can breed hubris. He was interested in how this drives the company to its audacious goals.

It strikes him that what Google claims to be doing is what librarians have been doing for hundreds of years already. He found himself turning to the core practices of librarians as a guideline for assessing Google.

Why is Google interested in so much stuff? What is the payoff to organizing the world’s information and making it accessible?

Big data is not a phrase that they use much, but the notion is there. More and faster equals better. Google is in the prediction/advertising business. The Google books project is their attempt to reverse engineer the sentence. Knowing how sentences work, they can simulate how to interpret and create sentences, which would be a simulation of artificial intelligence.

The NSA’s deals that give it a backdoor to our data services create data insecurity, because if the NSA can get in, so can the bad guys. Google keeps data about us (and has to turn it over when asked) because doing so benefits its business model, unlike libraries, which don’t keep patron records in order to protect patrons’ privacy.

Big data means more than a lot of data. It means that we have so many instruments to gather data: cheap, ubiquitous cameras and microphones, GPS devices that we carry with us, credit card records, and more. All of these ways of creating data feed into huge servers that can store it, with powerful algorithms that can analyze it. Despite all of this, there is no policy governing it, nor any conversation about the best ways to manage it in light of the impact on personal privacy. There is no incentive to curb big data activities.

Scientists are generally trained to understand that correlation is not causation. We seem to be happy enough to draw pictures with correlation and move on to the next one. With big data, it is far too easy to stop at correlation. This is a potentially dangerous way of understanding human phenomena. We are autonomous people.

The panopticon was supposed to keep prisoners from misbehaving because they assumed they were always being watched. Foucault described the modern state in the 1970s as the panopticon. However, at this point, it doesn’t quite match. We have a cryptopticon, because we aren’t allowed to know when we are being watched. It wants us to be on our worst behavior. How can we inject transparency and objectivism into this cryptopticon?

Those who can manipulate the system will, but those who don’t know how or that it is happening will be negatively impacted. If bad credit can get you on the no-fly list, what else may be happening to people who make poor choices in one aspect of their lives that they don’t know will impact other aspects? There is no longer anonymity in our stupidity. Everything we do, or nearly so, is online. Mistakes of teenagers will have an impact on their adult lives in ways we’ve never experienced before. Our inability to forget renders us incapable of looking at things in context.

Mo Data, Mo Problems

conference tweeting etiquette

“Tiny birds in my hand..” by ~Ilse

Conference season, or at least the part of it that appeals to my area of librarianship, is starting soon.

Up first for me is Computers in Libraries in DC, which I won’t be attending. Instead, I’ll be vacationing nearby (since it is so close) and visiting with colleagues and friends who will be attending. I’d go, but I already have funding this year for three conferences, and it didn’t seem fair to ask for another.

Next, I fly to Austin for the Electronic Resources & Libraries conference. From the venue to the content, this is becoming my favorite conference. I’ve had to actively introduce more diversity to the sessions I choose to attend, otherwise I would spend the whole conference geeking out about use data and spreadsheets and such.

Finally, I head to Buffalo for the conference that shaped me into the librarian I became: NASIG (North American Serials Interest Group). I like this one because I’ve known many of the attendees for the entirety of my relatively short career, and because it works very hard to not be just a librarian conference, but rather an industry-wide discussion of all things serial in libraryland.

It was in the context of thinking about these upcoming conferences that I read the latest Prof Hacker blog post from The Chronicle of Higher Education. Ryan Cordell writes about his experiences with conference tweeting and the recent revelations he has had regarding the impact this can have on the presenters, whether they are active participants on Twitter or not. Many things he wrote resonated with me, and reminded me that Twitter — as well as other popular social media platforms — is no longer the private back-channel of a few techie friends, but is a global platform that can have a broader impact than any of us may know.

I suggest reading the whole article, but I would like to quote here the Principles of Conference Tweeting that Cordell offers, as something for us all to keep in mind:

  1. I will post praise generously, sharing what I find interesting about presentations.
  2. Likewise, I will share pertinent links to people and projects, in order to bring attention to my colleagues’ work.
  3. When posting questions or critiques, I will include the panelist’s username (an @ mention) whenever possible.
  4. If the panelist does not have a username—or if I cannot find it—I will do my best to alert them when I post questions or critiques, rather than leaving them to discover those engagements independently.
  5. I will not post questions to Twitter that I would not ask in the panel Q&A.
  6. I will not use a tone on Twitter that I would not use when speaking to the scholar in person.
  7. I will avoid “crosstalk”—joking exchanges only tangentially related to the talk—unless the presenter is explicitly involved in the chatter.
  8. I will refuse to post or engage with posts that comment on the presenter’s person, rather than the presenter’s ideas.

IL 2012: The Next Big Thing

“Moving on” by Craig Allen

Speakers: Dave Hesse & Brian Pichman

They used a Lazer Tag-like system to set up “Hunger Games” nights in the library. They also used a bunch of interactive tech toys for different kinds of game nights.

They’re mounting tablets as shelf labels that show the range in sleep mode, but when activated will display reviews and other information about books in the range, as well as other interactive multimedia.

Speaker: Sarah Houghton

Cutting stuff. Cutting lots of things out of the budget, services, etc. All of these things we learn about take time and money, and we can’t do all of them. She’s making everyone in her library earn their pet program. It has to show some sort of ROI (not specifically financial). Make business decisions about what we do and why.

Q: What did you cut that you didn’t want to?
A: The Magnatune deal. She really wanted to do it, but she didn’t have the staff time, and had a negative amount of money to dedicate to anything.

Speaker: Ben Bizzle

We are doing a really poor job of marketing ourselves to our communities, and we’re wasting money on old methods and tools to do it. There are more cost-effective ways to do this, particularly for public libraries. Facebook is a really cost-effective way to market to your community over and over again, and running ads to get people in your community to like your Facebook page has been shown to be very effective. Be part of the stream without being disruptive. Facebook events invitations are disruptive and ineffective.

Next big things from the audience:

  • Would like to have a better way to provide remote authentication for users from anywhere, regardless of the speed of the connection (e.g. a 3G mobile phone or a hotel wireless connection).
  • Focusing on programming that brings the Spanish-speaking and English-speaking communities together.
  • Integrating local self-published creators’ content within the rest of the library’s electronic content.
  • Trying to find better metrics to measure success for ROI.
  • Developing community investors from FOL and active volunteers.
  • Giving up paper flyers/posters and moving to digital.
  • Moving social media effort to marketing department.
  • Looking at duplicate efforts and winnowing them down.
  • Learning how to code.
  • Hiring part-time and hiring non-librarians.
  • FRBR. RDA. Say no more.
  • Advocacy. Facetime with politicians and other sources of funding.
  • Would like to hear more from public libraries on ‘bring your own device’ initiatives that could be applied in the academic library setting.
  • Gamification of library resources and services.
  • Wikipedia – we should be creating more content there.
  • Better relationships with publishers.
  • The next level of life-long learning like Coursera and making the library a hub for it.
  • Downloadable database of music by local musicians.
  • Copyright, curations, folksonomies, and other issues of creating communities.
  • Podcasting.
  • Digitization projects that engage specific communities.
  • Keeping my head above water. Migrating to a more self-service model while maintaining a high level of service.
  • Moving to a new ILS. Proprietary or open source?
  • Reaching out to atypical non-users. Running ads in local for-sale magazines.
  • Lock-in gaming nights.

IL 2012: Discovery Systems

“Space Shuttle Discovery Landing At Washington DC” by Glyn Lowe

Speaker: Bob Fernekes

The Gang of Four: Google, Apple, Amazon, & Facebook

Google tends to acquire companies to grow its capabilities. We all know about Apple. Amazon sells more ebooks than print books now. Facebook is… yeah. That.

And then we jump to selecting a discovery service. You would do that in order to make the best use of the licensed content. This guy’s library did a soft launch in the past year of the discovery service they chose, and it’s had an impact on the instruction and tools (e.g. search boxes) he uses.

And I kind of lost track of what he was talking about, in part because he jumped from one thing to the next, without much of a transition or connection. I think there was something about usability studies after they implemented it, although they seemed to focus on more than just the discovery service.

Speaker: Alison Steinberg Gurganus

Why choose a discovery system? You probably already know. Students lack search skills, but they know how to search, so we need to give them something that will help them navigate the proprietary stuff we offer out on the web.

The problem with the discovery systems is that they are very proprietary. They don’t quite play fairly or nicely with competitors’ content yet.

Our users need to be able to evaluate, but they also need to find the stuff in the first place. A great discovery service should be self-explanatory, but we don’t have that yet.

We have students who understand Google, which connects them to all the information and media they want. We need something like that for our library resources.

When they were implementing the discovery tool, they wanted to make incremental changes to the website to direct users to it. They went from two columns, with the left column being text links to categories of library resources and services, to three columns, with the discovery search box in the middle column.

When they were customizing the look of the discovery search results, they changed the titles of items to red (from blue). She notes that users tend to ignore the outside columns because that’s where Google puts advertisements, so they are looking at ways to make that information more visible.

I also get the impression that she doesn’t really understand how a discovery service works or what it’s supposed to do.

Speaker: Athena Hoeppner

Hypothesis: discovery includes sufficient content of high enough quality, with full text, and …. (didn’t type fast enough).

Looked at final papers from a PhD level course (34), specifically the methodology section and bibliography. Searched for each item in the discovery search as well as one general aggregator database and two subject-specific databases. The works cited were predominately articles, with a significant number of web sources that were not available through library resources. She was able to find more citations in the discovery search than in Google Scholar or any of the other library databases.

Clearly the discovery search was sufficient for finding the content they needed. Then they used a satisfaction survey of the same students that covered familiarity and frequency of use for the subject indexes, discovery search, and Google Scholar. Ultimately, it came down to this: the students were satisfied and happy with the subject indexes, and there were too few respondents to get a sense of satisfaction with the discovery search or Google Scholar.

Conclusions: Students are unfamiliar with the discovery system, but it could support their research needs. However, we don’t know if they can find the things they are looking for in it (search skills), nor do we know if they will ultimately be happy with it.

NASIG 2012: Why the Internet is More Attractive Than the Library

Speaker: Dr. Lynn Silipigni Connaway, OCLC

Students, particularly undergraduates, find Google search results to make more sense than library database search results. In the past, these kinds of users had to work around our services, but now we need to make our resources fit their workflow.

Connaway has tried to compare 12 different user behavior studies in the UK and the US to draw some broad conclusions, and this has informed her talk today.

Convenience is number one, and it changes. Context and situation are very important, and we need to remember that when asking questions about our users. Sometimes they just want the answer, not instruction on how to do the research.

Most people power browse these days: scan small chunks of information, view first few pages, no real reading. They combine this with squirreling — short, basic searches and saving the content for later use.

Students prefer keyword searches. This is supported by looking at the kinds of terms used in the search. Experts use broad terms to cover all possible indexing, novices use specific terms. So why do we keep trying to get them to use the “advanced” search in our resources?

Students are confident with information discovery tools. They mainly use their common sense for determining the credibility of a site. If a site appears to have put some time into the presentation, then they are more likely to believe it.

Students are frustrated with navigating library websites and with the inconvenience of communicating with librarians face to face, and they tend to associate libraries only with books, not with other information. They don’t recognize that the library is what provides them with access to online content like JSTOR and the things they find in Google Scholar.

Students and faculty often don’t realize they can ask a question of a librarian in person because we look “busy” staring at our screens at the desk.

Researchers don’t understand copyright, or what they have signed away. They tend to be self-taught in discovery, picking up the same patterns as their graduate professors. Sometimes they rely on the students to tell them about newer ways of finding information.

Researchers get frustrated with the lack of access to electronic backfiles of journals, with discovering non-English content, and with unavailable content in search results (dead links, access limitations). Humanities researchers feel like there is a lack of good, specialized search engines for them (most are for science). They get frustrated when they go to the library because of poor usability (e.g. signs) and a lack of integration between resources.

Access is more important than discovery. They want a seamless transition from discovery to access, without a bunch of authentication barriers.

We should be improving our OPACs. Take a look at Trove and Westerville Public Library. We need to think more like startups.

tl;dr – everything you’ve heard or read about what our users really do and really need, but we still haven’t addressed in the tools and services we offer to them

musings on web-scale discovery systems

photo by Pascal

My library is often on the forefront of innovation, having the advantage of a healthy budget and staff size, yet small enough to be nimble. Frequently, when my colleagues return from conferences and give their reports, they’ll conclude with something along the lines of “we’re already doing most of the things they talked about.” At a recent conference report session, that was repeated again, with one exception: we have not implemented a web-scale discovery system.

I’m of two minds about web-scale discovery systems. In theory, they’re pretty awesome, allowing users to discover all of the content available to them from the library, regardless of the source or format. But in reality, they’re hamstrung by exclusive deals and coding limitations. The initial buzz was that they caused a dramatic increase in the use of library resources, but a few years in, and I’m hearing conflicting reports and grumblings.

We held off on buying a web-scale discovery system for two main reasons: one, we didn’t have the funding secured, and two, most of the reference librarians felt anything from indifference to outright dislike towards the systems out there at the time. We’re now in the process of reviewing and evaluating the current systems available, after many discussions about which problems we are hoping they will solve.

In the end, they really aren’t “Google for Libraries.” We think that our users want a single search box, but do they really? I heard an anecdote about how the library had spent a lot of time teaching users where to find their web-scale discovery system, making sure it was visible on the main library page, etc. After a professor assigned the same students to find a known article (gave them the full citation) using the web-scale discovery system (called it by name), the most frequent question the library got was, “How do I google the <name of web-scale discovery system>?”

I wonder if the ROI really is significant enough to implement and promote a web-scale discovery system. These systems are not cheap, and they take a bit of labor to maintain. And, frankly, if the battle over exclusive content continues to be waged, it won’t be easy to pick the best one for our collection/users and know that it will stay that way for more than six months or a year.

Does your library have a web-scale discovery system? Is it everything you thought it would be? Would you pick the same one if you had to choose again?

my twitter infographic

It’s a mashup of two of my favorite things: data visualization and social media. Of course I’m going to make one.

The interesting thing is that for some reason I come across as a gamer according to the algorithms. Unless you count solitaire, sudoku, and Words with Friends, I’m not really a gamer at all. The PS2, games, and accessories I bought from my sister last November that are sitting in a corner unassembled are also a testament to how little I game.

Anyway, click on the image to get the full-sized view, and if you make your own, be sure to share the link in the comments.

library day in the life round 6

I plan on using Twitter and Flickr to capture my week this time. CoverItLive will show the tweets below, and you can follow my Flickr feed or check the widget under CiL to see what I’ve posted there.


Delicious is still tasty to me


I’ve been seeing many of my friends and peers jump ship and move their social/online bookmarks to other services (both free and paid) since the Yahoo leak about Delicious being in the sun-setting category of products. Given the volume of outcry over this, I was pretty confident that either Yahoo would change their minds or someone would buy Delicious or someone would replicate Delicious. So, I didn’t worry. I didn’t freak out. I haven’t even made a backup of my bookmarks, although I plan to do that soon just because it’s good to have backups of data.

Now the word is that Delicious will be sold, which is probably for the best. Yahoo certainly didn’t do much with it after they acquired it some years ago. But, honestly, I’m pretty happy with the features Delicious has now, so really don’t care that it hasn’t changed much. However, I do want it to go to someone who will take care of it and continue to provide it to users, whether it remains free or becomes a paid service.

I looked at the other bookmark services out there, and in particular those recommended by Lifehacker. Frankly, I was unimpressed. I’m not going to pay for a service that isn’t as good as Delicious, and I’m not going to use a bookmarking service that isn’t integrated into my browser. I didn’t have much use for Delicious until the Firefox extension, and now it’s so easy to bookmark and tag things on the fly that I use it quite frequently as a universal capture tool for websites and gift/diy ideas.

The technorati are a fickle bunch. I get that. But I can’t help feeling disappointed in how quickly they jumped ship and stayed on the raft even when it became clear that it was just a leaky faucet and not a hole in the hull.

LibFest: Telling your Story with Usage Statistics — Making data work

presenter: Jamene Brooks-Kieffer

She won’t be talking about complex tools or telling you to hire more staff. Rather, she’ll be looking at ways we can use what we have to do it better.

Right now, we have too much data from too many sources, and we don’t have enough time or staff to deal with it. And, nobody cares about it anyway. Instead of feeling blue about this, change your attitude.

Start by looking at smaller chunks. Look at all of the data types and sources, then choose one to focus on. Don’t stress about the rest. How to pick which one? Select data that has been consistently collected over time. If it’s focused on a specific activity, it’ll be easier to create a story about it. And finally, the data should be both interesting and accessible to you.

By selecting only one source of data, you have reduced the stress on time. You also need to acknowledge your limits in order to move forward. You can’t work miracles, but you can show enough impact to get others on board. Tie the data to your organizational goals. Analyze the data using the tools you already have (e.g. Excel), and then publicize the results of your work.

Why use Excel? It’s pretty universal, and there are free alternatives for spreadsheets if you need them. Three useful Excel tools: import & manipulate files of various formats (CSV files), consolidate similar information (total annual data from monthly worksheets), and conditional formatting (identify cost/use over thresholds).
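For anyone who would rather script it than click through Excel, here is a minimal Python sketch of the same three steps: read a CSV, consolidate monthly counts into annual totals, and flag cost per use over a threshold. This was not part of the presentation. The journal titles, use counts, costs, the $10 threshold, and the function names are all made up for illustration.

```python
import csv
import io

# Hypothetical monthly usage export; titles and counts are invented.
MONTHLY_CSV = """title,month,uses
Journal A,2012-01,40
Journal A,2012-02,60
Journal B,2012-01,3
Journal B,2012-02,2
"""

# Hypothetical annual subscription costs in dollars.
ANNUAL_COST = {"Journal A": 500.0, "Journal B": 250.0}


def annual_totals(csv_text):
    """Sum the monthly use counts per title (the 'consolidate' step)."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["title"]] = totals.get(row["title"], 0) + int(row["uses"])
    return totals


def flag_cost_per_use(totals, costs, threshold):
    """Return titles whose cost per use is over the threshold.

    Titles with zero uses are flagged with None, because cost per use
    is undefined for them.
    """
    flagged = {}
    for title, uses in totals.items():
        if uses == 0:
            flagged[title] = None
            continue
        cost_per_use = costs[title] / uses
        if cost_per_use > threshold:
            flagged[title] = round(cost_per_use, 2)
    return flagged


totals = annual_totals(MONTHLY_CSV)
flagged = flag_cost_per_use(totals, ANNUAL_COST, threshold=10.0)
print(totals)
print(flagged)
```

The flagged dictionary plays the role of conditional formatting: it is the short list of titles worth a closer look, which is the part you would turn into a story for stakeholders.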

The spreadsheets are for you, not the stakeholders. Stop relying on them to communicate your data. The trouble with spreadsheets is that although they contain a lot of data, it’s challenging for those unfamiliar with the sources to understand the meaning of the data. Sending a summary/story will get your message across faster and more clearly.

Data has context, settings, complexities, and conflicts. One of the best ways of communicating it is through a story. Give stakeholders the context to hang the numbers on and a way to remember why they are important. Write what you know, focus on the important things, and keep it brief and meaningful. Here is an example: Data Stories: A dirty job.

Data stories are everywhere. They’re not strictly for usage or financial data. If you have a specific question you want answered through data, it makes it easier to compose the story.

Convince yourself to act; your actions will persuade others.

presenter: Katy Silberger

She will be showing three scenarios for observing user behavior through statistics: looking at the past with vendor supplied statistics, assessing current user behavior with Google Analytics, and anticipating user behavior with Google Analytics.

They started looking at usage patterns before and after implementing federated searching. It was hard to answer the question of how federated searching changed user behavior. They used vendor usage reports and website visits to calculate the number of articles retrieved per website visit and articles retrieved per search. They found that the federated search tool generated an increase in article/use. The ratios take into account the fluctuation in user populations.
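The two ratios described above are simple enough to sketch in a few lines of Python. The counts below are invented for illustration and are not the presenter’s numbers. The names (usage_ratios, before, after) are mine as well.

```python
def usage_ratios(period):
    """Compute articles per website visit and articles per search."""
    return {
        "articles_per_visit": period["articles"] / period["visits"],
        "articles_per_search": period["articles"] / period["searches"],
    }


def percent_change(old, new):
    """Percent change from old to new."""
    return (new - old) / old * 100


# Hypothetical totals for a year before and a year after federated search.
before = {"articles": 12000, "visits": 8000, "searches": 24000}
after = {"articles": 18000, "visits": 9000, "searches": 30000}

ratios_before = usage_ratios(before)
ratios_after = usage_ratios(after)

for name in ratios_before:
    change = percent_change(ratios_before[name], ratios_after[name])
    print(f"{name}: {ratios_before[name]:.2f} -> {ratios_after[name]:.2f} ({change:+.1f}%)")
```

Dividing by visits or searches is what lets the two periods be compared even when the number of users changes from year to year.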

Google Analytics could be used to identify use from students abroad. It’s also helpful for identifying trends in mobile web access.