my presentation for Internet Librarian 2012

Apologies for the delay. It took longer than I expected to have the file and a stable internet connection at the same time. You’ll find the notes on the SlideShare page.

ejournal use by subject

A couple of weeks ago I blogged about an idea I had that involved combining subject data from SerialsSolutions with use data for our ejournals to get a broad picture of ejournal use by subject. It took a bit of tooling around with Access tables and queries, including making my first crosstab, but I’ve finally got the data put together in a useful way.

It’s not quite comprehensive, since it only covers ejournals for which SerialsSolutions has assigned a subject, which also have ISSNs, and are available through sources that provide COUNTER or similar use statistics. But, it’s better than nothing.
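For anyone who would rather script the join than build it in Access, here is a rough Python sketch of the same idea: match subject assignments to use counts on ISSN, then total the use per subject. The data layout, the sample ISSNs and numbers, and the function name `use_by_subject` are all made up for illustration; the real SerialsSolutions and COUNTER exports look different.

```python
from collections import defaultdict

def use_by_subject(subjects, usage):
    """Total use per subject, joining the two data sets on ISSN.

    subjects: list of (issn, subject) pairs; a journal may have several subjects
    usage:    dict mapping issn -> total uses for the period
    Journals with no ISSN, no subject, or no use data drop out of the result.
    """
    totals = defaultdict(int)
    for issn, subject in subjects:
        if issn and subject and issn in usage:
            totals[subject] += usage[issn]
    return dict(totals)

# Hypothetical sample data, not real figures.
subjects = [
    ("1111-1111", "Chemistry"),
    ("2222-2222", "Chemistry"),
    ("2222-2222", "Biology"),
    ("3333-3333", "History"),   # no use data, so it is skipped
    ("", "Physics"),            # no ISSN, so it is skipped
]
usage = {"1111-1111": 120, "2222-2222": 80, "4444-4444": 15}

totals = use_by_subject(subjects, usage)
for subject in sorted(totals):
    print(subject, totals[subject])
```

Note that a journal with more than one subject is counted under each of them, so the subject totals can add up to more than the overall use.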

my twitter infographic

It’s a mashup of two of my favorite things: data visualization and social media. Of course I’m going to make one.

The interesting thing is that for some reason I come across as a gamer according to the algorithms. Unless you count solitaire, sudoku, and Words with Friends, I’m not really a gamer at all. The PS2, games, and accessories I bought from my sister last November, which are sitting in a corner unassembled, are also a testament to how little I game.

Anyway, click on the image to get the full-sized view, and if you make your own, be sure to share the link in the comments.

IL 2010: Dashboards, Data, and Decisions

[I took notes on paper because my netbook power cord was in my checked bag that SFO briefly lost on the way here. This is an edited transfer to electronic.]

presenter: Joseph Baisano

Dashboards pull information together and make it visible in one place. They need to be simple, built on existing data, but expandable.

Baisano is at SUNY Stony Brook, and they opted to go with Microsoft SharePoint 2010 to create their dashboards. The content can be made visible and editable through user permissions. Right now, their data connections include their catalog, proxy server, JCR, ERMS, and web statistics, and they are looking into using the API to pull license information from their ERMS.

In the future, they hope to use APIs from sources that provide them (Google Analytics, their ERMS, etc.) to create mashups and more on-the-fly graphs. They’re also looking at an open source alternative to SharePoint called Pentaho, which already has many of the plugins they want and comes in free and paid support flavors.

presenter: Cindi Trainor

[Trainor had significant technical difficulties with her Mac and the projector, which resulted in only 10 minutes of a slightly muddled presentation, but she had some great ideas for visualizations to share, so here’s as much as I captured of them.]

Graphs often tell us what we already know, so look at the data from a different angle to learn something new. Gapminder plots data in three dimensions – comparing two components of each set over time using bubble graphs. Excel can do bubble graphs as well, but with some limitations.

In her example, Trainor plotted reference transactions along the x-axis and gate count along the y-axis, with the size of each circle representing the number of circulation transactions. Each bubble represented a campus library, and each graph showed one year’s totals. By doing this, she was able to suss out some interesting trends and quirks to investigate that were hidden in the traditional line graphs.

sleep & caffeine, part 2

My friend Brent suggested that measuring caffeine intake by the ounces of the beverages consumed isn’t a good calculation, and he’s right. So, I went back and used this chart to determine the milligrams of caffeine per ounce depending on the beverage consumed. Here’s how the totals break down:

I found it interesting that while I drank about 35 more ounces of diet soda than coffee, coffee was clearly the major source of my caffeine intake.

Comparing the new set of data regarding my caffeine intake with the hours of sleep during the same time period, the chart looks a little different:

I gave the hours of sleep a multiplier of 100 so that it would be easier to compare them visually. There are definitely some points where the hours of sleep decrease and the milligrams of caffeine increase.
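The arithmetic behind both charts is simple enough to sketch in Python. The per-ounce caffeine values and the totals below are placeholders for illustration only, not the figures from the chart I used or from my Daytum data, and the function names are hypothetical.

```python
# Placeholder milligrams of caffeine per fluid ounce (illustrative only).
MG_PER_OZ = {"coffee": 12.0, "diet soda": 4.0}

def caffeine_mg(ounces_by_beverage, mg_per_oz=MG_PER_OZ):
    """Convert ounces consumed per beverage into milligrams of caffeine."""
    return {drink: oz * mg_per_oz[drink]
            for drink, oz in ounces_by_beverage.items()}

def scale_sleep(hours, multiplier=100):
    """Scale hours of sleep so they plot on the same axis as milligrams."""
    return [h * multiplier for h in hours]

# Hypothetical totals: more ounces of diet soda, but coffee still dominates.
mg = caffeine_mg({"coffee": 100, "diet soda": 135})
print(mg)
print(scale_sleep([7.5, 6, 8]))
```

With these made-up numbers, the beverage with fewer ounces still contributes more than twice the caffeine, which is the same kind of pattern the corrected totals showed.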

sleep & caffeine

I’ve been meaning to share this here. Back in January & February, I started using Daytum to keep track of my hours of sleep and the ounces of caffeinated beverages I consumed each day. I’m not sure how much being aware of the data gathering influenced my decision-making, but it felt about like normal, so this is probably a decent snapshot.

You can see more visualization options and analysis on the Daytum page, if you are so inclined.

CIL 2010: Library Engagement Through Open Data

Speakers: Oleg Kreymer & Dan Lipcan

Library data is meaningless in and of itself – you need to interpret it to give it meaning. Piotr Adamczyk did much of the work for the presentation, but was not able to attend today due to a schedule conflict.

They created the visual dashboard for many reasons, including a desire to expose the large quantities of data they have collected and stored, but in a way that is interesting and explanatory. It’s also a handy PR tool for promoting the library to benefactors and to administrators, who are often unaware of where and how the library is being effective or of trends in the library. Finally, the data can be targeted to the general public in ways that catch their attention.

The dashboard should also address assessment goals within the library. Data visualization allows us to identify and act upon anomalies. Some visualizations are complex, and you should be sensitive to how you present them.

The ILS is a great source of circulation/collections data. Other statistics can come from the data collected by various library departments, often in spreadsheet format. Google Analytics can capture search terms in catalog searches as well as site traffic data. Download/search statistics from eresources vendors can be massaged and turned into data visualizations.

The free tools they used included IMA Dashboard (local software, Drupal Profile) and IBM Many Eyes and Google Charts (cloud software). The IMA Dashboard takes snapshots of data and publishes it. It’s more of a PR tool.

Many Eyes is a hosted collection of data sets with visualization options. One thing I liked was that they used Google Analytics to gather the search terms used on the website and presented them as a word cloud. You could probably do the same with the titles of the pages in a page hit report.
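The counting behind a word cloud like that is easy to sketch. This is my own minimal Python illustration, not what the speakers or Many Eyes actually do: it tallies how often each word shows up across a list of search queries, and those counts are what a word cloud would use to size the words. The sample queries and the function name `term_weights` are hypothetical.

```python
from collections import Counter

def term_weights(queries, top=10):
    """Count how often each word appears across a list of search queries."""
    counts = Counter()
    for query in queries:
        # Lowercase so "Library" and "library" are counted together.
        counts.update(query.lower().split())
    return counts.most_common(top)

# Hypothetical search queries, not real data.
queries = ["library hours", "museum hours", "Library catalog"]
for word, count in term_weights(queries):
    print(word, count)
```

Swapping the queries for page titles from a page hit report would give the other version of the cloud mentioned above.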

Google Chart Tools are visualizations created by Google and others, and they use Google Spreadsheets to store and retrieve the data. The motion charts are great for showing data moving over time.

Lessons learned… Get administrative support. Identify your target audience(s). Identify the stories you want to tell. Be prepared for spending a lot of time manipulating the data (make sure it’s worth the time). Use a shared repository for the data documents. Pull from data your colleagues are already harvesting. Try, try, and try again.