NASIG 2011: Reporting on Collections

Speakers: Sandy Hurd, Tina Feick, & John Smith

Development begins with internal discussion, a business case, and a plan for how the data will be harvested. And the discussion may need to include the vendors who house or supply the data, such as your ILS or ERM vendor.

Product development on the vendor side can be prompted by several things, including specific customer needs, competition, and requirements in an RFP. When customers ask for reports, vendors need to determine whether the request is a one-time thing, something that can be met by enhancing what they already have, or something they aren’t doing yet. There may be standards, but collaborative data is still custom development between two entities, every time.

Have you peeked under the rug? The report is only as good as the data you have. How much cleanup are you willing to do? How can your vendor help? Before creating reports, think about what you have to solve and what you wish you could solve, statistics you need, the time available to generate them, and whether or not you can do it yourself.

There are traditional reporting tools like spreadsheets, and increasingly there are specialized data storage and analysis tools. We are looking at trends, transactional data, and projections, and we need this information on demand and more frequently than in the past. And the data needs to be interoperable. (Dani Roach is quietly shouting, “CORE! CORE!”) Ideally, we would be able to load relevant data from our ERMS, acquisitions modules, and other systems.

One use of the data can be to see who is using what, so properly coded patron records are important. The data can also be essential for justifying the redistribution of resources. People may not like what they hear, but at least you have the data to back it up.

The spreadsheets are not the reports. They are the data.

data-crunching librarian

Officially, my title is Electronic Resources Librarian, but lately I’ve been spending more of my time and energy on gathering and crunching data about our e-resources than on anything else. It’s starting to bleed over into the print world, as well. Since we don’t have someone dedicated to managing our print journals, I’ve taken on the responsibility of directing discussions about their future, as well as gathering and providing e-only options to the selectors.

I like this work, but I’ve also been feeling a bit like my role is evolving and changing in ways I’m not entirely cognizant of, and that worries me. I came into this job without clear direction and made it my own, and even though I have a department head now, I still often feel like I’m the driver. This has both positives and negatives, and lately I’ve been wishing I could have more outside direction, in part so I don’t feel so much like I’m doing things that may not have much value to the people for whom I am doing them.

However, on Monday, something clicked. A simple comment about using SAS to analyze print book collection use over time set all sorts of things firing away in my head. About all I know about SAS is that it’s some sort of data analysis tool, but I realized that I had come up with several of my professional goals for the next year in that moment.

For one, I want to explore whether or not I can learn and use SAS (or SPSS) effectively to analyze our collections (not just print books, as in the example above). For another, I want to explore whether or not I can learn R to more effectively visualize the data I gather.

Maybe some day down the road my title won’t be Electronic Resources Librarian anymore. Maybe some day it will be Data-Crunching Librarian.

Sounds good to me.

dreaming about the future of data in libraries

I spent most of the past two months downloading, massaging, and uploading to our ERMS a wide variety of COUNTER and non-COUNTER statistics. At times it is mind-numbing work, but taken in small doses, it’s interesting stuff.

The reference librarians make most of the purchasing decisions and deliver instruction to students and faculty on the library’s resources, but in the end, it’s the research needs of the students and faculty that dictate what they use. Then, every year, I get to look at what little information we have about their research choices.

Sometimes I’ll look at a journal title and wonder who in the world would want to read anything from that, but as it turns out, quite a number of someones (or maybe just one highly literate researcher) have read it in the past year.

Depending on the journal focus, it may be easy to identify where we need to beef up our resources based on high use, but for the more general things, I wish we had more detail about the use. Maybe not article-level, but perhaps a tag cloud — or something in that vein — pulled together from keywords or index headings. There’s so much more data floating around out there that could assist in collection development that we don’t have access to.

And then I think about the time it takes me to gather the data we have, not to mention the time it takes to analyze it, and I’m secretly relieved that’s all there is.

But, maybe someday when our ERMS have CRM-like data analysis tools and I’m not doing it all manually using Excel spreadsheets… Maybe then I’ll be ready to delve deeper into what exactly our students and faculty are using to meet their research needs.

ER&L: Making Data Work

Speaker: Jamene Brooks-Kieffer

Spreadsheets are not usable information to most everyone else. They are not a communication tool. A textual summary, data story, or infographic conveys the information found in spreadsheets in ways that are easier to understand.

Every audience has diverse needs. Consider the scope appropriate for the story you need to tell.

Data stories are created with the tools you already have. You don’t need special funding or resources — use what you already have.

Example: In order to understand how the link resolver data is different from publisher data, she started a blog to explain it to internal users.

Speaker: John McDonald

Graphics can simplify the telling of complex stories. But make sure your graphic tells the right story.

Know your audience. Showing faculty the drop in library funding compared to market trends will get them up in arms, but administrators may see the same graph as a correction to an out-of-control market and conclude that maybe we don’t need all those resources.

Collaborate with other people to improve your presentation. You might understand the data, but you are not your audience.

Speaker: Michael Levine-Clark

Guess what? You need to know your audience! And spreadsheets don’t tell the story to everyone.

Take the example of moving collections to storage. Faculty need reassurance that the things they browse will remain in the library. Some disciplines want more specifics about what is going and what is staying. Architects need space planning data and don’t care about the reasons. Administrators need justification for retaining the materials, regardless of where they end up, and the cost of retrieving materials from storage. The Board of Trustees needs information about the value of the paper collections, kept a little vague about the specifics (talking about low use rather than no use).

IL 2010: Dashboards, Data, and Decisions

[I took notes on paper because my netbook power cord was in my checked bag that SFO briefly lost on the way here. This is an edited transfer to electronic.]

presenter: Joseph Baisano

Dashboards pull information together and make it visible in one place. They need to be simple, built on existing data, but expandable.

Baisano is at SUNY Stony Brook, and they opted to go with Microsoft SharePoint 2010 to create their dashboards. The content can be made visible and editable through user permissions. Right now, their data connections include their catalog, proxy server, JCR, ERMS, and web statistics, and they are looking into using the API to pull license information from their ERMS.

In the future, they hope to use APIs from sources that provide them (Google Analytics, their ERMS, etc.) to create mashups and more on-the-fly graphs. They’re also looking at an open source alternative to SharePoint called Pentaho, which already has many of the plugins they want and comes in free and paid support flavors.

presenter: Cindi Trainor

[Trainor had significant technical difficulties with her Mac and the projector, which resulted in only 10 minutes of a slightly muddled presentation, but she had some great ideas for visualizations to share, so here’s as much as I captured of them.]

Graphs often tell us what we already know, so look at the data from a different angle to learn something new. Gapminder plots data in three dimensions – comparing two components of each set over time using bubble graphs. Excel can do bubble graphs as well, but with some limitations.

In her example, Trainor showed reference transactions along the x-axis, the gate count along the y-axis, and the size of the circle represented the number of circulation transactions. Each bubble represented a campus library and each graph was for the year’s totals. By doing this, she was able to suss out some interesting trends and quirks to investigate that were hidden in the traditional line graphs.
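A bubble graph like Trainor’s plots two measures on the x and y axes and encodes a third as circle size. matplotlib’s `scatter()` takes marker areas via its `s=` argument; the sketch below only computes those areas from circulation counts, with all numbers invented for illustration.

```python
# Scale circulation counts into bubble areas for a Trainor-style
# bubble graph. All figures below are made up for illustration.

def bubble_areas(circulation, max_area=2000.0):
    """Scale raw counts so the busiest library gets the largest bubble."""
    biggest = max(circulation)
    return [max_area * c / biggest for c in circulation]

# one tuple per campus library: (reference transactions, gate count, circ)
libraries = [(5200, 120000, 80000), (900, 30000, 12000), (2100, 64000, 40000)]
x = [ref for ref, gate, circ in libraries]   # x-axis: reference transactions
y = [gate for ref, gate, circ in libraries]  # y-axis: gate count
s = bubble_areas([circ for ref, gate, circ in libraries])
# then plt.scatter(x, y, s=s) would draw one bubble per library
print(s)
```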

ER&L 2010: Usage Statistics for E-resources – is all that data meaningful?

Speaker: Sally R. Krash, vendor

Three options: do it yourself, gather and format to upload to a vendor’s collection database, or have the vendor gather the data and send a report (Harrassowitz e-Stats). Surprisingly, the second solution was actually more time-consuming than the first because the library’s data didn’t always match the vendor’s data. The third is the easiest because it’s coming from their subscription agent.

Evaluation: review cost data; set cut-off point ($50, $75, $100, ILL/DocDel costs, whatever); generate list of all resources that fall beyond that point; use that list to determine cancellations. For citation databases, they want to see upward trends in use, not necessarily cyclical spikes that average out year-to-year.
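The cut-off evaluation above is easy to automate once you have cost and use figures side by side. A minimal sketch, with invented numbers and a hypothetical $75 cut-off:

```python
# Flag resources whose cost per use exceeds a chosen cut-off
# (e.g. the cost of an ILL/DocDel transaction). All figures are
# made up for illustration.

def cost_per_use(cost, uses):
    """Naive cost per use; packages complicate this in practice."""
    return cost / uses if uses else float("inf")

def over_cutoff(resources, cutoff):
    """Return titles whose cost per use exceeds the cut-off."""
    return sorted(title for title, cost, uses in resources
                  if cost_per_use(cost, uses) > cutoff)

holdings = [
    ("Journal A", 1200.00, 300),   # $4.00 per use
    ("Journal B", 900.00, 6),      # $150.00 per use
    ("Database C", 5000.00, 45),   # ~$111 per use
]
print(over_cutoff(holdings, 75.00))  # candidates for cancellation review
```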

Future: Need more turnaway reports from publishers, specifically journal publishers. COUNTER JR5 will give more detail about article requests by year of publication. COUNTER JR1 & BR1 combined report – don’t care about format, just want download data. Need to have download information for full-text subscriptions, not just searches/sessions.

Speaker: Benjamin Heet, librarian

He is speaking about the University of Notre Dame’s statistics philosophy. They collect JR1 full-text downloads – they’re not into database statistics, mostly because federated search messes them up. Impact factors and Eigenfactor scores are hard to evaluate. He asks, “can you make questionable numbers meaningful by adding even more questionable numbers?”

At first, he was downloading the spreadsheets monthly and making them available on the library website. He started looking for a better way, whether that was to pay someone else to build a tool or do it himself. He went with the DIY route because he wanted to make the numbers more meaningful.

Avoid junk in, junk out: HTML vs. PDF download counts depend on the platform setup. Pay attention to outliers to watch for spikes that might indicate unusual use by an individual. The reports often contain bad data or duplicate rows.
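Spotting the spikes he mentions can be done with a simple screen over the monthly numbers. A minimal sketch: flag any month that jumps far above the median for the year. The 3x-median threshold and the figures are arbitrary illustrations, not a COUNTER rule.

```python
# Flag months whose downloads spike far above the rest of the year,
# which can signal unusual use by one person or a harvesting script.

from statistics import median

def flag_spikes(monthly_use, factor=3):
    """Return months whose use exceeds factor times the median month."""
    mid = median(monthly_use.values())
    return [m for m, n in monthly_use.items() if n > factor * mid]

use = {"2010-01": 40, "2010-02": 35, "2010-03": 420, "2010-04": 50}
print(flag_spikes(use))  # the March spike is worth investigating
```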

CORAL Usage Statistics – local program gives them a central location to store user names & passwords. He downloads reports quarterly now, and the public interface allows other librarians to view the stats in readable reports.

Speaker: Justin Clarke, vendor

Harvesting reports takes a lot of time and carries administrative costs. SUSHI is a vehicle for automating the transfer of statistics from one source to another. However, you still need to look at the data. Your subscription agent has a lot more data about the resources than just use, and can combine the two together to create a broader picture of the resource use.
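SUSHI automates moving statistics from provider to library. The original protocol (NISO Z39.93) was SOAP-based; the later COUNTER R5 version is a REST API where a report is fetched from a URL. The sketch below only builds such a request URL; the host, report name, and credentials are all hypothetical, and no network call is made.

```python
# Build a request URL for a hypothetical COUNTER R5 SUSHI endpoint.
# Endpoint, customer ID, and dates are invented for illustration.

from urllib.parse import urlencode

def sushi_report_url(base, report, customer_id, begin, end):
    """Assemble a GET URL for one usage report over a date range."""
    query = urlencode({
        "customer_id": customer_id,
        "begin_date": begin,
        "end_date": end,
    })
    return f"{base}/reports/{report}?{query}"

url = sushi_report_url("https://sushi.example.com/counter",
                       "tr_j1", "LIB123", "2010-01-01", "2010-12-31")
print(url)
```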

Harrassowitz starts with acquisitions data and matches the use statistics to that. They also capture things like publisher changes and title changes. Cost per use is not as easy as simple division – packages confuse the matter.
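One way to see why packages confuse cost per use: the package has one price but many titles. A simple convention (an assumption for illustration, not any agent’s actual method) is to split the package price evenly across titles, which makes the low-use titles in the deal look expensive.

```python
# Split a package price evenly across its titles, then compute each
# title's cost per use. Prices and use counts are invented.

def per_title_cost_per_use(package_price, title_uses):
    """Even split of the package price, divided by each title's use."""
    share = package_price / len(title_uses)
    return {title: share / uses for title, uses in title_uses.items()}

uses = {"Journal A": 800, "Journal B": 150, "Journal C": 50}
cpu = per_title_cost_per_use(10000.00, uses)
print({t: round(c, 2) for t, c in cpu.items()})
```

A proportional-to-use split is another option, but it assigns every title the same cost per use by construction, which hides exactly the variation you are looking for.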

High use could be the result of class assignments or hackers/hoarders. Low use might reflect politically motivated purchases or support for a new department. You need a reference point of cost. Pricing from publishers seems to have no rhyme or reason, and your price is not necessarily the list price. Multi-year analysis and subject-based analysis look at local trends.

Rather than usage statistics, we need useful statistics.

IL2009: Mashups for Library Data

Speaker: Nicole Engard

Mashups are easy ways to provide better services for our patrons. They add value to our websites and catalogs. They promote our services in the places our patrons frequent. And, it’s a learning experience.

We need to ask our vendors for APIs. We’re putting data into our systems, so we should be able to get it out. Take that data and mash it up with popular web services using RSS feeds.

Yahoo Pipes allows you to pull in many sources of data and mix them up to create something new with a clean, flowchart-like interface. Don’t give up after your first try. Jody Fagan wrote an article in Computers in Libraries that inspired Engard to go back and try again.
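The core operation Pipes performs on feeds, pulling items from several sources and interleaving them by date, can be sketched in a few lines of Python. The two feed strings below are minimal invented examples, not real feeds.

```python
# Merge items from multiple RSS 2.0 feeds, newest first -- the basic
# move behind a Yahoo Pipes feed mashup. Feed content is made up.

import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def feed_items(rss_xml):
    """Pull (title, pubDate) pairs out of an RSS 2.0 string."""
    root = ET.fromstring(rss_xml)
    return [(item.findtext("title"), item.findtext("pubDate"))
            for item in root.iter("item")]

def merge_feeds(*feeds):
    """Interleave items from several feeds, newest first."""
    items = [it for feed in feeds for it in feed_items(feed)]
    return sorted(items, key=lambda t: parsedate_to_datetime(t[1]),
                  reverse=True)

feed_a = ("<rss><channel><item><title>A1</title>"
          "<pubDate>Mon, 01 Mar 2010 10:00:00 GMT</pubDate></item>"
          "</channel></rss>")
feed_b = ("<rss><channel><item><title>B1</title>"
          "<pubDate>Tue, 02 Mar 2010 09:00:00 GMT</pubDate></item>"
          "</channel></rss>")
print([title for title, _ in merge_feeds(feed_a, feed_b)])
```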

Reading Radar takes the NYT Bestseller lists and merges them with data from Amazon to display more than just sales information (ratings, summaries, etc.). You could do that, but instead of having users go buy the book, link it to your library catalog. The New York Times has opened up a tremendous amount of content via APIs.

Bike Tours in CA is a mashup of Google Maps and ride data. Trulia, Zillow, and HousingMaps use a variety of sources to map real estate information. This We Know pulls in all sorts of government data about a location. Find more mashups at ProgrammableWeb.

What mashups should libraries be doing? First off, if you have multiple branches, create a Google Maps mashup of library locations. Share images of your collection on Flickr and pull that into your website (see Access Ceramics), letting Flickr do the heavy lifting of resizing the images and pulling content out via machine tags. Delicious provides many options for creating dynamically updating lists with code snippets to embed them in your website.

OPAC mashups require APIs, preferably ones that can generate JavaScript, and you may need a programmer if you can’t get the information out in a form you can easily use. LexisNexis Academic, WorldCat, and LibraryThing all have APIs you can use.

Ideas from Librarians: Mash up circulation data with various travel sources to provide better patron services. Grab MARC location data to plot information on a map. Pull data about the media collection and combine it with IMDB and other resources. Build subject RSS feeds from all resources for current articles (you could do that already with a collection of journals with RSS feeds and Yahoo Pipes).

Links and more at her book website.

NASIG 2009: Informing Licensing Stakeholders

Towards a More Effective Negotiation

Presenters: Lisa Sibert, Micheline Westfall, Selden Lamoreux, Clint Chamberlain (moderator), Vida Damijonaitis, and Brett Rubinstein

Licensing as a process has not been improving very much. Some publishers are willing to negotiate changes, but some are still resistant. It often takes months to a year to receive fully signed licenses from publishers, which can tie up access or institutional processes. Negotiation time is, of course, a factor, but it should not affect the time it takes for both parties to sign and distribute copies once the language is agreed upon. One panelist noted that larger publishers are often less willing to negotiate than smaller ones. Damijonaitis stated that licenses are touched at fourteen different points in the process on their end, which plays into the length of time.

Publishers are concerned with the way the content is being used and making sure that it is not abused (without consequences). Is it necessary to put copyright violation language in licenses or can it live on purchase orders? Springer has not had any copyright violations that needed to be enforced in the past five or six years. They work with their customers to solve any problems as they come up, and libraries have been quick to deal with the situation. On the library side, some legal departments are not willing to allow libraries to participate in SERU.

Deal breakers: not allowing walk-ins, adjunct faculty, interlibrary loan, governing law, and basic fair use provisions. Usage statistics and uptime guarantees are important and sometimes difficult to negotiate. LibLicense is useful for getting effective language that publishers have agreed to in the past.

It’s not the libraries who tend to be the abusers of license terms or copyright, it’s the users. Libraries are willing to work with publishers, but if the technology has grown to the point where it is too difficult for the library to police use, then some other approach is needed. When we work with publishers that don’t require licenses or use just purchase orders, there is less paperwork, but it also doesn’t indemnify the institution, which is critical in some cases.

Bob Boissy notes that no sales person gets any benefit from long negotiations. They want a sale. They want an invoice. Libraries are interested in getting the content as quickly as possible. I think we all are coming at this with the same desired outcome.