ER&L 2010: Usage Statistics for E-resources – is all that data meaningful?

Speaker: Sally R. Krash, vendor

Three options: do it yourself, gather and format to upload to a vendor’s collection database, or have the vendor gather the data and send a report (Harrassowitz e-Stats). Surprisingly, the second solution was actually more time-consuming than the first because the library’s data didn’t always match the vendor’s data. The third is the easiest because it’s coming from their subscription agent.

Evaluation: review cost data; set cut-off point ($50, $75, $100, ILL/DocDel costs, whatever); generate list of all resources that fall beyond that point; use that list to determine cancellations. For citation databases, they want to see upward trends in use, not necessarily cyclical spikes that average out year-to-year.
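As a rough illustration of that cut-off review, here is a minimal Python sketch. It assumes the cut-off is applied to cost per use (annual cost divided by full-text downloads), which the notes imply but do not state outright. The titles, costs, and download counts are made-up sample data, and `cost_per_use` and `review_list` are hypothetical helper names.

```python
# Hypothetical sample data: (title, annual cost in dollars, full-text downloads)
resources = [
    ("Journal A", 1200.00, 400),
    ("Journal B", 3000.00, 25),
    ("Journal C", 850.00, 0),
    ("Journal D", 500.00, 10),
]

CUTOFF = 50.00  # assumed cut-off, in dollars per use


def cost_per_use(cost, uses):
    """Return cost divided by uses; a title with no use counts as infinitely expensive."""
    return cost / uses if uses > 0 else float("inf")


def review_list(resources, cutoff):
    """Return (title, cost per use) pairs above the cut-off, most expensive first."""
    flagged = [
        (title, cost_per_use(cost, uses))
        for title, cost, uses in resources
        if cost_per_use(cost, uses) > cutoff
    ]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


for title, cpu in review_list(resources, CUTOFF):
    if cpu == float("inf"):
        print(f"{title}: no recorded use")
    else:
        print(f"{title}: ${cpu:,.2f} per use")
```

The resulting list is only a starting point for cancellation decisions; a title sitting exactly at the cut-off (Journal D here) is not flagged.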

Future: Need more turnaway reports from publishers, specifically journal publishers. COUNTER JR5 will give more detail about article requests by year of publication. COUNTER JR1 & BR1 combined report – don’t care about format, just want download data. Need to have download information for full-text subscriptions, not just searches/sessions.

Speaker: Benjamin Heet, librarian

He is speaking about the University of Notre Dame’s statistics philosophy. They collect JR1 full-text downloads; they’re not into database statistics, mostly because federated search messes them up. Impact factors and Eigenfactor scores are hard to evaluate. He asks, “can you make questionable numbers meaningful by adding even more questionable numbers?”

At first, he was downloading the spreadsheets monthly and making them available on the library website. He started looking for a better way, whether that was to pay someone else to build a tool or do it himself. He went with the DIY route because he wanted to make the numbers more meaningful.

Avoid junk in, junk out: the split between HTML and PDF downloads depends on the platform setup. Pay attention to outliers, since spikes might indicate unusual use by an individual. The reports often contain bad data, or the same data more than once.
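A small Python sketch of those two sanity checks follows. It is not the speaker's method: the "three times the median" rule for spotting a spike is an assumption chosen for illustration, the numbers are invented, and `find_spikes` and `duplicate_titles` are hypothetical names.

```python
from collections import Counter
from statistics import median

# Hypothetical monthly full-text downloads for one title
monthly_downloads = {
    "Jan": 110, "Feb": 95, "Mar": 120, "Apr": 105, "May": 980, "Jun": 100,
    "Jul": 60, "Aug": 55, "Sep": 130, "Oct": 140, "Nov": 125, "Dec": 90,
}

# Hypothetical report rows: (title, yearly total); one title appears twice
report_rows = [("Journal A", 400), ("Journal B", 25), ("Journal A", 400)]


def find_spikes(counts, factor=3.0):
    """Return months whose count exceeds `factor` times the median month."""
    typical = median(counts.values())
    return [month for month, n in counts.items() if typical > 0 and n > factor * typical]


def duplicate_titles(rows):
    """Return titles that appear on more than one row of the same report."""
    seen = Counter(title for title, _ in rows)
    return sorted(title for title, count in seen.items() if count > 1)


print("Months to investigate:", find_spikes(monthly_downloads))
print("Titles listed more than once:", duplicate_titles(report_rows))
```

A flagged month is a prompt to look closer, not proof of misuse; a class assignment could produce the same pattern.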

CORAL Usage Statistics – local program gives them a central location to store user names & passwords. He downloads reports quarterly now, and the public interface allows other librarians to view the stats in readable reports.

Speaker: Justin Clarke, vendor

Harvesting reports takes a lot of time and carries administrative costs. SUSHI is a vehicle for automating the transfer of statistics from one source to another. However, you still need to look at the data. Your subscription agent has a lot more data about the resources than just use, and can combine the two to create a broader picture of the resource use.

Harrassowitz starts with acquisitions data and matches the use statistics to that. They also capture things like publisher changes and title changes. Cost per use is not as easy as simple division – packages confuse the matter.

High use could be the result of class assignments or hackers/hoarders. Low use might be for political purchases or new department support. You need a reference point of cost. Pricing from publishers seems to have no rhyme or reason, and your price is not necessarily the list price. Multi-year analysis and subject-based analysis look at local trends.

Rather than usage statistics, we need useful statistics.

NASIG 2009: What Color Is Your Paratext?

Presenter: Geoffrey Bilder, CrossRef

The title refers to a book geared toward people who are looking for a new job or changing careers, which is relevant to what the serials world is facing in both personnel and content. Paratext is added content that prepares the audience/reader for the meat of the document. We are very good at controlling and evaluating credibility, which is important when conveying information via paratext.

The internet is fraught with false information, which undermines credibility. The publisher’s value is being questioned because so much of their work can be done online at little or no cost, and what can’t be done cheaply is being questioned. Branding is increasingly being hidden by layers like Google which provide content without indicating the source. The librarian’s problem is similar to the publisher’s. Our value is being questioned when the digital world is capable of managing some of our work through distributed organizational structures.

“Internet Trust Anti-Pattern” — a system starts out as being a self-selected core of users with an understanding of trust, but as it grows, that can break down unless there is a structure or pervasive culture that maintains the trust and authority.

Local trust is that which is achieved through personal acquaintance and is sometimes transitive. Global trust extends through proxy, which transitively extends trust to “strangers.” Local is limited and hard to expand, and global increases systemic risk.

Horizontal trust occurs among equals with little possibility of coercion. Vertical trust occurs within a hierarchy, and coercion can be used to enforce behavior, which could lead to abuse.

Internet trust is in the local and horizontal quadrant. Scholarly trust falls in the vertical and global quadrant. It’s no wonder we’re having trouble figuring out how to do scholarship online!

Researchers have more to read and less time to read it, and the volume is increasing rapidly. We need to remember that authors and readers are the same people. The amazing ways that technology has opened up communication are also causing the overload. We need something to help identify credible information.

Dorothea Salo wrote that for people who put a lot of credibility in authoritative information, we don’t do a very good job of identifying it. She blames librarians, but publishers have a responsibility, too. Heuristics are important in knowing who the intended audience is meant to be.

If you find a book at a bargain store, the implication is that it is going to be substantially less authoritative than a book from a grand, old library. (There are commercial entities selling leather bound books by the yard for buyers to use to add gravitas to their offices and personal libraries.) Scholarly journals are dull and magazines are flashy & bright. Books are traditionally organized with all sorts of content that tells academics whether or not they need to read them (table of contents, index, blurbs, preface, bibliography, etc.).

If you were to black out the text of a scholarly document, you would still be able to identify the parts displayed. You can’t do that very well with a webpage.

When we evaluate online content, we look at things like the structure of the URL and where it is linked from. In the print world, citations and footnotes were essential clues to following conversations between scholars. Linking can do that now, but the convention is still more formal. Logos can also tell us whether or not to put trust in content.

Back in the day, authors were linked to printers, but that led to credibility problems, so publishers stepped in. Authors and readers could trust that the content was accurate and properly presented. Now it’s not just publishers: titles have become brands. A journal’s reputation is almost more important than who is publishing it.

How do we help people learn and understand the heuristics for identifying scholarly information? The processes for putting out credible information are partially hidden: the reader or librarian doesn’t know or see the steps involved. We used to not want to know, but now we do, particularly since it allows us to differentiate between the good players and the bad players.

The idea of the final version of a document needs to be buried. Even in the print world (with errata and addenda) we were deluding ourselves in thinking that any document was truly finished.

Why don’t we have a peer reviewed logo? Why don’t we have something that assures the reader that the document is credible? Peer review isn’t necessarily perfect or the only way.

How about a Version of Record record? Show us what was done to a document to get it to where it is now. For example, look at Creative Commons. They have a logo that indicates something about the process of creating the document which leads to machine-readable coding. How about a CrossMark that indicates what a publisher has done with a document, much like what a CC logo will lead to?

Knowmore.org created a Firefox plugin that monitors content and provides icons that flag companies and websites for different reasons. Oncode is a way of identifying organizations that have signed a code of conduct. We could do this for scholarly content.

Tim Berners-Lee is actively advocating for ways to overlay trust measures on the internet. The internet was originally designed by academics who didn’t need such measures, but, as in the Internet Trust Anti-Pattern, the “unwashed masses” have corrupted that trust.

What can librarians and publishers do to recreate the heuristics that have been effective in print? We are still making facsimiles of print in electronic format. How are we going to create the tools that will help people evaluate digital information?

NASIG 2009: Managing Electronic Resource Statistics

Presenter: Nancy Beals

We have the tools and the data, now we need to use them to the best advantage. Statistics, along with other data, can create a picture of how our online resources are being used.

Traditionally, we have gathered stats by counting when re-shelving, ILL, gate counts, circulation, etc. Do these things really tell us anything? Stats from eresources can tell us much more, in conjunction with information about the paths we create to them.

Even with standards, we can run into issues with collecting data. Data can be “unclean” or incorrectly reported (or late). And not all publishers are using the standards (e.g., COUNTER).

After looking at existing performance indicators and applying them to electronic resources, we can look at trends in the use of those resources. This can help us determine the return on investment in them.

Keep a master list of stats in order to plan out how and when to gather them. Keep the data in a shared location. Be prepared to supply data in a timely fashion for collection development decision-making.

When you are comparing resources, it’s up to individual institutions to determine what is considered low or high use. Look at how the resources stack up within the overall collection.

When assessing the value of a resource, Beals and her colleagues are looking at 2-3 years of use data, 10% cost inflation, and the cost of ILL. In addition, they make use of overlap analysis tools to determine where they have multiple formats or sources that could be eliminated based on which platforms are being used.
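To make that comparison concrete, here is a minimal Python sketch. The notes do not say how Beals and her colleagues combine these figures, so the formula is an assumption: average the yearly use, inflate the current cost by 10%, and compare next year's subscription cost with what the same number of requests would cost through ILL. The dollar amounts and the function names `projected_cost` and `cheaper_via_ill` are hypothetical.

```python
def projected_cost(current_cost, inflation=0.10):
    """Estimate next year's cost, assuming a flat inflation rate (10% by default)."""
    return current_cost * (1 + inflation)


def cheaper_via_ill(current_cost, yearly_uses, ill_cost_per_request, inflation=0.10):
    """Return True if filling the average yearly demand through ILL would
    cost less than renewing the subscription at the inflated price."""
    average_uses = sum(yearly_uses) / len(yearly_uses)
    ill_total = average_uses * ill_cost_per_request
    return ill_total < projected_cost(current_cost, inflation)


# Hypothetical low-use title: $2,000 now, three years of use, $20 per ILL request
print(cheaper_via_ill(2000.00, [40, 35, 30], 20.00))

# Hypothetical high-use title: $500 now, heavy use every year
print(cheaper_via_ill(500.00, [300, 320, 310], 20.00))
```

In the first case the average of 35 uses would cost about $700 through ILL against a projected $2,200 renewal; in the second, ILL would cost far more than the subscription.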

Providing readily accessible data in a user-friendly format empowers selectors to do analysis and make decisions.

gathering statistics

For the past couple of weeks, the majority of my work day has been spent on tracking down and massaging usage statistics reports from the publishers of the online products we purchase. I am nearly half-way through the list, and I have a few observations based on this experience:

1. There are more publishers not following the COUNTER code of practice than those who are. Publishers in traditionally library-dominated (and in particular, academic library-dominated) markets are more likely to provide COUNTER-compliant statistics, but that is not a guarantee.

2. Some publishers provide usage statistics, and even COUNTER-compliant usage statistics, but only for the past twelve months or some other short period of time. This would be acceptable only if a library had been saving the reports locally. Otherwise, a twelve-month period is not long enough to support informed decisions.

3. We are not trying to use these statistics to find out which resources to cancel. On the contrary, if I can find data that shows an increase in use over time, then my boss can use it to justify our annual budget request and maybe even ask for more money.

Update: It seems that the conversation regarding my observations is happening over on FriendFeed. Please feel free to join in there or leave your thoughts here.