giving SUSHI another try

(It's just) Kate's sushi! photo by Cindi Blyberg

I’m going to give SUSHI another try this year. I had set it up for some of our stuff a few years back with mixed results, so I removed it and have been continuing to manually retrieve and load reports into our consolidation tool. I’m still doing that for the 2017 reports, because the SUSHI harvesting tool I have won’t let me go back and pull earlier periods, only harvest monthly going forward.

I’ve spent a lot of time making sure titles in reports matched up with our ERMS so that consolidation would work (it’s matching on title, ugh), and despite my efforts, any reports generated still need cleanup. What is the value of my effort there? Not much anymore. Especially since ingesting cost data for journals/books is not a simple process to maintain, either. So, if all of that matters little to nothing now, I might as well take whatever junk is passed along in the SUSHI feed and save myself some time for other work in 2019.
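For reference, here is a minimal sketch of what pulling a Release 5 report over the COUNTER_SUSHI REST API could look like in Python. The base URL and all of the credential values are placeholders you would swap for whatever your vendor publishes; the JSON field names follow the Release 5 report layout (Report_Items with Performance/Instance blocks).

```python
# Minimal sketch of pulling a COUNTER Release 5 TR_J1 report over the
# COUNTER_SUSHI REST API. The base URL and credential values are placeholders;
# substitute whatever your vendor publishes.
import requests

BASE_URL = "https://sushi.example-vendor.com/counter/r5/reports/tr_j1"  # hypothetical
params = {
    "customer_id": "YOUR_CUSTOMER_ID",
    "requestor_id": "YOUR_REQUESTOR_ID",
    "api_key": "YOUR_API_KEY",   # not every vendor requires one
    "begin_date": "2019-01-01",
    "end_date": "2019-12-31",
}

resp = requests.get(BASE_URL, params=params, timeout=60)
resp.raise_for_status()
report = resp.json()

# Release 5 JSON: Report_Items is a list of titles, each with Performance
# blocks broken out by month (Period) and metric type (Instance).
for item in report.get("Report_Items", []):
    title = item.get("Title")
    for perf in item.get("Performance", []):
        month = perf["Period"]["Begin_Date"]
        for inst in perf.get("Instance", []):
            print(title, month, inst["Metric_Type"], inst["Count"])
```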

Charleston 2016: COUNTER Release 5 — Consistency, Clarity, Simplification and Continuous Maintenance

Speakers: Lorraine Estelle (Project COUNTER), Anne Osterman (VIVA – The Virtual Library of Virginia), Oliver Pesch (EBSCO Information Services)

COUNTER has had very minimal updates over the years, and it wasn’t until release 4 that things really exploded with report types and additional useful data. Release 5 attempts to reduce complexity so that all publishers and content providers are able to achieve compliance.

They are seeking consistency in report layout, between formats, and in vocabulary, as well as clarity in metric types and their qualifying actions, processing rules, and formatting expectations.

The standard reports will be fewer, but more flexible. The expanded reports will introduce more data, but with flexibility.

A transaction will have different attributes recorded depending on the item type. They are also trying to get at intent — items investigated (abstract) vs. items requested (full-text). Searches will now distinguish between whether it was on a selected platform, a federated search, a discovery service search, or a search across a single vendor platform. Unfortunately, the latter data point will only be reported on the platform report, and still does not address teasing that out at the database level.

The access type attribute will indicate whether the usage is of Open Access or other free content versus licensed content. There will be a year of publication (YOP) attribute, which was not in any of the book reports and was only included in Journal Report 5.

Each report will have a consistent, standard header with additional details about the data, and consistent columns. There will be multiple rows per title to cover all the combinations, making it more machine-friendly, but you can create filters in Excel to make it more human-friendly.
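As a concrete illustration of the multiple-rows-per-title point, here is a small pandas sketch that reshapes a Release 5 title report into one row per title with metric types as columns; it assumes a CSV export with Title, Metric_Type, and Reporting_Period_Total columns, so adjust the names to match the actual file.

```python
# Sketch: reshape a Release 5 title report (one row per title/metric-type
# combination) into one row per title with metric types as columns -- the
# "human-friendly" view you would otherwise build with Excel filters.
# Assumes a CSV with Title, Metric_Type, and Reporting_Period_Total columns.
import pandas as pd

tr = pd.read_csv("tr_j1_2019.csv")  # hypothetical file name

wide = tr.pivot_table(
    index="Title",
    columns="Metric_Type",
    values="Reporting_Period_Total",
    aggfunc="sum",
    fill_value=0,
)
print(wide.head())
```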

They expect to have release 5 published by July 2017 with compliance required by January 2019.

Q&A
Q: Will there eventually be a way to account for anomalies in data (abuse of access, etc.)?
A: They are looking at how to address use triggered by robot activity. Need to also be sensitive of privacy issues.

Q: Current book reports do not include zero use entitlements. Will that change?
A: Encouraged to provide KBART reports to get around that. The challenge is that DDA/PDA collections are huge, which makes delivering reports cumbersome. They will also be dropping zero-use reporting on journals.

Q: Using DOI as a unique identifier, but not consistently provided in reports. Any advocacy to include unique identifiers?
A: There is an initiative associated with KBART to make sure that data is shared so that knowledgebases are updated, users find the content, and there are fewer zero-use titles. Publishers have motivation to do this.

Q: How do you distinguish between unique uses?
A: Session-based data. Assign a session ID to activity. If there is no session tracking, use a combination of IP address and user agent. The user agent is helpful when multiple users come through one IP via the proxy server.
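As a rough illustration of that fallback, here is a small Python sketch that groups log entries into approximate sessions by IP address plus user agent; the 30-minute inactivity cutoff is an arbitrary choice for the example, not anything from the talk.

```python
# Rough sketch of the fallback described above: when there is no session ID,
# approximate a "session" by grouping log entries on IP address + user agent,
# starting a new session after 30 minutes of inactivity (arbitrary cutoff).
from datetime import datetime, timedelta

def assign_sessions(log_entries, gap=timedelta(minutes=30)):
    """log_entries: (timestamp, ip, user_agent) tuples, sorted by time."""
    last_seen = {}   # (ip, user_agent) -> (last timestamp, session id)
    sessions = []
    counter = 0
    for ts, ip, ua in log_entries:
        key = (ip, ua)
        if key in last_seen and ts - last_seen[key][0] <= gap:
            session_id = last_seen[key][1]
        else:
            counter += 1
            session_id = counter
        last_seen[key] = (ts, session_id)
        sessions.append((ts, ip, ua, session_id))
    return sessions

# Example: two users behind the same proxy IP, distinguished by user agent.
entries = [
    (datetime(2019, 1, 1, 9, 0), "10.0.0.1", "Mozilla/5.0 (Windows)"),
    (datetime(2019, 1, 1, 9, 5), "10.0.0.1", "Mozilla/5.0 (Macintosh)"),
    (datetime(2019, 1, 1, 9, 10), "10.0.0.1", "Mozilla/5.0 (Windows)"),
]
for row in assign_sessions(entries):
    print(row)
```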

Slides

community site for usage statistics

Usus is an independent community website developed to help librarians, library consortium administrators, publishers, aggregators, etc. communicate around topics related to usage statistics. From problem-solving to workflow tips to calling out bad actors, this site hopes to be the hub of all things usage.

Do you have news to share or a problem you can’t figure out? Do you have really cool workflows you want to share? Drop us a note!

LibFest: Telling your Story with Usage Statistics — Making data work

presenter: Jamene Brooks-Kieffer

She won’t be talking about complex tools or telling you to hire more staff. Rather, she’ll be looking at ways we can use what we have to do it better.

Right now, we have too much data from too many sources, and we don’t have enough time or staff to deal with it. And, nobody cares about it anyway. Instead of feeling blue about this, change your attitude.

Start by looking at smaller chunks. Look at all of the data types and sources, then choose one to focus on. Don’t stress about the rest. How to pick which one? Select data that has been consistently collected over time. If it’s focused on a specific activity, it’ll be easier to create a story about it. And finally, the data should be both interesting and accessible to you.

By selecting only one source of data, you have reduced the stress on time. You also need to acknowledge your limits in order to move forward. You can’t work miracles, but you can show enough impact to get others on board. Tie the data to your organizational goals. Analyze the data using the tools you already have (i.e. Excel), and then publicize the results of your work.

Why use Excel? It’s pretty universal, and there are free alternatives for spreadsheets if you need them. Three useful Excel tools: import & manipulate files of various formats (CSV files), consolidate similar information (total annual data from monthly worksheets), and conditional formatting (identify cost/use over thresholds).
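For anyone who prefers scripting to spreadsheets, here is a rough pandas equivalent of the two Excel moves described above (consolidating monthly data into an annual total and flagging cost per use over a threshold); the file names, column names, and the $5.00 threshold are invented for illustration.

```python
# Rough pandas equivalent of the spreadsheet workflow: total monthly use into
# an annual figure, then flag titles whose cost per use crosses a threshold.
# File names, column names, and the $5.00 threshold are invented.
import glob
import pandas as pd

# Each monthly CSV is assumed to have Title and Use columns.
monthly = pd.concat(
    (pd.read_csv(path) for path in glob.glob("usage_2018_*.csv")),
    ignore_index=True,
)
annual = monthly.groupby("Title", as_index=False)["Use"].sum()

# Costs file assumed to have Title and Cost columns.
costs = pd.read_csv("costs_2018.csv")
merged = annual.merge(costs, on="Title", how="left")

# Avoid dividing by zero: titles with no use get a blank cost per use.
merged["Cost_Per_Use"] = merged["Cost"] / merged["Use"].where(merged["Use"] > 0)

# The conditional-formatting step: flag anything over the threshold.
merged["Flag"] = merged["Cost_Per_Use"] > 5.00
print(merged.sort_values("Cost_Per_Use", ascending=False).head())
```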

The spreadsheets are for you, not the stakeholders. Stop relying on them to communicate your data. The trouble with spreadsheets is that although they contain a lot of data, it’s challenging for those unfamiliar with the sources to understand the meaning of the data. Sending a summary/story will get your message across faster and more clearly.

Data has context, settings, complexities, and conflicts. One of the best ways of communicating it is through a story. Give stakeholders the context to hang the numbers on and a way to remember why they are important. Write what you know, focus on the important things, and keep it brief and meaningful. Here is an example: Data Stories: A dirty job.

Data stories are everywhere. It’s not strictly for usage or financial data. If you have a specific question you want answered through data, it makes it easier to compose the story.

Convince yourself to act; your actions will persuade others.

presenter: Katy Silberger

She will be showing three scenarios for observing user behavior through statistics: looking at the past with vendor supplied statistics, assessing current user behavior with Google Analytics, and anticipating user behavior with Google Analytics.

They started looking at usage patterns before and after implementing federated searching. It was hard to answer the question of how federated searching changed user behavior. They used vendor usage reports and website visits to calculate the number of articles retrieved per website visit and articles retrieved per search. They found that the federated search tool generated an increase in articles retrieved per visit. The ratios take into account the fluctuation in user populations.
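To make the arithmetic concrete, here is a tiny sketch of those ratios with invented numbers; the point is only that dividing article retrievals by visits and by searches normalizes for swings in the user population.

```python
# Tiny sketch of the ratios described above, with invented numbers: full-text
# retrievals from vendor reports divided by website visits and by searches,
# before and after federated search went live.
def ratios(articles, visits, searches):
    return articles / visits, articles / searches

before = ratios(articles=12000, visits=40000, searches=30000)  # hypothetical
after = ratios(articles=18000, visits=41000, searches=52000)   # hypothetical
print(f"articles per visit:  {before[0]:.2f} -> {after[0]:.2f}")
print(f"articles per search: {before[1]:.2f} -> {after[1]:.2f}")
```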

Google Analytics could be used to identify use from students abroad. It’s also helpful for identifying trends in mobile web access.

NASIG 2010: What Counts? Assessing the Value of Non-Text Resources

Presenters: Stephanie Krueger, ARTstor and Tammy S. Sugarman, Georgia State University

Anyone who does anything with use statistics or assessment knows why use statistics are important and the value of standards like COUNTER. But, how do we count the use of non-text content that doesn’t fit in the categories of download, search, session, etc.? What does it mean to “use” these resources?

Of the libraries surveyed that collect use stats for non-text resources, they mainly use them to report to administrators and determine renewals. A few use it to evaluate the success of training or promote the resource to the user community. More than a third of the respondents indicated that the stats they have do not adequately meet the needs they have for the data.

ARTstor approached COUNTER and asked that the technical advisory group include representatives from vendors that provide non-text content such as images, video, etc. Currently, the COUNTER reports are either about Journals or Databases, and do not consider primary source materials. One might think that “search” and “sessions” would be easy to track, but there are complexities that are not apparent.

Consider the Database 1 report. With a primary source aggregator like ARTstor, who is the "publisher" of the content? For ARTstor, search is only 27% of the use of the resource. 47% comes from image requests (including thumbnail, full-size, printing, download, etc.) and the rest is from software utilities within the resource (creation of course folders, password creation, organizing folders, annotations of images, emailing content/URLs, sending information to bibliographic management tools, etc.).

The missing metric is the non-text full content unit request (i.e. view, download, print, email, stream, etc.). There needs to be some way of measuring this that is equivalent to the full-text download of a journal article. Otherwise, cost per use analysis is skewed.
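A quick back-of-the-envelope example of that skew, using the percentages above and an invented subscription cost, looks like this:

```python
# Back-of-the-envelope example of the skew, using the percentages above.
# The $10,000 cost and 50,000 total transactions are invented.
cost = 10_000
total_transactions = 50_000
searches = 0.27 * total_transactions          # searches only
content_requests = 0.47 * total_transactions  # image/content requests

print(f"cost per search only:          {cost / searches:.2f}")
print(f"cost per content request only: {cost / content_requests:.2f}")
print(f"cost per counted transaction:  {cost / (searches + content_requests):.2f}")
```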

What is the equivalent of the ISSN? Non-text resources don’t even have DOIs assigned to them.

On top of all of that, how do you measure the use of these resources beyond the measurable environment? For example, once an image is downloaded, it can be included in slides and webpages for classroom use more than once, but those uses are not counted. ARTstor doesn’t use DRM, so they can’t track that way.

No one is really talking about how to assess this kind of usage, at least not in the professional library literature. However, the IT community is thinking about this as well, so we may be able to find some ideas/solutions there. They are being asked to justify software usage, and they have the same lack of data and limitations. So, instead of going with the traditional journal/database counting methods, they are attempting to measure the value of the services provided by the software. The IT folk identify services, determine the cost of those services, and identify benchmarks for those costs.

A potential report could have the following columns: collection (i.e. an art collection within ARTstor, or a university collection developed locally), content provider, platform, and then the use numbers. This is basic, and can increase in granularity over time.

There are still challenges, even with this report. Time-based objects need to have a defined value of use. Resources like data sets and software-like things are hard to define as well (i.e. SciFinder Scholar). And, it will be difficult to define a report that is one size fits all.

NASIG 2010: Integrating Usage Statistics into Collection Development Decisions

Presenters: Dani Roach, University of St. Thomas and Linda Hulbert, University of St. Thomas

As with most libraries, they are faced with needing to downsize their purchases in order to fit within reduced budgets, so good tools must be employed to determine which stuff to remove or acquire.

Impact factor statistics mean little to librarians, since the "best" journals may not be appropriate for the programs the library supports. Quantitative data like cost per use, historical trends, and ILL data are more useful for libraries. Combine these with reviews, availability, features, user feedback, and the dust layer on the materials, and then you have some useful information for making decisions.

Usage statistics are just one component that we can use to analyze the value of resources. There are other variables than cost and other methods than cost per use, but these are what we most often apply.

Other variables can include funds/subjects, format, and identifiers like ISSN. Cost needs to be defined locally, as libraries manage them differently for annual subscriptions, multiple payments/funds, one-time archive fees, hosting fees, and single title databases or ebooks. Use is also tricky. A PDF download in a JR1 report is different from a session count in a DB1 report is different from a reshelve count for a bound journal. Local consistency with documentation is best practice for sorting this out.
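One low-tech way to get that local consistency is to write the definitions down in one place. Here is a sketch of such a mapping in Python; the metric choices simply echo the examples in the paragraph above and are illustrative, not recommendations.

```python
# Sketch of "local consistency with documentation": record in one place which
# metric the library treats as "use" for each resource type. The choices below
# just echo the examples in the paragraph above; they are illustrative only.
USE_DEFINITIONS = {
    "ejournal":      "full-text/PDF downloads (JR1)",
    "database":      "sessions (DB1)",
    "print journal": "reshelve count for bound volumes and new issues",
}

for resource_type, definition in USE_DEFINITIONS.items():
    print(f"{resource_type:14s} -> {definition}")
```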

Library-wide SharePoint service allows them to drop documents with subscription and analysis information into one location for liaisons to use. [We have a shared network folder that I do some of this with — I wonder if SharePoint would be better at managing all of the files?]

For print statistics, they track bound volume use and new issue use separately, scanning barcodes into their ILS to keep a count. [I’m impressed that they have enough print journal use to do that rather than hash marks on a sheet of paper. We had 350 reshelved last year, including ILL use, if I remember correctly.]

Once they have the data, they use what they call a “fairness factor” formula to normalize the various subject areas to determine if materials budgets are fairly allocated across all disciplines and programs. Applying this sort of thing now would likely shock budgets, so they decided to apply new money using the fairness factor, and gradually underfunded areas are being brought into balance without penalizing overfunded areas.

They have stopped trying to achieve a balance between books and periodicals. They’ve left that up to the liaisons to determine what is best for their disciplines and programs.

They don’t hide their cancellation list, and if any of the user community wants to keep something, they’ve been willing to retain it. However, they get few requests to retain content, and they think it is in part because the user community can see the cost, use, and other factors that indicate the value of the resource for the local community.

They have determined that it costs them around $52 a title to manage a print subscription, and over $200 a title to manage an online subscription, mainly because of the level of expertise involved. So, there really are no “free” subscriptions, and if you want to get into the cost of binding/reshelving, you need to factor in the managerial costs of electronic titles, as well.

Future trends and issues: more granularity, more integration of print and online usage, interoperability and migration options for data and systems, continued standards development, and continued development of tools and systems.

Anything worth doing is worth overdoing. You can gather Ulrich’s reports, Eigen factors, relative price indexes, and so much more, but at some point, you have to decide if the return is worth the investment of time and resources.

ER&L 2010: Comparison Complexities – the challenges of automating cost-per-use data management

Speakers: Jesse Koennecke & Bill Kara

We have the use reports, but it’s harder to pull in the acquisitions information because of the systems it lives in and the different subscription/purchase models. Cornell had a cut in staffing and an immediate need to assess their resources, so they began to triage statistics cost/use requests. They are not doing systematic or comprehensive reviews of all usage and cost per use.

In the past, they have tried doing manual preparation of reports (merging files, adding data), but that’s time-consuming. They’ve had to set up processes to consistently record data from year to year. Some vendor solutions have been partially successful, and they are looking to emerging options as well. Non-publisher data such as link resolver use data and proxy logs might be sufficient for some resources, or for adding a layer to the COUNTER information to possibly explain some use. All of this has required certain skill sets (databases, spreadsheets, etc.).

Currently, they are working on managing expectations. They need to define the product that their users (selectors, administrators) can expect on a regular basis, what they can handle on request, and what might need a cost/benefit decision. In order to get accurate time estimates for the work, they looked at 17 of their larger publisher-based accounts (not aggregated collections) to get an idea of patterns and unique issues. As an unfortunate side effect, every time they look at something, they get an idea of even more projects they will need to do.

The matrix they use includes: paid titles v. total titles, differences among publishers/accounts, license period, cancellations/swaps allowed, frontfile/backfile, payment data location (package, title, membership), and use data location and standard. Some of the challenges with usage data include non-COUNTER compliance or no data at all, multiple platforms for the same title, combined subscriptions and/or title changes, titles transferred between publishers, and subscribed content v. purchased content. Cost data depends on the nature of the account and the nature of the package.

For packages, you can divide the single line item by the total use, but that doesn’t help the selectors assess the individual subset of titles relevant to their areas/budgets. This gets more complicated when you have packages and individual titles from a single publisher.
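Here is a small sketch of that package arithmetic with invented numbers: the single invoice line divided by total use gives a package-level cost per use, and a selector's subset can at best be described by its share of that use.

```python
# Sketch of the package arithmetic described above, with invented numbers:
# the single invoice line divided by total use gives a package-level cost per
# use, and a selector's subset is described by its share of that use.
package_cost = 25_000.00
title_use = {
    "Journal of A": 4_000,
    "Journal of B": 1_000,
    "Journal of C": 250,
    "Journal of D": 4_750,
}

total_use = sum(title_use.values())
package_cpu = package_cost / total_use
print(f"package-level cost per use: ${package_cpu:.2f}")

# A selector's subset (e.g. Journals A and B) as a share of package use.
subset = ["Journal of A", "Journal of B"]
subset_use = sum(title_use[t] for t in subset)
share = subset_use / total_use
print(f"subset: {subset_use} uses ({share:.0%} of package), "
      f"implied cost ${package_cost * share:,.2f}")
```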

Future possibilities: better automated matching of cost and use data, with some useful data elements such as multiple cost or price points, and formulas for various subscription models. They would also like to consolidate accounts within a single publisher to reduce confusion. Also, they need more documentation so that it’s not just in the minds of long-term staff. 

ER&L 2010: Beyond Log-ons and Downloads – meaningful measures of e-resource use

Speaker: Rachel A. Flemming-May

What is “use”? Is it an event? Something that can be measured (with numbers)? Why does it matter?

We spend a lot of money on these resources, and use is frequently treated as an objective for evaluating the value of the resource. But, we don’t really understand what use is.

A primitive concept is something that can’t be boiled down to anything smaller – we just know what it is. Use is frequently treated like a primitive concept – we know it when we see it. To measure use we focus on inputs and outputs, but what do those really say about the nature/value of the library?

This gets more complicated with electronic resources that can be accessed remotely. Patrons often don’t understand that they are using library resources when they use them. “I don’t use the library anymore, I get most of what I need from JSTOR.” D’oh.

Funds are based on assessments and outcomes – how do we show that? The money we spend on electronic resources is not going to get any smaller. ROI is focused more on funded research, but not electronic resources as a whole.

Use is not a primitive concept. When we talk about use, it can be an abstract concept that covers all use of library resources (physical and virtual). Our research often doesn’t specify what we are measuring as use.

Use as a process is the total experience of using the library, from asking reference questions to finding a quiet place to work to accessing resources from home. It is the application of library resources/materials to complete a complex/multi-stage process. We can do observational studies of the physical space, but it’s hard to do them for virtual resources.

Most of our research tends to focus on use as a transaction – things that can be recorded and quantified, but are removed from the user. When we look only at the transaction data, we don’t know anything about why the user viewed/downloaded/searched the resource. Because they are easy to quantify, we over-rely on vendor-supplied usage statistics. We think that COUNTER assures some consistency in measures, but there are still many grey areas (i.e. database time-outs equal more sessions).

We need to shift from focusing on isolated instances of downloads and ref desk questions to focusing on the aggregate of the process from the user perspective. Stats are only one component of this. This is where public services and technical services need to work together to gain a better understanding of the whole. This will require administrative support.

John Law’s study of undergraduate use of resources is a good example of how we need to approach this. Flemming-May thinks that the findings from that study have generated more progress than previous studies that were focused on more specific aspects of use.

How do we do all of this without intruding on the privacy of the user? Make sure that your studies are well thought out and pass approval from your institution’s review board.

Transactional data needs to be combined with other information to make it valuable. We can see that a resource is being used or not used, but we need to look deeper to see why and what that means.

As a profession, are we prepared to do the kind of analysis we need to do? Some places are using anthropologists for this. A few LIS programs are requiring a research methods course, but it’s only one class and many don’t get it. This is a great continuing education opportunity for LIS programs.

ER&L 2010: We’ve Got Data – Now What Do We Do With It? Applying Standards to Assess Information Resources

Speakers: Mary Feeney, Ping Situ, and Jim Martin

They had a budget cut (surprise, surprise), so they had to assess what to cut using the data they had. Complicating this was a change in organizational structure. In addition, they adopted the BYU project management model, and they had to sort out a common approach to assessment across all of the disciplines/resources.

They used their ILLs to gather stats about print resource use. They hired Scholarly Stats to gather their online resource stats, and for publishers/vendors not in Scholarly Stats, they gathered data directly from the vendors/publishers. Their process involved creating spreadsheets of resources by type and then dividing up the work of filling in the info. Potential cancellations were then provided to interested parties for feedback.

Quality standards:

  • 60% of monographs need to show at least one use in the last four years – this was used to apply cuts to the firm orders book budget, which reduces the flexibility for making one-time purchases with remaining funds; the book money was shifted to serial/subscription lines
  • 95% of individual journal titles need to show use in the last three years (both in-house and full-text downloads) – LJUR data was used to add to the data collected about print titles
  • dual format subscriptions required a hybrid approach, and they compared the costs with the online-only model – one might think that switching to online only would be a no-brainer, but licensing issues complicate the matter
  • cost per use of ejournal packages will not exceed twice the cost of ILL articles (a rough sketch of applying these thresholds follows this list)
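Here is that sketch; the data structures, the use counts, and the $17.50 ILL article cost are all invented for illustration.

```python
# Rough sketch of applying the thresholds above to a title list. Thresholds
# mirror the standards described; the data and the ILL cost are invented.
ILL_ARTICLE_COST = 17.50   # hypothetical per-article ILL cost

monographs = [
    {"title": "Book A", "uses_last_4_years": 3},
    {"title": "Book B", "uses_last_4_years": 0},
    {"title": "Book C", "uses_last_4_years": 1},
]
used = sum(1 for m in monographs if m["uses_last_4_years"] >= 1)
print("monograph standard met:", used / len(monographs) >= 0.60)

journals = [
    {"title": "Journal A", "uses_last_3_years": 42},
    {"title": "Journal B", "uses_last_3_years": 0},
]
used = sum(1 for j in journals if j["uses_last_3_years"] >= 1)
print("journal standard met:", used / len(journals) >= 0.95)

packages = [
    {"package": "Package X", "cost": 50_000, "uses": 2_000},
]
for p in packages:
    cpu = p["cost"] / p["uses"]
    print(p["package"], "cost/use within 2x ILL:", cpu <= 2 * ILL_ARTICLE_COST)
```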

One problem with their approach was that existing procedures did not capture data about all print journals. They also need to include local document delivery requests in future analysis. They need to better integrate the assessment of the use of materials in aggregator databases, particularly since users are inherently lazy and will go the easiest route to the content.

Aggregator databases are difficult to compare, and often the ISSN lists are incomplete. And, it’s difficult to compare based on title by title holdings coverage. It’s useful for long-term use comparison, but not this immediate project. Other problems with aggregator databases include duplication, embargos, and completeness of coverage of a title. They used SerSol’s overlap analysis tool to get an idea of duplication. It’s a time-consuming project, so they don’t plan to continue with it for all of their resources.

What if you don’t have any data or the data you have doesn’t have a quality standard? They relied on subject specialists and other members of the campus to assess the value of those resources.
