ER&L 2016: Access Denied!

“outliers” by Robert S. Donovan

Speakers: Julie Linden, Angela Sidman, and Sarah Tudesco, Yale University

Vendors often use the data from COUNTER turnaway reports as marketing tools to convince a library to purchase new content.

How are users being turned away from the content? How are they finding it in the first place? The sources are generally Google/Google Scholar, PubMed, and publisher platforms that don’t allow for limiting to your content only.

Look for patterns in the turnaway data. Does it match the patterns in your use data and the academic year? Corroborate with examples from access issue reports. This can lead to a purchase decision. Or not.

Look for outliers in the turnaway data. What could have caused them? Platform changes, site outages (particularly for content you do license but that appears on the turnaway report), reported security breaches, etc. You can ask the vendor for more granular data, such as turnaways by day or week, as well as by IP address. You can also ask the vendor for contextual information such as platform changes/issues, and more pointedly, whether they think the turnaways are coming from real users.
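One rough way to spot outliers before going back to the vendor is a median-based check on the monthly counts. This is only a sketch: the monthly figures are invented, and the modified z-score with a 3.5 cutoff is my assumption, not something the speakers described.

```python
from statistics import median

def flag_outliers(monthly_turnaways, threshold=3.5):
    """Return (month, count) pairs whose modified z-score exceeds threshold.

    Uses the median and the median absolute deviation (MAD), which are
    less distorted by a single huge spike than the mean and standard deviation.
    """
    counts = list(monthly_turnaways.values())
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        # At least half of the months are identical, so the score is undefined.
        return []
    return [
        (month, count)
        for month, count in monthly_turnaways.items()
        if 0.6745 * abs(count - med) / mad > threshold
    ]

# Hypothetical monthly turnaway counts for one platform.
turnaways = {
    "2015-01": 40, "2015-02": 55, "2015-03": 62, "2015-04": 58,
    "2015-05": 35, "2015-06": 20, "2015-07": 18, "2015-08": 30,
    "2015-09": 60, "2015-10": 900, "2015-11": 57, "2015-12": 33,
}

print(flag_outliers(turnaways))
```

A flagged month is a prompt to ask questions (outage? platform change? bot traffic?), not proof of anything on its own.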

Combine the data from the turnaway reports with ILL requests. Do they match up? This might mean that those titles are really in demand. However, bear in mind that many users will just give up and look for something else that’s just as good but available right now.
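A minimal sketch of that comparison, assuming both the turnaway report and the ILL request log can be reduced to counts keyed by ISSN. The ISSNs and counts below are made up, and `normalize_issn` and `in_both` are hypothetical helper names.

```python
def normalize_issn(issn):
    """Strip the hyphen and whitespace so '1234-567x' and '1234567X' match."""
    return issn.replace("-", "").strip().upper()

def in_both(turnaways, ill_requests):
    """Return (issn, turnaway_count, ill_count) for titles on both lists,
    sorted by ILL requests, highest first."""
    ill = {normalize_issn(k): v for k, v in ill_requests.items()}
    matches = []
    for issn, denied in turnaways.items():
        key = normalize_issn(issn)
        if key in ill:
            matches.append((key, denied, ill[key]))
    return sorted(matches, key=lambda row: row[2], reverse=True)

# Invented example data: {ISSN: count}
turnaways = {"1111-1111": 240, "2222-222X": 35, "3333-3333": 410}
ill_requests = {"11111111": 3, "2222-222x": 12, "4444-4444": 7}

for issn, denied, requested in in_both(turnaways, ill_requests):
    print(issn, denied, requested)
```

Titles with many turnaways and no ILL requests (like the third one here) aren't necessarily unwanted, since many users simply give up rather than place a request.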

Analysis checklist:

IF you see a steady pattern:

  • Check holdings for access to the content
  • Consider the access model (MU/SU)

IF you see outliers:

  • Consider outside events

ASK the vendor for more information

  • Can you provide more granular data?
  • Can you provide contextual information?
  • Do you think this represents real users?

Audience Q&A:

Journal turnaways can include archival years that aren’t included in a current subscription.

One very aggressive vendor used the library’s purchase request form to flood them with requests from users that don’t exist.

How are the outliers documented? Hard to do. Vendors certainly hang on to them, even when they acknowledge they know this isn’t legit.

ER&L 2016: COUNTER Point: Making the Most of Imperfect Data Through Statistical Modeling

“score card” by AIBakker

Speakers: Jeannie Castro and Lindsay Cronk, University of Houston

Baseball statistics are a good place to start. There are over 100 years of data. Cronk was wishing that she could figure the WAR (wins above replacement) for eresources. What makes a good/strong resource? What indicators besides usage performance should we evaluate? Can statistical analysis tell us anything?

Castro suggested looking at the data as a time series. Cronk is not a statistician, so she relied on a lot of other folks who can do that stuff.

Statistical modeling is the application of a set of assumptions to data, typically paired data. There are several techniques that can be used. COUNTER reports are imperfect time series data sets. They don’t give us individual data points (day/time); they are clumped together by month. Aside from this, they are good for time series analysis: the data points are equally spaced in time and consistently measured.

Decomposition provides a framework for segmented time series. Old data can be checked against newer data (e.g., 2010-2013 compared to 2014) without having to predict the future. Statistical testing is important in this. Exponential smoothing eliminates noise and outliers, and is very useful for anomalies in your COUNTER data due to access issues or unusual spikes.
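To make the smoothing idea concrete, here is a minimal sketch of simple exponential smoothing in plain Python. The speakers did their analysis in R, so this is not their code; the monthly counts and the choice of `alpha` are assumptions for illustration only.

```python
def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing.

    Each smoothed value is a weighted average of the current observation
    and the previous smoothed value. A smaller alpha smooths more heavily.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # seed with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical monthly full-text requests with one unusual spike.
monthly_use = [100, 100, 400, 100]
print(exponential_smoothing(monthly_use, alpha=0.5))
# → [100, 100.0, 250.0, 175.0]
```

In this toy series the spike of 400 is pulled down to 250 and fades over the following months, so strictly speaking it is dampened rather than removed.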

Cronk really wanted to look at something other than cost/use, which was part of the motivation to do this. Usage by collection portion size is another method touted by Michael Levine-Clark. She needed 4+ years of usage history for reverse predictive analysis. Larger numbers make analysis easier, so she went with large aggregator databases for DB and some large journal packages for JR.

She used Excel for data collection and clean-up, R (studio) for data analysis, and Tableau (public) for data visualization. R studio is a lot more user-friendly than the desktop. There are canned analysis packages that will do the heavy lifting. (There was a recommendation for Ryan Womack’s video series for learning how to use R.) Tableau helped with visualization of the data, including some predictive indicators. We cannot see trends ourselves, so these visualizations can help us make decisions. Usage can be predicted based on the past, she found.

They found that usage over time is consistent across the vendor platforms (for journal usage), even though some were used more than others.

The next level she looked at was the search to session ratio for databases. What is the average? Is that meaningful? When we look at usage, what is the baseline that would help us determine if this database is more useful than another? Downward trends might be indicators of outside factors.
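As a rough illustration of the ratio, assuming the search and session totals come from a database report: the database names and numbers are invented, and using the median as the baseline is my assumption, not the presenters’ method.

```python
from statistics import median

def searches_per_session(searches, sessions):
    """Return searches divided by sessions, or None if there were no sessions."""
    if sessions == 0:
        return None
    return searches / sessions

# Hypothetical annual totals: {database: (searches, sessions)}
usage = {
    "Database A": (12000, 4000),
    "Database B": (3000, 2500),
    "Database C": (900, 300),
}

ratios = {name: searches_per_session(s, n) for name, (s, n) in usage.items()}
baseline = median(r for r in ratios.values() if r is not None)

for name, ratio in ratios.items():
    print(name, ratio, "below baseline" if ratio < baseline else "at or above baseline")
```

Whether a ratio below the baseline says anything about usefulness is exactly the open question from the session.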

community site for usage statistics

Usus is an independent community website developed to help librarians, library consortium administrators, publishers, aggregators, etc. communicate around topics related to usage statistics. From problem-solving to workflow tips to calling out bad actors, this site hopes to be the hub of all things usage.

Do you have news to share or a problem you can’t figure out? Do you have really cool workflows you want to share? Drop us a note!

ER&L 2014 — Beyond COUNTER: The changing definition of “usage” in an open access economy

Speakers: Kathy Perry (VIVA), Melissa Blaney (American Chemical Society), and Nan Butkovitch (Pennsylvania State University)

In 1998, ICOLC created guidelines for delivering usage information, and they have endorsed COUNTER and SUSHI. COUNTER works because all the players are involved and agree to reasonable timeframes.

COUNTER Code of Practice 4 now recognizes media and tracking of use through mobile devices.

PIRUS (Publisher and Institutional Repository Usage Statistics) is the next step, but they are going to drop the term and incorporate it as an optional report in COUNTER (Article Report 1). There is a code of practice and guidelines on the website.

The Usage Factor metric is intended as a tool for assessing journals that aren’t covered by impact factor. It won’t be comparable across subject groups because they are measuring different things.

If your publishers are not COUNTER compliant, ask them to do it.

ACS chose to go to COUNTER 4 in part because it covers all formats. They like being able to highlight usage of gold open access titles and denials due to lack of license. They also appreciated the requirement for the ability to provide JR5, which reports usage by year of publication.

Big increases in search can also mean that people aren’t finding what they want.

ACS notes that users are increasingly coming from Google, Mendeley, and other indexing sources, rather than the publisher’s site itself.

They hear a lot that users want platforms that allow sharing and collaborating across disciplines and institutions. Authors are wanting to measure the impact of their work in traditional and new ways.

Science librarian suggests using citation reports to expand upon the assessment of usage reports. If you have time for that sort of thing and only care about journals that are covered by ISI.

Chemistry authors have been resistant to open access publishing, particularly if they think they can make money off of a patent, etc. She thinks it will be useful to have OA article usage information, but it needs to be put in the context of how many OA articles are available.

What you want to measure in usage can determine your sources. Every measurement method has bias. Multiple usage measurements can have duplication. A new metric is just around the corner.

ISBNs, COUNTER reports, and Excel

One of my pet peeves with Excel is how it will reformat (or try to reformat) data in cells based on what it thinks it should be. For example, if you save a file as csv with a date in the format of mmm-yy, the next time you open it the dates will become d-mmm, where the year was transposed to the day of the month. Drives me crazy, because the only way I’ve found to prevent it is to make sure that the dates are mmm-yyyy before I save the file again, which means a lot of repetitive editing when I’m normalizing a large number of reports.
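If you’d rather script the repetitive part, something like this could expand the two-digit years outside of Excel. It’s a sketch, not a tested workflow: the header row is made up, `expand_year` is a name I invented, and it assumes English month abbreviations and that Python’s two-digit-year rule (00-68 means 2000-2068) is what you want.

```python
from datetime import datetime

def expand_year(cell):
    """Turn 'Jan-15' into 'Jan-2015'; leave anything else untouched."""
    try:
        parsed = datetime.strptime(cell.strip(), "%b-%y")
    except ValueError:
        # Not a mmm-yy value (a title, an ISSN, an already expanded date).
        return cell
    return parsed.strftime("%b-%Y")

# Hypothetical header row from a report.
header = ["Title", "ISSN", "Jan-15", "Feb-15", "Mar-15"]
fixed = [expand_year(cell) for cell in header]
print(fixed)
```

Cells that are already in mmm-yyyy form fail the two-digit parse and pass through unchanged, so running it twice on the same file does no harm.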

I haven’t been working much with COUNTER book reports for assessment of ebook use until recently, when it seems we’ve started to tip towards increasing ebook use (and purchasing) at my library, so now I’m looking at them and ingesting them into my use assessment tool.

ISBN is too long

I tended to avoid the COUNTER book reports before because if I needed to edit the file, it was a hassle to get it to open in Excel without converting the ISBNs to display (and subsequently save) as 978E+12 if they didn’t contain dashes or something else to indicate to Excel that it should be treated as text and not a long integer. (Don’t get me started on the publishers who remove the dash in the ISSN, screwing up all the numbers that begin with 0 when opened in Excel.)

One way to deal with this is to select the column and choose Text to Columns in the Data tab. Click on through the menu until you get to the end, where you can select the format as text. Miraculously, the full numbers display in the column (regardless of column size) and won’t save as 978E+12, as they would if you hadn’t done that.

Alternatively, I’ve started opening the files in a text editor first, then finding and replacing all “978” with “978-”. This forces Excel to automatically treat the data as text instead of a long integer, and doesn’t need to be corrected every subsequent time the file is opened and edited.
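The same find-and-replace could be scripted so that only cells that are exactly a 13-digit number starting with 978 get the hyphen, rather than every “978” in the file. This is a sketch with an invented sample row; `protect_isbn` is a hypothetical helper, and a real report would be read from and written to files instead of strings.

```python
import csv
import io
import re

# A cell that is exactly 978 followed by ten more digits.
ISBN13 = re.compile(r"^(978)(\d{10})$")

def protect_isbn(cell):
    """Insert a hyphen after the 978 prefix; leave other cells untouched."""
    match = ISBN13.match(cell.strip())
    return f"{match.group(1)}-{match.group(2)}" if match else cell

# Invented sample of a report with an unhyphenated ISBN.
raw = "Title,ISBN,Total\nExample Book,9781234567897,12\n"

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for row in csv.reader(io.StringIO(raw)):
    writer.writerow([protect_isbn(cell) for cell in row])

result = out.getvalue()
print(result)
```

Python’s csv module reads every cell as a string, so nothing gets converted to a number along the way.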

How do you handle this?