NASIG 2010: Integrating Usage Statistics into Collection Development Decisions

Presenters: Dani Roach, University of St. Thomas and Linda Hulbert, University of St. Thomas

As with most libraries, they are faced with downsizing their purchases to fit within reduced budgets, so good tools must be employed to determine which resources to cancel or acquire.

Impact factor statistics mean little to librarians, since the “best” journals may not be appropriate for the programs the library supports. Quantitative data like cost per use, historical trends, and ILL data are more useful for libraries. Combine these with reviews, availability, features, user feedback, and the dust layer on the materials, and then you have some useful information for making decisions.

Usage statistics are just one component we can use to analyze the value of resources. There are variables other than cost and methods other than cost per use, but these are the ones we most often apply.

Other variables can include funds/subjects, format, and identifiers like ISSN. Cost needs to be defined locally, since libraries handle it differently for annual subscriptions, multiple payments/funds, one-time archive fees, hosting fees, and single-title databases or ebooks. Use is also tricky: a PDF download in a JR1 report is not the same as a session count in a DB1 report, which is not the same as a reshelving count for a bound journal. Local consistency, with documentation, is the best practice for sorting this out.
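
To make the cost-per-use idea concrete, here is a minimal sketch in Python. It assumes one simple local convention (annualized cost divided by COUNTER JR1 full-text downloads); the titles, costs, and counts are invented for illustration.

```python
# Minimal cost-per-use sketch: annualized cost / COUNTER JR1 full-text downloads.
# All titles, costs, and counts below are invented for illustration.

subscriptions = [
    {"title": "Journal A", "issn": "1234-5678", "annual_cost": 1250.00, "jr1_downloads": 412},
    {"title": "Journal B", "issn": "2345-6789", "annual_cost": 3800.00, "jr1_downloads": 95},
    {"title": "Database C", "issn": None, "annual_cost": 9200.00, "jr1_downloads": 0},  # DB1 sessions tracked separately
]

for sub in subscriptions:
    uses = sub["jr1_downloads"]
    cpu = f"${sub['annual_cost'] / uses:.2f}" if uses else "n/a (no JR1 data)"
    print(f"{sub['title']}: cost per use = {cpu}")
```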

A library-wide SharePoint service allows them to drop documents with subscription and analysis information into one location for liaisons to use. [We have a shared network folder that I do some of this with — I wonder if SharePoint would be better at managing all of the files?]

For print statistics, they track bound volume use separately from new issue use, scanning barcodes into their ILS to keep a count. [I’m impressed that they have enough print journal use to do that rather than hash marks on a sheet of paper. We had 350 items reshelved last year, including ILL use, if I remember correctly.]

Once they have the data, they use what they call a “fairness factor” formula to normalize the various subject areas and determine whether materials budgets are fairly allocated across all disciplines and programs. Applying the formula to existing allocations all at once would shock budgets, so they decided to apply it only to new money; gradually, underfunded areas are being brought into balance without penalizing overfunded areas.
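
The presenters did not share the actual formula, so the sketch below is only my guess at the general shape of a “fairness factor” calculation: compare each subject’s share of the materials budget to its share of some locally defined measure of need, and steer new money toward the most underfunded areas. The subjects, weights, and dollar amounts are invented.

```python
# Hypothetical "fairness factor" sketch -- the presenters' actual formula was not
# shared. Idea: compare each subject's current allocation to its share of a
# locally defined need measure (use, FTE, program size), then distribute NEW
# money in proportion to the gaps instead of clawing back existing funds.

subjects = {
    # subject: (current_allocation, need_share) -- invented numbers
    "Biology":    (40000, 0.30),
    "History":    (15000, 0.20),
    "Nursing":    (25000, 0.35),
    "Philosophy": (10000, 0.15),
}

new_money = 10000
total_allocated = sum(alloc for alloc, _ in subjects.values())

# Gap = what the subject "should" have (its need share of the total) minus what it has.
gaps = {s: max(need * total_allocated - alloc, 0) for s, (alloc, need) in subjects.items()}
total_gap = sum(gaps.values()) or 1  # avoid division by zero if nothing is underfunded

for subject, gap in gaps.items():
    print(f"{subject}: add ${new_money * gap / total_gap:,.2f} of the new money")
```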

They have stopped trying to achieve a balance between books and periodicals. They’ve left that up to the liaisons to determine what is best for their disciplines and programs.

They don’t hide their cancellation list, and if any of the user community wants to keep something, they’ve been willing to retain it. However, they get few requests to retain content, and they think it is in part because the user community can see the cost, use, and other factors that indicate the value of the resource for the local community.

They have determined that it costs them around $52 a title to manage a print subscription, and over $200 a title to manage an online subscription, mainly because of the level of expertise involved. So, there really are no “free” subscriptions, and if you want to get into the cost of binding/reshelving, you need to factor in the managerial costs of electronic titles, as well.

Future trends and issues: more granularity, more integration of print and online usage, interoperability and migration options for data and systems, continued standards development, and continued development of tools and systems.

Anything worth doing is worth overdoing. You can gather Ulrich’s reports, Eigen factors, relative price indexes, and so much more, but at some point, you have to decide if the return is worth the investment of time and resources.

NASIG 2010: It’s Time to Join Forces: New Approaches and Models that Support Sustainable Scholarship

Presenters: David Fritsch, JSTOR and Rachel Lee, University of California Press

JSTOR has started working with several university presses and other small scholarly publishers to develop sustainable options.

UC Press is one of the largest university presses in the US (36 journals in the humanities, biological, and social sciences), publishing both UC titles and society titles. Their prices range from $97 to $422 for annual subscriptions, and they are SHERPA Green. One of the challenges they face on their own platform is keeping up with libraries’ expectations.

ITHAKA is a merger of JSTOR, ITHAKA, Portico, and Aluka, so JSTOR is now a service rather than a separate company. Most everyone here knows what the JSTOR product/service is, and that hasn’t changed much with the merger.

Scholars’ use of information is moving online, and if it’s not online, they’ll use a different resource, even if it’s not as good. And, if things aren’t discoverable by Google, they are often overlooked. More complex content is emerging, including multimedia and user-generated content. Mergers and acquisitions in publishing are consolidating content under a few umbrellas, and this threatens smaller publishers and university presses that can’t keep up with the costs on a smaller scale.

The serials crisis has impacted smaller presses more than larger ones. Despite good relationships with societies, it is difficult to retain popular society publications when larger publishers can offer them more. It’s also harder to offer the deep discounts expected by libraries in consortial arrangements. University presses and small publishers are in danger of becoming the publisher of last resort.

UC Press and JSTOR have had a long relationship, with JSTOR providing long-term archiving that UC Press could not have afforded to maintain on their own. Not all of the titles are included (only 22), but they are the most popular. They’ve also participated in Portico. JSTOR is also partnering with 18 other publishers that are mission-driven rather than profit-driven, with experience at balancing the needs of academia and publishing.

By partnering with JSTOR for their new content, UC Press will be able to take advantage of the expanded digital platform, sales teams, customer service, and seamless access to both archive and current content. There are some risks, including the potential loss of identity, autonomy, and direct communication with libraries. And then there is the bureaucracy of working within a larger company.

The Current Scholarship Program seeks to provide a solution to the problems outlined above that university presses and small scholarly publishers are facing. The shared technology platform, Portico preservation, sustainable business model, and administrative services potentially free up these small publishers to focus on generating high-quality content and furthering their scholarly communication missions.

Libraries will be able to purchase current subscriptions either through their agents or directly from JSTOR (which will not charge a service fee). However, archive content will be purchased directly from JSTOR. JSTOR will handle all of the licensing, and current JSTOR subscribers will simply have a rider adding titles to their existing licenses. For libraries that purchase JSTOR collections through consortial arrangements, it will be possible to add title-by-title subscriptions without going through the consortium if a consortial agreement doesn’t make sense for purchase decisions. They will be offering both single-title purchases and collections, with the latter being more useful for large libraries, consortia, and those who want current content for titles in their JSTOR collections.

They still don’t know what they will do about post-cancellation access. Big red flag here for potential early adopters, but hopefully this will be sorted out before the program really kicks in.

Benefits for libraries: reasonable pricing, more efficient discovery, a single license, and meaningful COUNTER-compliant statistics for the full run of a title. Renewal subscriptions will maintain access to what they already have, and new subscriptions will come with access back to the first online year provided by the publisher, which may not be volume one but is certainly as comprehensive as what most publishers offer now.

UC Press plans to start transitioning in January 2011. New orders, claims, etc. will be handled by JSTOR (including print subscriptions), but UC Press will be setting their own prices. Their platform, Caliber, will remain open until June 30, 2011, but after that the content will be only on the JSTOR platform. UC Press expects to move to online-only in the next few years, particularly as the number of print subscriptions dwindles to the point where it is cost-prohibitive to produce the print issues.

There is some interest from the publishers to add monographic content as well, but JSTOR isn’t ready to do that yet. They will need to develop some significant infrastructure in order to handle the order processing of monographs.

Some in the audience were concerned about the cost of developing platform enhancements and other tools, mostly that these costs will be passed on in subscription prices. They will be, to a certain extent, in that the publishers contribute to the development costs and set the prices; but because it is a shared system, the costs will be spread out and will likely impact libraries no more than they do already.

One big challenge all will face is unlearning the mindset that JSTOR is only archive content and not current content.

NASIG 2010: Linked Data and Libraries

Presenter: Eric Miller, Zepheira, LLC

Nowadays, we understand what the web is and the impact it has had on information sharing, but before it was developed, it was in a “vague but exciting” stage and few understood it. When we got started with the web, we really didn’t know what we were doing, but more importantly, the web was being developed so that it was flexible enough for smarter and more creative people to do amazing things.

“What did your website look like when you were in the fourth grade?” Kids are growing up with the web and it’s hard for them to comprehend life without it. [Dang, I’m old.]

This talk will be about linked data, its legacy, and how libraries can lead linked data. We have a huge opportunity to weave libraries into the fabric of the web, and vice versa.

About five years ago, the BBC started making their content available in a service that allowed others to use and remix the delivery of the content in new ways. Rather than developing alternative platforms and creating new spaces, they focused on generating good content and letting someone else frame it. Other sources like NPR, the World Bank, and Data.gov are doing the same sorts of things. Within the library community, these things are happening as well. OCLC’s APIs are getting easier to use, and several national libraries are putting their OPACs on the web with APIs.

Obama’s open government initiative is another one of those “vague but exciting” things, and it charged agencies to come up with their own methods of making their content available via the web. Agencies are now struggling with the same issues and desires that libraries have been tackling for years. We need to recognize our potential role in moving this forward.

Linked data is a best practice for sharing and connecting data on the semantic web. Rather than leaving the data in its current formats, let’s put it together in ways it can be used on the wider web. It’s not the databases that make the web possible; it’s the web that makes the databases usable.

Human computation can be put to use in ways that assist computers to make information more usable. Captcha systems are great for blocking automated programs when needed, and by using human computation to decipher scanned text that computers cannot, reCAPTCHA has been able to turn unusable data into a fantastic digital repository of old documents.

LEGOs have been around for decades, and their simple design ensures that new blocks work with old blocks. Most kids end up dumping all of their sets into one bucket, so no matter where the individual building blocks come from, they can be put together and rebuilt in any way you can imagine. We could do this with our blocks of data, if they are designed well enough to fit together universally.

Our current applications, for the most part, are not designed to allow for the portability of data. We need to rethink application design so that the data becomes more portable. Web applications have, by necessity, had to have some amount of portability. Users are becoming more empowered to use the data provided to them in their own way, and if they don’t get that from your service/product, then they go elsewhere.

Digital preservation repositories are discussing ways to open up their data so that users can remix and mashup data to meet their needs. This requires new ways of archiving, cataloging, and supplying the content. Allow users to select the facets of the data that they are interested in. Provide options for visualizing the raw data in a systematic way.

Linked data platforms create identifiers for every aspect of the data they contain, and these are the primary keys that join data together. Other content that is created can be combined to enhance the data generated by agencies and libraries, but we don’t share the identifiers well enough to allow others to properly link their content.

Web architecture starts with web identifiers. We can use URLs to identify things other than just documents, but we need to be consistent, and we can’t change the URL structures if we want them to be persistent. A lack of trust in identifiers is slowing down linked data. Libraries have the opportunity to leverage our trust and data to provide control points and best practices for identifier curation.
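
As a small illustration of what identifiers-as-primary-keys looks like in practice, here is a sketch using the Python rdflib library. The example.org URIs are invented; the point is that stable, persistent identifiers for things (not just documents) are what let other people’s data link reliably to ours.

```python
# Tiny linked-data sketch using rdflib (pip install rdflib).
# The example.org URIs are invented; stable identifiers like these are what
# allow other datasets to link to (and enhance) our descriptions.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DC, FOAF

EX = Namespace("http://example.org/id/")  # hypothetical persistent-identifier namespace

g = Graph()
book = EX["work/origin-of-species"]       # identifies the work itself, not a web page
author = EX["person/charles-darwin"]

g.add((book, DC.title, Literal("On the Origin of Species")))
g.add((book, DC.creator, author))
g.add((author, FOAF.name, Literal("Charles Darwin")))

print(g.serialize(format="turtle"))
```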

A lot of work is happening in W3C. Libraries should be more involved in the conversation.

Enable human computation by providing the necessary identifiers back to data. Empower your users to use your data, and build a community around it. Don’t worry about creating the best system — wrap and expose your data using the web as a platform.

History Channel on Foursquare

The History Channel is on Foursquare, promoting their series America: The Story of Us by putting tips on particular historical locations that include history trivia about the location. It got me thinking about how libraries, or at least local collections, could do something similar with tidbits from their archives.

There have already been some examples of libraries inserting references to their digital collections on Wikipedia, and this is an opportunity to add some enhancement and self-promotion to a social media network that seems almost designed for this sort of thing. What kinds of information (historical or otherwise) can your library share with visitors/users through location-based networks like Foursquare and Gowalla?

ER&L 2010: Step Right Up! Planning, Pitfalls, and Performance of an E-Resources Fair

Speakers: Noelle Marie Egan & Nancy G. Eagan

This got started because they had some vendors come in to demonstrate their resources. Elsevier offered to do a demo for students, with food. The library saw that several good resources were being under-used, so they decided to try to put together an e-resources demo with Elsevier and others. It was also a good opportunity to get usability feedback about the new website.

They decided on ten tables in total for the fair. They polled the reference librarians for suggestions on whom to invite, and they ended up with resources that crossed most of the major disciplines at the school. The fair was held in a high-traffic location of the library (so that they could get walk-in participation) and publicized in the student paper, on the library blog, and on Facebook, where the librarians shared it with student and faculty friends.

They held a raffle to gather information about the participants, and in the end, 64 undergraduates, 19 graduate students, 6 faculty, 5 staff, and 2 alumni attended the fair over the four hours. By having the users fill out the raffle information, library staff were able to interact with them in a different way, one that wasn’t just about coming in for information or help.

After the fair, they looked at the sessions and searches for the resources that were represented at the fair and compared them to the monthly stats from the previous year. However, there is no way to determine whether the fair had a direct impact on the increases (and the few decreases).

In and of itself, the event created publicity for the library. And, because it was free (minus staff time), they don’t really need hard evidence of the success (or failure) of the event.

Some of the vendors didn’t take it seriously and showed up late. They thought it was a waste of their time to talk about only the resources the library already purchases rather than pushing new sales, and it’s doubtful those vendors will be invited back. It may be better to schedule the fair around the time of your state library conference, if that happens nearby, so the vendors may already be close and not making a special trip.

ER&L 2010: We’ve Got Issues! Discovering the right tool for the job

Speaker: Erin Thomas

The speaker is from a digital repository, so the workflow and needs may be different from your situation. Their collections are very old and spread out among several libraries, but they are still highly relevant to current research. Around 15 people are involved in the process of maintaining the digital collection, and email got to be too inefficient to handle all of the problems.

The member libraries created the repository because they have content that needed to be shared. They started with the physical collections and broke up the work of scanning among the holding libraries, attempting to eliminate duplication. Even so, they had some duplication, so they run de-duplication algorithms that check the citations. The Internet Archive is actually responsible for doing the scanning, once the library has determined that the quality of the original document is appropriate.
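
The speaker didn’t describe the de-duplication algorithm itself; a common approach is to normalize a few citation fields into a key and flag records that share it. A rough sketch, with invented records:

```python
# Rough sketch of citation-based de-duplication (the actual algorithm used was
# not described). Normalize a few citation fields into a key and flag records
# that share the same key. Sample records are invented.

import re
from collections import defaultdict

def normalize(text):
    """Lowercase, strip punctuation, collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", text.lower())).strip()

def citation_key(record):
    # Journal title + volume + starting page is often enough to catch duplicates.
    return (normalize(record["journal"]), record["volume"], record["start_page"])

records = [
    {"id": 1, "journal": "Annals of Botany", "volume": "12", "start_page": "45"},
    {"id": 2, "journal": "Annals of botany.", "volume": "12", "start_page": "45"},
    {"id": 3, "journal": "Annals of Botany", "volume": "13", "start_page": "45"},
]

groups = defaultdict(list)
for rec in records:
    groups[citation_key(rec)].append(rec["id"])

duplicates = {key: ids for key, ids in groups.items() if len(ids) > 1}
print(duplicates)  # {('annals of botany', '12', '45'): [1, 2]}
```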

The low-cost model they are using does not produce preservation-level scans; they’re focusing on access. The user interface for a digital collection can be more difficult to browse than the physical collection, so libraries have to do more and different kinds of training and support.

This is great, but it caused more workflow problems than they expected, so they looked at issue-tracking systems. Their development staff already had access to Gemini, so they went with that.

The issues they receive can be assigned types and specific components for each problem. Some types already existed, and they were able to add more. The components were entirely customized. Tasks are tracked from beginning to end, and they can add notes, have multiple user responses, and look back at the history of related issues.

But they needed a more flexible system, one that allowed them to drill down to sub-issues, offered email versus no-email notification options, and had a better user interface. There were many other options out there, so they did a needs assessment and an environmental scan. They developed a survey to ask the users (library staff) what they wanted and hosted demos of the options. And, in the end, Gemini was the best system available for what they needed.

ER&L 2010: Adventures at the Article Level

Speaker: Jamene Brooks-Kieffer

Article level, for those familiar with link resolvers, means the best link type to give to users. The article is the object of pursuit, and the library and the user collaborate on identifying it, locating it, and acquiring it.

In 1980, the only good article-level identifier was the Medline ID. Users would need to go through a qualified Medline search to track down relevant articles, and the library would need the article-level identifier to make a fast request from another library. Today, the user can search Medline on their own; use OpenURL linking to get to the full text, print, or an ILL request; and obtain the article from the source or via ILL. Unlike in 1980, the user no longer needs to find the journal first to get to the article. Also, the librarian’s role is now more about maintaining relevant metadata to give users the tools to locate articles themselves.

In thirty years, the library has moved from being a partner with the user in pursuit of the article to being the magician behind the curtain. Our magic is made possible by the technology we know but that our users do not know.

Unique identifiers solve the problem of making sure that you are retrieving the correct article. CrossRef can link a DOI to a specific instance of an item, but not necessarily the one the user has access to; the link resolver will use that DOI to find other instances of the article available to the library’s users. Easy user authentication at the point of need is the final key to implementing article-level services.
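
For anyone who hasn’t looked under the hood, much of the “magic” is OpenURL: the source database packages the article metadata (and often a DOI) into a URL pointed at the library’s link resolver. Here is a minimal sketch of building such a link; the resolver base URL and the article metadata are invented.

```python
# Minimal sketch of an OpenURL 1.0 (KEV format) link for an article.
# The resolver base URL and the article metadata are invented for illustration.

from urllib.parse import urlencode

RESOLVER_BASE = "https://resolver.example.edu/openurl"  # hypothetical link resolver

article = {
    "url_ver": "Z39.88-2004",
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    "rft.genre": "article",
    "rft.jtitle": "Journal of Example Studies",
    "rft.atitle": "An Illustrative Article",
    "rft.volume": "12",
    "rft.issue": "3",
    "rft.spage": "145",
    "rft.date": "2009",
    "rft_id": "info:doi/10.1000/example-doi",  # DOI passed along as an identifier
}

print(RESOLVER_BASE + "?" + urlencode(article))
```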

One of the library’s biggest roles is facilitating access. It’s not as simple as setting up a link resolver; it must be maintained or the system will break down. Also, document delivery service provides an opportunity to generate goodwill between libraries and users. The next step is supporting the user’s preferred interface, through tools like LibX, Papers, Google Scholar link resolver integration, and mobile devices. The latter is the most difficult, because much of the content comes from outside service providers and because developing applications or mobile web interfaces requires institutional support.

We also need to consider how we deliver the articles users need. We need to evolve our acquisitions process. We need to be ready for article-level usage data, so we need to stop thinking about it as a single-institutional data problem. Aggregated data will help spot trends. Perhaps we could look at the ebook pay-as-you-use model for article-level acquisitions as well?

PIRUS & PIRUS 2 are projects to develop COUNTER-compliant article usage data for all article-hosting entities (both traditional publishers and institutional repositories). Projects like MESUR will inform these kinds of ventures.

Libraries need to be working on recommendation services. Amazon and Netflix are not flukes. Demand, adopt, and promote recommendation tools like bX or LibraryThing for Libraries.

Users are going beyond locating and acquiring the article to storing, discussing, and synthesizing the information. The library could facilitate that. We need something that lets the user connect with others, store articles, and review recommendations that the system provides. We have the technology (magic) to make it available right now: data storage, cloud applications, targeted recommendations, social networks, and pay-per-download.

How do we get there? Cover the basics of identify>locate>acquire. Demand tools that offer services beyond that, or sponsor the creation of desired tools and services. We also need to stay informed of relevant standards and recommendations.

Publishers will need to be a part of this conversation as well, of course. They need to develop models that allow us to retain access to purchased articles. If we are buying on the article level, what incentive is there to have a journal in the first place?

For tenure and promotion purposes, we need to start looking more at the impact factor of the article, not so much the journal-level impact. PLOS provides individual article metrics.

ER&L 2010: Patron-driven Selection of eBooks – three perspectives on an emerging model of acquisitions

Speaker: Lee Hisle

They have the standard patron-driven acquisitions (PDA) model through Coutts’ MyiLibrary service. What’s slightly different is that they are also working on a pilot program with a three-college consortium with a shared collection of PDA titles. After the second use of a book, they are charged 1.2-1.6 times the list price of the book for a 4-SU, perpetual-access license.

Issues with ebooks: fair use is replaced by the license terms and software restrictions; ownership has been replaced by licenses, so if Coutts/MyiLibrary were to go away, they would have to renegotiate with the publishers; there is a need for an archiving solution for ebooks much like Portico for ejournals; ILL is not feasible or permissible; there is potential for exclusive distribution deals; and there are device limitations (computer screens v. ebook readers).

Speaker: Ellen Safley

Her library has been using EBL on Demand. They are only buying 2008-current content within specific subjects/LC classes (history and technology). They purchase on the second view. Because they only purchase a small subset of what they could, the number of records they load fluctuates, but isn’t overwhelming.

After a book has been browsed for more than 10 minutes, the pay-per-view purchase is initiated. After eight months, they found that more people used the books at the pay-per-view level than at the purchase level (i.e., more than once).

They’re also a pilot site for an Ebrary program. They had to deposit $25,000 for the 6-month pilot, then select from over 100,000 titles. They found that the sciences used the books heavily, but there were also indications that the humanities titles were popular as well.

The difficulty with this program is the overlap between selectors’ print order requests and PDA purchases. It has caused a slight modification of their acquisitions workflow.

Speaker: Nancy Gibbs

Her library had a pilot with Ebrary. They were cautious about jumping into this, but because it was coming through their approval plan vendor, it was easier to match it up. They culled the title list of 50,000 titles down to 21,408, loaded the records, and enabled them in SFX. But they did not advertise it at all, and they gave no indication on the user end when a book was purchased.

Within 14 days of starting the project, they had spent all $25,000 of the pilot money. Of the 347 titles purchased, 179 were also owned in print, but those print copies had only 420 circulations among them. The most heavily used purchased ebook is also owned in print and has had only two print circulations. The purchases leaned more toward STM, political science, and business/economics, with some humanities.

The library’s technical services staff were a bit overwhelmed by the number of records in the load. The MARC records lacked OCLC numbers, which they will need in the future. They did not remove the records after the trial ended because of other more pressing needs, but that caused frustration for users, and she does not recommend it.

They were surprised by how quickly they went through the money; if they had advertised, she thinks they may have spent it even faster. The biggest challenge was culling the list, so in the future, running the list through the approval plan might save some time. They also need better match routines for the title loads, because they ended up buying five books they already had in electronic format from other vendors.

Ebrary needs to refine circulation models to narrow down subject areas. YBP needs to refine some BISAC subjects, as well. Publishers need to communicate better about when books will be made available in electronic format as well as print. The library needs to revise their funding models to handle this sort of purchasing process.

They added the records to their holdings on OCLC so that they would appear in Google Scholar search results. So, even though they couldn’t loan the books through ILL, there is value in adding the holdings.

They attempted to make sure that the books in the list were not textbooks, but there could have been some, and professors might have used some of the books as supplementary course readings.

One area of concern is the potential for compromised accounts that could result in ebook pirates blowing through funds very quickly. One of the vendors in the room assured us they have safety valves in place to protect the publishers’ content. This has happened before, and the vendor reset the download count to remove the fraudulent downloads from the library’s account.

ER&L 2010: ERMS Success – Harvard’s experience implementing and using an ERM system

Speaker: Abigail Bordeaux

Harvard has over 70 libraries, and they are very decentralized. This implementation was for the central office that provides library systems services for all of the libraries. Ex Libris is their primary vendor for library systems, including the ERMS, Verde. They try to go with vended products and only develop in-house solutions if nothing else is available.

Success was defined as migrating data from the old system to the new one, improving workflow efficiency, providing more transparency for users, and working around any problems they encountered. They did not expect to have an ideal system; there were bugs in both the system and their local data, and there is no magic bullet. They identified the high-priority areas and worked toward their goals.

Phase I involved a lot of project planning with clearly defined goals/tasks and assessment of the results. The team included the primary users of the system, the project manager (Bordeaux), and a programmer. A key part of planning is scoping the project (Bordeaux provided a handout of the questions they considered in this process). They had a very detailed project plan in Microsoft Project, and at the very least, listing out the details made the interdependencies clearer.

The next stage of the project involved data review and clean-up. Bordeaux thinks that data clean-up is essential for any ERM implementation or migration. They also had to think about the ways the old ERM was used and whether that is desirable for the new system.

The local system they created was very close to the DLF recommended fields, but even so, they still had several failed attempts at mapping the fields between the two systems. As a result, they ran a cycle of extracting a small set of records, loading them into Verde, reviewing the data, and then deleting the test records from Verde. They did this several times with small data sets (10 or so records), and when they were comfortable with the results, they increased the number of records.
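
In outline, that test cycle looks something like the sketch below. The helper functions are hypothetical placeholders (Verde’s actual loading interface wasn’t part of the talk); the point is the pattern: small batch in, review, delete, repeat with a larger batch.

```python
# Generic sketch of the extract -> load -> review -> delete test cycle described
# above. The helpers are hypothetical placeholders; Verde's real loading
# interface was not described in the session.

BATCH_SIZES = [10, 10, 50, 250]  # start small, grow as confidence increases

def map_fields(record):
    """Map a legacy record to the target schema (close to the DLF fields, but not identical)."""
    return {"title": record.get("resource_name"), "license": record.get("license_terms")}

def load_into_target(records):
    """Stand-in for loading mapped records into the new ERMS."""
    return [map_fields(r) for r in records]

def review(loaded):
    """Stand-in for the manual/automated check that the mapping round-tripped."""
    return all(r["title"] for r in loaded)

legacy_export = [{"resource_name": f"Resource {i}", "license_terms": "..."} for i in range(500)]

for size in BATCH_SIZES:
    batch = legacy_export[:size]          # extract a small test set
    loaded = load_into_target(batch)      # load it into the target system
    status = "looks good" if review(loaded) else "mapping problems -- fix and repeat"
    print(f"batch of {size}: {status}")
    # ...then delete the test records from the target before the next cycle
```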

They also did a lot of manual data entry. They were able to transfer a lot automatically, but they couldn’t do everything, and some bits of data were not migrated because the work involved outweighed their value. In some cases, though, they did want to keep the data, so they entered it manually. To visualize the mapping process, they created screenshots with notes that showed the field connections.

Prior to this project, they were not using Aleph to manage acquisitions. So, they created order records for the resources they wanted to track. The acquisitions workflow had to be reorganized from the ground up. Oddly enough, by having everything paid out of one system, the individual libraries have much more flexibility in spending and reporting. However, it took some public relations work to get the libraries to see the benefits.

As a result of looking at the data in this project, they got a better idea of gaps and other projects regarding their resources.

Phase two began this past fall, incorporating the data from the libraries that did not participate in phase one. They now have a small group with representatives from those libraries, which is coming up with best practices for license agreements and for entering data into the fields.

ER&L 2010: Electronic Access and Research Efficiencies – Some Preliminary Findings from the U of TN Library’s ROI Analysis

Speaker: Gayle Baker, University of Tennessee – Knoxville

Phase one: Demonstrate the role of library information in generating grant income for the institution (i.e., the university spends X amount of money on the library, which in turn helps generate Y amount of money in grant research and support).

To do this, they sent out emails to faculty with surveys (quantitative and qualitative questions) that included incentives to respond. They gathered university-supplied data about grant proposals and income, and included library budget information. They also interviewed administrators to get a better picture of the priorities of the university.

UIUC’s model: the percentage of faculty grant proposals that made use of library resources, times the grant award success rate, times the average grant income, multiplied by the number of grants expended, and divided by the total library budget. The end result was that the model showed $4.38 in grant income for every dollar invested in the library.
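
As I understood the description, the arithmetic works out something like the sketch below. All of the numbers are invented, and the exact formula (especially the grants-expended term) varies between the UIUC study and the later Tenopir version mentioned under phase two.

```python
# Back-of-the-envelope ROI calculation following the description above.
# All numbers are invented; the exact formula differs somewhat between the
# original UIUC study and the Tenopir version used in phase two.

pct_proposals_using_library = 0.75   # share of grant proposals that used library resources
award_success_rate = 0.30            # share of proposals that were funded
average_grant_income = 250_000       # average income per awarded grant ($)
grants_expended = 400                # number of grants being expended in the year
library_budget = 20_000_000          # total library budget ($)

attributable_income = (pct_proposals_using_library * award_success_rate
                       * average_grant_income * grants_expended)
roi = attributable_income / library_budget
print(f"${roi:.2f} in grant income for every $1 invested in the library")
```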

Phase two: Started by testing UIUC’s methodology across eight institutions in eight countries. The speaker didn’t elaborate, but went on to describe the survey they used and examples of survey responses. Interesting, but hard to convey relevance in this format, particularly since it’s so dependent on individual institutions. (On the up side, she has amusing anecdotes.) They used the ROI formula suggested by Carol Tenopir, which is slightly different from the one described above.

Phase three: An IMLS grant for the next three years, headed by Tenopir and Paula Kaufman; ARL and Syracuse will also be involved. They are trying to put a dollar value on things that are hard to quantify, such as student retention and success.
