NASIG presentations

I’ve updated my summaries of the NASIG 2012 presentations with the slides, where the presenters have made them available to post on SlideShare. In past years, we’ve limited access to this kind of material to members only, and I’m happy that we’re now able to share more of the awesome content that happens at the conference every year.

I’ll keep adding more as they become available, and many presenters whose sessions I wasn’t able to attend have already posted their slides.

NASIG 2012: Practical Applications of Do-it-Yourself Citation Analysis

Speaker: Steve Black, College of Saint Rose

Citation analysis is the study of patterns and frequencies of citations. You might want to do this because it is an objective and quantitative way of looking at a journal’s impact (i.e. how often it is cited). It can also be used to determine the impact of authors, institutions, nations, and the “best journals for ____.”

There are many different critiques of relying on impact factors, and the literature on this is vast. There could be problems in the citations themselves and in how they make it into the databases. Some citations become famous for being famous. There are many ways of gaming the system. But the main critique is the conceptual difference between impact, quality, and importance. And finally, there is the question of global versus local impact. Because of all this, it is important to remember that impact factor is one of many considerations for collection development, tenure & promotion, and paper submissions.

Doing it yourself allows you to tailor the analysis to specific needs not covered elsewhere. It can be quick & dirty, exhaustive, or something in between. And, of course, it does not incur the same kind of costs as subscriptions to Scopus or Web of Knowledge.

First, select a target population (journals in a sub-discipline, researchers at your institution, or a single faculty member). Then select a sample that represents the target population. Compile the works and sort/count the works cited.

Google Scholar is good for identifying articles on a topic, but not so much on authority control and streamlining citation formats. Zotero is a great citation management tool, but it doesn’t work so well for citation analysis because of how complicated it is to extract data into Excel.

Black did this to look at frequently cited journals in forensic psychology. He used WorldCat to identify the most-held journals in the field, and gathered the citations from recently published volumes. He then used PsycINFO in EBSCOhost to look up the citations and export them to RefWorks, creating folders for each issue’s works cited.

He exported these to Excel, sorted by title, and corrected the discrepancies in title formats. Once the data was cleaned, he manually counted the number of citations for each journal by selecting the cells with the title name and using the Count total in the lower information bar of Excel. This information went into a new spreadsheet. (I asked why not use a pivot table; he didn’t know how to use one, and wasn’t sure if it would account for the title variations he may not have caught.) Generally, the groupings of citations fall within the Bradford distribution.
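
For anyone comfortable scripting, the same pivot-table-style tally could be done outside of Excel. Here is a minimal sketch in Python/pandas, assuming the cleaned citations were exported to a CSV with one row per citation and a journal_title column (the file and column names are hypothetical); this is not the method Black described, just an alternative to the manual count.

```python
import pandas as pd

# Load the cleaned citations; assumes one row per citation and a
# "journal_title" column (file and column names are hypothetical).
citations = pd.read_csv("cleaned_citations.csv")

# The same tally as the manual Excel count: citations per journal title
counts = citations["journal_title"].value_counts()

# Save a ranked list, most-cited journals first
counts.rename("citation_count").to_csv("journal_citation_counts.csv")
print(counts.head(20))
```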

There are two ways to measure the reliability of the rankings you discover. On a macro level, you can look at how well the ranked lists match from month of publication to month of publication, and test their consistency with Spearman’s rho rank correlation coefficient. And then Black went off into statistical stuff that doesn’t make sense to me just sitting here. One issue of a journal isn’t enough to determine the rankings of the journals in the field, but several volumes of 3-5 journals would do it.
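
For the macro-level check, a minimal sketch of how Spearman’s rho could be computed with SciPy; the journal names and counts below are made up for illustration, not Black’s data.

```python
import pandas as pd
from scipy.stats import spearmanr

# Illustrative citation counts per journal from two separate samples
sample_a = pd.Series({"Journal A": 120, "Journal B": 95, "Journal C": 60,
                      "Journal D": 44, "Journal E": 12})
sample_b = pd.Series({"Journal A": 110, "Journal B": 102, "Journal C": 55,
                      "Journal D": 31, "Journal E": 18})

# Correlate only the journals that appear in both samples
common = sample_a.index.intersection(sample_b.index)
rho, p_value = spearmanr(sample_a[common], sample_b[common])
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")
```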

On the micro level, you use more statistical methods (the coefficient of variation). A large coefficient of variation indicates that a journal’s ranking bounces around a lot from sample to sample.
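
This one can be sketched in a few lines; the rank values below are made up for illustration, not from the presentation.

```python
import numpy as np

# Illustrative ranks of one journal across five samples (made-up numbers)
ranks = np.array([3, 5, 4, 9, 2])

# Coefficient of variation: sample standard deviation relative to the mean;
# a larger value means the journal's rank is less stable across samples.
cv = ranks.std(ddof=1) / ranks.mean()
print(f"coefficient of variation = {cv:.2f}")
```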

To have a useful ranked list, you need at least 10,000 citations, and closer to 20,000 is better. Even with that many, different samples will yield different ranks, particularly further down the list. So, a journal’s rank must be taken as an approximation of its true rank, and is probably a snapshot in time.

With all the positives of doing this, keep in mind the weaknesses. It is very time consuming to do. You need a lot of citations, and even that isn’t definitive. Works cited may not be readily available. It’s recreating the (very expensive) wheel, and may be better to use Scopus or Web of Knowledge if you have them.

For collection development, you can use this to assess the impact of specialized journals. Expect to find surprises, and don’t judge new titles by these criteria.

Practical applications include identifying top journals on a topic, supporting a new major, figuring out what a department really uses, and potentially publishing for tenure purposes.

NASIG 2012: Is the Journal Dead? Possible Futures for Serial Scholarship

Speaker: Rick Anderson, University of Utah

He started with an anecdote about a picture of his dog that he thought made her look like Jean-Paul Sartre. He then went to find a picture of Sartre on Google, and had absolutely no doubt he’d not only find one quickly, but that he would find one with the same expression. In a world where he can find that picture in less than a minute, it is absurd for us to think we can keep doing serial scholarship the way we have always done it.

The latest version of Siri can identify a reference-type question and go to Wolfram-Alpha to find the answer. How far away are we from this kind of thing doing very specific article identification and retrieval?

When budgets are falling or flat, there is a rising impatience with waste in libraries. One of the most egregious forms of waste is that we have always bought, and still buy, stuff that nobody wants, and we hold onto those things.

Market saturation is becoming an increasing issue as more and more articles are being submitted, and rejecting them or publishing them costs more money. A landslide of data is being created, with more coming every year. Open access mandates (whether seriously enforced or not) are forcing authors to think about copyright, putting pressure on the existing scholarly communications structure.

The Google books case, the Hathi Trust case, and the Georgia State ruling will all have impacts on copyright law and the traditional model of scholarly communication. The ground is soft — we can make changes now that may not have been possible 5 years ago, and may not be possible 2 years from now. Moving documents from print to digital is not a revolutionary change, but moving from a non-networked to a networked environment is. Distribution is at the heart of publishing, and is obviated if everyone has access to a document in a central location.

Before iTunes and the internet, we had to hope that the record store would carry the music we were interested in. Now, we can access any music from anywhere, and that’s the kind of thing that is happening to scholarly communications.

The environment is changing. The Digital Public Library of America and Google Books are changing the conversation. Patron-driven acquisitions and print on demand are only possible because of the networked environment. As we move towards this granular collecting, the whole dynamic of library collections is going to change.

This brings up some serious questions about the Big Deal and the Medium Deal. Anderson calls the Medium Deal individual title subscriptions, where you buy a bunch of articles you don’t need in order to ensure that you get them at a better price per download.

Anderson believes there is little likelihood that open access will become the main publishing model for scholarly communications in the foreseeable future, but it is going to become an increasingly important niche in the marketplace.

What does the journal do for us that is still necessary? What problem is solved for us by each element of the article citation? Volume, issue, and page number are not really necessary in the networked age. Our students don’t necessarily think about journals; they think about sources. The journal matters as a branding mechanism for articles, and gives us an idea of the reliability of the article. It matters who the author is. It matters when it was published. The article title tells us what the article is about, and the journal title lends it authority. But the volume and issue don’t really tell you anything; they have more to do with the economics of print distribution. Finally, the DOI matters, so you can retrieve it. So why is the publisher missing? Because it doesn’t matter for identifying, retrieving, or selecting the article.

There really is no such thing as “serials” scholarship. There are articles, but they aren’t serials. They may be in journals or a collection/server/repository. Typically there isn’t anything serial about a book, a review, a report, but… blog postings might be serial. What’s really interesting are the new categories of publication, such as data sets (as by-products of research or as an intentional product) and book+ (ongoing updated monographic publications, or monographs that morph into databases).

A database (or article or book) can be a “flow site,” such as Peggy Battin’s The Ethics of Suicide book, which she’s been working on for a decade. It will be published as both a book and as a website with ever-growing content/data. It’s no longer a static thing, and gives us the benefit of currency at the cost of stability. How do you quote it? What is the version of record?

The people we serve have access to far more content than ever before, and they are more able to access it outside of the services we provide. So how do we stay relevant in this changing environment?

Definitions will get fuzzier, not clearer. This will be a tremendous boon to researchers. What emerges will be cool, exciting, incredibly useful and productive, and hard to manage. If we try to force our traditional methods of control onto the emerging models of scholarship, we will frustrate not only ourselves, but also our scholars. It is our job to internalize complexity, so that we are the ones experiencing it and our users don’t have to.

NASIG 2012: A Model for Electronic Resources Assessment

Presenter: Sarah Sutton, Texas A&M University-Corpus Christi

She began the model with the trigger event — a resource comes up for renewal. Then she looked at what information is needed to make the decision.

For A&I databases, the primary data pieces are the searches and sessions from the COUNTER release 3 reports. For full-text resources, the primary data pieces are the full-text downloads, also from the COUNTER reports. In addition to COUNTER and other publisher-supplied usage data, she looks at local data points. Link-outs from the a-to-z list of databases tell her what resources her users are consciously choosing to use, not something they arrive at via a discovery service or Google. She’s able to pull this from the content management system they use.

Once the data has been collected, it can be compared to the baseline. She created a spreadsheet listing all of the resources, with a column each for searches, sessions, downloads, and link-outs. The baseline set of core resources was based on a combination of high link-outs and high usage. These were grouped by similar numbers/type of resource. Next, she calculated the cost/use for each of the four use types, as well as the percentage of change in use over time.
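
The same baseline arithmetic could also be scripted rather than kept in a spreadsheet. A minimal sketch in Python/pandas, assuming a CSV with one row per resource and columns for annual cost, the four use types, and the prior year’s downloads; all file and column names here are hypothetical, not Sutton’s actual headings.

```python
import pandas as pd

# One row per resource; column names below are assumptions for illustration
df = pd.read_csv("resources.csv")

use_types = ["searches", "sessions", "downloads", "linkouts"]
for col in use_types:
    df[f"cost_per_{col}"] = df["annual_cost"] / df[col]

# Percentage change in full-text downloads compared with the prior year
df["downloads_pct_change"] = (
    (df["downloads"] - df["downloads_prev_year"]) / df["downloads_prev_year"] * 100
)

df.to_csv("baseline_with_cost_per_use.csv", index=False)
```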

After the baseline is established, she compares the renewing resource to that baseline. This isn’t always a yes or no answer, but more of a yes or maybe answer; more analysis is often needed if it is tending towards no. That additional data may include an overlap analysis (which titles are unique to your library’s collection), citation lists (compare the unique titles with a list of highly-cited journals at your institution, with faculty requests, or with a core title list), journal-level usage of the unique titles, and impact factors of the unique titles.
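
A very simple overlap analysis can be done with two title lists. A sketch assuming plain-text lists of titles exported from an ERM or knowledge base, one title per line; the file names are hypothetical placeholders.

```python
def load_titles(path):
    """Read a plain-text list of journal titles, one per line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

# File names are hypothetical placeholders
resource_titles = load_titles("resource_title_list.txt")
collection_titles = load_titles("rest_of_collection_titles.txt")

unique_titles = resource_titles - collection_titles
print(f"{len(unique_titles)} of {len(resource_titles)} titles are unique to this resource")
```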

Audience question: What about qualitative data? Talk to your users. She does not have a suggestion for how to incorporate that into the model without increasing the length of time in the review process.

Audience question: How much staff time does this take? Most of the work is in setting up the baseline. The rest depends on how much additional investigation is needed.

[I had several conversations with folks after this session who expressed concern with the method used for determining the baseline. Namely, that it excludes A&I resources and assumes that usage data is accurate. I would caution anyone from wholesale adopting this as the only method of determining renewals. Without conversation and relationships with faculty/departments, we may not truly understand what the numbers are telling us.]

NASIG 2012: Mobile Websites and Apps in Academic Libraries: Harmony on a Small Scale

Speaker: Kathryn Johns-Masten, State University of New York Oswego

About half of American adults have smart phones now. Readers of e-books tend to read more frequently than others. They may not be reading more academic material, but they are out there reading.

SUNY Oswego hasn’t implemented a mobile site campus-wide, but the library really wanted one, so they created their own using iWebKit from MIT.

Once they began the process of creating the site, they had many conversations about who they were targeting and what they expected to be used in a mobile setting. They were very selective about which resources were included, and considered how functional each tool was in that setting. They ended up with library hours, contact, mobile databases, catalog, ILL article retrieval (ILLiad), ask a librarian, Facebook, and Twitter (in that order).

When developing a mobile site, start small and enhance as you see the need. Test functionality (pull together users of all types of devices at the same time, because one fix might break another), review your usage statistics, and talk to your users. Tell your users that it’s there!

Tools for designing your mobile site: MobiReady, Squeezer, Google Mobile Site Builder, Springshare Mobile Site Builder, Boopsie, Zinadoo, iWebKit, etc.

Other things related to library mobile access… Foursquare! The library has a cheat sheet with answers to the things freshmen are required to find on campus, so maybe they could use Foursquare to help with this. Tula Rosa Public Library used a screen capture of Google Maps to help users find their new location. QR codes could link to ask a librarian, book displays linked to reviews, social media, events, scavenger hunts, etc. They could also be used to link sheet music to streaming recordings.