NASIG presentations

I’ve updated my summaries of the NASIG 2012 presentations with the slides, if the presenters have made them available to be posted in SlideShare. In past years, we’ve limited access to this kind of material to members only, and I’m happy that we’re now able to share more of the awesome content that happens at the conference every year.

I’ll keep adding more as they become available, and many presenters whose sessions I wasn’t able to attend have already posted their slides.

NASIG 2012: Practical Applications of Do-it-Yourself Citation Analysis

Speaker: Steve Black, College of Saint Rose

Citation analysis is the study of patterns and frequencies of citations. You might want to do this because it is an objective and quantitative way of looking at a journal’s impact (i.e. how often it is cited). It can also be used to determine the impact of authors, institutions, nations, and the “best journals for ____.”

There are many critiques of relying on impact factors, and the literature on this is vast. There can be problems in the citations themselves and in how they make it into the databases. Some citations become famous for being famous. There are many ways of gaming the system. But the main critique is the conceptual difference between impact, quality, and importance, along with the distinction between global and local impact. Because of all this, it is important to remember that impact factor is only one of many considerations for collection development, tenure & promotion, and paper submissions.

Doing it yourself allows you to tailor the analysis to specific needs not covered elsewhere. It can be quick & dirty, exhaustive, or something in between. And, of course, it does not incur the same kind of costs as subscriptions to Scopus or Web of Knowledge.

First, select a target population (journals in a sub-discipline, researchers at your institution, or a single faculty member). Then select a sample that represents the target population. Compile the works and sort/count the works cited.

Google Scholar is good for identifying articles on a topic, but not so much on authority control and streamlining citation formats. Zotero is a great citation management tool, but it doesn’t work so well for citation analysis because of how complicated it is to extract data into Excel.

Black did this to look at frequently cited journals in forensic psychology. He used WorldCat to identify the most-held journals in the field, and gathered the citations from recently published volumes. He then used PsycINFO in EBSCOhost to look up the citations and export them to RefWorks, creating folders for each issue’s works cited.

He then exported these to Excel, sorted by title, and corrected the discrepancies in title formats. Once the data was cleaned, he manually counted the number of citations for each journal by selecting the cells with the title name and using the Count total in the lower information bar of Excel. This information went into a new spreadsheet. (I asked why not use a pivot table; he didn’t know how to use one, and wasn’t sure if it would account for title variations he may not have caught.) Generally, the groupings of citations fall within the Bradford distribution.
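For anyone attempting a similar count, the tally itself is easy to script. A minimal sketch in Python (the titles and variant mappings below are hypothetical; real data would need the same title-normalization pass Black did by hand):

```python
from collections import Counter

# Raw works-cited entries as exported, with the title-format
# discrepancies a real export would contain (all invented).
cited_titles = [
    "J. Forensic Psychol.",
    "Journal of Forensic Psychology",
    "Law and Human Behavior",
    "Law & Human Behavior",
    "Journal of Forensic Psychology",
]

# Crude normalization: lowercase, then map known variants to a
# canonical title. A real pass would need a much fuller mapping.
variants = {
    "j. forensic psychol.": "journal of forensic psychology",
    "law & human behavior": "law and human behavior",
}

def normalize(title):
    t = title.lower().strip()
    return variants.get(t, t)

counts = Counter(normalize(t) for t in cited_titles)

# Ranked list, most-cited first
for title, n in counts.most_common():
    print(f"{n:>3}  {title}")
```

A pivot table would do the same grouping inside Excel; either way, the hard part is the title normalization, not the counting.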

There are two ways to measure the reliability of the rankings you discover. On a macro level, you can look at how well the ranked lists match from month of publication to month of publication, testing the consistency with Spearman’s rho rank correlation coefficient. And then Black went off into statistical territory that doesn’t make sense to me just sitting here. One issue of a journal isn’t enough to determine the rankings of the journals in the field, but several volumes of 3-5 journals would do it.

On the micro level, you use more statistical methods, such as the coefficient of variation. A large coefficient of variation indicates that the ranking of a journal is bouncing around.
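For the curious, both measures are simple to compute. A sketch with made-up ranks and counts (not Black’s data), using the tie-free formula for Spearman’s rho:

```python
from statistics import mean, pstdev

# Ranks of the same five journals in two different samples
# (hypothetical numbers, no tied ranks).
ranks_a = [1, 2, 3, 4, 5]
ranks_b = [2, 1, 3, 5, 4]

# Spearman's rho for tie-free ranks: 1 - 6*sum(d^2) / (n*(n^2 - 1))
n = len(ranks_a)
d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
rho = 1 - (6 * d_squared) / (n * (n ** 2 - 1))
print(f"Spearman's rho: {rho:.2f}")

# Coefficient of variation for one journal's citation counts across
# samples: std dev / mean. A larger CV means the rank bounces around more.
counts = [10, 12, 8]
cv = pstdev(counts) / mean(counts)
print(f"Coefficient of variation: {cv:.3f}")
```

A rho near 1.0 says two samples rank the journals almost identically; a rho near 0 says the rankings are essentially unrelated.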

To have a useful ranked list, you need at least 10,000 citations, and closer to 20,000 is better. Even with that many, different samples will yield different ranks, particularly further down the list. So a journal’s rank must be taken as an approximation of its true rank, and is probably a snapshot in time.

With all the positives of doing this, keep in mind the weaknesses. It is very time consuming. You need a lot of citations, and even then the results aren’t definitive. Works cited may not be readily available. It’s recreating the (very expensive) wheel, and it may be better to use Scopus or Web of Knowledge if you have them.

For collection development, you can use this to assess the impact of specialized journals. Don’t be surprised to find surprises, and don’t judge new titles by these criteria.

Practical applications include identifying top journals on a topic, supporting a new major, figuring out what a department really uses, and potentially publishing for tenure purposes.

NASIG 2012: Is the Journal Dead? Possible Futures for Serial Scholarship

Speaker: Rick Anderson, University of Utah

He started with an anecdote about a picture of his dog that he thought made her look like Jean-Paul Sartre. He then went to find a picture of Sartre on Google, and had absolutely no doubt that he’d not only find one quickly, but find one with the same expression. A world where he can find that picture in less than a minute makes it absurd for us to think we can keep doing serial scholarship the way we always have.

The latest version of Siri can identify a reference-type question and go to Wolfram-Alpha to find the answer. How far away are we from this kind of thing doing very specific article identification and retrieval?

When budgets are falling or flat, there is rising impatience with waste in libraries. One of the most egregious wastes is that we have always bought stuff that nobody wants, and we still hold onto those things.

Market saturation is becoming an increasing issue as more and more articles are being submitted, and rejecting them or publishing them costs more money. A landslide of data is being created, with more coming every year. Open access mandates (whether seriously enforced or not) are forcing authors to think about copyright, putting pressure on the existing scholarly communications structure.

The Google books case, the Hathi Trust case, and the Georgia State ruling will all have impacts on copyright law and the traditional model of scholarly communication. The ground is soft — we can make changes now that may not have been possible 5 years ago, and may not be possible 2 years from now. Moving documents from print to digital is not a revolutionary change, but moving from a non-networked to a networked environment is. Distribution is at the heart of publishing, and is obviated if everyone has access to a document in a central location.

Before iTunes and the internet, we had to hope that the record store would carry the music we were interested in. Now, we can access any music from anywhere, and that’s the kind of thing that is happening to scholarly communications.

The environment is changing. The Digital Public Library of America and Google Books are changing the conversation. Patron-driven acquisitions and print on demand are only possible because of the networked environment. As we move towards this granular collecting, the whole dynamic of library collections is going to change.

This brings up some serious questions about the Big Deal and the Medium Deal. Anderson calls the Medium Deal individual title subscriptions, where you buy a bunch of articles you don’t need in order to ensure that you get them at a better price per download.

Anderson believes there is little likelihood that open access will become the main publishing model for scholarly communication in the foreseeable future, but it will become an increasing niche in the marketplace.

What does the journal do for us that is still necessary? What problem is solved for us by each element of the article citation? Volume, issue, and page number are not really necessary in the networked age. Our students don’t necessarily think about journals; they think about sources. The journal matters as a branding mechanism for articles, and gives us an idea of the reliability of the article. It matters who the author is. It matters when it was published. The article title tells us what the article is about, and the journal title lends it authority. But the volume and issue don’t really tell you anything; they have more to do with the economics of print distribution. Finally, the DOI matters, so you can retrieve the article. So why is the publisher missing? Because it doesn’t matter for identifying, retrieving, or selecting the article.

There really is no such thing as “serials” scholarship. There are articles, but they aren’t serials. They may be in journals or a collection/server/repository. Typically there isn’t anything serial about a book, a review, a report, but… blog postings might be serial. What’s really interesting are the new categories of publication, such as data sets (as by-products of research or as an intentional product) and book+ (ongoing updated monographic publications, or monographs that morph into databases).

A database (or article or book) can be a “flow site,” such as Peggy Battin’s The Ethics of Suicide book, which she’s been working on for a decade. It will be published as both a book and as a website with ever-growing content/data. It’s no longer a static thing, and gives us the benefit of currency at the cost of stability. How do you quote it? What is the version of record?

The people we serve have access to far more content than ever before, and they are more able to access it outside of the services we provide. So how do we stay relevant in this changing environment?

Definitions will get fuzzier, not clearer. This will be a tremendous boon to researchers. What emerges will be cool, exciting, incredibly useful and productive, and hard to manage. If we try to force our traditional methods of control onto the emerging models of scholarship, we will frustrate not only ourselves but also our scholars. It is our job to internalize complexity, so that we are the ones experiencing it and our users don’t have to.

NASIG 2012: A Model for Electronic Resources Assessment

Presenter: Sarah Sutton, Texas A&M University-Corpus Christi

Began the model with the trigger event — a resource comes up for renewal. Then she began looking at what information is needed to make the decision.

For A&I databases, the primary data pieces are the searches and sessions from the COUNTER release 3 reports. For full-text resources, the primary data pieces are the full-text downloads, also from the COUNTER reports. In addition to COUNTER and other publisher-supplied usage data, she looks at local data points. Link-outs from the A-to-Z list of databases tell her what resources her users are consciously choosing to use, rather than something they arrive at via a discovery service or Google. She’s able to pull this from the content management system they use.

Once the data has been collected, it can be compared to the baseline. She created a spreadsheet listing all of the resources, with a column each for searches, sessions, downloads, and link-outs. The baseline set of core resources was based on a combination of high link-outs and high usage. These were grouped by similar numbers/type of resource. Next, she calculated the cost/use for each of the four use types, as well as the percentage of change in use over time.
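The per-resource arithmetic behind those columns is simple; a sketch with invented figures for a single full-text resource:

```python
# Hypothetical resource: annual cost plus two years of one use type.
cost = 5000.00            # annual subscription cost, USD
downloads_last = 800      # full-text downloads, prior year
downloads_now = 1000      # full-text downloads, current year

# Cost per use, and percentage change in use over time, as in the
# baseline spreadsheet described above.
cost_per_use = cost / downloads_now
pct_change = (downloads_now - downloads_last) / downloads_last * 100

print(f"Cost per download: ${cost_per_use:.2f}")
print(f"Change in use: {pct_change:+.1f}%")
```

The same two calculations would be repeated for each of the four use types (searches, sessions, downloads, link-outs) across every resource in the spreadsheet.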

After the baseline is established, she compares the renewing resource to that baseline. This isn’t always a yes or no answer, but more of a yes or maybe answer. Often more analysis is needed if it is tending towards no. More data may include overlap analysis (unique to your library collection), citation lists (unique titles — compare them with a list of highly-cited journals at your institution or faculty requests or appear on a core title list), journal-level usage of the unique titles, and impact factors of the unique titles.

Audience question: What about qualitative data? Talk to your users. She does not have a suggestion for how to incorporate that into the model without lengthening the review process.

Audience question: How much staff time does this take? Most of the work is in setting up the baseline. The rest depends on how much additional investigation is needed.

[I had several conversations with folks after this session who expressed concern with the method used for determining the baseline. Namely, that it excludes A&I resources and assumes that usage data is accurate. I would caution anyone from wholesale adopting this as the only method of determining renewals. Without conversation and relationships with faculty/departments, we may not truly understand what the numbers are telling us.]

NASIG 2012: Mobile Websites and Apps in Academic Libraries: Harmony on a Small Scale

Speaker: Kathryn Johns-Masten, State University of New York Oswego

About half of American adults have smart phones now. Readers of e-books tend to read more frequently than others. They may not be reading more academic material, but they are out there reading.

SUNY Oswego hasn’t implemented a mobile site, but the library really wanted one, so they’ve created their own using the iWebKit from MIT.

Once they began the process of creating the site, they had many conversations about who they were targeting and what they expected to be used in a mobile setting. They were very selective about which resources were included, and considered how functional each tool was in that setting. They ended up with library hours, contact, mobile databases, catalog, ILL article retrieval (ILLiad), ask a librarian, Facebook, and Twitter (in that order).

When developing a mobile site, start small and enhance as you see the need. Test functionality (pull together users of all types of devices at the same time, because one fix might break another), review your usage statistics, and talk to your users. Tell your users that it’s there!

Tools for designing your mobile site: MobiReady, Squeezer, Google Mobile Site Builder, Springshare Mobile Site Builder, Boopsie, Zinadoo, iWebKit, etc.

Other things related to library mobile access… Foursquare! The library has a cheat sheet with answers to the things freshmen are required to find on campus, so maybe they could use Foursquare to help with this. Tula Rosa Public Library used a screen capture of Google Maps to help users find their new location. QR codes could link to ask-a-librarian, book displays linked to reviews, social media, events, scavenger hunts, etc. They could even link sheet music to streaming recordings.

NASIG 2012: Everyone’s a Player — Creation of Standards in a Fast-Paced Shared World

Speaker: Nettie Lagace, NISO – National Information Standards Organization

NISO is responsible for more of the things we work with all the time than you may think, making systems work more seamlessly and getting everyone on the same page. They operate with a staff of five: one who is the public face and cheerleader, one who travels anywhere needed, one who makes sure the documents are properly edited, and two who handle the technical aspects of the organization/site/committees/etc.

Topic committees identify needs that become the working groups that tackle the details. Where there is an issue, there’s a working group, with many people involved in each.

New NISO work items consider:
What is not working and how it impacts stakeholders.
How it relates to existing efforts.
Beneficiaries of the deliverables and how.
Stakeholders.
Scope of the initiative.
Encouragement for implementation.

Librarians aren’t competitive in the ways that other industries might be, so this kind of work is more natural for them. The makeup of the working group tries to keep a balance so that no single interest category makes up the majority of the membership. Consensus is a must. They are also trying to make the open process aspect be more visible/accessible to the general public.

Speaker: Marshall Breeding

Library search has evolved quite a bit, from catalog searches that essentially replicated the card catalog process to federated searching to discovery interfaces to consolidated indexes. Libraries are increasingly moving towards these consolidated indexes to cover all aspects of their collections.

There is a need to bring some order to the market chaos this has created. Discovery brings value to library collections, but it brings some uncertainty to publishers. More importantly, uneven participation diminishes the impact, and right now the ecosystem is dominated by private agreements.

What is the right level of investment in tools that provide access to the millions of dollars of content libraries purchase every year? To be effective, these tools need to be comprehensive, so what do we need to do to encourage all of the players to participate and make the playing field fair to all? How do libraries figure out which discovery service is best for them?

The NISO Open Discovery Initiative hopes to bring some order to that chaos, and they plan to have a final draft by May 2013.

Speaker: Regina Reynolds, Library of Congress

From the beginning, ejournals have had many pain points. What brought this to a head was the problem with missing previous titles in online collections. Getting from a citation to the content doesn’t work when the name is different.

There were issues with missing or incorrect numbering, publishing statements, and dates. And then there are the publishers that used print ISSN for the electronic version. As publishers began digitizing back content, these issues grew exponentially. Guidelines were needed.

After many, many conversations, The Presentation & Identification of E-Journals (PIE-J) was produced and is available for comment until July 5th. The most important part is the three pages of recommended practices.

See also: In Search of Best Practices for Presentation of E-Journals by Regina Romano Reynolds and Cindy Hepfer

NASIG 2012: Copyright and New Technologies in the Library: Conflict, Risk and Reward

Speaker: Kevin Smith, Duke University

It used to be that libraries didn’t have to care about copyright because most of our practices were approved of by copyright law. However, what we do has changed (we are no longer in the age of the photocopier), but the law hasn’t progressed with it.

Getting sued is a new experience for libraries. Copyright law is developed through the court system, because the lawmakers can’t keep up with the changes in technology. This is a discovery process, because we find out more about how the law will be applied in these situations.

Three suits — Georgia State over e-reserves, UCLA over streamed digital video, and Hathi Trust & 5 partners over distributing digital scans and plans for orphan works. In all three cases, the same defense is being used — fair use. In the Hathi Trust case, the Authors Guild has asked the judge not to allow libraries to apply fair use to what they do, because copyright law covers specific things that libraries can do, even though it explicitly says it doesn’t negate fair use as well.

Whenever we talk about copyright, we are thinking about risk. Libraries and universities deal with risk all the time. Always evaluate the risk of allowing an activity against the risk of not doing it. Fair use is no different.

Without taking risks, we also abdicate rewards. What can we gain by embracing fair use? Take a look at the ARL Code of Best Practices in Fair Use for Academic & Research Libraries (which is applicable outside of the academic library context). The principles and limitations of fair use are more of a guide than a set of rules, and the best practices help us understand practical applications of those guidelines.

From the audience: No library wants to be the one that wrecked fair use for everybody. Taking this risk is not the same as more localized risk-taking, as this could lead to a precedent-setting legal case.

These cases are not necessarily binding; they are data points, particularly at the trial court level. However, the potential damages can be huge, much more than many other legal risks we take. Luckily, in these cases, the libraries are only liable for actual damages, which are usually quite small.

The key question for fair use has been, “is the use transformative?” This is not what the law asks, but it came about because of an influential law review article by a judge who said this is the question he asked himself when evaluating copyright cases. The other consideration is whether the works are competitive in the market, but transformative trumps this.

When is a work derivative and when is it transformative? Derivative works are under the auspices of the copyright holder, but transformative works are considered fair use.

In the “Pretty Woman” case, the judges said that making multiple copies for educational purposes is a classic example of fair use. This is what the Georgia State judge cited in her ruling, even though she did not think the e-reserves were transformative.

Best practices are not the same as negotiated guidelines. These are a broad consensus on how librarians can think about fair use in practice in an educational setting. Using the code of best practices is not a guarantee that you will not get sued. It’s a template for thinking about particular activities.

In the Hathi Trust case, the National Federation of the Blind has asked to be added as a defendant, because they see services for their constituents being threatened if libraries cannot apply fair use to activities that bring content to users in the formats they need. In this case the benefit is great and the risk is small: few will bring a lawsuit because a library has made copies so that the blind can use a text-to-speech program. Which lawsuit would you rather defend — one for providing access, or one because you haven’t provided access?

Fair use can facilitate text-mining that is for research purposes, not commercial. For example, looking at how concepts are addressed/discussed across a large body of work and time. Fair use is more efficient in this kind of transformative activity.

What about incorporating previously published content in new content that will be deposited into an institutional repository? Fair use allows adaptation, particularly as technologies change. This is the heart of transformative use — quoting someone else’s work — and should be no different from using a graph or chart. However, if you are using the entirety of a work, you should consider whether the amount used is appropriate (not excessive) for the new work.

What about incorporating music into video projects? If the music or the video is a fundamental part of the argument and help tell the story, then it’s fair use. If you don’t need that particular song, or it’s just a pretty soundtrack, then go find something that is licensed for you to use (Creative Commons).

One area to be careful with, though, is distributing content for educational purposes. Course packs created by commercial entities are not fair use. Electronic course readings have not been judged the same way, because the people making the electronic copies were educators in a non-commercial setting. Markets matter — the absence of a market for these kinds of things helped in the GSU case.

The licensing market for streaming digital video is more “hit or miss,” and education has a long precedent of using excerpts. It’s uncertain whether use of an entire work would be considered fair use.

Orphan works are a classic market failure, and have the best chance of being supported by fair use.

Solutions:

  • Stop giving up copyright in scholarly works.
  • Help universities develop new promotion & tenure policies.
  • Use Creative Commons licenses.
  • Publish in open access venues or retain rights and self-archive.

NASIG 2012: Discovery on a budget: Improved searching without a web-scale discovery product

Speakers: Lynn Fields & Chris Bulock, Southern Illinois University Edwardsville

They have a link resolver, database list, and A-Z journal list. They formed a task force a few years ago to redo the webpage, and decided to approach it from a UX perspective rather than a library committee perspective.

Their initial survey of users found what most of us have learned about our websites: too many links, too much text, too much library jargon. They implemented many changes based on this, and then did an observational study of users performing two specific tasks. This also identified confusing aspects of the library site, so they made more changes and did another observational study. For that one, they divided the participants into two groups to determine which aspect of the modifications was more effective.

They took what they learned from the website studies and applied that to a study of the catalog use. They wanted to know if users could find an ebook in the catalog (distinguishing it from a print book), understand the catalog displays (and use faceting), and understand the consortia catalog interface.

Lessons learned:
There’s a gap between freshman and senior instruction. They need to develop more instruction sessions on specific topics like ebooks and facets.

Discovery is more than the journey from search box to full text, and there are many factors that impact the end result. This includes the look of a database/catalog, names and labels for resources, placement of the search box, etc.

Core lessons:
1. names and language
Would students know how to define a database, periodical, e-resource, or even research? Using action-based language is more effective. Cutting down on vendor branding helps, too.

2. order matters
First impressions are important, so arrange the order of things by importance/relevance. Minimize reading. Descriptions or lengthy cues are often ignored.

3. be familiar
If you do a search in Google, Amazon, or WorldCat, you get very similar looking search results pages. If you put important content on the right column of your search results page, it won’t be as visible because users are used to ignoring advertising on that part of the screen.

4. let the users help you
Surveys, focus groups, observation studies… and get more than just the vocal minority. Observational studies are more about what they actually do than what they say they do, and are much more valuable in that way. Capture data about errors by making it easy for users to contact you (i.e. EZproxy host errors).

5. search boxes
No box can search everything, but people will use it for anything. If the search box is limited, make that clear. Searches of database titles (not content) can be problematic, as users expect it to search for article-level content.

6. work together
Discovery doesn’t respect department divisions. Work together from the beginning.

Don’t think that doing this once is enough. Keep it ongoing, and remember that the discovery process has many steps.

tl;dr — Usability studies and improving the user experience are hard, but necessary. Discovery isn’t just about buying some massive central index of content.

NASIG 2012: Managing E-Publishing — Perfect Harmony for Serialists

Presenters: Char Simser (Kansas State University) & Wendy Robertson (University of Iowa)

Iowa looks at e-publishing as an extension of the central mission of the library. This covers not only text, but also multimedia content. After many years of ad-hoc work, they formed a department to be more comprehensive and intentional.

Kansas really didn’t do much with this until they had a strategic plan that included establishing an open access press (New Prairie). This also involved reorganizing personnel to create a new department to manage the process, which includes the institutional repository. The press includes not only their own publications, but also hosts publications from a few other sources.

Iowa went with bepress’ Digital Commons to provide both the repository and the journal hosting. Part of why they went this route for their journals was because they already had it for their repository, and they approach it more as a hosting platform than as a press/publisher. This means they did not need to add staff to support it, although they did add responsibilities to existing staff on top of their other work.

Kansas is using Open Journal Systems hosted on a commercial server, due to internal politics that prevented it from being hosted on the university server. All of their publications are Gold OA, and the university/library pays all of the costs (~$1700/year, not including 0.6 FTE of staff time).

Day in the life of New Prairie Press — most of the routine stuff at Kansas involves processing DOI information for articles and works-cited, and working with DOAJ for article metadata. The rest is less routine, usually involving journal setups, training, consultation, meetings, documentation, troubleshooting, etc.

The admin back-end of OJS allows Char to view it as different types of users (editor, author, etc.) so she can troubleshoot issues for users. Rather than maintaining a test site, they have a “hidden” journal on the live site that they use to test functions.

A big part of her daily work is submitting DOIs to CrossRef and going through the backfile of previously published content to identify and add DOIs to the works-cited. The process is very manual, and the error rate is high enough that automation would be challenging.
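One small piece of that manual process that could plausibly be scripted is spotting which works-cited entries already contain a DOI. A sketch using Crossref’s recommended regular expression for modern DOIs (the sample citations, including the DOI, are invented for illustration; this is not her actual workflow):

```python
import re

# Crossref's recommended pattern for modern DOIs
DOI_RE = re.compile(r'10\.\d{4,9}/[-._;()/:A-Za-z0-9]+')

# Invented works-cited entries
citations = [
    "Smith, J. (2010). Serials in transition. Serials Review, 36(2), "
    "101-108. doi:10.1016/j.serrev.2010.02.004",
    "Jones, A. (2008). Journals and their readers. Learned Publishing, "
    "21(1), 15-22.",
]

for c in citations:
    match = DOI_RE.search(c)
    if match:
        print("found:", match.group())
    else:
        print("no DOI; needs a manual lookup")
```

Entries with no embedded DOI would still need a manual lookup or a query against the Crossref metadata API, which is where the error rate she mentions comes in.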

Iowa does have some subscription-based titles, so part of the management involves keeping up with a subscriber list and IP addresses. All of the titles eventually fall into open access.

Most of the work at Iowa has been with retrospective content — taking past print publications and digitizing them. They are also concerned with making sure the content follows current standards that are used by both library systems and Google Scholar.

There is more. I couldn’t take notes and keep time towards the end.

NASIG 2012: Results of Web-scale discovery — Data, discussions and decisions

Speaker: Jeff Daniels, Grand Valley State University

GVSU has had Summon for almost three years — longer than almost any other library.

Whether you have a web-scale discovery system or are looking at getting one, you need to keep asking questions about it to make sure you’re moving in the right direction.

1. Do we want web-scale discovery?
Federated searching never panned out, and we’ve been looking for an alternative ever since. Web-scale discovery offers that alternative, to varying degrees.

2. Where do we want it?
Searching at GVSU before Summon — keyword (Encore), keyword (classic), title, author, subject, journal title
Searching after Summon — search box is the only search offered on their website now, so users don’t have to decide first what they are searching
The heat map of clicks indicates the search box was the most used part of the home page, but they still had some confusion, so they made the search box even more prominent.

3. Who is your audience?
GVSU focused on 1st and 2nd year students as well as anyone doing research outside their discipline — i.e. people who don’t know what they are looking for.

4. Should we teach it? If so, how?
What type of class is it? If it’s a one-off instruction session with the audience you are directing to your web-scale discovery, then teach it. If not, then maybe don’t. You’re teaching the skill-set more than the resource.

5. Is it working?
People are worried that known-item searches (i.e. catalog items) will get lost. GVSU found that known items make up less than 1% of the Summon index, but over 15% of the items selected from searches come from that pool.
Usage statistics from publisher-supplied sources might be skewed, but look at your link resolver stats for a better picture of what is happening.

GVSU measured use before and after Summon, and they expected searches to go down for A&I resources. They did, but ultimately decided to keep them because they were needed for accreditation, they had been driving advanced users to them via Summon, and publishers were offering bundles and lower pricing. For the full-text aggregator databases, they saw a decrease in searching, but an increase in full-text use, so they decided to keep them.

Speaker: Laura Robinson, Serials Solutions

Libraries need information that will help us make smart decisions, much like what we provide to our users.

Carol Tenopir looked at the value gap between the amount libraries spend on materials and the perceived value of the library. Collection size matters less these days — it’s really about access. Traditional library metrics fail to capture the value of the library.

tl;dr — Web-scale discovery is pretty awesome and will help your users find more of your stuff, but you need to know why you are implementing it and who you are doing it for, and ask those questions regularly even after you’ve done so.