vufind – eclectic librarian

#libday8 day 4 — lies, damn lies, and statistics

[Image: How to Lie with Statistics by Darrell Huff & Irving Geis]

My day began with organizing and prioritizing the action items that arrived yesterday when I was swamped with web-scale discovery service presentations. I didn’t get very far when it was time to leave for a meeting about rolling out VuFind locally. Before that meeting, I dropped in to update my boss (and interim University Librarian) on some things that came out of the presentations and subsequent hallway discussions.

At the VuFind meeting, we discussed some tweaks and modifications, and most everyone took on some assignments to revise menu labels, record displays, and search options. I managed to evade an assignment only because these things are more for reference, cataloging, and web services. The serials records look fine and appear accurately in the basic search (from the handful of tests I ran), so I’m not concerned about tweaking anything specifically.

Back at my desk, I started to work on the action items again, but the ongoing conversations about the discovery service presentations distracted me until one of the reference librarians provided me with a clue about the odd COUNTER use stats we’ve received from ProQuest for 2011.

I had given her stats on a resource that was on the CSA platform, but for the 2011 stats I provided what ProQuest gave me, which were dubious in their sudden increase (from 15 in 2010 to 4756 in 2011). She made a comment about how the low stats didn’t surprise her because she hates teaching the Illumina platform. I said it should be on the ProQuest platform now because that’s where the stats came from. She said she’d just checked the links on our website, and they’re still going to Illumina.

This puzzled me, so I pulled the CSA stats from 2011, and indeed, we had only 17 searches for the year for this index. I checked the website and LibGuides links, and we’re still sending users to the Illumina platform, not ProQuest. So, I’m not sure where those 4756 searches were coming from, but their source might explain why our total ProQuest stats tripled in 2011. This led me to check our federated search stats, and while they show quite a few searches of ProQuest databases (although not this index, as we hadn’t included it), our DB1 report shows zero federated searches and sessions.
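For anyone who wants to run the same sanity check on their own numbers, here is a minimal sketch of the comparison I did by eye. The three counts are the ones mentioned above; the 5x threshold and the function names are my own assumptions, not anything from COUNTER or ProQuest.

```python
# Searches reported for the same index (figures from this post).
reported = {
    "2010 (as reported)": 15,
    "2011 (CSA Illumina)": 17,
    "2011 (ProQuest)": 4756,
}

# Hypothetical threshold: anything more than 5x the prior year gets a second look.
SUSPICIOUS_FACTOR = 5.0


def growth_factor(previous: int, current: int) -> float:
    """Return how many times larger the current count is than the previous one."""
    if previous <= 0:
        raise ValueError("previous count must be positive")
    return current / previous


def is_suspicious(previous: int, current: int,
                  threshold: float = SUSPICIOUS_FACTOR) -> bool:
    """Flag a year-over-year jump that exceeds the chosen threshold."""
    return growth_factor(previous, current) > threshold


baseline = reported["2010 (as reported)"]
for label in ("2011 (CSA Illumina)", "2011 (ProQuest)"):
    count = reported[label]
    verdict = "ask the vendor" if is_suspicious(baseline, count) else "plausible"
    print(f"{label}: {growth_factor(baseline, count):.1f}x the 2010 count ({verdict})")
```

The threshold is arbitrary. The point is only that a jump of this size is worth questioning before it goes into a renewal report.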

I compiled all of this and sent it off to ProQuest customer support. I’m eager to see what their response will be.

This brought me up to my lunch break, which I spent at the gym where one of the trainers forced my compatriots and me to accomplish challenging and strenuous activities for 45 min. After my shower, I returned to the library to lunch at my desk and respond to some crowd-sourced questions from colleagues at other institutions.

I managed to whack down a few email action items before my ER&L co-presenter called to discuss the things we need to do to make sure we’re prepared for the panel session. We’re pulling together seasoned librarians and product representatives from five different electronic resource management systems (four commercial, one open-source) to talk about their experiences working with the products. We hashed out a few things that needed hashing out, and ended the call with more action items on our respective lists.

At that point, I had about 20 min until my next meeting, so I tracked down the head of research and instruction to hash out some details regarding the discovery service presentations that I wanted to make sure she was aware of. I’m glad I did, because she filled in some gaps I had missed, and later she relayed a positive response from one of the librarians that concerned both of us.

The meeting ended early, so I took the opportunity of suddenly unscheduled time in my calendar to start writing down this whole thing. I’d been so busy I hadn’t had time to journal this throughout the day like I’d previously done.

Heard back from ProQuest, and although they haven’t addressed the missing federated search stats from their DB1 report, they explain away the high number of searches in this index as having come from a subject area search or the default search across all databases. There was (and may still be) a problem with defaulting to all databases if the user did not log out before starting a new session, regardless of which database they intended to use. PQ tech support suggested looking at their non-COUNTER report that includes full-text, citation, and abstract views for a more accurate picture of what was used.

For the last stretch of the day, I popped on my headphones, cranked up the progressive house, and tried to power through the rest of the email action items. I didn’t get very far, as the first one required tracking down use stats and generating a report for an upcoming renewal. Eventually, I called it a day and posted this. Yay!

CiL 2008: Woepac to Wowpac

Moderator: Karen G. Schneider – “You’re going to go un-suck your OPACs, right?”

Speaker: Roy Tennant

Tennant spent the last ten years trying to kill off the term OPAC.

The ILS is your back end system, which is different from the discovery system (the discovery system doesn’t replace the ILS). Both of these systems can be locally configured or hosted elsewhere. WorldCat Local is a particular kind of discovery system that Tennant will talk about if he has time.

Traditionally, users would search the ILS to locate items, but now the discovery system will search the ILS and other sources and present the results to the user in a less “card catalog” way. Things to consider: Do you want to replace your ILS or just your public interface? Can you consider open source options (Koha, Evergreen, VuFind, LibraryFind, etc.)? Do you have the technical expertise to set it up and maintain it? Are you willing to regularly harvest data from your catalog to power a separate user interface?

Speaker: Kate Sheehan

Speaking from her experience of being at the first library to implement LibraryThing for Libraries.

The OPAC sucks, so we look for something else, like LibraryThing. The users of LibraryThing want to be catalogers, which Sheehan finds amusing (and so did the audience) because so few librarians want to be catalogers. “It’s a bunch of really excited curators.”

LibraryThing for Libraries takes the information available in LibraryThing (images, tags, etc.) and drops it into the OPAC (platform independent). The display includes other editions of books owned by the library, recommendations based on what people actually read, and a tag cloud. The tag cloud links to a tag browser that opens up on top of the catalog and allows users to explore other resources in the catalog based on natural language tags rather than just subject headings. Using a Greasemonkey script in your browser, you can also incorporate user reviews pulled from LibraryThing. Statistics show that the library is averaging around 30 tag clicks and 18 recommendations per day, which is pretty good for a library that size.

“Arson is fantastic. It keeps your libraries fresh.” — Sheehan joking about an unusual form of collection weeding (Danbury was burnt to the ground a few years ago)

Data doesn’t grow on trees. Getting a bunch of useful information dropped into the catalog saves staff time and energy. LibraryThing for Libraries didn’t ask for a lot from patrons, and it gave them a lot in return.

Speaker: Cindi Trainor

Are we there yet? No. We can buy products or use open source programs, but they still are not the solution.

Today’s websites consist of content, community (interaction with other users), interactivity (single user customization), and interoperability (mashups). RSS feeds are the intersection of interactivity and content. There are a few websites that are in the sweet spot in the middle of all of these: Amazon (26/32)*, Flickr (26/32), Pandora (20/32), and Wikipedia (21/32) are a few examples.

Where are the next generation catalog enhancements? Each product has a varying degree of each element. Using a scoring system with 8 points for each of the four elements, these products were ranked: Encore (10/32), LibraryFind (12/32), Scriblio (14/32), and WorldCat Local (16/32). Trainor looked at whether the content lived in the system or elsewhere and the degree to which it pulled information from sources not in the catalog. Library products still have a long way to go – Voyager scored a 2/32.

*Trainor’s scoring system as described in paragraph three.
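To see the gap between the sweet-spot websites and the library products side by side, here is a small sketch that ranks the totals reported above. Only the totals come from the talk; the per-element breakdowns weren't in my notes, so the code works from totals alone, and the alphabetical tie-break is my own choice.

```python
# Four elements (content, community, interactivity, interoperability), 8 points each.
MAX_SCORE = 4 * 8

# Totals as reported in the session notes above.
scores = {
    "Amazon": 26,
    "Flickr": 26,
    "Pandora": 20,
    "Wikipedia": 21,
    "Encore": 10,
    "LibraryFind": 12,
    "Scriblio": 14,
    "WorldCat Local": 16,
    "Voyager": 2,
}


def ranked(totals):
    """Sort by total score, highest first; ties are broken alphabetically."""
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


for name, total in ranked(scores):
    print(f"{name:15} {total:2d}/{MAX_SCORE} ({total / MAX_SCORE:.0%})")
```

Even the best-scoring library product in this list, WorldCat Local, reaches only half of the available points.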

Speaker: John Blyberg

When we talk about OPACs, we tend to fetishize them. In theory, it’s not hard to create a Wowpac. The difficulty is in creating the system that lives behind it. We have lost touch with the ability to empower ourselves to fix the problems we have with integrated library systems and our online public access catalogs.

The OPAC is a reflection of the health of the system. The OPAC should be spilling out onto our website and beyond, mashing it up with other sites. The only way that can happen is with a rich API, which we don’t have.

The title of systems librarian is becoming redundant because we all have a responsibility and role in maintaining the health of library systems. In today’s information ecology, there is no destination — we’re online experiencing information everywhere.

There is no way to predict how the information ecology will change, so we need systems that will be flexible and can grow and change over time. (Sopac 2.0 will be released later this year for libraries who want to do something different with their OPACs.) Containers will fail. Containers are temporary. We cannot hang our hat on one specific format — we need systems that permit portability of data.

Nobody in libraries talks about “the enterprise” like they do in the corporate world. Design and development of the enterprise cannot be done by a committee, unless they are simply advisors.

The 21st century library remains un-designed – so let’s get going on it.

CiL 2008: The New Generation of Library Interfaces

Speaker: Marshall Breeding

[You can find the slides from this presentation on the Library Technology Guides website.]

An OCLC study in 2005 indicated that only 2% of college students begin their research with a library website or catalog, as opposed to 89% using a search engine like Google. The 2007 report indicates that library website use is down 10%. (These are surprising statistics given the data from the studies presented yesterday that indicated that students would go to a library website first. I have to wonder if the study looked at library catalogs specifically. Also, wondering if OCLC has another motive for making it seem like library catalogs are horrible. WorldCat Local, anyone?)

Okay, back to Breeding’s talk.

Library catalogs do need to be something more than a computerized version of the card catalog. OPACs suck. We have a disjointed approach to information and service delivery with the different interfaces for finding books, articles, databases, etc., and we need to do something better.

What do we call it besides OPAC? We need to re-define it, but we don’t have a name for this new thing yet. It needs powerful search engines and a clean interface with a comprehensive body of information available to our users, down to the article level. The system needs to favor electronic resources as much as our current systems favor print resources, and both should be in the same interface.

We need to be able to do deep searching of article-level items within our own interface by harvesting the data much like the Open Archives Initiative. (Publishers will be resistant to this, and in particular, aggregators that spend a lot of time and money on R&D for their search interfaces. There are a few libraries experimenting with this, but I think we need some turnkey technologies before more libraries adopt this approach.)

Web 2.0 tools can provide some options for tweaking interfaces and bringing them together, but it needs to be seamless and not cobbled together like what we have now.

Interface features that Breeding wants: simple point of entry, relevancy ranked results (users expect that the “good stuff” will be listed first), facets for narrowing and navigation (let users drill down through the results set, incrementally narrowing the field, rather than using Boolean or “advanced search”), query enhancement (validated spell check, automatic inclusion of authorized and related terms, etc.), suggested related results (make the query and the response to it better than the query provided), navigational bread crumbs (select/deselect facets), and a few more that I didn’t type fast enough to get. You want to make it so easy that users aren’t thinking about the interface but rather are thinking about the content.
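The facet and bread crumb ideas are easier to picture with a toy example. This is a rough sketch of my own, not how any of the products mentioned actually work: the records, field names, and function names are all made up for illustration.

```python
from collections import Counter

# Hypothetical result set; the fields and values are illustrative only.
records = [
    {"id": 1, "format": "book", "language": "English"},
    {"id": 2, "format": "book", "language": "Spanish"},
    {"id": 3, "format": "article", "language": "English"},
    {"id": 4, "format": "article", "language": "English"},
    {"id": 5, "format": "video", "language": "English"},
]


def facet_counts(results, field):
    """Count how many results fall under each value of a facet field."""
    return Counter(record[field] for record in results)


def narrow(results, selected):
    """Keep only the results that match every selected facet value."""
    return [
        record for record in results
        if all(record.get(field) == value for field, value in selected.items())
    ]


# The bread crumbs are just the currently selected facets.
breadcrumbs = {"format": "article"}
print(facet_counts(records, "format"))
print([r["id"] for r in narrow(records, breadcrumbs)])

# Selecting a second facet narrows further; deselecting one widens again.
breadcrumbs["language"] = "English"
print([r["id"] for r in narrow(records, breadcrumbs)])
del breadcrumbs["format"]
print([r["id"] for r in narrow(records, breadcrumbs)])
```

The user never writes a Boolean query here. Each click adds or removes one entry in the bread crumb trail, and the result set is recomputed from it.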

We need appropriate organizational structures, such as faceted applications of subject terminology, discipline-specific thesauri or ontologies, and tags. We need enriched content like book jacket images (Syndetic Solutions, Amazon Web Services, Google Book Search API, etc.), rating scores, and an as-yet-unavailable open content solution. We need a personalized user experience with a single sign-on that is persistent throughout the session. We are entering the post-metadata search era with full-text searching of books becoming more available, in conjunction with the already available full-text searching of journals, so metadata searching is less and less necessary, thus making the strict rules for cataloging and indexing less necessary, allowing us to focus on other aspects of cataloging. We need to move beyond the discovery of content to the delivery of content. We need library-specific features such as appropriate relevance factors (keyword rankings + library weightings, circulation frequency, other library holdings, scholarly content, etc.), results grouping (FRBR), and a collection focus.
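One way to read "keyword rankings + library weightings" is as a weighted sum. The sketch below is my own guess at what that could look like, not something Breeding presented: the weights, field names, and sample items are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    keyword_score: float  # 0..1, as returned by the search engine
    circulation: int      # local checkouts
    other_holdings: int   # number of other libraries holding the item
    scholarly: bool


# Hypothetical weights; a real system would tune these locally.
WEIGHTS = {"keyword": 0.6, "circulation": 0.2, "holdings": 0.1, "scholarly": 0.1}


def relevance(item, max_circulation, max_holdings):
    """Combine the keyword score with library-specific signals."""
    circulation = item.circulation / max_circulation if max_circulation else 0.0
    holdings = item.other_holdings / max_holdings if max_holdings else 0.0
    return (
        WEIGHTS["keyword"] * item.keyword_score
        + WEIGHTS["circulation"] * circulation
        + WEIGHTS["holdings"] * holdings
        + WEIGHTS["scholarly"] * (1.0 if item.scholarly else 0.0)
    )


def rank(items):
    """Sort items by combined relevance, normalizing counts within the result set."""
    max_circulation = max((i.circulation for i in items), default=0)
    max_holdings = max((i.other_holdings for i in items), default=0)
    return sorted(
        items,
        key=lambda i: relevance(i, max_circulation, max_holdings),
        reverse=True,
    )


results = [
    Item("A", keyword_score=0.9, circulation=0, other_holdings=10, scholarly=False),
    Item("B", keyword_score=0.8, circulation=50, other_holdings=200, scholarly=True),
    Item("C", keyword_score=0.5, circulation=10, other_holdings=50, scholarly=True),
]
print([item.title for item in rank(results)])
```

In this toy data, item B outranks item A despite a slightly weaker keyword match, because it circulates heavily and is widely held.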

Take your content and services to where the users are. Wed library-specific requirements and expectations with the content-delivery sophistication of e-commerce tools.

Can the library community bear the cost of this new OPAC? Can we afford to not do it or do it so slowly that we become irrelevant? We don’t have another 3-5 years to get to where we should have been five years ago.

A few interfaces we have today: Endeca, AquaBrowser Library, Ex Libris Primo, Innovative Interfaces Encore, OCLC WorldCat Local, The Library Corporation’s Indigo, LibraryThing for Libraries as an add-on, Scriblio (WordPress based), VuFind, eXtensible Catalog, and various ILSs with next-gen features (Polaris, Koha, Evergreen).

CiL 2008: What’s New With Federated Search

Speakers: Frank Cervone & Jeff Wisniewski

Cervone gave a brief overview of federated searching, with Wisniewski giving a demonstration of how it works in the real world (aka University of Pittsburgh library) using WebFeat. UofP library has a basic search front and center on their home page, and then a more advanced searching option under Find Articles. They don’t have a Database A-Z list because users either don’t know what database means in this context or can’t pick from the hundreds available.

Cervone demonstrated the trends in using meta search, which seems to go up and down, but overall is going up. The cyclical aspect due to quarter terms was fascinating to see, and more dramatic than what one might find with semester terms. Searches go up towards mid-terms and finals, then drop back down afterwards.

According to a College & Research Libraries article from November 2007, federated search results were not much different from native database searches. It also found that faculty rated results of federated searching much higher than librarians did, which raises the question, “Who are we trying to satisfy: faculty/students or librarians?”

Part of why librarians are still unconvinced is because vendors are shooting themselves in the foot in the way they try to sell their products. Yes, federated search tools cannot search all possible databases, but our users are only concerned that they search the relevant databases that they need. De-duplication is virtually impossible and depends on the quality of the source data. There are other ways that vendors promote their products in ways that can be refuted, but the presenters didn’t spend much time on them.

The relationships between products and vendors are incestuous, and the options for federated searching are decreasing. There are a few open source options, though: LibraryFind, dbWiz, Masterkey, and Open Translators (provides connectors to databases, but you have to create the interface). Part of why open source options are being developed is because commercial vendors aren’t responding quickly to library needs.

LibraryFind has a two-click find workflow, making it quicker to get to the full-text. It also can index local collections, which would be handy for libraries who are going local.

dbWiz is a part of a larger ERM tool. It has an older, clunkier interface than LibraryFind. It doesn’t merge the results.

Masterkey can search 100 databases at a time, processing and returning hits at the rate of 2000 records per second, de-duped (as much as it can) and ranked by relevance. It can also do faceted browsing by library-defined elements. The interface can be as simple or complicated as you want it to be.
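To make the merge, de-dupe, and rank step concrete, here is a rough sketch of my own. It is not how Masterkey or any other product does it: the record fields, the sample DOI, and the matching rule (identifier first, normalized title as a fallback) are all assumptions for illustration.

```python
import re


def dedupe_key(record):
    """Prefer a standard identifier; fall back to a normalized title."""
    if record.get("doi"):
        return ("doi", record["doi"].lower())
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return ("title", title)


def merge(result_sets):
    """Merge per-database results, keeping the best-scoring copy of each record."""
    best = {}
    for source, records in result_sets.items():
        for record in records:
            key = dedupe_key(record)
            existing = best.get(key)
            if existing is None:
                best[key] = {**record, "sources": [source]}
            elif record["score"] > existing["score"]:
                best[key] = {**record, "sources": existing["sources"] + [source]}
            else:
                existing["sources"].append(source)
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


# Hypothetical results from two databases; the DOI is a made-up placeholder.
result_sets = {
    "db_a": [
        {"title": "Federated Search: A Review", "doi": "10.1000/XYZ", "score": 0.7},
        {"title": "Open Source Discovery", "score": 0.4},
    ],
    "db_b": [
        {"title": "Federated search - a review.", "doi": "10.1000/xyz", "score": 0.9},
        {"title": "Open source discovery!", "score": 0.5},
    ],
}

for record in merge(result_sets):
    print(record["score"], record["title"], record["sources"])
```

A record that carries a DOI in one database and none in the other would not be matched by this rule, which is the "depends on the quality of the source data" problem in miniature.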

Federated searching as a stand-alone product is becoming passé as new products for interfacing with the OPAC are being developed, which can incorporate other library databases. VuFind, WorldCat Local, Encore, Primo, and AquaBrowser are just a few of the tools available. NextGen library interfaces aim to bring all library content together. However, they don’t integrate article-level information with the items in your catalog and local collections very well.

Side note: Microsoft Enterprise Search is doing a bit more than Google in integrating a wide range of information sources.

Trends: Choices from vendors are rapidly shrinking. Some progress in standards implementation. Visual search (like Grokker) is increasingly being used. Some movement to more holistic content discovery. Commercial products are becoming more affordable, making them available to institutions with budgets of all sizes.

Federated Search Blog for vendor-neutral info, if you’re interested.
