ER&L 2014 — Mining and Application of Diverse Cultural Perspectives in User-Generated Content

“Coal Mining in Brazil” by United Nations Photo

Speaker: Brent Hecht, Ph.D., University of Minnesota

His research sits at the intersection of human-computer interaction, geography, and big data. What you won't see here, he notes, is libraries.

Wikipedia has revolutionized computing in two ways: by giving users access to a large repository of knowledge and by being hugely popular. In certain cases it has become the brains of modern computing.

However, it’s written by people who exist in a complex cultural context: region, gender, religion, etc. The cultural biases of Wikipedia have an impact on related computer processes. Librarians and educators are keenly aware of the caveats of Wikipedia such as accuracy and depth, but we also need to think about cultural bias.

Wikipedia exists in a large number of languages, but until recently not much was understood about the relationships between them. Computer scientists assume that larger language editions are supersets of smaller ones and conceptually consistent across them. Social scientists know that each cultural community defines things differently and will cover a unique set of concepts. Several studies have looked into this.

A vast majority of concepts appear in only one language. Only a fraction of a percent are in all languages.
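The overlap analysis behind these findings can be sketched with set operations over the interlanguage-link data. This is a toy illustration with made-up article titles, not real Wikipedia dumps; in practice the language-to-concept mapping comes from interlanguage links in the database dumps.

```python
from collections import Counter

# Hypothetical "language edition -> set of article concepts" data.
editions = {
    "en": {"Coal", "Jazz", "Minneapolis", "Feijoada"},
    "pt": {"Coal", "Feijoada", "Carnaval"},
    "de": {"Coal", "Jazz", "Oktoberfest"},
}

# Count how many editions cover each concept.
coverage = Counter(c for concepts in editions.values() for c in concepts)

only_one = sum(1 for n in coverage.values() if n == 1)
in_all = sum(1 for n in coverage.values() if n == len(editions))

print(f"{only_one / len(coverage):.0%} of concepts appear in only one edition")
print(f"{in_all / len(coverage):.0%} appear in every edition")
```

With real data the "only one edition" share dominates, which is the finding above.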

If you read only the English article about a concept that also has an article in another language edition, you miss about 29% of the information you could have gained from that other article.

Some of the differences are due to cultural factors. Each language edition will have a bias towards countries where the language is prominent.

What can we do to take advantage of this? Omnipedia tries to break down the language silos to provide a diverse repository of world knowledge, highlighting the similarities and differences between the versions. The interface can be switched to display and search in any of the 25 languages covered.

Search engines are good for closed informational requests and navigational queries, but not so great for exploratory search. Atlasify tries to map concepts to regions. When the user clicks on an entity in the map, it displays (in natural language) the relationship between the query and the location. They know this kind of mapping doesn't work for every concept, but the idea of mapping search query concepts can be applied to other visualizations like the periodic table or congressional seat assignments.

Bear in mind, though, that all of these tools are sensitive to the biases of their data sources. If they use only the English Wikipedia, they can miss important pieces, or worse, perpetuate the cultural biases.

ER&L: ROI — Why oh why?

Speaker: Doralyn Rossman

How to use a combination of qualitative and quantitative data to tell a story.

ROI is a hot topic. People outside of the library are aware of it. Comparing yourself to other libraries is challenging because your missions are different. Showing how you contribute to the mission of your institution is much more valuable.

Methods of assessment: ROI, use, impact, alternative comparison (lib versus other service), customer satisfaction & outcomes, and commodity production (services, facilities, resources).

When you tell your story, start at the top: strategic plan, accreditation, etc. Give administrators information in the language they need to share with others. What do they need to know that they don’t know they need to know? What do they not want to know?

There are plenty of examples out there — do your homework.

Quantitative metrics: COUNTER, simultaneous users, multiple-year deals, capped inflation, staffing & workflows, reference queries, instruction sessions, citation reports & impact factor of collection, cost if purchased individually.
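Two of these metrics, cost per use and "cost if purchased individually," reduce to simple arithmetic on your COUNTER usage reports. The figures below are hypothetical, for illustration only; real numbers come from your usage reports and invoices.

```python
# Hypothetical figures -- substitute your own.
subscription_cost = 12_000.00      # annual package cost
counter_downloads = 4_800          # full-text requests from a COUNTER report
list_price_per_article = 35.00     # typical pay-per-view price

cost_per_use = subscription_cost / counter_downloads
cost_if_purchased_individually = counter_downloads * list_price_per_article
cost_avoidance = cost_if_purchased_individually - subscription_cost

print(f"cost per use: ${cost_per_use:.2f}")
print(f"cost avoidance vs. pay-per-view: ${cost_avoidance:,.2f}")
```

Numbers like these translate directly into the "cost avoidance for users" story mentioned below.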

Qualitative metrics: relevance to curriculum, formatting efficiencies, user self-sufficiency, condition and usability of collection, proactive troubleshooting, MINES protocol from ARL.

Story: cost avoidance for users, reduced cost of course materials, quick access to research materials for faculty and grant work, attracting and retaining faculty/students, what would you do if the library didn't exist, contribution to the strategic plan.

Example: the University of Tennessee surveyed faculty about their grant proposal process. They found that faculty used more materials and resources than were reflected in the proposals themselves. Library resources matter at every stage of the grant and publishing process.

LIBvalue project is a good resource. It’s generated by an IMLS grant following up on the ROI research at various institutions. Recommended reading.

If you’re not already collecting data, start now. You want a long-term study. Build it into your routines so you do it on a regular basis.

There are no cookie cutter methods. You have to know what story you want to tell, and then find the data to do that. Each situation/institution will be unique.

From the audience: Sense Maker is a good tool for capturing qualitative data.

IL 2010: Personal Content Management

speaker: Gary Price

Giving generalities about mobile devices is challenging because there are so many options. If your library doesn’t already have a mobile website, go for a web app rather than something platform specific.

The cloud can be a good backup for when your devices fail, since you can access it from other places. But, choose a cloud service or backup service carefully – consider reputation and longevity. If you see something you want to preserve for future use, save it now because it could be gone later. Capture it yourself and keep it local.

Back up your computer (pay now or pay later). Price recommends Mozy and Carbonite. Also, pay attention to the restore options (internet vs. DVD).

[I kinda zoned out at this point, as I’m pretty sure he’s not going to talk about much of anything I don’t already know about or will read about on Lifehacker. Unfortunately, choosing a seat in the front row prevents me from politely leaving to attend a different session.]

Ithaka’s What to Withdraw tool

Have you seen the tool that Ithaka developed to determine which print scholarly journals you could withdraw (discard/store) because they are already in your digital collections? It's pretty nifty for a spreadsheet. About 10-15 minutes of playing with it and a list of our print holdings produced around 200 actionable titles in our collection, which I passed on to our subject liaison librarians.
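The spreadsheet exercise boils down to intersecting two lists by ISSN: titles you hold in print, and titles the tool flags as withdrawal candidates. A minimal sketch, with made-up ISSNs (real data would come from your ILS export and the tool's spreadsheet):

```python
# Hypothetical ISSNs for illustration.
local_print_issns = {"0002-9556", "0003-049X", "0021-8723", "0161-391X"}
withdraw_candidates = {"0003-049X", "0021-8723", "1234-5678"}

# Titles we hold in print that the tool flags as safe to withdraw.
actionable = sorted(local_print_issns & withdraw_candidates)
print(f"{len(actionable)} actionable titles:", actionable)
```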

The guys who designed it are giving some webinar sessions, and I just attended one. Here are my notes, for what it’s worth. I suggest you participate in a webinar if you’re interested in it. The next one is tomorrow and there’s one on February 10th as well.


  • They have an organizational commitment to preservation: JSTOR, Portico, and Ithaka S+R
  • Libraries are under pressure to both decrease their print collections and to maintain some print copies for the library community as a whole
  • Individual libraries are often unable to identify materials that are sufficiently well-preserved elsewhere
  • The What to Withdraw framework is for general collections of scholarly journals, not monographs, rare books, newspapers, etc.
  • The report/framework is not meant to replace the local decision-making process

What to Withdraw Framework

  • Why do we need to preserve the print materials once we have a digital version?
    • Fix errors in the digital versions
    • Replace poor quality scans or formats
    • Inadequate preservation of the digital content
    • Unreliable access to the digital content
    • Also, local politics or research needs might require access to or preservation of the print
  • Once they developed the rationales, they created specific preservation goals for each category of preservation and then determined the level of preservation needed for each goal.
    • Importance of images in journals (the digitization standards for text are not the same as for images, particularly color images)
    • Quality of the digitization process
    • Ongoing quality assurance processes to fix errors
    • Reliability of digital access (business model, terms & conditions)
    • Digital preservation
  • Commissioned Candace Yano (operations researcher at UC Berkeley) to develop a model for the number of copies needed to meet preservation goals, assuming an annual loss rate of 0.1% for a dark archive.
    • As a result, they found they needed only two copies to have >99% confidence that copies will remain after twenty years.
    • As a community, this means we need to be retaining at least two copies, if not more.
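A back-of-the-envelope version of that calculation (the actual Yano model is more sophisticated; this assumes independent losses at the stated 0.1% annual rate):

```python
# With a 0.1% annual loss rate, what is the chance that at least one of
# N dark-archive copies survives twenty years?
annual_loss = 0.001
years = 20

# Probability a single copy survives all twenty years.
p_copy_survives = (1 - annual_loss) ** years

def p_at_least_one(n):
    """Probability at least one of n independent copies survives."""
    return 1 - (1 - p_copy_survives) ** n

for n in (1, 2, 3):
    print(f"{n} copies: {p_at_least_one(n):.4%} confidence")
```

Two copies already clear the 99% bar, consistent with the finding above; one copy does not.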

Decision-Support Tool (proof of concept)

  • JSTOR is an easy first step: many libraries have the resource, many own print copies of the titles in the collections, and Harvard & UC already have dim/dark archives of JSTOR titles
  • The tool provides information to help libraries identify titles that are held by the Harvard & UC libraries and that also have relatively few images

Future Plans

  • Would like to apply the tool to other digital collections and dark/dim archives, and they are looking for partners in this
  • Would also like to incorporate information from other JSTOR repositories (such as Orbis-Cascade)

IL2009: Retooling Technical Services for the Digital Environment

Speaker: Brad Eden

We’ve heard it all before — we’re doing more with less, our work is changing dramatically, but the skill sets needed have not changed.

There are several catalysts for the change in libraries. The current economic trends indicate that state support for higher education will dwindle to nothing. We need to find different funding sources for higher education. The original hope for the Google book project was that it would challenge publishers, but instead they’ve turned around and become a publisher themselves, so we need to be prepared to pay for and provide access for our users.

People think everything is on the internet, which makes getting funding for library spaces and collections challenging, so we need to repurpose our existing spaces. We need to shift our spending toward developing access to unique local collections.

We need to move from local to network level collaboration in metadata and resource sharing. We need to train our users on how to retrieve information through their mobile devices.

[We need a speaker who recognizes that his audience is mostly technical-oriented people who are ready to change as needed and don’t need a pep talk to do it.]

Speakers: Doris Helfer, Mary Woodley, & Helen Heinrich

Retirements and new hires provided the impetus to change, along with a need to revise processes to reflect how things had changed over the past couple of decades. They decided to do the analysis in-house because of the level of staff interest and experience. They set out several objectives: eliminate duplicate & unnecessary tasks, streamline workflows, leverage technology, and explore vendor services.

They reduced the number of steps in copy cataloging (stopped checking sources and editing TOC information). They gave up shelf listing and decided to deal with duplicate call numbers if they came up. Aim for one-touch handling. Let go of perfection.