Speaker: Caitlin Trasande, Head of Research Policy, Digital Science
Social impact is the emerging bacon.
Digital Science supports and funds startups that build software for research. The scope is the full life cycle of research, from reading the literature to planning and conducting experiments to publishing and sharing the data. The disgrunterati are those who decided to stop complaining about broken processes and instead build better products and models.
[insert overview of several projects funded by Digital Science]
Information may want to be free, but it needs to be accessible and understandable.
Speaker: T. Scott Plutchak, Director of Digital Data Curation Strategies, The University of Alabama at Birmingham
Data is the new bacon. Data is the hot buzzword in scholarly publishing. He is working on the infrastructure, services, and policies needed to manage data on an institutional level.
Concern about data has been around for a long time. NIH developed their first policy in 2003, but it was pretty weak. Things got serious when the public access policy became mandatory in 2009. NSF developed a data management policy in 2011, which got a little more attention.
A scholarly publishing roundtable was created in 2009, reporting in 2010, made up of university administrators, librarians, publishers, and researchers. They recommended flexible policies for each agency, developed in collaboration with their constituencies.
Libraries should be thinking about how and where and what kinds of data they should store and manage.
My small liberal arts university probably will have to do some things with this, but not to the extent he’s talking about. This is an R1 library problem, not a library problem at large. Yet.
This is going to be long and not my usual style of conference notetaking. Because this was an unconference, there really wasn’t much in the way of prepared presentations, except for the lightning talks in the morning. What follows below the jump is what I captured from the conversations, often simply questions posed that were left open for anyone to answer, or at least consider.
One of the good aspects of the unconference style was the free-form nature of the discussions. We generally stayed on topic, but even when we didn’t, it was a relevant or important thing that led to the tangent, so there were still plenty of things to take away. However, this format also requires someone present who is prepared to seed the conversation if it lulls or dies and no one steps in to start a new topic.
Also, if a session is designed to be a conversation around a topic, it will fall flat if it becomes all about one person or the quirks of their own institution. I had to work pretty hard on that one during the session I led, particularly when it seemed that the problem I was hoping to discuss wasn’t an issue for several of the folks present because of how they handle the workflow.
Some of the best conversations I had were during the gathering/breakfast time as well as lunch, lending even more to the unconference ethos of learning from each other as peers.
Don’t try to write a book about fast-moving subjects.
He was trying to capture the nature of our relationship to Google. It provides us with services that are easy to use, fairly dependable, and well designed. However, that level of success can breed hubris. He was interested in how this drives the company to its audacious goals.
It strikes him that what Google claims to be doing is what librarians have been doing for hundreds of years already. He found himself turning to the core practices of librarians as a guideline for assessing Google.
Why is Google interested in so much stuff? What is the payoff to organizing the world’s information and making it accessible?
Big data is not a phrase that they use much, but the notion is there. More and faster equals better. Google is in the prediction/advertising business. The Google books project is their attempt to reverse engineer the sentence. Knowing how sentences work, they can simulate how to interpret and create sentences, which would be a simulation of artificial intelligence.
The NSA’s deals that give them a backdoor to our data services creates data insecurity, because if they can get in, so can the bad guys. Google keeps data about us (and has to turn it over when asked) because it benefits their business model, unlike libraries who don’t keep patron records in order to protect their privacy.
Big data means more than a lot of data. It means we have so many instruments to gather data: cheap, ubiquitous cameras and microphones, GPS devices that we carry with us, credit card records, and more. All of these ways of creating data feed into huge servers that can store it, paired with powerful algorithms that can analyze it. Despite all of this, there is no policy surrounding it, nor conversations about the best ways to manage it in light of the impact on personal privacy. There is no incentive to curb big data activities.
Scientists are generally trained to understand that correlation is not causation. We seem to be happy enough to draw pictures with correlation and move on to the next one. With big data, it is far too easy to stop at correlation. This is a potentially dangerous way of understanding human phenomena. We are autonomous people.
The panopticon was supposed to keep prisoners from misbehaving because they assumed they were always being watched. Foucault described the modern state in the 1970s as the panopticon. However, at this point, it doesn’t quite match. We have a cryptopticon, because we aren’t allowed to know when we are being watched. It wants us to be on our worst behavior. How can we inject transparency and objectivism into this cryptopticon?
Those who can manipulate the system will, but those who don’t know how or that it is happening will be negatively impacted. If bad credit can get you on the no-fly list, what else may be happening to people who make poor choices in one aspect of their lives that they don’t know will impact other aspects? There is no longer anonymity in our stupidity. Everything we do, or nearly so, is online. Mistakes of teenagers will have an impact on their adult lives in ways we’ve never experienced before. Our inability to forget renders us incapable of looking at things in context.
Updates from Serials Solutions – mostly Resource Manager (Ashley Bass):
Keep up to date with ongoing enhancements for management tools (quarterly releases) by following answer #422 in the Support Center, and via training/overview webinars.
Populating and maintaining the ERM can be challenging, so they focused a lot of work this year on that process: license template library, license upload tool, data population service, SUSHI, offline date and status editor enhancements (new data elements for sort & filter, new logic, new selection elements, notes), and expanded and additional fields.
Workflow, communication, and decision support enhancements: in-context help linking, contact tool filters, navigation, new Counter reports, more information about vendors, Counter summary page, etc. Her favorite new feature is “deep linking” functionality (aka persistent links to records in SerSol). [I didn’t realize that wasn’t there before — been doing this for my own purposes for a while.]
Next up (in two weeks, 4th quarter release): new alerts, resource renewals feature (reports! and checklist!, will inherit from Admin data), Client Center navigation improvements (i.e. keyword searching for databases, system performance optimization), new license fields (images, public performance rights, training materials rights) & a few more, Counter updates, SUSHI updates (making customizations to deal with vendors who aren’t strictly following the standard), gathering stats for Springer (YTD won’t be available after Nov 30 — up to Sept avail now), and online DRS form enhancements.
In the future: license API (could allow libraries to create a different user interface), contact tools improvements, interoperability documentation, new BI tools and reporting functionality, and improving the Client Center.
Also, building a new KB (2014 release) and a web-scale management solution (Intota, also coming 2014). They are looking to have more internal efficiencies by rebuilding the KB, and it will include information from Ulrich’s, new content types metadata (e.g. A/V), metadata standardization, industry data, etc.
Summon Updates (Andrew Nagy):
I know very little about Summon functionality, so just listened to this one and didn’t take notes. Take-away: if you haven’t looked at Summon in a while, it would be worth giving it another go.
360 Link Customization via JavaScript and CSS (Liz Jacobson & Terry Brady, Georgetown University):
Goal #1: Allow users to easily link to full-text resources. Solution: Go beyond the out-of-the box 360 Link display.
Goal #2: Allow users to report problems or contact library staff at the point of failure. Solution: eresources problem report form
They created the eresources problem report form using Drupal. The fields include contact information, description of the resource, description of the problem, and the ability to attach a screenshot.
When they evaluated the slightly customized out of the box 360 Link page, they determined that it was confusing to users, with too many options and confusing links. So, they took some inspiration from other libraries (Matthew Reidsma’s GVSU jQuery code available on GitHub) and developed a prototype that uses custom JavaScript and CSS to walk the user through the process.
Some enhancements included: making the links for full-text (article & journal) buttons, hiding additional help information and giving some hover-over information, parsing the citation into the problem report page, and moving the citation below the links to full-text. For journal citations with no full-text, they made the links to the catalog search large buttons with more text detail in them.
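[Not their actual code, but to give a flavor of the approach: a few lines of jQuery along these lines can restyle the full-text links as buttons, tuck the help text behind a hover, and reorder the citation. The selectors here are placeholders, since the real 360 Link markup (and Georgetown’s script) will differ.]

```javascript
// Hypothetical sketch of enhancements like those described above. The
// selectors (.ssl-article-link, .ssl-help, etc.) are placeholders; the real
// 360 Link page uses its own markup, so adjust to match.
$(document).ready(function () {
  // Turn plain article/journal full-text links into prominent buttons.
  $('a.ssl-article-link, a.ssl-journal-link').addClass('fulltext-button');

  // Hide the extra help text and reveal it only on hover.
  $('.ssl-help').hide();
  $('.ssl-help-toggle').hover(
    function () { $(this).next('.ssl-help').show(); },
    function () { $(this).next('.ssl-help').hide(); }
  );

  // Move the citation block below the full-text links.
  $('.ssl-citation').insertAfter('.ssl-fulltext-links');
});
```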
One of the challenges of implementing these changes is the lack of a test environment, because of the limited preview capabilities in 360 Link. Any changes made required an overnight refresh before going live, opening the risk of 24-hour windows of broken resource links. So, they created their own test environment by turning test scenarios into static HTML files and wrapping them in their own custom PHP to mimic the live pages without having to touch the live ones.
[At this point, it got really techy and lost me. Contact the presenters for details if you’re interested. They’re looking to go live with this as soon as they figure out a low-use time that will have minimal impact on their users.]
Customizing 360 Link menu with jQuery (Laura Wrubel, George Washington University)
They wanted to give better visual clues for users, emphasize the full-text, have more local control over links, and visually integrate with other library tools so it’s more seamless for users.
They started with Reidsma’s code, then forked off from it. They added a problem link to a Google form, fixed ebook chapter links and citation formatting, created conditional links to the catalog, and linked to their other library’s link resolver.
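[Again, a sketch rather than their code: the problem-link idea boils down to building a prefilled Google Form URL from the citation on the page. The form URL and entry IDs below are made up.]

```javascript
// Hypothetical sketch: build a "report a problem" link that prepopulates a
// Google Form with citation details scraped from the link resolver page.
// The form URL and entry.* field IDs are placeholders, not a real form.
$(document).ready(function () {
  var title   = $('.ssl-citation .title').text();
  var journal = $('.ssl-citation .journal').text();

  var formUrl = 'https://docs.google.com/forms/d/e/EXAMPLE_FORM_ID/viewform' +
    '?entry.1111=' + encodeURIComponent(title) +
    '&entry.2222=' + encodeURIComponent(journal) +
    '&entry.3333=' + encodeURIComponent(window.location.href);

  $('<a/>', {
    href: formUrl,
    target: '_blank',
    text: 'Report a problem with this link'
  }).appendTo('.ssl-fulltext-links');
});
```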
They hope to continue to tweak the language on the page, particularly for ILL suggestion. The coverage date is currently hidden behind the details link, which is fine most of the time, but sometimes that needs to be displayed. They also plan to load the print holdings coverage dates to eliminate confusion about what the library actually has.
In the future, they would rather use the API and blend the link resolver functionality with catalog tools.
Custom document delivery services using 360 Link API (Kathy Kilduff, WRLC)
They facilitate inter-consortial loans (Consortium Loan Service), and originally requests were only done through the catalog. When they started using SFX, they added a link there, too. Now that they have 360 Link, they still have a link there, but now the request form is prepopulated with all of the citation information. In the background, they are using the API to gather the citation information, as well as checking to see if there are terms of use, and then checking to see if there are ILL permissions listed. They provide a link to the full-text in the staff client developed for the CLS if the terms of use allow for ILL of the electronic copy. If there isn’t a copy available in WRLC, they forward the citation information to the user’s library’s ILL form.
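[To illustrate the shape of this, not their implementation: a request against the 360 Link XML API with the citation in the query string, then a check of the response for a usable article link. The endpoint pattern and element names here are my assumptions; check the Serials Solutions API documentation for the real ones.]

```javascript
// Rough sketch of the idea: query the 360 Link XML API for a citation, then
// decide whether to offer full text or pass the request on to ILL. The
// endpoint pattern and element names are assumptions, not the documented API.
var apiBase = 'http://LIBCODE.openurl.xml.serialssolutions.com/openurlxml';

function findArticleLink(issn, volume, issue, startPage) {
  var url = apiBase + '?version=1.0' +
    '&rft.issn=' + encodeURIComponent(issn) +
    '&rft.volume=' + encodeURIComponent(volume) +
    '&rft.issue=' + encodeURIComponent(issue) +
    '&rft.spage=' + encodeURIComponent(startPage);

  return fetch(url)
    .then(function (resp) { return resp.text(); })
    .then(function (xmlText) {
      var doc = new DOMParser().parseFromString(xmlText, 'application/xml');
      // The real response is namespaced, so match on local name and look
      // for an article-level link (placeholder attribute value).
      var urls = doc.getElementsByTagNameNS('*', 'url');
      for (var i = 0; i < urls.length; i++) {
        if (urls[i].getAttribute('type') === 'article') {
          return urls[i].textContent;
        }
      }
      return null;
    });
}

// Usage: offer the full-text link in the staff client when the terms of use
// allow it; otherwise forward the citation to the user's library's ILL form.
findArticleLink('1234-5678', '12', '3', '45').then(function (link) {
  console.log(link ? 'Full text: ' + link : 'No copy available; forward to ILL');
});
```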
License information for course reserves for faculty (Shanyun Zhang, Catholic University)
They included course reserve terms in the license information, but then it became an issue conveying that information to faculty who were used to negotiating it with publishers directly. Most faculty prefer to use Blackboard for course readings and handle it themselves. But they need to figure out how to incorporate the library into that workflow. She was looking for suggestions from the group.
Advanced Usage Tracking in Summon with Google Analytics (Kun Lin, Catholic University)
In order to tweak user experience, you need to know who, what, when, how, and most important, what were they thinking. Google Analytics can help figure those things out in Summon. Parameters are easy ways to track facets, and you can use the data from Google Analytics to figure out the story based on that. Tracking things the “hard way,” you can use the conversion/goal function of Google Analytics. But you’ll need to know a little about coding to make it work, because you have to add some JavaScript to your Summon pages.
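[For example, the “hard way” might look something like this: a delegated click handler that sends a Google Analytics event whenever a facet is applied. The facet selector is a guess at Summon’s markup, and it assumes the standard Universal Analytics ga() snippet is already on the page.]

```javascript
// Sketch of event tracking for facet clicks in Summon. Assumes the standard
// Universal Analytics ga() snippet is already loaded on the page; the
// a.facet-link selector is a placeholder for whatever Summon actually renders.
$(document).on('click', 'a.facet-link', function () {
  var facetName  = $(this).data('facet') || 'unknown-facet';
  var facetValue = $(this).text();

  // Category / action / label show up in the GA event reports, and an event
  // like this can also be wired up as a conversion goal.
  ga('send', 'event', 'Summon Facets', facetName, facetValue);
});
```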
Use of ERM/KB for collection analysis (Mitzi Cole, NASA Goddard Library)
Used the overlap analysis to compare print holdings with electronic and downloaded the report. The partial overlap can actually be a full overlap if the coverage dates aren’t formatted the same, but otherwise it’s a decent report. She incorporated license data from Resource Manager and print collection usage pulled from her ILS. This allowed her to create a decision tool (spreadsheet), with print usage denoted in 5-year increments, eliminating the previous 5 years’ use with each increment (this showed a drop in use over time for titles of concern).
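[The 5-year-increment piece is easy enough to reproduce outside a spreadsheet; here’s a small sketch with made-up field names, not her actual decision tool.]

```javascript
// Sketch of the 5-year-increment usage view: for each title, total the print
// checkouts in successive 5-year windows so a drop-off in recent use stands
// out. The record shape ({ year, count }) is made up for illustration.
function usageByWindow(checkouts, endYear, windows) {
  var result = [];
  for (var w = 0; w < windows; w++) {
    var to = endYear - w * 5;
    var from = to - 4;
    var total = checkouts
      .filter(function (c) { return c.year >= from && c.year <= to; })
      .reduce(function (sum, c) { return sum + c.count; }, 0);
    result.push({ range: from + '-' + to, uses: total });
  }
  return result;
}

// e.g. usageByWindow(titleCheckouts, 2013, 3)
//   -> [{ range: '2009-2013', uses: ... }, { range: '2004-2008', uses: ... }, ...]
```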
Discussion of KnowledgeWorks Management/Metadata (Ben Johnson, Lead Metadata Librarian, SerialsSolutions)
After they get the data from the provider or it is made available to them, they have a system to automatically process the data so it fits their specifications, and then it is integrated into the KB.
They deal with a lot of bad data. 90% of databases change every month. Publishers have their own editorial policies that display the data in certain ways (e.g., title lists) and deliver inconsistent, and often erroneous, metadata. The KB team tries to catch everything, but some things still slip through. Throughout the data ingestion process, they apply rules based on past experience with the data source. After that, the data is normalized so that various title/ISSN/ISBN combinations can be associated with the authority record. Finally, the data is incorporated into the KB.
Authority rules are used to correct errors and inconsistencies. Rules automatically and consistently correct holdings, and they are often used to correct vendor reporting problems. Rules are codified for provider and database, with 76,000+ applied to thousands of databases, and 200+ new rules are added each month.
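[A toy illustration of the general idea, not Serials Solutions’ actual system: rules scoped to a provider and database that rewrite specific fields during ingestion.]

```javascript
// Toy illustration of rule-based correction during ingestion. Each rule is
// scoped to a provider and database and rewrites one field; the rule format
// is invented, not Serials Solutions' actual system.
var rules = [
  { provider: 'ExampleHost', database: 'Academic Stuff Complete',
    field: 'issn', match: '0000-000X', replace: '1234-5678' },
  { provider: 'ExampleHost', database: 'Academic Stuff Complete',
    field: 'startDate', match: '1997-00-00', replace: '1997-01-01' }
];

function applyRules(record, provider, database) {
  rules
    .filter(function (r) {
      return r.provider === provider && r.database === database;
    })
    .forEach(function (r) {
      if (record[r.field] === r.match) {
        record[r.field] = r.replace;
      }
    });
  return record;
}
```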
Why does it take two months for KB data to be corrected when I report it? Usually it’s because they are working with the data providers, and some respond more quickly than others. They are hoping that involvement in initiatives like KBART will help fix data at the provider so they don’t have to correct it for us, and will also make those corrections easier by relying on standards.
Client Center ISSN/ISBNs don’t always work in 360 Link, which may have something to do with the authority record, but it’s unclear. It’s possible that there are some data in the Client Center that haven’t been normalized, which could cause this disconnect. And sometimes the provider doesn’t send both print and electronic ISSN/ISBN.
What is the source for authority records for ISSN/ISBN? LC, Bowker, ISSN.org, but he’s not clear. Clarification: Which field in the MARC record is the source for the ISBN? It could be the source of the normalization problem, according to the questioner. Johnson isn’t clear on where it comes from.
Apologies for the delay. It took longer than I expected to have the file and a stable internet connection at the same time. You’ll find the notes on the SlideShare page.
I signed up for a Coursera class on statistics for social science researchers because I wanted to learn how to better make use of library data and also how to use the open source program for statistical computing, R. The course information indicated I’d need to plan for 4-6 hours per week, which seemed doable, until I got into it.
The course consists of several lecture videos, most of which include a short “did you get the main concepts” multiple-choice quiz at the end. Each week there is an assignment and graded quiz, and of course a midterm and final.
It didn’t help that I started off behind, getting through only a lecture or two before the end of the first week, and missing the deadline for having the first assignment and quiz graded. I scrambled to catch up the second week, but once again couldn’t make it through the lectures in time.
That’s when I realized that it was going to take much longer than projected to keep up with this course. A 20-30 min lecture would take me 45-60 min to get through because I was constantly having to pause and write notes before the lecturer went on to the next concept. And since I was using Microsoft OneNote to keep and organize my notes, anything that involved a formula took longer to copy down.
By the end of the third week, I was still a few lectures away from finishing the second week, and I could see that it would take more time than I had to keep going, but I decided to go another week and do what I could.
That was this week, and I haven’t had time to make any more progress than where I was last week. With no prospect of catching up before the midterm deadline, I decided to withdraw from the course.
This leaves me disappointed both in myself and in the structure of the course. I hate quitting, and I really want to learn this stuff. But as I fell further and further behind, it became easier to put it off and focus on other overdue items on my task list, which only compounded the problem.
The instructor for the course was easy to follow, and I liked his lecture style, but when it came time to do the graded quiz and assignment, I realized I clearly had not understood everything, or he expected me to have more background in the field than a novice does. It also seemed like the content was geared toward a 12-week course; rather than reducing the content for this 8-week version, he was cramming it all in.
Having deadlines was a great motivation to keep up with the course, which I haven’t had when I’ve tried to learn on my own. It was the volume of content to absorb between those deadlines that tripped me up. I need to find a happy medium between self-paced instruction and structured instruction.
The purpose of copyright is to promote the creation of culture. It is not to ensure that authors get a steady stream of income no matter what, or to pay them back for the hard work they do, or to show our respect for the value they add to society. It’s about getting the stuff into the culture, and giving the creators enough incentive to do it.
One way it does it is to give creators exclusive rights for a limited period of time. The limit encourages new makers to use and remix existing culture.
Fair use is the biggest balancing feature of copyright. It ensures that the rights provided to the creators don’t become oppressive to the users. Fair use is the legal, unauthorized use of copyrighted material… under some circumstances. And we’ve spent generations trying to figure out which circumstances apply.
Fair use is a space for creativity. It gives you the leeway to take the culture around you and incorporate it into your work. It allows you to quote other scholarship in your research. It allows you to incorporate art into new works.
There are four factors of fair use. Every judge should consider the reason for the use, the kind of work used, the amount used, and the effect on the market. But the law doesn’t tell judges how much weight to give each factor or which is most important. The good news is that judges love balancing features, and the Supreme Court has determined that fair use protects free speech. However, since copyright is automatically conferred as soon as a creation is fixed, judicial interpretations of fair use have shifted greatly since 1990 to weigh more in favor of users in certain circumstances.
Without fair use, copyright would be in conflict with the 1st Amendment.
Judges want to know if the use is transformative (i.e. for a new purpose, context, audience, insight) and if you used the right amount in that transformative process. For example, parody is making fun of the original work, not just reusing it. An appropriate amount can refer to both the quantity of the original in the transformative work, and also the audience who received your transformative work. For example, the many photographic memes that take pictures and alter them to fit a theme, like One Tiny Hand.
Judges care about you and what you think is fair. There is a pattern of judges deferring to the well-articulated norms of a practice community.
Best practices codes are a logical outgrowth of the things the communities have articulated as their values and the things they would consider to be legitimate transformative works. Documentary filmmakers, scholars, media literacy teachers, online video, dance collections, open course ware, poets… many groups are creating best practices for fair use.
The documentary filmmakers have had a code of best practice for a long time. They realized that without it, they were limiting themselves too much in what they could create. Once they codified their values, more broadcast sources were willing to take films and new kinds of films were being made. Errors and omissions insurers were able to accept fair use claims, and lawyers use the Statement to build their own best practices in the relevant areas.
Keep in mind, though, that these are best practices and not guidelines. Principles, not rules. Limitations, not bans. Reasoning, not rote. The numerical limits we once followed are not the law, and we need to keep them fresh to be relevant.
Licensing is a different thing altogether. This means you may have fewer rights in some instances, and more rights in others, regardless of fair use.
For libraries, fair use enables our mission to serve knowledge past, present, and future. We have a duty to make copyrighted works real and accessible in the way people use things now. What will libraries be in the future? How will we stay relevant? We need to have some flexibility with the stuff we have in our collections.
Many librarians are discouraged. Insecurity and hesitation translate into staff costs, such as hiring someone to clear copyright questions. Fair use would help, but it’s underused. Risk aversion subsumes fair use analysis.
The ARL document took a lot of people from diverse institutions and many hours of discussion to create, and it was reviewed by several legal experts. It’s not risk-free, since it would need to stand up in court first (and there are always lawsuit-happy people), but it seems sound based on past judgments.
They hope it will put legal risks into perspective, and will give librarians a tool to go to general counsels and administrations and let them know things are changing. It considered the views of librarians and their values, and they also hope that people will speak out publicly that they support the Code.
Fair use applies in these common situations:
course reserves — digital access to teaching materials for students and faculty, although it should be limited to access by only the appropriate audience
both physical and virtual exhibits — if it highlights a theme or commonality, you’re doing something new to help people understand what’s in your library
digitizing to preserve at-risk items — you’re not a publisher or scam artist, you’re a librarian making sure the things are accessible over time (like VHS tapes)
digitizing special collections and archives — you’re keeping it alive
access to research and teaching materials for disabled users — e.g., DAISY
institutional repositories
creation of search indexes
making topically-based collections of web-based materials
Practice makes practice. It won’t work if you don’t use it.
A couple of weeks ago I blogged about an idea I had that involved combining subject data from SerialsSolutions with use data for our ejournals to get a broad picture of ejournal use by subject. It took a bit of tooling around with Access tables and queries, including making my first crosstab, but I’ve finally got the data put together in a useful way.
It’s not quite comprehensive, since it only covers ejournals for which SerialsSolutions has assigned a subject, which also have ISSNs, and are available through sources that provide COUNTER or similar use statistics. But, it’s better than nothing.
I had a thought last night as I was trying to fall asleep: what if I took our data on demand file that includes subjects and mashed it up with our consolidated JR1 use statistics? Could I get a better picture of the disciplines at my institution that are using ejournals? It’s definitely something worth looking at.
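[In code terms, the idea is just a join on ISSN followed by a sum by subject. A sketch with invented field names, not the Access queries I actually built:]

```javascript
// Sketch of the join-and-aggregate idea: match JR1 use statistics to
// SerialsSolutions subject assignments by ISSN, then total use by subject.
// Field names are invented for illustration.
function useBySubject(subjectRecords, jr1Records) {
  // subjectRecords: [{ issn: '1234-5678', subject: 'Chemistry' }, ...]
  // jr1Records:     [{ issn: '1234-5678', fullTextRequests: 321 }, ...]
  var subjectsByIssn = {};
  subjectRecords.forEach(function (rec) {
    subjectsByIssn[rec.issn] = rec.subject;
  });

  var totals = {};
  jr1Records.forEach(function (rec) {
    var subject = subjectsByIssn[rec.issn];
    if (!subject) { return; } // skip titles with no subject assigned
    totals[subject] = (totals[subject] || 0) + rec.fullTextRequests;
  });
  return totals; // e.g. { Chemistry: 4210, History: 987 }
}
```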