Speaker: Caitlin Trasande, Head of Research Policy, Digital Science
Social impact is the emerging bacon.
Digital Science supports and funds startups that build software for research. The scope is the full life cycle of research, ranging from reading literature to planning and conducting experiments to publishing and sharing the data. The disgrunterati are those who decided to stop merely complaining about broken processes and instead build better products and models.
[insert overview of several projects funded by Digital Science]
Information may want to be free, but it needs to be accessible and understandable.
Speaker: T. Scott Plutchak, Director of Digital Data Curation Strategies, The University of Alabama at Birmingham
Data is the new bacon. Data is the hot buzzword in scholarly publishing. He is working on the infrastructure, services, and policies needed to manage data on an institutional level.
Concern about data has been around for a long time. NIH developed their first policy in 2003, but it was pretty weak. Things got serious when the public access policy became mandatory in 2009. NSF developed a data management policy in 2011, which got a little more attention.
A scholarly publishing roundtable, made up of university administrators, librarians, publishers, and researchers, was convened in 2009 and reported in 2010. They recommended flexible policies for each agency, developed in collaboration with their constituencies.
Libraries should be thinking about how and where and what kinds of data they should store and manage.
My small liberal arts university probably will have to do some things with this, but not to the extent he’s talking about. This is an R1 library problem, not a library problem at large. Yet.
This is going to be long and not my usual style of conference notetaking. Because this was an unconference, there really wasn’t much in the way of prepared presentations, except for the lightning talks in the morning. What follows below the jump is what I captured from the conversations, often simply questions posed that were left open for anyone to answer, or at least consider.
One of the good aspects of the unconference style was the free-form nature of the discussions. We generally stayed on topic, but even when we didn’t, it was a relevant or important thing that led to the tangents, so there were still plenty of things to take away. However, this format also requires someone present who is prepared to seed the conversation if it lulls or dies and no one steps in to start a new topic.
Also, if a session is designed to be a conversation around a topic, it will fall flat if it becomes all about one person or the quirks of their own institution. I had to work pretty hard on that one during the session I led, particularly when it seemed that the problem I was hoping to discuss wasn’t an issue for several of the folks present because of how they handle the workflow.
Some of the best conversations I had were during the gathering/breakfast time as well as lunch, lending even more to the unconference ethos of learning from each other as peers.
Don’t try to write a book about fast-moving subjects.
He was trying to capture the nature of our relationship to Google. It provides us with services that are easy to use, fairly dependable, and well designed. However, that level of success can breed hubris. He was interested in how this drives the company to its audacious goals.
It strikes him that what Google claims to be doing is what librarians have been doing for hundreds of years already. He found himself turning to the core practices of librarians as a guideline for assessing Google.
Why is Google interested in so much stuff? What is the payoff to organizing the world’s information and making it accessible?
Big data is not a phrase that they use much, but the notion is there. More and faster equals better. Google is in the prediction/advertising business. The Google books project is their attempt to reverse engineer the sentence. Knowing how sentences work, they can simulate how to interpret and create sentences, which would be a simulation of artificial intelligence.
The NSA’s deals that give it a backdoor to our data services create data insecurity, because if they can get in, so can the bad guys. Google keeps data about us (and has to turn it over when asked) because it benefits their business model, unlike libraries, which don’t keep patron records in order to protect their privacy.
Big data means more than just a lot of data. It means that we have so many instruments to gather data: cheap, ubiquitous cameras and microphones, GPS devices that we carry with us, credit card records, and more. All of these ways of creating data feed into huge servers that can store it, coupled with powerful algorithms that can analyze it. Despite all of this, there is no policy surrounding it, nor are there conversations about the best ways to manage it in light of the impact on personal privacy. There is no incentive to curb big data activities.
Scientists are generally trained to understand that correlation is not causation. We seem to be happy enough to draw pictures with correlation and move on to the next one. With big data, it is far too easy to stop at correlation. This is a potentially dangerous way of understanding human phenomena. We are autonomous people.
The panopticon was supposed to keep prisoners from misbehaving because they assumed they were always being watched. Foucault described the modern state in the 1970s as the panopticon. However, at this point, it doesn’t quite match. We have a cryptopticon, because we aren’t allowed to know when we are being watched. It wants us to be on our worst behavior. How can we inject transparency and objectivism into this cryptopticon?
Those who can manipulate the system will, but those who don’t know how or that it is happening will be negatively impacted. If bad credit can get you on the no-fly list, what else may be happening to people who make poor choices in one aspect of their lives that they don’t know will impact other aspects? There is no longer anonymity in our stupidity. Everything we do, or nearly so, is online. Mistakes of teenagers will have an impact on their adult lives in ways we’ve never experienced before. Our inability to forget renders us incapable of looking at things in context.
Updates from Serials Solutions – mostly Resource Manager (Ashley Bass):
Keep up to date with ongoing enhancements for management tools (quarterly releases) by following answer #422 in the Support Center, and via training/overview webinars.
Populating and maintaining the ERM can be challenging, so they focused a lot of work this year on that process: license template library, license upload tool, data population service, SUSHI, offline date and status editor enhancements (new data elements for sort & filter, new logic, new selection elements, notes), and expanded and additional fields.
Workflow, communication, and decision support enhancements: in-context help linking, contact tool filters, navigation, new Counter reports, more information about vendors, Counter summary page, etc. Her favorite new feature is “deep linking” functionality (aka persistent links to records in SerSol). [I didn’t realize that wasn’t there before — been doing this for my own purposes for a while.]
Next up (in two weeks, 4th quarter release): new alerts, resource renewals feature (reports! and checklist!, will inherit from Admin data), Client Center navigation improvements (i.e. keyword searching for databases, system performance optimization), new license fields (images, public performance rights, training materials rights) & a few more, Counter updates, SUSHI updates (making customizations to deal with vendors who aren’t strictly following the standard), gathering stats for Springer (YTD won’t be available after Nov 30 — up to Sept avail now), and online DRS form enhancements.
In the future: license API (could allow libraries to create a different user interface), contact tools improvements, interoperability documentation, new BI tools and reporting functionality, and improving the Client Center.
Also, building a new KB (2014 release) and a web-scale management solution (Intota, also coming 2014). They are looking to have more internal efficiencies by rebuilding the KB, and it will include information from Ulrich’s, new content types metadata (e.g. A/V), metadata standardization, industry data, etc.
Summon Updates (Andrew Nagy):
I know very little about Summon functionality, so just listened to this one and didn’t take notes. Take-away: if you haven’t looked at Summon in a while, it would be worth giving it another go.
Goal #1: Allow users to easily link to full-text resources. Solution: Go beyond the out-of-the-box 360 Link display.
Goal #2: Allow users to report problems or contact library staff at the point of failure. Solution: eresources problem report form
They created the eresources problem report form using Drupal. The fields include contact information, description of the resource, description of the problem, and the ability to attach a screenshot.
Some enhancements included: making the links for full-text (article & journal) buttons, hiding additional help information and giving some hover-over information, parsing the citation into the problem report page, and moving the citation below the links to full-text. For journal citations with no full-text, they made the links to the catalog search large buttons with more text detail in them.
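To make the mechanics more concrete, here is a minimal jQuery sketch of those kinds of tweaks. The selectors (.ssLink, #helpText, #helpToggle, #citation, #problemLink) and the /eresources-problem path are placeholders I made up, not the presenters’ actual markup or URLs:

```javascript
// Sketch only: selectors and the problem-report URL are assumptions.
$(document).ready(function () {
  // Turn the article/journal full-text links into buttons
  $('a.ssLink').addClass('button');

  // Hide the extra help text and show it on hover instead
  $('#helpText').hide();
  $('#helpToggle').hover(
    function () { $('#helpText').show(); },
    function () { $('#helpText').hide(); }
  );

  // Parse the citation off the page and hand it to the problem report form
  // as a query parameter so users don't have to retype it
  var citation = $.trim($('#citation').text());
  $('#problemLink').attr(
    'href',
    '/eresources-problem?citation=' + encodeURIComponent(citation)
  );
});
```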
One of the challenges of implementing these changes was the lack of a test environment, because of the limited preview capabilities in 360 Link. Any changes made required an overnight refresh before going live, opening the risk of 24-hour windows of broken resource links. So, they created their own test environment by turning test scenarios into static HTML files and wrapping them in their own custom PHP to mimic the live pages without having to work with the live pages.
[At this point, it got really techy and lost me. Contact the presenters for details if you’re interested. They’re looking to go live with this as soon as they figure out a low-use time that will have minimal impact on their users.]
Customizing 360 Link menu with jQuery (Laura Wrubel, George Washington University)
They wanted to give better visual cues for users, emphasize the full-text, have more local control over links, and achieve visual integration with other library tools so it’s more seamless for users.
They started with Reidsma’s code, then forked off from it. They added a problem link to a Google form, fixed ebook chapter links and citation formatting, created conditional links to the catalog, and linked to their other library’s link resolver.
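A rough sketch of what additions in that vein might look like, again with placeholder values (the Google form URL, form entry ID, catalog search URL, and selectors below are invented for illustration, not GWU’s actual code):

```javascript
// Sketch only: URLs, the form entry ID, and selectors are placeholders.
var GOOGLE_FORM_URL = 'https://docs.google.com/forms/d/e/EXAMPLE_ID/viewform';
var CATALOG_SEARCH_URL = 'https://catalog.example.edu/search?issn=';

$(document).ready(function () {
  // Problem link that pre-fills the citation into the Google form
  var citation = $.trim($('#citation').text());
  $('<a>', {
    href: GOOGLE_FORM_URL + '?entry.1234567=' + encodeURIComponent(citation),
    text: 'Report a problem with this item'
  }).appendTo('#sidebar');

  // Conditional link to the catalog when no full-text link is present
  if ($('a.fullTextLink').length === 0) {
    var issn = $.trim($('#issn').text());
    $('<a>', {
      href: CATALOG_SEARCH_URL + encodeURIComponent(issn),
      text: 'Check the catalog for print holdings'
    }).appendTo('#sidebar');
  }
});
```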
They hope to continue to tweak the language on the page, particularly for ILL suggestion. The coverage date is currently hidden behind the details link, which is fine most of the time, but sometimes that needs to be displayed. They also plan to load the print holdings coverage dates to eliminate confusion about what the library actually has.
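If the hidden coverage dates ever do need to surface, something like the following could pull them out front only when there is no full-text link; the .coverageDate, a.fullTextLink, and #results names are assumptions, not actual 360 Link markup:

```javascript
// Sketch only: class and ID names are assumed, not real 360 Link markup.
$(document).ready(function () {
  if ($('a.fullTextLink').length === 0) {
    // No full text, so the coverage information becomes the useful signal;
    // copy it out from behind the details toggle and display it up front.
    $('.coverageDate').clone().prependTo('#results').show();
  }
});
```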
In the future, they would rather use the API and blend the link resolver functionality with catalog tools.
Custom document delivery services using 360 Link API (Kathy Kilduff, WRLC)
License information for course reserves for faculty (Shanyun Zhang, Catholic University)
She included course reserve terms in the license information, but then it became an issue to convey that information to the faculty, who were used to negotiating it with publishers directly. Most faculty prefer to use Blackboard for course readings and handle it themselves, but they need to figure out how to incorporate the library into the workflow. She was looking for suggestions from the group.
Advanced Usage Tracking in Summon with Google Anaytics (Kun Lin, Catholic University)
Use of ERM/KB for collection analysis (Mitzi Cole, NASA Goddard Library)
She used the overlap analysis to compare print holdings with electronic and downloaded the report. A partial overlap can actually be a full overlap if the coverage dates aren’t formatted the same way, but otherwise it’s a decent report. She incorporated license data from Resource Manager and print collection usage pulled from her ILS. This allowed her to create a decision tool (spreadsheet) that denoted the print usage in five-year increments, dropping the previous five years of use with each increment (this showed a drop in use over time for titles of concern).
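The coverage-date mismatch she mentions is easy to illustrate. A small sketch (the date formats here are invented examples of mismatched coverage strings) that normalizes ranges down to years before comparing would flag a “partial” overlap as the full overlap it really is:

```javascript
// Sketch only: the input formats are invented examples of mismatched coverage strings.
function coverageYears(range) {
  // Pull the first and last four-digit years out of whatever string the report uses
  var years = range.match(/\d{4}/g) || [];
  return {
    start: parseInt(years[0], 10),
    end: parseInt(years[years.length - 1], 10)
  };
}

function printCoveredByElectronic(printRange, eRange) {
  var p = coverageYears(printRange);
  var e = coverageYears(eRange);
  return e.start <= p.start && e.end >= p.end;
}

// Reported as a partial overlap because the strings differ, but a full overlap by year:
console.log(printCoveredByElectronic('1995 - 2010', '1995-01-01 to 2010-12-31')); // true
```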
Discussion of KnowledgeWorks Management/Metadata (Ben Johnson, Lead Metadata Librarian, SerialsSolutions)
After they get the data from the provider or it is made available to them, they have a system to automatically process the data so it fits their specifications, and then it is integrated into the KB.
They deal with a lot of bad data. 90% of databases change every month. Publishers have their own editorial policies that display the data in certain ways (e.g., title lists) and deliver inconsistent, and often erroneous, metadata. The KB team tries to catch everything, but some things still slip through. Throughout the data ingestion process, they apply rules based on past experience with the data source. After that, the data is normalized so that various title/ISSN/ISBN combinations can be associated with the authority record. Finally, the data is incorporated into the KB.
Authority rules are used to correct errors and inconsistencies. Rules automatically and consistently correct holdings, and they are often used to correct vendor reporting problems. Rules are codified for each provider and database, with 76,000+ applied to thousands of databases, and 200+ new rules are added each month.
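To give a sense of what a provider- and database-scoped correction rule might look like, here is a toy sketch; the rule shapes, provider names, and fixes are entirely invented for illustration and are not how KnowledgeWorks is actually implemented:

```javascript
// Toy sketch only: providers, databases, and fixes below are invented examples.
var rules = [
  {
    provider: 'ExampleHost',
    database: 'Example Journals',
    // e.g. a feed that puts the print ISSN in the online ISSN field
    fix: function (record) {
      if (record.eissn === record.issn && record.knownEissn) {
        record.eissn = record.knownEissn;
      }
      return record;
    }
  },
  {
    provider: 'ExampleHost',
    database: 'Example Archive',
    // e.g. a feed whose reported coverage end date always lags a year behind
    fix: function (record) {
      record.coverageEnd = String(parseInt(record.coverageEnd, 10) + 1);
      return record;
    }
  }
];

// Apply every rule registered for this provider/database pair, in order
function applyRules(record, provider, database) {
  return rules
    .filter(function (r) { return r.provider === provider && r.database === database; })
    .reduce(function (rec, r) { return r.fix(rec); }, record);
}
```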
Why does it take two months for KB data to be corrected when I report it? Usually it’s because they are working with the data providers, and some respond more quickly than others. They are hoping that being involved with various initiatives like KBART will help fix data from the provider so they don’t have to worry about correcting it for us, but also making it easier to make those corrections by using standards.
Client Center ISSN/ISBN doesn’t always work in 360 Link, which may have something to do with the authority record, but it’s unclear. It’s possible that there is some data in the Client Center that hasn’t been normalized, which could cause this disconnect. And sometimes the provider doesn’t send both print and electronic ISSN/ISBN.
What is the source for authority records for ISSN/ISBN? LC, Bowker, and ISSN.org, but he’s not certain. Clarification: which field in the MARC record is the source for the ISBN? That could be the source of the normalization problem, according to the questioner. Johnson isn’t sure where it comes from.