NASIG 2010: Linked Data and Libraries

Presenter: Eric Miller, Zepheira, LCC

Nowadays, we understand what the web is and the impact it has had on information sharing, but before it was developed, it was in a “vague but exciting” stage and few understood it. When we got started with the web, we really didn’t know what we were doing, but more importantly, the web was being developed so that it was flexible enough for smarter and more creative people to do amazing things.

“What did your website look like when you were in the fourth grade?” Kids are growing up with the web and it’s hard for them to comprehend life without it. [Dang, I’m old.]

This talk will be about linked data, its legacy, and how libraries can lead linked data. We have a huge opportunity to weave libraries into the fabric of libraries, and vice versa.

About five years ago, the BBC started making their content available in a service that allowed others to use and remix the delivery of the content in new ways. Rather than developing alternative platforms and creating new spaces, they focus on generating good content and letting someone else frame it. Other sources like NPR, the World Bank, and Data.gov are doing the same sorts of things. Within the library community, these things are happening, as well. OCLC’s APIs are getting easier to use, and several national libraries are putting their OPACs on the web with APIs.

Obama’s open government initiative is another one of those “vague but exciting” things, and it charged agencies to come up with their own methods of making their content available via the web. Agencies are now struggling with the same issues and desires that libraries have been tackling for years. We need to recognize our potential role in moving this forward.

Linked data is a best practice for sharing data, connecting data, and uses the semantic web. Rather than leaving the data in their current formats, let’s put them together in ways they can be used on the wider web. It’s not the databases that make the web possible, it’s the web that makes the databases usable.

Human computation can be put to use in ways that assist computers to make information more usable. Captcha systems are great for blocking automated programs when needed, and by using human computation to decipher scanned text that is undecipherable by computers, ReCaptcha has been able to turn unusable data into a fantastic digital repository of old documents.

LEGOs have been around for decades, and their simple design ensures that new blocks work with old blocks. Most kids end up dumping all of their sets into one bucket, so no matter where the individual building blocks come from, they can be put together and rebuild in any way you can imagine. We could do this with our blocks of data, if they are designed well enough to fit together universally.

Our current applications, for the most part, are not designed to allow for the portability of data. We need to rethink application design so that the data becomes more portable. Web applications have, by neccesity, had to have some amount of portability. Users are becoming more empowered to use the data provided to them in their own way, and if they don’t get that from your service/product, then they go elsewhere.

Digital preservation repositories are discussing ways to open up their data so that users can remix and mashup data to meet their needs. This requires new ways of archiving, cataloging, and supplying the content. Allow users to select the facets of the data that they are interested in. Provide options for visualizing the raw data in a systematic way.

Linked data platforms create identifiers for every aspect of the data they contain, and these are the primary keys that join data together. Other content that is created can be combined to enhance the data generated by agencies and libraries, but we don’t share the identifiers well enough to allow others to properly link their content.

Web architecture starts with web identifiers. We can use URLs to identify things other than just documents, but we need to be consistent and we can’t change the URL structures if we want it to be persistent. A lack of trust in identifiers is slowing down linked data. Libraries have the opportunity to leverage our trust and data to provide control points and best practices for identifier curation.

A lot of work is happening in W3C. Libraries should be more involved in the conversation.

Enable human computation by providing the necessary identifiers back to data. Empower your users to use your data, and build a community around it. Don’t worry about creating the best system — wrap and expose your data using the web as a platform.