Speakers: Mitzi M. Cole & Jeremy Gottwig
They wanted to create an author and publication repository. They maintain information about the authors, including name variants and the codes for their divisions, and they identify the relationships between authors and publications and among the authors themselves. This information needed to be displayed with links to the full text, either through their link resolver or by attaching full-text files when copyright allowed. Those needs helped identify requirements for the system they chose.
They were able to use the Homeland Security database of NASA employees to harvest author information and begin populating their repository. They then harvested data from a number of commercial sources to find the authors’ publications.
They developed crosswalks to move metadata from the various sources, and developed a utility to de-dupe the records and match author name variants. They also used the OpenURL standard to create full-text links.
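To illustrate what an OpenURL full-text link involves, here is a minimal sketch that builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value link for a journal article. The resolver base URL and the internal record field names (`title`, `journal`, `issn`, and so on) are hypothetical stand-ins, not details from the talk; only the `url_ver`, `ctx_ver`, `rft_val_fmt`, and `rft.*` keys come from the OpenURL standard.

```python
from urllib.parse import urlencode

# Hypothetical link-resolver base URL; substitute your institution's resolver.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def build_openurl(article: dict) -> str:
    """Build an OpenURL 1.0 (Z39.88-2004) KEV link for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    # Map hypothetical internal field names onto OpenURL journal keys,
    # skipping anything the harvested record did not supply.
    field_map = {
        "title": "rft.atitle",
        "journal": "rft.jtitle",
        "issn": "rft.issn",
        "volume": "rft.volume",
        "start_page": "rft.spage",
        "year": "rft.date",
        "author_last": "rft.aulast",
    }
    for field, key in field_map.items():
        value = article.get(field)
        if value:
            params[key] = str(value)
    # A DOI, when present, is passed as a referent identifier.
    if article.get("doi"):
        params["rft_id"] = "info:doi/" + article["doi"]
    return RESOLVER_BASE + "?" + urlencode(params)


# Usage with a made-up sample record:
link = build_openurl({"title": "A Sample Article", "journal": "Sample Journal",
                      "year": 2010, "author_last": "Smith"})
```

Because the link carries citation metadata rather than a fixed URL, the resolver can decide at click time which licensed copy (if any) the user can reach.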
They used Fedora Repository to store all of this. In order to use Fedora, you need to think in objects. Using relational queries, you can repurpose almost all of your data.
The author metadata was created using MADS in a separate tool and then exported to the format needed for Fedora. They used the local identification numbers for employees for the PID. On the public display of the author record, they show the author’s publications and the related authors (i.e. co-authors).
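The sketch below shows, in memory only, the two ideas in the author record: minting a PID from a local employee number and deriving "related authors" from shared publications. Fedora PIDs take the form `namespace:localId`, but the `author` namespace and the `author_ids` field are hypothetical, and this code does not talk to Fedora; in the real system these relationships would be stored on the objects and retrieved with relational queries.

```python
from collections import defaultdict

# Hypothetical PID namespace; Fedora PIDs take the form "namespace:localId".
AUTHOR_NS = "author"


def author_pid(employee_id: str) -> str:
    """Mint an author PID from a local employee identification number."""
    return f"{AUTHOR_NS}:{employee_id.strip()}"


def related_authors(publications: list[dict]) -> dict[str, set[str]]:
    """Map each author PID to the set of PIDs they have co-authored with."""
    coauthors: dict[str, set[str]] = defaultdict(set)
    for pub in publications:
        # "author_ids" is a hypothetical field listing employee numbers.
        pids = {author_pid(e) for e in pub["author_ids"]}
        for pid in pids:
            # Everyone else on the same publication is a co-author.
            coauthors[pid] |= pids - {pid}
    return dict(coauthors)
```

Deriving co-authorship from the publication records, rather than storing it separately, keeps the author display consistent whenever a publication is added or corrected.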
In the publications database, they used PUB_MD identifiers for publications to create unique PIDs. Unfortunately, there were problems with variations because they were pulling data from several sources.
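One common way to cope with the same publication arriving from several sources is to compute a normalized match key for each record. The sketch below is a hypothetical illustration, not the speakers' de-duplication utility: it prefers a DOI when one is present and otherwise falls back to a normalized title plus year. The field names `doi`, `title`, and `year` are assumptions.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Strip accents, lower-case, drop punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_key(record: dict) -> tuple:
    """Prefer a DOI when present; otherwise fall back to title + year."""
    if record.get("doi"):
        return ("doi", record["doi"].strip().lower())
    return ("title", normalize_title(record["title"]), record.get("year"))


def dedupe(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each match key."""
    seen: dict[tuple, dict] = {}
    for rec in records:
        seen.setdefault(match_key(rec), rec)
    return list(seen.values())
```

Exact keys like these miss some duplicates, for example a record with a DOI and a copy from another source without one. That gap is one reason automated matching still needs human quality control.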
The public end uses Drupal with Zen-based themes. For future consideration, they are looking at self-submission methods, Hydra, Islandora, BibApp, and whatever else may be developed which will plug into Fedora.
Talk to the end users early and often. We think like librarians and they don’t. Publisher metadata can be problematic; even publishers have trouble with author disambiguation. Automation can only go so far, and you still need human quality control. There’s always something new on the horizon.
Speaker: Amy Buckland
Many repositories are founded because someone at an institution wants one or sets it as a goal. But you need to know what that means.
Clifford Lynch calls it “a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members.”
If it’s a service, it impacts all areas of the library. It needs to be sustainable, and flexible as formats change. At McGill, they mediate the deposits in order to make sure copyrights are honored.
How do users find out about your IR? What kind of support will it have? You need to know this at the beginning so that you can market it to the stakeholders.
Who are the members of the community? Will students be included? At McGill, they are, and it’s a requirement for graduation that theses are deposited.
No tool you choose for your IR will, by itself, make its contents manageable and accessible. Be very wary of tools that promise this.
What goes in? What kinds of digital materials? Is it dark or open? Flat media (articles, papers, theses, etc.) are easy, but it gets tricky with other objects and dynamic resources like blogs and websites.
The standards don’t stay standard for very long. Pick one and stick to it, because retro-fitting is challenging.
Who is a part of your institution? Do alumni get included? For public institutions, is the public included?
Institutional repository is a librarian term. Call it something else depending on who you talk to so that it has meaning to them.
“our goal is to take advantage of the web to make the knowledge created at our institution available to the world”
Speaker: Jim DelRosso
Digital projects are chronically understaffed. We need to be advocates to change that.
If you are going into digital repositories, copyright is going to be a pain in your butt.
Digitization isn’t magical. It takes people to determine whether something can be included and then to do all the work to get it there. If it were that easy, libraries wouldn’t be doing it; Google would.
Step one: digitize
Step two: ???
Step three: Profit! (or for libraries, survive)
People are one of the toughest things to get support for. Tech is shiny and gets the attention, but it can’t be done without staff. Tell decision makers that technologies go obsolete and shiny gets dull, but people will last.
Our capacity to do works that will stagger our users and our peers is sometimes missed. There is right now more potential for the dissemination and preservation of knowledge than we have ever seen in human history. Go forth and do that!