Yesterday, I was thinking some more about uses for RSS with library OPACs. The idea of having an RSS feed for new books continues to nag at me, but without more technical knowledge, I know this is something that I couldn’t make work. Then something clicked, and I called up our library systems administrator to ask him a few questions. As I suspected, our new books list in the OPAC is a text file that is generated by a script that searches the catalog database once a week. I began to ponder what it would take to convert that flat file into XML, and whether it would be possible to automate that process.
I grabbed a copy of the flat file from the server and took a look at it, just to see what was there. First off, I realized that there was quite a bit of extraneous information that would need to be stripped out. That could be done easily by hand with a few search & replace commands and some spreadsheet manipulation. So, the easy way out would be to do it all by hand every week. Here* is what I was able to do after some trial and error, working with books added in the previous week.
A harder route would be to put together a program that would take the cleaned up but still raw text file and convert each line into <item> entries, with appropriate fields for <title> (book title), <description> (publication information & location), <category> (collection), etc. This new XML file would replace the old one every week. If I knew any Perl or ColdFusion, I’m certain that I could whip something up fairly quickly.
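For what it's worth, the conversion step could be sketched roughly like this in Python. This is only a sketch under assumptions: the real layout of the new books file isn't shown here, so the code pretends each cleaned-up line is tab-delimited with four fields (title, publication information, location, collection). The function names line_to_item and build_feed are made up for illustration, as are the feed title and link in the usage note.

```python
import xml.etree.ElementTree as ET


def line_to_item(line):
    """Turn one line of the cleaned-up file into an RSS <item> element.

    Assumed (hypothetical) layout: title, publication info, location and
    collection, separated by tabs.
    """
    title, publication, location, collection = line.rstrip("\n").split("\t")
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "description").text = f"{publication} ({location})"
    ET.SubElement(item, "category").text = collection
    return item


def build_feed(lines, feed_title, feed_link, feed_description):
    """Wrap the converted lines in a minimal RSS 2.0 channel."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires these three channel elements.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = feed_description
    for line in lines:
        if line.strip():  # skip blank lines
            channel.append(line_to_item(line))
    # ElementTree escapes characters such as & and < in the text for us.
    return ET.tostring(rss, encoding="unicode")
```

Replacing the old XML file each week would then be a matter of reading the text file, passing its lines to build_feed along with a feed title, link and description (for example "New Books" and a placeholder address like http://example.org/newbooks), and writing the returned string over last week's file.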
The ideal option would be to write a program that goes into the catalog daily, pulls out information about newly added books, and generates the XML file from that. I suspect that it would work similarly to Michael Doran’s New Books List program, but would go that extra step of converting the information into RSS-friendly XML.
If anyone knows of some helper programs or if someone out there in library land is developing a program like this, please let me know.
* File is now missing. I think I may have deleted it by accident. 1/13/05