The Jester's Case for Fedora

Peter Murray has written a series of pieces about the Fedora digital repository system over at the Disruptive Library Technology Jester blog.

In the first piece, On the Need for a General Purpose Digital Object Repository, Peter argues that a unified repository simplifies the management of an organization’s information systems, or “silos.”  For instance, workflows need not be duplicated and content need not be synchronized if the organization’s repositories, digital libraries, electronic journals, course management systems, and so on are all built atop a single robust institutional repository.  A unified repository is useful if one wants to search across previously disparate digital projects or collections, to eliminate redundant code, or to share a particular object, collection of objects, or part of an object across different systems (e.g., a journal article repurposed in a course management system and deposited into an open archive).  With an open, flexible repository like Fedora, such a configuration is possible, assuming your organization, unit, or consortium has someone to devote to managing and customizing the repository.

An advantage of using the Fedora system, as outlined in Why Fedora? Because You Don’t Need Fedora, is that due to modular design and adherence to more or less open standards, one is not necessarily wedded to Fedora for the foreseeable future.  Items in a Fedora repository are serialized as XML objects, either in the Fedora-METS or FOXML format.  While some of this information is copied into a relational database system and an RDF triplestore for speed and convenience, it is all intact within the serialized XML objects, which reside in a predictable directory hierarchy on the local filesystem.  There are at least three advantages to this design:

  1. Should Fedora experience a catastrophic system glitch, one may rebuild the entire system via a built-in utility (cleverly named “fedora-rebuild”) that goes through the objects on the filesystem and restocks the database and triplestore.  And assuming that the administrator of the system is worth their salt, there should be regular full backups of the filesystem, so the entire repository should be rebuildable.  As Peter notes, a simple copy of the filesystem on which the XML objects reside is a fine practice in a larger digital preservation strategy.
  2. If one decides to move away from Fedora to the Next Best Thing™, it should be relatively simple to migrate content from Fedora into the new system because of Fedora’s storage of all objects (and associated metadata, files, and disseminators) to the filesystem as serialized XML.  All one needs, perhaps, is a set of funky XSLT scripts to massage the objects into a format that works with the new system and voilà.  (That is a gross oversimplification, but the point remains that open standards, simple file operations, and XML markup do make for more orderly migrations than black boxes, complex datastores, and loose coupling of information.)
  3. Having one’s objects stored as XML on the filesystem also opens up opportunities to see how tools which act thereupon might be glued into the repository infrastructure.  One such example might be for an XML-aware search engine (such as amberfish, Lucene, or Zebra).  Since you’ve got low-level access to these files, it would be fairly simple to tack on a search & indexing system that is independent of your choice of repository.
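
Points 1 and 3 both come down to the same operation: walking the object store and reading each XML file.  Here is a minimal sketch of that idea in Python, not Fedora's own rebuild or indexing code.  It assumes the objects are FOXML files whose root element is foxml:digitalObject with a PID attribute, and that titles live in Dublin Core dc:title elements; the function name and the directory path are hypothetical.

```python
import os
import xml.etree.ElementTree as ET

# Assumed namespaces for FOXML and Dublin Core; adjust to match your objects.
FOXML_NS = "info:fedora/fedora-system:def/foxml#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def index_objects(root_dir):
    """Walk root_dir and return {PID: [titles]} for each FOXML file found."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                root = ET.parse(path).getroot()
            except ET.ParseError:
                continue  # not well-formed XML, so not a serialized object
            if root.tag != "{%s}digitalObject" % FOXML_NS:
                continue  # some other XML file
            pid = root.get("PID")
            if pid is None:
                continue
            # Collect every dc:title anywhere inside the object.
            titles = [el.text.strip()
                      for el in root.iter("{%s}title" % DC_NS)
                      if el.text]
            index[pid] = titles
    return index


if __name__ == "__main__":
    # Hypothetical location of the object store.
    for pid, titles in sorted(index_objects("/path/to/fedora/objects").items()):
        print(pid, titles)
```

The resulting dictionary could be handed to whatever search engine one prefers.  The point is only that nothing here depends on Fedora itself being up and running.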

The third piece, Thinking about Our Fedora Disseminators, highlights Fedora as a repository system that’s put real emphasis on digital preservation.  While other repository systems allow for preservation of an object and its metadata, Fedora grants one the ability to preserve the behavior of digital objects and the datastreams thereof, a potential approach to the issue of format migration/emulation.  Through a dissemination abstraction (the “behavior definition”) one might apply the same abstract behaviors to items in different formats, saving one the time of defining redundant behaviors.  My explanation is rather vague and incomplete, so I would encourage you to read Peter’s third piece in detail.  The point is that “for each record, the application simply asked the repository to deliver a thumbnail of the object. And the repository, regardless of media type, delivered one.” 
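
To make the thumbnail example a little more concrete, here is a toy sketch of the idea in Python.  It is emphatically not Fedora's API: the Repository class, its methods, and the sample objects are all hypothetical, and the "thumbnails" are just strings.  It only illustrates how one abstract behavior can be bound to different mechanisms depending on an object's format, so that the calling application never needs to care.

```python
class Repository:
    """Toy repository that routes an abstract behavior to a per-format mechanism."""

    def __init__(self):
        self._objects = {}     # pid -> (media_type, content)
        self._mechanisms = {}  # (behavior, media_type) -> callable

    def add_object(self, pid, media_type, content):
        self._objects[pid] = (media_type, content)

    def register_mechanism(self, behavior, media_type, func):
        """Bind one abstract behavior to a concrete implementation for a format."""
        self._mechanisms[(behavior, media_type)] = func

    def disseminate(self, pid, behavior):
        """Deliver the named behavior for an object, whatever its format."""
        media_type, content = self._objects[pid]
        try:
            mechanism = self._mechanisms[(behavior, media_type)]
        except KeyError:
            raise LookupError("no %r mechanism for %s" % (behavior, media_type))
        return mechanism(content)


def build_demo_repository():
    """Set up two objects in different formats that share a 'thumbnail' behavior."""
    repo = Repository()
    repo.register_mechanism("thumbnail", "image/jpeg",
                            lambda c: "scaled-down image of " + c)
    repo.register_mechanism("thumbnail", "text/plain",
                            lambda c: "first line: " + c.splitlines()[0])
    repo.add_object("demo:1", "image/jpeg", "photo.jpg")
    repo.add_object("demo:2", "text/plain", "Chapter One\nIt was a dark night.")
    return repo


if __name__ == "__main__":
    repo = build_demo_repository()
    # The application asks for a thumbnail without knowing the media type.
    for pid in ("demo:1", "demo:2"):
        print(pid, "->", repo.disseminate(pid, "thumbnail"))
```

If a format is later migrated, only the mechanism registered for it has to change; the application's request for a "thumbnail" stays the same.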

Taken together, Peter’s pieces make a strong case for Fedora as a fine back-end for a unified, multi-purpose repository.  Unlike other repository systems that focus more on the front-end, Fedora focuses on being the plumbing, the “digital library operating system” as Ron Jantz calls it.  Were I not already a Fedora enthusiast, I would find it quite difficult not to consider Fedora (or something like it, such as LANL’s aDORe Archive) at MPOW.  Now if someone can send me some hints on drumming up institutional support…

 
