I2: Resource Description

I can hardly believe it’s been eight months since I last wrote about the NISO I2 project. A lot has changed since then ((I’ve moved and changed jobs, in fact)). I continue to work on I2, however; they won’t get rid of me that easily.

In the last post, I wrote:

“The next step is to build upon the report to draw yet more conclusions from the data – there’s an awful lot there – and flesh out some repository use cases for institutional identifiers. The I2 core group is moving quickly towards finalizing identifier metadata elements so that a standard may be drafted, and I think having some use cases documented will help drive the standard in a direction the community can get behind.”
Since that time, the three scenario groups – Electronic Resources; Institutional Repositories and Learning Management Systems; and Library Resource Management – have concluded their work, which included surveys of over 300 people working in these fields. The survey results have been analyzed, and reports have been posted on the NISO website. These reports have been used to flesh out use cases for an institutional identifier. Upon completion of this work, the scenario groups were disbanded and work continued in a broader I2 working group.

The I2 working group has concentrated its work on analysis of similar standards and, as I alluded to earlier, significant effort has gone into defining core metadata to identify institutions, such as institution name, institution type, location information, variant identifiers, domain name(s), URL(s), and (optionally-typed) relationships to other institutions. During these discussions it was difficult for me to hear the issues and needs around I2’s metadata and identifiers without linked data springing to mind.
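To make the list of core elements a bit more concrete, here is a minimal sketch of how they might hang together as a data structure. It is not the I2 schema: the class names, field names, and sample values are all hypothetical, chosen only to mirror the elements named above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Relationship:
    """A link from one institution to another, with an optional type."""
    target_id: str                  # identifier of the related institution
    rel_type: Optional[str] = None  # typing is optional, so None is allowed


@dataclass
class Institution:
    """Hypothetical container for the core elements; not the I2 schema."""
    identifier: str
    name: str
    institution_type: str
    location: str
    variant_identifiers: List[str] = field(default_factory=list)
    domain_names: List[str] = field(default_factory=list)
    urls: List[str] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


# Made-up sample record; every value here is a placeholder.
library = Institution(
    identifier="example-0001",
    name="Example University Library",
    institution_type="academic library",
    location="Springfield",
    domain_names=["library.example.edu"],
    urls=["http://library.example.edu/"],
    relationships=[Relationship(target_id="example-0000", rel_type="part of")],
)
```

Modeling relationships as their own small type keeps the "optionally typed" part explicit: an untyped link is simply a `Relationship` whose `rel_type` is `None`.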

While we are designing a standard and not a system or a service per se, it seems useful to include in the standard an informative section about implementation and architecture ((This practice seems more or less common in my (admittedly limited) experience, cf. the unAPI specification.)); I find that reading standards is much easier on the brain when you get not only the standard itself but some examples of implementation, and that will be true as well, one hopes, of I2 standard implementers. To that end, the group will be producing an XML schema of the I2 metadata elements and also an RDF schema.

I have been working on the RDF for I2 on and off for the past month or two. Below are my impressions, as someone who is new to modeling in RDF, and the procedures I used to produce the draft RDF schema.

JSONovich Crawls Into the Future

For those of you who use JSONovich rather than, e.g., JSONView, I’ve tweaked the plugin (now at version 1.3) to work with Firefox 3.6.

I updated the version up on the Mozilla site as well, but things tend to stay in the sandbox for months at a time. (For instance, I submitted version 1.2 back in November, and it’s not yet available.) Feel free to install via my page.

Forking

I am not certain if this is a good idea or not, but I decided to set up a “work blog” as I set off on my new path as a digital library architect. The lines between this blog and that blog are fuzzy – most lines are, in my eyes – so bear with me. I’ve never been a prolific writer – it’s always a chore, an activity I simultaneously want to do more of, and do better, and also struggle mightily with. (It’s the public school education? HAR HAR!) But even so, the posts here may slow yet more. Or maybe that will be true of the new blog. We shall see.

I’ve found that microblogging has largely filled the blogging gap for me; I’m more comfortable, somehow, posting smaller, more easily digestible “thoughtlets” via Twitter/identi.ca/Facebook. Perhaps I’ve succumbed to attention deficit disorder, flitting from one tiny undeveloped idea to the next. It’s probable, but I digress.

If you’re interested, you can follow along as I grapple with questions about digital library architecture. Comments are most welcome, both here and there, as always.

Exploring Curation Micro-services

[Image: thumbnail of micro-repo tree]

As far as I’m concerned, the most exciting developments this year in repositories and digital curation have come out of the California Digital Library. It has been impossible not to notice their papers and presentations. Put simply, their idea is that digital curation is enabled by “micro-services” built upon well-known abstractions such as the filesystem. The benefits are obvious: filesystem tools are ubiquitous and cross-platform, and there are strong market forces to ensure the filesystem persists. The idea is radically simple and straightforward, though many questions remain about such a paradigm. I’ll return to those later.

If you have not yet taken a look at CDL’s curation micro-service specifications, most of which may be printed on as few as one or two sheets of paper, see the Digital Library Building Blocks.

My co-workers in the LC Repository Development Center have been chatting about these specs on and off throughout the year. After months of procrastinating, I finally read all of the specs on Thursday; it’s wonderful that you can do so in the course of one reading session, I might add. Yesterday a bunch of us RDCers got together to chat (informally) about the specs: what they’re for, how they work, and how they interact with one another. I learn by doing, by examples, so I combed through each of the specs in advance of our meeting and tried to construct a minimal repository ((Perhaps it’s more in line with the specs to refer to this space as “a managed filesystem that drives repository and curation services,” given the CDL philosophy that preservation is not a place/repository. But it’s easier to say “repository,” so there you go.)) based on micro-services.
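For a flavor of how small these building blocks are, here is a rough sketch of the identifier-to-path mapping from Pairtree, one of the CDL specs. It is my reading of the spec rather than a reference implementation, and the function names are my own; check the spec itself before relying on the character rules.

```python
# Characters that, as I read the Pairtree spec, are hex-encoded as ^hh.
_HEX_ENCODE = set('"*+,<=>?\\^|')
# Single-character swaps applied after the hex-encoding pass.
_SWAPS = str.maketrans({"/": "=", ":": "+", ".": ","})


def clean_identifier(identifier):
    """Make an identifier safe to use as a run of directory names."""
    out = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        # Encode anything outside visible ASCII, plus the reserved characters.
        if byte < 0x21 or byte > 0x7E or ch in _HEX_ENCODE:
            out.append("^%02x" % byte)
        else:
            out.append(ch)
    return "".join(out).translate(_SWAPS)


def pairtree_path(identifier, root="pairtree_root"):
    """Split the cleaned identifier into two-character directory names."""
    cleaned = clean_identifier(identifier)
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join([root] + pairs)
```

With this sketch, `pairtree_path("ark:/13030/xt12t3")` gives `pairtree_root/ar/k+/=1/30/30/=x/t1/2t/3`, so an object's location on disk can be worked out from its identifier alone, using nothing but the filesystem.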

Command-line Shuffle

Being a nerd, I tend to like the command line. When I’m working on my laptop at home, I tend to like listening to music. Before I discovered that mplayer had a really convenient shuffle idiom, I would invoke it thusly (to listen to all my Pavement tracks in shuffle mode):

export IFS=$'\n'
for track in $(find /mnt/upnp/MediaTomb/Audio/Artists/Pavement -name \*.mp3 | ~/bin/shuffle.py); do mplayer "$track"; done

And the wee shuffle script I whipped together looks like this:

#!/usr/bin/env python
# shuffle.py

import sys
import random

args = list(sys.stdin)
random.shuffle(args)
sys.stdout.writelines(args)

And here’s the convenient shuffle idiom that renders my arg-shuffling script somewhat useless:

find /mnt/upnp/MediaTomb/Audio/Artists/Pavement -name \*.mp3 | mplayer -playlist - -shuffle -loop 0
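If you would rather keep the finding and the shuffling in one place, the two steps fold into a single small function. This is only a sketch: `shuffled_tracks` and its `seed` parameter are names I made up, and it hands back the list instead of calling mplayer.

```python
import random
from pathlib import Path


def shuffled_tracks(root, pattern="*.mp3", seed=None):
    """Return every file under root matching pattern, in random order.

    Passing a seed makes the order repeatable, which is handy for testing.
    """
    # Sort first so that a given seed always yields the same order.
    tracks = sorted(str(path) for path in Path(root).rglob(pattern))
    random.Random(seed).shuffle(tracks)
    return tracks
```

Because the paths stay inside a Python list, there is no need to fiddle with IFS to cope with spaces in file names.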

I2: Survey Results

I wrote in June that the I2 subgroup surveyed “repository managers to determine the current practices and needs of the repository community regarding institutional identifiers. Results from the survey will inform a set of use cases that will be shared with the community, and that are expected to drive the development of a new standard for institutional identifiers.”

The survey closed in July, and the subgroup spent August writing a report on the survey results. That report is now final and available to the public. Feedback may be sent to our (woefully underutilized) public i2info mailing list, left as a comment on this post, or e-mailed to me privately, in which case I can forward it to our internal list.

The next step is to build upon the report to draw yet more conclusions from the data – there’s an awful lot there – and flesh out some repository use cases for institutional identifiers. The I2 core group is moving quickly towards finalizing identifier metadata elements so that a standard may be drafted, and I think having some use cases documented will help drive the standard in a direction the community can get behind.

Onward and upward.

Linking World Digital Library Data

As I mentioned earlier, I’ve been learning about linked data in the context of dropping it into the World Digital Library project. I am hopeful we’ll be able to deploy the RDF views ((Sadly, the URIs are uglyish due to some constraints from our caching configuration. I figure we can redirect uglyish URIs to cool ones and make use of owl:sameAs if those constraints go away.)) before too long. In advance of that, I thought it might be helpful to share a sample of what our RDF would look like. The RDF below represents the WDL item for the U.S. Constitution. I appreciate constructive criticism.

A few things to note:

  • Mmm, Unicode.
  • Item types are from the Bibliographic Ontology.
  • Most of the properties are from the Dublin Core Metadata Element Set ontology, used mainly where the objects are literals rather than resources identified by URI.
  • Where I could dig up URIs for the objects, I used the Dublin Core Metadata Terms ontology instead.
  • An item is modeled as an aggregation of its constituent files, as defined in OAI-ORE. The notion here is that an ORE aggregation of an item, as expressed in a resource map which is discoverable via a link header in each item detail page, is a “whole” item, including all of its files ((sans certain low-quality derivatives such as small thumbnails and tiles for the zoom interface)), metadata, and translations.
  • I’m also making light use of the NEPOMUK File Ontology to express that constituent files are files, and to be explicit about file sizes so that folks know how large a file is before retrieving it.
  • Links out to DDC (Decimalised Database of Concepts), Lingvoj, DBpedia, and Library of Congress Authorities & Vocabularies (e.g., LC Subject Headings) are included where possible. ((I was poking through the DBpedia output for Geonames URIs as well, but my method was way too slow and clunky, so that’s disabled for the time being. Clients can always follow their noses from the DBpedia output.)) I’d be especially stoked to hear of other vocabs I might link to. The more linked the data, the better.
  • The output below is Turtle for readability, but the application will offer up RDF/XML.
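To illustrate the aggregation modeling described in the list above, here is a toy sketch that writes an item and its files out as Turtle by hand. It is not the WDL application code and not its real output: the function name, the example.org URIs, and the file sizes are placeholders, and only the vocabulary terms (dc:title, ore:Aggregation, ore:aggregates, nfo:FileDataObject, nfo:fileSize) come from the ontologies named above.

```python
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "nfo": "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#",
    "ore": "http://www.openarchives.org/ore/terms/",
}


def item_to_turtle(item_uri, title, files):
    """Render an item as an ORE aggregation of its files.

    files is a non-empty list of (file_uri, size_in_bytes) pairs.
    """
    if not files:
        raise ValueError("an aggregation needs at least one file")
    # Escape backslashes and double quotes so the literal stays valid.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    lines = ["@prefix %s: <%s> ." % (p, ns) for p, ns in sorted(PREFIXES.items())]
    lines.append("")
    lines.append("<%s> a ore:Aggregation ;" % item_uri)
    lines.append('    dc:title "%s" ;' % safe_title)
    aggregated = " ,\n        ".join("<%s>" % uri for uri, _ in files)
    lines.append("    ore:aggregates %s ." % aggregated)
    for uri, size in files:
        lines.append("")
        lines.append("<%s> a nfo:FileDataObject ;" % uri)
        lines.append("    nfo:fileSize %d ." % size)
    return "\n".join(lines) + "\n"


# Placeholder item: the URIs and sizes are invented for illustration.
sample = item_to_turtle(
    "http://example.org/item/1",
    "Sample item",
    [("http://example.org/item/1/page1.png", 2048),
     ("http://example.org/item/1/page2.png", 4096)],
)
```

A real application would more likely lean on an RDF library for serialization; building the string by hand here just keeps the shape of the graph visible.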

The data after the jump:

Is MARC a Data Model?

I posted a status update to Twitter, identi.ca, and Facebook late last night hoping to suss out two questions:

  1. Is MARC a data model?
  2. But really: what qualifies something as a data model?

I’d poked around looking for clues to the latter and was left cold by the long Wikipedia entry. Maybe I’ve been doing the micro-blog thing for too long and my ability to parse information that comes in greater-than-140-character chunks has been damaged. Plus I like learning from examples, and what better example for the library geek than MARC?

The feedback I received was pretty impressive, and not all of it consistent with the rest. I found it an interesting example of crowdsourcing, so to speak. As each response came in, I would read it, check it for accuracy against, e.g., Wikipedia articles, and revise my own answers to the above questions. I’m homing in on an answer to the former question. The latter question is still a bit murky.

I thought I’d share the responses, too. Responses from Twitter are included in full w/ links to the original. Responses from quasi-public Facebook have been anonymized. You can see my replies interspersed as well and watch the evolution of the (admittedly short) discussion. After the jump: