NJLA 2007 Talk

June 5, 2007

This is a slightly modified (read: rough) transcription of the talk I gave at this year’s NJLA conference, called “Library Revolution.”

The abstract described an idealistic scenario:

Imagine, if you will, a world where library services are automatically discovered; Library users retrieve information objects and metadata with a single click, never having to navigate the dark alleys of dead-ends that are full-text resolvers; Information sources and services are connected and remixed according to user preferences and needs, where and when they wish. What if we could leverage existing library and industry standards, applications, and protocols to make this a reality? And soon?

In this scenario, a potential library revolution could be fomented – in which the goal would be to return the means of production to users, to hand over the reins, to re-envision ourselves as tool- and service-building artisans, as Karen G. Schneider described in her keynote at the Code4Lib 2007 conference, rather than gatekeepers and information proxies – and I’m going to suggest some ways this might be achieved. For now I’ll assume it’s self-evident why it is desirable to, generally speaking, get “our stuff” “out there” and meet users at their points of need.

Rather than get into nitty-gritty details, I’d like to describe a higher-level vision which has been put forward by a host of library technologists who have come before me (especially Daniel Chudnov). Some aspects of my vision may indeed be pie-in-the-sky, but consider this:

Isn't pie delicious?
Shouldn't we reach for it?

So, what’s the problem? Why a revolution? Here are some (arguably trite) observations:

- Full-text resolvers do not work well. You should not have to click through two, three, or four windows to get at full-text -- assuming it's actually there and not a complete dead end! Don't get me wrong -- it's better to have access to full-text through a resolver than not to. I'd like to see more resolver systems that implement look-ahead resolution like Oregon State University's new, and freely available, metasearch tool, LibraryFind. LF uses what Jeremy Frumkin, the Chair of Innovative Library Services at OSU, likes to call "two-click workflow": one click to find, one click to get.
- Information splatter. We've accumulated too many silos and need to figure out better ways to access all of that information via a single interface, whether the method is federation, aggregation, or something else. Users should not have to go to multiple sites to search our collections for resources of interest.
- Sandboxing. Our content and services are, generally speaking, tightly coupled to our websites, so we are generally unable to meet users at their points of need.
- Service usage -- reference desk visits, OPAC searches -- appears to be dwindling.
- Growing popularity of Google, Amazon, and "web 2.0" or social networking sites -- del.icio.us, flickr, twitter, myspace, facebook, ning, librarything. These sites are great -- especially MySpace, where I get all sorts of offers for new prescription drugs and live adult webcams. But these sites really -are- great. They empower users to connect with one another, to describe their own resources, to share with others, to remix information. And most of all? They're incredibly easy to use. Are our tools as easy to use? Are we similarly empowering users?

An aside on “2.0”: Although I cringe at the viral “2.0” meme – web 2.0, library 2.0, business 2.0, identity 2.0, enterprise 2.0, learning 2.0, travel 2.0… – it is interesting to note that there is something to “2.0”. Something revolutionary. And it’s not folksonomies, it’s not tagging, it’s not tag clouds, it’s not sharing, it’s not any particular site or idea. It is the very fabric of “2.0”, and that is a re-envisioning of the web from connecting people with data to connecting people with people. The web has evolved from a network of interlinked documents to an extension of the social fabric connecting us all.

As you can see, a revolution of sorts has already begun. Time magazine selected “You” as their 2006 person of the year. When MSNBC covered the story, the headline read “From blogs to YouTube, user-generated content transforms the Internet”. I’m personally not that interested in the social networking aspect, and it already receives a lot of coverage from the library 2.0 gang. Library 2.0 is a popular topic now, and much has been said of wikis, blogs, and RSS. These are important topics but others are already covering them quite well. The point to take away from 2.0, in my view, is that it’s empowering and inspiring users to do things with information they previously were not able or willing to do. Ask tens of millions of people to help us catalog MARC records? Right. But ask them to tag videos on YouTube, bands on Last.FM, images on Flickr, links on del.icio.us, and so forth? There you go. My areas of interest with regard to library revolution are unifying our content and services, getting them outside the library sandbox, and returning the means of production in this very “2.0” way.

Let’s step through some technologies and technological concepts that may play a role in reaching this outcome.

Systems integration: We have accumulated a wealth of resources over the years and have purchased, or built, or licensed, numerous systems to access these resources that have traditionally been disparate. This is a great accomplishment; the more information we can get into the hands of our users, the better. The process doesn't scale, though, and has resulted in a proliferation of information silos. Because of thorny issues of interoperability, not to mention licensing issues, technological incompatibilities, and lack of resources, we have thus far struggled to bridge the gaps between these silos. The result? A number of different search interfaces, with different result sets, in different formats, supporting different depths of coverage.

How can we reconcile in our users' minds this information environment with the "simple, single search box" mentality of the Google age? What if we built bridges between our systems? Pull together metasearch with the link resolver, the link resolver with the catalog, the catalog with institutional repositories. Easy, right? Well, no. But at the very least, if you can get XML out of these systems -- whether through OAI-PMH, or SRU, or a database export -- you can bring it together. Index it with a tool like Solr, and you've got your Google-ish library search tool.
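
To make "get XML out and bring it together" a little more concrete, here is a minimal sketch, assuming the XML arrives as an OAI-PMH ListRecords response carrying unqualified Dublin Core. It flattens each record into a plain dictionary of the kind one might hand to an indexer such as Solr. The sample response, the function name `oai_records_to_docs`, and the output field names (`id`, `title`, `creator`, `source`) are all made up for illustration, and the step of actually posting documents to an index is left out.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH responses that carry unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response from a repository; a real one would be fetched over HTTP.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Germanic Syntax</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def oai_records_to_docs(xml_text, source):
    """Flatten each OAI-PMH record into a dict suitable for an indexer."""
    root = ET.fromstring(xml_text)
    docs = []
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        docs.append({
            "id": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": [e.text for e in dc.findall("dc:title", NS)],
            "creator": [e.text for e in dc.findall("dc:creator", NS)],
            "source": source,  # remember which silo the record came from
        })
    return docs


docs = oai_records_to_docs(SAMPLE_RESPONSE, source="repository")
```

Running the same function over exports from the catalog, the repository, and so on, each with its own `source` label, yields one homogeneous pile of documents, which is the part that makes a single search box possible.
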
Auto-discovery: Auto-discovery is used by a number of technologies, though perhaps its usage to announce syndication (RSS) feeds is the most well-known. The mechanism for syndication auto-discovery is actually quite simple. Got a feed for your site? Add a single line of HTML code to any page you'd like to announce it on, and modern web browsers will pick it up and clue you in.

In HTML, there is a LINK tag, not to be confused with the A tag (which stands for anchor) commonly used for hyperlinks. The anchor and link tags differ in the following ways:
- Content: Anchor tags may have text content, which shows up as the label for the link. For instance, you might link to FoxNews.com and label it "Fair and balanced? Yeah right." LINK tags do not have text content.
- Actionability: Anchor tags are clickable. They take you someplace. LINK tags are not clickable.
- Context: Anchor tags appear in the body of a document. LINK tags appear in the HEAD.
- Semantics: Anchor tags may represent any number of things. An anchor might link to content further down in the current page, it might link to another page entirely, or it might even be used to activate some JavaScript or launch a popup window. LINK tags are used solely to describe relationships between documents, which is more semantic information. For instance, a LINK tag might point to the next and previous chapters in an e-book, or to alternative representations of a document, such as versions in other languages, or versions formatted in RSS or the Atom syndication format. The LINK tag is a great way to leverage the existing web architecture to handle the problem of "one resource, many representations", and I wouldn't be surprised if the OAI-Object Reuse and Exchange initiative took a hard look at it.
Auto-discovery via the LINK tag, such as for syndication feeds, is common, but it is not the only form of auto-discovery. There are more sophisticated approaches, such as Zero Configuration Networking.
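
Here is roughly what a browser or aggregator does when it "picks up" a feed: a minimal sketch, assuming the page announces its feed with a LINK tag whose `rel` is `alternate` and whose `type` is one of the RSS or Atom MIME types. The page markup, the URLs, and the class name `FeedLinkFinder` are invented for the example.

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect LINK tags that advertise a syndication feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":  # the parser lowercases tag names for us
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if "alternate" in rels and (a.get("type") or "").lower() in FEED_TYPES:
            self.feeds.append((a.get("title"), a.get("href")))


# Hypothetical page: one stylesheet LINK, one feed LINK, one ordinary anchor.
PAGE = """<html><head>
<title>Library News</title>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml"
      title="Library News (RSS)" href="http://library.example.org/news.rss">
</head><body><a href="/about">About us</a></body></html>"""

finder = FeedLinkFinder()
finder.feed(PAGE)
feeds = finder.feeds
```

Note that the stylesheet LINK and the anchor in the body are both ignored; only the LINK that declares an alternate representation in a feed format is reported.
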
Syndication: You've probably heard a lot about RSS, or Really Simple Syndication, and I wouldn't be surprised if most of you are already using it. Syndication packages a site's content into feeds that folks can subscribe to and unsubscribe from willy-nilly. It's a great technology, simple to use and implement, and I know it saves me a great deal of time on a daily basis. Instead of having to click through and browse the 50 or so websites I track regularly, I read updated content from each site in my feed aggregator, in a unified interface. A lot of attention is already paid to RSS, especially in library 2.0 circles, so I won't say much more about it.

But I thought it was important to include an explicit mention of syndication since a couple of the other topics relate to it, and since it is a great example of getting stuff out there. Rather than requiring your audience to come to your website, syndication enables them to read your content in an environment of their choosing. It's worth noting that my wife is not a fan of syndication. She likes the experience of going to different websites, enjoying their different takes on web design, and compartmentalizing her web surfing. And that's great; no one, to the best of my knowledge, has advocated an "RSS-only" interface. Content available in the RSS format is also available otherwise, so it is a convenient option for people like myself.

One more point about RSS, despite saying I wouldn't talk much about it. It's kind of an academic point, but I feel it warrants some clarification. The term RSS has quickly become the Band-Aid, or the Kleenex, of syndication feeds. RSS is one of a number of formats used for marking up syndication feeds. Another is the Atom Syndication Format. Most browsers and feed aggregators are fully aware of both feed types -- for instance, Bloglines has supported both formats since June of 2006 -- and the two generally render the same, which is why you don't hear about Atom much; it's a detail that is, for the most part, behind the scenes.
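
To show why the RSS/Atom distinction can stay behind the scenes, here is a minimal sketch, assuming well-formed RSS 2.0 or Atom 1.0 input, that reduces either format to the same list of (title, link) pairs. The two sample feeds and the function name `feed_entries` are invented, and real aggregators handle many more elements and edge cases (multiple Atom links, dates, content, and so on).

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom elements live in this namespace


def feed_entries(xml_text):
    """Return (title, link) pairs from an RSS 2.0 or Atom 1.0 feed."""
    root = ET.fromstring(xml_text)
    if root.tag == ATOM + "feed":
        entries = []
        for entry in root.iter(ATOM + "entry"):
            link = entry.find(ATOM + "link")  # simplification: first link only
            href = link.get("href") if link is not None else None
            entries.append((entry.findtext(ATOM + "title"), href))
        return entries
    # Otherwise assume RSS 2.0, whose core elements are not namespaced.
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


# The same hypothetical news item, marked up both ways.
RSS_SAMPLE = """<rss version="2.0"><channel><title>News</title>
<item><title>New hours</title><link>http://library.example.org/hours</link></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>News</title>
<entry><title>New hours</title><link href="http://library.example.org/hours"/></entry>
</feed>"""

rss_entries = feed_entries(RSS_SAMPLE)
atom_entries = feed_entries(ATOM_SAMPLE)
```

Both calls produce the same pairs, which is all a reader of the feed ever sees.
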
OpenSearch: Does anyone here have a website or a catalog? Do they have search interfaces? Perfect, you're about a third of the way there. OpenSearch is a specification for some simple formats that allow you to share search results. Just as syndication allows you to decouple your content from your website, OpenSearch allows you to decouple your search engines from your websites. Here's how it works:
1. Go to your search page and look at the URL after you run a search
2. Write an OpenSearch description document
3. Embed a LINK tag linking to the OpenSearch description document, for auto-discovery
4. Return search results in RSS or Atom
You might ask "why bother?" Firstly, the newest browsers -- FF2 and IE7, among others -- support auto-discovery of OpenSearch targets. So folks can search Google, Wikipedia, Amazon, eBay ... and your websites or catalogs directly from their browser. Secondly, it allows for fairly simple federation of searches across OpenSearch targets. Since each target contains a description document that is machine-readable, I can point my OpenSearch client at a number of targets, find their descriptions, and learn how to search them. Results are, by convention, returned in RSS or Atom, which are easily crosswalkable, so aggregating result sets is fairly trivial (though how to sort or rank them is tricky). Thirdly, since results are returned as RSS or Atom, one can in effect subscribe to search results. For example, you could subscribe to a search on Wikipedia for "Anarcho-Syndicalism", and your feed aggregator will be alerted whenever that search returns new results. Or, a Linguistics professor could subscribe to your catalog's OpenSearch target, hoping to be alerted when new materials about Germanic syntax are cataloged -- and it's worth noting, this is as easy as just two or three clicks in a web browser.
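
The client side of those steps can be sketched as follows. This is a minimal example, assuming an OpenSearch 1.1 description document whose `Url` element carries a template with a `{searchTerms}` placeholder and an optional `{startPage?}` placeholder. The catalog URL, the description document, and the function name `search_url` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import quote

OS_NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}

# Hypothetical description document for a library catalog (step 2 above).
DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example Catalog</ShortName>
  <Description>Search the Example Library catalog</Description>
  <Url type="application/rss+xml"
       template="http://catalog.example.org/search?q={searchTerms}&amp;page={startPage?}&amp;format=rss"/>
</OpenSearchDescription>"""


def search_url(description_xml, terms, result_type="application/rss+xml", **params):
    """Learn how to search a target from its description, then build a query URL."""
    root = ET.fromstring(description_xml)
    for url in root.findall("os:Url", OS_NS):
        if url.get("type") == result_type:
            template = url.get("template")
            break
    else:
        raise ValueError("no Url element for " + result_type)

    values = dict(params, searchTerms=terms)

    def fill(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:  # placeholders ending in "?" may be left empty
            return ""
        raise KeyError(name)

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", fill, template)


first_page = search_url(DESCRIPTION, "germanic syntax")
second_page = search_url(DESCRIPTION, "germanic syntax", startPage=2)
```

Because the resulting URL returns RSS, it can be pasted straight into a feed aggregator, which is the "subscribe to a search" trick described above. For step 3, the auto-discovery LINK tag conventionally uses `rel="search"` and `type="application/opensearchdescription+xml"`, with `href` pointing at the description document.
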
unAPI: Numerous tools and protocols exist for integrating library resources into other information systems, library or otherwise. OAI-PMH and OpenURL are two great examples of successful and widely deployed technologies. Unfortunately, few developers outside the relatively small world of library technology know anything about library standards, and this is seen as a significant integration barrier. Dan Chudnov, a librarian programmer at LC, reflected on this: 'we librarians and those of us librarians who write standards tend, in writing our standards, to "make complex things possible, and make simple things complex."'

To address this issue, a number of librarians and technologists came together to develop a new standard called unAPI. unAPI is a tiny web-based specification designed to solve the problem of identifying, copying, and pasting discrete content objects to and from web applications (including catalogs, bibliographic databases, repositories, link resolvers, and so forth), making it simpler for developers outside the library world to get at our vast intellectual resources. The objective of unAPI, then, is to enable web sites with HTML interfaces to information-rich objects to simultaneously publish richly structured metadata for those objects, or those objects themselves, in a predictable and consistent way for machine processing.

unAPI consists of three parts: a microformat for embedding object identifiers in HTML, an HTML LINK tag for unAPI service auto-discovery (as used for RSS, Atom, and OpenSearch), and a web service consisting of three functions -- get formats, get formats for identifier x, get format y of identifier x -- two of which have a standardized response format, returning XML.
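
The three functions of the web service can be sketched as a single handler. This is a minimal illustration, assuming an in-memory store and leaving out HTTP details such as status codes and redirects. The `RECORDS` store, its contents, and the function names are hypothetical, and the `formats`/`format` element and attribute names are written as recalled from the unAPI specification, so they should be checked against it before being relied on.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical store: identifier -> {format name: (MIME type, body)}.
# The record bodies are placeholders, not real Dublin Core.
RECORDS = {
    "oai:repo.example.org:42": {
        "oai_dc": ("application/xml", "<dc><title>Germanic Syntax</title></dc>"),
        "text": ("text/plain", "Germanic Syntax"),
    },
}


def formats_xml(identifier=None):
    """List formats for one identifier, or for the service as a whole."""
    if identifier is None:
        names = {}
        for formats in RECORDS.values():
            for name, (mime, _body) in formats.items():
                names[name] = mime
        open_tag = "<formats>"
    else:
        names = {n: m for n, (m, _body) in RECORDS[identifier].items()}
        open_tag = "<formats id=%s>" % quoteattr(identifier)
    items = "".join(
        "<format name=%s type=%s/>" % (quoteattr(n), quoteattr(m))
        for n, m in sorted(names.items())
    )
    return open_tag + items + "</formats>"


def unapi(identifier=None, fmt=None):
    """Return (content_type, body) for the three unAPI request shapes."""
    if identifier is None:            # get formats
        return "application/xml", formats_xml()
    if fmt is None:                   # get formats for identifier x
        return "application/xml", formats_xml(identifier)
    return RECORDS[identifier][fmt]   # get format y of identifier x


all_formats = unapi()
record_formats = unapi("oai:repo.example.org:42")
record_as_text = unapi("oai:repo.example.org:42", "text")
```

The first two calls are the ones with a standardized XML response; the third simply hands back the object in whatever format was asked for.
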
ZeroConf: I want to acknowledge Dan Chudnov again, for suggesting that Zero Configuration Networking might have a place in library services. The general question here is "why can't library tools be as cool as iTunes is?" Just waltz into Starbucks, connect to the wi-fi, and you can see everyone else's playlist. You can listen to their music, even. What sort of magic made this sort of auto-discovery "just work?"

The technology is called Zero Configuration Networking, or ZeroConf, though you may also see Apple's implementation referred to as Rendezvous or, since its renaming, Bonjour. Without getting into the hairy details, ZeroConf is a small stack of fairly low-level technologies that piggy-back on the ubiquitous domain name system (or DNS), which enables us to type identifiers like "google.com" and "nytimes.com" into a web browser and rest assured that our computers will take care of the rest for us -- looking up the domain name, finding the network address of the server, connecting to the server on an appropriate port, and so forth. ZeroConf allows machines to connect to a network without any knowledge of what's already on it, and without regard for the type of network topology or infrastructure, and then to both register the services they provide and query the services already provided by other nodes on the network. It sounds complicated, but every time you walk into Starbucks and start up iTunes, it seems pretty trivial. It "just works."

Wouldn't it be great if users could discover our services and resources that easily? What if we went through with systems integration and announced that unified service via ZeroConf? A visiting scholar could enter our library, connect her machine to the network -- and let's forget about authentication and authorization for now -- and immediately find our new service, from which she could run a simple search against all of our bibliographic databases, all of our catalog records, all of our full-text holdings, and all of our repository objects. What if library services "just worked"?

Significant work needs to be done in this area before it becomes viable as I've just described, but I find it to be a compelling vision.

The technology, as is widely acknowledged, is the easy part. So how do we get there? How do we re-envision ourselves as library artisans? How do we craft services that “just work?” How can we investigate all these technologies that let us unleash our considerable assets?

- Commitment to innovation. If you can afford to have skunkworks in your organization, even if it's only one employee, or allowing a couple of creative individuals to devote 10% of their time to innovative pursuits, it's worth it.
- Bold direction.
- Think outside the orgchart -- leverage collaborative development, forge communities, make the most of your consortial ties.
- ... You tell me.

We can yield revolutionary results via small steps and a bold, forward-thinking direction. Pie in the sky? Maybe, but isn’t pie delicious? (Yes, that was a glib and abrupt ending, but I’m tired of editing.)
