Braindump for Q2 2010

April 6, 2010

My my, has it really been three months since I wrote up my agenda? I’ve been busy chipping away at the agenda so I thought I’d document my progress now that Q2 is underway.

Reviewing digital library platforms for the e-Content Stewardship Council

The platform review project that our digital collections curator and I have undertaken continues. We began the project by having folks demonstrate each platform and how they use it, and have been busy with small, informal interview sessions with many of the same folks but also others who work outside of the Libraries. We have a few more interview sessions to conduct and document, so the data gathering portion of the project is nearly complete. In the meantime we’ve been discussing evaluation criteria. We started off with a short list of criteria, but then noticed the criteria Purdue are using for their comparative analysis of institutional repository software and adopted those instead. We sketched out a structure for the final report, which we hope to finish in May.
Reviewing functional requirements for an institution-wide repository of electronic records

This work is still under way. We have a set of well-documented functional requirements for an e-records repository service but have yet to make progress on building anything. We’ve been talking about applying for a grant to help fund some additional staffing which might be used to help build out proof-of-concept curation services (preservation, provenance, description, discovery) for e-records. I’m really keen on applying curation micro-services, such as those used at CDL, to the e-records domain. I see this effort as benefiting both the curation micro-services community and the e-records community – not to mention our own electronic records initiatives here at Penn State. An all-around win, if you ask me, but then I’m biased. This will be a major activity in the latter half of this year continuing into the next.
Learning more about “big data” and continuing the data management discussion

Our content stewardship program will doubtless need to address research data. We’re not there yet. In the meantime, Penn State’s ITANA chapter will be pulling together a working group on the technological and architectural challenges of research data. Jeff Nucciarone and I will be chairing the group. Research data has also been on my mind for two reasons: Michael Lesk gave a talk at the information school urging libraries to turn their attention to research data; and I’ll be attending the Research Data Access and Preservation Summit in Phoenix later this week.
Evaluating the DLT archival storage prototype and joining the technical team of the Data Storage Working Group

The Data Storage Working Group effort has been repurposed. The steering team will continue to meet informally and discuss archival storage and curation needs across the campuses. The technical team has been dissolved, and the majority of us (who already work together in DLT in support of the same mission) will continue to work in this space.
Evaluating next-generation information discovery tools for the libraries with the Libraries’ Department of Information Technology

The RFP process continues. We hope to have wrapped up our evaluation by the beginning of summer.
Evaluating change management solutions with a team from Penn State’s ITANA group

I haven’t found much time to stay involved with this team, unfortunately, but their work continues apace.
Working on requirements for a draft institutional identifier standard with the NISO I2 working group

The I2 working group is putting finishing touches on a draft standard and on core metadata required to identify institutions. We hope to share this draft and put out a request for comments in the coming months. I’ve been modeling the I2 domain in RDF both for more RDF experience and also with the hope that an eventual I2 core service will be exposed as linked data.
Attending Code4Lib 2010

You can tell how good a code4lib conference is by how little you remember of it. By that measure, this year’s conference was the best yet. Some of the highlights for me: 1) Linked data, a pattern for exposing resources and metadata via the web, continues to be a hot topic among cutting-edge library developers. There was a focus this year on how to participate in the linked data web in practical and lowish-barrier ways. The speed with which concepts move, at code4lib, from “novel, and interesting to a few” to “widely talked about and deployed” is dizzying; 2) Software development practices continue to mature in libraries. We’re talking more and more about test-driven design and agile development. While these methodologies are beneficial to developers themselves, I find this remarkable because it means the gap between coders and stakeholders is being bridged, and that means better and more usable software, and happier users; 3) Repositories are not typically a hot topic at code4lib, but there were a number of prepared talks, lightning talks, and breakout sessions on the topic. Fedora tends to be the repository most often talked about, if only because it is the repository that requires the most hacking – and these are the people doing the hacking. What I found interesting this year was the dissatisfaction with monolithic repository software packages, and the movement towards “homebrewed”, though standards-based, repository services, such as those being advocated by the California Digital Library.
Meetings, meetings, meetings

The meetings, they continue.
Continuing to absorb as many of the following as possible: strategic plans, project portfolios, process management documents, and various and sundry reports, wikis, and blogs

And this continues as well, though it’s hard to find time to contextualize when you’ve got actual tasks and deadlines.

And here are some new and upcoming things.

I’ve written about my search for a practice-oriented curation technology/architecture community, and I’m glad to say I’ve made some progress on finding said community. I’ve been part of a conversation revolving loosely around the digital-curation group and that conversation has now turned to planning a curation technology workshop which we’re calling CURATEcamp (CURAtion TEchnology Camp). I hope to have more details to share soon.
I am attending Open Repositories 2010 in Madrid this July. I expect to learn about how folks are using repository systems such as Fedora, DSpace, and ePrints, but am more interested in all the other stuff happening on the periphery. There has also been talk of a curation micro-services birds-of-a-feather session, which might serve as a good event to get potential CURATEcampers talking.
I’ll be in Washington, DC in a few weeks working on a team to evaluate IMLS National Leadership grant applications. This will be a new experience for me, and one to which I need to devote a significant chunk of time between now and then, so I’m excited. It will be interesting to see what folks are doing outside of Penn State, and also to get an idea of what sorts of projects wind up getting funded.
I have some vague ideas for project charters but have yet to really flesh them out. One involves some collaborative development on tools around curation microservices, to be used and evaluated by honest-to-goodness curators with honest-to-goodness data, and the other is about benchmarking some distributed filesystems.
Techies at Penn State need to talk more. I want a BarCamp-style event for PSU techies so that we can discuss issues across departmental boundaries. Administrators have been nothing but supportive of the idea, and now I just need to find some time to sketch what I have in mind.
Digital Library Technologies, my department, is hiring! We’re looking for someone to come develop software to support our content stewardship program. Like writing code? Interested in how data is curated, stored, and discovered at scale? Consider applying. (Will link to position when it goes public later this week.)

Braindump complete. Brain now empty, except to say: boy, State College sure is lovely in the spring.
