Data management discussion

January 6, 2010

My first week on campus is cruising by. On Monday I sat in on a meeting called by our Chief Information Officer (and my boss’s boss), Kevin Morooney, to discuss what data management means to folks in Penn State’s central information technology group, Information Technology Services.

Attendees came from all across the ITS organization: Administrative Information Services, Security Operations and Services, Consulting and Support Services, Teaching and Learning with Technology, Marketing and Communications, Research Computing and Cyberinfrastructure, Identity and Access Management, and Digital Library Technologies (my department).

The meeting was chiefly for exchanging information, for reconstituting the discussion about what we talk about when we talk about data management. Some of the preceding high-level work done in this area has been around a business intelligence initiative – though I’m not sure what this means, exactly – and development of a University data classification scheme.

I wasn’t terribly surprised to learn how different the perspectives around the table were, but there were also some common themes such as security and identity management.

We’ve all got data, and so data management is done by just about everyone everywhere. It gets very tricky, naturally, when you start talking about data management planning across an institution as large and diverse as Penn State. Kevin asked folks to mention examples of institutions of higher learning that have tackled data management at the institutional level. Most of the examples given were private, resource-rich schools – no shock there, perhaps.

I’ve been somewhat disconnected from academia for a few years now, so I was hesitant to mention my perhaps outdated examples. In the meantime, I’ve had a chance to poke around and verify what I suspected.

Indiana University’s Information Policy Office has published a Data Management website listing policies, guidelines, a classification scheme and dictionary, data managers, and the membership of the Committee of Data Stewards, “a group… responsible for establishing policies, procedures, and guidelines for management of institutional data across Indiana University.”

The University of Washington has also been active in this area. UW Technology and the eScience Institute published a report, Conversations with University of Washington Research Leaders, on “a large-scale effort to assess the information technology needs of the University of Washington’s top researchers. … [T]he goals of the project were (1) to understand how UW researchers currently use technology and anticipate using technology in the future to support their research activities, and (2) to identify the resources and services they need to maintain and build upon their remarkable record of success. To accomplish these goals [UW Technology and the eScience Institute] interviewed 127 researchers.” The first recommendation in the report, which contains an entire section on data management, is that the University should provide a new data management paradigm.

Many big questions remain. What does “data management” even mean? Who are the stakeholders and what are their expectations? How would data management responsibilities be divvied up? In which directions should outreach be concentrated? What data is even out there?

Among the big and challenging topics, thoroughly intertwingled, are: security, privacy, and access control; scalability and performance; provenance and auditing; metadata and discoverability; persistence; access vectors; buy-in; the notion of “one-size-fits-allness”; trust (five huge gigantic scary letters); incorporation with existing workflows; and probably eleventy zillion others.

All of which I solved in a dream last night – and then forgot. It probably involved a cloud (or SOA, right?).

The meeting adjourned shortly after determining that there were no immediate follow-ons or action items, except to keep thinking about data management and looking for good reasons to reconvene the group. (Sidebar: I was involved with an effort within the Libraries at the University of Washington to develop and conduct an institutional data census, a difficult and involved process aimed at answering at least one question – what data is out there? – and so I’d be stoked to see a similar effort at Penn State.) This ad hoc group will probably meet once or twice a year and I look forward to watching things develop in this space. I learned a bunch from the meeting, and that’s extremely valuable while I put my work here into context.

When isn’t learning valuable?
