I posted a status update to Twitter, identi.ca, and Facebook late last night hoping to suss out two questions:

  • Is MARC a data model?
  • But really: what qualifies something as a data model?

    I’d poked around looking for clues to the latter and was left cold by the long Wikipedia entry. Maybe I’ve been doing the micro-blog thing for too long and my ability to parse information that comes in greater-than-140-character chunks has been damaged. Plus I like learning from examples, and what better example for the library geek than MARC?

    The feedback I received was pretty impressive, though the responses did not all agree with one another. I found it an interesting example of crowdsourcing, so to speak. As each response came in, I would read it, check it for accuracy against, e.g., Wikipedia articles, and revise my own answers to the above questions. I’m homing in on an answer to the former question. The latter question is still a bit murky.

    I thought I’d share the responses, too. Responses from Twitter are included in full w/ links to the original. Responses from quasi-public Facebook have been anonymized. You can see my replies interspersed as well and watch the evolution of the (admittedly short) discussion. After the jump:

    @bangpound: @mjgiarlo MARC is a markup language. It makes no declarations about how data is stored only how it's formatted.
    @ranginui: @mjgiarlo a piece of crap, cue neil young and crazy horse
    @anarchivist: @mjgiarlo not a data model, it's a transmission format
    @vphill: @mjgiarlo I've heard that said about MARC too, let me know if you get an answer
    A container for a data model, such as AACR2
    @mjgiarlo: @bangpound, @anarchivist, @vphill: So. let's see: MARC21 bib is a profile of a serialization/transmission format w/ AACR2 as the data model?
    @anarchivist: @mjgiarlo wouldn't even assume AACR2 if I was you.
    @mjgiarlo: @anarchivist: Okay. Something says "authors go in 100; contributors go in 700," though, right? Is that not a data model? Sorry if dense.
    MARC is not a data model (and neither is AACR2) in the sense that neither of them explicitly describes entities and relationships among entities. The relationships in these two non-relational frameworks are implicit, and the semantics of the model must be supplied in the end by the people who use these frameworks. RDA/FRBR is a move toward an actual data model -- it makes some relationships explicit and can properly be represented in an Entity-Relationship diagram (with all those relationship words that explicitly express the semantics -- words like, for example, "is realized through" or "is embodied in" or "is exemplified by"), but even RDA/FRBR does not fully express all of the relationships/semantics and must be translated into an actual data model in order to be implemented -- librarians have been irresponsible, in my opinion, in refusing to learn about relational database concepts, mostly because of their slavish adherence to the old flat-file style that MARC represents.
    @gmcharlt: @mjgiarlo MARC is many things at once, which is part of the problem. Not just transmission standard; embodies current cataloging worldview
    @edsu: @mjgiarlo i think there are aspects of data modeling in Z39.2 & ISO 2709, and certainly in MARC21 ; that said, i think @gmcharlt is right.

    So, based on all the responses I’ve gotten (on Facebook, on Twitter, around the office), here’s my current thinking:

    • MARC means more than one thing.
    • One meaning of MARC is MARC the binary format. A format is not a data model.
    • Another meaning of MARC is, e.g., MARC21 Bibliographic.
    • MARC21 Bibliographic is a profile of MARC, which is serialized in the MARC binary format.
    • MARC21 Bibliographic defines semantics for fields and subfields and indicators, which makes it feel like a data model. This gets at some of the assumptions I've internalized about data models.
    • The MARC21 Bibliographic data model thus has well-defined entities, but otherwise is a poor data model, primarily because:
      1. It does not have well-defined relationships between the entities;
      2. It conflates different conceptual models, such as the FRBR Group 1 entities, and also mixes FRBR Group 1 entities with Group 2 and 3 entities.
    • I'm not sure where this leaves AACR2, but it feels like it just fell out of the discussion.
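
To make the format-versus-profile distinction in the list above a bit more concrete, here is a minimal Python sketch. It assumes the ISO 2709 layout (24-byte leader, 12-byte directory entries, and the 0x1E / 0x1D / 0x1F terminators) and handles data fields only, not 00X control fields. The function names, the `MARC21_BIB_LABELS` table, and the sample names and title are all hypothetical, made up for illustration. The point is that the read/write layer never needs to know what "100" or "700" means; that knowledge lives in a separate lookup, which is roughly where MARC21 Bibliographic starts to feel like a data model.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SF = b"\x1f"  # subfield delimiter


def build_record(fields):
    """Serialize [(tag, indicators, [(code, value), ...])] into ISO 2709 bytes."""
    directory = b""
    data = b""
    for tag, indicators, subfields in fields:
        body = indicators.encode("ascii")
        for code, value in subfields:
            body += SF + code.encode("ascii") + value.encode("utf-8")
        body += FT
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    base = 24 + len(directory) + 1   # leader + directory + directory terminator
    total = base + len(data) + 1     # plus the record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    assert len(leader) == 24
    return leader + directory + FT + data + RT


def parse_record(raw):
    """Format layer only: recover tags, indicators, and subfields from bytes."""
    base = int(raw[12:17])           # base address of data, leader bytes 12-16
    directory = raw[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        body = raw[base + start:base + start + length - 1]  # drop terminator
        indicators = body[:2].decode("ascii")
        subfields = [
            (chunk[:1].decode("ascii"), chunk[1:].decode("utf-8"))
            for chunk in body[2:].split(SF)[1:]
        ]
        fields.append((tag, indicators, subfields))
    return fields


# Semantic layer: a tiny, hypothetical slice of what MARC21 Bibliographic adds.
MARC21_BIB_LABELS = {
    "100": "main entry, personal name",
    "245": "title statement",
    "700": "added entry, personal name",
}


def describe(fields):
    """Attach a meaning to each tag; unknown tags stay uninterpreted."""
    return [
        (MARC21_BIB_LABELS.get(tag, "unknown"), dict(subfields).get("a"))
        for tag, _indicators, subfields in fields
    ]


sample = [
    ("100", "1 ", [("a", "Doe, Jane")]),
    ("245", "10", [("a", "An example title")]),
    ("700", "1 ", [("a", "Roe, Richard")]),
]
raw = build_record(sample)
parsed = parse_record(raw)
labelled = describe(parsed)
```

Note that nothing in `describe` relates the person in 100 to the title in 245 beyond their sitting in the same record, which is the "no well-defined relationships" complaint in miniature.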

    I’d be pleased if the discussion continued. If nothing else, it really satisfies my curiosity and gets my brain going (which is useful on a Monday morning).

    Updated: