Metadata analysis of the items

Having chosen the items and done a first round of domain modelling, I then set about analysing which metadata standards were used for each. This is shown in the table below. To help me think about the following steps I created categories to which each item could belong: artifacts (MPC), texts (book, web pages), multimedia (DVD, photographs, interview, albums), and physical (event, studio).

Here are the main points that emerged from this analysis:

  • The most prominent standard is the Library of Congress' MARC21, because half of the items are held in American libraries (with one documented by WorldCat, which itself links to libraries)
  • The Smithsonian uses a proprietary system called EDAN, which incorporates established standards like Dublin Core, MARC, and METS, but which is neither publicly documented nor easily open to integration in Linked Open Data. Some Smithsonian institutions, like SAAM, are publishing their online collections as LOD, but this isn't yet the case for the one that holds the items I chose
  • Discogs uses Open Graph as its standard. For non-music releases like Studio A, however, this is implemented in a rather limited fashion: the only metadata on the page is a type, id, image, and description (as well as marketplace-related data)

| Item | Category | Owner | Provider | Standard |
|---|---|---|---|---|
| MPC 3000 Limited Edition | Artifacts | Smithsonian National Museum of African American History and Culture | Smithsonian National Museum of African American History and Culture | EDAN |
| Donuts | Texts | Library of Congress | Library of Congress | MARC21 / MODS |
| Donuts | Multimedia | St. Paul Public Library | St. Paul Public Library | MARC21 |
| Welcome 2 Detroit | Multimedia | Library of Congress | Library of Congress | MARC21 / MODS |
| Maureen Yancey Oral History Interview | Multimedia | Smithsonian National Museum of African American History and Culture | Smithsonian National Museum of African American History and Culture | EDAN |
| Our Vinyl Weighs A Ton | Multimedia | n/a | WorldCat | MARC21 |
| Discographies | Texts | n/a | MusicBrainz | MMD |
| Photographs | Multimedia | Brian Cross | n/a | n/a |
| Suite For Ma Dukes | Physical | San Francisco Public Library (DVD recording) | San Francisco Public Library (DVD recording) | MARC21 |
| Studio A | Physical | n/a | Discogs | Open Graph |
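
To make the point about Open Graph concrete, here is a minimal sketch of how the handful of properties exposed on a page could be collected. It only uses Python's standard library. The sample HTML, its URL, and its values are placeholders I made up for illustration, not a copy of the actual Discogs page, and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        # Keep only Open Graph properties, stripping the "og:" prefix.
        if prop.startswith("og:") and content is not None:
            self.properties[prop[3:]] = content


def extract_open_graph(html):
    """Return a dict of Open Graph property names to values."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


# Placeholder markup (assumption): not taken from any real page.
SAMPLE_HTML = """
<html><head>
<meta property="og:type" content="website">
<meta property="og:image" content="https://example.org/placeholder.jpg">
<meta property="og:description" content="Placeholder description">
<meta name="viewport" content="width=device-width">
</head><body></body></html>
"""

if __name__ == "__main__":
    for name, value in extract_open_graph(SAMPLE_HTML).items():
        print(f"{name}: {value}")
```

With so few properties available, a record like this can identify the item but says little about what it is, which is why it was of limited use for a non-music entry.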

Metadata Alignment

The analysis is followed by the metadata alignment: mapping the elements and properties from different standards that are useful for describing the items. As Jenn Riley put it in Seeing Standards: A Visualization of the Metadata Universe, "The sheer number of metadata standards in the cultural heritage sector is overwhelming," [1] which makes alignment all the more important but also more difficult.
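
As a rough illustration of what aligning elements across standards involves, the sketch below keys a tiny crosswalk on Dublin Core element names and uses it to re-key a record from another standard. The four rows are commonly cited correspondences between Dublin Core, MARC21, and MODS, heavily simplified; they are an assumption for the example and not the project's actual alignment. The `align` helper and the sample record (apart from the title) are hypothetical.

```python
# Simplified crosswalk (assumption): one Dublin Core element per row, with a
# commonly cited MARC21 field and MODS element for each.
CROSSWALK = {
    "title": {"MARC21": "245 $a", "MODS": "titleInfo/title"},
    "creator": {"MARC21": "100 $a", "MODS": "name/namePart"},
    "date": {"MARC21": "260 $c", "MODS": "originInfo/dateIssued"},
    "subject": {"MARC21": "650 $a", "MODS": "subject/topic"},
}


def align(record, standard):
    """Re-key a record from `standard` field names to Dublin Core names.

    Returns (aligned, unmapped): fields with a counterpart in the crosswalk,
    and fields that have none and need a decision by hand.
    """
    reverse = {
        fields[standard]: dc_element
        for dc_element, fields in CROSSWALK.items()
        if standard in fields
    }
    aligned, unmapped = {}, {}
    for field, value in record.items():
        if field in reverse:
            aligned[reverse[field]] = value
        else:
            unmapped[field] = value
    return aligned, unmapped


# Hypothetical MARC21-style record; only the title comes from the item list.
marc_record = {
    "245 $a": "Donuts",
    "100 $a": "Hypothetical Author",
    "300 $a": "1 volume",
}

if __name__ == "__main__":
    aligned, unmapped = align(marc_record, "MARC21")
    print("aligned:", aligned)
    print("unmapped:", unmapped)
```

The unmapped fields are where most of the effort goes in practice: they are the properties that one standard expresses and another does not, or expresses only loosely.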

Being entirely new to the world of metadata standards, I found the alignment step the most challenging, but it was also an ideal way to become more familiar with the interoperability and parallels between bibliographic standards like MARC21, generic ones like Dublin Core, and more specific ones like CIDOC-CRM and FRBROO. I learnt a lot about how to navigate documentation, how challenging it can be to describe items accurately in just one language, and how different types of items lend themselves to description by specific languages. To return once again to the underlying motivations behind this project, it also became apparent that certain types of work, like a discography, can be quite difficult to capture fully without turning into a large project in and of itself. One example would be building a specific ontology that can deal with the intricacies of the artist's work, in this case Yancey's use of sampling (and the ongoing use of his material as samples by new generations of artists).

In the end I chose to focus the alignment on four of the standards from the analysis (EDAN, MARC21, MODS, and MMD) and another four to help me better capture the items I chose (Dublin Core, IPTC, CIDOC-CRM, and FRBROO).

A few things to note on that last point:

  • I kept EDAN in despite it not being publicly documented because I wanted to try and see how much of it I could figure out by just navigating the records of the items I chose as well as records similar to the other items
  • I kept MMD even though it's not very flexible or useful outside of documenting music releases. As we'll see in the rest of the project, in the end I used more flexible music-minded vocabularies like Music Ontology to fully capture the various musical aspects of, and relationships between, the chosen items and related entities
  • I chose Dublin Core as a way to have the most generic description of every property in case it was needed in the modelling
  • IPTC was chosen as the standard with which to describe the photographs, the non-catalogued item. I spent a bit of time looking into standards for images (using Riley's map) but nothing felt quite appropriate for capturing what this particular visual item is about. In the end I think IPTC worked (as you can see in the RDF section where the item is fully described), even if it is more of a professional standard than something for the cultural heritage domain
  • CIDOC-CRM and FRBROO were chosen for their flexibility, interoperability and dedication to cultural heritage documentation, exchange, and interpretation of the past, allowing me to think about the items in terms of relationships between creative work, expression, and manifestation

I worked through the modelling and metadata phases of the project in parallel, often going back and forth between them, adding, removing, and refining properties, relationships, and entities. It was through this process that I came to change my perspective on and approach to two of the items, as mentioned in the Selection page. The discographies used MMD, which applies to music, but I wanted them to be a combination of text with information about music and a concept of a body of work, for which CIDOC-CRM and FRBROO were best. The event was first described using the MARC21 record from the San Francisco Public Library, but this was then abandoned in favour of describing it purely as an event, using a combination of standards. These changes are most apparent in the next step, modelling.

The full alignment can be seen in the spreadsheet below or accessed directly here.


[1] - Jenn Riley, Seeing Standards: A Visualization of the Metadata Universe - http://jennriley.com/metadatamap/seeingstandards.pdf