Descriptive Metadata for
Audio-Oriented Digital Collections


In recent years, the online environment has had a profound impact on music and the recording industry. In fact, some accounts of the internet boom around the turn of the twenty-first century have suggested that file sharing of music had a direct impact on the swift adoption of broadband internet access (Hartley, 2009). Since that time, file sharing has become commonplace, shifting from Napster to torrent technology, while legitimate online sellers like iTunes have entered the marketplace, and database services like Naxos Music Online have begun selling access to streaming audio. While these online entities account for a vast proportion of the public's current procurement of music, they are primarily mechanisms for the distribution of commercial recordings.

Non-commercial or unpublished recordings have traditionally been the concern of sound archives. The variety in these types of recordings is broad, including field recordings of various kinds, broadcast recordings, oral histories, and early recordings on fragile formats like 78s, wax cylinders and the like. The institutions that collect them often fall somewhere between the traditional worlds of archives, libraries and museums. For these institutions, the emergence of a networked world has meant new channels for distribution through the development of digital libraries and repositories. But it has also meant that the already complex aspects of descriptive cataloguing of this material have been transposed into the equally complex world of metadata. Zeng and Qin (2008) recommend that the choice of a metadata schema be based on an analysis of functional requirements. The online collections that these institutions have built run the gamut from digital libraries, to archival catalogues with sound files, to open platforms for hosting user-generated content. Despite this diversity, it seemed natural to ask whether sound recordings have specific metadata requirements that need to be taken into account in online audio collections.

Analysis of metadata requirements typically leads to the selection of one of the many formal metadata schemas that define metadata element sets. Schema comparisons and development of crosswalks between them constitute one approach to metadata research (Corthaut, Govaerts, Verbert, & Duval, 2008). While this approach is useful from a design point of view for digital libraries, it does not examine metadata as actually implemented. Many schemas, such as Dublin Core, may be modified, qualified, or used and interpreted in different ways (Cole & Shreeves, 2004); furthermore, not all elements in a given set are mandatory, and some may be rarely used. Therefore, it was decided that the best approach for this research project was to study metadata implementations in real collections, and to compare metadata elements rather than schemas. This approach would allow an appraisal of requirements for audio content that reflected its unique characteristics, independent of schema considerations. There was a second, practical consideration for limiting the scope to elements instead of schemas: detailed documentation about the schemas in use was available for some collections, but non-existent for others. It was also decided that the scope would be limited to elements in metadata records as publicly displayed, which meant focusing on descriptive metadata, since structural and technical metadata is not usually intended for human reading, and is therefore not always provided. The benefit of this empirical method, though, was that comparisons could be made between collections that were quite disparate in nature.

The primary research question of this study is whether the nature of audio material in online digital collections leads to specialized descriptive metadata requirements, and if so, what they might be. A secondary question is what lessons we might learn from the practices of current collections.

The question of what to call the entities in this study requires some explanation. The possibilities included digital libraries, online collections, digital archives and digital repositories. The choice was not easy, since, as shown in Table 1, the sponsoring institutions were of many kinds. Borgman identifies two perspectives on the term digital library: one that sees them as "content collected on behalf of user communities" and the other that sees them as "institutions or services" (Borgman, 2001, p. 35). Both perspectives describe the sites selected for this study, but the word "library" seemed imperfect to describe material clearly archival in nature. "Digital archives" was ruled out by analogy, and "repository" was rejected because it was thought to imply an institutional process rather than a curated presentation. As a compromise, the term "online collection" offered a solution. It is true that the archival community distinguishes between holdings and collections, since "collection" implies an artificially created grouping, rather than an organically derived fonds, but, since archives make selections from those holdings when they put audio online, the term seemed appropriate in this case. The phrase "audio-oriented" was adapted from Michael Lesk's term "text-oriented" as a way of characterizing collections whose main focus is on audio recordings, but that might contain supporting material in other formats, such as photographs or transcripts.

