Descriptive Metadata for
Audio-Oriented Digital Collections


What do the findings tell us about metadata requirements for audio-oriented digital libraries, collections and archives? The results of this study suggest that metadata for these collections is not a case of one-size-fits-all. Collections might be created in the context of any number of fields of study, not just music, but they are likely to contain music or spoken word, often both together. They may be structured hierarchically like archival fonds, but are more likely to be structured at item-level granularity. Many different disciplines create sound recordings for different purposes, and so the collections contained a diversity of broadcast recordings, field recordings, historic commercial recordings and archival recordings. The collections were also likely to have related materials in media other than sound.

Heterogeneous and Homogeneous Content

Collections may bring together an array of recordings from disparate sources, or they may focus on a specialized topic that leads to the inclusion of other kinds of media and material besides sound. In the first case, there tends to be a low number of commonly recurring metadata elements from record to record, and in the second, there is a moderate to high number of elements, which do tend to recur from record to record. These factors suggest that the nature of the content plays a role in determining the metadata requirements, and moreover, that the disciplinary context of content creation will determine the character of the content and the kind of pre-existing metadata that may already have been created. Heterogeneous content suggests the need for a flexible approach to metadata, while homogeneous content seems to demand a more customized, but therefore less nimble, approach to bring meaningful differences to the fore. In general, it was found that the number and consistency of metadata elements were related to the scope and consistency of content.

Multiple Contributors

In terms of the descriptive elements, some general conclusions may be drawn from the set of most common elements. The first is the wide variety of roles that are involved in authoring a sound recording. A book, too, has many hands touch it as it is created, but editors and printing press operators are generally excluded from the formal descriptions entered in catalogues. In contrast, many of the roles in recordings are deemed relevant, whether or not they actually make any sound (viz. conductors and recordists). Audio metadata needs to accommodate this characteristic. At the same time, the proliferation of roles presents challenges for semantic interoperability (Cwiok, 2005), as adding qualifiers becomes more essential to make the contributor names meaningful.
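
The trade-off between qualified roles and interoperability can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Contributor class, the role terms, the sample names and the dumb_down helper are all hypothetical, and are not drawn from any of the collections or schemas examined in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contributor:
    name: str
    role: str  # hypothetical qualifier, e.g. "conductor" or "recordist"


def by_role(contributors):
    """Group contributor names under their role qualifier."""
    grouped = {}
    for c in contributors:
        grouped.setdefault(c.role, []).append(c.name)
    return grouped


def dumb_down(contributors):
    """Drop the role qualifiers, keeping each name once in original order.

    This mimics what happens when a qualified record is mapped to a
    schema that only has a single, unqualified contributor element.
    """
    names = []
    for c in contributors:
        if c.name not in names:
            names.append(c.name)
    return names


# Invented sample record: one person holds two roles.
record = [
    Contributor("A. Singer", "performer"),
    Contributor("B. Leader", "conductor"),
    Contributor("C. Fieldworker", "recordist"),
    Contributor("C. Fieldworker", "interviewer"),
]

print(by_role(record))
print(dumb_down(record))
```

In the unqualified list the names survive, but nothing remains to show who conducted and who recorded. That loss is the interoperability cost described above.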

Time and Place

The prevalence of time and place metadata, coupled with the emerging trend of visualizing that data with dynamic maps, speaks to another characteristic of sound: it often represents a specific time and place, which form an integral part of its meaning. Specificity appears to matter more the more specialized the collection is. Time and place were found to be at least as important as descriptions and notes: there were 36 elements found for time and place and 18 for descriptions and notes. One could go further and argue that the matrix and 'take' numbers that play such a crucial role in commercial recordings are valuable mostly because they tie a recording to a specific time and place, as well as to specific performers.

Because recordings present a particular moment, it seems that they require a lot of contextualization to bring out their full meaning. Description in sentence or paragraph form is also useful, but information about time and place can be presented very precisely and concisely, and is therefore easier to structure as data. Once recorded in structured form, these elements become facets that offer great potential for innovative search and discovery mechanisms.
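
As a rough sketch of how structured time and place values can act as facets, the following Python example counts and filters a handful of records. The records, field names and values are invented for illustration, and the two helper functions are assumptions rather than part of any system examined in the study.

```python
from collections import Counter

# Invented sample records; "year" and "place" are hypothetical field names.
records = [
    {"title": "Interview 1", "year": 1952, "place": "Edmonton"},
    {"title": "Fiddle tune", "year": 1952, "place": "Calgary"},
    {"title": "Interview 2", "year": 1967, "place": "Edmonton"},
    {"title": "Undated song", "year": None, "place": None},
]


def facet_counts(records, field):
    """Count how many records carry each value of a facet field.

    Records with no value for the field are left out of the facet.
    """
    return Counter(
        r.get(field) for r in records if r.get(field) is not None
    )


def filter_by(records, **criteria):
    """Return the records that match every given facet value."""
    return [
        r for r in records
        if all(r.get(key) == value for key, value in criteria.items())
    ]


print(facet_counts(records, "place"))
print([r["title"] for r in filter_by(records, year=1952, place="Edmonton")])
```

Because each value is a discrete piece of data rather than a phrase buried in a note, the same records could equally be plotted on a map or a timeline.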

Time and place as metadata are only useful, however, when the information was recorded in the first place. If the information is not there, it is impossible to recreate it. These elements also lose their usefulness in cases where recordings are heavily edited and come to represent a montage of different recordings. In general, though, this issue has less bearing on field recordings and historical recordings, where the recording practice was very simple.

The broader significance of time and place is that they allow content to be located and integrated into world history. This makes them universal, in a sense, because everyone can relate them to their own experience of time and knowledge of history, regardless of language or culture. Sound recordings do not have a monopoly on time and space by any means, yet recorded sound is 'about' time on many levels and, it seems, cannot avoid being about time and place.

Description and Notes

Finally, the consistent presence of a notes field in most of the collections suggests that sound recordings require summarization. In monographs and other text-based information packages, the title carries a lot of information and so receives a lot of attention in standard bibliographic descriptions. Recordings may relate to a musical work and therefore carry a title, but when they don't, even a constructed title may not suffice. One could argue that a constructed title is an abbreviated form of a note. The question is not so much aboutness as it is explanation. In this way, descriptive notes for recordings play a similar role to the scope and content notes used in descriptions of archival fonds.


As mentioned above, there is a tension between the need for specific types of metadata and the desire for interoperability. As long as one is using a recognized schema, crosswalks are not a technological impossibility, but it is perhaps worth asking what type of material one wishes to be interoperable with. The Dismarc project represents a significant achievement in its integration of the catalogues of so many audio archives, but it was probably made easier by the fact that the collections were all audio-oriented. Even so, the commonality of elements was not high and consistency of application was not great. For example, in the sample records gathered from Dismarc, which described holdings from five different archives, the value 'CD' for the sound carrier appeared variously in the extent, medium, and format fields. In the format field, it was defined by a controlled vocabulary. It would be useful to develop a commonly accepted theoretical understanding of audio-oriented metadata (and video-oriented metadata, for that matter), so that audio-oriented collections, at least, could align their approaches to metadata with each other and improve the chances of successful interoperability. This would acknowledge the fact that sound archives will tend to want to share with other like-minded collections. In fact, this is exactly the principle behind the Dismarc project.
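
To make the carrier example concrete, here is a minimal Python sketch of the kind of reconciliation a crosswalk would need when the same value may sit in any of three fields. It is an illustration only: the sample records, the short carrier vocabulary and the find_carrier function are hypothetical, and it does not describe how Dismarc itself processes records.

```python
import re

# Hypothetical carrier vocabulary and field order, for illustration only.
CARRIER_TERMS = ("CD", "LP", "cassette")
CANDIDATE_FIELDS = ("format", "medium", "extent")


def find_carrier(record):
    """Return the first known carrier term found in any candidate field.

    Matching is case-insensitive and on whole words, so "1 CD (54 min.)"
    and "cd" both resolve to "CD". Returns None if nothing matches.
    """
    for field in CANDIDATE_FIELDS:
        value = record.get(field, "")
        for term in CARRIER_TERMS:
            pattern = rf"\b{re.escape(term)}\b"
            if re.search(pattern, value, flags=re.IGNORECASE):
                return term
    return None


# Invented records that place the carrier in three different fields.
samples = [
    {"extent": "1 CD (54 min.)"},
    {"medium": "cd"},
    {"format": "CD"},
    {"extent": "12 pages"},
]

for sample in samples:
    print(sample, "->", find_carrier(sample))
```

A shared understanding of where carrier information belongs would make this kind of guesswork unnecessary, which is the point of aligning approaches across audio-oriented collections.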

This website was created by John Huck in March, 2010
School of Library and Information Studies
University of Alberta