Descriptive Metadata for
Audio-Oriented Digital Collections


The eighteen online audio-oriented digital collections included in this study are listed here, along with the names of their host institutions, URLs, and the acronyms that will be used throughout the discussion that follows. The findings are discussed in three sections that correspond to three phases of analysis:

  • Collection profiles: hosts (Table 1), types of content (Table 2), collection features (Table 3).
  • Metadata elements on a collection level (Table 4).
  • Metadata elements on an aggregate level (Table 5 and Table 6).

Collection Hosts

Collection hosts were found to fall into five non-exclusive categories: university libraries, university departments or institutes, national libraries or archives, museums or archives, and consortia. The collections were often supported by several bodies working in collaboration, and four were supported by organizations that fell into more than one category. Consortia were distinguished from collaborations, and the category was used in cases where the material in the collection came from more than one institution. Fewer than half were hosted by university libraries, though two thirds were hosted by a university entity of some kind. National libraries, archives and museums hosted seven collections, and three consortia were found.

Types of Host Institutions

University Library 7
University Department or Institute 5
Museum or Archive 4
National Library or Archive 3
Consortium 3
Collections in Multiple Categories 4

The Dismarc collection bears special mention, because it is the intended sound and music component of the Europeana joint digital library. Dismarc, which stands for Discovering Music Archives, holds collection and item level descriptions for over twenty European sound archives. In many cases, the collections were intimately connected to the activities of their hosts: some, like the Australian National Film and Sound Archive (NFSA) and the Vincent Voice Library (VVL), were an online presence for a long-standing collection, and others, like CHARM (AHRC Research Centre for the History and Analysis of Recorded Music), drew their material from off-line holdings.

The locations of the host institutions were revealing. While almost all were based in North America, the United Kingdom and Europe, the content of several of those collections had a specific geographic focus elsewhere. This likely reflects the fact that academics in Western countries often maintain research programs that study the cultures of other people and places.

Location of Host Institutions

Canada 1
United States 7
United Kingdom 3
Europe 5
South Africa 1
Australia 1

Geographic Focus of Content

Global* 7
The Americas 1
Canada* 1
US 2
UK 1
Europe 1
Middle East 1
Africa 3
Australia 1

*CHARM, CPDP & VG were also considered to have a 'format focus'

Types of Content

The character of the content was similarly diverse. Music was the predominant content focus, but almost as many collections included the spoken word. Four were considered to have material that was scientific or environmental in nature, which included soundscapes, animals, or, in the case of SemArch (Semitisches Tonarchiv), the spoken word collected for linguistic analysis. It is interesting to note that three of the scientific collections accepted user-submitted content, and this point of difference with the other collections could reflect a difference in purpose, wherein the online collections are intended to consolidate current research data worldwide. This is certainly the case with SemArch and Xeno-Canto America (XCA).

Almost all the collections included field recordings, whether of music, the spoken word, or environmental sounds. The three that didn't – CHARM (AHRC Research Centre for the History and Analysis of Recorded Music), the Cylinder Preservation and Digitization Project (CPDP), and the Virtual Gramophone (VG) – were specifically focused on commercial 78s or cylinders. Interminglings of music, spoken word, field recordings and broadcast recordings were very common. This could be, in part, because the traditional approach to folklore research (Bartis, 1979) sees the spoken word and music as merely different manifestations of the culture of a people. This ecumenism is even reflected in the name of the American Folklife card catalog: Traditional Music and Spoken Word (TMSWC). Broadcast material was often found in collections with field recordings, which might be explained by the fact that these categories share the characteristic of being non-commercial recordings.

Recording Types

Music 13
Spoken Word 11
Scientific & Environmental 4
Multiple Categories 8

Recording Sources

78s & Cylinders 7
Field Recordings 15
Broadcast 7
User Submitted 3
Multiple Categories 7


Sound vs. Other Media

Sound Only 9
Sound and Other Media 9

The research data indicates that half of the collections contained additional media besides sound. Sometimes this meant video, as in the cases of Dismarc, NFSA, and Spoken Word Services (SWS). Sometimes it meant transcriptions or field notes related to the recordings, as in the cases of the Henry Reed Collection (HRC), the James Koetting Ghana Field Recording Collection (JKC), and the Milman Parry Collection (MPC). And sometimes it meant that extended profiles on selected topics (e.g. performers or instruments) with text and photos were made available as value-added content (DEKKMMA and VG). This admixture of media helps explain why the category of audio-oriented collections was difficult to define in the first place, since such collections could not always be defined as databases of sound recordings.

In some cases, the other media were given item level descriptions like the recordings (Dismarc, NFSA, and SWS), while in others the additional media were accessible through a separate search path, or through a link from the recordings they related to (DEKKMMA, MPC, and VG). It is likely that this trend towards audio-oriented collections filled with supporting material in other media will only continue, especially with the changing practices of constituencies like oral historians, who are doing more of their work with video (Sipe, 1991). In many cases, the additional media content supported the audio content by providing more context, especially in the case of field recordings. For instance, transcripts were available for some spoken word recordings.

Collection Features

The presence of maps as a feature in the search and discovery interfaces was an especially interesting finding. While not found in overwhelming numbers, they were observed in recently constructed sites, like Archival Sound Recordings (ASR), and in sites that featured user submitted content, like the Freesound Project (FSP), SemArch, and XCA. Three of the five examples used embedded Google maps (ASR, FSP, and XCA), one used a clickable map of Africa as a search tool (DEKKMMA), while the fifth used maps for information only (SemArch), in a navigable browsing hierarchy of locations. The Google maps could be used to visualize the location of content or as a search mechanism. Sometimes a location indicated a precise longitude and latitude (FSP and XCA), but sometimes it indicated a general region of origin (ASR).

Collection Features

Maps 5
Tagging & User Feedback 9
Metadata Schema 11
Controlled Vocabulary 5

The detailed investigation of interactive user features like commenting, tagging, designating 'favorites', and recommender systems was not part of the research question, but information on these aspects was recorded nonetheless to further contextualize the collections. The ability to create a list of favourites was the most commonly offered feature. Tagging and commenting are more directly related to metadata, but these features were only seen in a few cases (ASR, FSP, and XCA). FSP gives users the ability to tag, add multiple descriptions, and leave comments, which together form the core elements of FSP's descriptive metadata. In the case of ASR, by contrast, tagging supplements but does not replace the official metadata.

Only partial information about metadata schema and descriptive standards employed was collected, since the information was not always available. Nevertheless, data for some collections was found. The standards identified included MARC, MODS, METS, EAD and DC. Various controlled vocabularies were employed, such as LCSH, BBC keywords, and collection defined vocabularies (Dismarc). This is an area where further research is needed.

Metadata Elements: Collection Level

Counting and comparing the number of metadata elements used in records from the various collections is a method that reveals some characteristics of those collections and allows comparisons to be made between collections. The average number of elements per record ranges from 9.6 for Dismarc samples up to 24 for XCA samples, with a mean value of 14.77. The number of elements each set of five samples held in common ranges from 2 to 24, with a mean value of 11.

Summary of Collection Level Analysis

  • Column A: Average number of elements per record.
  • Column B: Number of elements common to all sample records.
  • Column C: Total number of unique elements found in all sample records.
  • Column D: Difference between column A and column B.
  • Column E: Difference between column C and column A.
                                      A       B    C       D      E
Average Values For All Collections    14.77   11   19.06   3.77   4.29
Three examples:
James Koetting Collection [JKC]       12      12   12      0      0
BL Archival Sound Recs. [ASR]         10.75   5    25      5.75   14.25
Spoken Word Services [SWS]            10.2    7    14      3.2    3.8
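
The five columns can be derived mechanically from a set of sample records. The following is a minimal sketch in Python, not the procedure actually used in the study: the function name `collection_summary` and the sample records are hypothetical, and each record is assumed to be represented simply as the set of element names observed in it. Column E is computed as C minus A, which is what the tabulated values reflect (for example, 19.06 − 14.77 = 4.29).

```python
from statistics import mean


def collection_summary(records):
    """Return columns A-E for one collection.

    `records` is a list of sets (hypothetical representation); each set
    holds the metadata element names observed in one sample record.
    """
    a = mean(len(r) for r in records)     # A: average number of elements per record
    b = len(set.intersection(*records))   # B: elements common to all sample records
    c = len(set.union(*records))          # C: unique elements across all sample records
    return {"A": a, "B": b, "C": c, "D": a - b, "E": c - a}


# Hypothetical sample records, invented for illustration only.
samples = [
    {"title", "performer", "date", "place"},
    {"title", "performer", "date"},
    {"title", "performer", "notes"},
    {"title", "performer", "date", "place", "notes"},
]

summary = collection_summary(samples)
print(summary)
```

A perfectly consistent collection, in which every record carries the same elements, yields D = 0 and E = 0, as in the JKC row above.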

Some basic characteristics may be observed. First of all, the average number of elements per record (column A above) gives an indication of the degree of description or cataloguing level of each collection, and in general, this is found to be beyond a merely basic level. This indicates that descriptive metadata is significant for the collections. In fact, the average number of elements per record across all collections is close in number to the fifteen elements in the Dublin Core metadata set. It's hard to know whether this says anything about Dublin Core, or whether it is just coincidence.

Second, the number of elements common to all the samples from a collection (column B), combined with the total number of unique elements seen in those samples (column C) can be compared to the average number of elements to give an indication of the consistency of application of a collection's metadata scheme, whatever it might be. In particular, the difference between columns B and C reveals a picture of variation from collection to collection in the consistency of application. On the one hand, collections like JKC were found to have a very consistent set of elements, while collections like ASR had a much lower number of common elements and a high number of unique observed elements – much higher than the average number of elements per record, in fact. This variation in consistency may be the result of inconsistent cataloguing, but, more likely, it is caused by heterogeneity in the content.

In fact, the collections where one finds the greatest difference between the total elements observed and the average number per record, or between the average number and the number held in common, are ASR, Dismarc, NFSA and the Digital Library of Appalachia (DLA). These collections are either composed of many individual collections (Dismarc and DLA) or are processed by a single institution but display highly diverse content (ASR, NFSA). ASR has dozens of separate collections, each internally consistent in character, but quite different from one another.

One would expect to find a similar situation on sites with user-submitted content like the Freesound Project, where the content is limited only by the imagination of its users, but, in fact, the deployment of elements in that collection remains quite consistent. This might be explained by the fact that, unlike archival holdings or important digitization projects, the recordings are being added progressively by users, one at a time, and their character is unknown by the system designers ahead of time. Therefore, there has been no attempt to accommodate specialized fields or radically heterogeneous sets of pre-existing metadata.

In a similar way, specialized collections that focus on a very particular type of material, such as one person's field recordings, can be quite consistent in their use of metadata elements. The ILAM (International Library of African Music) Digital Sound Archive, JKC, SemArch, and XCA collections all displayed no difference between the average number, the number in common and the total number observed. HRC is also close to this mark, considering the high number of elements it uses. Use of a database structure was observed to have some effect on consistency, in some cases. For instance, SemArch is apparently run with an Oracle database. However, singularity of purpose or common source of content were the more common explanations for element consistency. For instance, the content in HRC, ILAM, and JKC comes from a single ethnographer, in each case, and the XCA collection is very specific in its purpose of collecting bird songs. The nature of bird song lends itself to relatively straightforward (though potentially very specific) description, as there is no need to assign topical subjects to songs, or list performers, instrument names, and so forth.

In addition to more consistently used fields, specialized collections tended to use a higher number of elements than general collections. Table 4 arranges the collections in ascending order of average number of elements per record. At the low end of the range (between 9.6 and 10.8 average elements), we find collections that contain commercial 78s and cylinders (ASR, CHARM, and CPDP), broadcast material (SWS), as well as aggregate style collections (ASR, Dismarc). In the middle section (12-16.2 elements), we find more focused collections, including three that have a consistent set of elements. The top end of the range consists primarily of specialized collections.

One reason that specialized collections had more elements, on average, per record could be that, in a collection of recordings all produced as part of a comprehensive research program (DEKKMMA), or by a single collector (HRC and MPC), the meaningful difference between records is found in the details. Collections of homogeneous material also lend themselves well to comprehensive musicological analysis of forms (HRC) or scientific analysis of sound characteristics (DEKKMMA and XCA), which would likely increase the number of elements in use. It should be noted that NFSA and VVL make use of finding aid style parent/child records for collections and items in collections. As mentioned above, since the unit of study was the item, elements from the parent and child records were combined, which may mean that they had a greater number of elements than an equivalent bibliographic style record. Also, the null fields in DEKKMMA, including a few that were never used in the same records (e.g. voice type and instrument name), likely explain to some extent the high number of total fields, as well as the greater difference between the total number of fields and the average number of fields, and between the average and the number of common fields.

Metadata Elements: Aggregate Level

After examining the characteristics of metadata on a collection level, the final step in the analysis was to derive a comprehensive view of trends and commonalities across the collections. While they were distinguished from each other by content type, the collections were united in being 'oriented' towards audio, and the hope was that some indication of more specific characteristics they shared would emerge from a study of the most common metadata elements observed.

The principle behind choosing to include elements that had appeared in at least 75% of the sample records was that commonalities would be emphasized and general trends would be easy to see. A limitation of the method was that it did not allow nuances and details to be taken into account. But the number of samples taken from each database is small enough that the kinds of conclusions that are appropriate to draw from the results are general ones. In the discussion that follows, statements such as 'two thirds of collections had title metadata' should be taken to mean 'two thirds of collections had title metadata as a consistent element in the sample records examined.'
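
The 75% inclusion rule can be illustrated with a short Python sketch. This is an assumption-laden illustration rather than the study's actual tooling: the function name `consistent_elements`, the `threshold` parameter, and the sample records are all hypothetical, and each record is again assumed to be a set of element names.

```python
from collections import Counter


def consistent_elements(records, threshold=0.75):
    """Return the elements present in at least `threshold` of the records.

    `records` is a list of sets of element names (hypothetical representation).
    """
    # Count in how many records each element appears.
    counts = Counter(element for record in records for element in set(record))
    cutoff = threshold * len(records)
    return sorted(e for e, n in counts.items() if n >= cutoff)


# Hypothetical sample records, invented for illustration only.
samples = [
    {"title", "date", "place"},
    {"title", "date"},
    {"title", "date", "notes"},
    {"title", "place", "notes"},
]

kept = consistent_elements(samples)
print(kept)
```

Here "title" appears in all four records and "date" in three of four, so both meet the threshold, while "place" and "notes" (two of four each) are excluded. With so few samples per collection, a single record can move an element across the cutoff, which is why only general conclusions are drawn.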

Summary of Aggregate Level Analysis

  • This table summarizes Table 5, which in turn summarizes the complete list of elements found in Table 6.
  • Elements that appeared in at least 75% of the sample records from a given collection were included in the compiled list.
  • Elements were organized into groupings through an intuitive affinity process.
  • The groupings were given names and organized as sub-categories of seven metadata categories identified by Lagoze, Lynch, & Daniel (1996).
Metadata Categories       Groupings    Number of Metadata Elements in Groupings    Percentage of Total Metadata Elements
1.0 Descriptive           8            149                                         66.80%
2.0 Administrative        1            23                                          10.30%
3.0 Terms & Conditions    1            4                                           1.80%
4.0 Content Ratings       1            2                                           0.90%
5.0 Provenance            2            15                                          6.70%
6.0 Linkage               1            5                                           2.20%
7.0 Structural            3            26                                          11.70%

Of the 223 elements included in the comprehensive list, 149, or roughly two thirds, were considered to be descriptive metadata. Seven groupings included elements from at least two thirds of the collections: title (17 collections), author (16), description (16), identification numbers (14), keywords (13), date (13), and place (12). Of these, only identification numbers was not considered a descriptive element (see methodology for a discussion of this grouping). Also, it should be said that if all the groupings under the category structural (7.0) had been combined into a single grouping, it would have included 14 collections. If one adds the categories of administrative (2.0) (which contains the grouping identification numbers) and structural (7.0), they represent approximately 22% of the total number of elements.

One would expect elements like author and title to be common, and identification numbers were emphasized in the ARSC rules (ARSC, 1995), but it was somewhat surprising to find that date and place represented such a significant percentage of the total elements in category 1.0. There were almost twice as many place elements as there were collections that they came from, because 12 elements came from just 3 collections. Elements that recorded specialized musical details, such as key or range, were not in widespread use, but the fact that such a high number (21) occurred consistently in four collections is interesting to note and probably due to a tight collection focus.

Other key findings include the fact that there were almost twice as many elements as there were collections in the author grouping, which confirms the intuitive understanding that recordings have multiple contributors. Almost all collections used a description or notes field. In fact, slightly more collections were represented in the description/notes grouping (1.3) than in the keywords grouping (1.4). When taken with the prevalence of date and place elements, this suggests that audio content tends to require explanation or contextual information to be meaningful.

Groupings Within the Category of Descriptive Metadata (1.0)

Grouping Name                 Number of Collections Represented    Number of Elements    Percentage of All Elements in Category 1.0
1.1 Title                     17                                   22                    14.80%
1.2 Contributor/Author        16                                   29                    19.50%
1.3 Description/Notes         16                                   18                    12.10%
1.4 Keywords                  13                                   16                    10.70%
1.5 Date                      13                                   16                    10.70%
1.6 Place                     12                                   21                    14.10%
1.7 Language                  5                                    6                     4.00%
1.8 Music Specific Details    4                                    21                    14.10%
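
The percentages in this table, and the "almost twice as many elements as collections" observations made about the author and place groupings, follow directly from the counts. The Python sketch below reproduces that arithmetic from the figures in the table; the variable names are illustrative only.

```python
# (collections represented, number of elements) for each grouping in category 1.0,
# copied from the table above.
groupings = {
    "1.1 Title": (17, 22),
    "1.2 Contributor/Author": (16, 29),
    "1.3 Description/Notes": (16, 18),
    "1.4 Keywords": (13, 16),
    "1.5 Date": (13, 16),
    "1.6 Place": (12, 21),
    "1.7 Language": (5, 6),
    "1.8 Music Specific Details": (4, 21),
}

# Total number of descriptive elements (the denominator for the percentages).
total = sum(elements for _, elements in groupings.values())

# Share of category 1.0 held by each grouping, rounded to one decimal place.
shares = {name: round(100 * e / total, 1) for name, (_, e) in groupings.items()}

# Average number of elements contributed per collection in each grouping.
ratios = {name: e / c for name, (c, e) in groupings.items()}

for name in groupings:
    print(f"{name:<28}{shares[name]:5.1f}%  {ratios[name]:.2f} elements per collection")
```

The ratio column makes the contrast between groupings explicit: roughly 1.8 elements per collection for contributor/author and 1.75 for place, against a little over 1 for description/notes.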

This website was created by John Huck in March, 2010
School of Library and Information Studies
University of Alberta