History and Literature Review

This section presents the history of information retrieval standards, the development of SRU and the current literature on the SRU protocol.

MARC and Z39.50

Information protocols and standards for libraries have existed for decades. Arguably the most used standard for information interchange in the library community is the MAchine Readable Cataloguing (MARC) record, which began as a pilot project by the Library of Congress in 1966 (Avrams, 1975). This standard has developed over the years and is currently used in a variety of national and international formats. MARC records revolutionized the way libraries exchanged information, allowing catalogue records to be re-used without requiring each library to manually enter the data from each record into its own system.

As computers became more commonplace, the integrated library system (ILS) became a fact of life in many libraries. Large academic libraries adopted them first, followed by larger public institutions. Today, almost all libraries in Canada and the United States operate some form of ILS, most of which provide networked access through the Internet. The network availability raised the issue of increased access to information at other libraries, both for patrons and for activities such as Inter-Library Loans (ILL).

The National Information Standards Organization in the United States established a working committee in 1979 to develop an information retrieval protocol. Z39.50-1988 was the result of that group's efforts. Since that time, Z39.50 has been revised numerous times to address shortcomings and add new features, with the most recent version published in 2003. The Z39.50 protocol gave users the ability to search multiple databases using the familiar interfaces available in their local software or online public access catalogue (OPAC). The protocol enables verification of bibliographic information, ILL and other activities related to information access. Some federated search sites such as CRCnetBase and union catalogues such as TAL Online leverage the Z39.50 protocol to provide a single display of resources available from multiple institutions.

Z39.50 is not without criticisms. It uses protocol-specific methods for both communication and data exchange, forming an obstacle to implementation for many organizations. Although both the protocol and record formats are standardized, they are not widely used outside the library community. In addition to the unique nature of the software and interchange formats used, the networking ports used by the protocol are often blocked by municipal or corporate security policies. The protocol provides a wide range of configuration options for both servers and clients. The concomitant complexity raises further obstacles as both the server and client configurations must complement each other for full functionality. A mismatch in configuration can result in anything from unpredictable or incomplete results to a total lack of functionality. The Z39.50 protocol did provide for an explain operation geared towards assisting with configuration issues. However, the operation was never widely adopted due to its complexity and the lack of a common data exchange format.

SRU

The SRU protocol grew out of the Z39.50 protocol. Implementers desired a protocol that would address a number of issues with the existing protocol. The two primary goals were to use standard Internet protocols and standard communication formats for information interchange, thereby removing obstacles to implementation by information providers outside the traditional library community. The World Wide Web communicates using the Hypertext Transfer Protocol (HTTP) and HTTP Secure (HTTPS). Adopting these protocols removes the need to implement specialized protocols simply to connect with remote systems, and it removes the challenge of adding unusual communication ports to local security policies. Extensible Markup Language (XML) has quickly evolved into a widely used information interchange format, and the SRU developers adopted it as the basis for information exchange. XML has the added advantage of being easily adapted for use with a variety of types of information.
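The practical effect of adopting XML can be illustrated with a minimal sketch in Python that reads a search response using only the standard library. The sample response below is hand-written for illustration and is not taken from any real server; the element names and the `http://www.loc.gov/zing/srw/` namespace follow the SRU 1.x response format, while the function name `parse_search_response` and the record contents are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the XPath-style lookups below.
NS = {
    "srw": "http://www.loc.gov/zing/srw/",        # SRU 1.x response namespace
    "dc": "http://purl.org/dc/elements/1.1/",     # Dublin Core elements
}

# Hand-written, illustrative response (an assumption, not real server output).
SAMPLE_RESPONSE = """<srw:searchRetrieveResponse
    xmlns:srw="http://www.loc.gov/zing/srw/"
    xmlns:srw_dc="info:srw/schema/1/dc-schema"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>1</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>info:srw/schema/1/dc-v1.1</srw:recordSchema>
      <srw:recordPacking>xml</srw:recordPacking>
      <srw:recordData>
        <srw_dc:dc>
          <dc:title>An Example Title</dc:title>
          <dc:creator>Example, Author</dc:creator>
        </srw_dc:dc>
      </srw:recordData>
      <srw:recordPosition>1</srw:recordPosition>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>"""


def parse_search_response(xml_text):
    """Return (total hit count, list of simple record dictionaries)."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("srw:numberOfRecords", default="0", namespaces=NS))
    records = []
    for record in root.iterfind("srw:records/srw:record", NS):
        data = record.find("srw:recordData", NS)
        if data is None:
            continue  # skip records that carry no data payload
        records.append({
            "schema": record.findtext("srw:recordSchema", namespaces=NS),
            "title": data.findtext(".//dc:title", namespaces=NS),
            "creator": data.findtext(".//dc:creator", namespaces=NS),
        })
    return total, records


total, records = parse_search_response(SAMPLE_RESPONSE)
print(total, records)
```

No protocol-specific toolkit is required here: a general-purpose XML parser is sufficient, which is the kind of lowered barrier the SRU developers were aiming for.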

A technical committee was formed in 2001 to develop the new protocol. Because the changes in the information retrieval protocol were designed to allow greater integration on the web, the initial name for the protocol was Search / Retrieve Web Service, using the initialism SRW. The terms SRW and SRU were used to differentiate between the methods available for web-based communication. SRW communicates using SOAP-based access, while SRU uses the Representational State Transfer (REST) approach. The actual protocol operation is the same regardless of the communication method, and the current version of the protocol uses SRU to refer to both methods. The literature continues to contain references to both SRW and SRU.
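To make the REST-style approach concrete, the following Python sketch builds an SRU request as an ordinary HTTP GET URL. The parameter names (`operation`, `version`, `query`, `maximumRecords`, `recordSchema`) are those defined by SRU 1.1 and 1.2; the base URL `http://sru.example.org/catalogue`, the helper name `build_search_url`, the short schema name `dc`, and the sample query are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(base_url, cql_query, version="1.1",
                     maximum_records=10, record_schema=None):
    """Build a searchRetrieve request URL for an SRU server."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,                      # query expressed in CQL
        "maximumRecords": str(maximum_records),
    }
    if record_schema is not None:
        params["recordSchema"] = record_schema   # schema names vary by server
    # urlencode percent-encodes the CQL query so it is safe inside a URL.
    return base_url + "?" + urlencode(params)


# Hypothetical server address and query, for illustration only.
url = build_search_url(
    "http://sru.example.org/catalogue",
    'dc.title = "information retrieval"',
    record_schema="dc",
)
print(url)

# Decoding the query string recovers the original parameters.
print(parse_qs(urlsplit(url).query))
```

Because the whole request is carried in the URL, it can be issued from a web browser or any HTTP client. An SRW client would instead wrap the same parameters in a SOAP envelope and send them in the body of an HTTP POST.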

The first version of the protocol was released in 2002, version 1.1 was released in 2004 and version 1.2 was released in 2007. Version 2.0 is currently being developed, with the latest draft released in July 2009 for comment by the community (Denenberg, 2009b). This work is being carried out by the Search Web Services Technical Committee of the Organization for the Advancement of Structured Information Standards (OASIS).

Literature Review

The available literature regarding the SRU protocol falls into three broad categories. The largest group is descriptive in nature, either providing an introduction to the protocol or comparing it to other search protocols. A second group of articles reports on development activities, while a third group provides information on specific implementations. The literature dealing with SRU does so within discrete contexts, reporting on activities or projects rather than subjecting the protocol to qualitative or quantitative research.

Morgan (2004) provides a widely cited introduction to the protocol. Morgan opens by discussing the challenges the protocol is intended to address, and goes on to describe the three basic operations of the protocol, complete with sample responses. He also provides a section on a sample application used to list journals in a repository. The final section of his article discusses how SRU and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) complement each other, a theme further expanded upon by Sanderson, Young, and LeVan (2005). Morgan concludes his article with the statement "If index providers were to expose their services via [SRU], then the content of the 'hidden Web' would become more accessible and there would be less of a need to constantly re-invent the interfaces to these indexes" (summary). While it seems evident that such standardization would be in the best interests of users of these indexes, research should verify this assumption given the resources necessary to implement SRU interfaces to large vendor databases.

Reiss (2007) examines the role SRU could play in metasearch, or "one-search access to multiple electronic resources" (Reitz, 2007). After providing an introduction to the protocol and placing it within the context of the "emerging bibliographic infrastructure" (p. 374), he describes hypothetical systems which could search bibliographic, archival and current events information on the Internet through a single SRU server implementation. Other protocols which may compete with or complement the SRU protocol are briefly discussed after a review of three projects relating to metasearch. He concludes by stating "SRU is an important piece of the entire upgrade that the library community needs to undergo in order to modernize our bibliographic infrastructure in order to improve and expand the search and retrieval services that libraries need to provide our users" (p. 384). Much of the literature on SRU showcases the same two projects, one being The European Library (TEL) project (described below), and both are described in Reiss' article. The constant citation of these projects raises the question of what other projects are using the protocol and why these two specific projects are so frequently mentioned.

Taylor and Dickmeiss (2006) report on the implementation of an SRU service at the Library of Congress as a front-end to their traditional Z39.50 search service. Their presentation deals mainly with the delivery of bibliographic records in an XML format, demonstrating the advantages of using open standards for the interchange of data. They report ancillary benefits as the implementation of the new service also improved the performance of the library's Z39.50 service. The authors discuss three different approaches to providing XML access to the bibliographic data, explaining the advantages and disadvantages as well as the reasoning behind the method chosen at the Library of Congress. Taylor and Dickmeiss are employees of Index Data, the company chosen to implement this solution and developers of well-known open-source software toolkits for Z39.50, metasearch and SRU. In their conclusion, they report that "there are several installations in use worldwide" but fail to provide details. As mentioned above, information on other installations would be useful in ascertaining the actual adoption of SRU.

Van Veen and Oldroyd (2004) describe the implementation of a "co-operative framework ... for integrated access to the major collections of the European national libraries" (abstract). This project exemplifies many of the advantages of open standards. Commercial portals were considered, but TEL decided that "the SRU protocol proved to be more promising that the commercial portals" (Section 2, para. 8). One of the reasons for their choice was the low cost of entry, while another was the flexibility afforded by the tools. The variety of encoding schemes, languages and metadata sets between the libraries all had to be accommodated. The initiative described in this article also demonstrates the need for further work on interoperability. TEL developed its own registry for metadata activities to track metadata elements that are in use, under consideration or rejected. This article describes a unique portal solution, as the final portal software actually runs as JavaScript in the end user's web browser. Their conclusion is that this solution "offers a number of advantages ...[including]... scalability, functionality, low barrier of entry into TEL, and increased control of functionality for users, data providers and service providers. Last but not least, ... there is no longer a need for a central portal" (conclusion). Of the articles reviewed, this one provides the greatest detail on both the implementation process and the challenges faced when integrating diverse information resources.

Hafezi (2007) proposes a solution for national interoperability in Iranian libraries. Although the article is very hypothetical and exploratory in nature, it provides empirical data on the supported interchange formats in use in Iranian libraries. It discusses the challenges inherent in dealing with multiple languages as Persian, English and Arabic materials are common. The article also discusses the reliance on vendors for adherence to standards. The author concludes that the use of XML and SRU would aid in interoperability, ultimately resulting in lower costs and more satisfied users (p. 733). The article demonstrates that interoperability issues apply across cultures rather than being a strictly Western concern.

The use of SRU as a data source in conjunction with unAPI is mentioned briefly in Chudnov et al. (2006), and is described in detail in Binkley (2009). unAPI "provides the few basic operations necessary to perform simple clipboard-like copy of content objects across all sites" (Chudnov et al., 2006). Binkley uses it to offer metadata records to unAPI-aware applications from within a search interface, in this case a university library's catalogue. If the user has an unAPI-aware extension loaded in the web browser, a link appears on the item record display. Clicking on the link presents the user with a choice of available record formats. Once the user selects the desired format, the unAPI server retrieves the record from the SRU service and returns the record to the user.
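The request flow described above can be sketched in Python as follows. This is not Binkley's implementation: the format-to-schema mapping, the function names, the base URL, and the use of a `rec.id` index to look up a single record are all assumptions made for illustration. The sketch relies only on the general shape of unAPI, in which a request carrying an `id` alone returns a list of formats and a request carrying both `id` and `format` returns the object itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical mapping from unAPI format names to SRU recordSchema names.
FORMAT_TO_SCHEMA = {"oai_dc": "dc", "marcxml": "marcxml", "mods": "mods"}


def formats_document(record_id=None):
    """Build the unAPI <formats> list offered to the client."""
    root = ET.Element("formats")
    if record_id is not None:
        root.set("id", record_id)
    for name in FORMAT_TO_SCHEMA:
        ET.SubElement(root, "format", name=name, type="application/xml")
    return ET.tostring(root, encoding="unicode")


def sru_url_for(record_id, fmt, sru_base):
    """Translate an unAPI (id, format) pair into an SRU searchRetrieve URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f'rec.id = "{record_id}"',   # index name is an assumption
        "maximumRecords": "1",
        "recordSchema": FORMAT_TO_SCHEMA[fmt],
    }
    return sru_base + "?" + urlencode(params)


def handle_unapi(params, sru_base):
    """Decide what an unAPI request needs: a format list or a record fetch."""
    record_id = params.get("id")
    fmt = params.get("format")
    if record_id and fmt:
        # The server would fetch this URL and relay the record to the user.
        return "fetch", sru_url_for(record_id, fmt, sru_base)
    return "formats", formats_document(record_id)


# Hypothetical SRU endpoint and record identifier.
BASE = "http://sru.example.org/catalogue"
print(handle_unapi({"id": "12345"}, BASE))
print(handle_unapi({"id": "12345", "format": "mods"}, BASE))
```

In this arrangement the unAPI server acts as a thin translation layer, and the SRU service remains the single source of record data in every offered format.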

These articles deal primarily with the use of SRU in metasearch, data delivery, and portal applications. The use of SRU in TEL demonstrates the effectiveness of the protocol in connecting disparate systems, while the unAPI example demonstrates the usefulness of the protocol in providing data within web-based environments. The Iranian example shows the expanding need for interoperable systems around the globe. A protocol which leverages the ubiquity of the HTTP and HTTPS protocols and the XML data format greatly increases the likelihood of adoption beyond the library community.
