Project Details

One of the goals of this project was the implementation of an SRU client or server system. Installing and configuring such a system would provide the author with practical experience and knowledge beyond what can be gleaned through reading articles and specifications. Initially, the author hoped to develop a production SRU server for the University of Alberta Libraries. As mentioned above, the library has an SRU server that serves a specific function within its service model. Because its development was driven by a single specific need, there are changes that could be made to the system to improve its functionality and bring it more into line with organizational standards. The library was approached with this proposal and was both interested in and supportive of the project. However, the lack of suitable staff to supervise such a project within the time frame of the course precluded working on the production environment. Instead, permission was obtained to access the library's Z39.50 server for this project.

The network environment at the University of Alberta presents a number of challenges for server-based applications. Any computer connected to a student network, whether wired or wireless, must first authenticate using a Campus Computing ID (CCID) and password. As well, in-bound traffic to the computer is regulated, rendering the establishment of a sandbox server infeasible. Because the various software packages have a multitude of dependencies, installation on the university's General Purpose Unix (GPU) servers was also not feasible. The author initially used his personal computer as a test environment, but was unable to devise a suitable method to make it accessible for evaluation.

To address these concerns, the author opted to use a virtual appliance. The virtual appliance uses bridged networking through the host computer. Because the virtual appliance is started manually after a network connection has been successfully made from the host computer, the authentication issues no longer exist. Connecting to the service using software on the host computer bypasses any restrictions present on the university network. The remainder of this section discusses the project, providing details on the virtual machine setup, configuration and access. Appendix F provides details on the use of the virtual machine and other associated files.

Virtual Machine Selection and Preparation

CentOS 5.2 was installed on a virtual machine running under the VMware Player application. CentOS is a Linux distribution that is binary-compatible with Red Hat Enterprise Linux. Instead of creating a virtual machine from scratch, an existing appliance was downloaded from Linhost (Ventura, 2009). In preparation for the installation, several required packages were installed using the native package manager. The dependencies were:

The CentOS distribution comes with security features enabled that interfere with communication on certain ports by unknown programs. For a production environment, these features should be configured to recognize and allow traffic related to the specific services being provided. For testing and experimentation in a non-production environment, the simplest solution is to disable Security-Enhanced Linux (SELinux). This can be accomplished in a number of ways, but the easiest is to use the system-config-securitylevel program while logged in using the root account.

YAZ Software Installation

The next stage was downloading and installing the yaz, yaz++, and yaz-proxy packages from the software page at Index Data's web site. The standard CentOS software repositories do not include these software packages, so they had to be downloaded, configured and compiled on the virtual appliance. This was accomplished using the normal process which includes configure (automated detection of system properties and setting of compiler properties), make (automated compilation of programs) and make install (automated installation of programs, libraries and documentation within the system structure). If any unsatisfied system dependencies exist, error messages will be displayed during the configure and make steps.

Installation packages are available on the Index Data web site for specific Linux distributions. However, compiling from source was necessary in this case because packages did not exist for either CentOS or Red Hat. Additionally, compilation from source is an option on most operating systems, and the author felt the exercise was worth the additional steps.

Client Software for Testing

Three different software clients were used to connect to Z39.50 and SRU implementations during this project. The command line tool yaz-client was used to test basic connectivity. The Mercury Z39.50 Client was used to test searches. Finally, a web browser was used to test URL-based access to the server.

yaz-client

Designed for use as a command-line client, the yaz-client program provides a range of options for testing. It supports both Z39.50 and SRU searches, and allows the user to make quick changes to connection strings and queries while receiving immediate feedback.

The yaz-client has numerous options, but the simplest form of starting the program is yaz-client target, where target is the connection information for the database. By default, the target is interpreted as a Z39.50 server, but if the connection information includes the http protocol prefix, the client switches to SRU. The command yaz-client z3950.loc.gov:7090/voyager connects to the Library of Congress Z39.50 server, whereas yaz-client http://z3950.loc.gov:7090/voyager connects to the SRU server. In this case, both services are provided by the same program on a single port.
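The rule described above can be restated as a short Python sketch. It is illustrative only: the function name protocol_for_target is hypothetical, and the check simply mirrors the behaviour described in this section rather than reproducing the parsing logic inside yaz-client.

```python
def protocol_for_target(target: str) -> str:
    """Return 'SRU' when the target carries an http(s) prefix, else 'Z39.50'.

    This restates the rule described in the text; it is not yaz-client's code.
    """
    if target.lower().startswith(("http://", "https://")):
        return "SRU"
    return "Z39.50"


# The two Library of Congress targets mentioned in the text.
for target in ("z3950.loc.gov:7090/voyager",
               "http://z3950.loc.gov:7090/voyager"):
    print(f"yaz-client {target} -> {protocol_for_target(target)}")
```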

Conducting a simple search involves issuing a find command. Retrieving records is accomplished using the show command. Examples of both Z39.50 and SRU searches are shown in Appendix A and Appendix B respectively. These code listings also show the difference between PQF and CQL queries.

Mercury Z39.50/SRU Client

Basedow Information Systems offers a graphical client program that supports searching using both Z39.50 and SRU. The client comes pre-configured for a number of institutions around the world, but adding new targets is not difficult. As with yaz-client, an SRU target is defined by including the protocol in the Z39.50 URL field. Figures in Appendix C and Appendix D show the configuration for Z39.50 and SRU targets and the results of example searches.

Web Browser

Because SRU is a web-based protocol, a web browser can be used to query SRU targets. In the absence of specifically programmed client routines, the queries can be handled either by constructing the URL by hand or by using an HTML form to submit the queries to the server. The author was able to construct a form to query the service, but was unable to develop an interface as polished as the OCLC Open SiteSearch Documentation site. Further study and programming ability are required to accomplish that task.
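Constructing such a URL by hand amounts to appending the standard SRU request parameters to the base address of the service. The Python sketch below shows one way to do this under stated assumptions: the base address is built from the host, port and database given in the explain record later in this section, the dc.title index is assumed to be mapped by the server, and the function name sru_search_url is hypothetical.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, maximum_records=10,
                   record_schema="marcxml"):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql_query,
        "maximumRecords": str(maximum_records),
        "recordSchema": record_schema,
    }
    # urlencode percent-encodes the query so it is safe inside a URL.
    return f"{base_url}?{urlencode(params)}"


# Assumed base address; substitute the address of the running service.
print(sru_search_url("http://192.168.213.128:9000/neos",
                     'dc.title = "hamlet"'))
```

The resulting address can be pasted into a browser's location bar, which is equivalent to what an HTML form using the GET method would submit.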

Nonetheless, the web browser was an extremely useful tool in exploring the SRU service. One of the major benefits of SRU is its ability to be accessed by normal web methods. Both Mercury and the yaz-client are programs designed to operate with these protocols. The ability to successfully query the server using only a simple web form in a standard web browser demonstrates conformance to the HTTP protocol, and the results displayed as XML demonstrate conformance to that ubiquitous data format.

YAZ Configuration

This project uses YAZ proxy to provide an SRU interface by connecting to the University of Alberta's Z39.50 service. As mentioned above, most SRU interfaces to library catalogues are provided by this type of arrangement. Taylor and Dickmeiss (2006) provide details on the Library of Congress implementation, which follows the same model.

The YAZ proxy configuration is controlled by a text file named config.xml. A complete example of this file is provided in Appendix E. As indicated by the file extension, XML and associated technologies are used for most of the configuration of the program. The records are formatted using XSL style sheets. The main exception to the use of XML is the pqf.properties file, which maps CQL notation to Prefix Query Format (PQF) notation. PQF was introduced as part of the YAZ toolkit, and has been adopted by other Z39.50 tools as an alternative to type-1 or reverse Polish notation (RPN) (Taylor and Dickmeiss 2009).

The YAZ proxy software can be run by providing arguments on the command line. This is useful for preliminary testing, especially when having difficulties with connection information. The command line syntax is yazproxy -t hostname:port/database @:localport where hostname, port and database are the connection parameters for the backend server and localport is the port to be used for the SRU and Z39.50 services. For example, the command yazproxy -t z3950.loc.gov:7090/voyager @:210 would create a service on port 210 of the local machine that proxies information from the Z39.50 target located at the Library of Congress. By default, this will only provide a useful connection for Z39.50, as the configuration elements necessary to create an SRU service are missing from the command.
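The relationship between the connection parameters and the command line can be sketched in Python as follows. The helper name yazproxy_command is hypothetical, and the sketch only assembles the argument list according to the syntax given above; it does not start the proxy.

```python
def yazproxy_command(hostname, port, database, local_port):
    """Assemble the yazproxy argument list from the connection parameters."""
    backend = f"{hostname}:{port}/{database}"   # hostname:port/database
    listener = f"@:{local_port}"                # @:localport
    return ["yazproxy", "-t", backend, listener]


# Library of Congress target, proxied on local port 210.
argv = yazproxy_command("z3950.loc.gov", 7090, "voyager", 210)
print(" ".join(argv))
```

Keeping the arguments as a list makes it straightforward to hand them to a process launcher later without further quoting.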

config.xml

In its simplest form, creating a target in the config.xml file involves providing connection information, most notably the host, port and database names for the target. However, there are a number of features that may be defined which extend the functionality of the service. The full details are given in the YAZ Proxy User's Guide and Reference but the following elements are of particular importance.

Proxy Configuration Header

The config.xml file is an XML document, and as such must be well-formed. It must start with the XML declaration and must have a single root element. In this case, the root element is the proxy element.

	<?xml version="1.0" encoding="UTF-8"?>
	<proxy xmlns="http://indexdata.dk/yazproxy/schema/0.9/">
	    <!-- remainder of configuration goes here -->
	</proxy> 
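Well-formedness and the root element can be checked before the proxy is started. The following Python sketch uses the standard library XML parser; the function name check_proxy_config is hypothetical, and the namespace is the one shown in the listing above.

```python
import xml.etree.ElementTree as ET

YAZPROXY_NS = "http://indexdata.dk/yazproxy/schema/0.9/"

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<proxy xmlns="http://indexdata.dk/yazproxy/schema/0.9/">
    <!-- remainder of configuration goes here -->
</proxy>"""


def check_proxy_config(xml_bytes):
    """Parse the configuration and confirm that the root element is proxy.

    ET.ParseError is raised if the document is not well-formed.
    """
    root = ET.fromstring(xml_bytes)
    expected = f"{{{YAZPROXY_NS}}}proxy"
    if root.tag != expected:
        raise ValueError(f"unexpected root element: {root.tag}")
    return root


print(check_proxy_config(SAMPLE).tag)
```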

Target Element

The target element defines one target. A YAZ proxy server can proxy for multiple targets, so the target element may be repeated. All configuration elements for a particular target must be contained within the same target element. The name, database and default attributes define the name which will be used in the connection URL, the database used to connect to the backend target, and whether the target is the default for searches if a name or database is not explicitly defined by clients.

The url element defines the connection for the proxied target.

	<target name="neos" default="1" database="Unicorn">
		<!-- remainder of target configuration goes here -->
		<url>ualapp.library.ualberta.ca:2200</url>
		...
	</target> 
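To see how the attributes and the url element fit together, the targets defined in a configuration file can be listed with a few lines of Python. This is a sketch only: the function name list_targets is hypothetical, and the embedded configuration is a reduced copy of the listing above with the elided parts left out.

```python
import xml.etree.ElementTree as ET

NS = {"yp": "http://indexdata.dk/yazproxy/schema/0.9/"}

CONFIG = b"""<?xml version="1.0" encoding="UTF-8"?>
<proxy xmlns="http://indexdata.dk/yazproxy/schema/0.9/">
  <target name="neos" default="1" database="Unicorn">
    <url>ualapp.library.ualberta.ca:2200</url>
  </target>
</proxy>"""


def list_targets(xml_bytes):
    """Return one dictionary per target element in the configuration."""
    root = ET.fromstring(xml_bytes)
    targets = []
    for target in root.findall("yp:target", NS):
        targets.append({
            "name": target.get("name"),            # used in the connection URL
            "database": target.get("database"),    # backend database name
            "default": target.get("default") == "1",
            "url": target.findtext("yp:url", default="",
                                   namespaces=NS).strip(),
        })
    return targets


for entry in list_targets(CONFIG):
    print(entry)
```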

Attribute Element

The attribute element specifies valid attribute type and value pairs, based on the Z39.50 attribute types for use, relation, position, structure, truncation and completeness. This element can repeat. It can be omitted if the backend target handles rejection of unsupported attribute types. It can also supply an error code for unsupported attributes, which is useful if the backend server does not handle unsupported types gracefully.

	<attribute type="1" value="1-1016" />
	<attribute type="1" value="*" error="114" /> 
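The effect of the two rules above can be modelled in Python. The sketch assumes that rules are tested in document order and that the first matching rule decides the outcome, which is how the example reads; it also assumes that a query attribute matching no rule is passed through. The function names are hypothetical and the code is a model of the configuration, not the proxy's implementation. Diagnostic 114 is the Bib-1 code for an unsupported use attribute.

```python
def value_matches(spec, value):
    """True if value is covered by a spec such as '*', '4' or '1-1016'."""
    if spec == "*":
        return True
    for part in spec.split(","):
        low, _, high = part.partition("-")
        if int(low) <= value <= int(high or low):
            return True
    return False


# The two rules from the listing above, in document order.
RULES = [
    {"type": "1", "value": "1-1016", "error": None},
    {"type": "1", "value": "*", "error": 114},
]


def check_attribute(attr_type, attr_value, rules=RULES):
    """Return None if the attribute is accepted, else a diagnostic code."""
    for rule in rules:
        type_ok = rule["type"] in ("*", str(attr_type))
        if type_ok and value_matches(rule["value"], attr_value):
            return rule["error"]
    return None  # assumption: unmatched attributes are left to the backend


print(check_attribute(1, 4))      # use attribute within 1-1016
print(check_attribute(1, 2000))   # use attribute outside the range
```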

Syntax Element

The syntax element specifies valid record syntax requests from a client. Among other things, this element may identify the record schema, designate error messages in a similar fashion to the attribute element, and specify the style sheet to be used for XSL transformation.

	<syntax type="usmarc" />
	<syntax type="xml" marcxml="1"
		identifier="info:srw/schema/1/marcxml-v1.1">
		<name>marcxml</name>
	</syntax>
	<syntax type="xml" marcxml="1"
		identifier="info:srw/schema/1/dc-v1.1"
		stylesheet="/usr/local/share/yazproxy/MARC21slim2SRWDC.xsl">
		<name>marcxml</name>
	</syntax>
	<syntax type="*" error="238" /> 

Cql2rpn Element

The cql2rpn element defines the location of a file which provides the details of CQL to RPN conversion. This is required for SRU searches to operate with backend Z39.50 servers that do not natively support CQL queries. Because most Z39.50 servers fall into this category, the element is needed in most configurations. The YAZ User's Guide and Reference discusses this in detail, but the pqf.properties file provided with YAZ proxy is generally sufficient.

	<cql2rpn>/usr/local/share/yazproxy/pqf.properties</cql2rpn> 
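The idea behind the conversion can be illustrated with a deliberately simplified Python sketch. It assumes property lines of the form index.set.name = type=value, written in the style of pqf.properties, and handles only a bare term or a single "index = term" clause; relation, structure and Boolean handling are omitted. The excerpt and both function names are hypothetical, and the real conversion is performed by the YAZ library.

```python
# Assumed excerpt in the style of pqf.properties (not the full file).
PROPERTIES = """\
index.cql.serverChoice = 1=1016
index.dc.title = 1=4
index.dc.creator = 1=1003
"""


def load_index_map(text):
    """Collect the index.* entries as {CQL index: attribute string}."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")   # split at the first '=' only
        key = key.strip()
        if key.startswith("index."):
            mapping[key[len("index."):]] = value.strip()
    return mapping


def simple_cql_to_pqf(query, index_map):
    """Translate 'index = term' or a bare term into a PQF string."""
    index, sep, term = query.partition("=")
    if not sep:                               # bare term: use the default index
        index, term = "cql.serverChoice", query
    index, term = index.strip(), term.strip()
    if index not in index_map:
        raise KeyError(f"unsupported index: {index}")
    return f"@attr {index_map[index]} {term}"


index_map = load_index_map(PROPERTIES)
print(simple_cql_to_pqf("dc.title = hamlet", index_map))
print(simple_cql_to_pqf("hamlet", index_map))
```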

Explain Element

The explain element provides the response to explain operation requests from clients. In a perfect world, the software would automatically provide the explain response based on the software configuration. However, this is not feasible for most current implementations. Usually, YAZ proxy or similar software is used to connect to an existing Z39.50 server. As the proxy has no way of knowing how the remote server is configured, it cannot provide the configuration and feature details. The server administrator must manually configure this response.

The explain record can be as detailed as the administrator chooses. It can include the bare minimum, such as connection information, or can include a wealth of information about the context sets, record schema and indexes provided by the service. A more complete example of an explain record is included as part of the config.xml example in the appendices.

	<explain xmlns="http://explain.z3950.org/dtd/2.0/">
		<serverInfo>
			  <host>192.168.213.128</host>
			  <port>9000</port>
			  <database>neos</database>
		</serverInfo>
	</explain> 
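A client can use the serverInfo block to work out where the service lives. The Python sketch below reads the three values from an explain record and assembles the base URL, to which an explain request can then be addressed. The function name base_url_from_explain is hypothetical, and the use of plain http is an assumption.

```python
import xml.etree.ElementTree as ET

ZEEREX = {"zr": "http://explain.z3950.org/dtd/2.0/"}

EXPLAIN = b"""<explain xmlns="http://explain.z3950.org/dtd/2.0/">
  <serverInfo>
    <host>192.168.213.128</host>
    <port>9000</port>
    <database>neos</database>
  </serverInfo>
</explain>"""


def base_url_from_explain(xml_bytes):
    """Build the SRU base URL from the serverInfo block of an explain record."""
    info = ET.fromstring(xml_bytes).find("zr:serverInfo", ZEEREX)
    if info is None:
        raise ValueError("explain record has no serverInfo element")
    host = info.findtext("zr:host", namespaces=ZEEREX)
    port = info.findtext("zr:port", namespaces=ZEEREX)
    database = info.findtext("zr:database", namespaces=ZEEREX)
    return f"http://{host}:{port}/{database}"  # http is an assumption


base = base_url_from_explain(EXPLAIN)
print(base)
print(f"{base}?version=1.1&operation=explain")
```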
