This web site describes the steps taken to compile a small thesaurus from fifteen subject statements. The thesaurus was built using facet analysis, the technique behind faceted classification. Although this is only a small-scale thesaurus project, the same steps could be scaled up for a larger one.

Step 1 - Analyze the Subject Statements

Facet analysis can be performed using a bottom-up approach. At the bottom are indexable concepts known as isolates. These are the terms that represent the field of study for which the thesaurus is being built. They are extracted directly from the literature of the field, which ensures that the thesaurus contains terms relevant to its subject matter. Typically, a thesaurus compiler reviews the documents in a collection and creates subject statements to capture the aboutness of each document. The indexable concepts are then identified from these subject statements; they are typically the nouns, verbs and other phrases in the statements. For example, the statement “Ordering catalogue cards for rural reference libraries” contains the following isolates: ordering, catalogue cards, rural reference libraries. Occasionally, an isolate needs a slight rewording: for example, “Canada” was chosen rather than “Canadian”. From the fifteen given subject statements, approximately 55 isolates were identified.
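The bookkeeping for this step can be sketched in a few lines of Python. This is only an illustration, assuming the isolates have already been picked out by hand: the first statement and the “Canadian” to “Canada” rewording come from the text above, while the second statement, the PREFERRED_FORMS table, and the function names are hypothetical.

```python
# Hypothetical table of rewordings: variant form (lowercase) -> preferred isolate.
PREFERRED_FORMS = {"canadian": "Canada"}


def normalize(isolate):
    """Return the preferred form of an isolate, or the isolate itself."""
    cleaned = isolate.strip()
    return PREFERRED_FORMS.get(cleaned.lower(), cleaned)


def collect_isolates(statements):
    """Merge hand-picked isolates into one de-duplicated list, keeping first-seen order."""
    seen = set()
    merged = []
    for picked in statements.values():
        for isolate in picked:
            term = normalize(isolate)
            if term.lower() not in seen:
                seen.add(term.lower())
                merged.append(term)
    return merged


statements = {
    # Example taken from the text.
    "Ordering catalogue cards for rural reference libraries": [
        "ordering", "catalogue cards", "rural reference libraries",
    ],
    # Invented statement, included only to show de-duplication and rewording.
    "Cataloguing in Canadian rural reference libraries": [
        "cataloguing", "Canadian", "rural reference libraries",
    ],
}

isolates = collect_isolates(statements)
for term in isolates:
    print(term)
```

Keeping the rewordings in one table means the same preferred form is applied every time a variant turns up in a later statement.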


Step 2 - Facet Analysis

Facet analysis can be thought of as grouping logically related terms together in a framework. The framework was developed by S. R. Ranganathan and has evolved over time, but its principles remain constant. Ranganathan wrote that any subject can be described using five fundamental categories. That is, every term and concept in a subject can be slotted into one of the five. When needed, the fundamental categories, or facets, can be sub-divided (sub-faceted) to describe the subject matter more specifically. The five categories described by Ranganathan are personality, matter, energy, space, and time, abbreviated PMEST (Aitchison, Gilchrist, & Bawden, 2000, p. 70). The terms “personality” and “energy” are difficult to grasp, so researchers have refined them to make them easier to understand and use. “Personality” can be thought of as “Entities/things/objects”, which is further divided into groups such as “Abstract entities” and “Artifacts (man-made)”, and “energy” can be called “Actions/activities”.

Each isolate was then placed in the appropriate slot among these five top-level categories and their sub-categories. Some isolates were easy to place, such as “North America”, which obviously falls under “Space”, but others, such as “health research community”, did not seem to fit anywhere. When faced with a difficult term like this, it helps to look at the term differently and work out what it is trying to describe. In this case, I decided that “health research community” was a way of describing a “group of health researchers”, which led to it being placed in the “Agents” category.
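One way to keep track of these decisions is to record each placement together with the reasoning behind it. The sketch below is a hypothetical illustration, not the method used for this project: the Placement class and the group_by_category function are invented names, and only the two placements mentioned above are included.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """One isolate, the category it was slotted into, and an optional note on why."""
    isolate: str
    category: str
    rationale: str = ""


placements = [
    Placement("North America", "Space"),
    Placement(
        "health research community",
        "Agents",
        rationale="read as 'group of health researchers'",
    ),
]


def group_by_category(items):
    """Collect isolates under the category label they were assigned to."""
    groups = defaultdict(list)
    for item in items:
        groups[item.category].append(item.isolate)
    return dict(groups)


grouped = group_by_category(placements)
for category, terms in grouped.items():
    print(category, "->", ", ".join(terms))
```

Storing the rationale next to the term makes it easier to revisit a difficult placement later.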

As the terms were placed, certain logical groupings began to emerge. The isolate list contains a large number of library types, so these were grouped under “Agents”, along with schools and archives. It became apparent that, rather than listing each library type at the same level as schools and archives, there should be a sub-facet for libraries, with the individual types placed beneath it. Other groupings also emerged during this process and were sub-faceted along similar lines. The completed facet analysis can be found here.
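The sub-faceting described above can be pictured as turning a flat list into a nested outline. The following Python sketch is an assumption-laden illustration rather than the actual facet analysis: the list of terms is a placeholder (“public libraries” is invented), and the subfacet and render helpers are hypothetical names.

```python
def subfacet(terms, label, belongs):
    """Split a flat term list into the remaining terms and a {label: members} sub-facet."""
    members = [term for term in terms if belongs(term)]
    remaining = [term for term in terms if not belongs(term)]
    return remaining, {label: members}


def render(node, depth=0):
    """Return the outline as indented lines, two spaces per level."""
    lines = []
    if isinstance(node, dict):
        for label, children in node.items():
            lines.append("  " * depth + label)
            for child in children:
                lines.extend(render(child, depth + 1))
    else:
        lines.append("  " * depth + node)
    return lines


# Placeholder terms; "public libraries" is an invented example.
agents_flat = ["schools", "archives", "rural reference libraries", "public libraries"]

# Move every library type beneath a "libraries" sub-facet.
remaining, libraries = subfacet(
    agents_flat, "libraries", lambda term: term.endswith("libraries")
)
agents = {"Agents": remaining + [libraries]}

outline = render(agents)
print("\n".join(outline))
```

In this sketch a simple suffix test decides which terms belong to the sub-facet; in practice that decision is an intellectual judgment made by the compiler.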


