Thesaurus
A thesaurus is a semantic tool used for information retrieval, query expansion and indexing, among other purposes. It is essentially a selection of the core vocabulary of a domain, supplemented with information about synonyms, homonyms, generic terms, part/whole terms, “associative terms” and other information (e.g. the frequency and history of terms in a given database).

Peter Mark Roget (1779-1869) produced the first edition of Thesaurus of English Words and Phrases (Roget, 1852/1992), which is recognized as the first thesaurus. The structure of this thesaurus was, according to Roget in his introduction, a "verbal classification ... the same as that which is employed in the various departments of natural history".

In the modern sense, the thesaurus is a child of information retrieval and information science. The year 1964 is important in the development of modern thesauri for information retrieval. Two thesauri were published that year: the "Euratom-Thesaurus", the first published thesaurus to apply the graphical method for displaying the paradigmatic relations between descriptors, and the "Thesaurus of Engineering Terms", which has been a model for later thesauri. The subsequent development of electronic bibliographic databases made thesauri very popular, and the thesaurus became a common companion to such databases, first in the sciences, then in the social sciences and also, to a certain degree, in the humanities (e.g. in architecture and music).

According to Sparck Jones (1992, p. 1609), the theory of semantic primitives was influential in early thesaurus construction: "A thesaurus was seen as providing a set of domain-independent semantic primitives." According to this theory, every word can be broken up into primitive kernels of meaning, semantemes (also called semantic features or semantic components). Semantemes are terms that are used to explain other terms or concepts but cannot themselves be explained by other terms. The process of breaking words down into semantemes is known as componential analysis and has most often been used to analyze kinship terms across languages. The components are often given in considerable detail.
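
As a rough illustration of componential analysis, the sketch below describes a few English kinship terms by a small set of components. The feature inventory (sex, generation, lineal) and the names KINSHIP, contrast and terms_with are assumptions made for this example, not a published analysis.

```python
# Hypothetical feature inventory for a few English kinship terms.
# The components and their values are illustrative assumptions.
KINSHIP = {
    "father":   {"sex": "male",   "generation": +1, "lineal": True},
    "mother":   {"sex": "female", "generation": +1, "lineal": True},
    "son":      {"sex": "male",   "generation": -1, "lineal": True},
    "daughter": {"sex": "female", "generation": -1, "lineal": True},
    "uncle":    {"sex": "male",   "generation": +1, "lineal": False},
    "aunt":     {"sex": "female", "generation": +1, "lineal": False},
}


def contrast(term_a, term_b):
    """Return the components on which two terms differ, with both values."""
    a, b = KINSHIP[term_a], KINSHIP[term_b]
    return {feature: (a[feature], b[feature]) for feature in a if a[feature] != b[feature]}


def terms_with(**features):
    """Return all terms whose components match the given values."""
    return sorted(
        term for term, components in KINSHIP.items()
        if all(components.get(f) == v for f, v in features.items())
    )


print(contrast("father", "mother"))
print(terms_with(sex="female", lineal=True))
```

In this sketch two terms are distinguished only by the components on which they differ, which is the sense in which a word is "broken down" into semantemes.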

“Most thesauri establish a controlled vocabulary, a standardized terminology, in which each concept is represented by one term, a descriptor, that is used in indexing and can thus be used with confidence in searching; in such a system the thesaurus must support the indexer in identifying all descriptors that should be assigned to a document in light of the questions that are likely to be asked. . . .

A good thesaurus provides, through its hierarchy augmented by associative relationships between concepts, a semantic road map for searchers and indexers and anybody else interested in an orderly grasp of a subject field.”  (Soergel, 2004).

 

Important semantic relations used in thesauri

Scope note: A definition of the term, or an explanation of its meaning and its use in a specific database.

Non-descriptor: A synonym used as a lead-in term to a descriptor.

U (Use): Reference to a descriptor. (Sometimes, e.g. in the INSPEC-thesaurus (1993), termed "lead-in terms" or "cross-references".) The relation between lead-in term and descriptor is one of synonymy.

UF (Used For): Cross-reference from a "preferred term" to its "lead-in" terms (synonym relation).

Parenthetical qualifiers: Device used to distinguish between different meanings of a word (homograph/homonym relationship).

    Example: Letters (Alphabet);
             Letters (Correspondence)

BT (Broader term): Superordinate concept. Sometimes a distinction is made between “generic broader terms” and “partitive broader terms”:

    BTG (Broader Term Generic)
        Example: Lions
                 BTG: Mammals

    BTP (Broader Term Partitive)
        Example: Zealand
                 BTP: Denmark

NT (Narrower term): Sub-concept. Again, a distinction may be made between generic and partitive sub-concepts:

    NTG (Narrower Term Generic)
        Example: Mammals
                 NTG: Lions

    NTP (Narrower Term Partitive)
        Example: Denmark
                 NTP: Zealand

RT (Related term): Kinds of relations other than generic/partitive and synonym/homonym relations.

TT (Top term): Symbolizes the highest hierarchical level in the thesaurus (generic or partitive relation).

    Example: Zealand
             BTP: Denmark
             TT: Geographical areas

Rotated index: Alphabetical index in which each word of a phrase is an access point (syntagmatic relations).

Thesaurofacet: Facet applied in a thesaurus (paradigmatic relations).
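
The relation codes listed above come in reciprocal pairs (BT/NT, U/UF), and the top term can be found by following broader-term links upward. The following is a minimal sketch of one way to store such relations, together with a simple rotated index. The class Thesaurus, its methods, the code "USE" for the U relation, and the "Automobiles"/"Cars" pair are assumptions made for illustration; the link from Denmark to "Geographical areas" is likewise assumed to be generic.

```python
from collections import defaultdict

# Each relation code paired with its reciprocal ("USE" stands for U: Use).
INVERSE = {"BTG": "NTG", "NTG": "BTG", "BTP": "NTP", "NTP": "BTP",
           "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    def __init__(self):
        # term -> relation code -> set of target terms
        self.relations = defaultdict(lambda: defaultdict(set))

    def add(self, term, relation, target):
        """Record a relation and its reciprocal in one step."""
        self.relations[term][relation].add(target)
        self.relations[target][INVERSE[relation]].add(term)

    def get(self, term, relation):
        return sorted(self.relations[term][relation])

    def top_terms(self, term):
        """Follow BTG/BTP links upward (assumes the hierarchy has no cycles)."""
        broader = self.relations[term]["BTG"] | self.relations[term]["BTP"]
        if not broader:
            return {term}  # no broader term: this is a top term
        tops = set()
        for parent in broader:
            tops |= self.top_terms(parent)
        return tops


def rotated_index(phrases):
    """Make every word of every phrase an access point, sorted alphabetically."""
    return sorted((word.lower(), phrase) for phrase in phrases for word in phrase.split())


t = Thesaurus()
t.add("Lions", "BTG", "Mammals")
t.add("Zealand", "BTP", "Denmark")
t.add("Denmark", "BTG", "Geographical areas")  # assumed link, for illustration
t.add("Automobiles", "USE", "Cars")            # hypothetical lead-in term
print(t.get("Mammals", "NTG"), t.top_terms("Zealand"))
```

Storing the reciprocal at the moment a relation is added keeps the two directions from drifting apart, which is one reason the codes are defined in pairs.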

 

"The explosive growth of Web search engines, with their primitive algorithms, has had some rather unfortunate effects, to my mind. Some of these engines appear to have been developed by people who saw a need, but who had not the vaguest idea that there was already a history of development of tools to fulfill similar needs. There is little evidence that some of these developers had ever used either Dialog or a library catalog.

We should distinguish kinds of tools for facilitating access to full text on the basis of the attention they give to semantics. Older, exact-match (Boolean) systems give no attention to semantics. The search terms must appear in the document for it to be retrieved; if a term appears at all the document will be retrieved regardless of whether the term is important to the meaning of the document or not. Another approach relies on statistical information -- co-occurrence of words in the document, frequency, etc. Boolean and statistically-based systems have been found to have comparable retrieval performance, but to produce very different retrieval sets. That is, searches of the same database using a Boolean engine and a statistically-based one often produce about the same number of relevant hits, but there may be little overlap between the two sets of hits." (Milstead, 1998).
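
The exact-match behaviour described in the quotation, and the query expansion mentioned at the start of this entry, can be contrasted in a small sketch. The document collection, the lead-in terms and the function names below are all hypothetical; the point is only that an exact-match search misses a document that uses a synonym, while expansion through the thesaurus retrieves it.

```python
# Hypothetical lead-in terms mapped to their descriptors (the U: Use relation).
USE = {"autos": "cars", "automobiles": "cars"}

# Tiny hypothetical collection: document id -> text.
DOCS = {
    1: "cars and road safety",
    2: "automobiles in the city",
    3: "road safety for cyclists",
}


def boolean_and(query_terms, docs):
    """Exact-match Boolean AND: every query term must occur in the text."""
    hits = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if all(term in words for term in query_terms):
            hits.append(doc_id)
    return sorted(hits)


def expand(term, use):
    """Return the term's descriptor together with all of its lead-in terms."""
    descriptor = use.get(term, term)
    return {descriptor} | {lead for lead, d in use.items() if d == descriptor}


def expanded_search(query_terms, docs, use):
    """AND across the query terms, OR across the synonyms of each term."""
    hits = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if all(words & expand(term, use) for term in query_terms):
            hits.append(doc_id)
    return sorted(hits)


print(boolean_and(["cars"], DOCS))
print(expanded_search(["cars"], DOCS, USE))
```

Neither function weighs how important a term is to a document; both only test for presence, which is the limitation of exact-match systems that the quotation points out.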

“It has come to be self-evident that a classification scheme is an indispensable tool when compiling a thesaurus. When the editor is forced to work solely within an alphabetical list of numerous descriptors, at the level of the individual term, there is a sense of working “blind”. In contrast, where a rigorous classification is developed, providing an overall picture of the subject area, the compiler has a better chance of building accurate and meaningful relationships between the terms.” (Aitchison & Dextre Clarke, 2004, p. 10).

Literature:

 

Aitchison, J. (1986). A Classification as a Source for a Thesaurus: The Bibliographic Classification of H. E. Bliss as a Source of Thesaurus Terms and Structure. Journal of Documentation, 42(3), 160-181.

 

Aitchison, J. & Dextre Clarke, S. (2004). The thesaurus: A historical viewpoint, with a look to the future. Cataloging & Classification Quarterly, 37(3/4), 5-21. Co-published simultaneously as: The thesaurus: review, renaissance, and revision. Ed. by Sandra K. Roe & Alan R. Thomas. New York: Haworth Information Press. (Pp. 5-21).


Aitchison, J.; Gilchrist, A. & Bawden, D. (2002). Thesaurus Construction: a Practical Manual. 4. ed. London: ASLIB.

DIN 1463 (1987). Erstellung und Weiterentwicklung von Thesauri: Einsprachige Thesauri. 2. Ausg. Berlin: Deutsches Institut für Normung e.V. (DIN 1463, Teil 1).

 

Evens, M. (2002). Thesaural relations in information retrieval. IN: Green, R., Bean, C.A. and Myaeng, S.H. (Eds), The semantics of relationships: an interdisciplinary perspective, Kluwer Academic Publishers, Dordrecht, pp. 143-160.

Foskett, D. J. (1975). Thesaurus. IN: Kent, Allan (ed.): Encyclopedia of Library and Information Science, Vol. 30. New York: Marcel Dekker. (Pp. 416-463).

 

Gilchrist, A. (2003). Thesauri, taxonomies and ontologies - an etymological note. Journal of Documentation, 59(1), 7-18.

ISO 2788 (1986). Guidelines for the Establishment and Development of Monolingual Thesauri. 2. ed. International Organisation for Standardisation (ISO). (Also available as a Danish standard: DIS 2788, Retningslinier for opbygning og udvikling af ensprogede tesauruser. Hellerup: Dansk Standardiseringsråd, 1985).

Krooks, D. A. & Lancaster, F. W. (1993). The Evolution of Guidelines for Thesaurus Construction. Libri, 43(4), 326-342.

 

Miller, U. (2003a). Thesaurus construction. IN: Encyclopedia of Library and Information Science. New York: Marcel Dekker. (Pp. 2800-2810).

 

Miller, U. (2003b). Thesaurus and New Information Environment. IN: Encyclopedia of Library and Information Science. New York: Marcel Dekker. (Pp. 2811-2819).

 

Milstead, J. L. (1998). Use of Thesauri in the Full-Text Environment. Based on a paper presented at the 34th Clinic on Library Applications of Data Processing. (Cochrane, Pauline A., and Eric H. Johnson, eds. Visualizing Subject Access for 21st Century Information Resources; Proceedings of the 34th Annual Clinic on Library Applications of Data Processing, March 2-4, 1997. Champaign, IL: Graduate School of Library and Information Science, University of Illinois, 1998. p. 28-38.) http://www.bayside-indexing.com/Milstead/useof.htm

 

Milstead, J. (1995). Invisible thesauri: the year 2000. Online & CDROM Review, 19(2), 93-94.


Rada, R. (1990). Maintaining Thesauri and Metathesauri. International Classification, 158-164.
 

Roberts, N. (1984). The Pre-History of the Information Retrieval Thesaurus. Journal of Documentation, 40(4), 271-285.

 

Roe, S. K. & Thomas, A. R. (Eds.). (2004). The Thesaurus: Review, Renaissance and Revision. New York: Haworth Information Press.

 

Roget, P. M. (1852/1992). Thesaurus of English words and phrases, classified and arranged so as to facilitate the expression of ideas and assist in literary composition. (Facsimile of the First Edition). London: Bloomsbury Books. Project Gutenberg's version: http://www.gutenberg.org/cache/plucker/10681/10681

 

Roget, P. M. (). Peter Roget’s classic structure coupled with Mawson’s modernization. http://www.bartleby.com/110/

 

Soergel, D. (2004). The Arts and Architecture Thesaurus (AAT). A critical appraisal. http://www.dsoergel.com/cv/B47_long.pdf

 

Sparck Jones, K. (1992). Thesaurus. Vol. 2, pp. 1605-1613 IN: Encyclopedia of Artificial Intelligence. Vol. I-II. Ed. by Stuart C. Shapiro. New York: John Wiley & Sons.

Van Slype, G. (1976). Definition of the Essential Characteristics of Thesauri. Brussels: Bureau Marcel van Dijk.

 

Will, L. (2006). Glossary of terms relating to thesauri and other forms of structured vocabulary for information retrieval. http://www.willpowerinfo.co.uk/glossary.htm

HILT - High-Level Thesaurus. A-Z of thesauri. http://hilt.cdlr.strath.ac.uk/Sources/thesauri.html

See also: Metathesaurus; Search thesaurus; Thesaurofacet

Birger Hjørland

Last edited: 10-08-2007
