Assigned versus derived indexing

Derived index terms are terms that occur in the text being indexed. Assigned index terms are terms that do not occur in that text.

 

"There are essentially two approaches to the creation and maintenance of this document or knowledge representation. One is to create a knowledge system in advance and assign the documents to it afterward: assigned indexing. The other is to derive the terms of the index language from the documents themselves: derived indexing. The manual library systems, in which books were classified according to an existing classification system, for example, Dewey or UDC, are assigned indexing systems; computerized IR-systems which extract keywords from the documents according to a weighting scheme are typically derived indexing systems. We will extend the definition of assigned indexing systems to contain all systems that use terms in their docreps [document representations] that are not taken from the documents themselves, because such external terms belong to a knowledge representation outside the document.
    The derived indexing systems became very popular when the computer made it easy to create an inverted list of all the words occurring in a document base. In the 1970s and ’80s much effort was put into the development of techniques to identify such words (phrases, sentences) in the inverted lists as were most efficient in retrieving particular documents (for a discussion of both derived vs. assigned and pre-coordinative vs. post-coordinative systems, see Foskett [1982])." (Paijmans, 1993).
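
The "inverted list of all the words occurring in a document base" mentioned in the quotation can be sketched in a few lines of Python. This is only an illustration under simple assumptions (lower-cased alphabetic words, no weighting scheme, no stop list); the function name build_inverted_list and the three sample documents are invented for the example and do not come from Paijmans or from any particular IR system.

```python
import re
from collections import defaultdict


def build_inverted_list(docs):
    """Map each word to the sorted ids of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Derived indexing: every index term is taken from the text itself.
        for word in re.findall(r"[a-z]+", text.lower()):
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


# Hypothetical document base, invented for this sketch.
docs = {
    1: "Cars and lorries on the motorway",
    2: "The history of the automobile",
    3: "Motorway tolls for cars",
}

inverted = build_inverted_list(docs)
print(inverted["cars"])        # [1, 3]
print(inverted["automobile"])  # [2]
```

Every key of the resulting dictionary is a word that literally occurs in some document, which is what makes the index derived rather than assigned.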

Assigned terms may come from external semantic resources (e.g., authority files, classification systems or thesauri) or other kinds of external information.

 

Derived indexing systems are generally more primitive than assigned systems (or, of course, than combinations of the two). It is easy to mark words in a text mechanically for inclusion in an index, and to construct an index on this basis. However, users who search for a concept using a synonym, a broader term or a narrower term will miss the information. This is the rationale behind controlled vocabularies.
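
The synonym problem, and the way a controlled vocabulary addresses it, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for the example: the PREFERRED table is a hypothetical miniature vocabulary that maps entry terms to one preferred term, and the two documents are invented. It covers only synonym control; broader and narrower terms would need a fuller thesaurus structure.

```python
# Hypothetical miniature controlled vocabulary: entry term -> preferred term.
PREFERRED = {
    "car": "automobiles",
    "cars": "automobiles",
    "automobile": "automobiles",
    "automobiles": "automobiles",
}

# Hypothetical document base, invented for this sketch.
docs = {
    1: "cars on the motorway",
    2: "the history of the automobile",
}


def derived_search(term, docs):
    """Match only documents in which the query word literally occurs."""
    return sorted(d for d, text in docs.items() if term in text.lower().split())


def assign_terms(text):
    """Assign preferred terms from the vocabulary to a document."""
    return {PREFERRED[w] for w in text.lower().split() if w in PREFERRED}


def assigned_search(term, docs):
    """Translate the query to its preferred term, then match assigned terms."""
    preferred = PREFERRED.get(term, term)
    return sorted(d for d, text in docs.items() if preferred in assign_terms(text))


print(derived_search("automobile", docs))   # [2]
print(assigned_search("automobile", docs))  # [1, 2]
```

The derived search misses document 1 because it says "cars" rather than "automobile"; the assigned term "automobiles" does not have to occur in either document, yet it retrieves both.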

Assigned terms may, on the one hand, simply substitute terms found in the document with other terms, e.g. from a controlled vocabulary. On the other hand, an assigned term may represent a conceptualization of the document that is not expressed by any term in the document. A romantic poem, for example, does not describe itself as such, but may be assigned the term "romantic poem". It is common to classify documents according to an organization of disciplines. Documents may or may not state their disciplinary membership. Even if they do, the author's organization of disciplines may differ from the one chosen by a library or an information system. Assigning terms that are not simple substitutions of synonyms, but that represent independent conceptualizations of document contents, may turn out to be the most important area in which human indexing is better than automatic indexing.

Literature:

 

Foskett, A. C. (1977). Assigned indexing I: Semantics. In: The subject approach to information (pp. 67-85). London: Clive Bingley.


Foskett, A. C. (1982). The subject approach to information. 4th ed. London: Bingley.

 

Paijmans, H. (1993). Comparing the document representations of two IR-systems - CLARIT and TOPIC. Journal of the American Society for Information Science, 44(7), 383-392.

 

 

See also: Indexing; Indexing theory

 

 

 

 

Birger Hjørland

Last edited: 21-07-2006
