Assigned versus derived indexing
Derived indexing terms are terms that occur in the text being indexed. Assigned terms are terms that do not occur in that text.
"There are essentially two approaches to the creation and maintenance of this document or knowledge representation. One is to create a knowledge system in advance and assign the documents to it afterward: assigned indexing. The other is to derive the terms of the index language from the documents themselves: derived indexing. The manual library systems, in which books were classified according to an existing classification system, for example, Dewey or UDC, are assigned indexing systems; computerized IR-systems which extract keywords from the documents according to a weighting scheme are typically derived indexing systems. We will extend the definition of assigned indexing systems to contain all systems that use terms in their docreps [document representations] that are not taken from the documents themselves, because such external terms belong to a knowledge representation outside the document.
The derived indexing systems became very popular when the computer made it easy to create an inverted list of all the words occurring in a document base. In the 1970s and ’80s much effort was put into the development of techniques to identify such words (phrases, sentences) in the inverted lists as were most efficient in retrieving particular documents (for a discussion of both derived vs. assigned and pre-coordinative vs. post-coordinative systems, see Foskett [1982])." (Paijmans, 1993).
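The inverted list mentioned in the quotation can be illustrated with a minimal sketch. It assumes a toy document base and uses raw word counts as a stand-in for a weighting scheme; the function name build_inverted_list, the sample texts and the tokenization rule are hypothetical and are not taken from Paijmans or from any particular IR system.

```python
import re
from collections import defaultdict


def build_inverted_list(docs):
    """Map every word to the documents it occurs in, with raw counts.

    The counts are only a placeholder for a real weighting scheme.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Crude tokenization (an assumption): lowercase alphabetic runs.
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index


# Invented two-document base, for illustration only.
docs = {
    "d1": "Cats chase mice. Cats sleep.",
    "d2": "Dogs chase cats.",
}

index = build_inverted_list(docs)
print(index["cats"])            # {'d1': 2, 'd2': 1}
print(sorted(index["chase"]))   # ['d1', 'd2']
```

Every index term here is derived: it is a word that literally occurs in one of the documents, and nothing outside the document base is consulted.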
Assigned terms may come from external semantic resources (e.g., authority files, classification systems or thesauri) or other kinds of external information.
Derived indexing systems are generally more primitive than assigned systems (or, of course, combinations of the two). It is mechanically easy to mark words in a text for inclusion in an index and to construct an index on this basis. However, users who search for a concept using a synonym, a broader term or a narrower term will miss the information. This is the rationale behind controlled vocabularies.
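The synonym problem can be made concrete with a small sketch. It assumes two invented documents and a toy controlled vocabulary that maps entry terms to one preferred term; the names derived, preferred, assigned, search_derived and search_assigned are hypothetical. In practice assigned terms are often chosen by an indexer rather than by a lookup table, so the mapping below only mimics the simplest, substitution-like kind of assignment.

```python
# Invented documents, for illustration only.
docs = {
    "d1": "a study of automobile safety",
    "d2": "car engines and their maintenance",
}

# Derived index: only the words that occur in each text.
derived = {doc_id: set(text.split()) for doc_id, text in docs.items()}

# Toy controlled vocabulary (an assumption, not a real thesaurus):
# entry term -> preferred term.
preferred = {"car": "motor vehicles", "automobile": "motor vehicles"}

# Assigned index: preferred terms taken from the vocabulary,
# which need not occur in the documents themselves.
assigned = {
    doc_id: {preferred[w] for w in words if w in preferred}
    for doc_id, words in derived.items()
}


def search_derived(term):
    """Return documents whose own words include the query term."""
    return sorted(d for d, words in derived.items() if term in words)


def search_assigned(term):
    """Map the query to its preferred term, then match assigned terms."""
    target = preferred.get(term, term)
    return sorted(d for d, terms in assigned.items() if target in terms)


print(search_derived("car"))    # ['d2']
print(search_assigned("car"))   # ['d1', 'd2']
```

A search for "car" in the derived index misses the document that says "automobile", while the assigned index retrieves both because each was given the same preferred term.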
Assigned terms may, on the one hand, simply substitute terms used in the document with other terms, e.g. from a controlled vocabulary. On the other hand, an assigned term may represent a conceptualization of the document that is not expressed by any term in the document. A romantic poem, for example, does not describe itself as such, but may be assigned the term "romantic poem". It is common to classify documents according to an organization of disciplines. Documents may or may not state their disciplinary membership. Even if they do, the author's organization of disciplines may differ from the one chosen by a library or an information system. Assigning terms that are not simple substitutions of synonyms, but that represent independent conceptualizations of document contents, may turn out to be the most important area in which human indexing is better than automatic indexing.
Literature:
Foskett, A. C. (1977). Assigned indexing I: Semantics. In: The subject approach to information (pp. 67-85). London: Clive Bingley.
Foskett, A. C. (1982). The subject approach to information. 4th ed. London: Bingley.
Paijmans, H. (1993). Comparing the document representations of two IR-systems - CLARIT and TOPIC. Journal of the American Society for Information Science, 44(7), 383-392.
See also: Indexing; Indexing theory
Birger Hjørland
Last edited: 21-07-2006