Indexing is the representation of a document (or a part of a document or an "information object") in a record or in an index for the purpose of retrieval. Common forms of indexes appear in library catalogs, bibliographical databases and back-of-the-book indexes.


Wellish (1995, pp. 199-210) discusses the word “index”, its history and meanings. Strangely enough, however, he does not discuss the special meaning of index in library and information science (LIS): neither the meaning of an index as a type of document, nor the process of indexing as differentiated from the classification or description of documents. Typically, the verb to index is used in LIS for the process of assigning keywords or descriptors to bibliographical records or to pages in a book. In everyday understanding this is differentiated from classification, which is the assignment of a classification code to a bibliographical record. However, the two processes are not necessarily different in principle. A classification scheme may differ from a list of controlled terms, but in the case of a thesaurus based on a faceted classification there is no difference in principle. The most important difference in knowledge organization is not between indexing and classification, but between systems based on controlled vocabularies and systems based on non-controlled vocabularies. In principle, an act of classification is thus also an act of indexing, and vice versa.


The representation may identify the originators of the document, its publisher, its physical properties, its subjects etc. A distinction is often made between descriptive cataloging/indexing on the one hand and subject indexing on the other. "Descriptive" indexing emphasizes physical properties, originator, publisher, time and place of publication etc., whereas subject indexing emphasizes the identification of the "subject" of the document.


Different parts of the document may be used by the indexer, e.g. the title, the references or the full text. It is widely recognized that quality indexing depends on autopsy, that is, on direct examination of the document itself. Different techniques may be used, e.g. human, intellectual analysis or computer-based statistical analysis of word frequencies. The subject indexing process consists of subject analysis followed by a “translation” of the subjects into the special system applied. The indexing terms (or other symbols such as classification codes) may be derived from the indexed documents or may be assigned by the indexer (or be both derived and assigned). The indexing terms or symbols used to express the subject may come from a controlled vocabulary or be free expressions. The controlled vocabulary may be, for example, a list of controlled terms or subject headings, a classification scheme or a thesaurus.
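The difference between derived and assigned indexing terms can be illustrated with a minimal Python sketch. It is not a description of any real indexing system: the stop list, the tiny controlled vocabulary and the function names are all hypothetical, and simple word counting stands in for the statistical analysis of word frequencies mentioned above.

```python
import re
from collections import Counter

# Hypothetical stop list and controlled vocabulary, invented for illustration.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on"}
CONTROLLED_VOCABULARY = {
    "cataloging": "Cataloging",
    "catalogs": "Cataloging",
    "thesaurus": "Thesauri",
    "thesauri": "Thesauri",
}


def derived_terms(text, top_n=3):
    """Derive candidate terms from the document's own words, by frequency."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


def assigned_terms(text):
    """Assign descriptors by 'translating' words into the controlled vocabulary."""
    words = re.findall(r"[a-z]+", text.lower())
    return sorted({CONTROLLED_VOCABULARY[w] for w in words if w in CONTROLLED_VOCABULARY})


sample = "A thesaurus supports cataloging. The thesaurus lists terms for library catalogs."
print(derived_terms(sample, top_n=1))  # ['thesaurus']
print(assigned_terms(sample))          # ['Cataloging', 'Thesauri']
```

The derived term is a word taken from the text, whereas the assigned descriptors are vocabulary entries that need not occur in the text in that form. A human indexer's subject analysis is of course not reducible to such a lookup.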



A simplistic view of book indexing is expressed in the following quotation:

"To index a book, you need to perform these basic steps:


  1. Atomize: remove all punctuation, capital letters, apostrophized endings, etc. and put each word in the book on a separate line.

  2. Unique: remove all duplicate words.

  3. Sort: sort the resulting list of words.

  4. Boring: remove "boring" part of speech like "and", "the", "but", etc.

  5. Page: assign page numbers to the remaining words of interest".



 (Pountain, 1987, quoted from Wellish, 1995, p. 217).
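The five quoted steps are mechanical enough to be written down directly. The following Python sketch is one possible reading of them, not Pountain's own program: the stop list `BORING`, the function name `naive_index` and the representation of a book as a list of page texts are assumptions made for illustration. The steps are carried out in a slightly different order than in the quotation, which does not change the result.

```python
import re

# Hypothetical stop list standing in for the "boring" words of step 4.
BORING = {"and", "the", "but", "a", "of", "is"}


def naive_index(pages):
    """Build a word index from a list of page texts (pages[0] is page 1)."""
    index = {}
    for page_number, text in enumerate(pages, start=1):
        # 1. Atomize: lowercase, drop apostrophized endings and punctuation.
        text = re.sub(r"'\w*", "", text.lower())
        words = re.findall(r"[a-z]+", text)
        # 2. Unique: a set keeps each word only once per page.
        for word in set(words):
            # 4. Boring: skip the stop words.
            if word in BORING:
                continue
            # 5. Page: record the page number for the remaining word.
            index.setdefault(word, []).append(page_number)
    # 3. Sort: return the entries in alphabetical order.
    return dict(sorted(index.items()))


book = ["The cat's whiskers and the dog.", "A dog, but no cat."]
print(naive_index(book))
# {'cat': [1, 2], 'dog': [1, 2], 'no': [2], 'whiskers': [1]}
```

The output resembles a concordance of words more than a subject index: the word "no" receives an entry simply because it is absent from the stop list, and nothing in the procedure asks what the pages are about.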

Against this simplistic view, the following points should be kept in mind:

  1. Indexing is a creative process.

  2. Indexers should work from a holistic understanding of the document towards an indexing of its important "indexable matters".

  3. Problems of syntactic and semantic ambiguity, synonyms, homographs, variant word forms etc. are very important and are not dealt with by the simplistic view. Make sure that the index provides access to the most important matters by using well-known terms for concepts and by providing cross-references from important synonyms.

  4. Important matters may be mentioned only implicitly and should be made visible by the index.

  5. Do not index a matter that is merely mentioned if no important information is given about it.

  6. Seemingly "unimportant" words may be crucial for indicating relationships.

(Based on Wellish, 1995, p. 218).
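The advice in point 3 about cross-references from synonyms can be sketched in a few lines of Python. The entries, page numbers and the function name `index_lines` are hypothetical, and the "See" wording is only one common way of phrasing such a reference.

```python
# Hypothetical preferred terms with their page locators and synonyms.
ENTRIES = {
    "automobiles": {"pages": [12, 47], "synonyms": ["cars", "motor vehicles"]},
    "thesauri": {"pages": [88], "synonyms": []},
}


def index_lines(entries):
    """Return index lines: main entries plus 'See' references from synonyms."""
    lines = []
    for term, data in entries.items():
        locators = ", ".join(str(page) for page in data["pages"])
        lines.append(f"{term}, {locators}")
        # Each synonym points the reader to the preferred term.
        for synonym in data["synonyms"]:
            lines.append(f"{synonym}. See {term}")
    return sorted(lines, key=str.lower)


for line in index_lines(ENTRIES):
    print(line)
# automobiles, 12, 47
# cars. See automobiles
# motor vehicles. See automobiles
# thesauri, 88
```

Deciding which term is the well-known one and which synonyms are important enough to carry a reference remains a matter of the indexer's judgment; the sketch only shows the resulting structure.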


To index a document is not first and foremost to apply an abstract theory of indexing but to take care of a large number of practical matters, such as different standards of alphabetical arrangement, how to deal with initial articles and with names beginning with "Mac", how to cope with elements of different languages, what equipment and software to use, etc. (see Wellish, 1995, table of contents; see also the entry on "bad indexes"). In advanced indexing, such as in Medline, detailed subject knowledge is required in addition to knowledge of indexing practice. Indexing theory cannot replace subject knowledge, just as pedagogical theory cannot replace subject knowledge in teaching. There are specific literatures on indexing in specific domains (e.g., Kendrick & Zafran, 2001).
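How much such practical conventions matter can be shown with a small Python sketch of two common alphabetization conventions, word-by-word and letter-by-letter filing, combined with the rule of ignoring initial articles. The function names and the English-only article list are assumptions; the sketch does not reproduce any particular published filing standard.

```python
import re

# English initial articles only; other languages would need their own lists.
INITIAL_ARTICLES = ("a ", "an ", "the ")


def strip_initial_article(heading):
    """Drop a leading article so that 'The New Deal' files under 'New'."""
    lowered = heading.lower()
    for article in INITIAL_ARTICLES:
        if lowered.startswith(article):
            return heading[len(article):]
    return heading


def normalize(heading):
    """Lowercase the heading and remove everything except letters, digits, spaces."""
    return re.sub(r"[^a-z0-9 ]", "", strip_initial_article(heading).lower())


def word_by_word_key(heading):
    """Compare headings one word at a time ('nothing before something')."""
    return normalize(heading).split()


def letter_by_letter_key(heading):
    """Compare headings as a single string with the spaces ignored."""
    return normalize(heading).replace(" ", "")


headings = ["Newark", "New York", "The New Deal", "Newspapers"]
print(sorted(headings, key=word_by_word_key))
# ['The New Deal', 'New York', 'Newark', 'Newspapers']
print(sorted(headings, key=letter_by_letter_key))
# ['Newark', 'The New Deal', 'Newspapers', 'New York']
```

The same four headings end up in two different orders, which is why an indexer has to settle on one convention and apply it consistently.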


Ward (1996) considers the principles of indexing and the intellectual skills involved, in order to determine what automatic indexing systems would need if they were to supplant or complement the human indexer. Ward argues that good indexing, especially in technical fields, requires considerable prior knowledge of the literature; judgment as to what to index and to what depth; reading skills; and abstracting, cataloguing and classification skills. These features are illustrated with a detailed description of the abstracting and indexing processes involved in generating entries for the mechanical engineering database POWERLINK. The article briefly assesses the possibility of replacing human indexers with specialist indexing software, with particular reference to the Object Analyzer from the InTEXT automatic indexing system, applying the criteria described for human indexers.





See also: Indexing theory.



Birger Hjørland

Last edited: 19-05-2007




The word index comes from the Latin index, meaning "indicator" or "forefinger". In information science, indexing usually denotes a form of "IR language" that is often contrasted with *classification, since indexing typically (but not necessarily) consists of verbal indexing terms arranged alphabetically. (The concept is, however, also used as a superordinate term covering both classification and verbal subject indexing, or vice versa.)

What is indexed may be books ("back-of-the-book indexing"), journals and *documents in general. The most important theory concerns the indexing of bibliographic databases (e.g. the medical database MEDLINE) using indexing languages such as *thesauri.

There are many forms of indexing: controlled versus non-controlled indexing vocabulary, verbal versus coded or numerical indexing, citation indexing, *pre-coordinate versus *post-coordinate indexing, indexing based on "extraction" ("derived indexing"), *SAP indexing, etc.

A common-sense theory within indexing is "the strategy of unlimited aliasing": if there is uncertainty about finding the right descriptors or subject terms, it will be fruitful for each document representation (bibliographic *record) to contain terms suggested by many different indexers. The idea is thus that the qualitative problem of which subject terms should describe a document can be circumvented by a quantitative method: using as many subject terms as possible. This strategy can be refuted both theoretically and empirically (e.g. Brooks, 1993). See also *Polyrepresentation.



