Indexing
Indexing is the representation of a document (or a part of a document or an "information object") in a record or in an index for the purpose of retrieval. Common forms of indexes appear in library catalogs, bibliographical databases and back-of-the-book indexes.
Wellisch (1995, pp. 199-210) discusses the word “index”, its history and its meanings. Strangely enough, however, he does not discuss the special meaning of index in library and information science (LIS): neither the meaning of an index as a type of document, nor the process of indexing as differentiated from the classification or description of documents. Typically, the verb to index is used in LIS for the process of assigning keywords or descriptors to bibliographical records or to pages in a book. In everyday understanding this is differentiated from classification, which assigns a classification code to a bibliographical record. However, the two processes are not necessarily different in principle. A classification scheme may differ from a list of controlled terms, but in the case of a thesaurus based on a faceted classification there is no difference in principle. The most important difference in knowledge organization is not between indexing and classification, but between systems based on controlled vocabularies and systems based on non-controlled vocabularies. In principle, an act of classification is thus also an act of indexing, and vice versa.
The representation may identify the originators of the document, its publisher, its physical properties, its subjects, etc. A distinction is often made between descriptive cataloging/indexing on the one hand and subject indexing on the other. "Descriptive" indexing emphasizes physical properties, originator, publisher, time and place of publication, etc., whereas subject indexing emphasizes the identification of the "subject" of the document.
Different parts of the document may be used by the indexer, e.g. the title, the references or the full text. It is widely recognized that quality indexing depends on autopsy, i.e. on direct examination of the document itself. Different techniques may be used, e.g. human intellectual analysis or computer-based statistical analysis of word frequencies. The subject indexing process consists of subject analysis followed by a “translation” of the subjects into the special system applied. The indexing terms (or other symbols such as classification codes) may be derived from the indexed documents or may be assigned by the indexer (or be both derived and assigned). The indexing terms or symbols used to express the subject may come from a controlled vocabulary or be free expressions. The controlled vocabulary may be, for example, a list of controlled terms or subject headings, a classification scheme or a thesaurus.
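The difference between derived and assigned indexing terms can be illustrated with a minimal sketch in Python. It is only an illustration under stated assumptions: the stop list, the tiny controlled vocabulary and the sample abstract are hypothetical, and simple word counting stands in for the much richer statistical analyses used in practice.

```python
import re
from collections import Counter

# Hypothetical stop list and controlled vocabulary, for illustration only.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "are", "by", "on"}
CONTROLLED_VOCABULARY = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
    "aspirin": "Aspirin",
}


def derive_terms(text, top_n=3):
    """Derived indexing: take the most frequent non-stop words from the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


def assign_terms(text):
    """Assigned indexing: map phrases found in the text to controlled descriptors."""
    lowered = text.lower()
    found = {descriptor
             for phrase, descriptor in CONTROLLED_VOCABULARY.items()
             if phrase in lowered}
    return sorted(found)


# Hypothetical abstract used as the document to be indexed.
abstract = ("Aspirin after a heart attack: aspirin reduces the risk of a "
            "second heart attack in patients.")
print(derive_terms(abstract))   # terms taken from the document's own words
print(assign_terms(abstract))   # descriptors supplied by the vocabulary
```

The derived terms are words that occur in the document, whereas the assigned descriptor "Myocardial Infarction" never appears in the text; it is the result of "translating" the subject into the vocabulary of the system.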
Simplistic view of the indexing process | Non-simplistic view of the indexing process
"To index a book, you need to perform these basic steps: [...]" (Pountain, 1987, quoted from Wellisch, 1995, p. 217). | (Based on Wellisch, 1995, p. 218).
To index a document is not first and foremost to apply an abstract theory of indexing but to take care of a large number of practical matters, such as different standards of alphabetical arrangement, how to deal with initial articles and the initial Mac, how to cope with elements from different languages, what equipment and software to use, etc. (see Wellisch, 1995, table of contents; see also the entry on "bad indexes"). In advanced indexing, such as in Medline, detailed subject knowledge is required in addition to knowledge of indexing practice. Indexing theory cannot replace subject knowledge, just as pedagogical theory cannot replace subject knowledge in teaching. There are specific literatures on indexing in specific domains (e.g., Kendrick & Zafran, 2001).
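Practical matters such as initial articles and the initial Mac come down to how a sort key is built for each heading. The following Python sketch shows one possible set of rules; the rules, the function name filing_key and the sample headings are assumptions made for illustration, and actual filing standards differ on each of these points.

```python
import re

# Illustrative rules only; actual filing standards differ on each point.
INITIAL_ARTICLES = ("the ", "a ", "an ")


def filing_key(heading):
    """Build a sort key that skips initial articles and files Mc as Mac."""
    key = heading.lower()
    for article in INITIAL_ARTICLES:
        if key.startswith(article):
            key = key[len(article):]
            break
    # Treat the prefixes "Mc" and "M'" as if they were spelled "Mac".
    key = re.sub(r"^(mc|m')", "mac", key)
    # Letter-by-letter arrangement: drop spaces and punctuation.
    # Note: this also drops non-ASCII letters, so headings in other
    # languages would need additional rules not sketched here.
    return re.sub(r"[^a-z0-9]", "", key)


# Hypothetical headings to be arranged.
headings = ["The Indexer", "McDonald, A.", "Macdonald, B.",
            "An Abstract", "Mabel, C."]
for heading in sorted(headings, key=filing_key):
    print(heading)
```

With these rules "The Indexer" files under I rather than T, and "McDonald" files next to "Macdonald". A different standard (for example word-by-word arrangement, or filing Mc as spelled) would produce a different order from the same headings.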
Ward (1996) considers the principles of indexing and the intellectual skills involved, in order to determine what automatic indexing systems would require to supplant or complement the human indexer. Ward argues that good indexing, especially in technical fields, requires considerable prior knowledge of the literature; judgment as to what to index and to what depth; reading skills; and abstracting, cataloguing and classification skills. The article illustrates these features with a detailed description of the abstracting and indexing processes involved in generating entries for the mechanical engineering database POWERLINK. It briefly assesses the possibility of replacing human indexers with specialist indexing software, with particular reference to the Object Analyzer from the InTEXT automatic indexing system, applying the criteria described for human indexers.
Literature:
Kaae, S. (1990). Verbal emneindeksering i BASIS. En håndbog. Ballerup: Bibliotekscentralens forlag.
Kendrick, P. & Zafran, E. L. (Eds.). (2001). Indexing specialties: Law. Medford, NJ: Information Today, Inc.
Lancaster, F. W. (1991/1998/2003). Indexing and abstracting in theory and practice. London: Library Association. (1st ed. 1991; 2nd ed. 1998; 3rd ed. 2003).
Sykes, J. (2001). The value of Indexing. A White Paper Prepared for Factiva, a Dow Jones and Reuters Company. http://www.factiva.com/infopro/indexingwhitepaper.pdf
Ward, M. L. (1996). The future of the human indexer. Journal of Librarianship and Information Science, 28(4), 217-225.
Wellisch, H. H. (1995). Indexing from A to Z. 2nd edition. New York: H. W. Wilson.
Wellisch, H. H. (2000). Glossary of terminology in abstracting, classification, indexing, and thesaurus construction. 2nd ed. Medford, NJ: Information Today, Inc.
Links:
The American Society of Indexers http://www.asindexing.org/site/index.html
See also: Indexing theory.
Birger Hjørland
Last edited: 19-05-2007
to be edited:
Index comes from the Latin word index, which means "indicator" or "forefinger". In information science, indexing usually denotes a kind of "IR language", which is often set in contrast to *classification, in that indexing typically (but not necessarily) consists of verbal indexing terms arranged alphabetically. (The concept is, however, also used as a superordinate term for both classification and verbal subject indexing (or vice versa).)
What is indexed may be books ("back-of-the-book indexing"), journals and *documents in general. The most important theory relates to the indexing of bibliographic databases (such as the medical database MEDLINE) using indexing languages such as *thesauri.
There are many forms of indexing: controlled versus non-controlled indexing vocabulary, verbal versus coded or numerical indexing, citation indexing, *pre-coordinate versus *post-coordinate indexing, indexing based on "extraction"/"derived indexing", *SAP indexing, etc.
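The contrast between pre-coordinate and post-coordinate indexing can be sketched in a few lines of Python. The records, terms and headings below are hypothetical and serve only to show where the combination of concepts takes place.

```python
from collections import defaultdict

# Hypothetical records: document id -> single-concept indexing terms.
records = {
    "doc1": {"indexing", "medicine"},
    "doc2": {"indexing", "law"},
    "doc3": {"classification", "medicine"},
}

# Post-coordinate indexing: each term is posted separately ...
postings = defaultdict(set)
for doc_id, terms in records.items():
    for term in terms:
        postings[term].add(doc_id)


def search_post_coordinate(*terms):
    """... and the terms are combined (ANDed) only at search time."""
    result_sets = [postings.get(term, set()) for term in terms]
    return set.intersection(*result_sets) if result_sets else set()


# Pre-coordinate indexing: the indexer combines the concepts into one
# heading at indexing time; the searcher looks up the ready-made heading.
pre_coordinated = {
    "Medicine -- Indexing": {"doc1"},
    "Law -- Indexing": {"doc2"},
    "Medicine -- Classification": {"doc3"},
}

print(search_post_coordinate("indexing", "medicine"))
print(pre_coordinated["Medicine -- Indexing"])
```

Both lookups find the same document here. The difference is who does the coordinating: the searcher at query time in the first case, the indexer at indexing time in the second.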
A common-sense theory within indexing is "the strategy of unlimited aliasing", i.e. if there is uncertainty about finding the right descriptors/subject terms, it will be fruitful for each document representation (bibliographic *record) to contain terms suggested by many different indexers. That is, the idea that one can get around the qualitative problem of which subject terms should describe a document by means of a quantitative method: using as many subject terms as possible. This strategy can be refuted both theoretically and empirically (e.g., Brooks, 1993). See also *Polyrepresentation.
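The mechanism behind the strategy of unlimited aliasing can be made concrete with a minimal Python sketch. The indexers and the terms they propose are hypothetical, and the sketch shows only how the pooled record is formed; it makes no claim about retrieval performance, which is the question examined empirically by Brooks (1993).

```python
# Hypothetical descriptors proposed for the same document by three indexers.
proposals = {
    "indexer_a": {"indexing", "thesauri"},
    "indexer_b": {"indexing", "information retrieval"},
    "indexer_c": {"subject analysis", "indexing", "databases"},
}


def pooled_record(proposals):
    """Unlimited aliasing: keep every term that any indexer suggested."""
    return set().union(*proposals.values())


def consensus_record(proposals):
    """Contrast case: keep only the terms that all indexers agreed on."""
    return set.intersection(*proposals.values())


pooled = pooled_record(proposals)
consensus = consensus_record(proposals)
print(sorted(pooled))
print(sorted(consensus))
```

The pooled record grows with every additional indexer, while the consensus record can only shrink. The strategy replaces the qualitative decision about which terms are right with the quantitative rule of keeping them all.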
Literature:
Breton, E. J. (1991). Indexing for Invention. Journal of the American Society for Information Science, 42(3), 173-177.
Brooks, T. A. (1993). All the Right Descriptors: A Test of the Strategy of Unlimited Aliasing. Journal of the American Society for Information Science, 44(3), 137-147.
Cooper, W. S. (1969). Is Interindexer Consistency a Hobgoblin? American Documentation, 20(3), 266-278.
Fugmann, R. (1992). Theoretische Grundlagen der Indexierungspraxis. Frankfurt am Main: Indeks Verlag.
Kaae, S. (1990). Verbal emneindeksering i BASIS. En håndbog. Ballerup: Bibliotekscentralens Forlag.
Lancaster, F. W. (1979). Information Retrieval Systems: Characteristics, Testing and Evaluation. 2nd ed. New York.
Lancaster, F. W. (1991). Indexing and Abstracting in Theory and Practice. London: The Library Association. 328 pp.
Larson, R. R. (1991). The Decline of Subject Searching: Long-Term Trends and Patterns of Index Use in an Online Catalog. Journal of the American Society for Information Science, 42(3), 197-215.
McKinin, E. J., Sievert, M. E., Johnson, E. D. & Mitchell, J. A. (1991). The Medline/Full-text research project. Journal of the American Society for Information Science, 42(4), 297-307. ["lacked the precision of searches done in the indexed file"]
Salton, G. (1975). A theory of indexing. Philadelphia: Society for Industrial and Applied Mathematics.
Weinberg, B. H. (1988). Why indexing fails the researcher. The Indexer, 16(1), 3-6.
Wellisch, H. H. (1991). Indexing from A to Z. Bronx, New York: The H. W. Wilson Company. 461 pp.
Wille, N. E. (1980). Læsbarhed og indekseringssprog. Skrifter om Anvendt og Matematisk Lingvistik (SAML), No. 6, 71-77.
Journal: The Indexer, 1972-.