Indexing theory

Indexing depends both on the document to be indexed and on the indexer performing the process under specific conditions in a specific environment. Different documents are of course indexed differently by the same indexer. If they were not, the index would be non-discriminative and totally useless. Any theory of indexing has to deal with this fact and thus with how a document's attributes or properties should influence its representation.

 

The same document may be indexed differently by different indexers, by the same indexer at different times, by different indexing systems, in different libraries, for different target groups or for different ideal purposes (see Consistency in Knowledge Organization; Request oriented indexing).

 

The indexing is close to the document if it is constructed from a set of terms selected mechanically from the document (e.g. from titles, references or full text). This is the objective pole, because the document is the object of the indexing process. The rhetorical view of indexing (Andersen, 2004) is also close to the objective pole, since it emphasizes what the author of the document is arguing.

 

The subjective pole of indexing theory emphasizes that the same document may be seen differently by different people or systems, and that the indexing should not aim at a purely objective representation but should also consider, for example, the collection to which the document belongs or the tasks for which the indexing is made. Automatic indexing usually represents the terms of a document relative to the terms' frequency in a collection of documents. In this way the representation is not just a function of the document itself, but also a function of a collection. Another example is that the same book may be indexed differently for a library for gender studies than for a library for historical studies. The indexing still has to be loyal to the document being indexed, but different aspects of the document may be emphasized, and the subject may be expressed in different controlled vocabularies constructed to support either collection.
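The dependence of automatic indexing on the collection can be illustrated with a small sketch. It assumes a plain TF-IDF weighting (term frequency multiplied by the logarithm of inverse document frequency), which is one common scheme and not necessarily the one used by any particular system. The function name, the tokenizer and the two miniature "libraries" are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Very crude tokenizer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def tfidf_weights(document, collection):
    """Weight each term of `document` relative to `collection`.

    The document is counted as a member of the collection, so every one
    of its terms has a document frequency of at least 1.
    """
    docs = [set(tokenize(d)) for d in [document, *collection]]
    n_docs = len(docs)
    weights = {}
    for term, tf in Counter(tokenize(document)).items():
        df = sum(1 for d in docs if term in d)
        weights[term] = tf * math.log(n_docs / df)
    return weights


# Hypothetical example: the same book placed in two different collections.
book = "women workers in danish factories"
gender_library = [
    "women and politics",
    "women in literature",
    "gender and women in science",
]
history_library = [
    "danish factories in the nineteenth century",
    "history of danish agriculture",
    "factories and labour history",
]

weights_gender = tfidf_weights(book, gender_library)
weights_history = tfidf_weights(book, history_library)
```

In this toy setting the term "women" receives no weight in the gender studies collection, where every document contains it, but a high weight in the historical collection, where it discriminates the book from the rest. The representation of the book is thus a function of both the document and the collection.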

 

The importance of indexing documents specifically for a given discipline, task or point of view may be illustrated by an example from the Royal Library in Copenhagen. First, the practice in this library is that a given book is circulated to different subject bibliographers. Each subject bibliographer then decides whether or not the book is relevant to his or her discipline. If it is relevant, it is indexed within that discipline. In this way a given document may be indexed from multiple points of view in the same catalog. Second, a staff member, Nynne Koch, began around 1972 to collect printed catalog cards which she regarded as important to a new field, which she defined and termed "feminology". This initiative later developed into an important independent library and research center, "KVINFO". The important point in relation to indexing theory is that this new library was not started with a special collection of books, but with a new way of indexing books belonging to other disciplines. This example demonstrates the importance of the subjectivity of indexing: indexing must be regarded in relation to the aim of the indexing system.

 

Indexing should not, of course, aim at an idiosyncratic understanding held by the individual indexer. It is not his or her special interests or points of view that should be emphasized. An indexer works in order to accomplish a goal which is implicit or explicit in a given library or information system. It is this goal, not the individual indexer's goal, which should form the basis for the indexing. This insight has led to an ideal of inter-indexer consistency. However, as pointed out by Cooper (1969), indexing may be consistently wrong, which is why studies of inter-indexer consistency do not necessarily provide a basis for assessing indexing quality.
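How inter-indexer consistency may be quantified, and why a high score does not guarantee quality, can be shown with a minimal sketch. It assumes a simple set-overlap measure (terms assigned by both indexers divided by terms assigned by either), which is one of several measures used in consistency studies and is not prescribed by the sources cited here. All term sets below, including `useful_terms`, are invented for illustration.

```python
def consistency(terms_a, terms_b):
    """Share of index terms two indexers have in common (set overlap).

    Returns 1.0 for identical term sets and 0.0 for disjoint ones.
    Two empty sets are treated as fully consistent (an assumption).
    """
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical indexing of the same passage by two indexers.
indexer_1 = {"psychoanalysis", "dreams", "repression"}
indexer_2 = {"psychoanalysis", "dreams", "symbolism"}
partial_agreement = consistency(indexer_1, indexer_2)

# Cooper's point: two indexers can agree completely and still miss
# the terms that would actually serve the users of the system.
indexer_3 = {"psychology"}
indexer_4 = {"psychology"}
useful_terms = {"repression", "dream interpretation"}  # assumed user needs
full_agreement = consistency(indexer_3, indexer_4)
hits = indexer_3 & useful_terms  # empty: consistent, but not helpful
```

Here the first pair of indexers share two of four distinct terms, while the second pair agree perfectly on a term that matches none of the assumed user needs. The measure captures agreement only, not whether the agreed terms are the right ones.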

 

The following quote demonstrates how difficult indexing often is:

 

“Anybody who has ever tried to index a psychoanalytic article or book knows how difficult it is to find the terms that accurately answer to our abstract vocabulary. And anybody who has tried to trace the definition or description of a psychoanalytic term or concept does not need to be told how difficult that can be. These problems led the Indexing Study Group of the American Psychoanalytic Association to an experiment. About a dozen seasoned analysts independently indexed a passage from the Standard Edition [of Sigmund Freud]. When they compared what they had done, all agreed that the failure to agree about which terms to index was humbling and impressive. The group did not even agree on which words required see or see also directives, or on the words that should follow those directives.” (Klumpner, 1993, p. 1)

 

While we know that indexers often disagree, we know very little about why they disagree and whether a discussion between them could produce some kind of consensus (or at least reveal systematic patterns in their disagreements). We have many quantitative studies measuring degrees of disagreement, but almost no qualitative studies discussing the nature of the disagreements. O'Connor (1967, 1969) demonstrated how relevance disagreements could be resolved by discussion with a colleague. This might also be the case with disagreements in indexing: we simply lack studies of this kind to inform us. Systematic patterns in disagreements among competent indexers are probably mostly related to different theoretical understandings. This is indirectly confirmed by citation studies (cf. Hjørland, 2002). Concerning indexing done by people without proper subject knowledge, the problem may be that the indexers make descriptions that are too broad, so that users are overloaded with references without being able to make the necessary discriminations.

 

It is difficult to find comprehensive overviews and discussions of indexing theories in the literature. Andersen (2004) should be praised for providing a broad overview of these, which are presented and discussed in chapter 7 of his dissertation. He uses the following systematization of the theories:

 

7.3.1. The aboutness concept

Authors discussed: Fairthorne (1969); Maron (1977); Swift, Winn & Bramer (1977); Hutchins (1978)

7.3.2. The concept of subject and subject analysis

Authors discussed: Wilson (1968); Hjørland (1992, 1997); Langridge (1989); Fugmann (1993)

7.3.3. Request, user and cognitive-oriented indexing

Authors discussed: Soergel (1985); Fidel (1994); Pejtersen (1979, 1980, 1994); Pejtersen & Austin (1983, 1984); Farrow (1991, 1994, 1995)

7.3.4. Meaning, language and interpretation [and epistemology, cf. p. 153] [rhetorical view of indexing]

Authors discussed: Blair (1990, 1992, 2003); Frohmann (1990); Andersen & Christensen (2001); Campbell (2000b); Mai (2001); Blair & Kimbrough (2002)

7.3.5. Techniques of indexing

[Automatic indexing]. Authors discussed: Salton (1971); Salton & McGill (1983)

Pre-coordinate versus post-coordinate indexing. Authors discussed: None

Latent semantic indexing. Authors discussed: Deerwester et al. (1990); Letsche & Berry (1997)

Citation indexing. Authors discussed: Garfield (1979); Small (1978); Cozzens (1989); Nicolaisen (2003)

 

Although it is praiseworthy that he provides such a comprehensive overview of indexing theories, I do not find his classification of indexing theories fruitful. "The aboutness concept" is not a theory of indexing; neither is "the concept of subject". Any theory of subject indexing has to relate to the concept of subject in one way or another. It may be of minor importance whether it is termed subject or aboutness and whether these two words are regarded as synonyms or not. Different theories of indexing relate to concepts such as aboutness or subject in different ways. Different theories of indexing may also imply different techniques of indexing and may relate differently to theories of meaning, language and interpretation. In my opinion, theories of indexing cut across Andersen's categories. In other publications (e.g. Hjørland, 1997) I have proposed a quite different classification of indexing theories, based on the theories' epistemological assumptions:

 

Rationalist theories of indexing (such as Ranganathan's theory) suggest that subjects are constructed logically from a fundamental set of categories. The basic method of subject analysis is then "analytic-synthetic": to isolate a set of basic categories (= analysis) and then to construct the subject of any given document by combining those categories according to some rules (= synthesis).

Empiricist theories of indexing rest on selecting similar documents on the basis of their properties, in particular by applying numerical statistical techniques.

Historicist and hermeneutical theories of indexing suggest that the subject of a given document is relative to a given discourse or domain, which is why the indexing should reflect the needs of a particular discourse or domain. According to hermeneutics, a document is always written and interpreted from a particular horizon. The same is the case with systems of knowledge organization and with all users searching such systems. Any question put to such a system is put from a particular horizon. All those horizons may be more or less in consensus or in conflict. To index a document is to try to contribute to the retrieval of “relevant” documents by knowing about those different horizons.

Pragmatic and critical theories of indexing (such as Hjørland, 1997) agree with the historicist point of view that subjects are relative to specific discourses, but emphasize that subject analysis should support given goals and values and should consider the consequences of indexing one way or another. These theories hold that indexing cannot be neutral and that it is a wrong goal to try to index in a neutral way. Indexing is an act (and computer-based indexing acts according to the programmer's intentions). Acts serve human goals. Libraries and information services also serve human goals, which is why their indexing should be done in a way that supports these goals as much as possible. At first glance this looks strange, because the goal of libraries and information services is to identify any document or piece of information. Nonetheless, any specific way of indexing always supports some kinds of use at the expense of others. The documents to be indexed are intended to serve some specific purposes in a community. Basically, the indexing should aim to serve the same purposes. Primary and secondary documents and information services are parts of the same overall social system. In such a system different theories, epistemologies, worldviews etc. may be at play, and users need to be able to orient themselves and to navigate among those different views. This calls for a mapping of the different epistemologies in the field and a classification of the single document into such a map. Excellent examples of such different paradigms and their consequences for indexing and classification systems are provided in the domain of art by Ørom (2003) and in music by Abrahamsen (2003).

 

The core of indexing is, as stated by Rowley & Farrow (2000), to evaluate a paper's contribution to knowledge and index it accordingly or, in the words of Hjørland (1992, 1997), to index its informative potentials.

 

"In order to achieve good consistent indexing, the indexer must have a thorough appreciation of the structure of the subject  and the nature of the contribution that the document is making to the advancement of knowledge." (Rowley & Farrow, 2000, p. 99).



But again, there may be different views of what a contribution to knowledge is, and of the way in which a given document contributes (or does not contribute).

 

 

 

Literature:

 

Abrahamsen, K. T. (2003). Indexing of Musical Genres. An Epistemological Perspective. Knowledge Organization, 30(3/4), 144-169.

 

Andersen, J. (2004). Analyzing the role of knowledge organization in scholarly communication: An inquiry into the intellectual foundation of knowledge organization. PhD dissertation. Copenhagen: Department of Information Studies, Royal School of Library and Information Science, 2004. Available: http://www.db.dk/dbi/samling/phd/jackandersen-phd.pdf  (Visited May 10, 2004).

 

Andersen, J., & Christensen, F. S. (2001). Wittgenstein and indexing theory. In H. Albrechtsen & J.-E. Mai (Eds.): Advances in classification research. Proceedings of the 10th ASIS SIG/CR classification research workshop (vol. 10, pp. 1-21). Medford, NJ: Information Today.

 

Blair, D. C. (1990). Language and representation in information retrieval. Amsterdam: Elsevier.

 

Blair, D. C. & Kimbrough, S. O. (2002). Exemplary documents: a foundation for information retrieval design. Information Processing and Management, 38(3), 363-379.

 

Campbell, G. (2000a). Aboutness and Meaning: How a Paradigm of Subject Analysis Can Illuminate Queer Theory in Literary Studies. IN: CAIS 2000. Canadian Association for  Information Science: Proceedings of the 28th Annual Conference.  http://www.slis.ualberta.ca/cais2000/campbell.htm

 

Campbell, G. (2000b). Queer theory and the creation of contextual subject access tools for gay and lesbian communities. Knowledge Organization, 27(3), 122-131.

 

Cooper, W. S. (1969). Is interindexer consistency a hobgoblin? American Documentation, 20, 268-278.

 

Cooper, W. S. (1979). Utility-Theoretic Indexing: A Note on Wilson's Note. Journal of the American Society for Information Science, 30(3), 170-172.

 

Cozzens, S. E. (1989). What do citations count? The rhetoric-first model. Scientometrics, 15, 437-447.

 

Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K. & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407.

 

Fairthorne, R. A. (1969). Content analysis, specification and control. Annual Review of Information Science and Technology, 4, 73-109.

 

Farrow, J. D. (1991). A cognitive process model of document indexing. Journal of Documentation, 47(2), 149-166.

 

Farrow, J. D. (1994). Indexing as a cognitive process. IN: Encyclopedia of Library and Information Science, vol. 53, supp. 16, 155-171.

 

Farrow, J. D. (1995). All in the mind: Concept analysis in indexing. The Indexer, 19(4), 243-247.

 

Fidel, R. (1994). User-centered indexing. Journal of the American Society for Information Science, 45(8), 572-576.

 

Frohmann, B. (1990). Rules of Indexing: A Critique of Mentalism in Information Retrieval Theory. Journal of Documentation, 46(2), 81-101.

 

Fugmann, R. (1973). Role of subjectivity in establishing, using, operating and evaluating information retrieval systems.2. Retrieval systems theory. Information Storage and Retrieval, 9(7), 353-372.

 

Fugmann, R. (1979). Toward a theory of information supply and indexing. International Classification, 6(1), 3-15.

 

Fugmann, R. (1980). On the practice of indexing and its theoretical foundations. International Classification, 7(1), 13-20.

 

Fugmann, R. (1993). Subject analysis and indexing : theoretical foundation and practical advice. Frankfurt a.M. : Indeks Verlag.

 

Fugmann, R. (1994). Representational predictability: key to the resolution of several pending issues in indexing and information supply (p.414-422). IN: Albrechtsen, H.& Ørnager, S. (Eds.). Proceedings of 3rd International Conference of the International Society for Knowledge Organization 20-24 June 1994 in Copenhagen, Denmark. Frankfurt, Germany: INDEKS Verlag.

 

Garfield, E. (1979). Citation indexing: Its theory and application in science, technology, and humanities. New York: John Wiley & Sons, Inc.

 

Hjørland, B. (1992). The concept of "subject" in Information Science. Journal of Documentation, 48(2), 172-200.

 

Hjørland, B. (1997). Information Seeking and Subject Representation. An Activity-theoretical approach to Information Science. Westport & London: Greenwood Press.

 

Hjørland, B. (2002). Epistemology and the Socio-Cognitive Perspective in Information Science. Journal of the American Society for Information Science and Technology, 53(4), 257-270.

 

Hjørland, B. & Kyllesbech Nielsen, L. (2001). Subject Access Points in Electronic Retrieval. Annual Review of Information Science and Technology, 35, 3-51.

 

Hutchins, W. J. (1977). On the problem of “aboutness” in document analysis. Journal of Informatics, 1, 17-35.

 

Hutchins, W. J. (1978). The concept of “aboutness” in subject indexing. Aslib Proceedings, 30, 172-181.

 

Jones, K. P. (1976). Towards a Theory of Indexing. Journal of Documentation, 32(2), 118-123.

 

Kaiser, J. O. (1911). Systematic Indexing. London: Pitman.

 

Klumpner, G. H. (1992). A guide to the language of psychoanalysis: An empirical study of the relationships among psychoanalytic terms and concepts. Madison, CT: International Universities Press.

 

Langridge, D. W. (1989). Subject analysis: principles and procedures. London: Bowker-Saur.

 

Letsche, T. & Berry, M. (1997). Large-scale Information Retrieval with Latent Semantic Indexing. Information Sciences, 100(1), 105-137.

 

Lussky, J. P. (2004). Bibliometric patterns in an historical medical index: using the newly digitized Index Catalogue of the Library of the Surgeon General's Office, United States Army. Thesis, Drexel University. Available (full text): http://dspace.library.drexel.edu/retrieve/3815/Lussky_Joan.pdf

 

Mai, J.-E. (2001). Semiotics and Indexing: An Analysis of the Subject Indexing Process. Journal of Documentation, 57(5), 591-622.

 

Mai, J.-E. (2005). Analysis in indexing: Document and domain centered approaches. Information Processing and Management, 41(3), 599-611. Available at: http://www.ischool.washington.edu/mai/Papers/2005_AnalysisInIndexing.pdf (Visited 2005-05-29).

 

Maron, M. E. (1977). On indexing, retrieval and the meaning of about. Journal of the American Society for Information Science, 28, 38-43.

 

Nicolaisen, J. (2003). The social act of citing: towards new horizons in citation theory. Proceedings of the 66th ASIST Annual Meeting, pp. 12-20.

 

O'Connor, J. (1967). Relevance disagreements and unclear request forms. American Documentation, 18(3), 165-177.


O'Connor, J. (1969). Some independent agreements and resolved disagreements about answer- providing documents. American Documentation, 20(4), 311-319.

 

Pejtersen, A. M. (1979). The meaning of aboutness in fiction indexing and retrieval. Aslib Proceedings, 31, 251-257.

 

Pejtersen, A.M. (1980). Design of a classification scheme for fiction based on an analysis of actual user-librarian communication, and use of the scheme for control of librarians' search strategies. In O. Harbo & L. Kajber (Ed.), Theory and application of information research (pp. 146-159). London: Mansell.

 

Pejtersen, A. M. (1994). A framework for indexing and representation of information based on work domain analysis: A fiction classification example. In: Knowledge organization and quality management. Proceedings. 3. International ISKO conference, Copenhagen (DK), 20-24 Jun 1994. Albrechtsen, H.; Ørnager, S. (eds.), (Indeks Verlag, Frankfurt/Main, 1994) (Advances in Knowledge Organization, 4) p. 251-263.

 

Pejtersen, A.M., & Austin, J. (1983). Fiction retrieval: experimental design and evaluation of a search system based on users' value criteria. Part 1. Journal of Documentation, 39(4), 230-246.

 

Pejtersen, A.M., & Austin, J. (1984). Fiction retrieval: experimental design and evaluation of a search system based on users' value criteria. Part 2. Journal of Documentation, 40(1), 25-35.

 

Quinn, B. (1994). Recent theoretical approaches in classification and indexing. Knowledge Organization, 21(3), 140-147. Abstract: a selective review of recent studies in classification and indexing theory. A number of important problems are discussed, including subjectivity versus objectivity, theories of indexing, the theoretical role of automation, and theoretical approaches to a universal classification scheme.

 

Rafferty, P. & Hidderley, R. (2005). Indexing Multimedia and Creative Works: The Problems of Meaning and Interpretation. Aldershot, UK: Ashgate.


Rowley, J. E. & Farrow, J. (2000). Organizing Knowledge: An Introduction to Managing Access to Information. 3rd ed. Aldershot: Gower Publishing Company.
 

Salton, G. (Ed.). (1971). The SMART Retrieval System-Experiments in Automatic Document Retrieval. Englewood Cliffs, NJ: Prentice Hall, Inc.

 

Salton, G. & McGill, M. J. (1983). Introduction to modern information retrieval. New York: McGraw-Hill.

 

Small, H. G. (1978). Cited documents as concept symbols. Social studies of science, 8, 327-340.

 

Soergel, D. (1985). Organizing information: Principles of data base and retrieval systems. London: Academic Press.

 

Swift, D. F., Winn, V., & Bramer, D. (1977). A multi-modal approach to indexing and classification. International Classification, 4(2), 90-94.

 

Swift, D. F., Winn, V., & Bramer, D. (1978). “Aboutness” as a strategy for retrieval in the social sciences. Aslib Proceedings, 30, 182-187.

 

Warner, J. (2002). Forms of labour in information systems. Information Research, 7(4). Available at: http://informationr.net/ir/7-4/paper135.html

 

Weinberg, B. H. (1988). Why indexing fails the researcher. The Indexer, 16(1), 3-6. Available at: http://people.unt.edu/~skh0001/wein1.htm

 

Wilson, P. (1968). Two kinds of power: An essay on bibliographical control. Berkeley: University of California Press.

 

Ørom, A. (2003). Knowledge Organization in the domain of Art Studies - History, Transition and Conceptual Changes. Knowledge Organization, 30(3/4), 128-143.

 

 

See also: Check-list approach to indexing; Indexing; Indexing, qualitative studies of;  Request oriented indexing; Subject analysis; User and User Studies in KO

 

 

 

 

 

Birger Hjørland

Last edited: 13-08-2010
