Indexing, Qualitative studies of

If indexing theory is an important part of knowledge organization (KO), then it is important to consider good and bad indexing, indexing from different theoretical positions and other qualitative issues. Unfortunately, the literature on this issue is very limited in spite of the enormous amount of technical literature on KO. 








Andersen, J. (2004). Analyzing the role of knowledge organization in scholarly communication: An inquiry into the intellectual foundation of knowledge organization. PhD dissertation. Copenhagen: Department of Information Studies, Royal School of Library and Information Science, 2004. Available:  (Visited May 10, 2004).


Barber, J.; Moffat, S.; Wood, F. & Bawden, D. (1988). Case studies of the indexing and retrieval of pharmacology papers. Information Processing & Management, 24(2), 141-150.

See also Errors in Knowledge Organization





Birger Hjørland

Last edited: 20-09-2006







Tinne Vammen. Rent og urent. Hovedstadens piger og fruer 1880-1920. Kbh. Gyldendal, 1986.


The author is a Danish (female) historian. The title translates as: Pure and impure. The maids and mistresses of the capital [Copenhagen], 1880-1920.


The author has written a book about the house in Copenhagen where she herself is living. It goes back to around the turn of the century. It focuses on the social-psychological relations between the women in the house: the mistresses and their servant girls.


The book covers a lot of different topics. It is about a very specific geographic place and a very specific historical period. It is about the relationship between two different social classes (bourgeois mistresses and working class girls). It represents an attempt to develop "mental history" (history about mentality). We have a lot of information about the girls, where they came from, what the archives tell us about their relationship with the police, with their union, and much more.


"Pure and impure" is used symbolically. It means that the girls had to clean the laundry. It also has a certain sexual symbolism: the girls typically came to Copenhagen from the countryside. Some of them ended up in prostitution.


What are the subjects of this work?


This book has distinctly different user groups. It is a rather obvious acquisition for a public library in Copenhagen as a contribution to local history. (It could also serve as an example of local history for the specific places in the countryside from which the girls typically emigrated.) So, one subject is:


    1) Local history, social aspects. Copenhagen, 1880-1920.


The book is a typical product of the movement "Women's studies", the attempt to illuminate women's role in society. Therefore, the book is clearly an example of:


    2) Women's history. Copenhagen, 1880-1920


It is also an example of:


    3) Social history. Working class women. Copenhagen 1880-1920


It could serve sociologists and other social scientists interested in:


    4) Class-relationships. Copenhagen 1880-1920


It also represents an attempt to go beyond "material" history into the mentality of people. Therefore:


    5) Psychohistory / The history of mentality.


    6) Family studies, sex and prostitution, historical material. Copenhagen 1880-1920


are also the subjects of this book.


A library serving the relevant police or union could use this book to inform its readers about the relationship with the people.


    The subject is thus also:


    7) Police-population relationships. Prostitutes. Copenhagen. 1880-1920.


    8) Union for housemaids, early history, aspects of.


The final subject to be mentioned here applies to the symbolism of the title. The author or publisher might have tried to sell copies of the book because of its sexual symbolism. In this case, the implied subject could be related to


     9) Pornography (As such it would probably be a disappointment to most readers).


    10) Books of general interest, easy readers



What then should be chosen as the subject of the book?


Well, the answer is that different epistemic communities have different needs, and that an information service should analyze the subject according to the potential questions of its primary user group. A public library in Copenhagen or an information service for the unions should analyze the book according to its specific "epistemic interests", as indicated above.


A general library like The Royal Library or the Library of Congress should not mention all these specific subjects, but should estimate the book's most probable long-term utility. If the book represents an important methodological breakthrough in psychohistory, or brings information about women's history that could serve as an important source for further study, it should be classified as psychohistory or women's history, respectively. Otherwise, it should only be placed as local history, Copenhagen, 1880-1920. (This last proposal would be my suggestion.)


If many libraries' different subject descriptions of this book were merged in one database (a union catalog), the book would be visible from many different epistemic interests. This would be an ideal situation.
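
The merging idea can be sketched in a few lines of Python. This is an illustration only: the library names, the book identifier and the record layout are hypothetical, while the subject strings are taken from the numbered list above.

```python
from collections import defaultdict

# Hypothetical local catalog records: (library, book id, subject descriptions).
# The library names and the book id are invented for illustration; the
# subject strings come from the numbered list in the text.
local_records = [
    ("Copenhagen public library", "vammen-1986",
     ["Local history, social aspects. Copenhagen, 1880-1920"]),
    ("Women's studies library", "vammen-1986",
     ["Women's history. Copenhagen, 1880-1920"]),
    ("Union library", "vammen-1986",
     ["Union for housemaids, early history, aspects of"]),
]

def merge_into_union_catalog(records):
    """Collect every library's subject descriptions under one entry per book."""
    union = defaultdict(dict)
    for library, book_id, subjects in records:
        for subject in subjects:
            # Remember which libraries contributed each description.
            union[book_id].setdefault(subject, []).append(library)
    return {book_id: dict(subjects) for book_id, subjects in union.items()}

union_catalog = merge_into_union_catalog(local_records)
for subject, libraries in union_catalog["vammen-1986"].items():
    print(f"{subject}  <- {', '.join(libraries)}")
```

In the merged entry the book is reachable through each community's description, although no single library assigned all of them.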




Let us consider the book by Robert A. Wicklund with the title "Zero-Variable Theories and the Psychology of the Explainer" (Wicklund, 1990).


According to the title of the book, it is about a certain kind of theory ("zero-variable theories") and about "the psychology of the explainer"; the latter subject is related to "the psychology of science".


If you read the book, you will see that "zero-variable theories" are not favorably evaluated. They are described as a kind of simplistic theory, and the book tries to explain why this kind of theory is produced so frequently in modern psychology. Why do so many psychologists (or why do so many explainers in general) tend to use this kind of simplistic theory instead of more varied theories?


In the preface to the book in question, the following appears: "The reader should not suppose that this is a book about the philosophy of social science, or about moral pronouncement on what is good or bad in ancient and current psychological theorizing. Instead, the reader is invited to consider the psychological side of the explainer."


The Library of Congress (LC) has made an analysis. In its "Cataloging-in-Publication Data", LC gives the following subject terms: "1. Psychology - Philosophy. 2. Psychologists-Psychology. 3. Explanation".


This means that LC, in its first selection of subject terms, is disposed not to follow Wicklund's statement in the preface, while the two following subject terms can be said to be in accordance with the self-understanding of the book. This applies especially to the last subject term.


My own subject analysis is the following:

The book is important because it deals with a neglected issue in psychological research, or psychology as a science: the apparent decay in the theoretical level of psychology. This condition is illustrated by a number of concrete analyses of psychological theories which in the succeeding psychological research have become substantially reduced. One such example is the almost classical theory of personality by H. A. Murray from 1938.


In my opinion, the most essential thing about Wicklund's book is the concrete documentation of the apparent decline in psychological theory. There are many books about the philosophy and methodology of psychology, giving direction to the science of psychology, but there are relatively few books documenting the apparent decline in theory. It seems as if psychology does not exploit the best of its own theory and knowledge from philosophy and other sciences. How can this be explained?


Wicklund sees the documentation of the theoretical decline as something less important in his book. His main interest is to use this material to give an explanation not only of the condition of psychology, but of the psychology of explainers in general. The material which I consider to have the most potential value is, for the author of the book, only a minor thing.


This means there is a marked difference between the author's (Wicklund's) and my own judgment about what the potential value of the book is, what its epistemological potential is, and therefore what its subject is. This book has, as any book has, an unlimited number of properties. To analyze a book's subject is to choose the properties which have the greatest potential for human knowledge. Therefore my subject analysis is different from that of the author, as indicated by the title and the quoted sentences from the preface.


The reason that Wicklund's and my own analysis of the central subject of the book differ so much lies in my professional evaluation of Wicklund's explanation, which I will characterize as being too individualistic: Wicklund seeks explanation of the decline in psychological theory in psychological mechanisms in the persons producing those theories.


Certainly Wicklund, in connection with his explanation, writes about interesting and relevant psychological phenomena (such as rumors and competition) which should be a part of the pattern of explanation, but in my opinion, a broader cultural and social description is needed as a background for the understanding of these mechanisms.


In my opinion, the documented examples of decline in psychological theory can in part be traced to the market for psychological books (and the market for psychologists!). For a long period after World War II, the market for psychological books was a "seller's market", and it was all too easy to sell even very poorly written psychology books (and to do poor research). This phenomenon is described in an article by Jürgen Kagelmann, psychological consultant for "Psychologie Verlags Union", München, in the magazine "Psychologie Heute", October 1988. Kagelmann's main point is that the far too easy sales possibilities in the 1970s led to an overwhelming production of psychological books of very doubtful quality. All that could be printed between two covers was thrown on the market, and the market was insatiable. This is an example of a non-individualistic explanation, which in my opinion comes closer to the truth than Wicklund's explanation, even if it is not a full explanation.


Therefore, in my opinion, Wicklund has a tendency to individualize and psychologize a social problem, and his book in a way contains a contradiction. In this book Wicklund also acts in the role of "explainer", and he too has a tendency toward a very simplistic, positivistic theory, which the book is actually meant to fight against.


The epistemological potential of Wicklund's book lies, in my opinion, especially in its documentation of certain conditions in psychological science which it is important to set right. Therefore the subject of the book is the epistemology of psychology, methodology, theory of science and philosophy. In my opinion, LC was right in its first selection of subject terms (Psychology - Philosophy), which, as mentioned, was in contradiction to Wicklund's statement in the preface.


I would not consider "zero-variable theories" the subject of the book. It is hardly a concept with a future, not even as an explanation of the decline in theorizing. It is an open question whether what has been called "variable psychology" (Holzkamp, 1983, p. 522) is a valuable concept.


As regards the proposed subject "psychology of the explainer", it is for me a theoretical question whether it is meaningful to search for such a theory and, even supposing it is, whether Wicklund's approach is a contribution to such a theory. This should be evaluated in relation to research going on in "decision theory", in philosophical theories about "explanation" and in other fields, and that is not what Wicklund's book is about. My conclusion is that I tend to doubt the value of the proposed subject "psychology of the explainer". This doubt also includes LC's subject term "Explanation". Wicklund's book is hardly a contribution to the concept of explanation in general.


The last proposed subject which I want to discuss is "psychology of psychologists" (LC: "Psychologists-Psychology"). Such a subject does exist, and books are written about it. They can describe, e.g., the recruitment of psychologists, the motivation for choosing the profession, biographical matters, professional socialization and many other things. Wicklund's book is in my opinion not of this kind.


In my judgment, as already stated, the subject of Wicklund's book is "Philosophy and epistemology of psychology". My judgment is of course subjective, and could be wrong, in general or in part. The only way to decide this is to analyze the arguments. The arguments about the subject of a book are fundamentally the same as arguments about the advancement of knowledge.










Hjørland, B. (1988). Information retrieval in psychology. Behavioral and Social Sciences Librarian, 6(3/4), 39-64.


Hjørland, B. (1997). Information seeking and subject representation: An activity-theoretical approach to information science. Westport & London: Greenwood Press.


Vammen, T. (1986). Rent og urent: Hovedstadens piger og fruer 1880-1920. København: Gyldendal.


Welwert, C. (1984). Läsa eller lyssna? Redovisning av jämförande undersökningar gjorda åren 1890-1980 rörande inläring vid auditiv och visuell presentation samt ett försök till utvärdering av resultaten. Malmö: CWK Gleerup.


Wicklund, R. A. (1990). Zero-variable theories and the psychology of the explainer. Berlin: Springer.



Birger Hjørland

Last edited: 20-09-2006




Let me jump into this with a specific example from PubMed that I focused on in an unpublished experiment many years ago:

PMID- 2221937
DA  - 19901115
DCOM- 19901115
LR  - 20051116
PUBM- Print
IS  - 0003-987X (Print)
VI  - 126
IP  - 10
DP  - 1990 Oct
TI  - Transfusion-associated graft-vs-host disease in patients with
      malignancies. Report of two cases and review of the literature.
PG  - 1324-9
AB  - Graft-vs-host disease can develop in immunosuppressed individuals who
      receive blood-product transfusions that contain immunocompetent
      lymphocytes. We report two cases of fatal transfusion-associated
      graft-vs-host disease that developed in patients with Hodgkin's disease
      who were undergoing therapy. We review all cases of this entity in
      patients with malignancies, represented predominantly by patients with
      hematologic malignancies. The groups at risk for development of
      transfusion-associated graft-vs-host disease, the clinical presentation
      and course, and methods of diagnosis are summarized. Prevention of this
      highly fatal condition is possible by irradiation of blood products given
      to patients at risk, but problems remain in determining the groups that
      warrant such measures. Dermatologists need to have heightened awareness of
      this entity to facilitate more complete diagnosis and allow establishment
      of effective standards of care.
AD  - Department of Dermatology, Harvard Medical School, Boston, Mass.
FAU - Decoste, S D
AU  - Decoste SD
FAU - Boudreaux, C
AU  - Boudreaux C
FAU - Dover, J S
AU  - Dover JS
LA  - eng
PT  - Case Reports
PT  - Journal Article
PT  - Review
TA  - Arch Dermatol
JT  - Archives of dermatology.
JID - 0372433
SB  - IM
CIN - Arch Dermatol. 1990 Oct;126(10):1347-50. PMID: 2221941
MH  - Adolescent
MH  - Adult
MH  - Blood Transfusion/*adverse effects
MH  - Female
MH  - Graft vs Host Disease/*etiology/pathology
MH  - Hodgkin Disease/*immunology
MH  - Humans
MH  - Immune Tolerance
MH  - Male
MH  - Skin Diseases/etiology/pathology
RF  - 50
EDAT- 1990/10/01
MHDA- 1990/10/01 00:01
PST - ppublish
SO  - Arch Dermatol. 1990 Oct;126(10):1324-9.

My technique was to index this article ridiculously exhaustively and give this indexing to indexers and searchers and have them cross out the terms they thought shouldn't apply.  (Naturally, indexers crossed out many more terms than searchers.)

Anyway, let's talk about the neoplasm concept.  At that time MeSH had:
Hematologic Diseases
Hodgkin's Disease

Using today's MeSH, the terms would be:
Hematologic Neoplasms
Hodgkin Disease

Let's talk in terms of today's MeSH.

Note that the title says "patients with malignancies".
Note that the abstract has "patients with Hodgkin's disease", because the two cases had this disease, but it also has "patients with hematologic malignancies", as these are predominantly the patients covered by the review.

So is this article about neoplasms, hematologic malignancies, or Hodgkin disease?
If you apply the specificity principle, the original indexer was correct in that the data pertained only to two cases with HD.
However, does this represent the gist of the article?

This illustrates that the retrieval system also has something to do with choosing the correct level of specificity.  The fact that PubMed retrieval does automatic explosion means that if a searcher enters the search term:


this citation will be retrieved because the search automatically explodes the term (searches the union of the term and its indentions), and Hodgkin Disease is in the Neoplasms tree.

Thus, indexing with the most specific term is a good thing because searching the broader term Neoplasms will retrieve this citation as well, and you don't need to cover multiple levels of specificity because of this.

However, if the retrieval system doesn't do this, then searching the broader Neoplasms would miss this citation.

Hematologic Neoplasms is another story, however, because Hodgkin Disease is not in the Hematologic Neoplasms hierarchy.  Thus searching Hematologic Neoplasms will not retrieve this citation.  I would say that this citation is definitely relevant for Hematologic Neoplasms, but that indexing term is not there.
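
As an illustration only, the effect of automatic explosion can be sketched in Python. The hierarchy below is a toy fragment, not the real MeSH tree: intermediate levels are omitted, and it models only the relations stated above (Hodgkin Disease lies in the Neoplasms tree but not under Hematologic Neoplasms). The function names are hypothetical and do not describe how PubMed is implemented.

```python
# Toy fragment of a thesaurus hierarchy (parent -> children). It is NOT the
# real MeSH tree: intermediate levels are omitted, and only the relations
# stated in the text are modelled.
CHILDREN = {
    "Neoplasms": ["Hematologic Neoplasms", "Hodgkin Disease"],
    "Hematologic Neoplasms": [],
    "Hodgkin Disease": [],
}

# One citation, indexed only with the most specific term.
INDEX = {"2221937": {"Hodgkin Disease"}}

def explode(term):
    """Return the term together with all of its descendants."""
    terms = {term}
    for child in CHILDREN.get(term, []):
        terms |= explode(child)
    return terms

def search(term, auto_explode=True):
    """Return the IDs of citations indexed with the term (or a descendant)."""
    wanted = explode(term) if auto_explode else {term}
    return sorted(pmid for pmid, terms in INDEX.items() if terms & wanted)

print(search("Neoplasms"))                      # found via explosion
print(search("Neoplasms", auto_explode=False))  # missed without explosion
print(search("Hematologic Neoplasms"))          # missed: not in that subtree
```

The three calls correspond to the three situations discussed above: the broader term with explosion, the broader term without it, and a broader term whose hierarchy does not contain the indexed term.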

In my experiment with 7 indexers, none crossed out Hodgkin's Disease, 5 did not cross out Neoplasms, and 2 did not cross out Hematologic Diseases (remember Hematologic Neoplasms was not a term then).  Actually, because the indexing of hematologic malignancies should be Neoplasms + Hematologic Diseases, it seems fair to say that two of the non-crossings-out of Neoplasms go with the non-crossings-out of Hematologic Diseases.  So in summary,

7 used Hodgkin's Disease
2 used Hematologic Diseases (as coord with Neoplasms)
5 used Neoplasms (2 as coord with Hematologic Diseases)

Indexers #1 & 3 used Hodgkin's Disease
Indexers #2 & 7 used Hodgkin's Disease + Hematologic Diseases + Neoplasms
Indexers #4-6 used Hodgkin's Disease + Neoplasms

but interpreting through the current MeSH:

7 used Hodgkin Disease
2 used Hematologic Neoplasms
3 used Neoplasms

Specifically:
Indexers #1 & 3 used Hodgkin Disease
Indexers #2 & 7 used Hodgkin Disease + Hematologic Neoplasms
Indexers #4-6 used Hodgkin Disease + Neoplasms

So five of seven indexers assigned (i.e., did not cross out) not only Hodgkin's Disease but also at least one of the broader terms despite the specificity rule of indexing.  Two went one level broader to the hematologic malignancies, and three went two levels broader to just malignancies.
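
The tallies above can be reproduced from the per-indexer assignments. The following Python sketch is illustrative only: the data structure and the function name are assumptions, and the mapping rule simply encodes the statement that Hematologic Diseases coordinated with Neoplasms corresponds to today's Hematologic Neoplasms.

```python
from collections import Counter

# Terms each indexer left in place (did not cross out), in the MeSH of the time.
assignments = {
    1: {"Hodgkin's Disease"},
    2: {"Hodgkin's Disease", "Hematologic Diseases", "Neoplasms"},
    3: {"Hodgkin's Disease"},
    4: {"Hodgkin's Disease", "Neoplasms"},
    5: {"Hodgkin's Disease", "Neoplasms"},
    6: {"Hodgkin's Disease", "Neoplasms"},
    7: {"Hodgkin's Disease", "Hematologic Diseases", "Neoplasms"},
}

def to_current_mesh(terms):
    """Re-express one indexer's terms in today's vocabulary."""
    terms = set(terms)
    # The coordination Hematologic Diseases + Neoplasms is now a single term.
    if {"Hematologic Diseases", "Neoplasms"} <= terms:
        terms -= {"Hematologic Diseases", "Neoplasms"}
        terms.add("Hematologic Neoplasms")
    if "Hodgkin's Disease" in terms:
        terms.remove("Hodgkin's Disease")
        terms.add("Hodgkin Disease")
    return terms

old_counts = Counter(t for terms in assignments.values() for t in terms)
new_counts = Counter(
    t for terms in assignments.values() for t in to_current_mesh(terms)
)
# Indexers who kept at least one term broader than the most specific one.
went_broader = [i for i, terms in assignments.items() if len(terms) > 1]

print(dict(old_counts))
print(dict(new_counts))
print(went_broader)
```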

This work was done in 1991, and frankly I don't know whether the retrieval system at that time had automatic explode or not.  In any case, Hodgkin's Disease was not in the Hematologic Diseases hierarchy either, but at least the indexing of "hematologic malignancies" required BOTH Hematologic Diseases AND Neoplasms, so under automatic pre-explode, the Neoplasms part would retrieve this article.  Today it's worse because, as I said above, the Hematologic Neoplasms hierarchy does not include Hodgkin Disease.

Also, there was the matter of printed Index Medicus (which no longer exists).
According to the original indexing, this citation appeared in IM only under Hodgkin's Disease.  This means a person looking in the printed index under Hematologic Diseases or under Neoplasms would not find this article, and in my opinion this omission would be quite significant.  That is, in perusing Hematologic Diseases and Neoplasms in print, this article would be quite relevant, but it would not be there, and the print searcher would have to know that such an article was printed ONLY under some more specific Hematologic Diseases or Neoplasms term (of which there are quite a few).

So I guess what I am saying is that sometimes the gist of the article suggests multiple levels of specificity for indexing, and also the retrieval system might compensate when the specificity rule is strictly applied.

I personally would argue for using all three terms in this case.

I would say that this topic has to do with the indexing application more than with the thesaurus, in that all levels of specificity are represented in the thesaurus.  The issue is which levels the indexer selects.

Susanne Humphrey
Sigcr-l mailing list