Indexing, Qualitative studies of
If indexing theory is an important part of knowledge organization (KO), then it is important to consider good and bad indexing, indexing from different theoretical positions and other qualitative issues. Unfortunately, the literature on this issue is very limited in spite of the enormous amount of technical literature on KO.
Literature:
Andersen, J. (2004). Analyzing the role of knowledge organization in scholarly communication: An inquiry into the intellectual foundation of knowledge organization. PhD dissertation. Copenhagen: Department of Information Studies, Royal School of Library and Information Science, 2004. Available: http://www.db.dk/dbi/samling/phd/jackandersen-phd.pdf (Visited May 10, 2004).
Barber, J.; Moffat, S.; Wood, F. & Bawden, D. (1988). Case studies of the indexing and retrieval of pharmacology papers. Information Processing & Management, 24(2), 141-150.
See also Errors in Knowledge Organization
Birger Hjørland
Last edited: 20-09-2006
Appendix:
EXAMPLE 1
Tinne Vammen. Rent og urent. Hovedstadens piger og fruer 1880-1920. Kbh. Gyldendal, 1986.
The author is a Danish (female) historian. The title translates as: Pure and impure. The maids and mistresses of the capital [Copenhagen], 1880-1920.
The author has written a book about the house in Copenhagen where she herself is living. It goes back to around the turn of the century. It focuses on the social-psychological relations between the women in the house: the mistresses and their servant girls.
The book covers a lot of different topics. It is about a very specific geographic place and a very specific historical period. It is about the relationship between two different social classes (bourgeois mistresses and working class girls). It represents an attempt to develop "mental history" (history about mentality). We have a lot of information about the girls, where they came from, what the archives tell us about their relationship with the police, with their union, and much more.
"Pure and impure" is used symbolically. It means that the girls had to clean the laundry. It also has a certain sexual symbolism: the girls typically came to Copenhagen from the countryside. Some of them ended up in prostitution.
What are the subjects of this work?
This book has distinctively different user groups. It is a rather obvious acquisition for a public library in Copenhagen as a contribution to local history. (It could also serve as an example of local history in the specific place in the countryside, from which the girls typically emigrated). So, one subject is:
1) Local history, social aspects. Copenhagen, 1880-1920.
The book is a typical product of the movement "Women's studies", the attempt to illuminate women's role in society. Therefore, the book is clearly an example of:
2) Women's history. Copenhagen, 1880-1920
It is also an example of:
3) Social history. Working class women. Copenhagen 1880-1920
It could serve sociologists and other social scientists interested in:
4) Class-relationships. Copenhagen 1880-1920
It also represents an attempt to go beyond "material" history into the mentality of people. Therefore:
5) Psychohistory / The history of mentality.
6) Family studies, sex and prostitution, historical material. Copenhagen 1880-1920
are also the subjects of this book.
A library serving the relevant police or union could use this book to inform its readers about the relationship with the people.
The subject is thus also:
7) Police-population relationships. Prostitutes. Copenhagen. 1880-1920.
8) Union for housemaids, early history, aspects of.
The final subject to be mentioned here applies to the symbolism of the title. The author or publisher might have tried to sell copies of the book because of its sexual symbolism. In this case, the implied subject could be related to
9) Pornography (As such it would probably be a disappointment to most readers).
or
10) Books of general interest, easy readers
What then should be chosen as the subject of the book?
Well, the answer is that different epistemic communities have different needs: an information service should analyze the subject according to the potential questions of its primary user group. A public library in Copenhagen or an information service for the unions should analyze according to its specific "epistemic interests" as indicated above.
A general library like The Royal Library or the Library of Congress should not mention all these specific subjects, but should estimate the book's most probable long-term utility. If the book represents an important methodological breakthrough in psycho-history, or brings information about women's history which could serve as an important source for further study, it should be classified as psycho-history or women's history, respectively. Otherwise, it should only be placed as local history, Copenhagen, 1880-1920 (this last proposal would be my suggestion).
If many libraries' different subject descriptions of this book are merged in one database (a union catalog) this book would be visible from many different epistemic interests. This would be an ideal situation.
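The merging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the library names and the assignment of subjects to libraries are hypothetical, and the subject labels are shortened forms of the numbered subjects in this example.

```python
from collections import defaultdict

# Hypothetical subject descriptions given to the same book by different
# libraries; each library indexes according to its own users' interests.
LIBRARY_RECORDS = {
    "public_library_copenhagen": ["Local history", "Women's history"],
    "union_information_service": ["Union for housemaids"],
    "social_science_library": ["Class relationships", "Social history"],
}


def merge_subject_descriptions(records_by_library):
    """Merge per-library subject lists into one subject -> libraries index."""
    merged = defaultdict(set)
    for library, subjects in records_by_library.items():
        for subject in subjects:
            merged[subject].add(library)
    return dict(merged)


union_catalog_entry = merge_subject_descriptions(LIBRARY_RECORDS)
# Every subject assigned by any library is now an access point to the book.
access_points = sorted(union_catalog_entry)
```

In such a merged record the book can be found from any of the access points, even though no single library assigned all of them.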
EXAMPLE 2
Let us consider the book by Robert A. Wicklund with the title: "Zero-Variable Theories and the Psychology of the Explainer" (Wicklund, 1990).
According to the title of the book, it is about certain kinds of theories ("Zero-Variable Theories") and about "the psychology of the explainer". The latter subject is related to "the psychology of science".
If you read the book, you will see that "zero-variable theories" are not favorably evaluated; they are described as a kind of simplistic theory, and the book tries to explain why this kind of theory is produced so frequently in modern psychology. Why do so many psychologists (or why do so many explainers in general) tend to use this kind of simplistic theory instead of more varied theories?
In the preface to the book in question, the following appears: "The reader should not suppose that this is a book about the philosophy of social science, or about moral pronouncement on what is good or bad in ancient and current psychological theorizing. Instead, the reader is invited to consider the psychological side of the explainer."
The Library of Congress (LC) has made an analysis. In its "Cataloging-in-Publication Data", LC gives the following subject terms: "1. Psychology - Philosophy. 2. Psychologists-Psychology. 3. Explanation".
This means that LC, in its first subject term, chose not to follow Wicklund's statement in the preface, while the two following subject terms can be said to accord with the self-understanding of the book. This applies especially to the last subject term.
My own subject analysis is the following:
The book is important because it deals with a neglected issue in psychological research, or psychology as a science: the apparent decay in the theoretical level in psychology. This condition is illustrated by a number of concrete analyses of psychological theories which in succeeding psychological research have become substantially reduced. One such example is the almost classical theory of personality by H. A. Murray from 1938.
In my opinion, the most essential thing about Wicklund's book is in particular the concrete documentation of the apparent decline in psychological theory. There are many books about the philosophy and methodology of psychology, giving direction to the science of psychology, but there are relatively few books documenting the apparent decline in theory. It seems as if psychology does not exploit the best of its own theory and knowledge from philosophy and other sciences. How can this be explained?
Wicklund sees the documentation of the theoretical decline as something less important in his book. His main interest is to use this material to explain not only the condition of psychology, but the psychology of explainers in general. The material which I consider to have the most potential value is, for the author of the book, only a minor matter.
This means there is a marked difference between the author's (Wicklund's) judgment and my own about what the potential value of the book is, what its epistemological potential is, and therefore what its subject is. This book has, as any book has, an unlimited number of properties. To analyze a book's subject is to choose the properties which have the greatest potential for human knowledge. Therefore my subject analysis differs from that of the author as indicated by the title and the quoted sentences from the preface.
The reason that Wicklund's and my own analysis of the central subject of the book differ so much lies in my professional evaluation of Wicklund's explanation, which I will characterize as being too individualistic: Wicklund seeks explanation of the decline in psychological theory in psychological mechanisms in the persons producing those theories.
Certainly Wicklund, in connection with his explanation, writes about interesting and relevant psychological phenomena (such as rumors and competition) which should be a part of the pattern of explanation, but in my opinion, a broader cultural and social description is needed as a background for the understanding of these mechanisms.
In my opinion the documented examples of decline in psychological theory can in part be traced to the market for psychological books (and the market for psychologists!). For a long period after World War II, the market for psychological books was a "seller's market", and it was all too easy to sell even very poorly written psychology books (and to do poor research). This phenomenon is described in an article by Jürgen Kagelmann, psychological consultant for "Psychologie Verlags Union", München, in the magazine "Psychologie Heute", October 1988. Kagelmann's main point is that the far too easy sales possibilities in the 1970s led to an overwhelming production of psychological books of very doubtful quality. All that could be printed between two covers was thrown on the market, and the market was insatiable. This is an example of a non-individualistic explanation, which in my opinion comes closer to the truth than Wicklund's explanation, even if it is not a full explanation.
Therefore, in my opinion, Wicklund has a tendency to individualize and psychologize a social problem, and his book contains in a way a contradiction. Wicklund acts in this book also in the role of "explainer", and he too has a tendency toward a very simplistic, positivistic theory, which the book is actually meant to fight against.
The epistemological potential of Wicklund's book lies, in my opinion, especially in its documentation of certain conditions in psychological science which it is important to set right. Therefore the subject of the book is the epistemology of psychology, methodology, theory of science and philosophy. In my opinion, LC was right in its first selection of subject terms (Psychology - philosophy), which, as mentioned, was in contradiction to Wicklund's statement in the preface.
I would not consider "Zero-Variable theories" the subject of the book. It is hardly a concept with a future, not even as an explanation of the decline in theorizing. It is an open question, whether what has been called "variable psychology" (Holzkamp, 1983, p. 522), is a valuable concept.
As regards the proposed subject "psychology of the explainer" it is for me a theoretical question whether it is meaningful to search for such a theory and - even supposing it is - whether Wicklund's approach is a contribution to such a theory. This should be evaluated in relation to research going on in "decision theory", in philosophical theories about "explanation" and other fields, and that is not what Wicklund's book is about. My conclusion is that I tend to doubt the value of the proposed subject "psychology of the explainer". This doubt also includes LC's subject term "Explanation". Wicklund's book is hardly a contribution to the concept of explanation in general.
The last proposed subject which I want to discuss is "psychology of psychologists" (LC: "Psychologists-Psychology"). Such a subject does exist, and books are written about it. They can describe e.g. the recruitment of psychologists, the motivation for choosing the profession, biographical matters, professional socialization and many other things. Wicklund's book is in my opinion not of this kind.
In my judgment, as already stated, the subject of Wicklund's book is "Philosophy and epistemology of psychology". My judgment is of course subjective, and could be wrong, in general or in part. The only way to decide this is to analyze the arguments. The arguments about the subject of a book are fundamentally the same as arguments about the advancement of knowledge.
References
Hjørland, B. (1988). Information Retrieval in Psychology. Behavioral and Social Sciences Librarian, Vol 6 (3/4), 1988, 39‑64.
Hjørland, Birger (1997): Information Seeking and Subject Representation. An Activity-theoretical approach to Information Science. Westport & London: Greenwood Press.
Vammen, T. (1986). Rent og urent: Hovedstadens piger og fruer 1880-1920. København: Gyldendal.
Welwert, C. (1984). Läsa eller lyssna? Redovisning av jämförande undersökningar gjorda åren 1890-1980 rörande inläring vid auditiv och visuell presentation samt ett försök till utvärdering av resultaten. Malmö, CWK Gleerup.
Wicklund, R. A. (1990). Zero-variable theories and the psychology of the explainer. Berlin: Springer.
Birger Hjørland
Last edited: 20-09-2006
Let me jump into this with a specific example from PubMed
that I focused on in an unpublished experiment many years ago:
PMID- 2221937
OWN - NLM
STAT- MEDLINE
DA - 19901115
DCOM- 19901115
LR - 20051116
PUBM- Print
IS - 0003-987X (Print)
VI - 126
IP - 10
DP - 1990 Oct
TI - Transfusion-associated graft-vs-host disease in patients with
malignancies. Report of two cases and review of the literature.
PG - 1324-9
AB - Graft-vs-host disease can develop in immunosuppressed individuals who
receive blood-product transfusions that contain immunocompetent
lymphocytes. We report two cases of fatal transfusion-associated
graft-vs-host disease that developed in patients with Hodgkin's disease
who were undergoing therapy. We review all cases of this entity in
patients with malignancies, represented predominantly by patients with
hematologic malignancies. The groups at risk for development of
transfusion-associated graft-vs-host disease, the clinical presentation
and course, and methods of diagnosis are summarized. Prevention of this
highly fatal condition is possible by irradiation of blood products given
to patients at risk, but problems remain in determining the groups that
warrant such measures. Dermatologists need to have heightened awareness of
this entity to facilitate more complete diagnosis and allow establishment
of effective standards of care.
AD - Department of Dermatology, Harvard Medical School, Boston, Mass.
FAU - Decoste, S D
AU - Decoste SD
FAU - Boudreaux, C
AU - Boudreaux C
FAU - Dover, J S
AU - Dover JS
LA - eng
PT - Case Reports
PT - Journal Article
PT - Review
PL - UNITED STATES
TA - Arch Dermatol
JT - Archives of dermatology.
JID - 0372433
SB - AIM
SB - IM
CIN - Arch Dermatol. 1990 Oct;126(10):1347-50. PMID: 2221941
MH - Adolescent
MH - Adult
MH - Blood Transfusion/*adverse effects
MH - Female
MH - Graft vs Host Disease/*etiology/pathology
MH - Hodgkin Disease/*immunology
MH - Humans
MH - Immune Tolerance
MH - Male
MH - Skin Diseases/etiology/pathology
RF - 50
EDAT- 1990/10/01
MHDA- 1990/10/01 00:01
PST - ppublish
SO - Arch Dermatol. 1990 Oct;126(10):1324-9.
My technique was to index this article ridiculously exhaustively and give this
indexing to indexers and searchers and have them cross out the terms they
thought shouldn't apply. (Naturally, indexers crossed out many more terms than
searchers.)
Anyway, let's talk about the neoplasm concept. At that time MeSH had:
Neoplasms
Hematologic Diseases
Hodgkin's Disease
Using today's MeSH, the terms would be:
Neoplasms
Hematologic Neoplasms
Hodgkin Disease
Let's talk in terms of today's MeSH.
Note that the title says "patients with malignancies"
Note the abstract has:
patients with Hodgkin's disease
because the two cases had this disease
but it also has:
patients with hematologic malignancies as this represents predominantly the
patients.
So is this article about neoplasms, hematologic malignancies, or Hodgkin
disease?
If you apply the specificity principle, the original indexer was correct in that
the data pertained only to two cases with HD.
However, does this represent the gist of the article?
This illustrates that the retrieval system also has something to do with
choosing the correct level of specificity. The fact that PubMed retrieval does
automatic explosion means that if a searcher enters the search term:
Neoplasms
this citation will be retrieved because the search automatically explodes the
term (searches the union of the term and its indentions, that is, the narrower
terms indented under it), and Hodgkin Disease is in the Neoplasms tree.
Thus, indexing with the most specific term is a good thing because searching the
broader term Neoplasms will retrieve this citation as well, and you don't need
to cover multiple levels of specificity because of this.
However, if the retrieval system doesn't do this, then searching the broader
Neoplasms would miss this citation.
Hematologic Neoplasms is another story, however, because Hodgkin Disease is not
in the Hematologic Neoplasms hierarchy. Thus searching Hematologic Neoplasms
will not retrieve this citation. I would say that this citation is definitely
relevant for Hematologic Neoplasms, but that indexing term is not there.
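
The effect of automatic explosion can be sketched as follows. This is a toy
illustration, not PubMed's implementation: the hierarchy is a simplified
assumption that omits intermediate MeSH levels and only encodes what is stated
above (Hodgkin Disease is under Neoplasms but not under Hematologic Neoplasms),
and the citation carries only its neoplasm-concept heading.

```python
# Toy fragment of a MeSH-like hierarchy: parent -> direct children.
# Assumption: intermediate tree levels are left out for brevity.
TREE = {
    "Neoplasms": ["Hematologic Neoplasms", "Hodgkin Disease"],
    "Hematologic Neoplasms": [],
    "Hodgkin Disease": [],
}

# PMID -> assigned headings (neoplasm concept only, as originally indexed).
CITATIONS = {"2221937": {"Hodgkin Disease"}}


def explode(term, tree):
    """Return the term together with all terms indented beneath it."""
    result = {term}
    for child in tree.get(term, []):
        result |= explode(child, tree)
    return result


def search(term, citations, tree, auto_explode=True):
    """Return PMIDs indexed with the term (or, if exploding, its descendants)."""
    terms = explode(term, tree) if auto_explode else {term}
    return sorted(pmid for pmid, headings in citations.items() if headings & terms)


with_explode = search("Neoplasms", CITATIONS, TREE)
without_explode = search("Neoplasms", CITATIONS, TREE, auto_explode=False)
hematologic = search("Hematologic Neoplasms", CITATIONS, TREE)
```

Under these assumptions the Neoplasms search finds the citation only when
explosion is on, and the Hematologic Neoplasms search misses it either way.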
In my experiment with 7 indexers, none crossed out Hodgkin's Disease, 5 did not
cross out Neoplasms, and 2 did not cross out Hematologic Diseases (remember
Hematologic Neoplasms was not a term then). Because hematologic malignancies
had to be indexed as Neoplasms + Hematologic Diseases, it seems fair to say
that two of the five who kept Neoplasms kept it in coordination with
Hematologic Diseases. So in summary,
7 used Hodgkin Disease
2 used Hematologic Diseases (as coord with Neoplasms)
5 used Neoplasms (2 as coord with Hematologic Diseases)
Indexers #1 & 3 used Hodgkin's Disease
Indexers #2 & 7 used Hodgkin's Disease + Hematologic Diseases + Neoplasms
indexers #4-6 used Hodgkin's Disease + Neoplasms
but interpreting through the current MeSH:
7 used Hodgkin Disease
2 used Hematologic Neoplasms
3 used Neoplasms
Specifically:
Indexers #1 & 3 used Hodgkin Disease
Indexers #2 & 7 used Hodgkin Disease + Hematologic Neoplasms
Indexers #4-6 used Hodgkin Disease + Neoplasms
So five of seven indexers assigned (i.e., did not cross out) not only Hodgkin's
Disease but also at least one of the broader terms despite the specificity rule
of indexing. Two went one level broader to the hematologic malignancies, and
three went two levels broader to just malignancies.
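
The tallies above can be re-derived from the per-indexer listing. The sketch
below is only a bookkeeping aid: the data are the seven term sets given in this
post, and the function name and the mapping rule (Hematologic Diseases
coordinated with Neoplasms is read as today's Hematologic Neoplasms) are my own
way of encoding the reinterpretation described above.

```python
from collections import Counter

HD, HEM, NEO = "Hodgkin's Disease", "Hematologic Diseases", "Neoplasms"

# Terms each indexer did NOT cross out (MeSH as of the experiment).
KEPT = {
    1: {HD}, 3: {HD},
    2: {HD, HEM, NEO}, 7: {HD, HEM, NEO},
    4: {HD, NEO}, 5: {HD, NEO}, 6: {HD, NEO},
}


def to_current_mesh(terms):
    """Reinterpret one indexer's term set in today's MeSH."""
    current = set()
    if HD in terms:
        current.add("Hodgkin Disease")
    if {HEM, NEO} <= terms:
        # The old coordination expressed "hematologic malignancies".
        current.add("Hematologic Neoplasms")
    elif NEO in terms:
        current.add("Neoplasms")
    return current


counts_then = Counter(t for terms in KEPT.values() for t in terms)
counts_now = Counter(t for terms in KEPT.values() for t in to_current_mesh(terms))
# Indexers who kept at least one broader term besides Hodgkin's Disease.
went_broader = sum(1 for terms in KEPT.values() if len(terms) > 1)
```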
This work was done in 1991, and frankly I don't know the state of the retrieval
system at that time as to whether there was automatic explode or not. But in
any case Hodgkin's Disease was not in the Hematologic Diseases hierarchy either,
but at least the indexing of "hematologic malignancies" required BOTH
Hematologic Diseases AND Neoplasms, so under automatic pre-explode, the
Neoplasms part would retrieve this article. Today, it's worse because as I said
above Hematologic Neoplasms hierarchy does not include Hodgkin Disease.
Also, there was the matter of printed Index Medicus (which no longer exists).
According to the original indexing, this citation appeared in IM only under
Hodgkin's Disease. This means a person looking in the printed index under
Hematologic Diseases or under Neoplasms would not find this article, and in my
opinion this omission would be quite significant. That is, in perusing
Hematologic Diseases and Neoplasms in print, this article would be quite
relevant, but it would not be there, and the print searcher would have to know
that such an article was printed ONLY under some more specific Hematologic
Diseases or Neoplasms term (of which there are quite a few).
So I guess what I am saying is that sometimes the gist of the article suggests
multiple levels of specificity for indexing, and also the retrieval system might
compensate when the specificity rule is strictly applied.
I personally would argue for using all three terms in this case.
I would say that this topic has to do with the indexing application more than
with the thesaurus, in that all levels of specificity are represented in the
thesaurus. The issue is which levels the indexer selects.
Susanne Humphrey
humphrey@nlm.nih.gov