Machine-aided indexing

Machine-aided indexing (MAI)

Machine-aided indexing may be considered a semi-automatic form of indexing, as opposed to fully automatic indexing: the system suggests index terms, and a human indexer reviews them before they are assigned.

Golub (2005) writes that machine-aided indexing (MAI) is an approach that has been used to suggest controlled vocabulary terms to be assigned to a document:

"Document classification is a library science approach. The tradition of automating the process of subject determination of a document and assigning it to a term from a controlled vocabulary partly has its roots in machine-aided indexing (MAI). MAI has been used to suggest controlled vocabulary terms to be assigned to a document.
The automated part of this approach differs from the previous two [text categorization and clustering] in that it is generally not based on either supervised or unsupervised learning. Neither do documents and classes get represented by vectors. In document classification, the algorithm typically compares extracted terms from the text to be classified, to mapped terms from the controlled vocabulary (string-to-string matching). At the same time, this approach does share similarities with text categorization and document clustering: the pre-processing of documents to be classified includes stop-words removal; stemming can be conducted; words or phrases from the text of documents to be classified are extracted and weights are assigned to them based on different heuristics; Web-page characteristics have been explored, although to a lesser degree." (Golub, 2005, p. 29)
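
The matching step described in the quotation can be illustrated with a minimal Python sketch. It is not Golub's implementation or that of any particular MAI system. The stop-word list, the term map, the crude suffix stripper (a stand-in for a real stemmer), and the heuristic of counting a title match twice are all assumptions made for illustration, as are the function names.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "is", "are", "on"}

# Hypothetical mapping from entry terms (as they may occur in text)
# to authorized terms of a controlled vocabulary.
TERM_MAP = {
    "automatic indexing": "Automatic indexing",
    "subject indexing": "Subject indexing",
    "thesaurus": "Thesauri",
    "controlled vocabulary": "Controlled vocabularies",
}


def stem(word):
    """Crude plural stripper standing in for a real stemming algorithm."""
    for suffix, replacement in (("ies", "y"), ("s", "")):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


def normalize(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def count_phrase(tokens, phrase):
    """Count exact occurrences of the token sequence `phrase` in `tokens`."""
    n = len(phrase)
    if n == 0:
        return 0
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)


def suggest_terms(text, title="", term_map=TERM_MAP):
    """Return (authorized term, weight) pairs, highest weight first.

    Assumed heuristic: each match in the body counts 1, each match
    in the title counts 2.
    """
    body = normalize(text)
    head = normalize(title)
    scores = Counter()
    for entry, authorized in term_map.items():
        phrase = normalize(entry)
        weight = count_phrase(body, phrase) + 2 * count_phrase(head, phrase)
        if weight:
            scores[authorized] += weight
    return scores.most_common()


if __name__ == "__main__":
    suggestions = suggest_terms(
        "Controlled vocabularies support subject indexing. "
        "A thesaurus is one controlled vocabulary.",
        title="Automatic indexing of web pages",
    )
    for term, weight in suggestions:
        print(term, weight)
```

In a machine-aided setting the ranked list would be shown to an indexer as candidate terms rather than assigned directly. Because the same normalization is applied to the document and to the vocabulary entries, "vocabularies" in the text matches the entry "controlled vocabulary" without any learned model or vector representation.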

Literature:

Golub, K. (2005). Automated subject classification of textual web pages, for browsing. Lund: Lund University, Department of Information Technology. Available: http://www.it.lth.se/koraljka/Lund/publ/LicE.pdf

Jacquemin, C.; Daille, B.; Royauté, J. & Polanco, X. (2002). In vitro evaluation of a program for machine-aided indexing. Information Processing & Management, 38(6), 765-792.
Abstract: This article presents the human evaluation of ILIAD, a program for machine-aided indexing (MAI). It consists of two language engineering modules and is designed to assist expert librarians in computer-aided indexing and document analysis. Our aim is the expert evaluation of automatic multi-word term indexing. Evaluation is performed by documentary engineers. Cataloging and indexing are their principal tasks. They also have a good scientific knowledge of the domain to which the indexed documents belong. We first present the ILIAD program and the two systems submitted to this evaluation, the methodology (protocol) adopted, the differences between the protocol and the implementation, and the results of these evaluations. Human evaluation is divided into three parts: firstly the evaluation of controlled indexing, then free indexing and finally term variant extraction performed during controlled indexing. Finally, we analyze the relevance of this evaluation by calculating the agreement frequency and the Kappa coefficient and propose some future developments.

Klingbiel, P. H. (1973). Machine-aided indexing of technical literature. Information Storage & Retrieval, 9(2), 79-84.

Klingbiel, P. H. (1973). Technique for machine-aided indexing. Information Storage & Retrieval, 9(9), 477-494.

Klingbiel, P. H. & Rinker, C. C. (1976). Evaluation of machine-aided indexing. Information Processing & Management, 12(6), 351-366.

Lucey, J. (1993). Machine aided indexing. Journal of the American Society for Information Science, 44(7), 430. (Letter).

Milstead, J. L. (1992). Methodologies for subject analysis in bibliographical databases. Information Processing & Management, 28(3), 407-431.
Abstract: Techniques and methodologies for subject analysis have changed in recent years, and current research indicates that the changes may be accelerating. The review reported in this paper was undertaken to aid managers of databases in determining if new and little-known capabilities would improve the cost-effectiveness of subject analysis operations. Sophisticated computer aids to routine procedures in subject analysis seem likely to be valuable, although issues of capital investment might limit their application in a given situation. Operational machine-aided and automatic indexing systems were found to form a continuum. The same system can be used for automatic indexing (without human review of individual documents) and machine-aided indexing (with human review) for different applications. Commercial automatic indexing packages were also reviewed. The overall conclusion was that database producers should begin working seriously on upgrading their thesauri and codifying their indexing policies as a means of moving toward development of machine aids to indexing, but that fully automatic indexing is not yet ready for wholesale implementation. The primary obstacle to development of automatic indexing is the lack of machine "understanding" of natural language. Research in artificial intelligence and knowledge bases is attacking this problem, but there is still much work to be done. Recommendations for action include: increasing the power of the indexer interface; studying indexing policies; enrichment of thesauri; taking steps that will contribute to later development of knowledge bases; considering development of machine-aided indexing; and applying the findings of natural language processing research.

Silvester, J. P.; Genuardi, M. T. & Klingbiel, P. H. (1994). Machine-aided indexing at NASA. Information Processing & Management, 30(5), 631-645.
Abstract: This report describes the NASA Lexical Dictionary (NLD), a machine-aided indexing system used online at the National Aeronautics and Space Administration's Center for AeroSpace Information (CASI). This system automatically suggests a set of candidate terms from NASA's controlled vocabulary for any designated natural language text input. The system is comprised of a text processor that is based on the computational, nonsyntactic analysis of input text and an extensive knowledge base that serves to recognize and translate text-extracted concepts. The functions of the various NLD system components are described in detail, and production and quality benefits resulting from the implementation of machine-aided indexing at CASI are discussed.

Silvester, J. P. & Klingbiel, P. H. (1993). An operational system for subject switching between controlled vocabularies. Information Processing & Management, 29(1), 47-59.
Abstract: The NASA system of automatically converting sets of terms assigned by Department of Defense indexers to sets of NASA's authorized terms is described. This little-touted system, which has been operating successfully since 1983, matches concepts, rather than words. Subject Switching uses a translation table, known as the Lexical Dictionary, accessed by a program that determines which rules to follow in making the transition from DTIC's to NASA's authorized terms. The authors describe the four phases of development of Subject Switching, changes that have been made, evaluating the system, and benefits. Benefits to NASA include saving indexers' time, the addition of access points for documents indexed, the utilization of other government indexing, and a contribution towards the now-operational NASA, online, interactive, machine aided indexing.