Machine-aided indexing (MAI)
Machine-aided indexing may be considered a semi-automatic form of indexing, as
opposed to fully automatic indexing.
Golub (2005) describes machine-aided indexing (MAI) as an approach that has
been used to suggest controlled vocabulary terms to be assigned to a document:
"Document classification is a library science
approach. The tradition of automating the process of subject determination of a
document and assigning it to a term from a controlled vocabulary partly has its
roots in machine-aided indexing (MAI). MAI has been used to suggest controlled
vocabulary terms to be assigned to a document.
The automated part of this approach differs from the previous two [text
categorization and clustering] in that it is generally not based on either
supervised or unsupervised learning.
Neither do documents and classes get represented by vectors. In document
classification, the algorithm typically compares extracted terms from the text
to be classified, to mapped terms from the controlled vocabulary (string-to-string
matching). At the same time, this approach does share similarities with text
categorization and document clustering: the pre-processing of documents to be
classified includes stop-words removal; stemming can be conducted; words or
phrases from the text of documents to be classified are extracted and weights
are assigned to them based on different heuristics; Web-page characteristics
have been explored, although to a lesser degree." (Golub, 2005, p. 29)
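The matching step described in the quotation can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure of any particular MAI system: the stop-word list, the crude suffix-stripping stemmer (a stand-in for a real stemmer such as Porter's), the weighting heuristic (each occurrence of an n-word phrase scores n) and the small table of mapped vocabulary terms are all hypothetical. Extracted words and two-word phrases are compared to the mapped terms by exact string matching after both have been normalized the same way.

```python
import re
from collections import Counter

# Hypothetical stop-word list; operational systems use much longer lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "is", "are", "with"}

# Hypothetical mapped terms: surface phrase -> controlled vocabulary term.
MAPPED_TERMS = {
    "wind tunnels": "WIND TUNNELS",
    "turbulence": "TURBULENCE",
    "boundary layer": "BOUNDARY LAYERS",
}


def crude_stem(word: str) -> str:
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    if word.endswith("ss"):
        return word
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> list[str]:
    """Lower-case, tokenize, remove stop words and stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


# Normalize the mapped terms the same way, so matching is plain string equality.
NORMALIZED_MAP = {" ".join(normalize(k)): v for k, v in MAPPED_TERMS.items()}


def suggest_terms(text: str, top_n: int = 5) -> list[str]:
    """Return controlled terms suggested for the text, highest weight first."""
    stems = normalize(text)
    weights = Counter()
    for n in (1, 2):  # extract single words and two-word phrases
        for i in range(len(stems) - n + 1):
            weights[" ".join(stems[i:i + n])] += n  # longer phrases weigh more
    scores = Counter()
    for phrase, weight in weights.items():
        term = NORMALIZED_MAP.get(phrase)  # string-to-string match
        if term is not None:
            scores[term] += weight
    return [term for term, _ in scores.most_common(top_n)]
```

For example, suggest_terms("Turbulence in wind tunnels: measuring the boundary layer in two wind tunnels.") returns ['WIND TUNNELS', 'BOUNDARY LAYERS', 'TURBULENCE']; the repeated two-word phrase outranks the single-word match. No learning step is involved, in line with Golub's remark that the automated part of MAI is generally not based on supervised or unsupervised learning.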
Literature:
Golub, K. (2005). Automated subject classification of textual web pages, for browsing. Lund: Lund University, Department of Information Technology. Available: http://www.it.lth.se/koraljka/Lund/publ/LicE.pdf
Jacquemin, C.; Daille, B.; Royaute, J. & Polanco, X. (2002). In vitro evaluation of a program for machine-aided indexing. Information Processing & Management, 38(6), 765-792.
Abstract: This article presents the human evaluation of ILIAD, a program for
machine-aided indexing (MAI). It consists of two language
engineering modules and is designed to assist expert librarians in
computer-aided indexing and document analysis. Our aim is the expert
evaluation of automatic multi-word term indexing. Evaluation is
performed by documentary engineers. Cataloging and indexing are their principal
tasks. They also have a good scientific knowledge of the
domain to which the indexed documents belong.
We first present the ILIAD program and the two systems submitted to
this evaluation, the methodology (protocol) adopted, the differences
between the protocol and the implementation, and the results of these
evaluations. Human evaluation is divided into three parts: firstly the
evaluation of controlled indexing, then free indexing and finally term variant
extraction performed during controlled indexing. Finally, we analyze the
relevance of this evaluation by calculating the agreement frequency and the
Kappa coefficient and propose some future developments.
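The agreement frequency and the Kappa coefficient mentioned in this abstract can be illustrated for the simplest case. The sketch below assumes two judges who each assign one categorical label (here the hypothetical labels "accept" and "reject") to the same list of suggested index terms. It computes the observed agreement p_o and Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_e is the agreement expected by chance from each judge's label frequencies. The abstract does not say which variant of kappa Jacquemin et al. used, so this only illustrates the general technique.

```python
from collections import Counter


def agreement_and_kappa(judge_a: list[str], judge_b: list[str]) -> tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two judges' labels."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("both judges must label the same, non-empty list of items")
    n = len(judge_a)
    # Agreement frequency: share of items on which the judges gave the same label.
    p_o = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    # Chance agreement from each judge's marginal label frequencies.
    freq_a, freq_b = Counter(judge_a), Counter(judge_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    # Kappa is undefined when chance agreement is already perfect.
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")
    return p_o, kappa
```

For ten suggested terms on which two judges agree eight times, each judge accepting seven and rejecting three, the function gives p_o = 0.8, p_e = 0.58 and kappa = 0.22 / 0.42, roughly 0.52. Kappa is lower than the raw agreement frequency because it discounts the agreement the judges would reach by chance.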
Klingbiel, P. H. (1973). Machine-aided indexing of technical literature. Information Storage & Retrieval, 9(2), 79-84.
Klingbiel, P. H. (1973). Technique for machine-aided indexing. Information Storage & Retrieval, 9(9), 477-494.
Klingbiel, P. H. & Rinker, C. C. (1976). Evaluation of machine-aided indexing. Information Processing & Management, 12(6), 351-366.
Lucey, J. (1993). Machine aided indexing. Journal of the American Society for Information Science, 44(7), 430. (Letter).
Milstead, J. L. (1992). Methodologies for subject analysis in bibliographical databases. Information Processing & Management, 28(3), 407-431.
Abstract: Techniques and methodologies for subject analysis have changed in
recent years, and current research indicates that the changes may be
accelerating. The review reported in this paper was undertaken to aid managers
of databases in determining if new and little-known capabilities would improve
the cost-effectiveness of subject analysis operations. Sophisticated computer
aids to routine procedures in subject analysis seem likely to be valuable,
although issues of capital investment might limit their application in a given
situation. Operational machine-aided and automatic indexing systems were found
to form a continuum. The same system can be used for automatic indexing (without
human review of individual documents) and machine-aided indexing (with human
review) for different applications. Commercial automatic indexing packages were
also reviewed. The overall conclusion was that database producers should begin
working seriously on upgrading their thesauri and codifying their indexing
policies as a means of moving toward development of machine aids to indexing,
but that fully automatic indexing is not yet ready for wholesale implementation.
The primary obstacle to development of automatic indexing is the lack of machine
"understanding" of natural language. Research in artificial intelligence and
knowledge bases is attacking this problem, but there is still much work to be
done. Recommendations for action include: increasing the power of the indexer
interface; studying indexing policies; enrichment of thesauri; taking steps that
will contribute to later development of knowledge bases; considering development
of machine-aided indexing; and applying the findings of natural language
processing research.
Silvester, J. P.; Genuardi, M. T. & Klingbiel, P. H. (1994). Machine-aided indexing at NASA. Information Processing & Management, 30(5), 631-645.
Abstract: This report describes the NASA Lexical Dictionary (NLD), a
machine-aided indexing system used online at the National Aeronautics
and Space Administration's Center for AeroSpace Information (CASI).
This system automatically suggests a set of candidate terms from NASA's
controlled vocabulary for any designated natural language text input.
The system is comprised of a text processor that is based on the
computational, nonsyntactic analysis of input text and an extensive
knowledge base that serves to recognize and translate text-extracted
concepts. The functions of the various NLD system components are
described in detail, and production and quality benefits resulting from
the implementation of machine-aided indexing at CASI are discussed.
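The abstract describes a text processor that analyses input text without syntactic parsing and a knowledge base that recognizes text phrases and translates them into controlled vocabulary terms. The NLD's actual data structures are not described here, so the following is only a hypothetical sketch of that division of labour: a dictionary keyed by word sequences, scanned left to right with longest-match lookup, in which one phrase may map to several terms. The entries and term names are invented for illustration and are not taken from NASA's vocabulary.

```python
import re

# Hypothetical knowledge base: word sequence found in text -> controlled terms.
KNOWLEDGE_BASE = {
    ("space", "shuttle"): ["SPACE SHUTTLES"],
    ("space", "shuttle", "main", "engine"): ["SPACE SHUTTLE MAIN ENGINE"],
    ("heat", "shield"): ["HEAT SHIELDING", "THERMAL PROTECTION"],
    ("reentry",): ["ATMOSPHERIC ENTRY"],
}
MAX_PHRASE_LEN = max(len(key) for key in KNOWLEDGE_BASE)


def candidate_terms(text: str) -> list[str]:
    """Scan text left to right, translating the longest known phrase at each position."""
    words = re.findall(r"[a-z]+", text.lower())  # no syntactic analysis, just word tokens
    suggestions: list[str] = []
    i = 0
    while i < len(words):
        for length in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            key = tuple(words[i:i + length])
            if key in KNOWLEDGE_BASE:
                for term in KNOWLEDGE_BASE[key]:
                    if term not in suggestions:  # keep each term once, in order found
                        suggestions.append(term)
                i += length  # skip past the recognized phrase
                break
        else:
            i += 1  # no known phrase starts here; move to the next word
    return suggestions
```

For the input "Reentry heating of the space shuttle main engine and heat shield tiles", the sketch suggests ATMOSPHERIC ENTRY, SPACE SHUTTLE MAIN ENGINE, HEAT SHIELDING and THERMAL PROTECTION; the longest-match rule keeps the shorter entry "space shuttle" from also firing. In a machine-aided setting such a list would be offered to an indexer for review rather than assigned directly.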
Silvester, J. P. & Klingbiel, P. H. (1993). An operational system for subject switching between controlled vocabularies. Information Processing & Management, 29(1), 47-59.
Abstract: The NASA system of automatically converting sets of terms
assigned by Department of Defense indexers to sets of NASA's authorized terms is
described. This little-touted system, which has been operating successfully
since 1983, matches concepts, rather than words. Subject Switching uses a
translation table, known as the Lexical Dictionary, accessed by a program that
determines which rules to follow in making the transition from DTIC's to NASA's
authorized terms. The authors describe the four phases of development of Subject
Switching, changes that have been made, evaluating the system, and benefits.
Benefits to NASA include saving indexers' time, the addition of access points
for documents indexed, the utilization of other government indexing, and a
contribution towards the now-operational NASA, online, interactive, machine
aided indexing.
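Subject switching, as summarized here, converts a set of terms from one controlled vocabulary into a set of terms from another by consulting a translation table under rules, so that concepts rather than single words are matched. The hypothetical sketch below illustrates one way such rules might work; it is not a description of the Lexical Dictionary itself. Each rule maps a set of source terms to one or more target terms, rules covering more source terms are tried first so that a combination can be translated as a single concept, and source terms left unmatched are returned for an indexer to handle. All term names are invented for illustration.

```python
# Hypothetical translation table: set of source (DTIC-style) terms -> target terms.
SWITCHING_RULES = {
    frozenset({"AIRCRAFT", "ENGINES"}): ["AIRCRAFT ENGINES"],
    frozenset({"AIRCRAFT"}): ["AIRCRAFT"],
    frozenset({"ENGINES"}): ["ENGINES"],
    frozenset({"ROCKET PROPELLANTS"}): ["ROCKET PROPELLANTS"],
    frozenset({"FLUTTER"}): ["FLUTTER", "AEROELASTICITY"],
}


def switch_terms(source_terms: list[str]) -> tuple[list[str], list[str]]:
    """Translate a set of source terms; return (target terms, unmatched source terms)."""
    remaining = set(source_terms)
    target: list[str] = []
    # Try rules covering more source terms first (stable sort keeps table order on ties).
    for rule in sorted(SWITCHING_RULES, key=len, reverse=True):
        if rule <= remaining:  # every source term of the rule is present
            for term in SWITCHING_RULES[rule]:
                if term not in target:
                    target.append(term)
            remaining -= rule  # consumed terms cannot trigger smaller rules
    return target, sorted(remaining)
```

For the source set AIRCRAFT, ENGINES, FLUTTER and WIND SHEAR, the sketch returns AIRCRAFT ENGINES, FLUTTER and AEROELASTICITY, and leaves WIND SHEAR unmatched. Trying larger rules first is the design choice that lets the pair AIRCRAFT and ENGINES become one target concept instead of two separate terms.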
See also: Automatic Indexing; Indexing
Birger Hjørland
Last edited: 29-08-2006