OKAPI information retrieval system
Beaulieu & Jones (1998) explore some interface design
issues raised by the development and evaluation of a highly interactive
information retrieval (IR) system based on a probabilistic retrieval model with
relevance feedback. The Okapi system uses term frequency weighting functions to
display retrieved items in a best match ranked order; it can also find
additional items similar to those marked as relevant by the searcher.
Jones, Walker, Gatford & Do (1997) describe the heart of the Okapi system as a
formula, referring to some half a dozen variables, that estimates the probability
that a given document is relevant to a given query. User interface design for
Okapi aims to present its search capabilities as clearly and simply as possible.
The evolution, and some of the functions, of the software layers are described.
Karamuftuoglu (1997) approaches IR from a semiotic point of view. Application of semiotic categories to IR reveals that the basic distinction in the retrieval interaction is between the two particular types of 'language games' known as 'denotations' and 'prescriptions'. The denotative act in IR is needed to transmit information from the database to the user of the system. The prescriptive act, however, can be used to 'invent' new connections between documents that constitute documentation systems and, thus, to create new knowledge. IR systems design practice is viewed as a social practice in which the main disjunction is between the two conflicting acts of denotation and prescription. It is the aim of the reported project to balance these two conflicting language games within the framework of the Okapi experimental information retrieval system.
Robertson, Walker & Beaulieu (1995) describe how the Okapi system has been used
in a series of experiments on the TREC collections, investigating probabilistic
models, relevance feedback, query expansion, and interaction issues. Some
new probabilistic models have been developed, resulting in simple weighting
functions that take account of document length and within-document and
within-query term frequency. All have been shown to be beneficial. Relevance
feedback and query expansion are seen as highly beneficial when based on large
quantities of relevance data.
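
A weighting function of the kind described above can be sketched in a few lines. The sketch below follows the form usually called BM25, which is generally associated with Okapi's TREC work; it is not claimed to match the exact variant in the cited paper. The function name, the argument names and the default values of k1, b and k3 are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75, k3=7.0):
    """Score one document against one query with a BM25-style function.

    doc_freq maps a term to the number of documents that contain it.
    k1, b and k3 are tuning constants; the defaults here are assumptions.
    """
    tf = Counter(doc_terms)      # within-document term frequencies
    qtf = Counter(query_terms)   # within-query term frequencies

    # Document length normalisation: a document longer than average needs
    # more occurrences of a term to earn the same credit.
    length_norm = k1 * ((1 - b) + b * len(doc_terms) / avg_doc_len)

    score = 0.0
    for term, q_count in qtf.items():
        n = doc_freq.get(term, 0)
        if tf[term] == 0 or n == 0:
            continue  # term absent from the document or the collection
        # Collection frequency weight (no relevance information used).
        idf = math.log((num_docs - n + 0.5) / (n + 0.5))
        doc_part = (k1 + 1) * tf[term] / (length_norm + tf[term])
        query_part = (k3 + 1) * q_count / (k3 + q_count)
        score += idf * doc_part * query_part
    return score
```

Ranking a collection then amounts to computing this score for every document and sorting in descending order, which gives the best match ranked output mentioned earlier.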
Robertson, Walker & Beaulieu (1997) have made use of TREC to improve some of the
automatic techniques used in Okapi, specifically the term weighting function and
the algorithms for term selection for query expansion. The consequence of this
process has been a very good showing for Okapi in terms of the TREC evaluation
results. Some of the issues around the much more difficult problem of
interactive evaluation in TREC are also discussed. Although some interesting
interactive experiments have been performed at TREC, the problems of reconciling
the requirements of the laboratory context with the concerns of interactive
retrieval are seen as still largely unresolved.
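
Term selection for query expansion can be illustrated with a small sketch. The summary above does not give the formulas, so the code assumes the classic Robertson/Sparck Jones relevance weight with 0.5 smoothing, and ranks candidate terms by the number of relevant documents containing the term multiplied by that weight. The function names, the input layout and the top_k parameter are hypothetical.

```python
import math


def rsj_weight(r, R, n, N):
    """Robertson/Sparck Jones relevance weight with 0.5 smoothing.

    r: relevant documents containing the term
    R: documents judged relevant
    n: documents containing the term
    N: documents in the collection
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def select_expansion_terms(term_stats, R, N, top_k=5):
    """Return the top_k candidate terms ranked by r * relevance weight.

    term_stats maps each candidate term to a pair (r, n).
    """
    scored = []
    for term, (r, n) in term_stats.items():
        scored.append((r * rsj_weight(r, R, n, N), term))
    scored.sort(reverse=True)
    return [term for _, term in scored[:top_k]]
```

With no relevance information (r = 0 and R = 0) the weight reduces to log((N - n + 0.5) / (n + 0.5)), a collection frequency weight, so the same function covers both the initial search and the search after feedback.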
Robertson, Walker & Beaulieu (2000) write that the Okapi system has been used in
a series of experiments on the TREC collections, investigating probabilistic
models, relevance feedback and query expansion, and interaction issues. The
TREC-6 ad hoc task was used to test an application of a new relevance weighting
formula, which takes account of documents judged nonrelevant. The application
was to a form of blind feedback. In the routing task, the problem is one of
query optimization based on a training set with known relevant documents;
investigations for TREC-6 included using a form of simulated annealing for this
purpose. A significant feature of this work is the need to avoid overfitting of
the training sample. In the interactive track, methodology remains the major
problem. The Okapi team has been particularly interested in the relation between
the functionalities associated with relevance feedback and the ability of
searchers to make use of these functionalities. TREC provides an excellent
environment and set of tools for investigating automatic systems; its value for
interactive systems is not yet proven.
Literature:
Beaulieu, M. & Jones, S. (1998). Interactive searching and
interface issues in the Okapi best match probabilistic retrieval system.
Interacting with Computers, 10(3), 237-248.
Goker, A. (1997). Context learning in Okapi. Journal of Documentation,
53(1), 80-83.
Jones, S.; Walker, S.; Gatford, M. & Do, T. (1997). Peeling the onion: Okapi system
architecture and software design issues. Journal of Documentation,
53(1), 58-68.
Karamuftuoglu, M. (1997). Designing language games in
Okapi. Journal of Documentation, 53(1), 69-73.
Karamuftuoglu, M.; Jones, S.; Robertson, S.; Venuti, F. & Wang, X. K. (2002).
Challenges posed by web-based retrieval of scientific papers: Okapi
participation in TIPS. Journal of Information Science, 28(1), 3-17.
Mitev, N. N.; Venner, G. M. & Walker, S. (1985).
Designing an online public access catalog: Okapi, a catalogue on a local area
network. London: British Library. (Library and information research report
39)
Robertson, S. E. (1997). Overview of the Okapi projects. Journal of
Documentation, 53(1), 3-7.
Robertson, S. E.; Walker, S. & Beaulieu, M. M. (1995). Large test collection
experiments on an operational, interactive system: OKAPI at TREC. Information
Processing & Management, 31(3), 345-360.
Robertson, S. E.; Walker, S. & Beaulieu, M. (1997). Laboratory experiments with
Okapi: Participation in the TREC programme. Journal of Documentation,
53(1), 20-34.
Robertson, S. E.; Walker, S. & Beaulieu, M. (2000). Experimentation as a way of
life: Okapi at TREC. Information Processing & Management, 36(1), 95-108.
Walker, S. (1988). Improving subject access painlessly: Recent work on the OKAPI online catalog projects. Program: Automated Library and Information Systems, 22(1), 21-31.
http://www.soi.city.ac.uk/~andym/OKAPI-PACK/
TREC and OKAPI projects: http://www.comp.rgu.ac.uk/staff/asga/intel.html
Birger Hjørland
Last edited: 01-09-2006