OKAPI information retrieval system

Beaulieu & Jones (1998) explore some interface design issues raised by the development and evaluation of a highly interactive information retrieval (IR) system based on a probabilistic retrieval model with relevance feedback. The Okapi system uses term frequency weighting functions to display retrieved items in best-match ranked order; it can also find additional items similar to those marked as relevant by the searcher.

Jones, Walker, Gatford & Do (1997) describe the heart of the Okapi system as a formula, referring to some half a dozen variables, that estimates the probability that a given document is relevant to a given query. User interface design for Okapi aims to present its search capabilities as clearly and simply as possible. The evolution of the software layers, and some of their functions, are described.

Karamuftuoglu (1997) approaches IR from a semiotic point of view. Application of semiotic categories to IR reveals that the basic distinction in the retrieval interaction is between the two particular types of 'language games' known as 'denotations' and 'prescriptions'. The denotative act in IR is needed to transmit information from the database to the user of the system. The prescriptive act, however, can be used to 'invent' new connections between documents that constitute documentation systems and, thus, to create new knowledge. IR systems design practice is viewed as a social practice in which the main disjunction is between the two conflicting acts of denotation and prescription. It is the aim of the reported project to balance these two conflicting language games within the framework of the Okapi experimental information retrieval system.

Robertson, Walker & Beaulieu (1995) describe how the Okapi system has been used in a series of experiments on the TREC collections, investigating probabilistic models, relevance feedback and query expansion, and interaction issues. Some new probabilistic models have been developed, resulting in simple weighting functions that take account of document length and within-document and within-query term frequency. All have been shown to be beneficial. Relevance feedback and query expansion are seen as highly beneficial when based on large quantities of relevance data.
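A weighting function of the kind described above can be illustrated with a minimal sketch of the function commonly known as BM25, which is usually associated with this line of Okapi work. The sketch is not taken from the cited paper: the function name `bm25_score`, the argument names, and the default values of `k1`, `b` and `k3` are assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75, k3=8.0):
    """Score one document against one query (illustrative BM25 sketch).

    doc_freq maps a term to the number of documents containing it.
    k1, b and k3 are tuning constants; the defaults are assumed values.
    """
    tf = Counter(doc_terms)      # within-document term frequency
    qtf = Counter(query_terms)   # within-query term frequency
    doc_len = len(doc_terms)
    # Document length normalisation: longer documents get a larger K,
    # which dampens the contribution of their term frequencies.
    K = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    score = 0.0
    for term, q_count in qtf.items():
        f = tf.get(term, 0)
        if f == 0:
            continue  # terms absent from the document contribute nothing
        n = doc_freq.get(term, 0)
        # Collection weight with no relevance information available.
        # It can go negative for terms found in more than half the documents.
        idf = math.log((num_docs - n + 0.5) / (n + 0.5))
        doc_part = (k1 + 1) * f / (K + f)
        query_part = (k3 + 1) * q_count / (k3 + q_count)
        score += idf * doc_part * query_part
    return score
```

With the same within-document frequency, a shorter document scores higher than a longer one, which is the document length effect mentioned in the summary.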

Robertson, Walker & Beaulieu (1997) have made use of TREC to improve some of the automatic techniques used in Okapi, specifically the term weighting function and the algorithms for term selection for query expansion. The consequence of this process has been a very good showing for Okapi in terms of the TREC evaluation results. Some of the issues around the much more difficult problem of interactive evaluation in TREC are also discussed. Although some interesting interactive experiments have been performed at TREC, the problems of reconciling the requirements of the laboratory context with the concerns of interactive retrieval are seen as still largely unresolved.
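Term selection for query expansion can be sketched as follows, assuming the Robertson/Sparck Jones relevance weight and a selection value of the form r × w (often called the offer weight). The cited paper may use a different or refined criterion; the function names and the `top_k` parameter are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter

def rsj_weight(r, n, R, N):
    """Robertson/Sparck Jones relevance weight with 0.5 smoothing.

    r: relevant documents containing the term
    n: documents in the collection containing the term
    R: known relevant documents
    N: documents in the collection
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

def select_expansion_terms(relevant_docs, doc_freq, num_docs, top_k=5):
    """Rank candidate terms from relevant documents by r * w (assumed criterion)."""
    R = len(relevant_docs)
    rel_counts = Counter()
    for doc in relevant_docs:
        rel_counts.update(set(doc))  # count each term once per document
    scored = []
    for term, r in rel_counts.items():
        n = doc_freq[term]
        scored.append((r * rsj_weight(r, n, R, num_docs), term))
    scored.sort(reverse=True)  # highest selection value first
    return [term for _, term in scored[:top_k]]
```

Terms that occur in most relevant documents but in few documents overall rise to the top, while very common terms receive a low or negative value and are not offered for expansion.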

Robertson, Walker & Beaulieu (2000) write that the Okapi system has been used in a series of experiments on the TREC collections, investigating probabilistic models, relevance feedback and query expansion, and interaction issues. The TREC-6 ad hoc task was used to test an application of a new relevance weighting formula, which takes account of documents judged nonrelevant. The application was to a form of blind feedback. In the routing task, the problem is one of query optimization based on a training set with known relevant documents; investigations for TREC-6 included using a form of simulated annealing for this purpose. A significant feature of this work is the need to avoid overfitting of the training sample. In the interactive track, methodology remains the major problem. The Okapi team has been particularly interested in the relation between the functionalities associated with relevance feedback and the ability of searchers to make use of these functionalities. TREC provides an excellent environment and set of tools for investigating automatic systems; its value for interactive systems is not yet proven.

Literature:

Beaulieu, M. & Jones, S. (1998). Interactive searching and interface issues in the Okapi best match probabilistic retrieval system. Interacting with Computers, 10(3), 237-248.

Goker, A. (1997). Context learning in Okapi. Journal of Documentation, 53(1), 80-83.

Jones, S.; Walker, S.; Gatford, M. & Do, T. (1997). Peeling the onion: Okapi system architecture and software design issues. Journal of Documentation, 53(1), 58-68.

Karamuftuoglu, M. (1997). Designing language games in Okapi. Journal of Documentation, 53(1), 69-73.

Karamuftuoglu, M.; Jones, S.; Robertson, S.; Venuti, F. & Wang, X. K. (2002). Challenges posed by web-based retrieval of scientific papers: Okapi participation in TIPS. Journal of Information Science, 28(1), 3-17.

Mitev, N. N.; Venner, G. M. & Walker, S. (1985). Designing an online public access catalog: Okapi, a catalogue on a local area network. London: British Library. (Library and information research report 39)

Robertson, S. E. (1997). Overview of the Okapi projects. Journal of Documentation, 53(1), 3-7.

Robertson, S. E.; Walker, S. & Beaulieu, M. M. (1995). Large test collection experiments on an operational, interactive system: OKAPI at TREC. Information Processing & Management, 31(3), 345-360.

Robertson, S. E.; Walker, S. & Beaulieu, M. (1997). Laboratory experiments with Okapi: Participation in the TREC programme. Journal of Documentation, 53(1), 20-34.

Robertson, S. E.; Walker, S. & Beaulieu, M. (2000). Experimentation as a way of life: Okapi at TREC. Information Processing & Management, 36(1), 95-108.

Walker, S. (1988). Improving subject access painlessly: Recent work on the OKAPI online catalog projects. Program: Automated Library and Information Systems, 22(1), 21-31.

http://www.soi.city.ac.uk/~andym/OKAPI-PACK/

TREC and OKAPI projects: http://www.comp.rgu.ac.uk/staff/asga/intel.html

Birger Hjørland

Last edited: 01-09-2006
