Latent semantic indexing

Latent semantic analysis (LSA), sometimes called latent semantic indexing (LSI), is a technique in information retrieval (IR) introduced around 1990 (e.g., Ding, 2005; Dumais, 2004). As an approach to IR it is based on, and related to, the vector space model.


The underlying idea is that the aggregate of all the word contexts in which a given word does and does not appear provides a set of mutual constraints that largely determines the similarity of meaning of words and sets of words to each other. LSI, as currently practiced, induces its representations of the meaning of words and passages from analysis of text alone. "It makes no use of word order, thus of syntactic relations or logic, or of morphology. Remarkably, it manages to extract correct reflections of passage and word meanings quite well without these aids, but it must still be suspected of resulting incompleteness or likely error on some occasions" (Landauer, Foltz & Laham, 1998).
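
The idea can be illustrated with a small sketch in Python. It is not taken from the cited literature: the toy corpus, the function names and the choice of two dimensions are assumptions made only for this illustration, and the sketch relies on the NumPy library. Each text is reduced to word counts, so word order is discarded, and a truncated singular value decomposition (the usual computational core of LSA, not described further in this entry) places words that occur in similar contexts close to each other.

```python
import numpy as np

# Hypothetical toy corpus, invented for this illustration only.
docs = [
    "the doctor treated the patient",
    "the physician treated the patient",
    "the doctor examined the child",
    "the stock market fell sharply",
    "investors sold stock as the market fell",
]

def term_document_matrix(texts):
    """Count how often each word occurs in each text; word order is discarded."""
    vocab = sorted({w for t in texts for w in t.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(texts)))
    for j, t in enumerate(texts):
        for w in t.split():
            counts[index[w], j] += 1
    return vocab, counts

def lsa_word_vectors(counts, k):
    """Truncated SVD: keep the k largest singular values and scale the word vectors by them."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab, counts = term_document_matrix(docs)
vectors = lsa_word_vectors(counts, k=2)  # k=2 is an arbitrary choice for the toy data
word = {w: vectors[i] for i, w in enumerate(vocab)}

# "doctor" and "physician" never occur in the same text, but they share contexts.
sim_related = cosine(word["doctor"], word["physician"])
sim_unrelated = cosine(word["doctor"], word["market"])
print(sim_related, sim_unrelated)
```

In this toy corpus the first similarity comes out higher than the second, although "doctor" and "physician" never appear together in one text. Real applications use far larger collections, some weighting of the raw counts, and typically a few hundred dimensions.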






Ding, C. H. Q. (2005). A probabilistic model for Latent Semantic Indexing. Journal of the American Society for Information Science and Technology, 56(6), 597-608.


Dumais, S. T. (2004). Latent semantic analysis. Annual Review of Information Science and Technology, 38, 189-230.


Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259-284.


Perfetti, C. A. (1998). The limits of co-occurrence: Tools and theories in language research. Discourse Processes, 25, 363-377.


Wikipedia. The free encyclopedia. (2006). Latent semantic analysis.



See also: Natural Language Processing; Vector space model



Birger Hjørland

Last edited: 27-02-2008