Latent semantic indexing

Latent semantic analysis (LSA) is a technique in information retrieval (IR) invented in 1990. As an approach to IR, it is based on and related to the vector space model. It is sometimes called latent semantic indexing (LSI) (e.g., Ding, 2005; Dumais, 2004).


The underlying idea is that the aggregate of all the word contexts in which a given word does and does not appear provides a set of mutual constraints that largely determines the similarity of meaning of words and sets of words to each other. LSI, as currently practiced, induces its representations of the meaning of words and passages from analysis of text alone. "It makes no use of word order, thus of syntactic relations or logic, or of morphology. Remarkably, it manages to extract correct reflections of passage and word meanings quite well without these aids, but it must still be suspected of resulting incompleteness or likely error on some occasions" (Landauer, Foltz & Laham, 1998).
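
The idea can be illustrated with a small sketch. The Python example below is not taken from the cited sources. It assumes a hypothetical four-document corpus in which "car" and "automobile" never occur together but share the context word "engine". It builds a term-document count matrix, keeps the two strongest latent dimensions, and compares word similarity before and after the reduction. The helper names (dot, cosine, top_eigenpairs) are illustrative only. Power iteration on the term-by-term matrix A·Aᵀ stands in here for the library singular value decomposition (SVD) routine that would normally be used; for this purpose it yields the same leading term dimensions as a truncated SVD of A.

```python
import math

# Hypothetical toy corpus: "car" and "automobile" never share a document,
# but both co-occur with "engine".
docs = [
    "car engine",
    "automobile engine",
    "flower petal",
    "petal flower",
]
terms = sorted({w for d in docs for w in d.split()})

# Term-document count matrix A (rows = terms, columns = documents).
# Only counts are used; word order within a document is ignored.
A = [[d.split().count(t) for d in docs] for t in terms]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    norm = math.sqrt(dot(x, x) * dot(y, y))
    return dot(x, y) / norm if norm else 0.0


def top_eigenpairs(G, k, iters=500):
    """Top-k eigenpairs of a symmetric positive semi-definite matrix,
    found by power iteration with deflation (a stand-in for an SVD routine)."""
    n = len(G)
    G = [row[:] for row in G]  # work on a copy
    pairs = []
    for _ in range(k):
        # Assumption: the all-ones start vector is not orthogonal
        # to the eigenvector being sought (true for this corpus).
        v = [1.0] * n
        for _step in range(iters):
            w = [dot(row, v) for row in G]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in G])
        pairs.append((lam, v))
        # Deflation: remove the component just found.
        for i in range(n):
            for j in range(n):
                G[i][j] -= lam * v[i] * v[j]
    return pairs


# Term-by-term matrix A * A^T; its eigenvectors are the left singular
# vectors of A, and its eigenvalues are the squared singular values.
gram = [[dot(row_i, row_j) for row_j in A] for row_i in A]

k = 2  # number of latent dimensions kept
pairs = top_eigenpairs(gram, k)

# Coordinates of each term in the k-dimensional latent space
# (left singular vector entries scaled by the singular values).
coords = [
    [math.sqrt(max(lam, 0.0)) * vec[i] for lam, vec in pairs]
    for i in range(len(terms))
]

car, auto = terms.index("car"), terms.index("automobile")
raw_sim = cosine(A[car], A[auto])
latent_sim = cosine(coords[car], coords[auto])
print(f"raw cosine(car, automobile):    {raw_sim:.3f}")
print(f"latent cosine(car, automobile): {latent_sim:.3f}")
```

In this toy corpus the raw cosine between "car" and "automobile" is 0, because the two words share no document, while in the two-dimensional latent space it is close to 1, because both words load on the same dimension as "engine". Real applications typically use much larger corpora, weighted counts, and a few hundred dimensions, so the numbers here only indicate the direction of the effect.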






Birger Hjørland

Last edited: 27-02-2008