Statistically Improbable Phrases (SIPs)

Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most distinctive phrases in the text of books in the Search Inside! program. To identify SIPs, Amazon's computers scan the text of all books in Search Inside!. If a phrase occurs many times in a particular book relative to how often it occurs across all Search Inside! books, that phrase is a SIP for that book.
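The idea of "frequent in this book relative to all books" can be illustrated with a small Python sketch. Amazon's actual scoring method is not described here, so everything below is an assumption made for illustration: phrases are limited to two-word sequences, the score is a plain ratio of relative frequencies, and the names `bigrams`, `sip_scores` and `library` are hypothetical.

```python
from collections import Counter

def bigrams(text):
    """Return all adjacent two-word phrases in text, lowercased."""
    words = text.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]

def sip_scores(title, library):
    """Score each phrase of one book against the whole collection.

    library maps book titles to their full text and must include the
    book being scored. The score is an assumed measure, not Amazon's:
    (phrase share in the book) / (phrase share in all books).
    Values well above 1 mark phrases that are unusually frequent here.
    """
    book_counts = Counter(bigrams(library[title]))
    corpus_counts = Counter()
    for text in library.values():
        corpus_counts.update(bigrams(text))

    book_total = sum(book_counts.values())
    corpus_total = sum(corpus_counts.values())

    scores = {}
    for phrase, count in book_counts.items():
        book_rate = count / book_total
        # Never zero: the book itself is part of the collection.
        corpus_rate = corpus_counts[phrase] / corpus_total
        scores[phrase] = book_rate / corpus_rate
    return scores
```

A phrase that appears only in one book receives the highest ratio under this measure, while a phrase that is common throughout the collection scores close to 1 even if it is frequent in the book.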

 

SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside!. For example, most SIPs for a book on taxes are tax related. But because Amazon displays SIPs in order of their improbability score, the first SIPs will be on tax topics that this book mentions more often than other tax books do. For works of fiction, SIPs tend to be distinctive word combinations that often hint at important plot elements. One may click on a SIP to view a list of books in which the phrase occurs. One may also view a list of references to the phrase in each book.
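The display order and the two lookups described above (books containing a phrase, and references to it within each book) can be sketched as follows. This is only an illustration under stated assumptions, not Amazon's implementation: the function names are hypothetical, scores are taken as given, and a "reference" is simplified to the word offset at which the phrase starts.

```python
from collections import defaultdict

def order_for_display(scores):
    """Sort phrases by descending score; ties are broken alphabetically."""
    return sorted(scores, key=lambda phrase: (-scores[phrase], phrase))

def index_phrases(library, phrases):
    """Map each phrase to {title: [word offsets where the phrase starts]}.

    library maps book titles to their text. Matching is a simplified,
    case-insensitive comparison of whitespace-separated words.
    """
    index = defaultdict(dict)
    for title, text in library.items():
        words = text.lower().split()
        for phrase in phrases:
            target = phrase.lower().split()
            if not target:
                continue  # skip empty phrases
            n = len(target)
            hits = [i for i in range(len(words) - n + 1)
                    if words[i:i + n] == target]
            if hits:
                index[phrase][title] = hits
    return index
```

Here the keys of `index[phrase]` stand in for the list of books in which the phrase occurs, and the offsets stored for each title stand in for the references within that book.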

 

EXAMPLE: The book "Philosophy of Social Science: Philosophical Issues in Social Thought" contains the following SIPs: SCIENTIFIC KNOWLEDGE CLAIMS, FEMINIST STANDPOINT EPISTEMOLOGY, CRITICAL REALISM, SOCIAL SCIENTIFIC KNOWLEDGE, FEMINIST EMPIRICISM, EMPIRICIST VIEW, EMPIRICIST ACCOUNT.

 

 

See also: Weighting

 

 

Birger Hjørland

Last edited: 02-05-2006
