Stochastic (statistical) search is on the way out
Published 1 year ago by James Simmons
There's a lot of talk about new search engines and the promising technologies behind them. One technology that has fairly recently been applied to Web search is natural language processing (NLP). NLP allows search engines such as Hakia and Powerset to return results based on the query's meaning rather than relying on keyword distribution to identify relevant Web documents.
Stochastic search methods retrieve documents containing one or more words specified by the user. Keywords are usually drawn from the text body of a document or from metadata such as title and author. Stochastic searches frequently use Boolean search strategies to make the search more efficient, either to return the best results or to exclude results that the user knows to be unhelpful. Searches on the three major search engines all rely on some type of statistical method for calculating the relevancy of results.
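To make the Boolean keyword idea concrete, here is a minimal sketch in Python. It is not how any particular search engine works; the function names `build_index` and `boolean_search` and the three sample documents are made up for illustration, and the tokenizer is just a whitespace split.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def boolean_search(index, must=(), must_not=()):
    """Return ids of documents containing every `must` word and no `must_not` word."""
    if not must:
        return set()
    # AND: intersect the posting sets of all required words.
    result = set.intersection(*(index.get(w, set()) for w in must))
    # NOT: drop any document containing an excluded word.
    for w in must_not:
        result -= index.get(w, set())
    return result

# Hypothetical sample documents, keyed by id.
docs = {
    1: "jaguar speed in the wild",
    2: "jaguar car dealership prices",
    3: "top speed of a car",
}
index = build_index(docs)
hits = boolean_search(index, must=["jaguar"], must_not=["car"])
```

Here the user has to know in advance that "car" is the word to exclude. The search itself has no idea which sense of "jaguar" was meant; it only matches strings.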
How does keyword search fall short? It falls short because the relevancy of documents is calculated based in part on the occurrences and distribution of keywords rather than on what the query means. Stochastic search methods return relevant results much of the time; however, there is a great deal of room for improvement. Those improvements will involve using natural language processing to extract meaning from search queries.
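A rough sketch of that shortfall, using a simple TF-IDF style score in Python. This is an assumption-laden toy: the function name `tf_idf_scores`, the sample documents, and the smoothed IDF formula are my own choices for illustration, and real engines combine many more signals.

```python
import math
from collections import Counter

def tf_idf_scores(query, docs):
    """Score each document by summed term frequency times a smoothed IDF."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for doc_id, words in tokenized.items():
        counts = Counter(words)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for w in tokenized.values() if term in w)
            if df == 0 or not words:
                continue
            # One common smoothing choice; others exist.
            idf = math.log((1 + n_docs) / (1 + df)) + 1
            score += (counts[term] / len(words)) * idf
        scores[doc_id] = score
    return scores

# Hypothetical documents: two senses of "apple" plus an unrelated page.
docs = {
    "a": "apple pie recipe with fresh apple slices",
    "b": "apple releases new laptop",
    "c": "how to bake bread",
}
scores = tf_idf_scores("apple", docs)
```

Document "a" outscores "b" simply because the word appears more densely in it. Someone searching for the company gets the pie recipe first, and nothing in the arithmetic can tell the two meanings apart.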


Posted by Anonymous on July 8, 2007 at 8:57am
Just to set the record straight, statistical NLP *is* the dominant form of NLP. So the statistical approach to search isn't on the way out; quite the opposite, in fact. See LSI for a promising application of statistical NLP: http://en.wikipedia.org/wiki/Latent_semantic_analysis
Posted by James Simmons on July 8, 2007 at 11:37am
I wasn't referring to any methods of NLP when I was talking about statistical methods for retrieving results; however, it's funny that you would draw those parallels. Thanks for keeping me on my toes.