Monday, June 8, 2009

Exploiting Web Search Engines to Search Structured Databases

Exploiting Web Search Engines to Search Structured Databases is a Microsoft paper about integrating verticals into Web search results. The key idea is that each vertical is associated a priori with a list of relevant entities. When a user submits a query, those entities are extracted from the snippets of the search results. In this way, the query itself is implicitly expanded with the vertical-related entities found in the search results (a similar idea has been investigated by Yahoo in a paper on advertising).

The paper discusses many off-line pre-processing techniques for extracting entities before query time. A mixed approach based on trie pattern matching and SVM classification is proposed. Extracted entities are then ranked by taking into account proximity, frequency count and the relevance of the documents they appear in. For a given query, a vertical result is triggered when 'good enough' entities are retrieved. A case study based on a product search vertical and Microsoft Live Search is discussed. Entities are extracted from Wikipedia, TREC QA, and IMDB.
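To make the pipeline a bit more concrete, here is a minimal sketch in Python of the matching and triggering steps. It is not the paper's implementation: the function names, the 1/rank scoring and the threshold value are my own assumptions, and it leaves out both the SVM classification step and the proximity feature. It only shows how a token-level trie can spot known entities in result snippets and how a simple score can decide whether the vertical is triggered.

```python
import re
from collections import defaultdict

END = None  # sentinel key marking the end of an entity; tokens are always str


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"\w+", text.lower())


def build_trie(entities):
    """Build a token-level trie from the vertical's entity list (done off-line)."""
    root = {}
    for name in entities:
        node = root
        for token in tokenize(name):
            node = node.setdefault(token, {})
        node[END] = name
    return root


def match_entities(tokens, trie):
    """Yield the longest known entity starting at each position, left to right."""
    i = 0
    while i < len(tokens):
        node, j, best = trie, i, None
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                best = (j, node[END])
        if best:
            yield best[1]
            i = best[0]  # skip past the matched entity
        else:
            i += 1


def score_entities(snippets, trie):
    """Assumed scoring: each mention adds 1/rank of the snippet it occurs in,
    so frequent mentions in highly ranked results score higher."""
    scores = defaultdict(float)
    for rank, snippet in enumerate(snippets, start=1):
        for entity in match_entities(tokenize(snippet), trie):
            scores[entity] += 1.0 / rank
    return dict(scores)


def trigger_vertical(scores, threshold=1.0):
    """Return the 'good enough' entities, best first; empty means no vertical."""
    good = [e for e, s in scores.items() if s >= threshold]
    return sorted(good, key=lambda e: -scores[e])
```

For example, with the hypothetical entity list `["Canon EOS 400D", "Nikon D40"]` and the snippets `["Review of the Canon EOS 400D camera", "Canon EOS 400D vs Nikon D40: which is best"]`, only the Canon entity passes the default threshold, so the product vertical would be triggered with that single entity.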

The approach is quite effective and the performance is pretty good. However, it may show some limits for real-time verticals (such as Twitter or News), where entities are not known a priori.

1 comment:

  1. Antonio, the link to the paper doesn't appear to be working. Would you send me a PDF copy (if you have one)?
