Search Result Diversification

Studies show that the vast majority of queries to search engines are short and vague in specifying a user’s intent. Different users may have completely different information needs and goals when issuing exactly the same query. For example, User A may issue the query "apple" to find information about Apple Inc., while User B issues the same query to find information about the fruit. For such a query, a search engine returns a list of documents that mixes different topics, and it takes time for users to pick out the information they want. Search result diversification is an effective way to address this problem: it produces a result list that covers as many aspects of the query as possible, so that most users can be satisfied by the top results.
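To make the coverage idea concrete, here is a minimal greedy sketch in Python. It assumes each candidate document already carries a relevance score and a set of aspect labels (for "apple", say "company" and "fruit"); the names `Doc`, `marginal_gain`, `diversify`, and the trade-off weight `lam` are hypothetical, and this is a generic coverage-based reranker rather than any particular published method. At each step it picks the document with the best mix of relevance and aspects not yet covered by earlier picks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    relevance: float    # hypothetical relevance score, e.g. in [0, 1]
    aspects: frozenset  # hypothetical aspect labels the document covers


def marginal_gain(doc, covered, lam):
    """Weighted sum of relevance and the number of not-yet-covered aspects."""
    novelty = len(doc.aspects - covered)
    return lam * doc.relevance + (1 - lam) * novelty


def diversify(candidates, k, lam=0.5):
    """Greedily select up to k documents, favoring ones that add new aspects."""
    selected, covered = [], set()
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda d: marginal_gain(d, covered, lam))
        selected.append(best)
        covered |= best.aspects  # mark this document's aspects as covered
        remaining.remove(best)
    return selected
```

With candidates `d1` (relevance 0.9, aspect "company"), `d2` (0.85, "company"), and `d3` (0.6, "fruit"), `diversify(docs, 2)` returns `d1` then `d3`, whereas ranking by relevance alone (`lam=1.0`) returns `d1` then `d2`. Adding a raw relevance score to an aspect count mixes scales; a real system would normalize the two terms.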

Query Facet/Dimension Mining

We address the problem of finding multiple groups of words or phrases that explain the underlying facets of a query; we refer to these groups as query dimensions or query facets. We assume that the important aspects of a query are usually presented, and repeated, in the query’s top retrieved documents in the form of lists, so query facets can be mined by aggregating these significant lists.
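The aggregation step can be illustrated with a simplified, hypothetical sketch; it is not the authors' algorithm. It assumes lists have already been extracted from the top-ranked documents (for example from HTML list elements) and are passed in per document. Lists whose items overlap (Jaccard similarity at or above the assumed threshold `min_overlap`) are merged in a single first-fit pass, a group counts as a facet only if lists from at least `min_support` distinct documents support it, and items within a facet are ranked by how many documents list them. The function names and both thresholds are illustrative assumptions.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap between two item sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mine_facets(doc_lists, min_overlap=0.3, min_support=2):
    """Group lists from top-ranked documents into candidate query facets.

    doc_lists maps a document id to the lists extracted from that document,
    each list being a sequence of item strings.
    """
    clusters = []  # each cluster: {"items": set, "item_docs": item -> set of doc ids}
    for doc_id, lists in doc_lists.items():
        for raw in lists:
            items = {s.strip().lower() for s in raw if s.strip()}
            if len(items) < 2:
                continue  # a single item says little about a facet
            # Merge into the first cluster with enough overlap, else start a new one.
            target = next((c for c in clusters
                           if jaccard(items, c["items"]) >= min_overlap), None)
            if target is None:
                target = {"items": set(), "item_docs": defaultdict(set)}
                clusters.append(target)
            target["items"] |= items
            for item in items:
                target["item_docs"][item].add(doc_id)

    facets = []
    for c in clusters:
        support = set().union(*c["item_docs"].values())
        if len(support) < min_support:
            continue  # keep only groups repeated across several documents
        # Rank items by how many distinct documents list them, then alphabetically.
        ranked = sorted(c["items"], key=lambda i: (-len(c["item_docs"][i]), i))
        facets.append((len(support), ranked))
    facets.sort(key=lambda f: -f[0])  # best-supported facets first
    return [ranked for _, ranked in facets]
```

For the query "apple", lists such as `["iPhone", "iPad", "Mac"]` and `["Fuji", "Gala", "Honeycrisp"]` that recur across several result pages end up as two separate facets (products and apple varieties), while a navigation list like `["Home", "About", "Contact"]` found on only one page is discarded. The first-fit clustering is order-dependent and deliberately simple; a fuller approach would weight lists by quality and use a proper clustering algorithm.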

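Search engines have, to some extent, solved the problem of information overload caused by the sheer scale of the Web: by entering a few simple keywords, users can find relevant websites or web pages among the vast amount of Internet content. In recent years, however, with the rapid growth of the Internet, and of the mobile Internet in particular, the number of Web documents and the richness and complexity of their content have increased greatly. As the Internet moves into the era of big data, users' information needs are also becoming more complex. Beyond basic information retrieval, there is a growing demand for in-depth understanding and aggregated analysis of large numbers of related documents, a kind of information need that traditional web search engines can no longer satisfy. To address this problem, we propose the concept of an "Internet analysis engine".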


