Zhicheng Dou

窦志成

Zhicheng Dou has been a professor at the School of Information, Renmin University of China, since September 2014. He received his Ph.D. and B.S. degrees in computer science and technology from Nankai University in 2008 and 2003, respectively. After receiving his Ph.D., he worked at Microsoft Research as a researcher for more than six years (from July 2008 to September 2014). His research interests include information retrieval, search engines, web data mining, and big data analysis. Zhicheng Dou is not a pure research guy: besides writing papers, he also enjoys writing code to turn cool ideas into real systems.

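Dr. Zhicheng Dou received his B.S. and Ph.D. degrees in computer science from Nankai University in 2003 and 2008, respectively. After completing his Ph.D. in 2008, he joined Microsoft Research Asia as a researcher. In September 2014 he joined Renmin University of China as a specially appointed researcher. His main research interests are information retrieval, web search, data mining, and big data. He has published more than 20 papers in well-known international conferences and journals (such as SIGIR, WWW, CIKM, WSDM, EMNLP, and IEEE TKDE). He received a Best Paper nomination at the international information retrieval conference SIGIR 2013 and the Best Paper Award at the Asia Information Retrieval Societies Conference (AIRS 2012). He has served as a program committee member for several international conferences (such as SIGIR, WWW, KDD, WSDM, and CIKM). He is one of the organizers of the Intent-2 and IMINE tasks at NTCIR, the information retrieval evaluation conference of Japan's National Institute of Informatics (NII). He is currently a member of the AIRS Steering Committee, a member of ACM and IEEE, a member of the China Computer Federation (CCF), a corresponding member of the CCF Big Data expert committee, a corresponding member of the Information Retrieval committee of the Chinese Information Processing Society of China (CIPS), a member of that committee's academic working group, a member of the CIPS Youth Working Committee, and a member of the Intelligent Services committee of the Chinese Association for Artificial Intelligence.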

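Beyond his research, Dr. Dou enjoys turning research ideas into working systems. During his time at Microsoft Research Asia, he took part in the development of several projects, such as WebStudioProjectQ and WebSensor. He holds a number of patents, and several technologies he helped develop have been successfully transferred into Microsoft products (such as Bing search and Office). Dr. Dou also designed and developed the "时事探针" (roughly, "Current Affairs Probe") system, which is internationally leading in technologies such as web big data sensing and real-time multidimensional analysis.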

Information Retrieval, Web Search, Data Mining, Big Data

Recent Publications

  • Xiaojie Wang, Zhicheng Dou, Tetsuya Sakai, and Ji-Rong Wen. Evaluating Search Result Diversity using Intent Hierarchies. In Proceedings of SIGIR, 2016.
  • Zhicheng Dou, Zhengbao Jiang, Sha Hu, Ji-Rong Wen, Ruihua Song: Automatically Mining Facets for Queries from Their Search Results. IEEE Trans. Knowl. Data Eng. (TKDE) 28(2):385-397 (2016)
  • Sha Hu, Zhicheng Dou, Xiaojie Wang, Tetsuya Sakai, and Ji-Rong Wen. 2015. Search Result Diversification Based on Hierarchical Intents. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (CIKM '15). ACM, New York, NY, USA, 63-72.

News

  • AIRS 2016: The Twelfth AIRS (AIRS 2016) will be hosted by the Chinese Information Processing Society of China and co-organized by Tsinghua University and Renmin University of China. I serve as the general co-chair.
  • Datasets for Hierarchical Search Result Diversification and Evaluation were released: Diversification, Evaluation

Search Result Diversification

Studies show that the vast majority of queries to search engines are short and vague in specifying a user’s intent. Different users may have completely different information needs and goals when issuing exactly the same query. For example, User A issues the query "apple" to find information about the company Apple, while User B issues the same query to find information about the fruit. When such a query is issued, search engines return a list of documents that mixes different topics, and it takes time for a user to pick out the information he/she wants. Search result diversification is an effective way to address this problem: it provides a list of results that covers as many aspects of the query as possible, so that most users can be satisfied by the top results.
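To make the idea concrete, here is a minimal sketch of one simple way to diversify a result list: a greedy re-ranker that repeatedly picks the document covering the most aspects not yet covered, breaking ties by relevance score. This is only an illustration, not the method of any of the papers listed above. The function name `diversify`, the document fields (`id`, `score`, `aspects`), and the toy data are all assumptions made for the example; in particular, it assumes each document already comes labeled with the aspects it covers.

```python
def diversify(candidates, k):
    """Greedily select up to k document ids.

    Each candidate is a dict with hypothetical fields:
      "id"      - document identifier
      "score"   - relevance score (higher is better)
      "aspects" - set of query aspects the document covers
    At each step, pick the document that adds the most uncovered
    aspects; ties are broken by the higher relevance score.
    """
    selected = []
    covered = set()
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda d: (len(d["aspects"] - covered), d["score"]),
        )
        selected.append(best["id"])
        covered |= best["aspects"]
        remaining.remove(best)
    return selected


# Toy results for the ambiguous query "apple" (made-up data).
docs = [
    {"id": "d1", "score": 0.9, "aspects": {"company"}},
    {"id": "d2", "score": 0.8, "aspects": {"company"}},
    {"id": "d3", "score": 0.7, "aspects": {"fruit"}},
]

top2 = diversify(docs, 2)
print(top2)
```

A pure relevance ranking would put d1 and d2 (both about the company) on top; the greedy coverage step instead promotes d3, so both interpretations of the query appear in the top two results.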

Query Facet/Dimension Mining

We address the problem of finding query facets (also called query dimensions): multiple groups of words or phrases that explain the underlying aspects of a query. We assume that the important aspects of a query are usually presented and repeated in the query’s top retrieved documents in the style of lists, and that query facets can be mined by aggregating these significant lists.
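The aggregation idea can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the published algorithm: it assumes the lists have already been extracted from the top retrieved documents, merges lists whose items overlap (Jaccard similarity at or above a hypothetical threshold `min_overlap`), and ranks the resulting groups by how many distinct documents support them. The function name `mine_facets` and the toy data are invented for the example.

```python
def mine_facets(doc_lists, min_overlap=0.5):
    """Aggregate lists from top results into candidate facets.

    doc_lists: iterable of (doc_id, [item, ...]) pairs, one per list
    found in a retrieved document. Returns a list of facets, each a
    list of items, ordered by the number of supporting documents.
    """
    groups = []  # each group: {"items": {item: set(doc_ids)}, "docs": set(doc_ids)}
    for doc_id, items in doc_lists:
        item_set = {i.strip().lower() for i in items if i.strip()}
        if not item_set:
            continue
        # Find the first existing group that overlaps enough with this list.
        target = None
        for group in groups:
            known = set(group["items"])
            jaccard = len(item_set & known) / len(item_set | known)
            if jaccard >= min_overlap:
                target = group
                break
        if target is None:
            target = {"items": {}, "docs": set()}
            groups.append(target)
        for item in item_set:
            target["items"].setdefault(item, set()).add(doc_id)
        target["docs"].add(doc_id)

    # Groups repeated across more documents are treated as more important.
    groups.sort(key=lambda g: len(g["docs"]), reverse=True)
    return [
        sorted(g["items"], key=lambda it: (-len(g["items"][it]), it))
        for g in groups
    ]


# Toy lists for the query "watches" (made-up data).
lists = [
    ("doc1", ["Gucci", "Rolex", "Omega"]),
    ("doc2", ["rolex", "omega", "Seiko"]),
    ("doc3", ["men", "women", "kids"]),
    ("doc4", ["Men", "Women"]),
    ("doc5", ["Rolex", "Omega", "Seiko", "Gucci"]),
]

facets = mine_facets(lists)
print(facets)
```

On this toy input the brand lists collapse into one facet and the audience lists into another, with the brand facet ranked first because three documents repeat it. A real system would also need list extraction from HTML and a more careful weighting of lists and items, which this sketch leaves out.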