Tutorial: Statistical Information Retrieval Modelling: From Probability Ranking Principle to recent advances in diversity, Portfolio Theory, and beyond

Jun Wang - University College London
Kevyn Collins-Thompson - Microsoft Research


 

Statistical modelling of information retrieval (IR) systems is a key driving force in the development of the IR field. The objective of this tutorial is to provide a comprehensive and up-to-date introduction to statistical IR modelling. Unlike many theoretical IR tutorials offered in the past, this tutorial takes a fresh and systematic perspective: that of the portfolio theory of information retrieval and risk management. It gives a unified treatment and new insights that reflect recent developments in considering the ranked retrieval results as a whole. Recent research progress in diversification, risk management, and the portfolio theory of information retrieval will be covered, in addition to classic methods such as Maron and Kuhns' Probabilistic Indexing, the Robertson-Sparck Jones model (and the resulting BM25 formula), and language modelling approaches. The tutorial will also review the resulting practical algorithms for risk-aware query expansion, diverse ranking, and IR metric optimization, as well as their performance evaluations. Practical IR applications such as web search engines, multimedia retrieval, and collaborative filtering will also be introduced, along with a discussion of new opportunities for future research and applications at the intersection of information retrieval, knowledge management, and databases.

 

Speakers:

Jun Wang - University College London

Dr. Jun Wang is a Senior Lecturer (Associate Professor) in Computer Science at University College London and Founding Director of the MSc/MRes Web Science programme. His main research interests are statistical modelling of information retrieval, collaborative filtering, and online advertising. He received the SIGIR Doctoral Consortium Award in 2006 and the Beyond Search award in 2007, and won the Best Paper Prize at ECIR 2009. Jun’s recent service includes (Senior) PC roles for SIGIR, CIKM, RecSys, and ECIR, and presenting an ECIR 2011 tutorial on Risk Management in Information Retrieval.

 

Kevyn Collins-Thompson - Microsoft Research

Dr. Kevyn Collins-Thompson is a Researcher in the Context, Learning and User Experience for Search (CLUES) group at Microsoft Research. His research combines information retrieval and machine learning, and focuses on theoretical models, algorithms, and evaluation methods for search technology that is both reliable and effective. His recent work has explored the use of fast optimization methods for increasing the reliability of risky algorithms such as query expansion. Kevyn's recent service includes Program Committee roles for SIGIR, ICML, UAI, CIKM, and ACL, and co-authoring and presenting an ICML 2009 tutorial on Machine Learning and Information Retrieval and an ECIR 2011 tutorial on Risk Management in Information Retrieval. Kevyn has also served as adjunct faculty at the University of Washington.