Tutorial: Information Retrieval Challenges in Computational Advertising

Andrei Broder - Yahoo! Research
Evgeniy Gabrilovich - Yahoo! Research
Vanja Josifovski - Yahoo! Research

Web advertising supports a large swath of the Internet ecosystem. It brings revenue to countless publishers that rent space on their pages for advertising, from small mom-and-pop shops to major search engines. It also provides valuable traffic to numerous commercial Web sites and has fueled the development of Web search engines.

At its core, Web advertising follows the two major models of traditional advertising. Direct advertising aims to elicit a direct response from the user. Sponsored search is an example of direct advertising, where ads are shown alongside the Web search results. The sponsored search and the Web search are evaluated as two separate queries over two different corpora: the former over the ads provided by advertisers, and the latter over the crawled Web pages. Sponsored search has grown into a $10+ billion industry that affects the search experience of virtually every Internet user, and it is a major channel for traffic acquisition on the Web. While sponsored search is related to Web search and some techniques can be used in both, the differences in document structure, corpus size and other issues pose interesting research challenges for the Search and Information Retrieval communities.

In contrast with direct advertising, brand advertising aims to give a favorable impression of a brand or a product. Brand advertising on the Web is usually done with graphical ads (display ads) placed on publishers' Web pages. Some display advertising includes a direct response component, where the advertiser's aim is a click on the ad that leads to a visit to the advertiser's Web site. In display advertising there is no explicit user query, and ad selection is based on the page where the ad is placed (contextual advertising) or on the user's past activities (behavioral targeting). In both cases, sophisticated learning algorithms are needed to provide relevant ads to the user. Many display advertising mechanisms, such as traffic forecasting, ad selection, and pricing, are just starting to attract the attention of the research community.

Computational advertising is a new scientific discipline that aims to formalize the problem of finding the best ad for a given user in a specific context. In traditional advertising, the number of venues is small, the cost per venue is high, and little or no personalization is possible (as, for example, in print magazines). In contrast, in online advertising there are billions of opportunities (page views) and hundreds of millions of ads, and it is possible to provide personalization with quantifiable results. This brings advertising into the realm of the other "computational" sciences.

The aim of this tutorial is to present the state of the art in the emerging area of computational advertising, and to expose the participants to the main research challenges in this exciting field. The tutorial does not assume any prior knowledge of Web advertising, and will begin with a comprehensive background survey of the topic.

In this tutorial, we focus on one important aspect of online advertising, namely, using the user context to retrieve relevant ads. It is essential to emphasize that in most cases the context of user actions is defined by a body of text, hence the ad matching problem lends itself to many IR methods. To a first approximation, the process of obtaining relevant ads can be reduced to conventional information retrieval, where one constructs a query that describes the user's context, and then executes this query against a large inverted index of ads. We show how to augment the standard information retrieval approach using query expansion and text classification techniques. We demonstrate how to employ a relevance feedback assumption, treating the Web search results retrieved by the query as relevant to it. This step allows one to use the Web as a repository of relevant query-specific knowledge. We also go beyond conventional bag-of-words indexing, and construct additional features using a large external taxonomy and a lexicon of named entities obtained by analyzing the entire Web as a corpus.

Computational advertising poses numerous challenges and open research problems in text summarization, natural language generation, named entity recognition, computer-human interaction, and other areas. Part of the tutorial will be devoted to recent research results as well as open problems, such as automatically identifying cases when no ads should be shown, and using natural language generation to automatically create advertising campaigns.
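As a rough illustration of the retrieval view described above, the following Python sketch builds a tiny inverted index of ads, turns the text of the user's context into a weighted query, and optionally expands that query with terms from documents assumed to be relevant (standing in for Web search results). It is a minimal sketch under stated assumptions, not the method presented in the tutorial: the TF-IDF scoring, the stopword list, the expansion weight, and all function names and sample ads are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with", "at", "is"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def build_index(ads):
    """Build an inverted index mapping each term to {ad_id: term frequency}."""
    index = defaultdict(dict)
    for ad_id, text in ads.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][ad_id] = tf
    return index

def context_query(context_text):
    """Describe the user's context as a bag of words: term -> weight."""
    return dict(Counter(tokenize(context_text)))

def expand_query(query, feedback_docs, num_terms=2, weight=0.5):
    """Add the most frequent terms of the feedback documents to the query.

    The feedback documents are simply assumed to be relevant, which is the
    relevance feedback assumption; no relevance judgments are checked here.
    """
    counts = Counter()
    for doc in feedback_docs:
        counts.update(tokenize(doc))
    expanded = dict(query)
    for term, _ in counts.most_common(num_terms):
        expanded[term] = expanded.get(term, 0) + weight
    return expanded

def retrieve(index, num_ads, query, k=3):
    """Rank ads by the sum of query weight * tf * idf over shared terms."""
    scores = defaultdict(float)
    for term, q_weight in query.items():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + num_ads / len(postings))
        for ad_id, tf in postings.items():
            scores[ad_id] += q_weight * tf * idf
    # Sort by descending score, breaking ties by ad id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

# Hypothetical sample data.
ads = {
    "ad_ski": "discount ski rental and lift tickets",
    "ad_hotel": "mountain resort hotel deals",
    "ad_laptop": "laptop sale free shipping",
}
index = build_index(ads)
query = context_query("Best slopes for skiing this winter: ski resort guide")
plain_results = retrieve(index, len(ads), query)

# Snippets standing in for Web search results retrieved by the same query.
snippets = ["ski rental shops near the slopes", "lift tickets and ski rental prices"]
expanded_results = retrieve(index, len(ads), expand_query(query, snippets))
```

In this toy setup the unexpanded query matches the ski and hotel ads on one term each, while the expanded query gives extra weight to "ski" and adds "rental", which moves the ski ad to the top. A production system would replace each piece (tokenization, scoring, the source of feedback documents) with something considerably more elaborate.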

 

Speakers:

 

Andrei Broder - Yahoo! Research

Dr. Andrei Broder is a Yahoo! Research Fellow and Vice President for Computational Advertising. Previously he was an IBM Distinguished Engineer and the CTO of the Institute for Search and Text Analysis in IBM Research. From 1999 until early 2002 he was Vice President for Research and Chief Scientist at the AltaVista Company. Before that he was a senior member of the research staff at Compaq's Systems Research Center in Palo Alto. He graduated summa cum laude from the Technion - Israel Institute of Technology, and obtained his M.Sc. and Ph.D. in Computer Science at Stanford University under Don Knuth. He has published over a hundred papers and was awarded twenty patents. He is a fellow of ACM and IEEE, and served as chair of the IEEE Technical Committee on Mathematical Foundations of Computing. http://research.yahoo.com/Andrei_Broder

 

Evgeniy Gabrilovich - Yahoo! Research

Dr. Evgeniy Gabrilovich is a Senior Research Scientist and Manager of the NLP & IR Group at Yahoo! Research. His research interests include information retrieval, machine learning, and computational linguistics. Evgeniy is a recipient of the 2010 Karen Spärck Jones Award for his contributions to natural language processing and information retrieval. He served as a Senior PC member or Area Chair at SIGIR, AAAI, IJCAI, WWW, WSDM, EMNLP, ICDM, and ICWSM, and served on the program committees of virtually every major conference in the field. He organized a number of workshops and taught multiple tutorials at SIGIR, ACL, IJCAI, AAAI, CIKM, and EC. Evgeniy earned his MSc and PhD degrees in Computer Science from the Technion - Israel Institute of Technology. http://research.yahoo.com/Evgeniy_Gabrilovich

 

Vanja Josifovski - Yahoo! Research

Dr. Vanja Josifovski is a Principal Research Scientist at Yahoo! Research, where he works on search and advertisement technologies for the Internet. He is currently exploring designs for next-generation ad placement platforms for textual and behavioral advertising. Previously, Vanja was a Research Staff Member at the IBM Almaden Research Center working on several projects in database runtime and optimization, federated databases, and enterprise. He earned his MSc degree from the University of Florida at Gainesville and his PhD from Linköping University in Sweden. Vanja has published over 50 peer-reviewed publications, authored around 40 patent applications, and was on the program committees of WWW, SIGIR, ICDE, VLDB, CIKM, ICDM, KDD and other major conferences in the database, information retrieval, and search areas.

http://research.yahoo.com/Vanja_Josifovski