ECIR 2013 — Tutorials

All tutorials will take place on 24th March 2013 at Yandex Headquarters. The timetable is the same for all workshops and tutorials and can be found here.

ECIR accepted the following tutorials.

Gerard de Melo (UC Berkeley, USA), Katja Hose (Aalborg University, Denmark)
Searching the Web of Data
10:00 – 13:30
Search is currently undergoing a major paradigm shift away from the traditional document-centric "10 blue links" towards more explicit and actionable information. Users expect the system to "understand" their information need and respond to it more directly instead of only serving documents matching the given set of keywords. Recent advances in this area include Google Knowledge Graph, Virtual Personal Assistants such as Siri and Google Now, as well as the now ubiquitous entity-oriented vertical search results for places, products, etc. Apart from novel query understanding methods, these developments are largely driven by structured data that is blended into the Web Search experience. Structured data can be obtained from a wide variety of documents and Web sources by tapping into information extraction and semantic markup such as microformats. Additionally, the Web already offers publicly accessible knowledge bases, such as DBpedia, Yago, Freebase and the Linked Open Data cloud, as also used in Google's Knowledge Graph. Providing vast amounts of information about many different types of entities, these data sets can become very large, so sophisticated techniques for organizing and querying them are required. We discuss efficient indexing and query processing techniques to tackle these challenges. Finally, we present query interpretation and understanding methods to map user queries to these structured data sources, also highlighting the recent trend of virtual personal assistants like Siri.

and (University of Lugano, Switzerland)
Distributed Information Retrieval and Applications
10:00 – 13:30
Distributed Information Retrieval (DIR) is a generic area of research that brings together techniques, such as resource selection and results aggregation, dealing with data that, for organizational or technical reasons, cannot be managed centrally. Existing and potential applications of DIR methods vary from blog retrieval to aggregated search and from multimedia and multilingual retrieval to distributed Web search. In the first part of the tutorial we will briefly discuss the main DIR phases, namely resource description, resource selection, results merging and results presentation. In particular, the attendees will become familiar with the ways of building high-level descriptions of searchable collections. The large-document and small-document approaches to resource selection will be presented, as well as the classification-based approach. The main score normalization and results merging techniques will also be discussed. The first part of the tutorial will conclude with a discussion of ways to present search results from multiple sources to a user. The second and main part of the tutorial will be dedicated to various applications of DIR methods. In particular, we will discuss blog, expert and desktop search as special instances of the resource selection problem. We will then talk about the rapidly developing area of aggregated search, discussing such problems as vertical selection and results aggregation. Other applications, such as multilingual and multimedia retrieval, personal meta-search and aggregated Web search, will also be mentioned. We will conclude our tutorial by presenting potential applications of DIR techniques, such as distributed Web search, enterprise search and aggregated mobile search.

Filip Radlinski (Microsoft Research, UK), Katja Hofmann (University of Amsterdam, the Netherlands)
Practical Online Retrieval Evaluation
15:00 – 19:00
Online evaluation is an evaluation technique that allows techniques developed in the information retrieval community to be assessed based on how real users actually respond to improvements made. Because this technique is directly based on observed user behavior, it is a promising alternative to traditional offline evaluation, which is based on manual relevance assessments, especially in settings where reliable assessments are difficult to obtain (e.g., personalized search) or expensive (e.g., search by trained experts in specialized collections). Despite its advantages, and its successful use in commercial settings, online evaluation is rarely employed outside of large commercial search engines due to a perception that it is impractical at small scales. The goal of this tutorial is to show how online evaluations can be conducted in such settings, demonstrate software to facilitate its use, and promote further research in the area. We will also contrast online evaluation with standard offline evaluation, and provide an overview of online approaches.

Marie-Francine Moens and Ivan Vulic (University of Leuven, Belgium)
Cross-Lingual Probabilistic Topic Modeling and its Applications in Information Retrieval
15:00 – 19:00
Cross-lingual topic models are a fairly novel group of unsupervised, language-independent and generative machine learning models that can be effectively trained on a large volume of non-parallel, comparable multilingual data (e.g., multilingual Wikipedia or news data discussing the same events). They offer an elegant way to represent content across different languages. Their probabilistic framework allows for their easy integration into a language modeling framework for cross-lingual information retrieval. The half-day tutorial will give an overview of recent advances in cross-lingual topic modeling and retrieval. It includes: (1) A high-level overview of the key intuitions and assumptions behind topic modeling in general and cross-lingual topic modeling in particular; (2) The methodology and mathematical foundations; and (3) The application of these models in various cross-lingual tasks, with a special focus on cross-lingual information retrieval models. The tutorial first introduces the concept of probabilistic topic modeling, starting from monolingual contexts, where we introduce the key intuitions and describe the most prominent monolingual models such as probabilistic latent semantic analysis (pLSA) and latent Dirichlet allocation (LDA). We then present a representative cross-lingual topic model called bilingual LDA (BiLDA). We explain its generative story, its training techniques (variational inference and Gibbs sampling) and its inference procedure on unseen text documents. Finally, an important part of the tutorial focuses on the applications of the cross-lingual topic models, where the emphasis is on cross-lingual retrieval models. We also present how to use the knowledge from the models for the tasks of cross-lingual event clustering, cross-lingual document classification and cross-lingual semantic similarity of words.