Building Search Engines without Human Judgments: Distant Supervision and Behavioral Data

Jiepu Jiang
Ph.D. Candidate, Department of Computer Science
University of Massachusetts Amherst
SERC 306
Monday, February 26, 2018 - 11:00
Search engines often require large-scale human annotation to improve and evaluate their services. In this talk, we introduce research toward reducing this human labor, with a focus on methods that do not use private user data. First, we present a method called similarity-based distant supervision, which automatically generates training labels for text retrieval in a new environment using only open data sources such as Wikipedia and existing test collections. The method can be refined substantially with just a few human judgments, even though those judgments alone are insufficient to train a search engine directly. Second, we introduce methods for evaluating user experience with a new kind of search service, the intelligent voice assistant, without asking users for explicit feedback. These techniques predict user experience from behavioral data and user modeling. They can diagnose the quality of a running system instantly without interrupting users, and they generalize better to new, unseen intelligent assistant tasks than offline evaluation methods such as computing precision and recall.