Data Exploration and Privacy Preservation Over Hidden Web Databases

Nan Zhang
George Washington University
Location: Wachman 447
Date: Wednesday, February 27, 2013 - 11:00

A large number of online databases are hidden in the "deep web" and are accessible only through restrictive search or browsing web interfaces. We consider third-party data analytics over these hidden databases, specifically the problems of crawling, sampling, and aggregate estimation. We also explain how recent advances in such data analytics techniques pose significant privacy threats to certain sensitive aggregate information over hidden databases. The protection of sensitive aggregates stands in sharp contrast to the traditional privacy problem, where individual tuples must be protected while access to aggregate information is preserved. We propose privacy-preserving techniques to suppress the inference of aggregate information from hidden databases.