Deanonymizing data

Anonymity is a big deal these days. And it should be, because the proliferation of personal computing devices with multiple radio interfaces puts individual privacy in question. Consider the de-anonymization of the dataset released for the Netflix Prize. A recent paper from the University of Texas showed that the released records were not anonymous at all: given a little side information about a person, it was easy to pick out that person's record.
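To make the "little bit of side information" idea concrete, here is a toy sketch of a linkage attack. It is not the algorithm from the paper. The record layout, the function names `match_score` and `reidentify`, the rating tolerance, and the `min_gap` threshold are all assumptions made up for illustration. The idea is just to score every released record against a few ratings you happen to know about someone, and accept the top match only if it clearly beats the runner-up.

```python
from typing import Dict, Optional

# Hypothetical record layout: movie title -> rating on a 1-5 scale.
Record = Dict[str, int]


def match_score(record: Record, side_info: Record, tolerance: int = 1) -> int:
    """Count the side-information items the record agrees with, within a rating tolerance."""
    return sum(
        1
        for movie, rating in side_info.items()
        if movie in record and abs(record[movie] - rating) <= tolerance
    )


def reidentify(db: Dict[str, Record], side_info: Record, min_gap: int = 2) -> Optional[str]:
    """Return the pseudonym of the best-matching record, or None if it does not stand out."""
    scores = sorted(
        ((match_score(rec, side_info), pid) for pid, rec in db.items()),
        reverse=True,
    )
    if not scores:
        return None
    best_score, best_id = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0
    # Only claim a match when the best record clearly beats the second best.
    return best_id if best_score - runner_up >= min_gap else None


# Made-up "anonymized" release: names stripped, pseudonyms kept.
released = {
    "user_001": {"Movie A": 5, "Movie B": 2, "Movie C": 4},
    "user_002": {"Movie A": 3, "Movie D": 1, "Movie E": 5},
    "user_003": {"Movie B": 4, "Movie D": 5, "Movie F": 2},
}

# Side information: three ratings we know about the target, one slightly off.
known = {"Movie A": 3, "Movie D": 1, "Movie E": 4}

print(reidentify(released, known))  # user_002
```

Even in this tiny example, three approximate ratings are enough to single out one record, while a single known rating is not. The gap test is there so that ambiguous side information returns nothing instead of a wrong guess.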

It is clear that some information must be removed from a database or a set of trajectories in order to prevent re-identification. But WHAT part of the information to remove is not so clear! I am pretty sure information theory must have something useful to say about this problem. In particular, the Information Bottleneck Method of Tishby et al. might be a useful place to look for answers. A pending job for the summer.
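For reference, the Information Bottleneck objective in its standard form looks for a compressed representation T of the data X that keeps as much information as possible about a relevant variable Y, where T depends on Y only through X. How this maps onto anonymization is my assumption, not something the method states: one possible reading takes X as the raw record, T as the released version, and Y as whatever the release should remain useful for, with the multiplier β setting the trade-off between the two.

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)
```

Under that reading, a small β favors throwing information away (small I(X;T)), and a large β favors keeping the release useful (large I(T;Y)).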