Achieving Privacy and Utility

The Best of Both Worlds: Achieving Privacy and Utility

Reaping the benefits of a data-rich world without sacrificing our privacy.

Summary

Every time we make a purchase, walk through a public place, use our cellphones, or go to the doctor, someone records data on our activities. This collection of data holds promise but also poses a threat. Take medical data as an example: researchers can use large collections of medical records to discover new treatments, yet patients do not want their records made public. Theoretical computer scientists have developed methods that achieve both goals, utility and privacy, simultaneously.

One powerful approach to this problem is given by secure multi-party computation, which allows users that hold different sets of secret data to collaborate and analyze all their data together, in such a way that no user learns anything about anyone else’s secret data except for whatever is revealed by the output of the analysis. Remarkably, theoretical computer scientists have shown that, in principle, any desired analysis can be done while maintaining this maximal level of privacy, in natural but somewhat simplified models of interaction between participants. Ongoing research is developing new techniques to allow such analyses to be efficiently carried out in a much richer and more realistic variety of software and networked settings.
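To make the idea concrete, here is a minimal sketch of one simple building block often used in this area: additive secret sharing, applied to computing a joint sum. It is an illustration under simplifying assumptions, not a protocol described in this article. All parties are simulated in a single process, they are assumed to follow the protocol honestly, and the names `share`, `secure_sum`, `MODULUS`, and the sample counts are hypothetical.

```python
import secrets

# Hypothetical public modulus; assumed larger than any possible true total.
MODULUS = 2**61 - 1

def share(value, n_parties, modulus=MODULUS):
    """Split `value` into n_parties random shares that sum to it mod `modulus`.

    Any n_parties - 1 of the shares are uniformly random, so on their own
    they carry no information about `value`.
    """
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]

def secure_sum(private_values, modulus=MODULUS):
    """Simulate honest parties jointly computing the sum of their inputs."""
    n = len(private_values)
    # Row i holds the shares created by party i; entry j is sent to party j.
    distributed = [share(v, n, modulus) for v in private_values]
    # Party j adds up the shares it received and publishes only that subtotal.
    subtotals = [
        sum(distributed[i][j] for i in range(n)) % modulus
        for j in range(n)
    ]
    # The published subtotals add up to the true total and nothing more.
    return sum(subtotals) % modulus

# Made-up case counts held privately by three hospitals.
hospital_counts = [120, 45, 310]
print(secure_sum(hospital_counts))  # 475
```

Each hospital sees only random-looking shares from the others, yet the published subtotals combine to the exact total. The output itself is still revealed to everyone, which is precisely the limit of the guarantee described above.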

But what analyses should be permitted? Even in a setting in which the data are all held, unencrypted, by a trusted curator, how can the curator provide useful information while preserving the privacy of the individual data items? To appreciate the problem, consider a “differencing attack,” in which a researcher makes two queries to the curator: How many people are HIV positive? How many people whose name is not John Q. Public are HIV positive? Exact answers to these two queries reveal the HIV status of Mr. Public. Scientists are beginning to gain an understanding of what “privacy” should mean in this setting, and to develop techniques that provide accurate, privacy-preserving answers whenever that is theoretically possible.
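The differencing attack takes only a few lines to reproduce. The sketch below uses a tiny invented dataset and a hypothetical curator function, `count_positive`, that answers counting queries exactly; none of these names or records come from a real system.

```python
# Hypothetical toy records; every name and status here is invented.
records = [
    {"name": "John Q. Public", "hiv_positive": True},
    {"name": "Jane Roe", "hiv_positive": False},
    {"name": "Richard Miles", "hiv_positive": True},
]

def count_positive(rows, exclude_name=None):
    """Curator's exact counting query, optionally excluding one name."""
    return sum(
        1 for r in rows
        if r["hiv_positive"] and r["name"] != exclude_name
    )

# Query 1: how many people are HIV positive?
everyone = count_positive(records)
# Query 2: how many people not named John Q. Public are HIV positive?
everyone_but_target = count_positive(records, exclude_name="John Q. Public")

# The difference between the two exact answers is the target's own status.
target_is_positive = (everyone - everyone_but_target) == 1
print(everyone, everyone_but_target, target_is_positive)  # 2 1 True
```

Neither query mentions an individual's record directly, and each looks like a harmless aggregate. Only the combination of two exact answers leaks the private fact.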

Rationale

Encryption secures our stored data but seems to make it inert: Can we analyze encrypted data without having to decrypt it first? Cryptographers are developing a number of techniques to accomplish this goal in different settings, leading to a wide variety of applications. For instance, scientists have begun developing techniques to write software that incorporates secret information in such a way that the secrets are kept hidden even from the holder of the software, with applications ranging from digital rights management to homeland security. This work is closely related to one of the most surprising achievements of theoretical computer science: the development of secure multi-party computation protocols, which allow users that hold different sets of secret data to collaborate and analyze all their data together, in such a way that no user learns anything about anyone else’s secret data except for whatever is revealed by the output of the analysis. Such tools could be invaluable in allowing medical researchers to make use of the vast medical datasets held by different hospitals. For instance, AIDS researchers would be able to calculate population statistics on the disease without identifying the individuals who have it.

The HIV scenario mentioned in the summary highlights a weakness in the privacy guarantee offered by secure multi-party computation: the only promise is that nothing is revealed about an individual beyond what is revealed by the outcome of the computation. The guarantee does not speak to the question of which computations, or sets of computations, can safely be carried out. Put differently, in multi-party computation, functionality is paramount and privacy is only as good as the functionality permits. Privacy researchers are now asking a different question: when privacy is paramount, what functionality can be achieved? This question makes sense even in a setting in which the data are all held, unencrypted, by a trusted curator. How can the curator provide useful information while preserving the privacy of the individual data items?

Contributors and Credits

Cynthia Dwork, Kristin Yvonne Rozier, Amit Sahai, Salil Vadhan