Differential privacy.
Though not an entirely new concept, the phrase became part of the common lexicon earlier this week when Apple revealed it would implement the feature starting with iOS 10. But what is it? And what does it mean for users?
According to cryptographer and Johns Hopkins University professor Matthew Green, differential privacy is a term originally developed by Cynthia Dwork and Frank McSherry of Microsoft, Kobbi Nissim of Ben-Gurion University and Adam Smith of the Weizmann Institute of Science. It defines a type of privacy protection that randomizes individual data without meaningfully changing the statistics drawn from the dataset. That is, a dataset covered by differential privacy measures will protect individual identities but still yield nearly the same outputs and conclusions as a dataset that includes identifying information.
As Green put it in a Tuesday blog post, “Imagine you have two otherwise identical databases, one with your information in it, and one without it. Differential Privacy ensures that the probability that a statistical query will produce a given result is (nearly) the same whether it’s conducted on the first or second database.”
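To make that definition concrete, here is a minimal sketch in Python of the standard Laplace-noise approach to a count query. It is an illustration of the "two databases" idea only, not Apple's implementation, and the data and epsilon parameter are made up: because random noise is added to the answer, the result looks essentially the same whether or not any one person's record is present.

```python
# Minimal sketch of the "two databases" idea using Laplace noise,
# a standard differential privacy mechanism; not Apple's implementation.
import random

def noisy_count(database, epsilon=0.5):
    true_count = sum(database)  # e.g. how many users share some trait
    # The difference of two Exponential(epsilon) draws is Laplace(1/epsilon) noise.
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

with_you    = [1, 0, 1, 1, 0, 1]  # database that includes your record
without_you = [1, 0, 1, 1, 0]     # identical database without it

# Any single noisy answer reveals almost nothing about whether
# your record was included.
print(noisy_count(with_you))
print(noisy_count(without_you))
```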
OK, but how does it work?
In his post, Green noted there are several ways to achieve differential privacy. While Apple hasn’t been entirely forthcoming about the details of its new data collection system, it did give some hints.
During his address at Apple’s Worldwide Developers Conference on Monday, Apple senior vice president of software engineering Craig Federighi said Apple is planning to achieve differential privacy using three different methods: hashing, subsampling and noise injection.
As explained by Wired, hashing is a one-way cryptographic function that irreversibly scrambles data into a fixed-length string. Unlike normal encryption, which can be reversed with a key, a hash cannot be decoded; instead, the system hashes new inputs and checks whether they match the stored hash. That lets it avoid storing the original sensitive information by substituting the hash in its place.
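As a rough illustration of that check-against-the-hash pattern (a sketch only; Apple has not published its exact pipeline, and the value being hashed here is hypothetical), a SHA-256 digest can be stored in place of the raw value and compared against hashes of new inputs:

```python
# Sketch of hashing as described above; not Apple's actual pipeline.
# The raw value is never stored, only its hash, and new inputs are
# hashed and compared against the stored digest.
import hashlib

def hash_value(value: str) -> str:
    # One-way: the digest cannot be decoded back into the original value.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

stored_hash = hash_value("popular_new_emoji")  # hypothetical usage datum

def matches(new_value: str) -> bool:
    # Compare hashes, never raw values.
    return hash_value(new_value) == stored_hash

print(matches("popular_new_emoji"))  # True
print(matches("something_else"))     # False
```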
Like hashing, noise injection obscures sensitive data, but it does so by mixing random data into the real data so that individual entries cannot be reverse engineered or cross-referenced. And finally, as its name implies, subsampling analyzes only a small random sample of the data rather than the entire dataset.
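Here is one way those last two ideas could fit together, again as an assumption-laden sketch rather than Apple's actual design: each user's yes/no signal is randomized before it is reported (noise injection, via classic randomized response), and only a small fraction of the reports is examined at all (subsampling). Any single report is deniable, yet the aggregate trend can still be estimated.

```python
# Hedged sketch of noise injection plus subsampling on a per-user yes/no
# signal (e.g. "did the user pick this emoji?"); the parameters and flow
# are illustrative assumptions, not Apple's design.
import random

def randomize_response(truth: bool) -> bool:
    """Noise injection: sometimes report a coin flip instead of the truth,
    so any single user's report can be plausibly denied."""
    if random.random() < 0.5:
        return truth               # report honestly half the time
    return random.random() < 0.5   # otherwise report a random answer

def subsample(reports, rate=0.05):
    """Subsampling: examine only a small random fraction of the reports."""
    return [r for r in reports if random.random() < rate]

true_answers = [random.random() < 0.3 for _ in range(100_000)]  # ~30% truly "yes"
noisy_reports = [randomize_response(a) for a in true_answers]
sample = subsample(noisy_reports)

# Aggregate trends survive: observed rate = 0.25 + 0.5 * true rate,
# so the true rate can be estimated without trusting any single report.
observed = sum(sample) / len(sample)
print("estimated true rate:", (observed - 0.25) / 0.5)
```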
Through the use of differential privacy, Apple said it is aiming to “discover the usage patterns of a large number of users without compromising individual privacy.” As it relates to iOS 10 in particular, the technology will be used to help improve QuickType and emoji suggestions, Spotlight deep link suggestions and Lookup Hints in Notes, Apple said.
Is it really safe?
Some professionals, like Green, have “mixed feelings” about Apple’s use of differential privacy. On Twitter, Green expressed dismay that Apple appears to have jumped straight to widespread deployment of differential privacy without smaller test implementations first.
“If Apple is going to collect significant amounts of new data from the devices that we depend on so much, we should really make sure they’re doing it right — rather than cheering them for Using Such Cool Ideas,” Green wrote on his blog.
However, Aaron Roth, who co-authored a book on differential privacy with Dwork, expressed more faith in Apple’s approach. Though he declined to go into specifics, Roth told Wired “I think they’re doing it right.”