Wednesday, July 24, 2019

Your Data Were ‘Anonymized’? These Scientists Can Still Identify You | NYT

For details, see Estimating the success of re-identifications in incomplete datasets using generative models | Nature Communications. See also You’re very easy to track down, even when your data has been anonymized | MIT Technology Review, which notes: "On average, in the US, using those three records [zip code, gender, and date of birth], you could be correctly located in an “anonymized” database 81% of the time."

"Even anonymized data sets often include scores of so-called attributes — characteristics about an individual or household. Anonymized consumer data sold by Experian, the credit bureau, to Alteryx, a marketing firm, included 120 million Americans and 248 attributes per household.

Scientists at Imperial College London and Université Catholique de Louvain, in Belgium, reported in the journal Nature Communications that they had devised a computer algorithm that can identify 99.98 percent of Americans from almost any available data set with as few as 15 attributes, such as gender, ZIP code or marital status.

Even more surprising, the scientists posted their software code online for anyone to use. That decision was difficult, said Yves-Alexandre de Montjoye, a computer scientist at Imperial College London and lead author of the new paper."
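
To make the idea concrete, here is a minimal sketch of why a few attributes can single people out. It is not the researchers' algorithm (their paper uses a generative model to estimate uniqueness in the full population); it only counts how many rows in a small sample have a one-of-a-kind combination of attributes. The records, field names, and the `unique_fraction` helper are all made up for illustration.

```python
from collections import Counter


def unique_fraction(records, keys):
    """Return the fraction of records whose values for `keys` occur exactly once."""
    # Build the attribute combination for each record, then count repeats.
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    unique = sum(1 for c in combos if counts[c] == 1)
    return unique / len(records)


# Hypothetical toy data: no names, only "anonymous" attributes.
records = [
    {"zip": "10001", "gender": "F", "dob": "1980-01-01"},
    {"zip": "10001", "gender": "F", "dob": "1982-02-02"},
    {"zip": "10001", "gender": "M", "dob": "1975-05-05"},
    {"zip": "94110", "gender": "F", "dob": "1990-09-09"},
]

for keys in (["zip"], ["zip", "gender"], ["zip", "gender", "dob"]):
    print(keys, unique_fraction(records, keys))
```

In this toy sample, each added attribute raises the share of records that are unique, which is the same effect the quoted figures describe at national scale.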
