Many online users have encountered situations where companies collect data with the promise that it is safe because the data has been anonymized -- all personally identifiable data elements have been removed. How safe is that really? A recent study reinforced earlier findings that it isn't as safe as promised. Anonymized data can be de-anonymized, i.e., re-identified to specific individuals.
"... data can be deanonymised in a number of ways. In 2008, an anonymised Netflix data set of film ratings was deanonymised by comparing the ratings with public scores on the IMDb film website; in 2014, the home addresses of New York taxi drivers were uncovered from an anonymous data set of individual trips in the city; and an attempt by Australia’s health department to offer anonymous medical billing data could be reidentified by cross-referencing “mundane facts” such as the year of birth for older mothers and their children, or for mothers with many children. Now researchers from Belgium’s Université catholique de Louvain (UCLouvain) and Imperial College London have built a model to estimate how easy it would be to deanonymise any arbitrary dataset. A dataset with 15 demographic attributes, for instance, “would render 99.98% of people in Massachusetts unique”. And for smaller populations, it gets easier..."
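The intuition behind the researchers' finding can be shown with a toy simulation (this is not the UCLouvain model, just an illustration with a fabricated population): the more attributes a dataset carries, the larger the share of records whose attribute combination occurs only once -- and a unique combination is, in effect, a fingerprint.

```python
# Toy illustration: measure what fraction of records in a synthetic
# population are uniquely identified by a given set of demographic
# attributes. All data below is randomly generated, not real.
import random
from collections import Counter

random.seed(0)

# Hypothetical population: each person is a tuple of attribute values.
population = [
    (
        random.choice(["M", "F"]),    # sex
        random.randint(1940, 2005),   # birth year
        random.randint(1, 12),        # birth month
        random.randint(0, 99),        # zip-code bucket
        random.randint(0, 5),         # number of children
    )
    for _ in range(10_000)
]

def fraction_unique(records, attr_indices):
    """Share of records whose combination of the chosen attributes
    appears exactly once in the dataset."""
    counts = Counter(tuple(r[i] for i in attr_indices) for r in records)
    return sum(
        1 for r in records
        if counts[tuple(r[i] for i in attr_indices)] == 1
    ) / len(records)

for k in (1, 2, 3, 4, 5):
    print(f"{k} attributes: {fraction_unique(population, range(k)):.1%} unique")
```

With one or two attributes almost no one is unique; with all five, nearly everyone is -- the same pattern the study reports at a much larger scale.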
"Many commonly used anonymization techniques, however, originated in the 1990s, before the Internet’s rapid development made it possible to collect such an enormous amount of detail about things such as an individual’s health, finances, and shopping and browsing habits. This discrepancy has made it relatively easy to connect an anonymous line of data to a specific person: if a private detective is searching for someone in New York City and knows the subject is male, is 30 to 35 years old and has diabetes, the sleuth would not be able to deduce the man’s name—but could likely do so quite easily if he or she also knows the target’s birthday, number of children, zip code, employer and car model."
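The private-detective scenario above is a classic linkage attack: join an "anonymized" table to an outside record on the quasi-identifiers both share. A minimal sketch, with entirely fabricated names and data, looks like this:

```python
# Hedged sketch of a linkage attack: the "anonymized" medical table has
# names stripped but keeps quasi-identifiers; an outside record (e.g. a
# data-broker profile) joined on those same fields re-identifies the row.
# All records below are fabricated for illustration.

anonymized_medical = [
    {"birth_year": 1988, "zip": "10027", "children": 2, "diagnosis": "diabetes"},
    {"birth_year": 1988, "zip": "10031", "children": 0, "diagnosis": "asthma"},
    {"birth_year": 1975, "zip": "10027", "children": 2, "diagnosis": "flu"},
]

# What the attacker already knows about the target from public sources.
public_record = {
    "name": "John Doe", "birth_year": 1988, "zip": "10027", "children": 2,
}

QUASI_IDENTIFIERS = ("birth_year", "zip", "children")

def link(anon_rows, known):
    """Return anonymized rows matching the known person on every
    quasi-identifier; exactly one match is a re-identification."""
    return [
        r for r in anon_rows
        if all(r[q] == known[q] for q in QUASI_IDENTIFIERS)
    ]

matches = link(anonymized_medical, public_record)
if len(matches) == 1:
    print(f"{public_record['name']} re-identified: {matches[0]['diagnosis']}")
```

No field in the medical table names anyone, yet the join recovers the target's diagnosis -- which is why stripping direct identifiers alone is not anonymization.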
Data brokers, including credit-reporting agencies, have collected massive numbers of demographic data attributes about virtually every person. According to this 2018 report, Acxiom has compiled about 5,000 data elements on each of 700 million persons worldwide.
It's reasonable to assume that credit-reporting agencies and other data brokers have similar capabilities. So, data brokers' massive databases can make it relatively easy to re-identify data that was supposedly anonymized. This means consumers don't get the privacy they were promised.
What's the solution? Researchers suggest that data brokers must develop new anonymization methods and rigorously test them to ensure the anonymization truly works. And data brokers must be held to higher data security standards.
Any legislation serious about protecting consumers' privacy must address this, too. What do you think?