39 posts categorized "Data Brokers"

The National Auto Surveillance Database You Haven't Heard About Has Plenty Of Privacy Issues

Some consumers have heard of Automated License Plate Recognition (ALPR) cameras, the high-speed, computer-controlled technology that automatically reads and records vehicle license plates. Local governments have installed ALPR cameras on stationary objects such as street-light poles, traffic lights, overpasses, highway exit ramps, and electronic toll collection (ETC) points.

Mobile ALPR cameras have been installed on police cars and/or police surveillance vans. The Houston Police Department explained in this 2016 video how it uses the technology. Last year, a blog post discussed ALPR usage in San Diego and its data-sharing with Vigilant Solutions.

What you probably don't know: the auto repossession industry also uses the technology. Many "repo men" have ALPR cameras installed on their vehicles. The data they collect is fed into a massive, nationwide, and privately-owned database which archives license-plate images. Reporters at Motherboard obtained a private demo of the database tool to understand its capabilities.

The demo included tracking a license plate with the vehicle owner's consent. Vice reported:

"This tool, called Digital Recognition Network (DRN), is not run by a government, although law enforcement can also access it. Instead, DRN is a private surveillance system crowdsourced by hundreds of repo men who have installed cameras that passively scan, capture, and upload the license plates of every car they drive by to DRN's database. DRN stretches coast to coast and is available to private individuals and companies focused on tracking and locating people or vehicles. The tool is made by a company that is also called Digital Recognition Network... DRN has more than 600 of these "affiliates" collecting data, according to the contract. These affiliates are paid a monthly bonus for gathering the data..."

Affiliates are repo men and others who both use the database tool and upload images to it. DRN even offers financing to help affiliates buy ALPR cameras. The image on the right was taken from the site on September 20, 2019.

When consumers fail to pay their bills, lenders and insurance companies have valid needs to retrieve (or repossess) their unpaid assets. Lenders hire repo men, who then use the DRN database to find vehicles they've been hired to repossess. Those applications are valid, but there are plenty of privacy issues and opportunities for abuse.

Plenty.

First, the data collection is indiscriminate and broad. As repo men (and women) drive through cities and towns to retrieve wanted vehicles, the ALPR cameras mounted on their cars scan all nearby vehicles: both moving and parked vehicles. Scans are not limited solely to vehicles they've been hired to repossess, nor to vehicles of known/suspected criminals. So, innocent consumers are caught in the massive data collection. According to Vice:

"... in fact, the vast majority of vehicles captured are connected to innocent people. DRN claims to have more than 9 billion license plate scans, according to a DRN contract obtained by Motherboard..."

Second, the data is archived forever. That can provide a very detailed history of a vehicle's (or a person's) movements:

"The results popped up: dozens of sightings, spanning years. The system could see photos of the car parked outside the owner's house; the car in another state as its driver went to visit family; and the car parked in other spots in the owner's city... Some showed the car's location as recently as a few weeks before."

Third, to facilitate searches, metadata is automatically attached to the images: GPS or geolocation, date, time, day of week, and more. The metadata helps provide a pretty detailed history of each vehicle's -- or person's -- movements: where and when a vehicle (or person) travels, patterns such as which days of the week certain locations are visited, and how long the vehicle (or person) stayed parked at specific locations. Vice explained:

"The data is easy to query, according to a DRN training video obtained by Motherboard. The system adds a "tag" to each result, categorising what sort of location the vehicle was likely spotted at, such as "workplace" or "home."

So, DRN can help users associate specific addresses (work, home, school, doctors, etc.) with specific vehicles. While that might help repo men and insurance companies spot fraud via out-of-state registered vehicles whose owners are trying to avoid detection and/or higher premiums, it raises other concerns. For starters, how accurate are these tags?
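How might a system decide that a spot is "home" or "workplace"? Neither DRN nor the Vice report explains it, so the Python sketch below is purely hypothetical. It assumes each scan carries a rounded latitude/longitude and a timestamp, and it applies made-up time-of-day rules; the `Sighting` class, the `tag_locations` function, and the hour ranges are my inventions, not DRN's method. It shows how little metadata such guesses need, and how easily they can be wrong.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Sighting:
    """One hypothetical ALPR scan of a single vehicle."""
    plate: str
    location: tuple[float, float]  # (lat, lon) rounded so nearby scans share a key
    seen_at: datetime


def tag_locations(sightings: list[Sighting]) -> dict[tuple[float, float], str]:
    """Guess which locations are 'home' and 'workplace' for one vehicle.

    Assumed, illustrative rules (not DRN's):
      - home: the location seen most often overnight (22:00 to 05:59)
      - workplace: the location seen most often on weekdays, 09:00 to 16:59
    """
    overnight: Counter = Counter()
    weekday_daytime: Counter = Counter()
    for s in sightings:
        hour = s.seen_at.hour
        if hour >= 22 or hour < 6:
            overnight[s.location] += 1
        elif s.seen_at.weekday() < 5 and 9 <= hour < 17:  # Monday=0 ... Friday=4
            weekday_daytime[s.location] += 1

    tags: dict[tuple[float, float], str] = {}
    if overnight:
        tags[overnight.most_common(1)[0][0]] = "home"
    if weekday_daytime:
        # setdefault keeps an existing 'home' tag if both rules pick the same spot
        tags.setdefault(weekday_daytime.most_common(1)[0][0], "workplace")
    return tags


scans = [
    Sighting("ABC123", (42.36, -71.06), datetime(2019, 9, 2, 23, 15)),  # Monday night
    Sighting("ABC123", (42.36, -71.06), datetime(2019, 9, 3, 2, 40)),   # early Tuesday
    Sighting("ABC123", (42.35, -71.10), datetime(2019, 9, 3, 10, 5)),   # Tuesday morning
    Sighting("ABC123", (42.35, -71.10), datetime(2019, 9, 4, 14, 30)),  # Wednesday afternoon
]
print(tag_locations(scans))
```

Rules this crude would mislabel plenty of people; for example, a night-shift worker's hospital or factory parking lot would be tagged "home."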

Fourth, consumers -- vehicle owners -- have no control over the data describing them. Vehicle owners cannot opt out of the data collection. Vehicle owners cannot review or correct any errors in their DRN profiles.

That sounds out of control to me.

The persons whom the archived data directly describes have no say. None. That's a huge concern.

Also, I wonder about single females -- victims of domestic violence -- who have protective orders for their safety. Some states, such as Massachusetts, have Address Confidentiality Programs (ACPs) to protect victims of domestic violence, sexual assault, and stalking. Does DRN accommodate ACPs? And if so, how? And if not, why not? How does DRN prevent perps from using its database tool? (Yes, DRN access is an issue. Keep reading.) The Vice report didn't say. Hopefully, future reporting will discuss this.

Fifth, DRN is robust. It can be used to track vehicles near or in real time:

"DRN charges $20 to look up a license plate, or $70 for a "live alert", according to the contract. With a live alert, a user can enter a license plate they wish to receive updates on; when the DRN system spots the vehicle, it'll send an email to the user with the newly discovered location."

That makes DRN highly appealing to both valid users (e.g., police, repo men, insurance companies, private investigators) and bad actors posing as valid users. Who might those bad actors be? The Electronic Frontier Foundation (EFF) warned:

"Taken in the aggregate, ALPR data can paint an intimate portrait of a driver’s life and even chill First Amendment protected activity. ALPR technology can be used to target drivers who visit sensitive places such as health centers, immigration clinics, gun shops, union halls, protests, or centers of religious worship."

Sixth, there is the problem of access. Anybody can use DRN. According to Vice:

"... a private investigator, or a repo man, or an insurance company does not need a warrant to search for someone's movements over years; they just need to pay to access the DRN system, or find someone willing to share or leverage their access..."

Users simply need to comply with DRN's policies. The company says that a) users can use its database tool only for certain applications, and b) its contract prohibits users from sharing search results with third parties. We consumers have only DRN's word and assurances that it enforces its policies and that users comply. As we have seen with Facebook data breaches, it is easy for bad actors to pose as valid users in order to do end runs around such policies.
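Back to the $70 "live alert" described in the DRN contract: it amounts to a standing search. Once a user enters a plate, every new scan from every affiliate's camera is checked against it. The Python sketch below is a hypothetical illustration of that idea, not DRN's code; the `Scan` and `LiveAlerts` names and the `notify` callback (a stand-in for DRN's email) are my assumptions.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass
class Scan:
    """One hypothetical camera read uploaded by an affiliate."""
    plate: str
    location: str
    seen_at: datetime


class LiveAlerts:
    """Hypothetical standing-query service: notify subscribers when a watched plate is scanned."""

    def __init__(self, notify: Callable[[str, Scan], None]) -> None:
        self.notify = notify  # stand-in for sending an email
        self.watchlist: defaultdict[str, set[str]] = defaultdict(set)  # plate -> subscriber emails

    def subscribe(self, plate: str, email: str) -> None:
        self.watchlist[plate.upper()].add(email)

    def ingest(self, scan: Scan) -> int:
        """Check one incoming scan against the watchlist; return the number of alerts sent."""
        subscribers = self.watchlist.get(scan.plate.upper(), set())
        for email in sorted(subscribers):
            self.notify(email, scan)
        return len(subscribers)


sent = []
alerts = LiveAlerts(notify=lambda email, scan: sent.append((email, scan.location)))
alerts.subscribe("abc123", "user@example.com")
alerts.ingest(Scan("XYZ999", "Main St", datetime(2019, 9, 20, 8, 0)))  # not watched: no alert
alerts.ingest(Scan("ABC123", "Elm St", datetime(2019, 9, 20, 8, 5)))   # watched: alert sent
print(sent)  # [('user@example.com', 'Elm St')]
```

Note what this design implies: to catch one watched plate, the system must read and check every plate that passes every affiliate's camera.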

What are your opinions of ALPR cameras and DRN?


Study: Anonymized Data Can Not Be Totally Anonymous. And 'Homomorphic Encryption' Explained

Many online users have encountered situations where companies collect data with the promise that it is safe because the data has been anonymized -- all personally-identifiable data elements have been removed. How safe is this really? A recent study reinforced earlier findings that it isn't as safe as promised. Anonymized data can be de-anonymized, that is, re-identified to individual persons.

The Guardian UK reported:

"... data can be deanonymised in a number of ways. In 2008, an anonymised Netflix data set of film ratings was deanonymised by comparing the ratings with public scores on the IMDb film website; in 2014, the home addresses of New York taxi drivers were uncovered from an anonymous data set of individual trips in the city; and an attempt by Australia’s health department to offer anonymous medical billing data could be reidentified by cross-referencing “mundane facts” such as the year of birth for older mothers and their children, or for mothers with many children. Now researchers from Belgium’s Université catholique de Louvain (UCLouvain) and Imperial College London have built a model to estimate how easy it would be to deanonymise any arbitrary dataset. A dataset with 15 demographic attributes, for instance, “would render 99.98% of people in Massachusetts unique”. And for smaller populations, it gets easier..."

According to the U.S. Census Bureau, the population of Massachusetts was about 6.9 million on July 1, 2018. How did this de-anonymization problem happen? Scientific American explained:

"Many commonly used anonymization techniques, however, originated in the 1990s, before the Internet’s rapid development made it possible to collect such an enormous amount of detail about things such as an individual’s health, finances, and shopping and browsing habits. This discrepancy has made it relatively easy to connect an anonymous line of data to a specific person: if a private detective is searching for someone in New York City and knows the subject is male, is 30 to 35 years old and has diabetes, the sleuth would not be able to deduce the man’s name—but could likely do so quite easily if he or she also knows the target’s birthday, number of children, zip code, employer and car model."

Data brokers, including credit-reporting agencies, have collected a massive number of demographic data attributes about every person. According to this 2018 report, Acxiom has compiled about 5,000 data elements for each of 700 million persons worldwide.
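Why do more attributes make re-identification easier? Each extra attribute splits groups of look-alike records into smaller groups, until many records are one of a kind, as in the detective example above. For scale: if 99.98% of Massachusetts' roughly 6.9 million residents are unique on 15 attributes, only about 0.02%, or roughly 1,400 people, share their combination with anyone else. The Python sketch below counts uniqueness directly on a handful of made-up records; the records and the `fraction_unique` helper are hypothetical. It is not the UCLouvain/Imperial researchers' model, which estimates uniqueness statistically for whole populations; it only illustrates the effect.

```python
from collections import Counter


def fraction_unique(records: list[dict], attributes: list[str]) -> float:
    """Share of records whose combination of `attributes` appears exactly once."""
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical "anonymized" records: no names, only demographic attributes.
records = [
    {"sex": "M", "age_band": "30-35", "diabetic": True,  "zip": "02139", "children": 2},
    {"sex": "M", "age_band": "30-35", "diabetic": True,  "zip": "02139", "children": 0},
    {"sex": "M", "age_band": "30-35", "diabetic": True,  "zip": "02144", "children": 2},
    {"sex": "F", "age_band": "30-35", "diabetic": False, "zip": "02139", "children": 1},
    {"sex": "F", "age_band": "30-35", "diabetic": False, "zip": "02139", "children": 1},
]

print(fraction_unique(records, ["sex", "age_band", "diabetic"]))                      # 0.0
print(fraction_unique(records, ["sex", "age_band", "diabetic", "zip"]))               # 0.2
print(fraction_unique(records, ["sex", "age_band", "diabetic", "zip", "children"]))   # 0.6
```

With three attributes nobody stands out; adding ZIP code and number of children makes three of the five records unique. A broker profile with thousands of data elements goes much further.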

It's reasonable to assume that credit-reporting agencies and other data brokers have similar capabilities. So, data brokers' massive databases can make it relatively easy to re-identify data that has supposedly been anonymized. This means consumers don't have the privacy promised.
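Re-identification usually works by linkage: join the "anonymized" records to a second, named dataset on the attributes both share, much as the Netflix ratings were compared with public IMDb scores. The Python sketch below uses made-up records and placeholder names; the `reidentify` helper and the choice of ZIP code, birth date, and sex as join keys are assumptions for illustration. Any record that matches exactly one named person is re-identified.

```python
from collections import defaultdict

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# Hypothetical "anonymized" health records: names removed, demographics kept.
anonymized = [
    {"zip": "02139", "birth_date": "1986-04-12", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_date": "1990-11-03", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02144", "birth_date": "1986-04-12", "sex": "M", "diagnosis": "depression"},
]

# Hypothetical named profiles, like those a data broker might sell.
named_profiles = [
    {"name": "Person A", "zip": "02139", "birth_date": "1986-04-12", "sex": "M"},
    {"name": "Person B", "zip": "02144", "birth_date": "1986-04-12", "sex": "M"},
    {"name": "Person C", "zip": "02139", "birth_date": "1990-11-03", "sex": "F"},
    {"name": "Person D", "zip": "02139", "birth_date": "1990-11-03", "sex": "F"},
]


def reidentify(anon_rows, named_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for anonymized rows matching exactly one named row."""
    index = defaultdict(list)
    for row in named_rows:
        index[tuple(row[k] for k in keys)].append(row["name"])

    matches = []
    for row in anon_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # only one named person fits: re-identified
            matches.append((candidates[0], row["diagnosis"]))
    return matches


print(reidentify(anonymized, named_profiles))
# [('Person A', 'diabetes'), ('Person B', 'depression')]
```

The asthma record stays ambiguous only because two named people share its attributes. Give the attacker one more shared attribute and that protection may disappear.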

What's the solution? Researchers suggest that data brokers must develop new anonymization methods, and rigorously test them to ensure anonymization truly works. And data brokers must be held to higher data security standards.

Any legislation serious about protecting consumers' privacy must address this, too. What do you think?


EFF Filed Lawsuit In California Against AT&T To Stop Sales Of Wireless Customers' Realtime Geolocations

The Electronic Frontier Foundation (EFF) announced on July 16th that it had filed:

"... a class action lawsuit on behalf of AT&T customers in California to stop the telecom giant and two data location aggregators from allowing numerous entities—including bounty hunters, car dealerships, landlords, and stalkers—to access wireless customers’ real-time locations without authorization. An investigation by Motherboard earlier this year revealed that any cellphone user’s precise, real-time location could be bought for just $300. The report showed that carriers, including AT&T, were making this data available to hundreds of third parties without first verifying that users had authorized such access. AT&T not only failed to obtain its customers’ express consent, making matters worse, it created an active marketplace that trades on its customers’ real-time location data..."

The lawsuit, Scott, et al. v. AT&T Inc., et al., was filed in the U.S. District Court for the Northern District of California. The suit seeks money damages and an injunction against AT&T and the named location data aggregators: LocationSmart and Zumigo. The suit alleges AT&T violated the Federal Communications Act and engaged in deceptive practices under California’s unfair competition law. It also alleges that AT&T, LocationSmart, and Zumigo have violated California’s constitutional, statutory, and common law rights to privacy. The plaintiffs are represented by the EFF and Pierce Bainbridge Beck Price & Hecht LLP.


New Vermont Law Regulating Data Brokers Drives 120 Businesses From The Shadows

In May of 2018, Vermont was the first (and only) state in the nation to enact a law regulating data brokers. According to the Vermont Secretary of State, a data broker is defined as:

"... a business, or unit or units of a business, separately or together, that knowingly collects and sells or licenses to third parties the brokered personal information of a consumer with whom the business does not have a direct relationship."

The Vermont Secretary of State's website contains links to the new law and more. This new law is important for several reasons. First, many businesses operate as data brokers. Second, consumers historically haven't known who has information about them, nor how to review their profiles for accuracy. Third, consumers haven't been able to opt out of the data collection. Fourth, if you don't know who the data brokers are, then you can't hold them accountable when they fail to protect your data. According to Vermont law:

"2447. Data broker duty to protect information; standards; technical requirements (a) Duty to protect personally identifiable information. (1) A data broker shall develop, implement, and maintain a comprehensive information security program that is written in one or more readily accessible parts and contains administrative, technical, and physical safeguards that are appropriate... identification and assessment of reasonably foreseeable internal and external risks to the security, confidentiality, and integrity of any electronic, paper, or other records containing personally identifiable information, and a process for evaluating and improving, where necessary, the effectiveness of the current safeguards for limiting such risks... taking reasonable steps to select and retain third-party service providers that are capable of maintaining appropriate security measures to protect personally identifiable information consistent with applicable law; and (B) requiring third-party service providers by contract to implement and maintain appropriate security measures for personally identifiable information..."

Before this law, there was little to no oversight, no regulation, and no responsibility for data brokers to adequately protect sensitive data about consumers. A federal bill proposed in 2014 went nowhere in the U.S. Senate. You can assume that many data brokers operate in your state, too, since there's plenty of money to be made in the industry.

Portions of the new Vermont law went into effect in May, and the remainder went into effect on January 1, 2019. What has happened since then? Fast Company reported:

"So far, 121 companies have registered, according to data from the Vermont secretary of state’s office... The list of active companies includes divisions of the consumer data giant Experian, online people search engines like Spokeo and Spy Dialer, and a variety of lesser-known organizations that do everything from help landlords research potential tenants to deliver marketing leads to the insurance industry..."

The Fast Company site lists the 120 (so far) registered data brokers in Vermont. Regular readers of this blog will recognize some of the data brokers by name, since prior posts covered Acxiom, Equifax, Experian, LexisNexis, the NCTUE, Oracle, Spokeo, TransUnion, and others. (Yes, both credit reporting agencies and social media firms also operate as data brokers. Some states do it, too.) Reportedly, many privacy advocates support the new law:

"There’s companies that I’ve never heard of before," says Zachary Tomanelli, communications and technology director at the Vermont Public Interest Research Group, which supported the law. "It’s often very cumbersome [for consumers] to know where the places are that you have to go, and how you opt out."

Predictably, the industry has opposed (and continues to oppose) the legislation:

"A coalition of industry groups like the Internet Association, the Association of National Advertisers, and the National Association of Professional Background Screeners, as well as now registered data brokers such as Experian, Acxiom, and IHS Markit, said the law was unnecessary... Requiring companies to disclose breaches of largely public data could be burdensome for businesses and needlessly alarming for consumers, they argue... Other companies, like Axciom, have complained that the law establishes inconsistent boundaries around personal data used by third parties, and the first-party data used by companies like Facebook and Google."

So, no companies want consumers to own and control the data -- property -- that describes them. Real property laws matter. To learn more, read about data brokers at the Privacy Rights Clearinghouse site.

Kudos to Vermont lawmakers for ensuring more disclosures and transparency from the industry. Readers may ask their elected officials why their state has not taken similar action. What are your opinions of the new Vermont law?


After Promises To Stop, Mobile Providers Continued Sales Of Location Data About Consumers. What You Can Do To Protect Your Privacy

Sadly, history repeats itself. First, the history: after getting caught selling consumers' real-time GPS location data without notice or consumers' consent, mobile providers promised in 2018 to stop the practice. The Ars Technica blog reported in June 2018:

"Verizon and AT&T have promised to stop selling their mobile customers' location information to third-party data brokers following a security problem that leaked the real-time location of US cell phone users. Senator Ron Wyden (D-Ore.) recently urged all four major carriers to stop the practice, and today he published responses he received from Verizon, AT&T, T-Mobile USA, and Sprint. Wyden's statement praised Verizon for "taking quick action to protect its customers' privacy and security," but he criticized the other carriers for not making the same promise... AT&T changed its stance shortly after Wyden's statement... Senator Wyden recognized AT&T's change on Twitter and called on T-Mobile and Sprint to follow suit."

Kudos to Senator Wyden. The other mobile providers soon complied... sort of.

Second, some background: real-time location data is very valuable stuff. It indicates where you are as you (with your phone or other mobile devices) move about the physical world in your daily routine. No delays. No lag. Yes, there are appropriate uses for real-time GPS location data -- such as by law enforcement to quickly find a kidnapped person or child before further harm happens. But, do any and all advertisers need real-time location data about consumers? Data brokers? Others?

I think not. Domestic violence and stalking victims probably would not want their own or their children's real-time location data resold publicly. Most parents would not want their children's location data resold publicly. Most patients probably would not want their location data broadcast every time they visit their physician, specialist, rehab, or a hospital. Corporate executives, government officials, and attorneys conducting sensitive negotiations probably wouldn't want their location data collected and resold, either.

So, most consumers probably don't want their real-time location data resold publicly. Well, some of you make location-specific announcements via posts on social media. That's your choice, but I conclude that most people don't. Consumers want control over their location information so they can decide if, when, and with whom to share it. The mass collection and sales of consumers' real-time location data by mobile providers prevents choice -- and it violates persons' privacy.

Third, fast forward seven months from June 2018. TechCrunch reported on January 9th:

"... new reporting by Motherboard shows that while [reseller] LocationSmart faced the brunt of the criticism [in 2018], few focused on the other big player in the location-tracking business, Zumigo. A payment of $300 and a phone number was enough for a bounty hunter to track down the participating reporter by obtaining his location using Zumigo’s location data, which was continuing to pay for access from most of the carriers. Worse, Zumigo sold that data on — like LocationSmart did with Securus — to other companies, like Microbilt, a Georgia-based credit reporting company, which in turn sells that data on to other firms that want that data. In this case, it was a bail bond company, whose bounty hunter was paid by Motherboard to track down the reporter — with his permission."

"Everyone seemed to drop the ball. Microbilt said the bounty hunter shouldn’t have used the location data to track the Motherboard reporter. Zumigo said it didn’t mind location data ending up in the hands of the bounty hunter, but still cut Microbilt’s access. But nobody quite dropped the ball like the carriers, which said they would not share location data again."

The TechCrunch article rightly held offending mobile providers accountable. Example: T-Mobile's chief executive, John Legere, tweeted last year:

Then, Legere tweeted last week:

The right way? In my view, real-time location never should have been collected and resold. Almost a year after reports first surfaced, T-Mobile is finally getting around to stopping the practice and terminating its relationships with location data resellers -- two months from now. Why not announce this slow wind-down last year when the issue first surfaced? "Emergency assistance" is the reason we are supposed to believe. Yeah, right.

The TechCrunch article rightly took AT&T and Verizon to task, too. Good. I strongly encourage everyone to read the entire TechCrunch article.

What can consumers make of this? There seem to be several takeaways:

  1. Transparency is needed, since corporate privacy policies don't list all (or often any) business partners. This lack of transparency provides an easy way for mobile providers to resume location data sales without notice to anyone and without consumers' consent,
  2. Corporate executives will say anything in tweets/social media. A healthy dose of skepticism by consumers and regulators is wise,
  3. Consumers can't trust mobile providers. They are happy to make money selling consumers' real-time location data, regardless of consumers' wishes that our data not be collected and sold,
  4. Data brokers and credit reporting agencies want consumers' location data,
  5. To ensure privacy, consumers also must take action: adjust the privacy settings on your phones to limit or deny mobile apps access to your location data. I did. It's not hard. Do it today, and
  6. Oversight is needed, since a) mobile providers have, at best, sloppy to minimal oversight and internal processes to prevent location data sales; and b) data brokers and others are readily available to enable and facilitate location data transactions.

I cannot over-emphasize #5 above. What issues or takeaways do you see? What are your opinions about real-time location data?


Some Surprising Facts About Facebook And Its Users

The Pew Research Center announced findings from its latest survey of social media users:

  • About two-thirds (68%) of adults in the United States use Facebook. That is unchanged from April 2016, but up from 54% in August 2012. Only YouTube gets more adult usage (73%).
  • About three-quarters (74%) of adult Facebook users visit the site at least once a day. That's higher than Snapchat (63%) and Instagram (60%).
  • Facebook is popular across all demographic groups in the United States: 74% of women use it, as do 62% of men, 81% of persons ages 18 to 29, and 41% of persons ages 65 and older.
  • Usage by teenagers has fallen to 51% (at March/April 2018) from 71% during 2014 to 2015. More teens use other social media services: YouTube (85%), Instagram (72%) and Snapchat (69%).
  • 43% of adults use Facebook as a news source. That is higher than other social media services: YouTube (21%), Twitter (12%), Instagram (8%), and LinkedIn (6%). Of the adults who get news on Facebook, 61% are women and 39% are men; 62% are white and 37% are nonwhite.
  • 54% of adult users said they adjusted their privacy settings during the past 12 months. 42% said they have taken a break from checking the platform for several weeks or more. 26% said they have deleted the app from their phone during the past year.

Perhaps the most troubling finding:

"Many adult Facebook users in the U.S. lack a clear understanding of how the platform’s news feed works, according to the May and June survey. Around half of these users (53%) say they do not understand why certain posts are included in their news feed and others are not, including 20% who say they do not understand this at all."

Facebook users should know that the service does not display in their news feed all posts by their friends and groups. Facebook's proprietary algorithm -- called its "secret sauce" by some -- displays items it thinks users will engage with, that is, click the "Like" or other emotion buttons. This makes Facebook a terrible news source, since it doesn't display all news -- only the news you (probably already) agree with.
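Facebook's actual ranking algorithm is proprietary, so the Python sketch below is only a hypothetical illustration of engagement-based ranking in general. Each post carries a made-up predicted engagement score, and the feed shows only the top few; the `Post` class, the scores, and the slot count are all assumptions. Posts below the cutoff simply never appear, even though friends and followed pages did post them.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    topic: str
    predicted_engagement: float  # hypothetical model output: chance the user reacts


def build_feed(posts: list[Post], slots: int) -> list[Post]:
    """Show only the `slots` posts with the highest predicted engagement."""
    ranked = sorted(posts, key=lambda p: p.predicted_engagement, reverse=True)
    return ranked[:slots]


posts = [
    Post("friend_a", "opinion you agree with", 0.91),
    Post("friend_b", "vacation photos", 0.74),
    Post("news_page", "story challenging your views", 0.12),
    Post("friend_c", "local school board update", 0.08),
]

feed = build_feed(posts, slots=2)
hidden = [p for p in posts if p not in feed]
print([p.topic for p in feed])    # ['opinion you agree with', 'vacation photos']
print([p.topic for p in hidden])  # ['story challenging your views', 'local school board update']
```

If the scoring model predicts that people engage more with content they already agree with, ranking this way filters out the rest.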

That's like living life in an online bubble. Sadly, there is more.

If you haven't already watched it, PBS has broadcast a two-part documentary titled "The Facebook Dilemma," which arguably could have been titled "The Dark Side of Sharing." The Frontline documentary rightly discusses Facebook's approaches to news, privacy, its focus upon growth via advertising revenues, how various groups have used the service as a weapon, and Facebook's extensive data collection about everyone.

Yes, everyone. Obviously, Facebook collects data about its users. The service also collects data about nonusers in what the industry calls "shadow profiles." CNet explained what happened during an April 2018 congressional hearing:

"... hearing before the House Energy and Commerce Committee, the Facebook CEO confirmed the company collects information on nonusers. "In general, we collect data of people who have not signed up for Facebook for security purposes," he said... That data comes from a range of sources, said Nate Cardozo, senior staff attorney at the Electronic Frontier Foundation. That includes brokers who sell customer information that you gave to other businesses, as well as web browsing data sent to Facebook when you "like" content or make a purchase on a page outside of the social network. It also includes data about you pulled from other Facebook users' contacts lists, no matter how tenuous your connection to them might be. "Those are the [data sources] we're aware of," Cardozo said."

So, there might be more data sources besides the ones we know about. Facebook isn't saying. So much for greater transparency and control claims by Mr. Zuckerberg. Moreover, data breaches highlight the problems with the service's massive data collection and storage:

"The fact that Facebook has [shadow profiles] data isn't new. In 2013, the social network revealed that user data had been exposed by a bug in its system. In the process, it said it had amassed contact information from users and matched it against existing user profiles on the social network. That explained how the leaked data included information users hadn't directly handed over to Facebook. For example, if you gave the social network access to the contacts in your phone, it could have taken your mom's second email address and added it to the information your mom already gave to Facebook herself..."

So, Facebook probably launched shadow profiles when it introduced its mobile app. That means, if you uploaded the address book in your phone to Facebook, then you helped the service collect information about nonusers, too. This means Facebook acts more like a massive advertising network than simply a social media service.
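The mechanism CNet describes amounts to matching uploaded contacts against existing accounts. The Python sketch below is a hypothetical illustration, not Facebook's code: the `ContactGraph` class and its methods are invented, and matching on any shared email address or phone number is an assumed rule. Matched contacts add details the account holder never supplied (Mom's second address); unmatched contacts become records about nonusers.

```python
from collections import defaultdict


class ContactGraph:
    """Hypothetical model of how uploaded address books could enrich profiles."""

    def __init__(self):
        self.users = {}                            # user_id -> identifiers the user supplied
        self.extra_identifiers = defaultdict(set)  # user_id -> identifiers learned from others
        self.shadow = defaultdict(set)             # nonuser identifier -> users who uploaded it

    def add_user(self, user_id, identifiers):
        self.users[user_id] = set(identifiers)

    def _match_user(self, identifiers):
        """Return the first user sharing any identifier with this contact (assumed rule)."""
        for user_id, known in self.users.items():
            if known & identifiers:
                return user_id
        return None

    def upload_contacts(self, uploader_id, contacts):
        """Each contact is a set of emails/phone numbers for one person."""
        for contact in contacts:
            contact = set(contact)
            user_id = self._match_user(contact)
            if user_id is not None:
                # Details the user never gave the service get attached anyway.
                self.extra_identifiers[user_id] |= contact - self.users[user_id]
            else:
                # No account: record the nonuser's details and who knows them.
                for identifier in contact:
                    self.shadow[identifier].add(uploader_id)


g = ContactGraph()
g.add_user("mom", {"mom@example.com"})
g.add_user("you", {"you@example.com"})
g.upload_contacts("you", [
    {"mom@example.com", "mom.work@example.org"},  # Mom, plus her second address
    {"friend@example.net", "+1-555-0100"},         # a friend with no account
])
print(g.extra_identifiers["mom"])  # {'mom.work@example.org'}
print(sorted(g.shadow))            # ['+1-555-0100', 'friend@example.net']
```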

How has Facebook been able to collect massive amounts of data about both users and nonusers? According to the Frontline documentary, we consumers have lax privacy laws in the United States to thank for this massive surveillance advertising mechanism. What do you think?


Billions Of Data Points About Consumers Exposed During Data Breach At Data Aggregator

It's not only social media companies and credit reporting agencies that experience data breaches where massive amounts of sensitive, personal information about millions of consumers are exposed and/or stolen. Data aggregators and analytics firms also have data breaches. Wired Magazine reported:

"The sales intelligence firm Apollo sent a notice to its customers disclosing a data breach it suffered over the summer... Apollo is a data aggregator and analytics service aimed at helping sales teams know who to contact, when, and with what message to make the most deals... Apollo also claims in its marketing materials to have 200 million contacts and information from over 10 million companies in its vast reservoir of data. That's apparently not just spin. Night Lion Security founder Vinny Troia, who routinely scans the internet for unprotected, freely accessible databases, discovered Apollo's trove containing 212 million contact listings as well as nine billion data points related to companies and organizations. All of which was readily available online, for anyone to access. Troia disclosed the exposure to the company in mid-August."

This is especially problematic for several reasons. First, data aggregators like Apollo (and social media companies and credit reporting agencies) are high-value targets: plenty of data is stored in one location. That's both convenient and risky. It also places a premium upon data security.

When data like this is exposed or stolen, it makes it easy for fraudsters, scammers, and spammers to create sophisticated and more effective phishing (and vishing) attacks to trick consumers and employees into revealing sensitive payment and financial information.

Second, data breaches like this make it easier for governments' intelligence agencies to compile data about persons and targets. Third, Apollo's database reportedly also contained sensitive data about clients. That's proprietary information. Wired explained:

"Some client-imported data was also accessed without authorization... Customers access Apollo's data and predictive features through a main dashboard. They also have the option to connect other data tools they might use, for example authorizing their Salesforce accounts to port data into Apollo..."

Salesforce, a customer relationship management (CRM) platform, uses cloud services and other online technologies to help its clients (companies with sales representatives) manage their sales, service, and marketing activities. This breach also suggests that some employee training is needed about what to upload, and what not to upload, to outsourcing vendors' sites. What do you think?


Health Insurers Are Vacuuming Up Details About You — And It Could Raise Your Rates

[Editor's note: today's guest post, by reporters at ProPublica, explores privacy and data collection issues within the healthcare industry. It is reprinted with permission.]

By Marshall Allen, ProPublica

To an outsider, the fancy booths at last month’s health insurance industry gathering in San Diego aren’t very compelling. A handful of companies pitching “lifestyle” data and salespeople touting jargony phrases like “social determinants of health.”

But dig deeper and the implications of what they’re selling might give many patients pause: A future in which everything you do — the things you buy, the food you eat, the time you spend watching TV — may help determine how much you pay for health insurance.

With little public scrutiny, the health insurance industry has joined forces with data brokers to vacuum up personal details about hundreds of millions of Americans, including, odds are, many readers of this story. The companies are tracking your race, education level, TV habits, marital status, net worth. They’re collecting what you post on social media, whether you’re behind on your bills, what you order online. Then they feed this information into complicated computer algorithms that spit out predictions about how much your health care could cost them.

Are you a woman who recently changed your name? You could be newly married and have a pricey pregnancy pending. Or maybe you’re stressed and anxious from a recent divorce. That, too, the computer models predict, may run up your medical bills.

Are you a woman who’s purchased plus-size clothing? You’re considered at risk of depression. Mental health care can be expensive.

Low-income and a minority? That means, the data brokers say, you are more likely to live in a dilapidated and dangerous neighborhood, increasing your health risks.

“We sit on oceans of data,” said Eric McCulley, director of strategic solutions for LexisNexis Risk Solutions, during a conversation at the data firm’s booth. And he isn’t apologetic about using it. “The fact is, our data is in the public domain,” he said. “We didn’t put it out there.”

Insurers contend they use the information to spot health issues in their clients — and flag them so they get services they need. And companies like LexisNexis say the data shouldn’t be used to set prices. But as a research scientist from one company told me: “I can’t say it hasn’t happened.”

At a time when every week brings a new privacy scandal and worries abound about the misuse of personal information, patient advocates and privacy scholars say the insurance industry’s data gathering runs counter to its touted, and federally required, allegiance to patients’ medical privacy. The Health Insurance Portability and Accountability Act, or HIPAA, only protects medical information.

“We have a health privacy machine that’s in crisis,” said Frank Pasquale, a professor at the University of Maryland Carey School of Law who specializes in issues related to machine learning and algorithms. “We have a law that only covers one source of health information. They are rapidly developing another source.”

Patient advocates warn that using unverified, error-prone “lifestyle” data to make medical assumptions could lead insurers to improperly price plans (for instance, raising rates based on false information) or to discriminate against anyone tagged as high cost. And, they say, the use of the data raises thorny questions that should be debated publicly, such as: Should a person’s rates be raised because algorithms say they are more likely to run up medical bills? Such questions would be largely moot in Europe, where a strict law that took effect in May restricts trading in personal data.

This year, ProPublica and NPR are investigating the various tactics the health insurance industry uses to maximize its profits. Understanding these strategies is important because patients, through taxes, cash payments and insurance premiums, are the ones funding the entire health care system. Yet the industry’s bewildering web of strategies and inside deals often has little to do with patients’ needs. As the series’ first story showed, contrary to popular belief, lower bills aren’t health insurers’ top priority.

Inside the San Diego Convention Center last month, there were few qualms about the way insurance companies were mining Americans’ lives for information — or what they planned to do with the data.

The sprawling convention center was a balmy draw for one of America’s Health Insurance Plans’ marquee gatherings. Insurance executives and managers wandered through the exhibit hall, sampling chocolate-covered strawberries, champagne and other delectables designed to encourage deal-making.

Up front, the prime real estate belonged to the big guns in health data: The booths of Optum, IBM Watson Health and LexisNexis stretched toward the ceiling, with flat screen monitors and some comfy seating. (NPR collaborates with IBM Watson Health on national polls about consumer health topics.)

To understand the scope of what they were offering, consider Optum. The company, owned by the massive UnitedHealth Group, has collected the medical diagnoses, tests, prescriptions, costs and socioeconomic data of 150 million Americans going back to 1993, according to its marketing materials. (UnitedHealth Group provides financial support to NPR.) The company says it uses the information to link patients’ medical outcomes and costs to details like their level of education, net worth, family structure and race. An Optum spokesman said the socioeconomic data is de-identified and is not used for pricing health plans.

Optum’s marketing materials also boast that it now has access to even more. In 2016, the company filed a patent application to gather what people share on platforms like Facebook and Twitter, and link this material to the person’s clinical and payment information. A company spokesman said in an email that the patent application never went anywhere. But the company’s current marketing materials say it combines claims and clinical information with social media interactions.

I had a lot of questions about this and first reached out to Optum in May, but the company didn’t connect me with any of its experts as promised. At the conference, Optum salespeople said they weren’t allowed to talk to me about how the company uses this information.

It isn’t hard to understand the appeal of all this data to insurers. Merging information from data brokers with people’s clinical and payment records is a no-brainer if you overlook potential patient concerns. Electronic medical records now make it easy for insurers to analyze massive amounts of information and combine it with the personal details scooped up by data brokers.

It also makes sense given the shifts in how providers are getting paid. Doctors and hospitals have typically been paid based on the quantity of care they provide. But the industry is moving toward paying them in lump sums for caring for a patient, or for an event, like a knee surgery. In those cases, the medical providers can profit more when patients stay healthy. More money at stake means more interest in the social factors that might affect a patient’s health.

Some insurance companies are already using socioeconomic data to help patients get appropriate care, such as programs to help patients with chronic diseases stay healthy. Studies show social and economic aspects of people’s lives play an important role in their health. Knowing these personal details can help insurers identify those who may need help paying for medication or help getting to the doctor.

But patient advocates are skeptical health insurers have altruistic designs on people’s personal information.

The industry has a history of boosting profits by signing up healthy people and finding ways to avoid sick people — called “cherry-picking” and “lemon-dropping,” experts say. Among the classic examples: A company was accused of putting its enrollment office on the third floor of a building without an elevator, so only healthy patients could make the trek to sign up. Another tried to appeal to spry seniors by holding square dances.

The Affordable Care Act prohibits insurers from denying people coverage based on pre-existing health conditions or charging sick people more for individual or small group plans. But experts said patients’ personal information could still be used for marketing, and to assess risks and determine the prices of certain plans. And the Trump administration is promoting short-term health plans, which do allow insurers to deny coverage to sick patients.

Robert Greenwald, faculty director of Harvard Law School’s Center for Health Law and Policy Innovation, said insurance companies still cherry-pick, but now they’re subtler. The center analyzes health insurance plans to see if they discriminate. He said insurers will do things like fail to include enough information about which drugs a plan covers, which pushes sick people who need specific medications elsewhere. Or they may change the things a plan covers, or how much a patient has to pay for a type of care, after a patient has enrolled. Or, Greenwald added, they might exclude or limit certain types of providers from their networks, such as those skilled in caring for patients with HIV or hepatitis C.

If there were concerns that personal data might be used to cherry-pick or lemon-drop, they weren’t raised at the conference.

At the IBM Watson Health booth, Kevin Ruane, a senior consulting scientist, told me that the company surveys 80,000 Americans a year to assess lifestyle, attitudes and behaviors that could relate to health care. Participants are asked questions such as whether they trust their doctor, have financial problems, go online, or own a Fitbit. The responses of hundreds of adjacent households are analyzed together to identify social and economic factors for an area.

Ruane said he has used IBM Watson Health’s socioeconomic analysis to help insurance companies assess a potential market. The ACA increased the value of such assessments, experts say, because companies often don’t know the medical history of people seeking coverage. A region with too many sick people, or with patients who don’t take care of themselves, might not be worth the risk.

Ruane acknowledged that the information his company gathers may not be accurate for every person. “We talk to our clients and tell them to be careful about this,” he said. “Use it as a data insight. But it’s not necessarily a fact.”

In a separate conversation, a salesman from a different company joked about the potential for error. “God forbid you live on the wrong street these days,” he said. “You’re going to get lumped in with a lot of bad things.”

The LexisNexis booth was emblazoned with the slogan “Data. Insight. Action.” The company said it uses 442 non-medical personal attributes to predict a person’s medical costs. Its cache includes more than 78 billion records from more than 10,000 public and proprietary sources, including people’s cellphone numbers, criminal records, bankruptcies, property records, neighborhood safety and more. The information is used to predict patients’ health risks and costs in eight areas, including how often they are likely to visit emergency rooms, their total cost, their pharmacy costs, their motivation to stay healthy and their stress levels.

People who downsize their homes tend to have higher health care costs, the company says. As do those whose parents didn’t finish high school. Patients who own more valuable homes are less likely to land back in the hospital within 30 days of their discharge. The company says it has validated its scores against insurance claims and clinical data. But it won’t share its methods and hasn’t published the work in peer-reviewed journals.

McCulley, LexisNexis’ director of strategic solutions, said predictions made by the algorithms about patients are based on the combination of the personal attributes. He gave a hypothetical example: A high school dropout who had a recent income loss and doesn’t have a relative nearby might have higher than expected health costs.

But couldn’t that same type of person be healthy? I asked.

“Sure,” McCulley said, with no apparent dismay at the possibility that the predictions could be wrong.

McCulley and others at LexisNexis insist the scores are only used to help patients get the care they need and not to determine how much someone would pay for their health insurance. The company cited three different federal laws that it said restricted it and its clients from using the scores in that way. But privacy experts said none of the laws cited by the company bar the practice. The company backed off the assertions when I pointed out that the laws did not seem to apply.

LexisNexis officials also said the company’s contracts expressly prohibit using the analysis to help price insurance plans. They would not provide a contract. But I knew that in at least one instance a company was already testing whether the scores could be used as a pricing tool.

Before the conference, I’d seen a press release announcing that the largest health actuarial firm in the world, Milliman, was now using the LexisNexis scores. I tracked down Marcos Dachary, who works in business development for Milliman. Actuaries calculate health care risks and help set the price of premiums for insurers. I asked Dachary if Milliman was using the LexisNexis scores to price health plans and he said: “There could be an opportunity.”

The scores could allow an insurance company to assess the risks posed by individual patients and make adjustments to protect itself from losses, he said. For example, he said, the company could raise premiums, or revise contracts with providers.

It’s too early to tell whether the LexisNexis scores will actually be useful for pricing, he said. But he was excited about the possibilities. “One thing about social determinants data — it piques your mind,” he said.

Dachary acknowledged the scores could also be used to discriminate. Others, he said, have raised that concern. As much as there could be positive potential, he said, “there could also be negative potential.”

It’s that negative potential that still bothers data analyst Erin Kaufman, who left the health insurance industry in January. The 35-year-old from Atlanta had earned her doctorate in public health because she wanted to help people, but one day at Aetna, her boss told her to work with a new data set.

To her surprise, the company had obtained personal information from a data broker on millions of Americans. The data contained each person’s habits and hobbies, like whether they owned a gun, and if so, what type, she said. It included whether they had magazine subscriptions, liked to ride bikes or run marathons. It had hundreds of personal details about each person.

The Aetna data team merged the data with the information it had on patients it insured. The goal was to see how people’s personal interests and hobbies might relate to their health care costs. But Kaufman said it felt wrong: The information about the people who knitted or crocheted made her think of her grandmother. And the details about individuals who liked camping made her think of herself. What business did the insurance company have looking at this information? “It was a dataset that really dug into our clients’ lives,” she said. “No one gave anyone permission to do this.”

In a statement, Aetna said it uses consumer marketing information to supplement its claims and clinical information. The combined data helps predict the risk of repeat emergency room visits or hospital admissions. The information is used to reach out to members and help them and plays no role in pricing plans or underwriting, the statement said.

Kaufman said she had concerns about the accuracy of drawing inferences about an individual’s health from an analysis of a group of people with similar traits. Health scores generated from arrest records, home ownership and similar material may be wrong, she said.

Pam Dixon, executive director of the World Privacy Forum, a nonprofit that advocates for privacy in the digital age, shares Kaufman’s concerns. She points to a study by the analytics company SAS, which worked in 2012 with an unnamed major health insurance company to predict a person’s health care costs using 1,500 data elements, including the investments and types of cars people owned.

The SAS study said higher health care costs could be predicted by looking at things like ethnicity, watching TV and mail order purchases.

“I find that enormously offensive as a list,” Dixon said. “This is not health data. This is inferred data.”

Data scientist Cathy O’Neil said drawing conclusions about health risks based on such data could lead to a bias against some poor people. It would be easy to infer they are prone to costly illnesses based on their backgrounds and living conditions, said O’Neil, author of the book “Weapons of Math Destruction,” which looked at how algorithms can increase inequality. That could lead to poor people being charged more, making it harder for them to get the care they need, she said. Employers, she said, could even decide not to hire people with data points that could indicate high medical costs in the future.

O’Neil said the companies should also measure how the scores might discriminate against the poor, sick or minorities.

American policymakers could do more to protect people’s information, experts said. In the United States, companies can harvest personal data unless a specific law bans it, although California just passed legislation that could create restrictions, said William McGeveran, a professor at the University of Minnesota Law School. Europe, in contrast, passed a strict law called the General Data Protection Regulation, which went into effect in May.

“In Europe, data protection is a constitutional right,” McGeveran said.

Pasquale, the University of Maryland law professor, said health scores should be treated like credit scores. Federal law gives people the right to know their credit scores and how they’re calculated. If people are going to be rated by whether they listen to sad songs on Spotify or look up information about AIDS online, they should know, Pasquale said. “The risk of improper use is extremely high. And data scores are not properly vetted and validated and available for scrutiny.”

As I reported this story I wondered how the data vendors might be using my personal information to score my potential health costs. So, I filled out a request on the LexisNexis website for the company to send me some of the personal information it has on me. A week later a somewhat creepy, 182-page walk down memory lane arrived in the mail. Federal law only requires the company to provide a subset of the information it collected about me. So that’s all I got.

LexisNexis had captured details about my life going back 25 years, many that I’d forgotten. It had my phone numbers going back decades and my home addresses going back to my childhood in Golden, Colorado. Each location had a field to show whether the address was “high risk.” Mine were all blank. The company also collects records of any liens and criminal activity, which, thankfully, I didn’t have.

My report was boring, which isn’t a surprise. I’ve lived a middle-class life and grown up in good neighborhoods. But it made me wonder: What if I had lived in “high risk” neighborhoods? Could that ever be used by insurers to jack up my rates — or to avoid me altogether?

I wanted to see more. If LexisNexis had health risk scores on me, I wanted to see how they were calculated and, more importantly, whether they were accurate. But the company told me that if it had calculated my scores it would have done so on behalf of its client, my insurance company. So, I couldn’t have them.

ProPublica is a Pulitzer Prize-winning investigative newsroom. Sign up for their newsletter.



How to Wrestle Your Data From Data Brokers, Silicon Valley — and Cambridge Analytica

[Editor's note: today's guest post, by reporters at ProPublica, discusses data brokers you may not know, the data collected and archived about consumers, and options for consumers to (re)gain as much privacy as possible. It is reprinted with permission.]

By Jeremy B. Merrill, ProPublica

Cambridge Analytica thinks that I’m a "Very Unlikely Republican." Another political data firm, ALC Digital, has concluded I’m a "Socially Conservative," Republican, "Boomer Voter." In fact, I’m a 27-year-old millennial with no set party allegiance.

For all the fanfare, the burgeoning field of mining our personal data remains an inexact art.

One thing is certain: My personal data, and likely yours, is in more hands than ever. Tech firms, data brokers and political consultants build profiles of what they know — or think they can reasonably guess — about your purchasing habits, personality, hobbies and even what political issues you care about.

You can find out what those companies know about you, but be prepared to be stubborn. Very stubborn. To demonstrate how this works, we’ve chosen a couple of representative companies from three major categories: data brokers, big tech firms and political data consultants.

Few of them make it easy. Some will show you on their websites, others will make you ask for your digital profile via the U.S. mail. And then there’s Cambridge Analytica, the controversial Trump campaign vendor that has come under intense fire in light of a report in the British newspaper The Observer and in The New York Times that the company used improperly obtained data from Facebook to help build voter profiles.

To find out what the chaps at the British data firm have on you, you’re going to need both stamps and a "cheque."

Once you see your data, you’ll have a much better understanding of how this shadowy corner of the new economy works. You’ll see what seemingly personal information they know about you … and you’ll probably have some hypotheses about where this data is coming from. You’ll also probably see some predictions about who you are that are hilariously wrong.

And if you do obtain your data from any of these companies, please let us know your thoughts at [email protected]. We won’t share or publish what you say (unless you tell us that it’s OK).

Cambridge Analytica and Other Political Consultants

Making statistically informed guesses about Americans’ political beliefs and pet issues is a common business these days, with dozens of firms selling data to candidates and issue groups about the purported leanings of individual American voters.

Few of these firms have to give you your data. But Cambridge Analytica is required to do so by an obscure European rule.

Cambridge Analytica:

Around the time of the 2016 election, Paul-Olivier Dehaye, a Belgian mathematician and founder of a website that helps people exercise their data protection rights called PersonalData.IO, approached me with an idea for a story. He flagged some of Cambridge Analytica’s claims about the power of its "psychographic" targeting capabilities and suggested that I demand my data from them.

So I sent off a request, following Dehaye’s coaching and citing the UK Data Protection Act 1998, the British implementation of a little-known European Union data-protection law that grants individuals (even Americans) the right to see the data European companies compile about them.

It worked. I got back a spreadsheet of data about me. But it took months, cost ten pounds — and I had to give them a photo ID and two utility bills. Presumably they didn’t want my personal data falling into the wrong hands.

How You Can Request Your Data From Cambridge Analytica:

  1. Visit Cambridge Analytica’s website here and fill out this web form.
  2. After you submit the form, the page will immediately request that you email to [email protected] a photo ID and two copies of your utility bills or bank statements, to prove your identity. This page will also include the company’s bank account details.
  3. Find a way to send them 10 GBP. You can try wiring this from your bank, though it may cost you an additional $25 or so — or ask a friend in the UK to go to their bank and get a cashier’s check. Your American bank probably won’t let you write a GBP-denominated check. Two services I tried, Xoom and TransferWise, weren’t able to do it.
  4. Eventually, Cambridge Analytica will email you a small Excel spreadsheet of information and a letter. You might have to wait a few weeks. Celeste LeCompte, ProPublica’s vice president of business development, requested her data on March 27 and still hasn’t received it.

Because the company is based in the United Kingdom, it had no choice but to fulfill my request. In recent weeks, the firm has come under intense fire after The New York Times and the British paper The Observer disclosed that it had used improperly obtained data from Facebook to build profiles of American voters. Facebook told me that data about me was likely transmitted to Cambridge Analytica because a person with whom I am "friends" on the social network had taken the now-infamous "This Is Your Digital Life" quiz. For what it’s worth, my data shows no sign of anything derived from Facebook.

What You Might Get Back From Cambridge Analytica:

Cambridge Analytica had generated 13 data points about my views: 10 political issues, ranked by importance; two guesses at my partisan leanings (one blank); and a guess at whether I would turn out in the 2016 general election.

They told me that the lower the rank, the higher the predicted importance of the issue to me.

Alongside that data, labeled "models," were two other types of data that are run-of-the-mill and widely used by political consultants. One sheet held "core data": personal info, sliced and diced a few different ways, perhaps so it could be used more easily as parameters for a statistical model. It included my address, my electoral district, the census tract I live in and my date of birth.

The spreadsheet included a few rows of "election returns" — previous elections in New York State in which I had voted. (Intriguingly, Cambridge Analytica missed that I had voted in 2015’s snoozefest of a vote-for-five-of-these-five judicial election. It also didn’t know about elections in which I had voted in North Carolina, where I lived before I lived in New York.)

ALC Digital

ALC Digital is another data broker, which says that its "audiences are built from multi-sourced, verified information about an individual." Its data is distributed via Oracle Data Cloud, a service that lets advertisers target specific audiences of people, like, perhaps, people who are Boomer Voters and also Republicans.

The firm brags in an Oracle document posted online about how hard it is to avoid their data collection efforts, saying, "It has no cookies to erase and can’t be ‘cleared.’ ALC Real World Data is rooted in reality, and doesn’t rely on inferences or faulty models."

How You Can Request Your Data From ALC Digital:

Here’s how to find ALC Digital’s predictions about your political beliefs in Oracle Data Cloud:

  1. Visit http://www.bluekai.com/registry/. If you use an ad blocker, there may not be much to see here.
  2. Click on the Partner Segments tab.
  3. Scroll on through until you find ALC Digital.

You may have to scroll for a while before you find it.

And not everyone appears to have data from ALC Digital, so don’t be shocked if you can’t find it. If you don’t, there may be other fascinating companies with data about who you are in your Oracle file.

What You Might Get Back From ALC Digital:

When I downloaded the data last year, it said I was "Socially Conservative," "Boomer Voter" — as well as a female voter and a tax reform supporter.

Recently, when I checked again, those categories had disappeared entirely. I had nothing from ALC Digital.

ALC Digital is not required to release this data. It is disclosed via the Oracle Data Cloud. Fran Green, the company’s president, said that Aristotle, a longtime political data company, “provides us with consumer data that populates these audiences.” She also said that “we do not claim to know people’s ‘beliefs.’”

Big Tech

Big tech firms like Google and Facebook tend to make their money by selling ads, so they build extensive profiles of their users’ interests and activities. They also depend on their users’ goodwill to keep us voluntarily giving them our locations, our browsing histories and plain ol’ lists of our friends and interests. (So far, these popular companies have not faced much regulation.) All three make it easy to download the data that they keep on you.

Firms like Google and Facebook don’t sell your data, because it’s their competitive advantage. Google’s privacy page screams in 72 point type: "We do not sell your personal information to anyone." As websites that we visit frequently, they sell access to our attention, so companies that want to reach you in particular can do so with these companies’ sites or other sites that feature their ads.

Facebook

How You Can Request Your Data From Facebook:

You of course have to have a Facebook account and be logged in:

  1. Visit https://www.facebook.com/settings on your computer.
  2. Click the “Download a copy of your Facebook data” link.
  3. On the next page, click “Start My Archive.”
  4. Enter your password, then click “Start My Archive” again.
  5. You’ll get an email immediately, and another one saying “Your Facebook download is ready” when your data is ready to be downloaded. You’ll get a notification on Facebook, too. Mine took just a few minutes.
  6. Once you get that email, click the link, then click Download Archive. Then reenter your password, which will start a zip file downloading.
  7. Unzip the folder; depending on your computer’s operating system, this might be called uncompressing or “expanding.” You’ll get a folder called something like “facebook-jeremybmerrill,” but, of course, with your username instead of mine.
  8. Open the folder and double-click “index.htm” to open it in your web browser.

What You Might Get Back From Facebook

Facebook designed its archive to first show you your profile information. That’s all information you typed into Facebook and that you probably intended to be shared with your friends. It’s no surprise that Facebook knows what city I live in or what my AIM screen name was — I told Facebook those things so that my friends would know.

But it’s a bit of a surprise that they decided to feature a list of my ex-girlfriends — what they blandly termed "Previous Relationships" — so prominently.

As you dig deeper in your archive, you’ll find more information that you gave Facebook, but that you might not have expected the social network to keep hold of for years: if you’re me, that’s the Nickelback concert I apparently RSVPed to, posts about switching high schools and instant messages from my freshman year in college.

But finally, you’ll find the creepier information: what Facebook knows about you that you didn’t tell it, on the "Ads" page. You’ll find "Ads Topics" that Facebook decided you were interested in, like Housing, ESPN or the town of Ellijay, Georgia. And, you’ll find a list of advertisers who have obtained your contact information and uploaded it to Facebook, as part of a so-called Custom Audience of specific people to whom they want to show their ads.

You’ll find more of that creepy information on your Ads Preferences page. Despite Mark Zuckerberg telling Rep. Jerry McNerney, D-Calif., in a hearing earlier this month that “all of your information is included in your ‘download your information,’” my archive didn’t include that list of ad categories that can be used to target ads to me. (Some other types of information aren’t included in the download, like other people’s posts you’ve liked. Those are listed here, along with where to find them — which, for most, is in your Activity Log.)

This area may include Facebook’s guesses about who you are, boiled down from some of your activities. Most Americans will have a guess about their politics (Facebook says I’m a "moderate" about U.S. Politics), and some will have a guess about so-called "multicultural affinity," which Facebook insists is not a guess about your ethnicity, but rather what sorts of content "you are interested in or will respond well to." For instance, Facebook recently added that I have a "Multicultural Affinity: African American." (I’m white, though because Facebook’s definition of "multicultural affinity" is so strange, it’s hard to tell if this is an error on Facebook’s part.)

Facebook also doesn’t include your browsing history, the subject of back-and-forths between Mark Zuckerberg and several members of Congress. The company says it keeps that history just long enough to boil it down into those “Ad Topics.”

For people without Facebook accounts, Facebook says to email [email protected] or fill out an online form to download what Facebook knows about you. One puzzle here is how Facebook gathers data on people whose identities it may not know. It may know that a person using a phone from Atlanta, Georgia, has accessed a Facebook site and that the same person was last week in Austin, Texas, and before that Cincinnati, but it may not know that that person is me. In principle, it’s difficult for the company to hand over the data it collects about logged-out users if it doesn’t know exactly who they are.

Google

Like Facebook, Google will give you a zip archive of your data. Google’s can be much bigger, because you might have stored gigabytes of files in Google Drive or years of emails in Gmail.

But like Facebook, Google does not provide its guesses about your interests, which it uses to target ads. Those guesses are available elsewhere.

How You Can Request Your Data From Google:

  1. Visit https://takeout.google.com/settings/takeout/ to use Google’s cutely named Takeout service.
  2. You’ll have to pick which data you want to download and examine. You should definitely select My Activity, Location History and Searches. You may not want to download gigabytes of emails, if you use Gmail, since that uses a lot of space and may take a while. (That’s also information you shouldn’t be surprised that Google keeps: you left it with Gmail so that you could use Google’s search expertise to hold on to your emails.)
  3. Google will present you with a few options for how to get your archive. The defaults are fine.
  4. Within a few hours, you should get an email with the subject "Your Google data archive is ready." Click Download Archive and log in again. That should start the download of a file named something like "takeout-20180412T193535.zip."
  5. Unzip the file; depending on your computer’s operating system, this might be called uncompressing or “expanding.”
  6. You’ll get a folder called Takeout. Open the file inside it called "index.html" in your web browser to explore your archive.
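If you'd like an overview of the archive before clicking through index.html, a short script can tally what's inside. What follows is a minimal Python sketch, not a Google tool: it assumes, as the steps above describe, that everything sits under a top-level Takeout folder, and the function name summarize_takeout is hypothetical. The folder names you see will depend on which data you selected in step 2.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_takeout(zip_source):
    """Count files per top-level folder inside a Google Takeout archive.

    zip_source may be a file path or an open binary file object.
    Assumption: members live under a top-level "Takeout/" folder; the
    sub-folder names depend on which data you chose to export.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip bare directory entries
            parts = PurePosixPath(name).parts
            if not parts or parts[0] != "Takeout":
                continue  # ignore anything outside the Takeout folder
            if len(parts) == 2:
                counts["(top level)"] += 1  # e.g. index.html itself
            else:
                counts[parts[1]] += 1  # e.g. "My Activity", "Android"
    return dict(counts)
```

For example, summarize_takeout("takeout-20180412T193535.zip") would return a dictionary mapping each folder name (say, "My Activity" or "Android") to the number of files in it. The script only reads the archive's table of contents, so it works without unzipping anything first.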

What You Might Get Back From Google:

Once you open the index.html file, you’ll see icons for the data you chose in step 2. Try exploring "Ads" under "My Activity" — you’ll see a list of times you saw Google Ads, including on apps on your phone.

Google also includes your search history, under "Searches" — in my case, going back to 2013. Google knows what I had forgotten: I Googled a bunch of dinosaurs around Valentine’s Day that year… And it’s not just web searches: the Sound Search history reminded me that at some point, I used that service to identify Natalie Imbruglia’s song "Torn."

Android phone users might want to check the "Android" folder: Google keeps a list of each app you’ve used on your phone.

Most of the data contained here are records of ways you’ve directly interacted with Google, and the company really does use those records to improve how its services work for me. I’m glad to see my searches auto-completed, for instance.

But the company also creates data about you: Visit the company’s Ads Settings page to see some of the “topics” Google guesses you’re interested in, and which it uses to personalize the ads you see. Those topics are fairly general (it knows I’m interested in “Politics”), but the company says it has more granular classifications that it doesn’t include on the list. Those more granular, hidden classifications cover various topics, from sports to vacations to politics; on politics, Google does generate a guess about whether some people are “left-leaning” or “right-leaning.”

Data Brokers

Here’s who really does sell your data: data brokers like the credit reporting agency Experian and a firm named Epsilon.

These sometimes-shady firms are middlemen who buy your data from tracking firms, survey marketers and retailers, slice and dice the data into “segments,” then sell those on to advertisers.

Experian

Experian is best known as a credit reporting firm, but your credit cards aren’t all they keep track of. They told me that they “firmly believe people should be made aware of how their data is being used” — so if you print and mail them a form, they’ll tell you what data they have on you.

“Educated consumers,” they said, “are better equipped to be effective, successful participants in a world that increasingly relies on the exchange of information to efficiently deliver the products and services consumers demand.”

How You Can Request Your Data From Experian:

  1. Visit Experian’s Marketing Data Request site and print the Marketing Data Report Request form.
  2. Print a copy of your ID and proof of address.
  3. Mail it all to Experian at: Experian Marketing Services, PO Box 40, Allen, TX 75013.
  4. Wait for them to mail you something back.

What You Might Get Back From Experian:

Expect to wait a while. I’ve been waiting almost a month.

They also come up with a guess about your political views that’s integrated with Facebook — our Facebook Political Ad Collector project has found that many political candidates use Experian’s data to target their Facebook ads to likely supporters.

You should hope to find a guess about your political views that’d be useful to those candidates — as well as categories derived from your purchasing data.

Experian told me they generate the data they have about you from a long list of sources, including public records and “historical catalog purchase information” — as well as calculating it from predictive models.

Epsilon

How You Can Request Your Data From Epsilon:

  1. Visit Epsilon’s Marketing Data Summary Request form.
  2. After entering your name and address, Epsilon will ask you some of those identity-verification questions that quiz you about your old addresses and cars. If your identity can’t be verified with those, Epsilon will ask you to mail in a form.
  3. Wait for Epsilon to mail you your data; it took about a week for me.

What You Might Get Back From Epsilon:

Epsilon has information on “demographics” and “lifestyle interests” — at the household level. It also includes a list of “household purchases.”

It also has data that political candidates use to target their Facebook ads, including Randy Bryce, a Wisconsin Democrat who’s seeking his party’s nomination to run for retiring Speaker Paul Ryan’s seat, and Rep. Tulsi Gabbard, D-Hawaii.

In my case, Epsilon knows I buy clothes, books and home office supplies, among other things — but isn’t any more specific. They didn’t tell me what political beliefs they believe I hold. The company didn’t respond to a request for comment.

Oracle

Oracle’s Data Cloud aggregates data about you from Oracle, but also so-called third party data from other companies.

How You Can Request Your Data From Oracle:

  1. Visit http://www.bluekai.com/registry/. If you use an ad blocker, there may not be much to see here.
  2. Explore each tab, from “Basic Info” to “Hobbies & Interests” and “Partner Segments.”

Scrolling through all those pages isn’t fun: I have 84 pages of four pieces of data each.

You can’t search. All the text is actually images of text. Oracle declined to say why it chose to make its site so hard to use.

What You Might Get Back From Oracle:

My Oracle profile includes nearly 1500 data points, covering all aspects of my life, from my age to my car to how old my children are to whether I buy eggs. These profiles can even say if you’re likely to dress your pet in a costume for Halloween. But many of them are off-base or contradictory.

Many companies in Oracle’s data, besides ALC Digital, offer guesses about my political views: Data from one company uploaded by AcquireWeb says that I’m both a Democrat and an Independent … but also that I’m a “Mild Republican.” Another company, an Oracle subsidiary called AddThis, says that I’m a “Liberal.” Cuebiq, which calls itself a “location intelligence” company, says I’m in a subset of “Democrats” called “Liberal Professions.”

If an advertiser wants to show an ad to Spring Break Enthusiasts, Oracle can enable that. I’m apparently a Spring Break Enthusiast. Do I buy eggs? I sure do. Data on Oracle’s site associated with AcquireWeb says I’m a cat owner …

But it also “knows” I’m a dog owner, which I’m not.

Al Gadbut, the CEO of AcquireWeb, explained that the guesses associated with his company weren’t based on my personal data, but rather the tendencies of people in my geographical area — hence the seemingly contradictory political guesses. He said his firm doesn’t generate the data, but rather uploaded it on behalf of other companies. Cuebiq’s guess was a “probabilistic inference” they drew from location data submitted to them by some app on my phone. Valentina Marastoni-Bieser, Cuebiq’s senior vice president of marketing, wouldn’t tell me which app it was, though.

Data for sale here includes a long list of TV shows I supposedly watch.

But it’s not all wrong. AddThis can tell that I’m “Young & Hip.”

Takeaways:

The above list is just a sampling of the firms that collect your data and try to draw conclusions about who you are — not just sites you visit like Facebook and controversial firms like Cambridge Analytica.

You can make some guesses as to where this data comes from, especially the more granular consumer data from Oracle. For each data point, it’s worth considering: Who’d be in a position to sell a list of what TV shows I watch, or, at least, a list of what TV shows people demographically like me watch? Who’d be in a position to sell a list of what groceries I, or people similar to me in my area, buy? Some of those companies (companies you’re likely paying, and for whom the internet adage that “if you’re not paying, you’re the product” doesn’t hold) are likely selling data about you without your knowledge. Other data points, like the location data used by Cuebiq, can come from any number of apps or websites, so it may be difficult to figure out exactly which one has passed it on.

Companies like Google and Facebook often say that they’ll let you “correct” the data that they hold on you, tacitly acknowledging that they sometimes get it wrong. But if receiving relevant ads is not important to you, they’ll let you opt out entirely, or, presumably, “correct” your data to something false.

An upcoming European Union rule called the General Data Protection Regulation portends a dramatic change to how data is collected and used on the web — if only for Europeans. No such law seems likely to be passed in the U.S. in the near future.

ProPublica is a Pulitzer Prize-winning investigative newsroom. Sign up for their newsletter.


How To View The List Of Advertisers Tracking You On Facebook. Any Surprises On Your List?

The massive privacy and data security breach at Facebook.com involving Cambridge Analytica has heightened many users' sensitivity to the advertising practices by the social networking service. Many Facebook users want to know the exact list of advertisers tracking them.

How To View The List Of Advertisers Tracking You

How to view this list? It's easy. Sign into Facebook.com and navigate to Settings > Ads > Advertisers You've Interacted With. (When using a web browser, you'll have to click on the tiny arrow in the upper right portion of the page to access the drop-down menu.) Within the Ad Preferences page, click on the "Advertisers You've Interacted With" headline to open that module. When opened, it displays several lists of advertisers:

  1. Who've added their contact list to Facebook,
  2. Whose website or app you've used,
  3. Whom you've visited, and
  4. More

The default view of list #1 displays 12 advertisers tracking you. There probably are many more in your list. Select "Show More" to view more advertisers. Facebook doesn't make it easy. The module lacks a "Show All" button, which forces users to repeatedly select "Show More." Not good. Come on, Facebook! You can do better.

List #1 includes important explanatory text:

"These advertisers are running ads using a contact list they uploaded that includes your contact info. This info was collected by the advertiser, typically after you shared your email address with them or another business they've partnered with."

The key phrase to remember: "or another business they've partnered with." So, list #1 includes not only advertisers but also affiliates or business partners. Not good. More Facebook being Facebook.

I selected "Show More" about two dozen times to view my complete list: 235 advertisers tracking me, and collecting data about me. 235 advertisers even though I never used the Facebook mobile app, and had already disabled the Facebook API platform on my account years ago! Not good.

Your mileage will vary. There may be fewer or more advertisers on your list.

My list #1 included both advertisers I expected and many I didn't expect. The advertisers I expected to see included brands I currently do business with (e.g., Marriott Rewards, ACLU), brands I no longer do business with (e.g., Bank of America, AT&T), and/or brands whose Facebook pages I "Liked" or left comments on. The advertisers I didn't expect to see included politicians in other states where I've never lived or visited, brands I've never purchased from or interacted with in any manner, brands I have never "Liked," and more.

Who's on your list? A friend shared:

"I looked at my list and it's crazy. Will follow the opt-out links tomorrow and clear them out. Cardi B was in my list of FB advertisers."

A rapper? That's too funny. I guess that's to be expected if you stream and share music online via Facebook. Me? I don't stream music online because that is another way to be tracked. Instead, I enjoy listening to CDs privately in my home. I prefer to keep my home a truly private place.

What's really going on here? Why the crazy long list? Popular Science explained:

"You, can thank the "data providers" for this mess. Mark Zuckerberg spent roughly 11 hours testifying in front of Congressional committees... One thing that got very little attention was the concept of “data brokers,” middleman businesses that collect consumer information and sell it to companies. Facebook stopped using them just last month. However, that long string of companies, personalities, and alternative rock bands is a result of Facebook’s old program... after the Cambridge Analytica scandal broke, but before Mark Zuckerberg’s marathon testimony in front of Congress, Facebook announced that it was ending a program called Partner Categories, canceling a long-standing relationship between the social network and data brokers. The change was announced in a short statement, but it has big implications for your personal information and the agencies that collect and sell it."

"The ability to target advertising is what makes Facebook its money—roughly $40 billion last year... while you provide lots of user information to Facebook, advertisers typically want even more... and that’s where data brokers come in. Facebook calls on brokers like Acxiom, Epsilon, and TransUnion to act as a conduit between Facebook and individual advertisers looking to reach targeted audiences..."

Readers of this blog may recognize TransUnion, one of the three major credit reporting agencies. So, the "advertisers" on Facebook tracking you (and data harvesting) include a variety of entities: traditional advertisers, business partners, affiliates, data brokers, and their intermediaries.

It's called "surveillance capitalism" for good reasons. Many companies besides Facebook do it.

What To Do Next

It's not easy to opt out or delete items from your advertising list. For those brands and entities you have "Liked," you can visit their Facebook page and "Unlike" them. However, that won't stop them or other "advertisers" from re-targeting (and tracking) you in the future. The "Ad Preferences" page for your profile also includes the "Your Information" module where you can toggle on or off advertising based upon certain profile elements:

Your Information module within Ad Preferences. Facebook.

The above image is from 2017. Back then, I disabled all of the active toggles you see. Deactivating these toggles might minimize the number of ads displayed, but it won't stop the tracking and data collection. The Popular Science article includes links to several opt-out mechanisms for major data brokers. You could (and should) use those. However, two key problems remain.

First, these opt-out links should be easily accessible within Facebook. They aren't. This forces consumers to waste time hunting for the opt-out mechanisms, when Facebook has the expertise to provide them. Facebook probably knows that many consumers will give up and quit, rather than hunt for opt-out links. It's great that Popular Science did a lot of the work for consumers.

Second, the opt-out mechanisms offered by some data brokers are unnecessarily complex. Example: see the opt-out mechanisms offered by Experian, another credit reporting agency:

Experian opt-out site pages.

Didn't know that Experian plays in both ponds: credit reporting and data brokerage? Most people probably don't. Experian's site lacks a unified, single opt-out mechanism, which forces consumers to wade through seven different mechanisms and methods, some of which are paper-based and lack an online method. Not good!

TransUnion's opt-out mechanism isn't much better, and it raises more questions than it answers. It links to the OptOutPrescreen.com site, which I completed way back in 2007. Did my Facebook membership undo that? Or is there some other data sharing at work, which OptOutPrescreen doesn't cover? TransUnion's page doesn't explain, and neither does Facebook's page. Not good.

Some people choose to use ad-blocking software (e.g., Adblock Plus, Ghostery) to suppress the display of online ads, but that probably won't stop the tracking and data collection internal to Facebook. There's no substitute for Facebook giving its users internal tools to completely disable and opt out of the tracking and data collection.

That highlights another problem: users are automatically included, so the burden is upon users to (continually) opt out. This is Facebook's business model. The reverse should be the default. Users should not be tracked nor data harvested unless they register and opt into the program. Given the social media site's business model, even if you opt out today, there's nothing stopping Facebook from re-subscribing you in the future with any updates to its system or terms of use.

How many advertisers are on your list? 200 or more? 300? 400? Any surprises on your list?


Robotic Vacuum Cleaner Maker To Resell Data Collected Of Customers' Home Interiors

Do you use a robovac -- an autonomous WiFi-connected robotic vacuum cleaner -- in your home? Do you use the mobile app to control your robovac?

Gizmodo reports that iRobot, the maker of the Roomba robotic vacuum cleaner, plans to resell maps generated by robovacs to other smart-home device manufacturers:

"While it may seem like the information that a Roomba could gather is minimal, there’s a lot to be gleaned from the maps it’s constantly updating. It knows the floor plan of your home, the basic shape of everything on your floor, what areas require the most maintenance, and how often you require cleaning cycles, along with many other data points... If a company like Amazon, for example, wanted to improve its Echo smart speaker, the Roomba’s mapping info could certainly help out. Spatial mapping could improve audio performance by taking advantage of the room’s acoustics. Do you have a large room that’s practically empty? Targeted furniture ads might be quite effective. The laser and camera sensors would paint a nice portrait for lighting needs..."

Think about it. The maps identify whether you have one, none, or several sofas -- or other large furniture items. The maps also identify the size (square footage) of your home and the number of rooms. Got a hairy pet? If your robovac needs more frequent cleaning, that data is collected, too.

One can easily confirm this by reading the iRobot Privacy Policy:

"... Some of our Robots are equipped with smart technology which allows the Robots to transmit data wirelessly to the Service. For example, the Robot could collect and transmit information about the Robot’s function and use statistics, such as battery life and health, number of missions, the device identifier, and location mapping. When you register your Robot with the online App, the App will collect and maintain information about the Robot and/or App usage, feature usage, in-App transactions, technical specifications, crashes, and other information about how you use your Robot and the product App. We also collect information provided during set-up.

We use this information to collect and analyze statistics and usage data, diagnose and fix technology problems, enhance device performance, and improve user experience. We may use this information to provide you personalized communications, including marketing and promotional messages... Our Robots do not transmit this information unless you register your device online and connect to WiFi, Bluetooth, or connect to the internet via another method."

Everything seems focused upon making your robovac perform optimally. Seems. Read on:

"When you access the Service by or through a mobile device, we may receive or collect and store a unique identification numbers associated with your device or our mobile application (including, for example, a UDID, Unique ID for Advertisers (“IDFA”), Google Ad ID, or Windows Advertising ID), mobile carrier, device type, model and manufacturer, mobile device operating system brand and model, phone number, and, depending on your mobile device settings, your geographical location data, including GPS coordinates (e.g. latitude and/or longitude) or similar information regarding the location of your mobile device..."

If you use the mobile app, your robovac's unique ID number can easily be associated with other data describing you, where you live, and your lifestyle. Valuable stuff.

Another important section of the privacy policy:

"We may share your personal information in the instances described... i) Other companies owned by or under common ownership as iRobot, which also includes our subsidiaries or our ultimate holding company and any subsidiaries it owns. These companies will use your personal information in the same way as we can under this Policy; ii) Third party vendors, affiliates, and other service providers that perform services on our behalf, solely in order to carry out their work for us, which may include identifying and serving targeted advertisements, providing e-commerce services, content or service fulfillment, billing, web site operation, payment processing and authorization, customer service, or providing analytics services.

Well, there seems to be plenty of wiggle room for iRobot to resell your data. And, that assumes it doesn't change its privacy policy to make resales easier. Note: this is not legal advice. If you want legal advice, hire an attorney. I am not an attorney.

The policy goes on to describe customers' choices with stopping or opting out of data collection programs for some data elements. If you've read that, then you know how to opt out of as much as possible of the data collection.

The whole affair highlights the fact that the data collected from different brands of smart devices in consumers' homes can be combined, massaged, and analyzed in new ways -- ways that probably are not apparent to consumers, and which reveal more about you than often desired. And, the whole affair is a reminder to read privacy policies before purchases. Know what valuable personal data you will give away for convenience.

Eyes wide open.

Got an autonomous robotic lawn mower? You might re-read the privacy policy for that, too.


LeapLab And Other Defendants Settled With FTC

Recently, a reader wrote via e-mail with feedback about this December 2014 blog post which discussed a lawsuit filed by the U.S. Federal Trade Commission (FTC) against a data broker, LeapLab, and other defendants. The suit alleged that the defendants sold consumers' sensitive personal information to fraudsters.

The reader was unhappy because he was unable to submit a comment on that blog post. The policy of this blog is to close comments on all blog posts after a year. The reader seemed to interpret that policy as a slight against one of the defendants. No. The closing of comments after a year is equal, consistent treatment.

The reader was also unhappy with comments posted by other readers to that 2014 blog post. As on other blogs, readers freely share their opinions and feedback in the comments section. As with other blogs, I am not responsible for readers' comments. Nor do I censor comments for content. I remind everyone to read the Terms of Service.

The reader's e-mail feedback claimed the blog post was incomplete and one sided. Today's blog post reports the rest of the story.

LeapLab and the other defendants settled the lawsuit with the FTC in February 2016. The February 18, 2016 FTC announcement stated:

"A group of defendants have settled Federal Trade Commission charges that they knowingly provided scammers with hundreds of thousands of consumers’ sensitive personal information – including Social Security and bank account numbers. The proposed federal court orders prohibit John Ayers, LeapLab and Leads Company from selling or transferring sensitive personal information about consumers to third parties. The defendants will also be prohibited from misleading consumers about the terms of a loan offer or the likelihood of getting a loan. In addition, the settlements require the defendants to destroy any consumer data in their possession within 30 days.

The orders include a $5.7 million monetary judgment, which is suspended based on the defendants sworn inability to pay. In addition to the settlement orders, the court entered an unsuspended $4.1 million default judgment with similar prohibitions against SiteSearch, the remaining defendant in the case."

You can follow the above links to the settlement agreements between each defendant and the FTC, which were approved by the court. Links are also available on the FTC-Leaplab proceedings page.

As a solo blogger with limited resources, I do my best to get it right. There's plenty of privacy news to cover, and I should have reported the above settlement agreements sooner. Hopefully, today's blog post corrects that oversight. I sincerely thank all readers for their feedback and comments.


Facebook Doesn't Tell Users Everything it Really Knows About Them

[Editor's note: today's guest post is by reporters at ProPublica. I've posted it because, a) many consumers don't know how their personal information is bought, sold, and used by companies and social networking sites; b) the USA is a capitalist society and the sensitive personal data that describes consumers is consumers' personal property; c) a better appreciation of "a" and "b" will hopefully encourage more consumers to be less willing to trade their personal property for convenience, and demand better privacy protections from products, services, software, apps, and devices; and d) when lobbyists and politicians act to erode consumers' property and privacy rights, hopefully more consumers will respond and act. Facebook is not the only social networking site that trades consumers' information. This news story is reprinted with permission.]

by Julia Angwin, Terry Parris Jr. and Surya Mattu, ProPublica

Facebook has long let users see all sorts of things the site knows about them, like whether they enjoy soccer, have recently moved, or like Melania Trump.

But the tech giant gives users little indication that it buys far more sensitive data about them, including their income, the types of restaurants they frequent and even how many credit cards are in their wallets.

Since September, ProPublica has been encouraging Facebook users to share the categories of interest that the site has assigned to them. Users showed us everything from "Pretending to Text in Awkward Situations" to "Breastfeeding in Public." In total, we collected more than 52,000 unique attributes that Facebook has used to classify users.

Facebook's site says it gets information about its users "from a few different sources."

What the page doesn't say is that those sources include detailed dossiers obtained from commercial data brokers about users' offline lives. Nor does Facebook show users any of the often remarkably detailed information it gets from those brokers.

"They are not being honest," said Jeffrey Chester, executive director of the Center for Digital Democracy. "Facebook is bundling a dozen different data companies to target an individual customer, and an individual should have access to that bundle as well."

When asked this week about the lack of disclosure, Facebook responded that it doesn't tell users about the third-party data because it's widely available and was not collected by Facebook.

"Our approach to controls for third-party categories is somewhat different than our approach for Facebook-specific categories," said Steve Satterfield, a Facebook manager of privacy and public policy. "This is because the data providers we work with generally make their categories available across many different ad platforms, not just on Facebook."

Satterfield said users who don't want that information to be available to Facebook should contact the data brokers directly. He said users can visit a page in Facebook's help center, which provides links to the opt-outs for six data brokers that sell personal data to Facebook.

Limiting commercial data brokers' distribution of your personal information is no simple matter. For instance, opting out of Oracle's Datalogix, which provides about 350 types of data to Facebook according to our analysis, requires "sending a written request, along with a copy of government-issued identification" in postal mail to Oracle's chief privacy officer.

Users can ask data brokers to show them the information stored about them. But that can also be complicated. One Facebook broker, Acxiom, requires people to send the last four digits of their social security number to obtain their data. Facebook changes its providers from time to time so members would have to regularly visit the help center page to protect their privacy.

One of us actually tried to do what Facebook suggests. While writing a book about privacy in 2013, reporter Julia Angwin tried to opt out from as many data brokers as she could. Of the 92 brokers she identified that accepted opt-outs, 65 of them required her to submit a form of identification such as a driver's license. In the end, she could not remove her data from the majority of providers.

ProPublica's experiment to gather Facebook's ad categories from readers was part of our Black Box series, which explores the power of algorithms in our lives. Facebook uses algorithms not only to determine the news and advertisements that it displays to users, but also to categorize its users in tens of thousands of micro-targetable groups.

Our crowd-sourced data showed us that Facebook's categories range from innocuous groupings of people who like southern food to sensitive categories such as "Ethnic Affinity," which categorizes people based on their affinity for African-Americans, Hispanics and other ethnic groups. Advertisers can target ads toward a group, or exclude ads from being shown to a particular group.

Last month, after ProPublica bought a Facebook ad in its housing categories that excluded African-Americans, Hispanics and Asian-Americans, the company said it would build an automated system to help it spot ads that illegally discriminate.

Facebook has been working with data brokers since 2012 when it signed a deal with Datalogix. This prompted Chester, the privacy advocate at the Center for Digital Democracy, to file a complaint with the Federal Trade Commission alleging that Facebook had violated a consent decree with the agency on privacy issues. The FTC has never publicly responded to that complaint and Facebook subsequently signed deals with five other data brokers.

To find out exactly what type of data Facebook buys from brokers, we downloaded a list of 29,000 categories that the site provides to ad buyers. Nearly 600 of the categories were described as being provided by third-party data brokers. (Most categories were described as being generated by clicking pages or ads on Facebook.)

The categories from commercial data brokers were largely financial, such as "total liquid investible assets $1-$24,999," "People in households that have an estimated household income of between $100K and $125K," or even "Individuals that are frequent transactor at lower cost department or dollar stores."

We compared the data broker categories with the crowd-sourced list of what Facebook tells users about themselves. We found none of the data broker information on any of the tens of thousands of "interests" that Facebook showed users.

Our tool also allowed users to react to the categories they were placed in as being "wrong," "creepy" or "spot on." The category that received the most votes for "wrong" was "Farmville slots." The category that got the most votes for "creepy" was "Away from family." And the category that was rated most "spot on" was "NPR."

ProPublica is a Pulitzer Prize-winning investigative newsroom.


Big Data Brokers: Failing With Privacy

You may not know that hedge funds, in both the United Kingdom and in the United States, buy and sell a variety of information from data brokers: mobile app purchases, credit card purchases, posts at social networking sites, and lots more. You can bet that a lot of that mobile information includes geo-location data. The problem: consumers' privacy isn't protected consistently.

The industry claims the information sold is anonymous (i.e., it doesn't identify specific persons), but researchers have found it easy to de-anonymize the information. The Financial Times reported:

"The “alternative data” industry, which sells information such as app downloads and credit card purchases to investment groups, is failing to adequately erase personal details before sharing the material... big data is seen as an increasingly attractive source of information for asset managers seeking a vital investment edge, with data providers selling everything from social media chatter and emailed receipts to federal lobbying data and even satellite images from space..."

One part of the privacy problem:

“The vendors claim to strip out all the personal information, but we occasionally find phone numbers, zip codes and so on,” said Matthew Granade, chief market intelligence officer at Steven Cohen’s Point72. “It’s a big enough deal that we have a couple of full-time tech people wash the data ourselves.” The head of another major hedge fund said that even when personal information had been scrubbed from a data set, it was far too easy to restore..."
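The "far too easy to restore" problem is often called a linkage attack: columns that survive scrubbing, such as zip code, birth date, and gender, act as quasi-identifiers that can be joined against a public list that does carry names. The Python below is a minimal sketch of that join, not any vendor's or hedge fund's actual process; the field names, the choice of quasi-identifiers, and the sample records are all hypothetical.

```python
def reidentify(scrubbed_rows, named_rows, quasi_identifiers=("zip", "birth_date", "gender")):
    """Attach names to 'anonymous' rows by joining on quasi-identifiers.

    scrubbed_rows: records with names removed but other details kept (hypothetical layout).
    named_rows: a public list that includes names, e.g. a voter roll (hypothetical layout).
    """
    # Index the named list by its quasi-identifier values.
    index = {}
    for person in named_rows:
        key = tuple(person[field] for field in quasi_identifiers)
        index.setdefault(key, []).append(person["name"])

    matches = []
    for row in scrubbed_rows:
        key = tuple(row[field] for field in quasi_identifiers)
        names = index.get(key, [])
        # Exactly one candidate means the "anonymous" row is now identified.
        if len(names) == 1:
            matches.append((names[0], row))
    return matches


# Hypothetical sample data for illustration only.
scrubbed = [
    {"zip": "02138", "birth_date": "1970-07-31", "gender": "F", "purchase": "pharmacy"},
    {"zip": "10001", "birth_date": "1985-01-15", "gender": "M", "purchase": "app download"},
]
voter_roll = [
    {"name": "Jane Doe", "zip": "02138", "birth_date": "1970-07-31", "gender": "F"},
    {"name": "John Roe", "zip": "10001", "birth_date": "1985-01-15", "gender": "M"},
    {"name": "Jim Poe", "zip": "10001", "birth_date": "1985-01-15", "gender": "M"},
]

for name, row in reidentify(scrubbed, voter_roll):
    print(name, "->", row["purchase"])  # prints: Jane Doe -> pharmacy
```

The second scrubbed row stays unidentified only because two people in the sample share its zip code, birth date, and gender; the more columns a "scrubbed" data set keeps, the rarer such ties become, which is why removing names alone does not make data anonymous.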

A second part of the privacy problem:

"... there is no overarching US privacy law to protect consumers, with standards set individually by different states, industries and even companies, according to Albert Gidari, director of privacy at the Stanford Center for Internet and Society..."

The third part of the privacy problem: consumers are too willing to trade personal information for convenience.


Data Breach Of Online Database Affects 154 Million U.S. Voters

An online database containing profiles of about 154 million American voters suffered a data breach. A security researcher discovered the unprotected database. HelpNetSecurity reported:

"It was a CouchDB database that required no authentication to be accessed, hosted on Google’s Cloud services. Luckily, an ID associated with each record pointed [the security researcher] in the right direction regarding the owner of the data... the data was originally collected by a data brokerage company named L2... The client told us that they were hacked, the firewall was taken down and then the probing began..."

The voter profiles include full names, addresses, phone numbers, age, gender, marital status, estimated income, political party, congressional district affiliation, state senate district affiliation, and more:

"Some of the records also contained information about the voters’ marital status, whether they had children or owned a gun, their stance on gay marriage, the language(s) they speak, and their email address."

This is the type of information a political party would collect. The report did not state which political organization was the client. The security researcher also discovered that others, including a user in Europe, had accessed the unprotected database. The database is no longer online.

The report did not state who would notify affected persons, or when this might happen.


Emotional Technology: The Coming Products, Services, And Apps

A reader shared the video below with this comment:

"I don't know George, this sort of creeps me out."

My thoughts and reactions to the video:

  1. It should creep you out. Do you want technology between you and your spouse? During very private, intimate, face-to-face conversations? I think not.
  2. We consumers are already experiencing the beginnings of emotional technology. To make that tech work, companies must collect data about our moods and emotions. Some examples of this data capture: a) Facebook's expanded list of emojis; b) Facebook saving your unpublished and unedited comments and posts before final posting.
  3. Consumers decide when and where they want technology in their relationships. That line is already blurred. (Examples: devices with voice-recognition interfaces, such as Amazon Echo and Hello Barbie, that listen 24/7/365.)
  4. If I were a data broker, of course I'd want to capture your moods and emotions and link them to specific geo-locations and times of day. Why? It's an opportunity to make more $$$ by selling that emotional data to advertisers so they can serve up supposedly relevant ads responding to your moods in those locations and/or times.
  5. Wearables, fitness trackers and smart homes outfitted with certain Internet-of-things devices will perform this mood data capture.
  6. Whenever somebody uses technology to offer convenience, watch out. There are usually accompanying data capture, tracking, and privacy issues (e.g., notice, consent) embedded. Will companies adequately protect emotional information from data breaches? How will your government and law enforcement acquire, archive, and use mood information?

What are your opinions?


Voter Tracking, Data Collection, Analysis, And Privacy

While the New Hampshire primary and Iowa caucuses have passed, there are many more upcoming primaries this year before the general election in November. These primaries represent data collection opportunities for companies to learn more about voters. Marketplace reported:

"One company is tracking voter characteristics through some likely sources — their phones. Dstillery is a big data intelligence company that sells targeted advertising information about consumers to big companies like Microsoft and Comcast. But in the Iowa primary, the company tried its hand at compiling voter traits... people who loved to grill or work on their lawns overwhelmingly voted for Trump in Iowa... people who watched and supported NASCAR also happened to support Donald Trump and Hillary Clinton..."

Dstillery has an impressive list of clients: AT&T, Cablevision, Comcast, DirecTV, Hulu, Sprint, T-Mobile, Verizon, Vonage, and many more. If you remember your college statistics classes, then you know that correlation does not mean causation. Things may happen together, but that doesn't mean one causes the other. Being a NASCAR fan doesn't mean a voter will vote for certain candidates. Voting for certain candidates does not mean you will be a NASCAR fan.

This "big data" collection is also a reminder of how much we consumers share on social networking sites. All a consumer has to do is "Like" a brand (e.g., NASCAR, one of these top-10 barbeque grills, a particular politician, etc.) on Facebook.com, or "Follow" that brand (or politician) on Twitter, and it is pretty easy for a big data intelligence company to collect, analyze, and compare voters' preferences. (Facebook knows far more about you than you realize.) Even if you didn't "Like" or "Follow" a brand, the data collection is still pretty easy. All a big data intelligence firm has to do is trawl through the photos you took with your phone and posted online, along with their metadata: racetracks on Instagram, NASCAR cakes on Pinterest, or whatever else. You get the idea. The metadata attached to your photos records where and when you were (e.g., the geo-location of the racetrack), while the images themselves show the background scene (e.g., stands, pits, etc.) and the people (e.g., emblems on their clothes). This blog post explains what happens when you stop "Liking" posts and comments on Facebook.
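As a rough illustration of the geo-location part, EXIF metadata stores a photo's GPS position as degrees, minutes, and seconds (each a rational number) plus an N/S or E/W reference letter. The Python sketch below converts such values into the decimal coordinates a mapping tool would use. It assumes the GPS tags have already been read out of the image file by some EXIF reader and placed in a plain dictionary keyed by tag name, with (numerator, denominator) pairs; that dictionary layout and the sample coordinates are hypothetical.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) rationals to signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative.
    if ref in ("S", "W"):
        value = -value
    return float(value)


def photo_location(gps_tags):
    """Return (latitude, longitude, date) from a dict of EXIF GPS tags (hypothetical layout)."""
    lat = dms_to_decimal(gps_tags["GPSLatitude"], gps_tags["GPSLatitudeRef"])
    lon = dms_to_decimal(gps_tags["GPSLongitude"], gps_tags["GPSLongitudeRef"])
    return lat, lon, gps_tags.get("GPSDateStamp")


# Hypothetical tags, as an EXIF reader might report them for a phone photo.
tags = {
    "GPSLatitude": ((35, 1), (21, 1), (0, 1)),
    "GPSLatitudeRef": "N",
    "GPSLongitude": ((80, 1), (41, 1), (24, 1)),
    "GPSLongitudeRef": "W",
    "GPSDateStamp": "2016:02:20",
}

print(photo_location(tags))  # (35.35, -80.69, '2016:02:20')
```

Using Fraction keeps the rational values exact until the final conversion, so fractional seconds (for example, a denominator of 100) do not accumulate rounding error.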

The data analysis is also pretty easy because many of you gave your mobile phone numbers to social networking sites so you could use their mobile apps. Both social networking sites and data brokers have two crucial data elements (i.e., your birth date and your phone number) to match, merge, and purge data about you. So, political campaigns (via the data brokers and big data intelligence firms they hire) can understand pretty easily who actually voted, and for whom, at a particular voting location.
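To make "match, merge, and purge" concrete, here is a minimal Python sketch assuming two hypothetical record lists, one from a social networking site and one from a data broker, each carrying a phone number and birth date. It normalizes the phone numbers, matches records on the (phone, birth date) pair, merges their fields into one profile, and purges the duplicates. The sources quoted here don't describe any broker's actual matching rules, so the normalization rule and field names are assumptions.

```python
import re


def normalize_phone(raw):
    """Keep digits only and drop a leading US country code (assumed rule)."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def match_merge_purge(*sources):
    """Combine record lists into one profile per (phone, birth date) pair."""
    profiles = {}
    for records in sources:
        for record in records:
            # Match: records with the same normalized phone and birth date are treated as one person.
            key = (normalize_phone(record["phone"]), record["birth_date"])
            profile = profiles.setdefault(key, {})
            # Merge: add any fields this profile does not have yet (first value wins).
            for field, value in record.items():
                profile.setdefault(field, value)
    # Purge: duplicates have collapsed into a single profile per key.
    return list(profiles.values())


# Hypothetical records (555-0100 through 555-0199 are reserved for fictional use).
social_site = [
    {"phone": "+1 (555) 010-0123", "birth_date": "1980-04-01", "likes": "NASCAR"},
]
data_broker = [
    {"phone": "555-010-0123", "birth_date": "1980-04-01", "income": "$100K-$125K"},
    {"phone": "555-010-0199", "birth_date": "1975-09-12", "income": "$25K-$50K"},
]

for profile in match_merge_purge(social_site, data_broker):
    print(profile)
```

Because the merge keeps the first value seen for each field, the order of the sources decides which phone format survives; a real system would need explicit rules for conflicting values.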

Is this a good thing? I guess your answer to that depends upon how much privacy you want associated with your voting activity. What you do within the voting booth may be private, but there are many players performing surveillance outside the booth to reveal what you did in the booth. And, if you aren't careful what you say in front of Internet-of-Things devices installed in your home (e.g., toys, smart televisions, smart speakers or search robots, etc.), then the data collection is probably even more extensive.


Political Campaigns In The USA: Privacy And Security Issues

The Los Angeles Times provided a good primer about the privacy issues in the political system in the United States:

"... data for politics is not a new phenomenon. Presidential candidates began pioneering the approach more than a decade ago, and it was a key part of Barack Obama’s winning strategy in 2008 and 2012. But technological advancements, plunging storage costs and a proliferation of data firms have substantially increased the ability of campaigns to inhale troves of strikingly personal information about voters... as presidential campaigns push into a new frontier of voter targeting, scouring social media accounts, online browsing habits and retail purchasing records of millions of Americans, they have brought a privacy imposition unprecedented in politics. By some estimates, political candidates are collecting more personal information on Americans than even the most aggressive retailers... The campaigns and the data companies are cagey about what particular personal voter details they are trafficking in..."

Reportedly, one firm collected 500 data elements about each voter. That means it knows a lot about you.

What might those data elements be? Let's use Facebook.com as an example, since many consumers use its social networking service. If you are a member, you can see for yourself. Sign into your account with a web browser, select SETTINGS and then ADS. You'll see a page that looks similar to this:

Image of Facebook Ad Settings page.

Chances are, your account settings were preset to automatically display targeted advertisements based upon your interests (e.g., what you "Liked," posted about, friends' posts you commented upon, even when you don't click "Like" buttons, music and fitness apps linked to your account, edited and unpublished posts, etc.). I'd already modified my account settings to suppress targeted ads, but that doesn't stop the data collection. Now, select the EDIT link next to "Ads based upon my preferences." When prompted, select the "View Ad Preferences" button. You will see a page that looks similar to this:

Image of Facebook Ad Preferences Categories page.

Facebook has neatly arranged your preferences into several categories: Education, People, News and Entertainment, Travel, and more. Click on any category to view the items for that category. After selecting the "Lifestyle and Culture" category, I saw this:

Image of Facebook Lifestyle and Culture Ad Preferences view.

You can click on each item to see details about that item. You can also mouse over an item to display a button that toggles that item on or off. That tells Facebook to either display or suppress targeted advertisements to you about that item. (I turned 95 percent of mine off.) If you "Like" the Facebook page for a specific brand, product, service, newspaper, organization, event, or person, then the site is happy to catalog that and serve targeted ads from that entity, or other companies in that category.

This provides a huge clue as to the data elements Facebook has collected and shared with data brokers and its partners. Chances are, some of this information has already made its way via data brokers into the databases of political campaigns. You can read in this blog about data brokers and tech companies that have assisted social networking sites.

I've used Facebook.com as an example to highlight the data elements for consumers. The above images make it real. Data collected by social networking sites is so valuable that at least one credit reporting agency wanted it. As the Los Angeles Times reported:

"The data companies are required by law to keep the names of individuals separate from the pile of data accumulated about them. Instead, each voter is assigned an online identification number, and when a campaign wants to target a particular group – say, drivers of hybrid vehicles or gun owners – the computers coordinate a robocall, or a volunteer’s canvassing list, or a digital advertisement with relevant accounts. Since campaigns are ultimately in the business of finding particular people and getting them to show up to vote, some scholars are dubious their digital targeting efforts offer the same level of anonymity as those of corporations."

So, campaigns will re-assign names to information the data brokers have supposedly anonymized. Are you happy with that? Are you happy with political campaigns knowing this much about you? Are you confident that political campaigns adequately protect your personal information? Do you believe that you should have some say in what political campaigns collect and archive about you? Do you want control over your personal information?

Again, from the Los Angeles Times article:

"There is a tremendous amount of data out there and the question is what types of controls are in place and how secure is it,” said Craig Spiezle, executive director of the nonprofit Online Trust Alliance. The group’s recent audit of campaign websites for privacy, security and consumer protection gave three-quarters of the candidates failing grades... An exhaustive paper [New York University School of Law researcher] Rubenstein recently published on voter privacy found that “political dossiers may be the largest unregulated assemblage of personal data in contemporary American life.” Basic privacy guidelines that apply to other industries don’t appear to apply to candidates. Some do not even have clear privacy policies posted on their websites..."

Now you have an idea of what data is out there about you. If you want to turn off targeted ads displayed by Facebook, you can. You can't stop the data collection though. The data collection, archiving, and resale is part of most social networking sites' business models.

Are political campaigns reselling data to make money? Are you interested in what political campaigns have collected about you? Do you think it's accurate?


Leaked Documents From The Ashley Madison Data Breach Highlight The Company's Technology Vendors

The fallout continues from the data breach at infidelity website Ashley Madison. Besides several class-action lawsuits filed against Ashley Madison, Forbes magazine reported that stolen documents highlight the company's information technology (I.T.) vendor relationships:

"In response to challenges of the data’s authenticity, Impact Team began a second series of dumps, including what appears to be essentially all corporate records, including source code, internal business documents and corporate emails of Avid Life Media/Ashley Madison... Within those hundreds of thousands of documents is one entitled Areas of Concern – Customer Data (abbreviated in this article, AoC)... The needle in the treasure trove haystack of corporate data... In the AoC, the IT business practices of Avid/Ashley Madison began to emerge, including its relationships with third party vendors. New Relic is mentioned as one of three third party IT vendors to Avid. Also mentioned in that document as vendors are OnX (publicly reported as being an Ashley Madison vendor) and Redis/Memcached (alternative open source caching tools)... The AoC identifies New Relic as being a customer data “concern” (worry), by mentioning that it could employ “a hacker/bad actor” who could gain access to customer data. There was nothing in the AoC to indicate any reason to call out New Relic as a third party vendor presenting particular customer data security risks."

Assuming the leaked documents are accurate, one reason why this is important:

"The existence of third party IT vendors may be of interest to the increasing numbers of plaintiffs suing Avid and Ashley Madison. These plaintiffs have, to date, apparently not named these vendors as defendants."

Noel Biderman, the chief executive at Avid Life Media, Ashley Madison's parent company, resigned last week. A Wired article highlighted another reason:

"... the Missouri suit states that its anonymous plaintiff paid a $19 fee to have Ashley Madison delete her personal information from its servers but failed to deliver on that service."


Class-Action Lawsuits Filed Against Medical Informatics Engineering And Experian

One result of the Medical Informatics Engineering (MIE) data breach has been a class-action lawsuit filed against MIE. The Journal Gazette reported on July 31:

"James Young, a patient whose medical information was compromised, filed the paperwork Wednesday in U.S. District Court in Fort Wayne. The Indianapolis man is seeking to create a class action, which would allow others who had personal information stolen in the data breach to join the lawsuit... Young alleges that MIE failed "to take adequate and reasonable measures to ensure its data systems were protected," failed to stop the breach and failed to notify customers in a timely manner."

In a Sunday, August 2 article, the Fort Wayne, Indiana-based Journal Gazette described the wide range of companies that access consumers' medical records:

"A lot more people than you realize, including your employer, your bank, state and federal agencies, insurance companies, drug companies, marketers, medical transcribers and the public, if your health records are subpoenaed as part of a court case. All those entities can access your records without getting special permission from you, according to Patient Privacy Rights."

Austin, Texas-based Patient Privacy Rights is an education, privacy, and advocacy organization dedicated to helping consumers regain control over their personal health information.

The Journal Gazette news article was the first report I've read disclosing the total number of breach victims. Reportedly, MIE sent 3.1 million breach notices to affected consumers nationwide. Help Net Security reported a total of nearly 5.5 million consumers in the U.S. affected. That includes 1.5 million consumers affected in Indiana, and 3.9 million consumers in other states. Compromised or stolen data goes as far back as 1997. Reportedly, the Indiana Attorney General's office has begun an investigation.

The Journal Gazette news article also discussed some of the ways stolen medical information can be misused:

"An unethical provider could bill an insurance company or the federal government for health care that it never gave you. Any amount not covered would then be billed directly to you, which could affect your credit score... Then there’s the issue of using sensitive medical information for marketing – or even for blackmail. Let’s say someone was treated for AIDS, hepatitis C or a sexually transmitted disease. A company selling prescription drugs or other products might like to target that patient for advertising. But sending brochures or coupons in the mail could tip off others about the condition. Someone with those or similar medical conditions could face discrimination in hiring..."

In a separate case, a class-action lawsuit was filed against the credit reporting service Experian. The Krebs On Security blog reported on July 21:

"The suit alleges that Experian negligently violated consumer protection laws when it failed to detect for nearly 10 months that a customer of its data broker subsidiary was a scammer who ran a criminal service that resold consumer data to identity thieves... The lawsuit comes just days after a judge in New Hampshire handed down a 13-year jail sentence against Hieu Minh Ngo, a 25-year-old Vietnamese man who ran an ID theft service variously named Superget.info and findget.me. Ngo admitted hacking into or otherwise illegally gaining access to databases belonging to some of the world’s largest data brokers, including a Court Ventures— a company that Experian acquired in 2012. He got access to some 200 million consumer records by posing as a private investigator based in the United States... The class action lawsuit, filed July 17, 2015 in the U.S. District Court for the Central District of California, seeks statutory damages for Experian’s alleged violations of, among other statutes, the Fair Credit Reporting Act (FCRA)..."

I included information about both class-action lawsuits in a single blog post since both companies are of interest to consumers affected by MIE's data breach. MIE has offered breach victims two years of free credit monitoring services from Experian.