
The White House Says Anonymizing Data Doesn’t Work


We tend to believe our data is safe when we share it with the various organizations we encounter online and in person. We might get a targeted ad or push notification here and there, but most of us accept that’s just “the way things are.”

Is our data actually safe though? According to the White House, if it ever was, it’s not anymore. 

“Entities in the United States healthcare market can access bulk sensitive personal data…”

In an executive order issued on February 28, 2024, the White House wrote, “Entities in the United States healthcare market can access bulk sensitive personal data, including personal health data and human genomic data, through partnerships and agreements with United States healthcare providers and research institutions. Even if such data is anonymized, pseudonymized, or de-identified, advances in technology, combined with access by countries of concern to large data sets, increasingly enable countries of concern that access this data to re-identify or de-anonymize data, which may reveal the exploitable health information of United States persons.”

To understand the reasoning behind the White House’s remarks and the executive order itself, we need to understand what “anonymization” and “pseudonymization” actually mean.

  • Anonymization: A data security measure in which identifiable information (like “Name,” “Date of Birth,” “Address,” and “SSN”) is removed or significantly modified. Anonymization was long thought to be irreversible, meaning that once data was anonymized, it could no longer be reassociated with the original individual. 

    Until recently, anonymized data was no longer considered personal data under existing privacy regulations. Your anonymized data was free to be used for analysis and research.
  • Pseudonymization: Similar to anonymization, pseudonymization involves replacing certain identifiable information with pseudonyms. For example, the name “Albert Einstein” could be replaced with “John Doe.”

    The difference is that pseudonymization is intended to be reversible: with additional information (stored separately) about how a pseudonym was created, it can be decoded. That’s why pseudonymized data is still considered personal data under privacy regulations.

    This is commonly used for clinical trials, where the person performing the analysis will not know who the real patient is. The pseudonymization can be reversed after the analysis is complete (and if required).
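To make the distinction concrete, here is a minimal sketch of the two techniques. The record fields, token scheme, and mapping are invented for illustration; they don’t reflect any particular system’s implementation.

```python
import secrets

# A toy patient record (hypothetical fields, for illustration only)
record = {"name": "Albert Einstein", "condition": "hypertension"}

# --- Anonymization: the identifier is removed outright.
# Without "name", the record can't be traced back on its own.
anonymized = {k: v for k, v in record.items() if k != "name"}

# --- Pseudonymization: the identifier is replaced with a random token,
# and the token-to-name mapping is stored separately (and securely).
pseudonym_map = {}  # the "additional information", kept apart from the data

def pseudonymize(rec):
    token = secrets.token_hex(8)
    pseudonym_map[token] = rec["name"]
    return {**rec, "name": token}

pseudonymized = pseudonymize(record)

# Reversal is possible only for whoever holds the separate mapping:
original_name = pseudonym_map[pseudonymized["name"]]
```

Note that the anonymized record has no key at all, while the pseudonymized one can be decoded after the analysis is complete, as in the clinical-trial case above.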

SaaS + social media + AI = vulnerability

As individuals (and organizations) we assume a lot about our data security. If you share your Protected Health Information (PHI) or Personally Identifiable Information (PII) at a clinic or hospital, you probably assume that the facility will carefully protect it.

And, collectively, we’ve always believed that even if that facility’s data security were to be breached by a threat actor, de-identification (anonymization or pseudonymization) would help keep us safe.

Except, now, we know that’s not true.

There are three things that make us more vulnerable to threat actors today than we have been in the past: the ubiquity of social media, the interconnectedness of our software infrastructure, and AI.

How it might happen

Imagine threat actors gain access to some of a medical facility’s data through a careless third-party software vendor. At this point, they aren’t able to identify individual patients because the data is “anonymized,” but they might be able to see that patients are being treated for specific conditions and on what dates those patients were admitted or discharged. That’s enough to start de-anonymizing it.

They could cross-reference data from other leaks, such as transaction history from credit cards, to identify individuals who made payments to that medical facility on certain dates or in certain amounts. They could use social media to look up individuals in the town or neighborhood the medical facility serves and analyze their post histories, timestamps, and photos.

If that all seems too time-consuming to do by hand, it would be — but with AI, threat actors no longer have to. Machine learning can replace all of that manual fact-finding, drastically reducing the time it takes to detect the patterns that expose our anonymity.
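The cross-referencing step above is often called a linkage attack, and it can be sketched in a few lines. All of the data below is invented for illustration: two “harmless” leaks, one of anonymized hospital records and one of card transactions, joined on a shared attribute.

```python
# Leaked, "anonymized" hospital records: no names, just conditions and dates
hospital = [
    {"condition": "oncology",   "admitted": "2024-03-01"},
    {"condition": "cardiology", "admitted": "2024-03-04"},
]

# Separately leaked card transactions naming the payer and the payment date
transactions = [
    {"payer": "J. Smith", "merchant": "City Hospital", "date": "2024-03-01"},
    {"payer": "A. Jones", "merchant": "City Hospital", "date": "2024-03-04"},
]

# Cross-reference: match admission dates against payment dates.
# Each match ties a real name back to a medical condition.
linked = [
    {"payer": t["payer"], "condition": h["condition"]}
    for h in hospital
    for t in transactions
    if t["date"] == h["admitted"]
]
```

Neither data set is identifying on its own; the join is what re-identifies the patients. At scale, with fuzzier attributes like payment amounts or social media timestamps, this matching is exactly the kind of pattern-finding machine learning automates.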

Putting it all together, the threat actors can now quite clearly identify who is undergoing treatment for sensitive conditions like drug addiction, cancer, or STDs — and then use that information to extort or exploit those individuals.

The data security battle is not lost (yet)

While the White House’s new executive order paints a scary picture of our modern data security practices, we still have tools to fight back.

Become more aware of the data trails you leave behind. We need to feel confident saying, “No!” to requests for our information. Does your salon need your email? Does the grocery store need your phone number? Does Facebook need your address?

Organizations need to think like this too — does a third-party software vendor or service actually need your data to function? If not, you might be better off finding a partner who prioritizes your security and control over their own profit.

We have to reimagine data sharing and create new norms. If we don’t, the biggest data breaches in history are still in the future.
