“Does every company have horrible data quality?”

On Reddit’s r/analytics, user CafinatedPepsi writes, “Been in my first role as a data analyst for a bit over a year now. Every analysis I’ve done has some different issue – missing data, data is incorrect, etc. I’ve gotten very good at backing into numbers & making assumptions which make sense in the context of the business, but it makes any automation very difficult (almost every project requires some aspect of manual entry, to varying degrees). Is this problem widespread across the industry, or is my company the exception?”
The top comment answers, “There are only two types of data – missing, and bad.”
Statistically speaking, they’re right.
In an era of AI copilots, real-time dashboards, and cloud platforms that promise to “unify everything,” most companies still struggle with basic data quality.
A 2024 survey revealed that “underperforming AI programs/models built using low-quality or inaccurate data cost companies up to 6% of annual revenue on average.” A 2025 study found that “64% of organizations identify data quality as their top data integrity challenge, with 67% lacking complete trust in their data for decision-making.”
Healthcare has some of the richest, most critical data of any sector. It also has some of the worst data quality.
Multiple studies and expert analyses point to healthcare as one of the most fragmented, error-prone, and inconsistent industries when it comes to data. The WHO estimates that as much as half of all medical errors stem from administrative errors, such as transcription mistakes. In other words: bad data.
“The healthcare system I used to work with was so janky they hired 3 people full time as data quality officers.”
– u/50_61S-----165_97E
So why is data quality so much worse in healthcare than elsewhere?
One of the biggest culprits is systemic fragmentation. Health data moves back and forth between software platforms and health systems in different formats, with no consistent standards, which makes reconciliation difficult at best.
In a thread titled “What’s your favorite data quality horror story?” u/Professional_Bad_536 says, in part, “Ingesting healthcare [data] for multiple health systems and creating one unified model… it’s a neverending nightmare.”
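To make that concrete, here's a minimal sketch in Python of reconciling a single patient record across two hypothetical systems. The field names, formats, and values are all invented for illustration, but the mismatches (name conventions, date formats, identifier padding) are the kind that show up constantly in real feeds.

```python
from datetime import datetime

# Hypothetical records for the *same* patient as exported by two
# different systems. All field names, formats, and values are
# invented for illustration.
record_ehr = {
    "patient_name": "SMITH, JOHN A",   # "LAST, FIRST MIDDLE" in caps
    "dob": "1985-03-12",               # ISO 8601
    "mrn": "000482913",                # zero-padded medical record number
}

record_billing = {
    "first": "John",
    "last": "Smith",
    "birth_date": "03/12/1985",        # US-style MM/DD/YYYY
    "mrn": "482913",                   # same MRN, no padding
}

def match_key(first, last, dob, mrn):
    """Reduce a record to a canonical key so records can be compared."""
    return (first.strip().lower(), last.strip().lower(),
            dob.date(), mrn.lstrip("0"))

# Each source needs its own parsing rules before the two records
# can even be compared.
last, rest = record_ehr["patient_name"].split(", ")
first = rest.split()[0]                # drop the middle initial
key_ehr = match_key(
    first, last,
    datetime.strptime(record_ehr["dob"], "%Y-%m-%d"),
    record_ehr["mrn"],
)

key_billing = match_key(
    record_billing["first"], record_billing["last"],
    datetime.strptime(record_billing["birth_date"], "%m/%d/%Y"),
    record_billing["mrn"],
)

print(key_ehr == key_billing)          # True, but only after per-source cleanup
```

And this is the easy case: one patient, two sources, and every conflict resolvable by a deterministic rule. Scale it to dozens of feeds, millions of records, and fields that genuinely disagree, and the "neverending nightmare" above starts to make sense.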
But data doesn’t just move between systems. It moves between people. From front-desk staff to clinicians to billing coders, each handoff is an opportunity for data to be misinterpreted, delayed, or lost. These communication gaps introduce errors that most structured systems, no matter how capable they are, simply can’t prevent.
“The data is always bad, and good luck trying to convince management to enforce better data quality.”
– u/HardCiderAristotle
Another Redditor posts, “10 years [in] the tech field, let me tell you ‘having AI’ (or until a few years ago ‘a ML model’) is much more important to business stakeholders than data quality.”
If that sounds like the reality at your organization, you might be in a lot of trouble. This is the Age of AI; there’s no doubt about it. But AI runs on data.
Forbes has reported that as many as 80% of all AI projects fail. The successful ones, they say, were treated as data projects first: “It might seem somewhat obvious to many that AI projects are data projects, but perhaps the AI failures need to understand this at a greater level of detail. What makes an AI system work isn’t specific code, but rather the data.”
Data quality isn’t just about getting AI to work, though. It’s about getting everything to work. It’s the foundation of every forecast, every decision, every claim, every diagnosis. When it’s bad, everything built on top of it is bad. So, given all the challenges, how do you make it actually good?
It starts with human-in-the-loop automation: capturing clean data at the source, reconciling it across systems, and flagging issues before they escalate. But it also means making data sovereignty and portability an organizational priority. When you own your data, control how it moves, and avoid vendor lock-in, you’re far more capable of maintaining consistency and accuracy over time. True data quality isn’t just about fixing today’s errors. It’s about making sure tomorrow’s data is better by design.
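What might that look like in practice? Here's a minimal sketch of the flag-don't-guess pattern, again with invented field names and rules: records that pass validation flow through automatically, and anything suspect is routed to a human review queue instead of being silently ingested or silently dropped.

```python
from datetime import date, datetime

# Invented rules and field names for illustration; real checks would be
# driven by your schema, code sets, and business logic.
def validate(record):
    """Return a list of human-readable problems; an empty list means clean."""
    problems = []
    if not record.get("mrn", "").strip():
        problems.append("missing MRN")
    try:
        dob = datetime.strptime(record.get("dob", ""), "%Y-%m-%d").date()
        if dob > date.today():
            problems.append("date of birth is in the future")
    except ValueError:
        problems.append(f"unparseable date of birth: {record.get('dob')!r}")
    return problems

def ingest(batch):
    """Split a batch into clean rows and rows queued for human review."""
    clean, review_queue = [], []
    for record in batch:
        problems = validate(record)
        if problems:
            review_queue.append((record, problems))  # a person decides, not a guess
        else:
            clean.append(record)
    return clean, review_queue

clean, review_queue = ingest([
    {"mrn": "482913", "dob": "1985-03-12"},
    {"mrn": "", "dob": "12/03/1985"},    # bad on both counts
])
print(f"{len(clean)} ingested, {len(review_queue)} flagged for review")
for record, problems in review_queue:
    print(record, "->", problems)
```

The design choice that matters is the review queue: the pipeline never guesses at a value it can't verify, so bad records surface early, close to the source, while someone is still around who can fix them.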