What If the Data Is Wrong?
Updated: Oct 11
In the summer of 2021, I was flying out of Baltimore and struck up a conversation with a woman who worked as a doctor at Walter Reed, one of the leading US hospitals.
Like just about everyone at the time, I was spending all my free moments diving into Covid charts and data like an amateur statistician, trying to suss out what the future would bring. Finally(!), I had a chance to speak with someone about the subject who knew more than the average person.
“I noticed there was a huge spike in covid cases in India recently that just shot up out of nowhere. It went crazy high, and then fell just as abruptly,” I said. “Do you have any idea what they did and how this happened?”
She looked at me seriously and said, “The Indian government is putting out fake numbers. They want to protect their positions and are covering up the problem.”
This answer struck me as true. If there’s one thing that holds up throughout history, it is that governments lie. The powerful like their elevated position in society just fine and are willing to lie to protect themselves from criticism. But it also struck me that if the government of India would lie about their covid numbers, it’s likely my government would do the same.
Two years later, it looks like I was right to be suspicious. I lost faith in the covid infection rate stats when they started including something called “asymptomatic covid,” that is to say, covid with no symptoms. What? I’m sorry, but that’s like saying there’s a heat wave without a corresponding increase in temperature.
But more damning was the New York Times admitting the covid death toll was exaggerated. In an article printed on July 17, 2023, they wrote: “The official number [of covid deaths] is probably an exaggeration because it includes some people who had virus when they died even though it was not the underlying cause of death. Other C.D.C. data suggests that almost one-third of official recent Covid deaths have fallen into this category. A study published in the journal Clinical Infectious Diseases came to similar conclusions.”
On this at least, it looks like the conspiracy theorists were right – the government really did count car accident victims as covid deaths if the person had a positive test result at the time of the accident.
It was, I admit, a bit of a shock to realize that a significant amount of the data I had been poring over was fake. I’ve spent the last 25 years working in technology, a field where we’ve been trained to venerate and deify statistics and data. For at least the past ten years, the phrases most commonly heard in my industry have been “data-driven marketing” and “data-driven decision making.”
It’s a nice concept: that the answer can be deduced through statistical analysis. The answer is in there, the theory holds; you just need to dig to uncover the truth hidden in the numbers. Tech marketers like myself need this reassurance. We are under a lot of pressure because our budgets are finite and have to be allocated in a way that is profitable, or we could lose our jobs. It’s wonderful to think, “We will spend this money this way because the data indicates that is the best course of action.”
But that is all thrown out the window if the underlying data is false. What if, as they were with covid, the numbers are fake? We all seem to have forgotten the quote often attributed to Mark Twain: “There are three kinds of lies: lies, damned lies, and statistics.” Decision making becomes meaningless when the underlying data is wrong, and that, unfortunately, is the trend we are seeing, not just in Twitter posts but at the highest echelons of society.
In June, Harvard Business School professor Francesca Gino was placed on leave by the University after several people, including a colleague, came forward with claims that she tampered with data in at least four papers. The professor was known for writing dozens of widely read studies in the field of behavioral science and consulting for some of the world's biggest companies like Goldman Sachs and Google, as well as dispensing advice on news outlets, like The New York Times, The Wall Street Journal and NPR.
Then in July, the President of Stanford University, one of the most prestigious universities in the world, resigned after an investigation found 12 academic papers he authored contained manipulated data. The manipulations included lab panels that had been stitched together, panel backgrounds that were digitally altered and blot results taken from other research papers.
These two recent events are part of an ongoing phenomenon called the “replication crisis,” which took hold in the early 2010s after it was discovered that many scientific studies are difficult or impossible to reproduce. To cite one example, a 2012 paper by C. Glenn Begley, a biotech consultant working at Amgen, and Lee Ellis, a medical researcher at the University of Texas, reported that the findings of only 6 of 53 pre-clinical cancer studies (11%) could be confirmed when the studies were replicated.
This discovery and the ongoing crisis have called into question the credibility of theories built upon our entire body of scientific research. There’s a tremendous social cost here – how can we choose an effective cancer treatment if only about ten percent of research can be verified by replication? Further, which reports are part of that ten percent? It’s impossible to pick out the true from the false when the vast majority is meaningless noise.
Why do they do it? In academic circles the pressure is “publish or perish.” To secure a tenured position, academics need to get published in as many A-level publications as possible. These journals want research that is ground-breaking enough to earn media coverage. The temptation is there for academics to tinker with the numbers to deliver the revolutionary discovery the journals crave, especially after years of conducting expensive experiments that may prove fruitless.
There’s also the temptation to launch oneself into celebrity status: deliver a TED Talk watched by millions, become an internationally recognized expert on NPR, and give paid speeches at Goldman Sachs. This can be done by creating a false positive, that is, by massaging the data to manufacture a meaningful message where there is none.
These fake studies certainly paid off for Gino. In the 2019-20 academic year, she earned more than one million dollars at Harvard, in addition to a lucrative side gig giving paid speeches for major corporations. Her speaking fees were in the tens of thousands of dollars and she claims to have travelled to 40 states and 30 countries giving them.
Don’t Trust the Science?
This is all a pretty bleak worldview and I’ve never been a fan of documenting problems without also listing solutions. What can we do?
My first suggestion is to stop blindly trusting the data and the science built upon it. Science is a constantly evolving field where nothing is certain; even gravity is listed in physics textbooks as a theory. During Covid we seem to have forgotten that science is skeptical first and foremost, filled with rigorous debate over everything.
We should all be deeply skeptical of people who claim to have easy, neat solutions to enormous problems, even if they have an advanced degree and are wearing a lab coat. When you hear a new idea or concept, judge it by its merits, not by the person who said it.
Further, trust in science is built upon reproducibility: an experiment can be repeated over and over again and produce the same result. This is the best kind of science, and establishing that reproducibility takes time, often years.
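To see why repetition matters, here is a minimal Python sketch. It is a toy simulation, not a model of any real study mentioned above, and everything in it is an assumption made for illustration: the helper name looks_significant, the group size of 30, and the common |z| > 1.96 cutoff (roughly a 5% significance level). Every simulated "study" compares two groups drawn from the same distribution, so there is no real effect to find. Some still look significant by luck, and rerunning those lucky results shows how few of them survive a second attempt.

```python
import random
import statistics

def looks_significant(rng, n_per_group=30):
    """Compare two groups drawn from the SAME distribution (no real effect).
    Returns True when the difference in means crosses |z| > 1.96 by luck."""
    a = [rng.gauss(0, 1) for _ in range(n_per_group)]
    b = [rng.gauss(0, 1) for _ in range(n_per_group)]
    # Standard error of the difference between the two group means
    se = (statistics.variance(a) / n_per_group
          + statistics.variance(b) / n_per_group) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / se > 1.96

rng = random.Random(0)  # fixed seed so the run is repeatable
n_studies = 2000

# First pass: how many no-effect studies would look publishable?
chance_hits = sum(looks_significant(rng) for _ in range(n_studies))

# Second pass: rerun each "hit" once with fresh data.
replicated = sum(looks_significant(rng) for _ in range(chance_hits))

first_pass_rate = chance_hits / n_studies
replication_rate = replicated / chance_hits
print(f"{chance_hits} of {n_studies} no-effect studies looked significant")
print(f"{replicated} of those {chance_hits} held up on a rerun")
```

Under these assumptions, roughly one study in twenty clears the bar on the first pass even though nothing is there, and only a small share of those clear it again. A single striking result tells you little; a result that keeps showing up tells you much more.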
The academic field should pivot away from the pursuit of celebrity by publishing sensational studies and get back to the job of teaching and real research. A scientific journal that publishes all research, whether it succeeds or fails, would help alleviate the pressure felt by academics. The industry needs to start valuing truth over attention-grabbing headlines before people tune out completely.