
The Simpson Paradox And Why It Matters In Business

In statistics, the Simpson paradox occurs when a trend that clearly shows up in separate clusters or brackets of data disappears or, at worst, reverses when those groups are combined. In short, combining the data hides the fact that the groups follow different distributions, and this ends up creating a biased overall effect.

The Simpson paradox origin story

As Tom Grigg explained extremely well, the Simpson paradox took its name from Edward Hugh Simpson, who described it in a technical paper in 1951.

Yet it was made popular when another statistician, Peter Bickel, was called in to analyze UC Berkeley's 1973 admission data for suspected gender bias.

As the story goes, the university feared a lawsuit, so they had the data analyzed by Bickel.

When the data was combined, it gave the impression that men had been favored over women.

In fact, 44% of all male applicants were selected, against 35% of all female applicants.

Yet when the data was analyzed department by department, it showed something completely different.

In four out of the six departments analyzed, the admissions were biased toward women.

But because women tended to apply to departments where fewer applicants were selected, the combined data gave an impression of bias toward male applicants.
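The mechanism can be sketched with a few lines of Python. The counts below are hypothetical, invented only to reproduce the pattern described above (they are not the Berkeley figures), and the department and function names are illustrative assumptions.

```python
# Hypothetical applicant counts, chosen only to illustrate the effect.
# Format: department -> group -> (applicants, admitted)
admissions = {
    "easy_dept": {"men": (800, 480), "women": (100, 65)},
    "hard_dept": {"men": (200, 20), "women": (900, 108)},
}


def rate(applicants, admitted):
    """Share of applicants who were admitted."""
    return admitted / applicants


def rates_by_department(data):
    """Admission rate of each group inside each department."""
    return {
        dept: {group: rate(*counts) for group, counts in groups.items()}
        for dept, groups in data.items()
    }


def combined_rates(data):
    """Admission rate of each group after pooling all departments."""
    totals = {}
    for groups in data.values():
        for group, (applicants, admitted) in groups.items():
            pooled_applicants, pooled_admitted = totals.get(group, (0, 0))
            totals[group] = (
                pooled_applicants + applicants,
                pooled_admitted + admitted,
            )
    return {group: rate(*counts) for group, counts in totals.items()}


print(rates_by_department(admissions))
print(combined_rates(admissions))
```

In this made-up data, women have the higher admission rate in both departments, yet the pooled rate is higher for men, simply because most women applied to the department that admits few applicants.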

Understanding the Simpson paradox

A good example is Nassim Taleb’s video on the topic.

While this is related to vaccine data, it can be easily translated into business as we’ll see.

As Taleb explained in relation to the vaccine data, when data that was first analyzed in clusters and homogeneous groups is lumped together under the same umbrella, it can suddenly show the opposite effect.

In other words, the combined data doesn't merely fail to match the results seen in the separate brackets; it shows the reverse effect.

This is what happens when the Simpson paradox distorts the statistics.

Why? Intuitively, when data that was compared within brackets gets combined, the differences between the brackets get mixed into the comparison, making the combined figure worthless for its initial purpose.

In the case of the vaccine, because many people over 60 were vaccinated and few people under 20 were, the combined data for the vaccinated is skewed toward the mortality of people over 60, thus creating a bias.
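One way to see where the skew comes from is to write the combined rate as a weighted average of the per-age-group rates, where each weight is that age group's share of the people being counted. The sketch below uses hypothetical counts invented for illustration (they are not real vaccine data), and all names are assumptions.

```python
# Hypothetical counts, invented purely to illustrate the mechanism.
# Format: age group -> status -> (people, deaths)
cohort = {
    "over_60": {"vaccinated": (9000, 90), "unvaccinated": (1000, 20)},
    "under_20": {"vaccinated": (1000, 1), "unvaccinated": (9000, 18)},
}


def group_rate(data, age, status):
    """Mortality rate inside one age group for one vaccination status."""
    people, deaths = data[age][status]
    return deaths / people


def age_share(data, age, status):
    """Share of all people with this status who belong to this age group."""
    total = sum(groups[status][0] for groups in data.values())
    return data[age][status][0] / total


def combined_rate(data, status):
    """Pooled rate, written as a weighted average of the age-group rates."""
    return sum(
        age_share(data, age, status) * group_rate(data, age, status)
        for age in data
    )


for status in ("vaccinated", "unvaccinated"):
    print(status, round(combined_rate(cohort, status), 4))
```

In this made-up cohort the vaccinated have the lower mortality rate in both age groups. The pooled vaccinated rate still comes out higher, because 90% of its weight sits on the over-60 group, while 90% of the unvaccinated weight sits on the under-20 group.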

Beware of the Lurking variable

To keep things short, hidden variables in the combined data distort the overall analysis, making it worthless.

This is known as a “lurking variable”: a variable that affects the data to the point of creating a “spurious association” (in short, an apparent cause-effect relationship that does not actually hold).

The Simpson paradox in business

The Simpson paradox can hide in many of the business and marketing analyses out there, because when the data is combined it’s easy to mistake correlation for causation.

Take the case, as explained by adexchanger.com, of deciding how to budget a programmatic campaign. When you look at the data by gender only, the male audience seemingly has more conversions, which skews the budget toward males.

Yet once you break the data down by age, you find that females between 18 and 24 have higher conversion rates.

If you don’t understand this bias, it’s easy to overspend on an audience that is overrepresented not because it’s a better fit for your offer, but rather because you’re reading the data in the wrong way.

And as you can imagine, this can have substantial consequences on your bottom line (money wasted on ineffective campaigns, and lost revenues as you’re not targeting the right audience).
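Before shifting budget, a simple check is to compare the winner in the combined data with the winner inside each segment. The following is a minimal sketch under stated assumptions: the campaign numbers are hypothetical, and `leader`, `aggregate` and `has_reversal` are illustrative helper names, not part of any analytics tool.

```python
# Hypothetical campaign data, invented for illustration.
# Format: age segment -> gender -> (visitors, conversions)
campaign = {
    "18-24": {"male": (1000, 30), "female": (4000, 160)},
    "25+": {"male": (4000, 400), "female": (1000, 110)},
}


def conversion_rate(visitors, conversions):
    return conversions / visitors


def leader(groups):
    """Group with the highest conversion rate."""
    return max(groups, key=lambda g: conversion_rate(*groups[g]))


def aggregate(data):
    """Pool visitors and conversions per group across all segments."""
    totals = {}
    for groups in data.values():
        for group, (visitors, conversions) in groups.items():
            pooled_visitors, pooled_conversions = totals.get(group, (0, 0))
            totals[group] = (
                pooled_visitors + visitors,
                pooled_conversions + conversions,
            )
    return totals


def has_reversal(data):
    """True when the overall leader fails to lead in every single segment."""
    overall = leader(aggregate(data))
    return all(leader(groups) != overall for groups in data.values())


print("overall leader:", leader(aggregate(campaign)))
print("reversal detected:", has_reversal(campaign))
```

With these made-up numbers, the male audience leads in the combined view while the female audience converts better in both age segments, so the check flags a reversal. A flag like this is only a prompt to look at the segments before reallocating spend; it does not say which segmentation is the right one.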

Key takeaways

  • The Simpson paradox is an effect in statistics and probability that can create biased analyses. When it is present, the combined data shows the reverse of the effect seen when the data is analyzed in buckets.
  • The Simpson paradox can also bias business and marketing analyses, leading to overspending on the wrong audience.
  • The Simpson paradox also makes it much harder to make business decisions based on statistical analysis.

Related Business Concepts

heuristic
As highlighted by German psychologist Gerd Gigerenzer in the paper “Heuristic Decision Making,” the term heuristic is of Greek origin, meaning “serving to find out or discover.” More precisely, a heuristic is a fast and accurate way to make decisions in the real world, which is driven by uncertainty.
recognition-heuristic
The recognition heuristic is a psychological model of judgment and decision making. It is part of a suite of simple and economical heuristics proposed by psychologists Daniel Goldstein and Gerd Gigerenzer. The recognition heuristic argues that inferences are made about an object based on whether it is recognized or not.
representativeness-heuristic
The representativeness heuristic was first described by psychologists Daniel Kahneman and Amos Tversky. The representativeness heuristic judges the probability of an event according to the degree to which that event resembles a broader class. When queried, most will choose the first option because the description of John matches the stereotype we may hold for an archaeologist.
take-the-best-heuristic
The take-the-best heuristic is a decision-making shortcut that helps an individual choose between several alternatives. The take-the-best (TTB) heuristic decides between two or more alternatives based on a single good attribute, otherwise known as a cue. In the process, less desirable attributes are ignored.
biases
The concept of cognitive biases was introduced and popularized by the work of Amos Tversky and Daniel Kahneman beginning in 1972. Biases are seen as systematic errors and flaws that make humans deviate from the standards of rationality, thus making us inept at making good decisions under uncertainty.
bundling-bias
The bundling bias is a cognitive bias in e-commerce where a consumer tends not to use all of the products bought as a group, or bundle. Bundling occurs when individual products or services are sold together as a bundle. Common examples are tickets and experiences. The bundling bias dictates that consumers are less likely to use each item in the bundle. This means that the value of the bundle and indeed the value of each item in the bundle is decreased.
barnum-effect
The Barnum Effect is a cognitive bias where individuals believe that generic information – which applies to most people – is specifically tailored for themselves.
nudge-theory
Nudge theory argues that positive reinforcement and indirect suggestion are effective ways to influence the behavior and decision making of individuals or groups. Nudge theory was an idea first popularized by behavioral economist Richard Thaler and political scientist Cass Sunstein. However, the pair based much of their theory on heuristic research conducted by psychologists Daniel Kahneman and Amos Tversky in the 1970s.
bullwhip-effect
The bullwhip effect describes the increasing fluctuations in inventory in response to changing consumer demand as one moves up the supply chain. Observing, analyzing, and understanding how the bullwhip effect influences the whole supply chain can unlock important insights into various parts of it.
einstellung-effect
Maslow’s Hammer, otherwise known as the law of the instrument or the Einstellung effect, is a cognitive bias causing an over-reliance on a familiar tool. This can be expressed as the tendency to overuse a known tool (perhaps a hammer) to solve issues that might require a different tool. This problem is persistent in the business world where perhaps known tools or frameworks might be used in the wrong context (like business plans used as planning tools instead of only investors’ pitches).
hawthorne-effect
The Hawthorne Effect refers to an inclination of some people to work harder or perform better when they know they are being observed. The effect is most associated with those who are experiment participants, who alter their behavior due to the attention they are receiving and not due to any manipulation of independent variables. Therefore, the Hawthorne Effect describes the tendency for a person to change their behavior with the awareness that they are being observed.