Thomas Bayes was an 18th century minister, the fruit of whose work I currently study.
Bayes was curious about probabilities, which in the 1700s primarily meant things like predicting dice rolls, coin flips, and the position of billiard balls. We don’t flip coins very often so here’s a more modern example that can be used to understand his line of study.
A Covid test says that you have Covid. The test is 95% accurate and would sometimes yield a false positive, telling that you have Covid while in reality, you don’t, measured during the pandemic. It’s 2026 and you’re positive. Do you really have Covid? Intuitively, you say “Yes, 95% chance is a lot”. But if you test the 1700 population England with the same test, 5.5 million people in total, you’d get 275000 false positives (or less, assuming part of the accuracy issues are false negatives). We tested 1700 England and declared a Covid pandemic 300 years before it happened.
The missing piece, according to Bayes, is the prior probability: how likely it was that you had Covid before taking the test. If Covid is very common, a positive result strongly suggests that you are infected. However, if Covid is rare and only a small fraction of the population is infected, even a highly accurate test can produce enough false positives that a positive result may be meaningless and using even a very accurate test is counter-productive.
So, Thomas Bayes came up with the following theorem:
The probability of a hypothesis given some evidence equals the likelihood of observing that evidence if the hypothesis were true, multiplied by the prior belief in the hypothesis, and divided by the overall probability of observing the evidence. In practice, it provides a formal way of answering the question: “Given what I already believed, how much should this new information change my mind?
Bayes’ theorem combines the test accuracy with the prior likelihood of infection to estimate the actual probability that you have Covid.
That thinking is wonderful, and it created a cult following, very strong in the line of Software Engineering. However, it’s not unambiguous, and not universally applicable. Imagine I’m polling for two presidential candidates. I want to guess who will win based on the data we have, let’s say, 1000 interviews across the country. Where’s my prior knowledge? How do I fit in Bayes into that?
I studied Stats from 9th to 12th grade in high school, we had statistics every semester. Then I studied it during my bachelors, together with a separate exam in probability. That was awhile back but I remember enough that my teachers were frequentists, their approach in inference revolved around the null hypothesis and the normal distribution – you’d define a hypothesis you wanted to disprove, collect data, and calculate a p-value to decide whether the evidence was strong enough to reject it. The underlying assumption was that probability meant the long-run frequency of an event across many repeated trials, not a degree of belief. The alternative approach to look into it, introduced by Thomas Bayes was not a highlight, leaving a gap in both my knowledge, and my intuitive understanding of data, which I’m trying to fill.
Okay, so why I’m writing all of this? Because it’s in my mind. Making sense of data seems to be significantly harder than the surface level analysis. I want to improve my understanding and have acquired a collection of books on the subject. Currently reading Everything Is Predictable: How Bayes’ Remarkable Theorem Explains the World. It’s a popular science book, not a school book, but I think it’s a good introduction to this idea before looking into more complicated math. Wish me luck.
Good luck! I am one of those goofy people who actually appreciate statistics and can generally do them from scratch (after a relearning period, I rarely retain the specifics). The fun part is when you start messing around with multivariate statistics (as I occasionally do with my job). It’s interesting to take a multidimensional group of data (sometimes up to 30 dimensions or more in my job) and reduce it to visual patterns based on 2 or 3 coordinates.
LikeLike