Legal probabilism for dummies – Introduction to the central tenet and Bayes Rule
Updated: Jan 24, 2022
How can a mathematical theorem originating from the 18th century help us bust our cheating partner and also analyze, model, and improve the evaluation of evidence and the process of decision-making in trial proceedings? Furthermore, how does a big Blue Bus come into all this? Our post clarifies everything!
The flirtatious relationship between law and probability
Probability is the branch of mathematics that conceptualizes uncertainty and renders it tractable to decision-making. It may be considered a significant branch of the broader topic of “reasoning under uncertainty”. Formal probabilistic reasoning may be somewhat alien to ordinary people, but understanding, reaching, and reporting probabilistic judgments are not: people deal with probabilities all the time, for instance the probability of rain, of some team’s winning, or of a given election result.
Given that no conclusion reached in a legal dispute can be stated with absolute certainty, probabilistic concepts are used daily in legal inference, as they are in other contexts. Several reasons come to mind why conclusions based on evidence are necessarily probabilistic in any context, and specifically in law. Our evidence is always incomplete, commonly inconclusive, often ambiguous or dissonant, and comes to us from sources with every gradation of credibility shy of perfection. These matters all influence how we assess the force of evidence and state the legal standards of proof. One important point not to miss here is that all evidence is ultimately probabilistic in nature.
The use of probabilities in legal proceedings (both criminal and civil) has a long, if not always distinguished, history that has been well documented.
The earliest reported case of a detailed probabilistic analysis presented as evidence was in the Howland case of 1867, where the claimant attempted to demonstrate that a contested signature on a will had been traced from the genuine signature by arguing that their agreement in all 30 downstrokes was extremely improbable under a binomial model.
Another noteworthy historical mention is the infamous Dreyfus case, where the statistical evidence was sadly fundamentally flawed.
The real long-drawn-out flirtation between law and probability started when psychologists first compared human judgment to probabilistic norms in the 1960s. Daniel Kahneman and Amos Tversky picked up this idea in a whole series of studies that ultimately led to Kahneman receiving the Nobel Prize. Many early works looked at the problem of “How much do you revise your opinion in light of newly obtained data?”.
The studies showed that
people tend not to revise their views
as much as a probabilistic analysis would suggest.
This result caused concern in the legal community. It prompted a scholarship and an enormous legal literature built on the following suggestion: since people are conservative relative to Bayesian norms in updating their views in light of the evidence, they should be coached on probabilistic norms and reasoning in order to help them use the evidence better.
This scholarship is known as ’legal probabilism’; it relies on probability theory to analyze, model, and improve the evaluation of evidence and the process of decision-making in trial proceedings. It is a theoretical framework that helps address different questions: whether we can rely on probabilities in legal decision-making, and whether we can employ probabilities to clarify standards of proof or determine the probative value of evidence in dispute resolution.
Legal probabilism remains a minority view among legal scholars. Nevertheless, it attained more significant popularity in the second half of the twentieth century in conjunction with the law and economics movement. The discovery of DNA fingerprinting boosted this popularity as many legal probabilists started to focus on how probability theory could be used to quantify the strength of a DNA match. Recent work in Artificial Intelligence made it possible to use probability theory—in the form of Bayesian networks—to evaluate complex bodies of evidence consisting of multiple components.
It is noteworthy that legal probabilism is heterogeneous, including several different probabilistic conceptions about assessing the probative force or weight of evidence (such as conventional probability and Bayes’s Rule, Baconian probability or Wigmore and the fuzzy weight of evidence).
Legal probabilism is no doubt controversial. There’s a long-standing debate about whether legal proof should even be conceptualized as probabilistic instead of some other kind of logic. Legal scholars are deeply divided on this question—the critics of probabilistic proof point to the dangers of probabilistic methods of adjudication. There is a persistent attitude among some members of the legal profession that probability theory has no role to play at all in the courtroom. The proponents of probabilistic reasoning in law, in contrast, emphasize the dangers of not using such probabilistic methods.
Bayes’ Theorem - the fundamentals
Bayes’ Rule (or Bayes’ Theorem) is a mathematical formula based on conditional probability that can be applied to update the probabilities of issues in the light of new evidence. The Bayesian type of reasoning begins with a prior probability of an issue and some pertinent item of evidence. Bayes’ Theorem calculates a posterior probability for the issue, conditioned on the combined value of the prior probability and the likelihood ratio for the evidence. This posterior probability can then be treated as a new prior probability. A further piece of evidence can be added, and a new posterior probability can be calculated (now taking account of the original prior probability and the likelihood ratios for both pieces of evidence). The process can be repeated over and over, finally resulting in a posterior probability conditioned on the entire corpus of evidence in the case.
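This chaining of updates is easiest to see in odds form, where each item of evidence simply multiplies the current odds by its likelihood ratio. A minimal sketch, with all numbers invented purely for illustration:

```python
# A minimal sketch of sequential Bayesian updating in odds form
# (all numbers here are invented for illustration).
# posterior odds = prior odds * likelihood ratio, applied once per item of evidence.

prior_prob = 0.10                     # assumed prior probability of the issue
odds = prior_prob / (1 - prior_prob)  # convert the probability to odds

# Two hypothetical items of evidence, each with an assumed likelihood ratio.
for likelihood_ratio in [4.0, 2.5]:
    odds *= likelihood_ratio          # each posterior becomes the new prior

posterior_prob = odds / (1 + odds)    # convert the final odds back to a probability
print(round(posterior_prob, 3))       # ≈ 0.526
```

Note how the order of the two pieces of evidence does not matter: multiplication is commutative, which matches the idea that the final posterior is conditioned on the whole corpus of evidence.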
It is easy to see why and how Bayes very much fits into what the law is about:
we have some prior belief in some hypothesis, like whether a person is liable for damage, and then we get evidence, and we have to update our belief once we see the evidence. Bayes’ Rule is the rational way for doing that probabilistic updating.
People’s process of reasoning, the mechanisms they use for dealing with probability and frequency data, is mostly not Bayesian. We use various rules of thumb, mental shortcuts, and heuristics, which often work reasonably well but sometimes lead us astray. As Daniel Kahneman and Amos Tversky pointed out: “In his evaluation of evidence, man is apparently not a conservative Bayesian: he is not Bayesian at all.”
It isn't easy to execute this Bayesian type of reasoning, despite Bayes’ Theorem being the same kind of probability we already know, just in reverse. In our typical probabilistic experience, we have likely seen situations where we have a known starting point and have to calculate the probability of an outcome. The case-book exercise we all did in high school goes:
„if I know that I am holding a six-sided die, what is the probability that I will roll a 3?”
As mentioned, here we know the current state and want to calculate the probability of something in the future. In our other typical experience, we have also seen situations where we have two events in a series. In that case, as we learned in math class, those probabilities get multiplied together.
Now, with Bayes’ Theorem, we can reverse the same questions. Instead of asking, “What are the odds of drawing a six-sided die and rolling a 3?” we pose the question:
“I rolled a 3, what are the odds that I had a six-sided die?”
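To make this reverse question concrete, suppose (an assumption added here for illustration) that we drew one of two dice with equal probability, a six-sided die or a twenty-sided die, and then rolled a 3:

```python
# Reverse inference with Bayes' Rule (assumed setup: one six-sided die and
# one twenty-sided die, drawn with equal probability, and the roll came up 3).

prior_d6, prior_d20 = 0.5, 0.5   # equal chance of having drawn either die
like_d6 = 1 / 6                  # P(roll = 3 | six-sided die)
like_d20 = 1 / 20                # P(roll = 3 | twenty-sided die)

# Normalizing constant: the total probability of rolling a 3 at all.
p_three = prior_d6 * like_d6 + prior_d20 * like_d20

# Posterior: P(six-sided die | roll = 3)
posterior_d6 = prior_d6 * like_d6 / p_three
print(round(posterior_d6, 3))    # ≈ 0.769
```

A 3 is more probable on the six-sided die than on the twenty-sided one, so observing the 3 shifts our belief toward the six-sided die, from 50% up to roughly 77%.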
What makes it worth bothering with Bayes Theorem is that we live in a world where we frequently see outcomes but have to guess the initial events that caused those outcomes. Bayesian decision theory permits probabilities to express the decision maker’s degrees of belief in propositions and could be applied by a fact finder facing a decision about liability or guilt in a court of law. Bayesian decision theory applies in principle to any decision, particularly cognitive decisions such as choosing a hypothesis, theory, or axiom.
We can express Bayes’ Rule with the following equation:

P(A|B) = P(B|A) × P(A) / P(B)
The ’Prior’, P(A), means our initial estimate of probability before we know the result of our data. The ’Likelihood’, P(B|A), means the probability that any given initial condition would produce the result that we have got. The ’Normalizing Constant’, P(B), is the sum of the probability of all the conditions which satisfy our result. The ’Posterior’, P(A|B), is the result that we are looking for.
This equation will not be immediately understandable to us, so instead of focusing on it, it will be more intuitive to show how to solve a problem and then show how the equation fits in.
How to apply Bayes’ Rule? - A not-so-fun example
During my studies (both legal and psychological), I received several explanations of the above equation, but none of them was a simple, on-the-go explanation. So I was relieved to learn that, by near-universal agreement, the best explanation comes from Nate Silver’s book ’The Signal and the Noise’:
Imagine that you sit next to your partner on the couch watching Netflix. You see a text on your partner’s phone with a heart emoji from an unknown number. You cannot help starting to ruminate: what is this number? Who sent the heart emoji? What are the chances that your partner is cheating on you?
The new evidence (data) you learned is the text with the heart emoji (which is B in our equation), and you are trying to estimate the probability of your hypothesis (A in our equation) that your partner is cheating on you.
Following Bayes’ Rule and the above equation, you can find the probability that your partner is a cheater. All you have to do is explore three questions about your dilemma:
1. First, forget about the evidence and ask the question: what are the chances that s/he was cheating on you? We called this the prior, P(A), in our equation. If you had not seen the text, what chances would you assign to your partner cheating? Luckily enough, we have hard empirical data to answer this question, as respective studies have found that for any partner in any given year, that number is 4%.
2. The second question is: what are the chances that a text with a heart emoji appears at all? This is the normalizing constant, P(B), in our equation, and it covers the innocent explanations as well: someone incidentally sent the text, or a close friend sent it. Let’s assume it is a 50 percent probability.
3. The last question is: what are the chances that the text with the heart emoji appears given that our partner is cheating on us? In other words, it is the worst-case scenario (that is P(B|A), the likelihood, in our equation). If our partner is cheating, it makes sense that the new person is texting them; however, they would be more careful about it. Let’s assign a 70 percent chance to this.
In summary, you need to consider and estimate three things: a clean probability without any evidence (the prior), the chances of the evidence appearing at all (the normalizing constant), and the chances of the evidence appearing in the worst-case scenario (the likelihood).
After using our equation, we conclude that the posterior probability P(A|B) is 0.04 × 0.7 / 0.5 = 5.6%, which is not that bad, after all.
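The arithmetic above can be checked in a few lines, using the same three numbers from the text (prior 4%, likelihood 70%, normalizing constant 50%):

```python
# Bayes' Rule for the heart-emoji example, with the numbers from the text.
prior = 0.04        # P(A): prior probability of cheating
likelihood = 0.70   # P(B|A): chance of the text appearing, given cheating
normalizer = 0.50   # P(B): overall chance that such a text appears

posterior = likelihood * prior / normalizer  # P(A|B)
print(round(posterior, 3))  # 0.056, i.e. 5.6%
```

Notice how strongly the low prior anchors the result: even fairly incriminating evidence only moves the estimate from 4% to 5.6%.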
We have to note that the best- and worst-case probabilities in this example are subjective. However, we can assign our own weights to any given situation.
If we come home from a business trip and find women's underwear in our bedroom, the probability of the best-case scenario would change, as
it is hard to think why underwear would appear in a bedroom
if our partner is not cheating on us.
We must see that agreeing on exact and objective probability values can be extremely difficult, especially for prior probabilities. This is one of the main challenges to tackle for legal probabilism.
Okay! But how do legal decisions come into play? - The Blue Bus Problem
One may say: „Okay, it is all nice and terrific that I can now calculate the chances of my partner’s infidelity, but how can I use Bayes’ Theorem in legal settings, especially in a civil law context?”.
To put the role of Bayesian reasoning in legal proceedings in context, we will now apply Bayes’ Theorem to an actual civil law case that has also gone down in history as one of the most reported proof-paradoxical scenarios illustrating the intuitive case against imposing sanctions based on bare statistics: the "Blue Bus Problem".
The “Blue Bus Problem” is a famous hypothetical or “proof paradox”, debated and discussed by legal scholars, mathematicians, and philosophers, that illustrates probabilistic proof.
This thought experiment poses fundamental questions about probabilistic proof and the legal process generally. First, we present the problem and then solve it by standard Bayesian reasoning. We show why the Bayesian approach to probabilistic proof is a valuable method for evaluating all forms of evidence.
The "Blue Bus Problem" is not as artificial as one might initially suppose. It is based on an actual legal case (Smith v. Rapid Transit, Inc.) and has subsequently found close analogs in actual case law.
While driving late at night on a dark, two-lane road, a person confronts an oncoming bus speeding down the road's centerline in the opposite direction. In the glare of the headlights, the person sees that the vehicle is a bus, but he cannot otherwise identify it. He swerves to avoid a collision, and his car hits a tree. The bus speeds past without stopping. The injured person later sues the Blue Bus Company. In addition to the facts stated above, he proves that the Blue Bus Company owns and operates 80% of the buses that run on the road where the accident occurred. Can he win?
The blue bus problem has been stated and restated in different ways over the years. Nevertheless, all versions of the blue bus problem share the same information content. Specifically, these various versions of the blue bus problem provide us with two pieces of information:
1. The plaintiff was injured by a blue bus.
2. The defendant operates four-fifths or 80% of all the blue buses in town.
We refer to this as the standard version of the blue bus problem.
Since this is a civil case, the plaintiff must prove her case by “a preponderance of the evidence.” Stated in probabilistic terms, it must be more likely than not that the defendant’s bus caused the plaintiff to swerve. However, the problem in the Smith case is that the only evidence linking the defendant’s bus line to the scene of the accident is probabilistic. Since, by the plaintiff’s own admission, she did not see which bus was going down Main Street at the time of the accident, the only evidence linking the defendant’s bus to the scene of the accident is the defendant’s published timetable or schedule.
This problem presents a controversial legal issue:
whether this probabilistic proof alone is enough for the plaintiff to prove her case.
Should the plaintiff even be allowed to present her case to the jury?
It seems counter-intuitive that the evidence should favor the plaintiff
as she could not identify the color of the bus.
Specifically, when confronted with the "Blue Bus Problem", most people are reluctant to impose civil liability on the defendant in this case, even though there is a substantial statistical probability that it was the defendant’s blue bus that caused the plaintiff’s injuries. Thus, the blue bus problem presents a puzzle: given such a high probability of liability, why does the plaintiff’s case in this hypothetical not pass the “preponderance of the evidence” test?
However, a Bayesian approach to the blue bus case allows one to avoid circular reasoning and solves the problem of probabilistic proof by identifying the missing information one needs to make Bayesian inferences. As we will see, Bayes’ Theorem powerfully confirms the counter-intuitive result that the probabilistic evidence does favor the plaintiff.
Let’s do some math!
We need three pieces of information to apply Bayesian methods to the "Blue Bus Problem":
1. We need the total number of buses in the population of city buses; we can call this ’base rate information’;
2. We need the number of “hits”. It is the conditional probability that a bus is blue, given that it runs on Main Street, expressed as “p(blue|Main)” in standard Bayesian notation; and,
3. We need the number of “false alarms”, or the conditional probability that a bus is blue but does not run on Main Street, expressed as “p(blue|not Main)” or “p(blue|~Main)” in standard Bayesian notation.
The standard version of the "Blue Bus Problem" provides information about item no. 2 but is silent about items no. 1 and 3. Therefore, we will assume the following priors:
1. 40% of all buses run through Main Street, 60% do not;
2. 80% of the buses running down Main Street are blue (this assumption is based on the standard version of the Blue Bus problem);
3. 10% of the buses that do not run on Main Street are blue.
The Bayesian question we have to answer is:
„What is the posterior probability that a bus runs on Main Street, given that it is blue?”
Our Bayesian model of the blue bus problem is designed to test or measure the strength of the plaintiff’s hypothesis that it was a blue bus that caused her to swerve. We are not asking, “What is the probability that the bus in the plaintiff’s case was blue?” Instead, we are asking: “What is the probability that a bus will be on Main Street, given that it is a blue bus?”
First, we know that 40% of all city buses run through Main Street. This is our “prior probability”: p(Main) = 0.4.

We also know that 80% of the buses that run on Main Street are blue. This is our “first conditional probability”: p(blue|Main) = 0.8. In other words, the probability that a particular bus is blue, given that it runs on Main Street, is 80% or 0.8. This assumption is based on the standard format of the blue bus problem.

Lastly, we know that 10% of the buses that do not run on Main Street are blue. This is our “second conditional probability”: p(blue|~Main) = 0.1. In other words, the probability that a certain bus is blue, given that it does not run on Main Street, is 10% or 0.1.
Given these pieces of information, we want to test the plaintiff’s blue bus hypothesis by solving for:
p(Main|blue) = ?
If we want to state it formally:
given our priors p(Main), p(blue|Main), and p(blue|~Main), what is the revised or posterior probability that a particular bus runs on Main Street, given that it is blue?
Now, we can test the plaintiff’s blue bus hypothesis using Bayesian reasoning:
1. First, since 40% of city buses run on Main Street, and since 80% of those buses that travel down Main Street are blue, then 32% of all city buses are blue and run through Main Street [0.4 * 0.8 = 0.32];
2. Next, since 60% of the buses do not run down Main Street, and since only 10% of those buses are blue, then 6% of all city buses are blue and do not run through Main Street [0.6 * 0.1 = 0.06];
3. Therefore, since 32% + 6% = 38%, a total of 38% of all city buses are blue;
4. Lastly, since 0.32 divided by 0.38 = 0.842, this means that
there is an 84.21% posterior probability that
a certain bus runs on Main Street, given that it is blue.
Or, if you want to use the already known equation: p(Main|blue) = p(Main) × p(blue|Main) / p(blue) = 40% × 80% / 38% = 84.21%.
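The four steps above can be sketched directly, using the same assumed priors:

```python
# Bayesian solution to the Blue Bus Problem, with the assumed priors from the text.
p_main = 0.4                 # p(Main): prior that a bus runs on Main Street
p_blue_given_main = 0.8      # p(blue|Main): "hits"
p_blue_given_not_main = 0.1  # p(blue|~Main): "false alarms"

# Normalizing constant: the overall probability that a city bus is blue.
p_blue = p_main * p_blue_given_main + (1 - p_main) * p_blue_given_not_main  # 0.38

# Posterior: p(Main|blue), the probability a bus runs on Main Street given it is blue.
p_main_given_blue = p_main * p_blue_given_main / p_blue
print(round(p_main_given_blue, 4))  # ≈ 0.8421
```

The same computation can be read off in natural frequencies: of 1,000 city buses, 320 are blue and run on Main Street, 60 are blue and do not, and 320 out of those 380 blue buses (84.21%) run on Main Street.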
In other words, given that the bus that caused the plaintiff to swerve was blue, it is likely that it was running on Main Street. Our Bayesian approach has tested the plaintiff’s hypothesis, and the plaintiff has passed this test.
But why should this Bayesian probability be relevant at trial?
If we recall the statement of facts of the case, we will see that the injured plaintiff in the blue bus problem knows where the accident occurred (on Main Street) but does not know the color of the bus that caused her to swerve. However, if we put the problem into Bayesian context, the fact that the plaintiff knows where the accident occurred tells us something about the probable color of the bus that caused her to swerve! In brief, this is why Bayesian reasoning provides numerical verification and helps solve the blue bus problem. It is a method for testing the reliability and accuracy of a legal trial.
The main takeaway point of our post is that legal scholars, lawyers, and judges should reconsider the potential usefulness of legal probabilism at trial to test and evaluate the strength of probabilistic proof, and the law of the future should be open to probabilistic methods.
Cheat sheet for busy lawyers
Visualizing Probabilistic Proof
Year of publication
Theoretical / Conceptual framework
The author revisits the "Blue Bus Problem", a famous thought experiment in law involving probabilistic proof.
Research question(s) / Hypotheses
The author presents Bayesian solutions to different versions of the Blue Bus hypothetical case, expressed in standard and visual formats, that is, in terms of probabilities and natural frequencies.
Analysis / Results
Bayes’ Theorem powerfully confirms the standard counter-intuitive result of the Blue Bus Problem.