
Clash of the Titans – Do arbitrators have the cognitive upper hand?

We are all well aware of the tired arguments about the pros of alternative dispute resolution, such as arbitration and mediation. Many of us are cognizant of the critical differences between taking a dispute to court and resolving a legal issue out of court, and of the widely accepted conclusion that arbitration offers more predictability to its participants. However, none of this sheds much light on the quality of arbitrators' judgments compared to those of judges. It prompts the question: do arbitrators make good decisions compared to the judges they replace?

Empirical research on arbitrators so far tends to explore issues other than the quality of their judgments compared to judges. For instance, studies have tested the concern that arbitrators tend to reach compromise decisions by "splitting the baby" but have found no evidence to substantiate this concern. Another study compared the litigation outcomes of judges and arbitrators using field data in non-civil-rights employment cases and concluded that there were no reliable differences between arbitration and litigation.

Helm et al. (2016) took a different approach when assessing the relative quality of arbitrators and judges. The researchers assumed that despite some apparent dissimilarities in their selection and retention, judges and arbitrators essentially confront the same cognitive tasks: they accept testimonial and non-testimonial evidence, consider arguments put forward by parties or their attorneys, and impose a resolution of the dispute presented to them.

Based on this assumption, the researchers tested whether a group of distinguished, experienced arbitrators commits the same cognitive errors of judgment that also influence judges, and whether arbitrators possess an advantage over judges when it comes to the quality of the decisions they make.

The methodology used to study the arbitrators' decision-making was the same one the researchers had used to study judges' decision-making for almost two decades. They asked the participants to respond to a questionnaire containing a few hypothetical cases or tests. They typically used a between-subjects experimental design; that is, they drafted two (or more) versions of a hypothetical case in which one feature varied, and each participant reviewed only one version. Differences between the aggregated decisions made by the two (or more) groups are thus attributable to the feature that was varied.
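For readers who like to see the mechanics, here is a minimal sketch of a between-subjects design in Python. It assumes participants are split at random, which the description above does not state explicitly; the function names, the condition labels, and the made-up decisions are all hypothetical and only serve the illustration.

```python
import random

def assign_conditions(participant_ids, conditions, seed=0):
    """Give each participant exactly one version of the hypothetical case."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    # Deal participants out round-robin so group sizes differ by at most one.
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(shuffled)}

def share_by_condition(assignment, decisions):
    """Share of 'yes' decisions within each condition (the aggregated outcome)."""
    totals, yeses = {}, {}
    for pid, condition in assignment.items():
        totals[condition] = totals.get(condition, 0) + 1
        yeses[condition] = yeses.get(condition, 0) + (1 if decisions[pid] else 0)
    return {c: yeses[c] / totals[c] for c in totals}

# Hypothetical run: 94 participants, two versions of one case.
assignment = assign_conditions(range(94), ["version_a", "version_b"])

# Made-up decisions purely for illustration: everyone who read version_b says yes.
decisions = {pid: (cond == "version_b") for pid, cond in assignment.items()}
shares = share_by_condition(assignment, decisions)
```

Because nobody sees both versions, any gap between the two shares can be traced to the one feature that differs between the versions rather than to participants comparing them.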

The questionnaires were administered at an annual conference of commercial arbitrators specializing in resolving commercial disputes. The conference attendees numbered approximately 110 persons, and the arbitrators returned 94 questionnaires (N = 94). The arbitrators who returned questionnaires had an average of 22 years of experience (M = 22 years), ranging from 7 to 45 years, with a median of 20.5 years.

The researchers tested the following cognitive errors and/or phenomena:

  1. Excessive reliance on intuition

  2. Conjunction fallacy

  3. Framing bias

  4. Confirmation bias

Excessive reliance on intuition

The relevant scientific literature suggests that judges rely excessively on intuition even when performing a simulated judicial task. These results suggest that, like most people, judges rely too heavily on intuitive rather than deliberative mental processing, and such excessive reliance makes them susceptible to misleading intuitions that can generate poor decision-making and predictable errors.

To find out whether arbitrators also rely too heavily on intuition, the researchers asked them to take the so-called Cognitive Reflection Test (CRT). This test is specifically designed to measure whether a person can suppress an incorrect intuitive response and override it with deliberation. It consists of the following three questions, which might look familiar to anyone who has tried an IQ test on the internet:

  1. A bat and ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?

  2. If it takes five machines five minutes to make five widgets, how long would it take a hundred machines to make a hundred widgets?

  3. In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long will it take for the patch to cover half of the lake?

The driving mechanism behind the CRT is that each of the above questions immediately suggests an intuitive but incorrect answer (10 cents, 100 minutes, and 24 days). However, if we take our time and suppress the intuitive answer, we can reach the correct answers relatively easily: 5 cents, 5 minutes, and 47 days.
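If you do not trust the arithmetic, the three answers can be checked in a few lines of Python. This is only an illustrative sketch; the helper names are made up for this post and are not part of the CRT itself.

```python
def total_cost_cents(ball_cents):
    """Total price when the bat costs 100 cents more than the ball."""
    bat_cents = ball_cents + 100
    return ball_cents + bat_cents

def minutes_needed(machines, widgets):
    """Time to make `widgets` widgets when machines work in parallel.

    Five machines making five widgets in five minutes means that one
    machine needs five minutes per widget.
    """
    minutes_per_widget_per_machine = 5
    return widgets * minutes_per_widget_per_machine / machines

def lake_share_on_day(day, full_day=48):
    """Share of the lake covered on `day` if the patch doubles daily."""
    return 2.0 ** (day - full_day)

print(total_cost_cents(10))      # intuitive answer: total is 120 cents, too much
print(total_cost_cents(5))       # correct answer: total is 110 cents
print(minutes_needed(100, 100))  # 5.0 minutes, not 100
print(lake_share_on_day(47))     # 0.5, half the lake one day before it is full
```

The intuitive answers fail the same checks: a 10-cent ball gives a total of $1.20, and on day 24 the patch covers only a tiny sliver of the lake.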

It is noteworthy that even though the above questions are not extremely difficult, hardly anyone performs well on this test. For instance, undergraduates from Harvard get only about half of the questions right. Even the best-performing group, the MIT students, answered only slightly more than two out of three questions correctly on average.

As far as the performance of the judges goes, Helm et al. found that a large group of Florida trial judges scored an average of 1.23 out of three correct, and a sample of administrative law judges scored 1.33 out of three correct. In conclusion, judges performed about as well as other well-educated people on the CRT, which is not a very good result.

On the other hand, the arbitrators correctly answered a mean of 1.51 (M = 1.51) out of the three questions on the test. If we look at the 95 percent confidence interval around this mean (1.26 to 1.77), we will learn that these arbitrators performed better than the Florida judges but similarly to the tested administrative law judges.
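The comparison above boils down to asking where each judge group's mean falls relative to the arbitrators' confidence interval. The short Python sketch below spells that out with the figures quoted in the text. It is a rough reading only: it treats the judges' means as fixed benchmarks and ignores their own sampling error, and the function name is made up for this post.

```python
def position_relative_to_interval(value, low, high):
    """Say where a benchmark value falls relative to a confidence interval."""
    if value < low:
        return "below"
    if value > high:
        return "above"
    return "inside"

# Figures quoted in the text: the arbitrators' 95 percent interval
# and the mean CRT scores of the two judge groups.
ARBITRATOR_CI = (1.26, 1.77)
judge_means = {
    "Florida trial judges": 1.23,
    "administrative law judges": 1.33,
}

for group, mean in judge_means.items():
    where = position_relative_to_interval(mean, *ARBITRATOR_CI)
    print(f"{group}: mean {mean} is {where} the arbitrators' interval")
```

A judge mean below the interval suggests the arbitrators did better than that group; a mean inside it means the data cannot tell the two groups apart.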

In conclusion, the tested arbitrators performed relatively well on the CRT and better than some of the tested groups of judges; however, they did not perform exceptionally well in absolute terms. Moreover, they did not equal the performance of the MIT students. These results indicate that arbitrators rely too heavily on intuitive rather than deliberative mental processing, just like the judges.

Conjunction fallacy

If you are a frequent reader of our blog, you have almost certainly heard of the phenomenon known as the "conjunction fallacy". It is a well-known judgmental bias: people erroneously believe that events described in more detail are more probable than those described in less detail. If this does not ring a bell, you may still think that the probability of a terrorist attack in New York City carried out by Muslim extremists (A&B) can exceed either the probability of a terrorist attack in New York City (A) or the probability of a terrorist act carried out by Muslim extremists (B). It is a severe problem, so you should do your homework and click here.

To explore whether arbitrators would comply with the conjunction rule in a legal setting, the researchers provided them with a fictitious scenario called the "Employment Case".

The arbitrators are presiding in an arbitration involving an employment dispute between Dina El Saba, a public-sector employee, and the agency she previously worked for. The arbitrators learn that Dina had worked as an administrative assistant for a senior manager before the agency fired her. While working at the agency, Dina's employment evaluations ranged between "average" and "above average", so she claimed that unlawful discrimination must have motivated her termination. The agency claimed that it terminated Dina because she repeatedly violated workplace rules and norms: she took too many breaks during the workday and took odd days off as holidays; dressed in ways that made her co-workers and agency visitors feel uncomfortable, such as "covering herself mostly in black"; acted "odd" and "aloof"; and refused to eat lunch in the presence of male co-workers. The researchers asked the arbitrators to rank-order the likelihood of the following four options:

  1. The agency unlawfully discriminated against Dina based on her Islamic religious beliefs.

  2. The agency actively recruited a diverse workforce.

  3. The agency adhered to its internal employment policies.

  4. The agency actively recruited a diverse workforce but also unlawfully discriminated against Dina based on her Islamic religious beliefs.

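Option four is the conjunction of options one and two. Under the conjunction rule of probability theory, it therefore cannot be more likely than either option one or option two taken alone, as a matter of deductive logic. Ranking option four above either of them violates the conjunction rule.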
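To make the rule concrete, here is a small Python sketch that checks a rank-ordering of the four options for a violation. It assumes that a rank of 1 means "most likely" and that a violation means ranking option four above at least one of options one and two. That is our reading of the task, not the researchers' scoring procedure, and the function name is hypothetical.

```python
def violates_conjunction_rule(ranks, conjunction, components):
    """Return True if the conjunction is ranked as more likely than a component.

    `ranks` maps each option number to its rank, where 1 is the most likely.
    """
    return any(ranks[conjunction] < ranks[part] for part in components)

# Option four combines options one and two.
CONJUNCTION = 4
COMPONENTS = (1, 2)

# A ranking that puts the detailed story (option four) first.
tempting = {4: 1, 1: 2, 2: 3, 3: 4}

# A ranking that keeps option four below both of its components.
consistent = {1: 1, 2: 2, 3: 3, 4: 4}

print(violates_conjunction_rule(tempting, CONJUNCTION, COMPONENTS))    # True
print(violates_conjunction_rule(consistent, CONJUNCTION, COMPONENTS))  # False
```

Note that placing option four between options one and two still counts as a violation under this reading, because it is then ranked above one of its own components.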

The test results suggest that the arbitrators violated the conjunction rule: of the 86 arbitrators who responded to this problem, 92 percent (79 out of 86) violated the conjunction rule in some way.

An essentially identical version of this problem was given previously to a group of administrative law judges (N = 99). Eighty-five percent (84 out of 99) of the judges violated the conjunction rule in some way. It is noteworthy that even though the judges appeared to perform better than the arbitrators, the difference between the two groups is not statistically significant. In conclusion, arbitrators are susceptible to the conjunction fallacy.

Framing bias

Again, if you don't know what framing bias is, you should first click here and learn about this subtle form of persuasion used by legal counsel to present their stories in the most favorable light.

To evaluate whether arbitrators are vulnerable to the framing effect, the researchers drafted a scenario designed to assess whether arbitrators place greater emphasis on losses than gains—a phenomenon often referred to as loss aversion. Specifically, the arbitrators were asked to resolve a contract dispute between buyers and sellers:

Experienced videogame collectors had entered into the contract at a convention for collectors of videogames. The dispute arose because of a mutual mistake regarding a video game's value. In one condition of the experiment, the arbitrators were told that the buyer was suing to rescind the sale of a videogame that he had purchased from the seller. The seller had offered the videogame for sale for $38,000, as he thought it was a rare vintage collector's item. After the purchase, however, the buyer discovered that the game was an extremely common videogame worth only $1. Upon this discovery, the buyer sued to rescind the transaction. In the other condition of the experiment, the arbitrators were told that the seller was suing to rescind the sale of a videogame. The seller had offered the video game for sale for $1, as he thought it was a standard game worth very little. After his sale, however, the seller discovered that the videogame was a rare vintage collector's item worth $38,000. Upon this discovery, the seller sued to rescind the transaction.

Even though, as in all framing scenarios, the two scenarios are economically equivalent, loss aversion predicts that the buyer will feel worse off than the seller. The buyer suffered an out-of-pocket loss of $38,000 in cash, whereas the seller merely missed out on a gain of roughly the same size. The loss aversion paradigm suggests that the buyer would feel worse off because, when the absolute amounts are equivalent, losses make people feel about twice as bad as gains make them feel good. From a logical standpoint, however, the difference in the frame should not generate a different result.

The test results indicate that the arbitrators were susceptible to the framing effect. When the plaintiff seeking a rescission was the seller of the videogame (foregone gain), 42 percent (19 out of 45) of the arbitrators awarded rescission to the plaintiff, but 86 percent (36 out of 42) awarded rescission to the buyer who wanted to get his money back (loss). This difference was statistically significant.
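To get a feel for how large this gap is, the sketch below runs a standard two-proportion z-test on the counts quoted above. This is our own back-of-the-envelope check using only Python's standard library; the test the researchers actually used is not stated here and may well differ.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two independent proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that both groups are alike.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Buyer frame (loss): 36 of 42 granted rescission.
# Seller frame (foregone gain): 19 of 45 granted rescission.
z, p_value = two_proportion_z_test(36, 42, 19, 45)
print(f"z = {z:.2f}, p = {p_value:.5f}")
```

With these counts the z-statistic comes out a little above 4, so the p-value is far below the conventional 0.05 threshold, which fits the reported finding of a statistically significant difference.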

These results are consistent with another experiment in which the researchers tested fifty-one Utah trial court judges with essentially the same legal scenario. The results were almost identical to those of the arbitrators. Among these judges, forty-one percent (13 out of 32) granted a rescission to the disappointed seller (foregone gain), and eighty-two percent (14 out of 17) granted a rescission to the disappointed buyer (loss).

To conclude, both the arbitrators and the judges were equally susceptible to the framing bias and treated the foregone gain differently than the loss, even though both the law and the economics are the same in both conditions.

Confirmation bias

If you've read our post about the Mother of Misconceptions, the Queen of Biases, the Inevitable Curse, and the Saviour of our Lives, who makes arbitrators ineffective gatekeepers and once managed to deceive the whole FBI, you are already familiar with the Wason card selection task and its modified versions adapted to legal settings, including a fictitious legal scenario. You also know that arbitrators perform pretty poorly on these tests: only nineteen percent (8 out of 43) of the subjects responded correctly. The test results nevertheless suggest that arbitrators were not frequently subject to confirmation bias in the most traditional sense. Instead, they displayed a tendency to insist on far more evidence than needed to decide the case.

This result is consistent with that of the Ohio judges to whom the researchers assigned the same file selection task: 14.2 percent (20 out of 141) answered correctly. The difference between the performance of the arbitrators and that of the Ohio judges was not statistically significant. Like the arbitrators, judges in Ohio also tended to insist on too many files. Notably, arbitrators were not worse in this regard than judges.
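With counts this small, an exact test is a natural way to double-check the "no significant difference" reading. The sketch below implements a two-sided Fisher exact test from scratch with Python's standard library and applies it to the counts quoted above. It is our own illustration, not the analysis the researchers report, and the function name is made up for this post.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probabilities = [probability(x) for x in range(lowest, highest + 1)]
    observed = probability(a)
    # Small tolerance so ties are not lost to floating-point rounding.
    return sum(p for p in probabilities if p <= observed * (1 + 1e-9))

# Arbitrators: 8 correct, 35 incorrect. Ohio judges: 20 correct, 121 incorrect.
p_value = fisher_exact_two_sided(8, 35, 20, 121)
print(f"two-sided p = {p_value:.3f}")
```

A p-value well above 0.05 here means the gap between 19 percent and 14.2 percent could easily be chance, which is in line with the conclusion in the text.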

Aaaand the winner is...

The overall results of Helm et al. suggest that arbitrators are not superior to judges in terms of their ability to avoid common cognitive errors in judgment:

· the arbitrators relied heavily on intuition to answer the CRT questions,

· they overwhelmingly committed the conjunction fallacy,

· they were vulnerable to framing effects, and

· they failed to respond well to the confirmation bias problems.

On the other side of the coin, even though the arbitrators performed poorly on some of the experiments in absolute terms, their performance was no worse than that of judges who responded to identical or similar problems and tests.

*In the case of the CRT, a higher mean (M) value shows that arbitrators are less susceptible to excessive reliance on intuition than the judges. For the other indices, higher values show greater susceptibility to the given bias.


The proper conclusion is not that arbitrators are worse decision-makers than judges but that they tend to display the same susceptibility to cognitive illusions and excessive reliance on intuition as judges and lawyers.

It is also essential to keep the above results in perspective: Helm et al. conclude that the difference between the relative quality of litigation and arbitration likely stems from the situational advantages arbitrators hold over judges and not from the abilities of the people who make the judgments. However, this situational advantage may not be long-lasting due to susceptibility to cognitive error or vulnerability to extraneous biases.

Cheat sheet for busy lawyers


Are Arbitrators Human?


Year of publication

Theoretical / Conceptual framework

Research question(s) / Hypotheses


Analysis / Results
