Why write judgments?[1]

The Honourable Justice S Gageler AC[2]

Forty years ago Sir Frank Kitto asked and answered a question: “Why write judgments?” Asked of individual members of multi-member appellate courts, that question has become the subject of contemporary controversy. This article reassesses the answer given by Kitto in light of modern choice theory and behavioural science. It restates, in quantitative terms, the qualitative answer given by Kitto. The answer so restated is: to maximise the probability that the court as an institution will give the best of possible judgments.

Introduction

Sir Frank Kitto was a Justice of the High Court of Australia from 1950 to 1970 when he left to become Chancellor of the University of New England. In 1973, he presented a paper to a convention of judges of the High Court and of the Supreme Courts of the States and Territories. The Family Court and the Federal Court did not then exist. The paper was published some years later.[3] It has become staple reading for newly appointed superior court judges in Australia.

The title of Kitto’s paper was in the form of a question: “Why write judgments?” Forty years on, I return to that question in the context of a contemporary controversy.

Sir Frank Kitto’s Answer

Kitto’s answer to the question was multifaceted and nuanced. He noted at the outset that an obvious purpose of a judge delivering reasons for judgment was “to satisfy a desire, which the parties to the case may be assumed to possess, that they may be told not only whether to rejoice or to be sad, but also how it was that the judge reached his ultimate conclusion”.[4] He went on to make the “cardinal point”[5] that “the delivery of reasons is part and parcel of the open administration of justice”.[6]

But the core of Kitto’s answer derived from the understanding that a judgment is the result of reasons that are justified in law and that it is the duty of a judge not only to decide a controversy between parties fairly and transparently, but “to decide correctly if he can”.[7] Kitto identified the greatest advantage of writing a judgment as lying in the discipline that the process of writing imposes on a judge doing his or her honest best to decide correctly. That is because experience teaches that:[8]

only in the throes of putting ideas down on paper, altering what has been written, altering it a dozen times if need be, putting it away until the mind has recovered its freshness, even tearing it up and starting again, can most of us hope to get, in a difficult case, the fruits of the requisite intensity of penetrating thought ...

Towards the end of the article, Kitto turned to what he described as “the really difficult question whether and when a member of a multiple court is justified in simply concurring in a judgment written by a colleague”.[9] He expressed the conviction that “on balance, the writing of individual judgments tends to produce the better work”.[10] Acknowledging that opinions differ, eschewing any “intention of being dogmatic”,[11] not thinking that “a categorical answer ought to be attempted”,[12] and accepting that “the advantage of certainty in the law”[13] could on occasions be aided by one judge concurring or joining in reasons for judgment written by another, he said that he:[14]

would urge no more than that the course of individually, exactingly, intensely, putting in writing what the judge believes ought to be said has such immense advantages that it should be followed in every case unless the reasons for departing from it, when doubtingly considered, are felt to preponderate convincingly.

Contemporary controversy

Kitto gave his answer at a time when it was the norm for the individual members of appellate courts in Australia to give separate reasons for judgment. The course he urged on the judges to whom he gave his paper was the course then generally followed in practice.

Times have changed. Now it is much more common for members of appellate courts in Australia either to join in delivering a single set of reasons for the judgment of the court or to express individual concurrence with reasons for judgment delivered by just one of them.

That contemporary practice is in the process of reappraisal. It was trenchantly criticised by Dyson Heydon, who had been a judge of the NSW Court of Appeal from 2000 to 2003 and a Justice of the High Court from 2003 to 2013, in a lecture published after his retirement.[15] The published form of the lecture makes plain in an introductory note that its author was not to be taken to be referring to the behaviour of any particular court of which he had been a member, but “to tendencies and possibilities in courts in general”.[16] The theme of the lecture, captured in its title, “Threats to judicial independence: the enemy within”, was that, properly to perform the judicial function, judges must be willing and able “to work out and say what they think is right”,[17] not only independently of external influences but also independently of each other:[18]

Composite judgments raise questions. Who did the work? Did every judge understand the judgment? Did every judge closely examine it? Did a confident “specialist” assume dominance over nervous “generalists”? What, if any, compromises were made? … The same difficulty exists where a judge delivers a full judgment and the remaining judges say “I agree”. No doubt the judges do sincerely agree on something, but on what?

The contemporary practice, argued Heydon, carries risks, which include that each member of the appellate court might fail to give close personal attention to each issue in the case and that some members of the court might assent to the reasons of others either through the influence of dominant judicial personalities or through the natural tendency of persons in small groups to “flow along with what they perceive to be majority opinion”.[19] He provocatively raised the prospect of the “judicial herd”[20] careening blindly into the legal abyss.

Publication of the lecture resulted in publication of responses. One was that of Sir Anthony Mason,[21] who had been a judge of the NSW Court of Appeal from 1969 to 1972, a Justice of the High Court from 1972 to 1987, Chief Justice from 1987 to 1995, and who has been a non-permanent judge of the Court of Final Appeal of Hong Kong since 1997. Agreeing that the risks identified by Heydon exist, Mason recorded that in his judicial career spanning 45 years he had encountered neither a judge who had been dominant in practice (as distinct from in aspiration) nor a compliant judge who was inclined merely to conform.[22] He gave two justifications for the practice of delivering joint reasons for judgment. The first was that it best reflects the collective or institutional responsibility of the court for the decision made in a case. The second was that, by consolidating and clarifying the reasons of the court, it provides greater certainty.[23] He also stated two provisos that he said must be “clearly understood”.[24] The first proviso was that each judge should be under no pressure to participate in joint reasons for judgment and should be free to write his or her own reasons for judgment. The second was:[25]

that under no circumstances should a judge fail to give expression to his true view of the law for the sake of creating a false sense of unanimity or collective solidarity; in other words, compromise must not be allowed to triumph at the expense of judicial independence.

Another response was that of Peter Heerey,[26] who was a judge of the Federal Court from 1990 to 2009. “Most human decision-making is improved by frank discussion between persons with knowledge of the subject” he said, and “[i]t is hard to see why the decisions required of appellate judges should be any different”.[27] He stated:[28]

The problem with Heydon’s approach is that it sets up a paradigm of appellate judging, said to have its basis in psychological theory which, notwithstanding the lack of any support by way of empirical evidence, is necessarily applicable for all judges for all cases, however straightforward.

Joint reasons for judgment, he argued, contribute to clarity and certainty in the law. No “one modus operandi” should be imposed “on all judges on all appellate courts for all cases”.[29] An appellate judicial system that dispensed with joint reasons for judgment, and with attendant judicial discussion, “would mean longer and more delayed and more expensive cases with no demonstrable improvement in the quality of justice delivered”.[30]

Mason’s approach aligns with that of Heydon in emphasising as absolute the duty of each individual member of an appellate court to form and to act on his or her own true view of each case before the court. Their approaches also align in accepting that a practice of concurrence or participation in joint reasons for judgment poses some risks to the performance of that duty. The difference between them lies in their assessment of the magnitude of those risks. Without questioning the duty, Heerey’s approach is more pragmatic. It questions the empirical basis for the tendencies argued by Heydon to give rise to the risks. It questions the systemic costs and the benefits of seeking to avoid them.

What might empirical evidence suggest about risks to the performance of the duty that Mason and Heydon both accept to be an absolute duty? What might it suggest about the costs and benefits of seeking to avoid those risks? What might a broader appreciation of what Heerey described as “human decision-making”[31] tell us about the importance of the duty itself?

Condorcet’s jury theorem

Consideration of these questions is assisted by awareness of the work and legacy of the French mathematician and social scientist Nicolas de Condorcet. In 1785, he wrote an “Essay on the application of analysis to the probability of majority decisions”. The essay had some early influence amongst statisticians and political theorists, but was largely forgotten until it was rediscovered in the second half of the 20th century in the development of modern choice theory.[32]

The central question with which Condorcet was concerned was how likely a group is to arrive at a correct judgment given three variables. The first is the judgmental competence of the individual group members. The second is the decision-making rule or deliberation process used to aggregate individual judgments into a group decision. The third is the size of the group.

What has become known as “Condorcet’s jury theorem” amounts to this: for a group tasked with adjudicating a controversy that has two possible outcomes, where the judgmental competence of each of the individual group members is the same and exceeds 0.5 (that is to say, where each group member judging individually would be more likely to be right than wrong), and where the decision-making rule is that of a majority vote, the probability that the judgment of the group will be correct increases as the size of the group increases.[33] Condorcet himself explained it this way:[34]

One finds further that if the probable truth of the vote of each voter is greater than [0.5], that is to say if it is more probable than not that he will decide in conformity with the truth, the more the number of voters increases, the greater the probability of the truth of the decision. The limit of this probability will be certainty, from which it follows that multiplying the number of votes yields as great a probability of having a decision as desired.
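In modern notation the theorem can be sketched as follows. The symbols are introduced here for illustration and are not Condorcet's own: n is the (odd) number of group members, p is the probability that any one member, judging independently, is correct, and P_n is the probability that the majority is correct.

```latex
P_n \;=\; \sum_{k=\frac{n+1}{2}}^{n} \binom{n}{k}\, p^{k}\,(1-p)^{\,n-k},
\qquad
\tfrac{1}{2} < p < 1 \;\Longrightarrow\; P_{n+2} > P_n
\quad\text{and}\quad \lim_{n\to\infty} P_n = 1 .
```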

What one finds from Condorcet’s jury theorem, more significantly for present purposes, is that where the judgmental competence of each individual group member is uniform and exceeds 0.5, the probability of the judgment of the majority being correct will always exceed the probability of the judgment of an individual member being correct. Even where the judgmental competence of individual group members varies, the probability of the judgment of the majority being correct will, in all but cases of extreme variation, still exceed the probability of the judgment of the most competent member being correct.

The theorem can be illustrated by some simple worked examples. Assume a group, each member of which, judging individually, has an 80% probability of arriving at a correct judgment. The probability that the group, voting by majority rule, will make a correct decision (as distinct from the probability that any given decision is in fact correct):[35] for a group of three will be 90%; for a group of five will be 94%; for a group of seven will be 97%; and for a group of nine will be 98%. If each member of the group, judging individually, has instead a 90% probability of arriving at a correct judgment, the probability that the group, voting by majority rule, will make a correct decision: for a group of three will be 97%; for a group of five will be 99%; and for a group of either seven or nine will exceed 99.7%. Now change the assumption so that each member of the group judging individually has an 80% probability of arriving at a correct judgment except for one — the guru or the specialist — who alone has a 90% probability of arriving at a correct judgment. The probability that the group, now including the guru or specialist and again voting by majority rule, will make a correct decision: for a group of three will be 93%; for a group of five will be 96%; for a group of seven will be 97%; and for a group of nine will be 99%. Despite the average competence of the members becoming increasingly lower as the size of the group increases, the majority of the group that includes the guru or the specialist will judge increasingly better than the guru or the specialist would judge alone. The inclusion of the guru or specialist in the group would result in some increase in the probability of correctness of the judgment of the group of three, less for the group of five and less again for the group of seven. It would make virtually no difference to the probability of correctness of the judgment of the group of nine.
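The worked examples can be checked with a short calculation. What follows is a minimal sketch in Python, not a definitive model: the function name `majority_correct` is hypothetical, and it assumes an odd number of members who decide independently of one another and by simple majority.

```python
def majority_correct(competences):
    """Exact probability that a strict majority of independent voters is correct.

    competences: one probability of being correct per voter.
    Assumes an odd number of voters, so that ties cannot occur.
    """
    # dist[k] = probability that exactly k of the voters counted so far are correct
    dist = [1.0]
    for p in competences:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)   # this voter is wrong
            nxt[k + 1] += mass * p     # this voter is right
        dist = nxt
    needed = len(competences) // 2 + 1
    return sum(dist[needed:])


for n in (3, 5, 7, 9):
    all_80 = majority_correct([0.8] * n)
    all_90 = majority_correct([0.9] * n)
    with_guru = majority_correct([0.9] + [0.8] * (n - 1))
    print(f"n={n}: all 80% -> {all_80:.4f}, all 90% -> {all_90:.4f}, "
          f"one 90% among 80% -> {with_guru:.4f}")
```

Rounded to whole percentages, the printed values should agree with the figures given in the worked examples above. Passing a list of individual probabilities, rather than a single figure, is what allows the group that includes the guru or specialist to be handled by the same function.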

The theorem does not depend on the adoption of any naïve or absolute notion of what it means for a judgment to be “correct”: relative or evaluative terms such as “preferable” or “better” can be substituted. The most that the quality of correctness implies is that there be some external standard by reference to which judgments are made and against which they can be measured. The theorem can be applied, for example, to a group choice between two possible outcomes evaluated by reference to “the common good” or “the interests of justice”, provided that the common good or the interests of justice is exogenous, in the sense that the common good or the interests of justice is conceived of independently of the decision-making procedure adopted by the group.[36]

The implications of the theorem for institutional design have, of late, been explored well beyond the boundaries of formal adjudication, although the normative conclusions to be drawn from the theorem for real-world decision-making have generally been approached with caution.[37] Acknowledging the need for caution, and accepting that any abstraction involves simplification, it remains that the theorem can readily be applied to the design of an institutional structure for formal adjudication.[38]

The classic functional definition of judicial power, formulated by Kitto himself in the course of giving separate reasons for judgment as a member of the High Court, is that it “involves, as a general rule, a decision settling for the future, as between defined persons or classes of persons, a question as to the existence of a right or obligation”.[39] It involves, ordinarily, the adjudication of a controversy that, in broad terms, has two possible outcomes. The exercise of judicial power, as Kitto was later at pains to point out in explaining why reasons for judgment should be written, of its nature requires that a court as an institution comprising a group of members who are judges should always try to get the decision right. Trying to get a decision right does not necessitate belief that a single right answer to every legal problem exists in the abstract just waiting to be discovered; merely belief that one outcome of adjudication will ordinarily be able to be shown to be preferable to another. Making a correct judgment, for present purposes, means nothing more or less than making the better judgment.

Condorcet’s jury theorem can be seen to explain and to justify as rational the institutional structure for formal adjudication that has now long existed in many countries in which provision is made for an appeal from a court generally composed of a single judge, to an intermediate court of appeal generally composed of either three or five judges (deciding by majority), with the possibility of a further appeal to a court of final appeal generally composed of either seven or nine judges (again deciding by majority). Even if there were no increase in the relative competence of the judges, the probability of arriving at a correct result would necessarily increase at each of the two stages of the appellate hierarchy. It also explains the traditional practice of common law courts, before appeals were permitted by statute, of deciding difficult questions of law by taking the majority view of all available judges sitting en banc.

Applied to individual decision-making within an institutional structure of appellate courts, Condorcet’s jury theorem also explains and justifies treating as absolute the duty of each individual member of an appellate court to form and to act on his or her own true view of each case before the court. That is because it is a necessary condition for the theorem to hold that each member decides independently. Independence in the necessary sense is statistical or decisional independence (which rules out members being influenced in their own decisions by the decisions of other members), as distinct from causal or deliberative independence (which would rule out members being influenced in reasoning to their own decisions by information they have gained or realisations to which they have come in the course of deliberating with other members).[40]

The significance of independence lies in the de-correlation of error. If you and I decide independently, we might each reach a correct decision, or we might each reach a wrong decision, but the fact that one of us reaches a wrong decision does not mean that the other of us will also reach a wrong decision. If we form part of a large enough group, our individual errors will be diluted. On the other hand, if I simply agree with your decision, any error you make, I will make too. Whatever the size of our group, my simple agreement with your decision will effectively decrease the size of the group by one. The majority decision of the group would be no better than if I were not there at all.

Accordingly, if two generalist members of a three-member court simply agree with a third because he or she is properly acknowledged to have greater competence in judging a case within his or her field of speciality, then the size of the three-person court would in practical terms be reduced to one. The result would not be bad. The result would be better than could be achieved by either of the two concurring judges deciding individually and as good as the decision of the most competent judge. But it would be likely to be sub-optimal: other than in extreme cases of variations in individual competence, the court as a whole would be likely to do better if each judge (taking into account information gained or realisations arrived at in the course of deliberating with the judge acknowledged to have greater competence) reasoned independently to his or her own decision, and the decision of the court was by majority in the event of disagreement.
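The effect of simple agreement can be illustrated numerically. The sketch below is offered under stated assumptions only: it borrows the 90% and 80% figures from the earlier worked examples, the function `court_correct` and its `defers_to` parameter are hypothetical, and a judge who defers is modelled as casting exactly the same vote as the judge deferred to, who must personally decide independently.

```python
from itertools import product


def court_correct(competences, defers_to=None):
    """Probability that the majority is correct when some judges copy others.

    competences: probability of being correct for each judge deciding alone.
    defers_to:   hypothetical mapping {follower_index: leader_index}; a follower
                 casts the leader's vote instead of forming his or her own.
                 Each leader is assumed to decide independently.
    """
    defers_to = defers_to or {}
    independent = [i for i in range(len(competences)) if i not in defers_to]
    total = 0.0
    # Enumerate every right/wrong combination for the judges who decide for themselves.
    for outcome in product([True, False], repeat=len(independent)):
        votes = dict(zip(independent, outcome))
        weight = 1.0
        for i, right in votes.items():
            weight *= competences[i] if right else 1 - competences[i]
        for follower, leader in defers_to.items():
            votes[follower] = votes[leader]  # copied vote, copied error
        if sum(votes.values()) > len(competences) / 2:
            total += weight
    return total


bench = [0.9, 0.8, 0.8]  # specialist first, then two generalists (assumed figures)
print(court_correct(bench))                # all three reason independently
print(court_correct(bench, {1: 0, 2: 0}))  # both generalists simply agree
```

On these assumed figures the first call gives about 0.928 and the second about 0.9: the court that defers does exactly as well as the specialist alone, and no better.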

Risks to independence

Enter behavioural science. The 40 years since Kitto’s lecture have seen an explosion of research into the cognitive processes of individuals and into the behaviour of groups.

More recent research has complemented the statistical analysis of group decision-making in Condorcet’s jury theorem by highlighting the significance of “cognitive diversity” among group members. The term denotes differences in the way individual group members approach problem-solving or, more specifically:[41]

a diversity of perspectives (the way of representing situations and problems), diversity of interpretations (the way of categorizing or partitioning perspectives), diversity of heuristics (the way of generating solutions to problems), and diversity of predictive models (the way of inferring cause and effect).

Cognitive diversity provides an explanation, beyond mere random variation, as to why equally competent group members apprised of the same information might arrive at different answers to the same questions.

The most significant contribution of behavioural science, however, has been to highlight behavioural risks to the de-correlation of individual error that are inherent in small-group decision-making. The critical findings can be stated without need for much elaboration.[42] The fact is that humans are influenced in making their own judgments by their perceptions of other humans. They are influenced, in part, because their perception of what is objectively true is influenced by their perception of what others believe to be true; they shortcut the need to think and to experiment for themselves by tapping into the cumulative experience stored in the minds and observed in the behaviour of others. They are also influenced, in part, by the desire to avoid the disapproval of others; in their personal relations they ordinarily prefer friendship and solidarity to conflict, and they adjust their own behaviour accordingly. Within a small group, those two basic influences have a tendency through deliberation to produce: intellectual conformity (by which the range of views within the group tends to be decreased as members self-censor views of their own that they think other members of the group would disapprove); cascades (by which members of the group tend to adopt and to reinforce the views of others who speak first or with greatest authority); and group polarisation (by which members of the group who were like-minded before deliberation tend to end up adhering after deliberation to a more extreme position in line with their tendencies before deliberation).

Deliberation can help group decision-making, especially through the sharing of information and the testing of ideas. But the constant risk of deliberation to the quality of group decision-making is the risk of loss of independence of individual judgments within the group, and with it the loss of the benefit of the de-correlation of individual error. The risk is one to be managed. According to psychologist Daniel Kahneman, winner of a Nobel Prize for his pioneering work in behavioural economics:[43]

A simple rule can help: before an issue is discussed, all members of the [group] should be asked to write a very brief summary of their position. This procedure makes good use of the value of the diversity of knowledge and opinion in the group. The standard practice of open discussion gives too much weight to the opinions of those who speak early and assertively, causing others to line up behind them.

According to lawyer and behavioural theorist Cass Sunstein:[44]

The basic goal should be to increase the likelihood that deliberation will do what it is supposed to do: elicit information, promote creativity, improve decisions …

Frequently, a team player is thought to be someone who does not upset the group’s consensus. But it would be possible, and a lot better, to understand team players as those who increase the likelihood that the team will be right — if necessary, by disrupting the conventional wisdom.

Judges, as anecdotal evidence suggests[45] and empirical research tends to confirm,[46] are humans. The reality is that the risks to independent reasoning created by the “tendencies and possibilities” to which Heydon alluded, and that Mason accepted to exist, are risks and possibilities inherent in any human decision-making. It would be wholly reasonable to expect disciplined judges to be well-equipped to manage those risks, but it would be folly to suggest that judges as a class are wholly immune from them. Management of risks begins with their recognition.

Costs and benefits

The High Court was established in 1903 with three justices: Chief Justice Sir Samuel Griffith and Justices Sir Edmund Barton and Richard O’Connor. The early practice was for the judgment of the court to be reserved after the hearing of a case, for the justices to prepare their individual reasons for judgment separately, and for those separate reasons for judgment to be read out by their authors in order of seniority in open court on the day of delivery of the judgment. The practice meant that, at least in theory, it could happen that the first time one justice came to know of the reasons of another was when he heard them read out on the day he was to deliver his own. In a foreword, written in 1947, to a biography of Sir Edmund Barton, Sir Robert Menzies said, “[m]any times, I have reason to believe, Barton wrote separate reasons for judgment and then, on the Bench, having heard Griffith read his, put his own away, and said, ‘I concur’.”[47]

Sir Robert continued:[48]

The modern passion ... for the writing of long independent judgments, even though they concur in the result, had no appeal for Barton. He knew that a multiplicity of reasons sometimes obscures the result and makes it exceedingly difficult to discover the real point of decision. There is among many members of the legal profession a strong feeling, which I share, that the final court of appeal would add to the certainty and clarity of the law by reducing the present duplication and re-duplication of reasons. It may not, perhaps, be desirable to go the whole distance and have one judgment only ... But it is certainly true that Barton in his day contributed in many cases to the intelligibility of the result arrived at by the High Court by pocketing his own reasons for judgment and subscribing to others which followed substantially the same lines.

Whether Menzies accurately recounted the practice of Barton can be put to one side.[49] Relevant, for present purposes, are the suggestion and the commendation of the practice by Menzies, which reflect the systemic benefits that flow from the members of an appellate court, who have arrived at a conclusion independently of each other, agreeing on the expression of common reasons for judgment where they are able to satisfy themselves that there is no substantial difference between their individual reasons for judgment. Reasons for judgment become precedents that form part of the law and it is better, where possible without materially compromising their quality, that they be short and clear. Barton’s suggested practice of concurring with Griffith at a time after he had completed writing his own separate reasons for judgment is sufficient to demonstrate that a practice of agreement is not inherently incompatible with the maintenance of complete decisional independence. It is also sufficient to demonstrate that a practice that promotes decisional independence can come at a cost. If he adopted the practice recounted by Menzies, the cost of Barton’s independence was the time he spent writing his own separate reasons for judgment to a standard where they were ready for delivery, only to throw them away.

Once it is accepted that a practice of agreement is not inherently incompatible with the maintenance of decisional independence necessary to ensure the quality of collective decision-making, the question becomes one of whether decisional independence can be maintained at a lower cost.

It is here important to recognise that the conscientious performance of what Kitto described as the duty of a judge to decide “correctly if he can”[50] can give rise to a range of outcomes depending on the difficulty of the question and on the time the judge has in which to make up his or her mind. When asked, as an experienced legal practitioner, to give my opinion on a difficult legal question, I would sometimes give a client a choice. I would say:

You can pay me to think about this question for a day and I will give you an answer with 90 per cent confidence. You can pay me to think about this question for a week and I will give you an answer with 95 per cent confidence. I cannot do better than 95 per cent. You choose.

The client — even the most sophisticated and well-resourced client — would invariably choose to pay me for a day. I would reason my way through the problem to the best of my ability for a day. I would then give a qualified answer.

Having read Kahneman’s Thinking, Fast and Slow,[51] I now recognise two things about my former practice. One is that, like most humans and almost all professionals, I equated my subjective confidence in my ability to arrive at a correct decision with the objective probability of me arriving at a correct answer. Almost certainly, I over-estimated my own ability. The percentages I gave should have been lower. To make it easy to tie into the worked examples of Condorcet’s jury theorem I gave earlier, let’s say that the objective probability of me arriving at a correct answer after thinking for a day should, realistically, have been 80%, and that the objective probability of me arriving at a correct answer after thinking for a week should have been 90%. The other is that I was implicitly recognising the essential difference in reliability between “fast thinking” (which experience often allows to be undertaken rather quickly), carrying a higher risk of error, and “slow thinking” (which can only ever be undertaken laboriously), carrying a lower risk of error. Fast thinking is possible with experience because we are able to recognise standard patterns in problems we habitually encounter. Those standard patterns trigger information, stored in our memories, about standard solutions.[52] Further reflection may well reveal the standard pattern and standard solution to be inadequate to provide a particular solution to a particular problem. Bringing the same honest, earnest and independent analysis to the same problem, we can often give an answer that is likely to be correct after we have thought about the problem for a short time, or give an answer that is more likely to be correct after we have thought about the problem for a much longer time.

The critical point, for present purposes, is that, taking the need for each member of a multi-member court to reason independently to a judgment as given, a judgment that has the requisite independence can be formed relatively quickly, or it can be formed relatively slowly. A quicker judgment can still have a probability of correctness that is, for many purposes, acceptable. A slower judgment brings a higher probability of correctness, but at a higher cost. Given that the time each member has to engage in independent reasoning must be finite, there is an inevitable trade-off between the number of cases the court can decide and the probability that the court will decide those cases correctly.

The point can be illustrated with reference to the worked examples of Condorcet’s jury theorem set out earlier. Assume that I bring to my work as an appellate judge the same industry and acuity that I brought to my work as a legal practitioner. Assume other appellate judges do the same. Assume, for the sake of simplicity, that the problems we adjudicate are no more or less difficult than the problems on which I formerly opined. Assume also for the sake of simplicity that we each take four weeks’ annual leave, do not work weekends, and do nothing on work days except think and write about our judgments. Reasoning independently, and doing our very best in every case to get it right in the time available, we could each judge 48 cases a year, each with 90% individual probability of being correct, or we could each judge 240 cases a year, each with 80% individual probability of being correct. Sitting in courts of, say, three members, we could therefore each participate in judging 48 cases a year, each with 97% probability of reaching a correct decision, or we could participate in judging 240 cases a year, each with 90% probability of reaching a correct decision. Alternatively, sitting in courts of, say, seven, we could each participate in judging 48 cases a year each with 99.7% probability of reaching a correct decision, or we could each participate in judging 240 cases a year each with 97% probability of reaching a correct decision. Those are the outer parameters. There are endless permutations and gradations in between. You choose.
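The arithmetic of the trade-off can be laid out in a few lines. This is a rough sketch on the simplifying assumptions just stated (48 working weeks of five days, a week per case for slow reasoning at 90%, a day per case for fast reasoning at 80%); the names `bench_correct`, `MODES` and the constants are hypothetical and chosen only for illustration.

```python
from math import comb

WORKING_WEEKS = 52 - 4  # four weeks of annual leave
DAYS_PER_WEEK = 5       # no weekend work


def bench_correct(p, n):
    """Majority-vote accuracy of n independent judges, each correct with probability p (n odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))


# (label, working days spent on each case, individual probability of being correct)
MODES = [("slow, a week per case", 5, 0.9),
         ("fast, a day per case", 1, 0.8)]

for label, days, p in MODES:
    cases = WORKING_WEEKS * DAYS_PER_WEEK // days
    for n in (3, 7):
        print(f"{label}, bench of {n}: {cases} cases a year, "
              f"court correct with probability {bench_correct(p, n):.3f}")
```

Changing the entries in `MODES` or the bench sizes gives the intermediate permutations and gradations; the four rows printed are only the outer parameters.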

Your choice might well vary with where a particular court sits within the appellate hierarchy. It was said some time ago in England that the function of a trial judge was to be quick, courteous and wrong. It was immediately added that this did not mean that the function of the Court of Appeal was to be slow, rude and right, for the Court of Appeal would then be usurping the function of the House of Lords.[53] It was a joke. But like many jokes, it contained an insight. A two-tier system of appellate review can tolerate a greater margin of error at the level of an intermediate appellate court than it can at the level of the court of final appeal.

An intermediate court of appeal typically has very little control over its case load. In meeting that case load, the court needs, inevitably, to balance the risk of error against the time each member can realistically spend independently reasoning to a conclusion. A typical approach in three-member intermediate courts of appeal, in Australia and elsewhere, is for one member to be allocated to spend longer considering and preparing reasons for judgment in a case, in the expectation that the other two members will each independently spend a shorter period. If, in that shorter period, either of the other two members is provisionally inclined to disagree, that member will likewise spend longer considering and preparing separate reasons for judgment. There is no reason to think that adherence to the typical approach fails to achieve an appropriate balance in typical cases. Using the assumptions made earlier for the purpose of illustrating the relativities, allocating one member always to reason slowly gives the court, voting by majority, a 93% probability of reaching a correct decision: better than the probability of 89% if all judges reasoned quickly, and not greatly below the probability of 97% if all judges reasoned slowly.
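
The figure for a court in which one member reasons slowly and two reason quickly can be checked by enumerating every combination of correct and incorrect votes. The sketch below is illustrative only. It assumes that the three members vote independently, that the slow member is correct with probability 90% and each fast member with probability 80%, and it ignores the possibility that a fast member who is inclined to disagree then reasons slowly. The function name is hypothetical.

```python
from itertools import product


def majority_correct_mixed(probs):
    """Probability that a majority of independent judges is correct,
    where judge i is correct with probability probs[i]."""
    n = len(probs)
    total = 0.0
    # Each outcome is a tuple of booleans: True means that judge is correct.
    for outcome in product((True, False), repeat=n):
        if sum(outcome) * 2 > n:  # strictly more than half are correct
            weight = 1.0
            for correct, p in zip(outcome, probs):
                weight *= p if correct else 1 - p
            total += weight
    return total


print(f"one slow, two fast: {majority_correct_mixed([0.9, 0.8, 0.8]):.1%}")
print(f"all fast:           {majority_correct_mixed([0.8, 0.8, 0.8]):.1%}")
print(f"all slow:           {majority_correct_mixed([0.9, 0.9, 0.9]):.1%}")
```

On those assumptions the mixed court comes out at 92.8%, between 89.6% for three fast members and 97.2% for three slow members.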

A court of final appeal, on the other hand, typically has a very high level of control over its case load, and typically chooses to hear and determine those cases in which it is able to conclude that its judgment will be likely to contribute systemically to the interests of justice. A court of final appeal will therefore rarely, if ever, be justified in compromising the probability of its judgments being correct. What is the point of a court of final appeal choosing to hear appeals so as to give its own answer to legal questions already answered by intermediate courts of appeal, unless the court of final appeal can ensure that the answers it gives by majority will be the best answers the court can give? A court of final appeal cannot ensure that the answers given by a majority of its members will be the best answers the court can give, except by ensuring that its members consider, and have sufficient time each to consider, those questions each to the best of his or her individual ability. If, having reasoned independently to the same conclusion, they are able to put immaterial differences aside and agree on a common form of expression of those reasons, then the systemic benefits can be expected ordinarily to outweigh the costs of doing so.

Some raw statistics perhaps provide tentative support for a broad observation that final courts of appeal do tend, in practice, to allow their individual members time to get it right. The High Court (which has seven members) now typically gives judgment in about 60 appeals each year, against the background of about 500 applications for special leave to appeal from the judgments of intermediate appellate courts.[54] In 2012, the Supreme Court of the United Kingdom (which has 12 members) gave judgment in 85 appeals, against the background of about 250 broadly equivalent applications.[55] In the same year, the Supreme Court of Canada (which has nine members) gave judgment in 75 reserved appeals against the background of about 560 broadly equivalent applications,[56] and the Supreme Court of the United States (which also has nine members) gave signed opinions in 73 appeals against the background of over 7,500 broadly equivalent applications.[57] There are marked differences in the numbers of applications to appeal. There are, in contrast, marked similarities in the numbers of judgments. The contrast suggests that the number of appeals determined by courts of final appeal correlates less with demand for authoritative answers to legal questions already answered by intermediate courts of appeal than with the time needed for members of courts of final appeal each to reason their way to giving their best answers.
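
The contrast can be put in rough numerical terms. The sketch below takes the approximate figures given above at face value (“about 60” is treated as 60, “over 7,500” as 7,500) and compares the spread between the largest and smallest number of judgments with the spread between the largest and smallest number of applications. The names used are hypothetical and the comparison is indicative only.

```python
# Approximate figures as stated in the text; rounded values are assumptions.
COURTS = {
    "High Court of Australia": {"members": 7, "judgments": 60, "applications": 500},
    "Supreme Court of the United Kingdom": {"members": 12, "judgments": 85, "applications": 250},
    "Supreme Court of Canada": {"members": 9, "judgments": 75, "applications": 560},
    "Supreme Court of the United States": {"members": 9, "judgments": 73, "applications": 7500},
}


def spread(field):
    """Ratio of the largest to the smallest value of a field across the courts."""
    values = [court[field] for court in COURTS.values()]
    return max(values) / min(values)


for name, court in COURTS.items():
    share = court["judgments"] / court["applications"]
    print(f"{name}: judgments equal to {share:.1%} of applications")

print(f"spread in judgments:    {spread('judgments'):.1f} to 1")
print(f"spread in applications: {spread('applications'):.1f} to 1")
```

On those figures the number of applications varies by a factor of 30 between courts, while the number of judgments varies by a factor of less than 1.5.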

Conclusion

The High Court, having been established in 1903 with three Justices, had its membership increased to five in 1906, with the appointment of Sir Isaac Isaacs and Henry Bournes Higgins, and from five to seven in 1913. Justice Higgins died in office in 1929. His contribution to the work of the High Court was celebrated in a ceremonial sitting held just a few weeks after he died. Justice Isaacs said of him then:[58]

His was a thoroughly independent mind. He sought his own solution of every problem that was brought before him, and, having reached his conclusion and considered it right, it mattered not to him whether it found favour or failed to find favour in the eyes of others. If it could be supported, well and good; but, if not, he always had the supreme consciousness of doing his duty as he saw it. He always seemed to me to illustrate in a marked manner what was once said by a great Chief Justice of Victoria (Chief Justice Higinbotham), that the responsibility of a Judge is not to do what is right, but to do what he believes to be right.

Chief Justice Higinbotham spoke of judicial independence. Justice Higgins exemplified it. Condorcet’s jury theorem suggests that if each member of a multi-member appellate court follows the example of Higgins J so as to do what he or she independently believes, on adequate reflection, to be right, the court as a whole will be more likely to do what is right.

Kitto emphasised the importance of the writing of reasons for judgment to the individual reflective process. Writing reasons for judgment involves very slow thinking. It has costs. It also has great systemic benefits.

Why write judgments? Focusing on the position of an individual member of a multi-member appellate court and restating in quantitative terms the qualitative answer Kitto gave 40 years ago: to maximise the probability that the court as an institution will give the best of possible judgments.



[1] This is a revised version of the Sir Frank Kitto Lecture delivered at the University of New England Law School on 11 November 2013. A version was also delivered at the 11th Annual University of South Australia Competition and Consumer Workshop in Adelaide on 15 November 2013.

This article was originally published in (2014) 36(2) Sydney Law Review 189, and is published here with kind permission.

[2] Justice of the High Court of Australia. The author wishes to thank Brendan Lim for the maths, and Henry Ergas and Jason Ockerby for their further conceptual comments.

[3] F Kitto, “Why write judgments?” (1992) 66 ALJ 787.

[4] ibid at 788.

[5] ibid at 789.

[6] ibid at 790.

[7] ibid at 793.

[8] ibid at 796.

[9] ibid.

[10] ibid at 797.

[11] ibid at 796.

[12] ibid.

[13] ibid at 797.

[14] ibid at 798.

[15] J D Heydon “Threats to judicial independence: the enemy within” (2013) 129 Law Quarterly Review 205.

[16] ibid at 205.

[17] ibid at 209.

[18] ibid at 212.

[19] ibid at 216.

[20] ibid at 217.

[21] A Mason, “Reflections on the High Court: its judges and judgments” (2013) 37 Australian Bar Review 102.

[22] ibid at 109.

[23] ibid at 110.

[24] ibid.

[25] ibid. To similar effect, see W Douglas, “The dissent: a safeguard of democracy” (1948) 32 Journal of the American Judicature Society 104 at 106 quoting Hughes CJ.

[26] P Heerey, “The judicial herd: seduced by suave glittering phrases?” (2013) 87 ALJ 460.

[27] ibid at 461.

[28] ibid at 463.

[29] ibid.

[30] ibid.

[31] ibid at 461.

[32] H Landemore, Democratic reason: politics, collective intelligence and the rule of the many, Princeton University Press, 2013 at pp 70–75; I McLean and F Hewitt (eds), Condorcet: foundations of social choice and political theory, Edward Elgar, 1994 at pp 49–54, 73–78.

[33] S Nitzan and J Paroush, “A general theorem and eight corollaries in search of correct decision” (1984) 17 Theory and Decision 211.

[34] N de Condorcet, “Essay on the application of mathematics to the theory of decision-making”, 1785, translated in K Baker (ed), Condorcet: selected writings, Macmillan, 1976, at pp 48–49.

[35] The probability that a given decision is in fact correct is a conditional probability. It is conditional in the sense that it takes as given that particular members of the group have already in fact voted one way or another, thereby ruling out all of the other voting scenarios that were possible but that did not, in fact, occur.

[36] Landemore, above n 32, pp 208–210.

[37] ibid, p 148; H Landemore, “Collective wisdom: old and new” in H Landemore and J Elster (eds), Collective wisdom: principles and mechanisms, Cambridge University Press, 2012, p 1. See also A Vermeule, Law and the limits of reason, Oxford University Press, 2009; D Estlund, Democratic authority: a philosophical framework, Princeton University Press, 2009; R Goodin, Reflective democracy, Oxford University Press, 2003, pp 91–108; J Waldron, The dignity of legislation, Cambridge University Press, 1999, pp 124–166.

[38] P Edelman, “On legal interpretations of the Condorcet jury theorem” (2002) 31 Journal of Legal Studies 327. See also L Kornhauser and L Sager, “Unpacking the court” (1986) 96 Yale Law Journal 82.

[39] The Queen v Trade Practices Tribunal; ex p Tasmanian Breweries Pty Ltd (1970) 123 CLR 361 at 374.

[40] Landemore, above n 32, pp 73, 153–154; Vermeule, above n 37, pp 30, 73–4; Estlund, above n 37, p 225; D Estlund, “Opinion leaders, independence and Condorcet’s jury theorem” (1994) 36 Theory and Decision 131 at 138.

[41] Landemore, above n 32, p 102. See generally S Page, The difference — how the power of diversity creates better groups, firms, schools and societies, Princeton University Press, 2008; L Hong and S Page, “Some microfoundations of collective wisdom” in H Landemore and J Elster (eds), Collective wisdom: principles and mechanisms, Cambridge University Press, 2012 at p 56.

[42] See eg R Thaler and C Sunstein, Nudge: improving decisions about health, wealth and happiness, Penguin Books, 2nd edn, 2009 at pp 53–59; Vermeule, above n 37, pp 74, 146–149; C Sunstein, Why societies need dissent, Harvard University Press, 2005, pp 9–11; C Sunstein, Infotopia: how many minds produce knowledge, Oxford University Press, 2006, pp 75–102; J Surowiecki, The wisdom of crowds, Anchor Books, 2nd edn, 2005, pp 23–65.

[43] D Kahneman, Thinking, fast and slow, Penguin Group, 2011 at p 85.

[44] Sunstein, Infotopia, above n 42, pp 200–201.

[45] See eg K Llewellyn, The common law tradition: deciding appeals, W S Hein, 1996, at pp 30–31; Sunstein, Why societies need dissent, above n 42, pp 166–182.

[46] See eg C Sunstein, D Schkade and L Ellman, “Ideological voting on Federal Courts of Appeals: a preliminary investigation” (2004) 90 Virginia Law Review 301; C Sunstein et al, Are judges political? An empirical analysis of the federal judiciary, Brookings Institution Press, 2006, pp 63–78; L Epstein, W Landes and R Posner, The behavior of federal judges: a theoretical and empirical study of rational choice, Harvard University Press, 2013 especially ch 5; C Guthrie, J Rachlinski and A Wistrich, “Blinking on the bench: how judges decide cases” (2007) 93 Cornell Law Review 1.

[47] J Reynolds, Edmund Barton, 3rd edn, Angus & Robertson, 1948 at p viii.

[48] ibid.

[49] Cf G Bolton, Edmund Barton: the one man for the job, Allen & Unwin, 2000 at pp 304–305.

[50] Kitto, above n 3, at 793.

[51] Kahneman, above n 43.

[52] See Kahneman, above n 43, pp 236–237.

[53] See Lord Justice Asquith, “Some aspects of the work of the Court of Appeal” (1950) 1 Journal of the Society of Public Teachers of Law 350.

[54] High Court of Australia, Annual Report 2011–2012, 2012, at p 13.

Note: In 2019–2020, the High Court heard 55 appeals and 455 special leave applications were filed: High Court of Australia, Annual Report 2019–2020, 2020, at p 20 at https://cdn.hcourt.gov.au/assets/corporate/annual-reports/HCA_Annual_Report_2019-20.pdf, accessed 5 May 2021.

[55] Supreme Court of the United Kingdom, The Supreme Court Annual Report and Accounts 2011–2012, 2012, at pp 22–24.

Note: The Supreme Court heard 81 appeals and delivered 54 judgments between 1 April 2019 and 31 March 2020. The number of applications for permission to appeal determined by the justices was 238: at www.supremecourt.uk/news/uk-supreme-courts-2019-20-annual-report-and-accounts-published.html, accessed 5 May 2021.

[56] Supreme Court of Canada, Statistics 2002–2012, 2013, at p 4.

Note: For a statistical summary from 2010 to 2020, see www.scc-csc.ca/case-dossier/stat/sum-som-2020-eng.aspx, accessed 5 May 2021.

[57] Chief Justice Roberts, Chief Justice’s Year-End Report on the Federal Judiciary, 2013, p 12.

Note: For the 2020 Year-End Report on the Federal Judiciary, see www.supremecourt.gov/publicinfo/year-end/2020year-endreport.pdf, accessed 5 May 2021.

[58] Justice Isaacs, “The late Mr Justice Higgins” (1928) 41 CLR.