One of the artificial intelligence tools presently used in dispute resolution is predictive coding. Predictive coding is a form of supervised machine learning that takes human review decisions about document relevance and applies them to a larger document population. With predictive coding, relevant or responsive documents are identified by an algorithm. Predictive coding has been used in English court litigation for a number of years, with BCLP acting in Brown v BCA Trading Ltd  EWHC 1464 (Ch), the first contested application to use predictive coding. Its use is also permitted in the U.S. courts, some of which have recognised that the results of predictive coding review are statistically superior to those of a traditional manual review (Da Silva Moore v Publicis Groupe 11 Civ 1279 (ALC) (AJP) (US District Court SDNY, 24 February 2012)).
In contrast, the extent to which predictive coding is used in arbitral proceedings, and how it should be used, is unclear. In the 2018 Queen Mary survey, 5% of respondents said they frequently use artificial intelligence, including data analytics and technology-assisted document review, and 3% said they always use it. Yet the use of predictive coding is rarely discussed between the parties or with the tribunal at any stage of the proceedings, and there are currently no rules or best practice guidance on the subject. For example, the IBA Arb 40 Guide on Technology Resources for Arbitration Practitioners mentions predictive coding in its list of software tools available for document review. However, it does not address whether predictive coding can be used in arbitration and, if so, under what conditions, including whether its use must be disclosed to, or permitted by, the other parties and the tribunal.
This blog post discusses how predictive coding is likely to impact document production in international arbitration and what guidance is required regarding its use, any disclosure obligations and case management.
How does predictive coding work?
The process starts with setting parameters and identifying the documents that will form the sample set to be reviewed. This sample is generally reviewed by a senior lawyer with good knowledge of the case. The reviewer codes each document as ‘relevant’ or ‘not relevant’ (or, for document requests, ‘responsive’ or ‘not responsive’), and these coding decisions train a predictive formula (algorithm).
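To make the training step more concrete, here is a minimal Python sketch of a classifier learning from a reviewer's coding decisions. Commercial predictive coding tools use their own, often proprietary, models; this sketch swaps in a simple multinomial naive Bayes classifier purely for illustration, and the class name `SeedSetClassifier` and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class SeedSetClassifier:
    """Hypothetical naive Bayes model trained on a reviewer's coding decisions.

    An illustrative stand-in, not the model used by any particular tool.
    """

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, coded_docs: list[tuple[str, bool]]) -> None:
        # Each pair is (document text, reviewer's code); True means relevant/responsive.
        for text, is_relevant in coded_docs:
            self.doc_counts[is_relevant] += 1
            self.word_counts[is_relevant].update(tokenize(text))

    def score(self, text: str) -> float:
        """Estimate the probability that a document is relevant/responsive."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        log_probs = {}
        for label in (True, False):
            # Smoothed prior: share of seed documents coded with this label.
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            label_total = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word in vocab:  # words never seen in training carry no evidence
                    # Laplace-smoothed likelihood of the word given the label.
                    log_p += math.log(
                        (self.word_counts[label][word] + 1) / (label_total + len(vocab))
                    )
            log_probs[label] = log_p
        # Numerically stable logistic transform of the log-odds.
        log_odds = log_probs[True] - log_probs[False]
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        return math.exp(log_odds) / (1 + math.exp(log_odds))


# Usage with a tiny, made-up seed set coded by the reviewer.
model = SeedSetClassifier()
model.train([
    ("board approved the supply contract termination", True),
    ("notice of termination of the supply contract", True),
    ("office party catering menu", False),
    ("car park access arrangements", False),
])
print(round(model.score("draft termination notice for the contract"), 3))
```

Each new coding decision simply updates the word counts, which is why further human review can keep refining the model as the review progresses.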
The algorithm is applied to the entire review set either continuously or once the review of the sample set is complete. It identifies relevant or responsive documents and ranks them in order of relevance/responsiveness. The senior lawyer may review further documents to improve the algorithm, including documents that the tool flags because the algorithm and the human reviewer have treated them inconsistently. Human review continues until an acceptable confidence level and response score are reached. At that stage, a random sample of the documents that the algorithm has identified as irrelevant/non-responsive is reviewed for quality control.
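The ranking, disagreement-flagging and quality-control steps described above can be sketched in Python as follows. The sketch assumes that each document already has a relevance score between 0 and 1 from a trained model and uses a hypothetical cut-off of 0.5; real tools choose and report their own thresholds. The function names and document IDs are illustrative only.

```python
import random


def rank_documents(scores: dict[str, float]) -> list[str]:
    """Order document IDs from most to least likely relevant/responsive."""
    return sorted(scores, key=scores.get, reverse=True)


def flag_disagreements(scores: dict[str, float], human_codes: dict[str, bool],
                       cutoff: float = 0.5) -> list[str]:
    """IDs where the algorithm's prediction differs from the reviewer's code."""
    return sorted(
        doc_id for doc_id, is_relevant in human_codes.items()
        if (scores[doc_id] >= cutoff) != is_relevant
    )


def quality_control_sample(scores: dict[str, float], cutoff: float = 0.5,
                           sample_size: int = 2, seed: int = 0) -> list[str]:
    """Randomly pick documents predicted irrelevant, for human checking."""
    predicted_irrelevant = sorted(d for d, s in scores.items() if s < cutoff)
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    return rng.sample(predicted_irrelevant, min(sample_size, len(predicted_irrelevant)))


# Hypothetical model scores and reviewer codes.
scores = {"D1": 0.92, "D2": 0.15, "D3": 0.67, "D4": 0.08, "D5": 0.30}
human_codes = {"D3": False, "D5": False}

print(rank_documents(scores))                   # ['D1', 'D3', 'D5', 'D2', 'D4']
print(flag_disagreements(scores, human_codes))  # ['D3']
print(quality_control_sample(scores))
```

Here D3 is flagged because the model scores it above the cut-off while the reviewer coded it as not relevant; that is the kind of inconsistency a senior reviewer would look at again. A fixed random seed is one way to make the quality-control sample reproducible if its selection is later questioned.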
The drive for efficiency
Arbitration rules and arbitration laws do not address either artificial intelligence or the accepted means of searching for documents requested by the other side and/or ordered by the tribunal. Aside from the duty of good faith reflected in certain arbitration regimes, there is no clear test for determining whether a party has met its obligation to search for documents. However, most of the major institutional rules and arbitration laws encourage tribunals and parties to conduct arbitrations in an efficient and cost-effective manner. This implicitly extends to the use of electronic tools, which a number of guidelines encourage, including the ICC Commission Report for Managing E-Document Production and the CIArb Protocol for E-Disclosure in Arbitration. The idea that responsive documents should be searched for efficiently is also reflected in Article 3.3(a)(ii) of the IBA Rules on the Taking of Evidence in International Arbitration (the “IBA Rules”).
Why use predictive coding in international arbitration?
Predictive coding is potentially more efficient than a manual review based on search terms. Members of the legal team only need to review the documents that the algorithm identifies as relevant/responsive (plus a random sample of irrelevant/non-responsive documents for quality control), rather than manually reviewing every document held by certain custodians, dated within certain ranges, or captured by search terms. A 2011 study demonstrated the significant cost savings of predictive coding over manual review, finding that “the technology-assisted reviews require, on average, human review of only 1.9% of the documents, a fifty-fold savings over exhaustive manual review”.
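The arithmetic behind the “fifty-fold” figure is simple: if only 1.9% of documents need human review, an exhaustive review involves roughly 1 / 0.019, or about 53, times as many documents. The Python sketch below applies that percentage to a hypothetical population of one million documents; the population size is an assumption for illustration, not a figure from the study.

```python
def review_workload(total_docs: int, share_reviewed: float) -> tuple[int, float]:
    """Return (documents needing human review, savings factor vs exhaustive review)."""
    docs_reviewed = round(total_docs * share_reviewed)
    savings_factor = total_docs / docs_reviewed
    return docs_reviewed, savings_factor


# Hypothetical population of one million documents; 1.9% is the study's reported average.
reviewed, factor = review_workload(1_000_000, 0.019)
print(f"{reviewed:,} documents need human review ({factor:.1f}x fewer than exhaustive review)")
```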
Predictive coding generally achieves a higher level of consistency and therefore a lower risk of error (both of which can be determined from the outset). Most of the time, predictive coding is used in conjunction with keyword searches. Because fewer documents are reviewed by a human, the human review is generally carried out by a single lawyer rather than a team of different lawyers. Limiting human review to one or a small number of more experienced reviewers not only makes the exercise less costly but also improves consistency and quality.
Some studies have concluded that predictive coding tends to be more accurate than manual review. An empirical survey in 2010 found that two computer systems (from two different providers) were at least as accurate as human review.
Some arbitration practitioners have expressed concerns about using predictive coding to search for documents responsive to another party's document requests. The concern is that responsive documents are identified by an algorithm rather than by manual review. Document production obligations are typically enforced through ethical rules, which bind lawyers but not algorithms or software. This raises the question of who would be responsible for a mistake made by an algorithm.
A particular algorithm may well wrongly identify a document as relevant/responsive or, more problematically, miss a relevant/responsive document. However, measuring an algorithm's accuracy and risk of error is part of any predictive coding tool. Statistics on recall, accuracy and the potential for errors can be generated at different stages of the review. While these statistics do reflect a risk of error, they give the party using predictive coding the benefit of understanding and quantifying that risk. If that party discloses its use of predictive coding to the other side, this benefit can be shared with the other side and the arbitral tribunal. This is not possible with a traditional review based on search terms or human review.
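Statistics of this kind are typically derived from a validation sample in which a human reviewer's codes are compared with the tool's predictions. The Python sketch below computes recall, precision and accuracy from such a comparison; the counts are made up for illustration, and individual tools may define and report these metrics somewhat differently. Recall is the figure most closely tied to the more problematic error mentioned above, namely missing relevant/responsive documents.

```python
from dataclasses import dataclass


@dataclass
class ValidationCounts:
    """Outcome counts from comparing a human-coded sample with the tool's predictions."""
    true_positives: int   # relevant, and predicted relevant by the tool
    false_positives: int  # not relevant, but predicted relevant
    false_negatives: int  # relevant, but missed by the tool
    true_negatives: int   # not relevant, and correctly predicted not relevant

    def recall(self) -> float:
        """Share of relevant documents that the tool found."""
        return self.true_positives / (self.true_positives + self.false_negatives)

    def precision(self) -> float:
        """Share of documents flagged as relevant that really are relevant."""
        return self.true_positives / (self.true_positives + self.false_positives)

    def accuracy(self) -> float:
        """Share of all sampled documents that the tool classified correctly."""
        total = (self.true_positives + self.false_positives
                 + self.false_negatives + self.true_negatives)
        return (self.true_positives + self.true_negatives) / total


# Hypothetical validation sample of 1,000 documents.
counts = ValidationCounts(true_positives=180, false_positives=40,
                          false_negatives=20, true_negatives=760)
print(f"recall={counts.recall():.2f} precision={counts.precision():.2f} "
      f"accuracy={counts.accuracy():.2f}")
```

In this made-up example, a recall of 0.90 would mean that roughly one in ten relevant documents in the sample was missed: a quantified risk that can be disclosed and discussed, which a traditional review does not provide.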
Traditional review methods also carry a risk of error, but that risk is simply not considered or quantified. Human mistakes rarely come to light in arbitral proceedings, so the risk of human error may be understated. This may distort general perceptions when artificial intelligence is compared with human intelligence in the context of document review. Reviews based on search terms also carry significant potential for mistakes, because some document requests are very difficult to capture through search terms. For example, search terms may fail to capture documents relating to abstract or negative concepts, such as unfair prejudice or the presence or absence of control by one entity over another. Predictive coding is particularly suitable for these types of document requests.
Disclosing, agreeing and ruling on predictive coding
Whether predictive coding will be required in a particular arbitration might not be clear until document requests are exchanged. This does not mean that predictive coding cannot, or should not, be raised before or at the case management conference. At that stage, the parties could agree, and/or the tribunal could order, that a party intending to use predictive coding must disclose this to the other side and/or seek the tribunal's permission. Disclosure would give the other party and the tribunal a chance to ask questions and satisfy themselves that predictive coding was used appropriately. It would also pre-empt an argument that the use of predictive coding, or a failure to disclose it, constitutes a procedural irregularity and a ground for challenge. Such an argument may be raised later in the proceedings, or even after the award is rendered, turning predictive coding from a cost-saving measure into a cost-wasting one. Discussing the issue early could protect against those later arguments, or at least make the process more defensible.
Any disclosure of, or permission for, the use of predictive coding may also affect whether the associated costs are recoverable as part of the winning party’s costs.
If predictive coding is used and appropriate disclosure is made, there will likely be discussions about how many documents should be reviewed manually and about the accuracy level of the algorithm, as both factors bear on the risk that relevant documents may be missed. Parties using predictive coding should also be prepared to report on the number of documents manually reviewed to train the algorithm. If the parties are unable to reach agreement, the arbitral tribunal will need to rule on these issues as part of its overarching power to deal with procedural and evidentiary matters.
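One common statistical way to frame the trade-off between how many documents are reviewed manually and how precisely an error rate can be stated is the standard sample-size formula for estimating a proportion, n = z^2 * p * (1 - p) / e^2, where z reflects the confidence level, p the expected rate and e the margin of error. The Python sketch below applies it. Using this formula to size a quality-control sample is an assumption for illustration, not a rule drawn from any arbitral guidance, and the sketch ignores the finite-population correction, which makes it slightly conservative for smaller document sets.

```python
import math
from statistics import NormalDist


def validation_sample_size(confidence: float, margin_of_error: float,
                           expected_rate: float = 0.5) -> int:
    """Documents to sample so an observed rate is within +/- margin_of_error.

    expected_rate=0.5 is the most cautious assumption (it maximises the sample).
    """
    # Two-sided critical value from the standard normal distribution (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * expected_rate * (1 - expected_rate) / margin_of_error ** 2
    return math.ceil(n)


# Halving the margin of error roughly quadruples the documents to review manually.
for margin in (0.05, 0.025):
    print(f"95% confidence, +/-{margin:.1%}: {validation_sample_size(0.95, margin)} documents")
```

Framed this way, a request for a tighter margin of error translates directly into additional documents to be reviewed manually, and therefore additional cost, which is the kind of trade-off the parties or the tribunal may need to weigh.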
Ultimately, it is for each party to decide whether to use this tool. However, parties, counsel and arbitrators should be aware of its existence and application in the context of arbitral proceedings. Beyond the interesting policy considerations arising from the development of artificial intelligence, practitioners may want to deepen their practical understanding of predictive coding, as they will likely need to address this new area in the future, whether to ensure that an order to produce documents will be, or has been, complied with, or to advance or oppose an application seeking permission to use predictive coding. Further, given the ever-growing volume of documents in international arbitration and the undeniable cost savings that predictive coding offers, counsel will increasingly be expected to use predictive coding and to advise clients on how to use it strategically.