Predictive Coding – The Future of Document Review?

It is no secret that we increasingly store and categorize information in digital form. Anyone who has been involved in a lawsuit is familiar with this fact. Companies and individuals now create and store their records electronically rather than in file cabinets full of paper.

With the rise of electronically stored information (ESI) and of e-mail as the primary form of communication, the volume of “information” in the hands of the parties to a lawsuit has increased dramatically. Long gone are the days when a dozen boxes or file cabinet drawers made up the universe of documents. Now a lawsuit can routinely put at issue millions of e-mails and documents that need to be reviewed.

Reviewing this volume of records can require an extensive process. In a traditional large firm, the review is tackled by a small army of people organized into a tiered pyramid. At the bottom is a legion of junior or contract attorneys who perform a first-level review. Their work is checked by a tier of more senior attorneys, who also make decisions about particularly important documents. Finally, another level of even more senior attorneys compiles, by topic, the essential documents to put into the hands of the partners. Each party in the case reviews its own documents and produces what it considers to be the documents responsive to the other side’s discovery requests. Then both parties repeat this process for the documents they receive from the other side. And, depending on the case, both sides may need to continue reviewing and producing documents.

It hardly needs to be said that this review process becomes very expensive. Millions of pages of documents will require thousands of attorney hours to review. In some lawsuits, document review will be the largest cost of discovery to the client. In an attempt to hold down costs, lower-billing junior or contract attorneys review the bulk of the documents. But this raises an interesting question about the quality of the review itself: these attorneys are less experienced and, in some instances, less knowledgeable about the facts or the law than the senior attorneys.

I’m highlighting this issue because a recent decision from the Southern District of New York describes a technology that could mark the beginning of a change in the ESI review process. In Da Silva Moore, et al. v. Publicis Groupe, et al., 11 Civ. 1279 (S.D.N.Y. Feb. 24, 2012), U.S. Magistrate Judge Peck approved the parties’ use of predictive coding to search for and identify documents responsive to discovery requests. (Implementation of the technology, however, has been stayed by the plaintiffs’ motion to recuse Judge Peck.)

Predictive coding is a process of teaching a computer program to identify responsive documents. The program has access to, and can ‘read,’ the entire universe of ESI, but it needs to be taught to recognize what is important and what is not. To do that, a random sample of documents is pulled from the universe and coded by senior attorneys. Each document can be coded as responsive to particular discovery requests or as not responsive to any request.

The coded documents are then fed into the program, where sophisticated algorithms learn from the attorneys’ coding. Training is itself iterative: the program is directed to identify a pool of documents it predicts are responsive, the attorneys review and re-code that pool, and the corrected coding is fed back into the program to improve its predictions. Once trained, the program can be set loose to identify responsive documents across the entire universe.
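For readers curious what the iterative training loop looks like in practice, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular vendor’s product works: commercial predictive coding tools use their own, often proprietary, algorithms, so the sketch swaps in a simple naive Bayes text classifier. The function `attorney_codes`, the toy documents, and the parameters `seed_size`, `batch_size`, and `rounds` are all hypothetical; `attorney_codes` merely stands in for the judgment of the senior attorneys.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; real review platforms use much richer features.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in model)."""

    def fit(self, docs, labels):
        self.counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab_size = max(1, len(set(self.counts[True]) | set(self.counts[False])))
        self.totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        return self

    def score(self, doc):
        """Log-odds that `doc` is responsive; a score above 0 means 'predict responsive'."""
        n = sum(self.doc_counts.values())
        log_prob = {}
        for c in (True, False):
            # Smoothed prior, so a seed set with no responsive documents still works.
            lp = math.log((self.doc_counts[c] + 1) / (n + 2))
            for tok in tokenize(doc):
                lp += math.log((self.counts[c][tok] + 1) / (self.totals[c] + self.vocab_size))
            log_prob[c] = lp
        return log_prob[True] - log_prob[False]


def predictive_coding(universe, attorney_codes, seed_size=4, batch_size=2, rounds=3, seed=0):
    """Train on a coded seed set, refine iteratively, then label every document.

    `attorney_codes(doc) -> bool` is a hypothetical stand-in for senior-attorney review.
    Returns (labels for all documents, attorney codes for the reviewed documents).
    """
    rng = random.Random(seed)

    def train():
        ids = list(reviewed)
        return NaiveBayes().fit([universe[i] for i in ids], [reviewed[i] for i in ids])

    # 1. Senior attorneys code a random seed sample drawn from the universe.
    reviewed = {i: attorney_codes(universe[i])
                for i in rng.sample(range(len(universe)), seed_size)}
    for _ in range(rounds):
        model = train()
        # 2. The program proposes the unreviewed documents it ranks most likely responsive.
        unreviewed = [i for i in range(len(universe)) if i not in reviewed]
        pool = sorted(unreviewed, key=lambda i: model.score(universe[i]), reverse=True)[:batch_size]
        if not pool:
            break
        # 3. Attorneys review and code the pool; their coding feeds the next round.
        for i in pool:
            reviewed[i] = attorney_codes(universe[i])
    # 4. The trained model labels every document the attorneys never looked at.
    model = train()
    labels = {i: reviewed[i] if i in reviewed else model.score(doc) > 0
              for i, doc in enumerate(universe)}
    return labels, reviewed


# Hypothetical toy universe; "pricing" stands in for the subject of a discovery request.
universe = [
    "draft pricing terms for the merger", "pricing committee notes on merger fees",
    "revised pricing schedule attached", "merger pricing call tomorrow",
    "lunch order for friday", "holiday party planning",
    "parking garage closed monday", "fantasy football league update",
    "pricing discussion with client", "office printer is broken",
    "birthday cake in the kitchen", "team lunch next week",
]
labels, reviewed = predictive_coding(universe, lambda doc: "pricing" in doc)
print(f"{len(reviewed)} of {len(universe)} documents reviewed by attorneys")
# → 10 of 12 documents reviewed by attorneys
```

The point the sketch tries to make concrete is where the human effort goes: `attorney_codes` is called only for the seed sample and the review pools, and every remaining document is labeled by the model.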

Judge Peck’s decision is fascinating for many reasons, but here are three important lessons. First, Judge Peck held that attorneys can use predictive coding to identify responsive documents and still certify under Fed. R. Civ. P. 26(g) that their client’s production is “complete” and “correct.” In other words, the attorneys do not have to manually view every document in order to comply with their discovery obligations.

Second, the ‘seed’ process of teaching the computer algorithms to identify responsive documents relies on senior attorneys rather than on the legion of junior or contract attorneys at the bottom of the pyramid. This technology is therefore potentially disruptive to the traditional tiered document review model.

Third, will this technology represent the future of ESI review? Right now, predictive coding is a relatively new technology in the context of litigation, which means that it is expensive. At this time, it may only make sense in a case involving terabytes of information. As with any technology, however, the price will drop. The question then becomes: will the months-long document review project, and the bill that follows, be a thing of the past?