Revolutionizing Document Review with AI

Book Review: Jim Sullivan, “The Book on AI Doc Review”

The thesis of the book is that computers are capable of reviewing and classifying documents better than humans. This capability is particularly significant in the realm of eDiscovery.

AI Document Review vs. Traditional Methods

As its title suggests, the book focuses on AI document review and contrasts it with Technology Assisted Review (TAR) and predictive coding. While TAR relies on humans to train the machine, AI review relies on prompts that specify what to look for, without traditional “training examples.” One example instruction from the book is:

“All documents where an Acme employee suggests that pricing of widgets should be modified.”

These instructions resemble a Request for Production, aligning closely with legal terminology.

Effectiveness of AI in Document Review

Mr. Sullivan claims that AI-powered review can identify over 95% of relevant documents, that is, achieve recall above 95%. The book includes intriguing chapters that walk readers through a relevancy review and use random sampling for quality control (QC) of the results.

Validation Techniques

To validate the results, the author employs standard metrics such as recall and precision, which have traditionally been used for keyword searches but are now applied to AI. The formulas are:

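Recall = TP/(TP + FN)
Precision = TP/(TP + FP)

Here TP (true positives) counts relevant documents the review correctly flagged, FN (false negatives) counts relevant documents it missed, and FP (false positives) counts non-relevant documents it flagged by mistake.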

He introduces the confusion matrix as the basis for calculating these metrics, with a subject-matter expert supplying the reference judgments. He also discusses techniques such as sampling from the “discard pile” (the documents classified as not relevant) to improve queries iteratively.
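To make the arithmetic concrete, here is a minimal Python sketch that tallies a confusion matrix from paired judgments and applies the two formulas above. The function names and the 100-document sample are made up for illustration and are not taken from the book.

```python
from collections import Counter


def confusion_counts(pairs):
    """Tally (ai_says_relevant, expert_says_relevant) pairs into TP/FP/FN/TN."""
    counts = Counter()
    for ai_relevant, expert_relevant in pairs:
        if ai_relevant and expert_relevant:
            counts["TP"] += 1   # AI flagged it, expert agrees
        elif ai_relevant:
            counts["FP"] += 1   # AI flagged it, expert disagrees
        elif expert_relevant:
            counts["FN"] += 1   # AI missed a relevant document
        else:
            counts["TN"] += 1   # both say not relevant
    return counts


def recall(counts):
    denom = counts["TP"] + counts["FN"]
    return counts["TP"] / denom if denom else 0.0


def precision(counts):
    denom = counts["TP"] + counts["FP"]
    return counts["TP"] / denom if denom else 0.0


# Hypothetical QC sample of 100 documents, each judged by the AI and by an expert.
sample = (
    [(True, True)] * 38
    + [(True, False)] * 2
    + [(False, True)] * 2
    + [(False, False)] * 58
)
counts = confusion_counts(sample)
print(counts["TP"], counts["FP"], counts["FN"], counts["TN"])  # 38 2 2 58
print(recall(counts))     # 0.95
print(precision(counts))  # 0.95
```

The false negatives only show up if the sample includes documents from the discard pile, which is one reason to sample there.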

Defensibility of AI Review

Mr. Sullivan emphasizes the crucial role of validation in demonstrating high-quality output. He states:

“The only thing that matters is how you validate the results and demonstrate high-quality output.”

He outlines a straightforward process for predictive coding that applies equally to AI:

  1. Identify the review set.
  2. Train the machine.
  3. Run the documents through the classifier.
  4. Evaluate the results.

Practical Examples and Recommendations

The book provides concrete examples, particularly for the first step. It recommends removing ROT (redundant, obsolete, or trivial) documents, files without text, audio files, images, and large files, and it covers deduplication strategies.
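A rough sketch of this kind of culling, in Python, might look like the following. The `Doc` record, the extension list, and the size cutoff are all assumptions chosen for illustration; the book's actual criteria may differ. Deduplication here is exact-match on extracted text only, which would not catch near-duplicates.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical record; a real platform would carry far more metadata.
    name: str
    text: str
    size_bytes: int


# Assumed values for illustration, not thresholds from the book.
EXCLUDED_EXTENSIONS = {".mp3", ".wav", ".jpg", ".png"}
MAX_SIZE_BYTES = 25 * 1024 * 1024


def build_review_set(docs):
    """Drop audio/image files, files without text, oversized files, and exact duplicates."""
    seen_hashes = set()
    kept = []
    for doc in docs:
        extension = os.path.splitext(doc.name)[1].lower()
        if extension in EXCLUDED_EXTENSIONS:
            continue
        if not doc.text.strip():
            continue
        if doc.size_bytes > MAX_SIZE_BYTES:
            continue
        digest = hashlib.sha256(doc.text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate of a document already kept
        seen_hashes.add(digest)
        kept.append(doc)
    return kept


docs = [
    Doc("memo.txt", "Pricing update for widgets", 2_000),
    Doc("memo_copy.txt", "Pricing update for widgets", 2_000),  # duplicate text
    Doc("call.mp3", "", 5_000_000),                             # audio
    Doc("scan.pdf", "   ", 80_000),                             # no extractable text
    Doc("dump.csv", "a,b,c", 900_000_000),                      # too large
    Doc("email.eml", "Let's revisit widget pricing", 4_000),
]
review_set = build_review_set(docs)
print([d.name for d in review_set])  # ['memo.txt', 'email.eml']
```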

Mr. Sullivan also advises pre-validation: running the prompts against a random sample before applying them to the entire dataset. A subject-matter expert reviews the “hits” to establish a benchmark for recall and precision. Because weak prompts are caught before the full run, pre-validation also saves money.
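Drawing the pre-validation sample could be as simple as the Python sketch below. The document IDs, the sample size of 500, and the fixed seed are assumptions for illustration; the source does not state a sample size here.

```python
import random


def draw_prevalidation_sample(doc_ids, sample_size, seed=0):
    """Return a simple random sample of document IDs, without replacement.

    A fixed seed makes the draw reproducible, which helps when the
    sampling step has to be documented later.
    """
    rng = random.Random(seed)
    ids = list(doc_ids)
    return rng.sample(ids, min(sample_size, len(ids)))


# Hypothetical corpus of 10,000 documents.
corpus_ids = [f"DOC-{i:05d}" for i in range(10_000)]
sample_ids = draw_prevalidation_sample(corpus_ids, 500, seed=42)

print(len(sample_ids))       # 500
print(len(set(sample_ids)))  # 500
```

The prompts would then be run against `sample_ids` only, and the expert's judgments on that sample would feed the recall and precision calculation.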

Refining Prompts

Prompts can be refined using inclusion or exclusion criteria. For instance, an inclusion criterion might state:

“Any discussion about qualifications in hiring should be deemed relevant.”

Conversely, an exclusion criterion could be:

“Any discussion about hiring anyone other than coaches or management should be considered not relevant.”
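One way to picture how such criteria attach to a prompt is the small Python helper below. The helper, its layout, and the base instruction are hypothetical; only the two criteria are quoted from the review above, and the book may structure its prompts differently.

```python
def build_review_prompt(instruction, include=(), exclude=()):
    """Assemble a plain-text review prompt from a base instruction and optional criteria."""
    lines = ["Relevance instruction:", instruction]
    if include:
        lines.append("Inclusion criteria:")
        lines.extend(f"- {criterion}" for criterion in include)
    if exclude:
        lines.append("Exclusion criteria:")
        lines.extend(f"- {criterion}" for criterion in exclude)
    return "\n".join(lines)


prompt = build_review_prompt(
    # Hypothetical base instruction, not taken from the book.
    "All documents discussing the hiring of coaches or management.",
    include=[
        "Any discussion about qualifications in hiring should be deemed relevant.",
    ],
    exclude=[
        "Any discussion about hiring anyone other than coaches or management "
        "should be considered not relevant.",
    ],
)
print(prompt)
```

Keeping the criteria as separate lists makes it easy to add or drop one between iterations and rerun the prompt against the same sample.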

Conclusion

Mr. Sullivan explores various review options, including AI-Powered Linear Review and AI/CAL Hybrid Review, where batches are selected using AI. He also raises important points regarding confidentiality and security, advising readers:

“If you aren’t paying for a product, you are the product.”

He provides essential questions to ask AI providers to ensure security in document review processes.
