Goldilocks: Just-Right Tuning of BERT for Technology-Assisted Review
Eugene Yang, Sean MacAvaney, David D. Lewis, Ophir Frieder

TL;DR
This paper evaluates BERT, a transformer-based model, in technology-assisted review (TAR) workflows. It finds that fine-tuning the language model by just the right amount is crucial, and that results depend on the dataset: BERT lowers review cost on the RCV1-v2 newswire collection but is outperformed by linear models on the Jeb Bush e-mail collection.
Contribution
It demonstrates the importance of task-specific fine-tuning of BERT in TAR workflows and highlights the impact of domain match on model effectiveness.
Findings
Pre-trained BERT reduces review cost by 10% to 15% in TAR workflows simulated on RCV1-v2.
Linear models outperform BERT on simulated legal discovery topics from the Jeb Bush e-mail collection.
Proper fine-tuning is critical for BERT's success in TAR.
Abstract
Technology-assisted review (TAR) refers to iterative active learning workflows for document review in high recall retrieval (HRR) tasks. TAR research and most commercial TAR software have applied linear models such as logistic regression to lexical features. Transformer-based models with supervised tuning are known to improve effectiveness on many text classification tasks, suggesting their use in TAR. We indeed find that the pre-trained BERT model reduces review cost by 10% to 15% in TAR workflows simulated on the RCV1-v2 newswire collection. In contrast, we find that linear models outperform BERT for simulated legal discovery topics on the Jeb Bush e-mail collection. This suggests the match between transformer pre-training corpora and the task domain is of greater significance than generally appreciated. Additionally, we show that just-right language model fine-tuning…
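The iterative workflow the abstract describes can be illustrated with a toy simulation. The sketch below is not the authors' implementation: it assumes a pure-Python logistic regression over bag-of-words counts, relevance feedback (review the top-scored documents each round), and an oracle stopping rule that is only available in simulation because the true labels are known. All function names, the sample documents, and the parameters (`batch_size`, `target_recall`, learning rate) are hypothetical, and counting reviewed documents is a simplified stand-in for review cost.

```python
import math
from collections import Counter

def featurize(text):
    """Bag-of-words term counts, a stand-in for lexical features."""
    return Counter(text.lower().split())

def score(model, feats):
    """Linear score (log-odds of relevance) for one document."""
    weights, bias = model
    return bias + sum(weights.get(t, 0.0) * c for t, c in feats.items())

def train_logreg(examples, epochs=200, lr=0.5, l2=0.01):
    """Fit logistic regression with plain SGD; returns (weights, bias)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, label in examples:
            z = score((weights, bias), feats)
            z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - label
            bias -= lr * err
            for t, c in feats.items():
                w = weights.get(t, 0.0)
                weights[t] = w - lr * (err * c + l2 * w)
    return weights, bias

def simulate_tar(docs, labels, seed_index, batch_size=2, target_recall=0.8):
    """Simulate review rounds; a label is revealed only once its document is reviewed."""
    feats = [featurize(d) for d in docs]
    total_relevant = sum(labels)
    reviewed = [seed_index]
    while True:
        found = sum(labels[i] for i in reviewed)
        # Oracle stopping rule: possible only because this is a simulation.
        if found >= target_recall * total_relevant or len(reviewed) == len(docs):
            return reviewed
        model = train_logreg([(feats[i], labels[i]) for i in reviewed])
        remaining = [i for i in range(len(docs)) if i not in reviewed]
        # Relevance feedback: review the documents the model ranks highest.
        remaining.sort(key=lambda i: score(model, feats[i]), reverse=True)
        reviewed.extend(remaining[:batch_size])

# Hypothetical toy collection: 1 = relevant to an "oil market" topic.
docs = [
    "oil prices rise on supply fears",
    "crude oil exports fall",
    "football club wins cup final",
    "oil refinery output and crude prices",
    "tennis star wins open title",
    "opec cuts crude supply",
    "new film wins award",
    "city council election results",
]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

reviewed = simulate_tar(docs, labels, seed_index=0)
print("documents reviewed:", len(reviewed), "of", len(docs))
```

Replacing `train_logreg` and `score` with a fine-tuned transformer classifier would keep the same loop structure; only the model retrained in each round changes.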
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Expert finding and Q&A systems
Methods: Multi-Head Attention · Linear Layer · WordPiece · Logistic Regression · Dense Connections · Attention Is All You Need · Residual Connection · Attention Dropout · Adam
