A Generalised and Adaptable Reinforcement Learning Stopping Method

Reem Bin-Hezam; Mark Stevenson

arXiv:2505.01907·cs.IR·July 8, 2025

TL;DR

This paper introduces a flexible reinforcement learning-based stopping method for Technology Assisted Review that adapts to multiple recall targets and balances recall with cost, outperforming existing approaches.

Contribution

It develops a novel RL environment, GRLStop, enabling a single model to handle various recall targets and tradeoffs, improving control and effectiveness.

Findings

1. Effective across six benchmark datasets
2. Outperforms multiple baseline methods
3. Offers greater flexibility in stopping decisions

Abstract

This paper presents a Technology Assisted Review (TAR) stopping approach based on Reinforcement Learning (RL). Previous RL-based approaches offered limited control over stopping behaviour: for example, they fixed the target recall and the tradeoff between maximising recall and minimising cost. These limitations are overcome by introducing a novel RL environment, GRLStop, that allows a single model to be applied to multiple target recalls, balances the recall/cost tradeoff and integrates a classifier. Experiments were carried out on six benchmark datasets (CLEF e-Health 2017-2019, TREC Total Recall, TREC Legal and Reuters RCV1) at multiple target recall levels. Results showed the proposed approach to be effective compared to multiple baselines, in addition to offering greater flexibility.
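To make the recall/cost tradeoff concrete, here is a minimal Python sketch of the two quantities a stopping method trades off when it halts partway down a ranked list. It is not GRLStop and does not come from the paper's code. The function names (`recall_at`, `cost_at`, `oracle_stop_rank`) and the definition of cost as the fraction of documents examined are assumptions made for illustration. The "oracle" sees every relevance label in advance, which a real stopping method cannot do.

```python
from typing import Sequence


def recall_at(labels: Sequence[int], rank: int) -> float:
    """Fraction of all relevant documents found in the top `rank` positions.

    `labels` is a hypothetical ranked list of 0/1 relevance judgements.
    """
    total_relevant = sum(labels)
    if total_relevant == 0:
        return 1.0  # nothing to find, so any stopping point has full recall
    return sum(labels[:rank]) / total_relevant


def cost_at(labels: Sequence[int], rank: int) -> float:
    """Fraction of the collection examined before stopping (assumed cost measure)."""
    if not labels:
        return 0.0
    return rank / len(labels)


def oracle_stop_rank(labels: Sequence[int], target_recall: float) -> int:
    """Earliest rank at which recall reaches the target, given full label knowledge."""
    for rank in range(len(labels) + 1):
        if recall_at(labels, rank) >= target_recall:
            return rank
    return len(labels)


if __name__ == "__main__":
    ranking = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]  # toy ranking with 5 relevant documents
    for target in (0.8, 1.0):
        stop = oracle_stop_rank(ranking, target)
        print(target, stop, recall_at(ranking, stop), cost_at(ranking, stop))
```

In this toy ranking, raising the target recall from 0.8 to 1.0 moves the earliest possible stopping point from rank 7 to rank 10. The same ranking therefore needs a different stopping point for each target, which is why a method that supports multiple targets is useful.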

Code & Models

Repositories

reembinhezam/grlstop (Official)
