ECPO: Evidence-Coupled Policy Optimization for Evidence-Certified Candidate Ranking
Miaobo Hu, Shuhao Hu, BoKun Wang, Yina Sa, Xin Wang, Xiaobo Guo, Daren Zha, Jun Xiao

TL;DR
This paper introduces ECPO, a policy optimization method that jointly optimizes a Top-K candidate ranking and the evidence certificates that justify it, so that each ranking decision in a decision-support system can be independently checked against its cited spans.
Contribution
The paper presents ECPO, a listwise policy optimization framework for evidence-certified candidate ranking. Its action is the joint object of a ranking and its evidence certificates, and its objective couples ranking utility with evidence validity and cycle consistency.
Findings
ECPO outperforms baseline policies in evidence certification accuracy.
ECPO effectively maximizes CertNDCG, balancing ranking quality and evidence validity.
The approach demonstrates robustness across multiple evaluation settings.
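This page does not define CertNDCG. As an illustration only, the sketch below assumes one plausible reading: a standard NDCG in which a ranked candidate contributes gain only when its evidence certificate has been validated. The function names `dcg` and `cert_ndcg`, the boolean `certified` flags, and the gating rule are assumptions for this example, not the paper's definition.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with the usual log2(rank + 1) discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def cert_ndcg(relevance: Sequence[float],
              certified: Sequence[bool],
              k: int) -> float:
    """Hypothetical certificate-gated NDCG@k (an assumption, not the paper's metric).

    relevance[i] is the graded relevance of the candidate at rank i, and
    certified[i] says whether that candidate's evidence certificate passed
    validation. Uncertified candidates contribute zero gain.
    """
    gated = [r if ok else 0.0 for r, ok in zip(relevance[:k], certified[:k])]
    # The ideal list sorts the same relevance labels and assumes every
    # certificate is valid.
    ideal_dcg = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(gated) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    labels = [1.0, 0.0, 1.0]
    print(cert_ndcg(labels, [True, True, True], k=3))   # ranking quality only
    print(cert_ndcg(labels, [True, True, False], k=3))  # one certificate fails
```

Under this reading, a correctly ranked candidate with an invalid certificate scores the same as an irrelevant one, which is one way a single metric can balance ranking quality against evidence validity.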
Abstract
Ranking systems used in decision-support settings should not only order candidates but also expose evidence that can be independently checked. We study evidence-certified candidate ranking: given an intent_id, a predefined plan skeleton, a window-local candidate roster, and text-derived candidate trajectories with span provenance, a system must output a Top-K list together with doc_id:span evidence certificates whose cited spans are sufficient to recover the decision. We instantiate this task on MAVEN-ERE and RAMS with fixed upstream extraction, window-local randomized candidate identifiers, skeleton-aligned trajectory supervision, hard negatives, and audit references. We introduce Evidence-Coupled Policy Optimization (ECPO), a listwise policy-optimization objective whose action is the joint object of ranking and evidence certificate. ECPO first learns an interpretable trajectory reward…
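The abstract describes certificates in a doc_id:span form but does not specify how a span is encoded. The minimal sketch below assumes character offsets written as `doc_id:start-end` with an exclusive end. The `Certificate` class, `parse_certificate`, and `cited_text` are hypothetical names introduced for this example. It shows only the mechanical part of checking a certificate: resolving the citation back to text that exists in the source document.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Certificate:
    doc_id: str
    start: int  # inclusive character offset (assumed encoding)
    end: int    # exclusive character offset (assumed encoding)


def parse_certificate(text: str) -> Certificate:
    """Parse a 'doc_id:start-end' string; the span encoding is an assumption."""
    # Split on the last colon so that doc_ids containing ':' still parse.
    doc_id, sep, span = text.rpartition(":")
    if not sep or not doc_id:
        raise ValueError(f"missing doc_id in certificate: {text!r}")
    start_s, dash, end_s = span.partition("-")
    if not dash:
        raise ValueError(f"span must look like start-end: {text!r}")
    start, end = int(start_s), int(end_s)  # int() raises ValueError on bad input
    if start < 0 or end <= start:
        raise ValueError(f"empty or negative span: {text!r}")
    return Certificate(doc_id, start, end)


def cited_text(cert: Certificate, docs: Dict[str, str]) -> Optional[str]:
    """Return the cited span, or None if the document or offsets do not exist."""
    doc = docs.get(cert.doc_id)
    if doc is None or cert.end > len(doc):
        return None
    return doc[cert.start:cert.end]


if __name__ == "__main__":
    docs = {"d1": "The attack began at dawn."}
    cert = parse_certificate("d1:4-10")
    print(cited_text(cert, docs))
```

Resolving a span is only a precondition. Whether the cited spans are sufficient to recover the decision, as the task requires, is a separate check that this sketch does not attempt.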
