Counterfactual Peptide Editing for Causal TCR-pMHC Binding Inference
Sanjar Khudoyberdiev, Arman Bekov

TL;DR
This paper introduces CIP, a training framework that improves TCR-pMHC binding prediction by generating biologically constrained counterfactual peptide edits to reduce shortcut learning and enhance out-of-distribution robustness.
Contribution
CIP is a novel training method that enforces invariance to non-anchor peptide positions and sensitivity to anchor residues, improving causal TCR specificity modeling.
Findings
CIP achieves an AUROC of 0.831 on a challenging benchmark.
CIP reduces the shortcut index by 39.7%.
Anchor-aware edits drive out-of-distribution gains.
Abstract
Neural models for TCR-pMHC binding prediction are susceptible to shortcut learning: they exploit spurious correlations in training data (such as peptide-length bias or V-gene co-occurrence) rather than the physical binding interface. This renders predictions brittle under family-held-out and distance-aware evaluation, where such shortcuts do not transfer. We introduce Counterfactual Invariant Prediction (CIP), a training framework that generates biologically constrained counterfactual peptide edits and enforces invariance to edits at non-anchor positions while amplifying sensitivity at MHC anchor residues. CIP augments the base classifier with two auxiliary objectives: (1) an invariance loss penalizing prediction changes under conservative non-anchor substitutions, and (2) a contrastive loss encouraging large prediction changes under anchor-position disruptions. Evaluated on…
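The two auxiliary objectives described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss forms or the edit rules, so everything below is an assumption made for illustration: the residue similarity groups, the anchor positions (position 2 and the C-terminus, a common pattern for HLA class I peptides), the squared-difference invariance penalty, the hinge-style contrastive penalty with a `margin`, and the weights `lam_inv` and `lam_con`. All function names are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical groups of physicochemically similar residues (an assumption
# for illustration; the abstract does not specify the substitution scheme).
CONSERVATIVE_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def group_of(residue):
    """Return the similarity group containing `residue`, or the residue itself."""
    for group in CONSERVATIVE_GROUPS:
        if residue in group:
            return group
    return residue


def anchor_positions(peptide):
    """Assumed anchors: position 2 and the C-terminus (0-based 1 and len-1).

    Assumes len(peptide) >= 2.
    """
    return {1, len(peptide) - 1}


def non_anchor_edit(peptide, rng):
    """Conservative edit: swap one non-anchor residue within its own group."""
    anchors = anchor_positions(peptide)
    candidates = [i for i, aa in enumerate(peptide)
                  if i not in anchors and len(group_of(aa)) > 1]
    if not candidates:
        return peptide  # nothing editable under these assumed rules
    i = rng.choice(candidates)
    alternatives = [aa for aa in group_of(peptide[i]) if aa != peptide[i]]
    return peptide[:i] + rng.choice(alternatives) + peptide[i + 1:]


def anchor_edit(peptide, rng):
    """Disruptive edit: replace one anchor residue with one from another group."""
    i = rng.choice(sorted(anchor_positions(peptide)))
    excluded = group_of(peptide[i])
    options = [aa for g in CONSERVATIVE_GROUPS for aa in g if aa not in excluded]
    return peptide[:i] + rng.choice(options) + peptide[i + 1:]


def invariance_loss(p_orig, p_edit):
    """Penalize any prediction change under a non-anchor edit (assumed form)."""
    return (p_orig - p_edit) ** 2


def contrastive_loss(p_orig, p_edit, margin=0.5):
    """Penalize prediction changes smaller than `margin` under an anchor edit."""
    return max(0.0, margin - abs(p_orig - p_edit))


def cip_loss(base_loss, p_orig, p_non_anchor, p_anchor,
             lam_inv=1.0, lam_con=1.0, margin=0.5):
    """Base classification loss plus the two weighted auxiliary terms."""
    return (base_loss
            + lam_inv * invariance_loss(p_orig, p_non_anchor)
            + lam_con * contrastive_loss(p_orig, p_anchor, margin))
```

In a training loop under these assumptions, each peptide would be passed through the classifier three times (original, non-anchor edit, anchor edit), and the resulting binding probabilities combined with `cip_loss`. A model that ignores anchor residues incurs the contrastive penalty, while one that reacts to conservative non-anchor swaps incurs the invariance penalty.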
