Region-Normalized DPO for Medical Image Segmentation under Noisy Judges
Hamza Kalisch, Constantin Seibold, Jens Kleesiek, Ken Herrmann, Frederic Jonske

TL;DR
This paper introduces Region-Normalized DPO (RN-DPO), which stabilizes preference-based fine-tuning of medical image segmentation models when the preferences come from noisy automatic quality signals, reducing harmful updates and improving segmentation performance.
Contribution
The paper proposes RN-DPO, a segmentation-aware normalization technique for DPO that mitigates the effect of noisy, biased preference signals, leading to more stable and effective fine-tuning in medical image segmentation.
Findings
RN-DPO outperforms standard DPO on multiple medical datasets.
Normalization reduces harmful updates from noisy preferences.
RN-DPO stabilizes training and improves segmentation performance.
Abstract
While dense pixel-wise annotations remain the gold standard for medical image segmentation, they are costly to obtain and limit scalability. In contrast, many deployed systems already produce inexpensive automatic quality-control (QC) signals, such as model agreement, uncertainty measures, or learned mask-quality scores, which can be used for further model training without additional ground-truth annotation. However, these signals can be noisy and biased, making preference-based fine-tuning susceptible to harmful updates. We study Direct Preference Optimization (DPO) for segmentation from such noisy judges, using proposals generated by a supervised base segmenter trained on a small labeled set. We find that outcomes depend strongly on how preference pairs are mined: selecting the judge's top-ranked proposal can improve peak performance when the judge is reliable, but can amplify harmful…
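To make the setup concrete, the following is a minimal sketch of preference-pair mining and a DPO-style loss on binary masks. It is not the paper's implementation: the abstract is truncated here and does not define the normalization, so dividing the margin by the size of the region where the two proposals disagree is an assumption made purely for illustration. The function names (`mine_pair`, `mask_log_lik`, `dpo_margin`, `dpo_loss`), the choice of the lowest-ranked proposal as the rejected mask, and `beta=0.1` are likewise hypothetical. Only the standard DPO form, the negative log-sigmoid of a scaled difference of policy-versus-reference log-likelihood ratios, is taken as given.

```python
import numpy as np

def mine_pair(proposals, judge_scores):
    """Pick (chosen, rejected) as the judge's top- and bottom-ranked proposals.
    Using the bottom-ranked proposal as 'rejected' is an assumption."""
    order = np.argsort(judge_scores)
    return proposals[order[-1]], proposals[order[0]]

def mask_log_lik(prob_fg, mask, region):
    """Sum of per-pixel log-likelihoods of a binary mask, restricted to `region`."""
    p = np.clip(prob_fg, 1e-7, 1 - 1e-7)
    ll = np.where(mask == 1, np.log(p), np.log(1 - p))
    return ll[region].sum()

def dpo_margin(policy_prob, ref_prob, chosen, rejected, normalize=True):
    """DPO log-ratio margin between chosen and rejected masks.

    Pixels where both masks agree cancel out of the margin, so only the
    disagreement region contributes. With normalize=True the margin is
    divided by that region's pixel count (hypothetical normalization).
    """
    region = chosen != rejected
    n = int(region.sum())
    if n == 0:
        return 0.0  # identical masks carry no preference signal
    margin = (
        (mask_log_lik(policy_prob, chosen, region) - mask_log_lik(ref_prob, chosen, region))
        - (mask_log_lik(policy_prob, rejected, region) - mask_log_lik(ref_prob, rejected, region))
    )
    return margin / n if normalize else margin

def dpo_loss(policy_prob, ref_prob, chosen, rejected, beta=0.1, normalize=True):
    """Standard DPO loss: -log(sigmoid(beta * margin)), computed stably."""
    margin = dpo_margin(policy_prob, ref_prob, chosen, rejected, normalize)
    return float(np.logaddexp(0.0, -beta * margin))
```

Under this sketch, the unnormalized margin grows with the number of disagreeing pixels, so a pair that differs over a large area produces a larger update than one that differs over a small area, whether or not the judge ranked the pair correctly. Dividing by the region size removes that dependence, which is one plausible way a segmentation-aware normalization could limit the impact of mislabeled pairs.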
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
