SIRR-LMM: Single-image Reflection Removal via Large Multimodal Model
Yu Guo, Zhiqiang Lao, Xiyun Song, Yubin Zhou, Heather Yu

TL;DR
This paper presents SIRR-LMM, a novel approach for single-image reflection removal that uses a large multimodal model fine-tuned with a new synthetic dataset generated via physically accurate path-tracing of glass scenarios.
Contribution
The paper introduces a new synthetic dataset generation framework and a method to fine-tune large multimodal models for improved reflection removal from single images.
Findings
Enhanced reflection removal performance over state-of-the-art methods
Physically accurate synthetic dataset improves model training
Effective use of joint captioning and LoRA fine-tuning
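The LoRA fine-tuning named in the last finding can be illustrated with a minimal NumPy sketch. This is not the authors' code: the class name `LoRALinear`, the layer sizes, the rank, and the scaling factor are all hypothetical. It only shows the general idea of low-rank adaptation, where a frozen weight matrix is kept as-is and a small trainable low-rank update is added to it.

```python
import numpy as np

class LoRALinear:
    """Frozen dense layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                           # frozen, shape (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, (rank, d_in))   # trainable down-projection
        self.B = np.zeros((d_out, rank))               # trainable up-projection, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        base = x @ self.weight.T                       # output of the frozen layer
        update = (x @ self.A.T) @ self.B.T             # low-rank correction
        return base + self.scale * update

    def trainable_params(self):
        return self.A.size + self.B.size

# Hypothetical sizes, chosen only for the demonstration.
rng = np.random.default_rng(1)
W = rng.normal(size=(64, 32))
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(5, 32))
y = layer(x)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only `A` and `B` (here 384 values against 2048 in `W`) would be updated during training.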
Abstract
Glass surfaces create complex interactions of reflected and transmitted light, making single-image reflection removal (SIRR) challenging. Existing datasets suffer from limited physical realism in synthetic data or insufficient scale in real captures. We introduce a synthetic dataset generation framework that path-traces 3D glass models over real background imagery to create physically accurate reflection scenarios with varied glass properties, camera settings, and post-processing effects. To leverage the capabilities of a Large Multimodal Model (LMM), we concatenate the image layers into a single composite input, apply joint captioning, and fine-tune the model using task-specific LoRA rather than full-parameter training. This enables our approach to achieve improved reflection removal and separation performance compared to state-of-the-art methods.
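The step of concatenating the image layers into a single composite input might look roughly like the sketch below. The abstract does not specify the layout, so the side-by-side (width-axis) arrangement, the three layer names (`mixed`, `transmission`, `reflection`), and the helper names `make_composite` and `split_composite` are assumptions made for illustration.

```python
import numpy as np

def make_composite(layers):
    """Concatenate equally sized H x W x C layers side by side along the width axis."""
    shapes = {layer.shape for layer in layers}
    if len(shapes) != 1:
        raise ValueError("all layers must share the same shape")
    return np.concatenate(layers, axis=1)

def split_composite(composite, num_layers):
    """Split a side-by-side composite back into its equally wide layers."""
    if composite.shape[1] % num_layers != 0:
        raise ValueError("composite width is not divisible by num_layers")
    return np.split(composite, num_layers, axis=1)

# Random stand-ins for real images; the sizes are arbitrary.
rng = np.random.default_rng(0)
mixed = rng.random((4, 6, 3))         # assumed: captured image with reflections
transmission = rng.random((4, 6, 3))  # assumed: scene behind the glass
reflection = rng.random((4, 6, 3))    # assumed: reflected scene

composite = make_composite([mixed, transmission, reflection])
recovered = split_composite(composite, 3)
```

Packing the layers into one image lets a model that consumes and produces single images handle all layers jointly, and the split step recovers the individual layers from its output.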
Taxonomy
Topics: Image Enhancement Techniques · Computer Graphics and Visualization Techniques · Advanced Image Processing Techniques
