Bridging Semantic Logic Gaps: A Cognition Inspired Multimodal Boundary Preserving Network for Image Manipulation Localization
Songlin Li, Zhiqing Guo, Yuanman Li, Zeyu Li, Yunfeng Diao, Gaobo Yang, and Liejun Wang

TL;DR
This paper introduces CMB-Net, a cognition-inspired multimodal network that leverages language models and boundary-preserving techniques to improve image manipulation localization by integrating semantic and visual cues.
Contribution
The paper proposes a novel multimodal network that combines language analysis with visual features and boundary preservation for enhanced image manipulation localization.
Findings
CMB-Net outperforms most existing IML models in experiments.
The use of LLM-generated textual cues improves semantic understanding.
Boundary preservation via RED maintains edge details effectively.
Abstract
Existing image manipulation localization (IML) models mainly rely on visual cues but ignore the semantic logical relationships between content features. In fact, the content semantics conveyed by real images often conform to human cognitive laws. However, image manipulation technology usually destroys the internal relationships between content features, thus leaving semantic clues for IML. In this paper, we propose a cognition inspired multimodal boundary preserving network (CMB-Net). Specifically, CMB-Net utilizes large language models (LLMs) to analyze manipulated regions within images and generate prompt-based textual information to compensate for the lack of semantic relationships in the visual information. Considering that erroneous text induced by LLM hallucination will damage the accuracy of IML, we propose an image-text central ambiguity module (ITCAM). It…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Neural Networks and Applications
