Deep Cross Modal Learning for Caricature Verification and Identification (CaVINet)
Jatin Garg, Skand Vishwanath Peri, Himanshu Tolani, Narayanan C Krishnan

TL;DR
This paper introduces CaVINet, a deep learning model for cross-modal caricature and visual face verification and recognition, achieving high accuracy without manual landmark annotations and handling extreme distortions.
Contribution
The paper presents the first deep cross-modal architecture for caricature verification that learns shared representations without relying on facial landmarks or identity labels.
Findings
Achieved 91% accuracy in verifying unseen images.
Achieved 75% accuracy in verifying unseen identities.
Attained 85% rank-1 accuracy for caricature recognition.
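For readers unfamiliar with the rank-1 metric reported above, here is a minimal sketch of how such a score is typically computed: each caricature embedding (the probe) is matched against a gallery of visual-image embeddings, and a probe counts as correct when its single nearest gallery entry has the same identity. The cosine similarity measure, the function names, and the toy 2-D embeddings are all assumptions made for illustration; they are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank1_accuracy(probes, probe_ids, gallery, gallery_ids):
    """Fraction of probes whose most similar gallery item shares their identity."""
    hits = 0
    for emb, true_id in zip(probes, probe_ids):
        # Index of the gallery embedding most similar to this probe.
        best = max(range(len(gallery)), key=lambda j: cosine(emb, gallery[j]))
        if gallery_ids[best] == true_id:
            hits += 1
    return hits / len(probes)

# Hypothetical toy data: one visual embedding per identity in the gallery,
# and four caricature embeddings as probes (values are made up).
gallery = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
gallery_ids = ["a", "b", "c"]
probes = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9], [-0.7, 0.3]]
probe_ids = ["a", "b", "c", "c"]

score = rank1_accuracy(probes, probe_ids, gallery, gallery_ids)
print(f"rank-1 accuracy: {score:.2f}")
```

In this toy setup the third probe is labelled "c" but lies closest to the gallery entry for "b", so three of the four probes are matched correctly.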
Abstract
Learning from different modalities is a challenging task. In this paper, we look at the challenging problem of cross modal face verification and recognition between caricature and visual image modalities. Caricatures exaggerate the facial features of a person. Due to the significant variations in caricatures, building vision models for recognizing and verifying data from this modality is an extremely challenging task. Visual images, which have significantly fewer distortions, can act as a bridge for the analysis of the caricature modality. We introduce a publicly available large Caricature-VIsual dataset [CaVI] with images from both modalities that captures the rich variations in the caricatures of an identity. This paper presents the first cross modal architecture that handles extreme distortions of caricatures using a deep learning network that learns similar…
Taxonomy
Topics: Face recognition and analysis · Face and Expression Recognition · Generative Adversarial Networks and Image Synthesis
