Auxiliary Cross-Modal Representation Learning with Triplet Loss Functions for Online Handwriting Recognition

Felix Ott, David Rügamer, Lucas Heublein, Bernd Bischl, and Christopher Mutschler

arXiv:2202.07901·cs.LG·September 12, 2023

Open Access

TL;DR

This paper introduces a triplet loss-based cross-modal representation learning approach for online handwriting recognition, leveraging image and time-series data to improve classification accuracy, convergence speed, and generalizability.

Contribution

It adapts triplet loss with a dynamic margin for cross-modal learning between images and time-series data, enhancing handwriting recognition performance.

Findings

01 Improved classification accuracy in handwriting recognition tasks.

02 Faster convergence and better generalization observed.

03 Enhanced adaptability between writers for online handwriting recognition.

Abstract

Cross-modal representation learning learns a shared embedding between two or more modalities to improve performance in a given task compared to using only one of the modalities. Cross-modal representation learning from different data types -- such as images and time-series data (e.g., audio or text data) -- requires a deep metric learning loss that minimizes the distance between the modality embeddings. In this paper, we propose to use the contrastive or triplet loss, which uses positive and negative identities to create sample pairs with different labels, for cross-modal representation learning between image and time-series modalities (CMR-IS). By adapting the triplet loss for cross-modal representation learning, higher accuracy in the main (time-series classification) task can be achieved by exploiting additional information of the auxiliary (image classification) task. We present a…
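
The triplet loss mentioned in the abstract can be illustrated with a minimal sketch. This is the standard margin-based triplet formulation, not necessarily the exact loss used in the paper. The function names, the Euclidean distance, the fixed `margin` value, and the choice of the time-series embedding as the anchor are all assumptions made for illustration; the paper's dynamic margin is not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cross_modal_triplet_loss(anchor_ts, positive_img, negative_img, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    anchor_ts    -- embedding of a time-series sample (assumed anchor modality)
    positive_img -- image embedding with the same label as the anchor
    negative_img -- image embedding with a different label
    margin       -- fixed margin (hypothetical value; the paper uses a dynamic one)
    """
    d_pos = euclidean(anchor_ts, positive_img)
    d_neg = euclidean(anchor_ts, negative_img)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings, chosen only to make the arithmetic easy to follow.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]       # distance 1 from the anchor
far_negative = [3.0, 4.0]   # distance 5 from the anchor
near_negative = [0.0, 0.5]  # distance 0.5 from the anchor

loss_easy = cross_modal_triplet_loss(anchor, positive, far_negative)
loss_hard = cross_modal_triplet_loss(anchor, positive, near_negative)
```

In this toy setup the far negative already satisfies the margin, so its loss is zero, while the near negative sits closer to the anchor than the positive and is penalized. Minimizing the loss over many such triplets pulls same-label embeddings from the two modalities together and pushes different-label embeddings apart.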

Taxonomy

Topics: Handwritten Text Recognition Techniques · Music and Audio Processing · Hand Gesture Recognition Systems

Methods: Triplet Loss