JEMA: A Joint Embedding Framework for Scalable Co-Learning with Multimodal Alignment

Joao Sousa; Roya Darabi; Armando Sousa; Frank Brueckner; Luís Paulo Reis; and Ana Reis

arXiv:2410.23988 · cs.CV · November 1, 2024 · 3 cites

Open Access

TL;DR

JEMA is a scalable joint embedding framework that leverages multimodal data and contrastive learning to improve process monitoring and downstream tasks in laser metal deposition, with enhanced interpretability and minimal fine-tuning.

Contribution

The paper introduces JEMA, a novel multimodal co-learning framework that improves LMD process monitoring and downstream task performance using transferable embeddings and contrastive loss.

Findings

1. 8% performance increase in multimodal settings
2. 1% improvement in unimodal settings
3. Effective generalization to downstream tasks like melt pool prediction

Abstract

This work introduces JEMA (Joint Embedding with Multimodal Alignment), a novel co-learning framework tailored for laser metal deposition (LMD), a pivotal process in metal additive manufacturing. As Industry 5.0 gains traction in industrial applications, efficient process monitoring becomes increasingly crucial. However, limited data and the opaque nature of AI present challenges for its application in an industrial setting. JEMA addresses these challenges by leveraging multimodal data, including multi-view images and metadata such as process parameters, to learn transferable semantic representations. By applying a supervised contrastive loss function, JEMA enables robust learning and subsequent process monitoring using only the primary modality, simplifying hardware requirements and computational overhead. We investigate the effectiveness of JEMA in LMD process monitoring, focusing…
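The abstract names a supervised contrastive loss without giving its form on this page. As a rough illustration only, the sketch below implements a generic supervised contrastive loss in plain Python: embeddings that share a label are pulled together and all others are pushed apart. It is not JEMA's exact objective; the function names, the `temperature` value, and the toy embeddings and labels are assumptions made for this example.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss, averaged over anchors.

    For each anchor i, every other sample with the same label is a
    positive; the loss is the mean negative log-probability of the
    positives under a softmax over all samples except the anchor.
    Anchors without any positive are skipped.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Hypothetical toy data: two tight clusters in 2-D.
toy_embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
loss_matching = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

When the labels agree with the geometric clusters (`loss_matching`), the loss is close to zero; when the labels cut across the clusters (`loss_mismatched`), it is much larger. In a multimodal setting such as the one the abstract describes, the embeddings would presumably come from different modality encoders, which this sketch does not model.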

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques · Innovative Teaching and Learning Methods

Methods: Label Smoothing · Position-Wise Feed-Forward Layer · Adam · Softmax · Linear Layer · Byte Pair Encoding · Dropout · Absolute Position Encodings · Transformer · Dense Connections