POV Learning: Individual Alignment of Multimodal Models using Human Perception

Simon Werner; Katharina Christ; Laura Bernardy; Marion G. Müller; Achim Rettinger

arXiv:2405.04443 · cs.AI · June 23, 2025 · 1 citation

Open Access

TL;DR

This paper introduces POV Learning, a method that uses individual perception data to improve the alignment of multimodal models with personal human expectations, enhancing subjective predictive accuracy.

Contribution

It proposes a novel approach to individual alignment by integrating perception signals into machine learning models, demonstrated through a new dataset and a perception-guided transformer.

Findings

01 Exploiting perception signals improves individual predictive performance.

02 The perception-guided model outperforms the baseline in subjective assessments.

03 Personalized alignment can steer AI towards individual expectations.

Abstract

Aligning machine learning systems with human expectations is mostly attempted by training with manually vetted human behavioral samples, typically explicit feedback. This is done at the population level, since the data does not retain the context that captures the subjective Point-Of-View (POV) of a concrete person in a specific situation. However, we argue that alignment on an individual level can considerably boost the subjective predictive performance for the individual user interacting with the system. Since perception differs for each person, the same situation is observed differently. Consequently, the basis for decision making, the subsequent reasoning processes, and the observable reactions all differ. We hypothesize that individual perception patterns can be used to improve alignment on an individual level. We test this by integrating perception information into…

Taxonomy

Topics: Speech and dialogue systems

Methods: Attention Is All You Need · Sparse Evolutionary Training · Dropout · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Linear Layer · Byte Pair Encoding