Towards Multimodal Content Representation
Harry Bunt, Laurent Romary (INRIA Lorraine - LORIA)

TL;DR
This paper discusses the development of a generic framework for multimodal content representation, intended to improve how multimodal interfaces understand input, generate presentations, and manage interaction, based on insights from a dedicated workshop.
Contribution
It proposes a foundational approach to multimodal content representation, integrating insights from a workshop to guide future research and development.
Findings
Identifies objectives and constraints for multimodal content representation
Highlights the importance of a generic, flexible framework
Synthesizes workshop insights on multimodal meaning representation
Abstract
Multimodal interfaces, combining the use of speech, graphics, gestures, and facial expressions in input and output, promise to provide new possibilities to deal with information in more effective and efficient ways, supporting for instance:
- the understanding of possibly imprecise, partial or ambiguous multimodal input;
- the generation of coordinated, cohesive, and coherent multimodal presentations;
- the management of multimodal interaction (e.g., task completion, adapting the interface, error prevention) by representing and exploiting models of the user, the domain, the task, the interactive context, and the media (e.g., text, audio, video).
The present document is intended to support the discussion on multimodal content representation, its possible objectives and basic constraints, and how the definition of a generic representation framework for multimodal content representation may…
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Language, Metaphor, and Cognition
