# Crossmodal counterpoint: from music to multimedia – incongruency, cognitive dissonance, irony, and surrealism

**Authors:** Charles Spence, Nicola Di Stefano

PMC · DOI: 10.3389/fpsyg.2026.1728329 · Frontiers in Psychology · 2026-02-09

## TL;DR

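This narrative review examines how deliberately mismatched audio and visual elements in art (for example, uplifting music set against violent onscreen action) can create irony and surrealism. It contrasts this artistic practice with laboratory studies of multisensory perception, where participants are rarely told why incongruent stimuli are being combined.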

## Contribution

The paper takes a critical look at crossmodal counterpoint (which the abstract also describes as cognitive dissonance) as a deliberate artistic and rhetorical device, and reviews the research and theorizing on its use.

## Key findings

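- Deliberately incongruent audiovisual combinations in art invite the viewer to reflect on why the combination was chosen, and can introduce a note of irony; there is little sense that the sensory inputs or their emotional effects are averaged.
- This artistic use differs from typical laboratory research on multisensory integration, where participants receive little if any context for seemingly random stimulus combinations.
- The paper reviews historical and theoretical perspectives on the use of crossmodal counterpoint in multimedia.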

## Abstract

Laboratory-based research on multisensory perception often presents participants with unpredictable combinations of auditory and visual stimuli that may be classed (by the experimenter) as either congruent or incongruent. Cognitive neuroscientists generally assume that congruent combinations of experimental stimuli will be processed more fluently, and will lead to greater crossmodal binding and multisensory integration, than will incongruent combinations of stimuli. Typically, however, the participants involved in such laboratory research are given little, if any, information (or context) to explain why these seemingly random combinations of sensory stimuli are being presented. This situation differs markedly from the deliberate combination of eye and ear in an artistic context (say, when music is added to film). In the latter case, conflict is sometimes introduced deliberately into a scene. A film director might, for example, choose to combine violent onscreen action with uplifting, happy music. The presentation of such audiovisual emotional incongruency invites the viewer to reflect on what they are experiencing, and why this particular combination of stimuli has been chosen. Such crossmodal counterpoint (or cognitive dissonance) is sometimes used as a rhetorical device to introduce a note of irony. It is interesting to note how, in such cases, there is little sense of averaging the sensory inputs (or their emotional effects), as is so often seen when congruent inputs are presented in cognitive psychology studies. In this narrative historical review, we take a critical look at the concept of crossmodal counterpoint, and review the research and theorizing on its use.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12926423/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12926423/full.md

## References

144 references — full list in the complete paper: https://tomesphere.com/paper/PMC12926423/full.md

---
Source: https://tomesphere.com/paper/PMC12926423