The unity of sense and mind: A review of cross-domain mapping

Qiawen Liu; Gary Lupyan

PMC · DOI:10.3758/s13423-025-02805-3·January 5, 2026

Open Access

TL;DR

This paper explores how people connect different senses and ideas, suggesting they use shared mental processes to understand similarities across diverse experiences.

Contribution

The paper introduces a unified framework showing that cross-sensory and cross-conceptual mappings share common mechanisms.

Findings

01. Cross-sensory and cross-conceptual mappings are interconnected through shared mechanisms.

02. Statistical learning, magnitude matching, valence matching, and semantic mediation underlie these mappings.

03. The framework offers new insights into human representation of similarity and connection discovery.

Abstract

If the sound of a trombone had a taste, would it be bitter? In what way is solving a puzzle like navigating a relationship? People consistently map information across sensory modalities and conceptual domains. Such cross-sensory and cross-conceptual mappings have tended to be studied separately. We argue here that these mappings share underlying mechanisms and are more interconnected than previously thought. We present evidence that these mappings arise from a combination of statistical learning, magnitude matching, valence matching, and semantic mediation, involving an interplay between perception and conception. By bringing cross-sensory and cross-conceptual mappings into a common framework, we offer new insights into how people represent similarity and highlight promising avenues for understanding how humans discover and create connections across seemingly disparate domains.

Funding

NSF PAC

Keywords

Cross-domain mapping · Metaphor · Cross-modal correspondence · Similarity

Taxonomy

Topics: Child and Animal Learning Development · Face Recognition and Perception · Action Observation and Synchronization

Full text

Introduction

People can easily compare things that belong to the same sensory modality or conceptual domain, such as telling whether two lights have the same brightness, whether two sounds have the same pitch, whether one event lasted longer than another, or whether the concept of doctor is more similar to that of nurse or plumber. Such comparisons are thought to rely on reporting the representational overlap on the features/dimensions of interest (Nosofsky, 2011; Tversky, 1977). For example, tasked with comparing the brightness of two lights, we can attend to the dimension of interest (brightness) and report on its relative difference (Fotios & Cheal, 2010). What is interesting, however, is that people can also readily perform such comparisons between domains. We can consistently perform cross-domain mappings between different sensory modalities, such as matching visual brightness to auditory pitch (Marks, 1974, 1975, 1987; Martino & Marks, 1999; Melara, 1989) or matching auditory pitch to different tastes (Crisinel & Spence, 2010b). We can also consistently map across different conceptual domains, such as comparing space to time (Boroditsky, 2000; R. Núñez & Cooperrider, 2013; Santiago et al., 2007), space to number (Bueti & Walsh, 2009; de Hevia et al., 2014; de Hevia & Spelke, 2010; Shaki & Fischer, 2008), and musical instruments to human occupations (e.g., the concept of doctor is more similar to a piano than to a drum; Liu & Lupyan, 2023).

Cross-sensory and cross-conceptual mappings have largely developed as distinct research traditions with different theoretical frameworks. Cross-sensory mappings are predominantly studied in the context of multisensory integration, a process of combining noisy inputs from different modalities to form a more robust multimodal percept (Spence, 2011). To a lesser extent, they have also been studied in the context of synesthesia, a neurological condition in which stimulation of one sensory/cognitive pathway leads to automatic, involuntary experiences in a different sensory/cognitive pathway (Ramachandran & Hubbard, 2001), and iconicity, the resemblance between the form of a sign and its meaning (Dingemanse et al., 2015). Cross-conceptual mappings, on the other hand, have mostly been studied in the context of metaphors (Lakoff & Johnson, 1980a, 1999) and analogical reasoning (Gentner, 1983; Gentner & Markman, 1997).

Although cross-sensory and cross-conceptual mappings have often been investigated within their respective traditions, connections between the two types of mappings have been noted in various fields. Cognitive linguists studying metaphor have long emphasized the sensory basis of metaphors (Lakoff, 2009; Lakoff & Johnson, 1980b), and cross-sensory mappings in language (e.g., linguistic synesthesia) can function as a type of cross-conceptual mapping (Lievers, 2017; Zhao et al., 2022). In the neuroscientific literature, it has been conjectured that the same cortical regions (e.g., the angular gyrus) are involved in both metaphorical thinking and sensory synesthesia (Ramachandran & Hubbard, 2001). The purpose of this review is to examine the evidence for such correspondences more systematically, which may provide insights into more general underlying mechanisms (Harnad, 2005; Shepard, 1987, 1994).

Our goal is to build on these insights by conducting a systematic comparative analysis of both types of mappings. We examine evidence from a range of empirical studies supporting the interplay between perceptual and conceptual processes in cross-domain mappings. This exploration leads to the conclusion that both cross-sensory and cross-conceptual mappings involve a synergy between relatively conceptual and perceptual levels of processing, enabling people to flexibly draw on varied types of information when making cross-domain mappings.

We begin with a brief overview of cross-sensory and cross-conceptual mappings. We then explore common mechanisms responsible for both types of mappings and discuss whether these mechanisms are best considered at a more conceptual or perceptual level. Finally, we identify outstanding questions and suggest the next steps.

Terminology and scope

We define cross-domain mappings as correspondences between sensory modalities or conceptual domains. Within this broader category, we use the term cross-sensory mapping to refer to correspondences between one sensory modality (i.e., sight, touch, smell, hearing, taste) and another, for example, matching auditory pitch to visual brightness (Spence, 2011; Spence & Parise, 2012). Researchers have studied cross-sensory mappings between a variety of sensory modalities: vision-audition (see Spence, 2011, for a review), taste/flavor-audition (see Guedes et al., 2023; Knöferle & Spence, 2012, for reviews), touch-audition (e.g., Yau et al., 2009), olfaction-audition (e.g., Speed, Atkinson, et al., 2021a, 2021b), vision-taste/flavor (e.g., Motoki & Velasco, 2021), vision-touch (e.g., Martino & Marks, 2000), vision-olfaction (e.g., Demattè et al., 2009), and taste-touch (e.g., Pramudya et al., 2020). It is likely that cross-sensory mappings exist between all conceivable domain pairs, and this body of research continues to expand. Table 1 lists 72 empirical investigations of cross-sensory mappings, showing for each study the specific domains involved and the observed associations.

Note on Tables 1 and 2: Although not exhaustive, the selection of studies in Table 1 and Table 2 aims to cover a wide range of cross-domain mappings and a variety of empirical tasks and populations over the past five decades, and highlights those that provide evidence for perceptual and/or conceptual-level explanations of cross-domain mappings.

Table 1. Selected cross-sensory mappings alongside the evidence favoring perceptual and/or conceptual-level explanations.
The numbers in the parentheses that follow the claim correspond to the superscript numerals in the ‘Studies’ columnDomainsMappingDescriptionStudiesEvidence favoring perceptual explanationEvidence favoring conceptual explanationHearing & SightPitch–sizeHigher pitch to smaller, lower pitch to bigger(Antović et al., 2020; Bonetti & Costa, 2018; Cuturi et al., 2021^1^; Eitan et al., 2014; Evans & Treisman, 2010; Fernández-Prieto et al., 2015^2^; Gallace & Spence, 2006; Mondloch & Maurer, 2004; Parise & Spence, 2009; Tonelli et al., 2017)Visually impaired children show less pitch-size correspondence, with residual vision positively predict the strength of association (1) Observed in 6-month-old infants (2)Pitch–heightHigher pitch to upper, lower pitch to lower(Antović et al., 2020; Chiou & Rich, 2012; Dolscheid et al., 2013, 2014^1^, 2020^2^, 2023^3^; Evans & Treisman, 2010; Fernandez-Prieto et al., 2017^4^; Holler et al., 2022^5^; Korzeniowska et al., 2019^6^; Melara & Marks, 1990^7^; Melara & O’Brien, 1987; Parise et al., 2014^8^; Parkinson et al., 2012^9^; Rusconi et al., 2006; P. Walker et al., 2010^10^, 2014^11^, 2018^12^)Observed in newborn (12), and 3 to 4 month-old infants (10–11) and 4-month-old infants across cultures (1); Observed in non-human animals like dogs (6); Large-scale natural auditory scene statistics reflect the same mapping (8); Present in populations that lack high/low labels of pitch (9)Observed using verbal labels (high/low) and visual bigrams (LO and HI) (7); Language-dependent pitch-height association patterns (2–5)Pitch–brightnessHigher pitch to brighter, lower pitch to darker(Anikin & Johansson, 2019; Ludwig et al., 2011^1^; Marks, 1974; Martino & Marks, 1999^2^)Observed in chimpanzees (1)Observed using verbal labels (black/white) and semantically related words (night and day) (2)Pitch–shapeHigher pitch to sharper, lower pitch to more round(Marks, 1987; L. Walker et al., 2012^1^; P. 
Walker, 2012^2^)Observed using pitch/shape-related words (2); Aligned on a set of semantic dimensions (1)Pitch–saturationHigher pitch to more saturated color(Anikin & Johansson, 2019)Loudness–brightnessLouder to brighter in infants; flexible matching between louder to darker or louder to brighter in adulthood(Johansson et al., 2024; Lewkowicz & Turkewitz, 1980^1^; Marks, 1974^2^; Smith & Sera, 1992^3^; J. C. Stevens & Marks, 1965^4^)Observed in 3 to 4-week-old infants based on intensity matching (1); The matching between loudness and brightness is predicted by the psychophysical power law of sensory magnitude (4)Adults (but not young children) can flexibly make cross-domain mapping between either loudness-brightness or loudness-dimness, depending on whether light or dark ends are treated as “more” (2–3)Loudness–heightLouder to upper, quieter to lower(Bruzzi et al., 2017; Fernandez-Prieto et al., 2017; Puigcerver et al., 2019)Loudness–saturationLouder to more saturated color(Anikin & Johansson, 2019; Whiteford et al., 2018^1^)Explained by alignment on emotional dimensions. Partialling out the variance explained by emotion mediation eliminated the effect of correspondence between modality-specific perceptual features (1);Loudness–angularityLouder to more angular, quieter to more curved(Blazhenkova & Kumar, 2018^1^)Observed using verbal labels (e.g., loud, quiet) (1)Tempo–saturationFaster music to more saturated color; slower music to less saturated color(Palmer et al., 2013^1^, 2016^2^; Whiteford et al., 2018^3^)Cross-cultural similarity between Mexican and U.S participants (1) (though this has yet to be investigated in non-western cultures)Explained by alignment on emotional dimensions, and the effects are highly specific to particular emotions (e.g., happy, sad, angry, calmness etc.) rather than to the generalized affective dimensions like valence and potency. 
(1–3)Partialling out the variance explained by emotional mediation eliminated the effect of correspondence between modality-specific perceptual features. (3);Tempo–hueFaster music to yellower/warmer color; slower music to bluer/cooler colorMode–saturationMusic in major mode to more saturated color; music in minor mode to less saturated color(Palmer et al., 2013^1^, 2016^2^)Explained by alignment on emotional mediation, and the effects are highly specific to particular emotions (e.g., happy, sad, angry, calmness etc.) rather than to the generalized affective dimensions like valence and potency. (2)Mode–brightnessMusic in major mode to brighter color; music in minor mode to darker colorTempo–brightnessFaster music to brighter color; slower music to darker colorMode–hueMusic in major mode to yellower/warmer color; music in minor mode to bluer/cooler colorTempo–hueFaster music to yellower/warmer color; slower music to bluer/cooler colorHearing & TasteTaste intensity–loudnessMore intense taste to louder sound(Marks, 1988^1^; Wang et al., 2016^2^)Taste concentration – loudness is matched on intensity (1–2)Taste quality–timbreFor example, bitter to brass instruments, sweet to piano(Crisinel & Spence, 2010b, 2011^1^)Taste-timbre are matched on valence (1)Taste quality–pitchSweet, sour, and peppermint tastes to high pitch, umami and bitter tastes to low pitch(Crisinel & Spence, 2009^1^, 2010a^2^, 2010b, 2011; Wang et al., 2016)Observed using names of food items (1–2)Pitch–tactile sizeHigher pitch to smaller tactile size(L. Walker et al., 2012^1^; P. Walker & Smith,1985^2^)Observed using verbal labels (up, down, etc.) (2); Explained by alignment on a set of semantic dimensions (e.g., fast/slow, sharp/blunt, etc.) 
(1)Hearing & TouchPitch–vibrotactile frequencyHigher pitch to higher frequency of tactile vibration(Ro et al., 2009; Yau et al., 2009)Pitch–tactile heightHigher spatial location to higher pitch(Occelli et al., 2009)Pitch–odorFruity/menthol odor (e.g., lemon, raspberry) to higher pitch, onion odor to low pitch(Crisinel & Spence, 2012; Speed, Croijmans, et al., 2021a, 2021b) Sound texture–tactile texture(Bulusu & Lazar, 2024)Hearing & SmellReal-world odor–soundFor example, cinnamon/orange/clove odor with Christmas carols(Seo et al., 2014)Sight & TasteBrightness–taste intensityMore intense taste (e.g., spicier) to brighter ambient environment(Xu & Labroo, 2014)Saturation–taste intensityMore intense taste to more saturated color(Saluja & Stevenson, 2018^1^; Shermer & Levitan, 2014^2^)Matching between valence of tastant and color (1); Correlation between magnitude of tastant concentration and color saturation (1–2)Hue–taste qualityRed/pink-sweet, blue-salty, yellow-sour/sweet, green/black-bitter, white - tasteless/salty(Saluja & Stevenson, 2018; Wan et al., 2014^1^; Woods & Spence, 2016^2^)Cross-cultural association between bitter and black, salty and white, sour and green, sweet and pink found across China, India, Malaysia, and the U.S. 
participants (1)Observed using taste/color words (e.g., sweet, yellow) (1–2)Hue–piquenessRed is spicier than blue(Shermer & Levitan, 2014)Angularity–taste qualitySweet/umami to round shapes, sour/salty/bitter/spicy to angular shapes(Blazhenkova & Kumar, 2018^1^; Chuquichambi et al., 2024; Motoki & Velasco, 2021^2^; Seo et al., 2010^3^; Turoman et al., 2018^4^; Velasco et al., 2016^5^; Wan et al., 2014^6^)Explained by alignment on valence and intensity (2); congruent pairs induce higher amplitudes and shorter latencies in the N1 peak of olfactory ERPs, associated with the early sensory processing of stimuli (3)Observed using verbal labels (e.g., sour, sweet) (1, 5–6)*;*Sight & SmellColor–odorFragrant: e.g., floral family to warm colors and fresh family to cool colors; DKNY perfume with saturated orange and yellow, kouros with saturated blue;Natural odors: e.g., cinnamic aldehyde to red; strawberry odor to red; maple syrup odor to brown/orange(de Valk et al., 2017^1^; Deroy et al., 2013; Gilbert et al., 1996; Goubet et al., 2018^2^; Y.-J. Kim, 2013; Demattè et al., 2006^3^; Schifferstein & Tanudjaja, 2004^4^; Speed & Majid, 2018^5^)Explained by alignment on semantic dimensions (e.g., happy/unhappy, wild/lazy, etc.) (4); Observed using verbal labels (e.g., strawberry odors, pink) (3)Odor-color associations differ depending on how odors are linguistically described (1–2,5);Angularity–odorFor example, lemon/pepper to angular shape, raspberry/vanilla to round shape(Hanson-Vaux et al., 2013^1^; Speed, Croijmans, et al., 2021^2^)Angularity and odor are matched on valence and intensity (1)Not observed in children under age 6 years (2)Sight & TouchBrightness–tactile sizeBrighter to smaller(L. Walker et al., 2012^1^; P. Walker & Walker, 2012^2^)Explained by alignment on a set of semantic dimensions (fast-slow, sharp-blunt, etc.) 
(1–2)Brightness–vibrotactile frequencyDarker to low vibrotactile frequency; brighter to high vibrotactile frequency(Martino & Marks, 2000)Saturation–vibrotactile amplitudeHigher vibration amplitude to higher saturation(T. Yuan et al., 2023^1^)Saturation and vibration matched on intensity (1)Angularity–smoothnessCurved shapes to smooth texture, angular shapes to rough texture(Blazhenkova & Kumar, 2018^1^)Observed using verbal labels (e.g., smooth, rough) (1)Touch & TasteTexture–taste qualitiesFor example, towel to sweet, linen to salty, stainless steel/rougher texture to sour, and cardboard materials to bitter(Pramudya et al., 2020^1^; Slocombe et al., 2016^2^)Explained by alignment on pleasantness (2)Explained by alignment on specific emotional concept (e.g., curious, happy, peaceful, etc.) (1)Touch & SmellTexture–odorsmoothness to lemon/menthol odor, roughness to onion odor(Speed et al., 2021^1^)Associations are more likely to be observed in older age groups (1)Smell & TasteOdor–taste qualitiesFor example, sugar to sweet odor, citric-acid to sour odor(Stevenson et al., 1999; Stevenson & Boakes, 2004)

We use the term cross-conceptual mapping to refer to correspondences between elements from different conceptual domains. A conceptual domain is a set of related concepts. Such groupings resemble the linguistic notion of “semantic fields” (Akmajian et al., 2001; Brinton, 2000), but whereas semantic fields tend to be concerned with word meanings, conceptual domains may include not only conventional superordinate categories such as animals and musical instruments, but also more schematically structured domains such as concepts related to life, relationships, and time (Lakoff & Johnson, 1999; Mandler, 1992). For example, the conceptual domain of life encompasses a set of related concepts that include birth, death, reproduction, and growth. Cross-conceptual mappings can be established between concrete domains (e.g., physical space, human body, sensory experience) and more abstract ones (e.g., time, numbers, emotions, valenced concepts, and power). For example, representations of words related to time, like “before” and “after,” or durations like “short” and “long,” appear to have a spatial component (see Bender & Beller, 2014; R. Núñez & Cooperrider, 2013, for reviews). Evidence for this spatial component comes from phenomena such as the Spatial-Temporal Association of Response Codes (STEARC) effect, in which people respond faster to past or short-term events with the left hand and to future or long-term events with the right hand (Anelli et al., 2018; Ishihara et al., 2008; Santiago et al., 2007; Vallesi et al., 2008). Similar patterns of spatialization are also seen for numerical concepts, as reflected in the Spatial-Numerical Association of Response Codes (SNARC) effect, in which people respond faster to relatively small numbers on their left-hand side and to relatively large numbers on their right-hand side (see Fischer & Shaki, 2014; Toomarian & Hubbard, 2018, for reviews).
Table 2 lists 84 empirical investigations of cross-conceptual mappings, showing for each study the specific domains involved and the observed associations.

Table 2. Selected cross-conceptual mappings with the supporting literature and the evidence favoring perceptual and/or conceptual-level explanations. The number in parentheses following each piece of evidence corresponds to the superscript in the ‘Studies’ column.

Domains | Mapping | Description | Studies | Evidence favoring perceptual explanation | Evidence favoring conceptual explanation

Power and SpacePowerfulness–verticalityPowerful is spatially upper, powerless is spatially lower(Dahl & Adachi, 2013^1^; Giessner et al., 2011^2^; Giessner & Schubert, 2007^3^; Meier & Dionne, 2009; Niedeggen et al., 2017; Schoel et al., 2014; Schubert, 2005^4^)Observed in chimpanzees (1);Observed using linguistic stimuli (e.g., master, servant, leader, etc., 2–4);Valence and SpaceValence–verticalityPositive is spatially upper, negative is spatially lower(Crawford et al., 2006; Gottwald et al., 2015; Lakens, 2012^1^; Lakens et al., 2012^2^; Lynott & Coventry, 2014^3^; B. Meier et al., 2004^4^; B. P. Meier, Hauser, et al., 2007^5^; B. P. Meier, Sellbom, et al., 2007^6^; B. P. 
Meier & Fetterman, 2022^7^)Observed using linguistic stimuli (e.g., love, hate, etc., 4–7); The correspondence between +polars are more automatic and the automaticity of this correspondence could be diminished by reversing the frequency of polar during experiment (1–3)Valence–horizontalityPositive things are associated with people’s dominant side of space and negative emotion is associated with their non-dominant side of space; more intense emotion is mapped onto more right side of space(Casasanto, 2009^1^; Casasanto & Chrysikou, 2011^2^; de la Vega et al., 2013^3^; Holmes & Lourenco, 2011; Pitt & Casasanto, 2018^4^)Left-handers associate left-hand side space with positive valence, despite the right-is-good coding in language; (1); By temporarily handicapping one’s dominant hand, participants’ valence-horizontality (association could be reversed (2)Observed using linguistic stimuli (3–4)Valence and BrightnessValence–brightnessGood is bright, bad is dark, and vice versa(B. Meier et al., 2004^1^; B. Meier, Robinson, et al., 2007^2^; Okubo & Ishikawa, 2011^3^; Sherman & Clore, 2009; Song et al., 2012^4^; Xu & Labroo, 2014^5^)Observed using linguistic stimuli (e.g., gentle, devil, etc., 1–5)Time and SpaceEgo/object moving in time–ego/object moving in spacePeople primed by ego/object moving have consistent preference for ego/time moving interpretation of an ambiguous sentence, but primed by ego/time moving does not lead to consistent preference for spatial ego/object moving(Boroditsky, 2000^1^; Boroditsky & Ramscar, 2002^2^)Observed using linguistic stimuli/measure (e.g., ‘The meeting originally scheduled for next Wednesday has been moved forward two days’ – the meeting will be on Monday/Friday?) Primed by ego/object moving have consistent preference for ego/time moving interpretation of an ambiguous sentence. 
Primed by ego/time moving doesn’t lead to consistent preference for spatial ego/object moving, suggesting an independent conceptualization of time without necessary activation of space domain (1–2)Duration–spatial displacementSpatial displacement–estimates of duration(Athanasopoulos & Bylund, 2023^1^; Boroditsky, 2008^2^; Bottini & Casasanto, 2013^3^; Bylund & Athanasopoulos, 2017^4^; Casasanto & Boroditsky, 2008^5^; Gijssels et al., 2013^6^, 2013^7^; Lourenco & Longo, 2011^8^; Merritt et al., 2010^9^; Srinivasan & Carey, 2010^10^)Observed in preverbal infants and non-human animals (symmetric mapping between space and time: incongruent spatial displacement influence estimates of duration; incongruent duration influence estimates of spatial displacement) (8–10)Asymmetrical relationship between duration and spatial displacement in human adults and children with language ability (3–7); People speaking different languages and bilingual people’s estimation of duration are affected differently by language used in context, which has different spatial metaphors for duration (1,4)temporal progression–spatial directionTime moves along spatial direction such as left-right, up-down, uphill-downhill, east-west(Boroditsky et al., 2011^1^; Fuhrman et al., 2011^2^; Fuhrman & Boroditsky, 2010^3^; R. E. 
Núñez & Sweetser, 2006^4^; Santiago & Lakens, 2015^5^)People speaking languages with different spatial metaphors of temporal progression have different conceptualization for temporal progression (1–5)Number and Spacenumerical scale–vertical/horizontal lineLarge numbers are associated with bottom/right decision and small numbers are associated with top/left decision(Bulf et al., 2016^1^; de Hevia et al., 2014^2^; Dehaene et al., 1993; Drucker & Brannon, 2014^3^; Fischer & Shaki, 2014; Ito & Hatta, 2004^4^; Rugani et al., 2015; Santiago & Lakens, 2015; Shaki & Fischer, 2008^5^; Zohar-Shai et al., 2017^6^)Observed in nonlinguistic infants (2) and nonhuman animals (1,3)Whether people show left/right or top/bottom association of number depends on the writing system they use (4–6)ordinality of numbers–letters to temporal progressionPeople think about ascending numbers/letters have consistent preference for ego moving interpretation of an ambiguous sentence(Matlock et al., 2011^1^)Observed using linguistic stimuli/measure (e.g., ‘The meeting originally scheduled for next Wednesday has been moved forward two days’ – the meeting will be on Monday/Friday) (1)Morality and Physical CleannessMorality–purityMoral contamination is mapped to physical contamination(Lee & Schwarz, 2010^1^; Schnall et al., 2008^2^; Stellar & Willer, 2014^3^; Zhong & Liljenquist, 2006)Observed using linguistic stimuli/measure(e.g., lying, moral judgment, read a story, 1–3)Affection and WarmthEmotion/social proximity–temperatureExperiencing physical warmth/coldness induce social affiliation/loneliness and vice versa(Blumberg et al., 1992^1^; Harlow, 1958^2^; IJzerman & Semin, 2009^3^; Inagaki & Eisenberger, 2013^4^; Schilder et al., 2014^5^; Williams & Bargh, 2008a; Zhong & Leonardelli, 2008)Observed in nonhuman animals (1–2); social warmth and physical warmth share neural mechanism associated with warmth and rewarding outcomes (4)Observed using linguistic stimuli/measure (e.g., describe feeling with 
language) (3–5)Emotional distance and Physical distanceEmotion/social proximity–physical proximityJudgments of the strength of their emotional attachments to important aspects of their social world are enhanced/reduced given large/small physical-distance cues(Williams & Bargh, 2008b^1^)Observed using linguistic stimuli/measure (e.g., read a story and rate how much they like it) (1)Importance and WeightSocial importance–physical weightJudgment of importance are higher when accompanied by larger physical weight(Jostmann et al., 2009^1^)Observed using linguistic stimuli/measure (e.g., read about a public issue and rate how much people think their voice matter (1))A mix of conceptual domainsPredicate metaphorsA motor verb is used to act on intangible things(Boulenger et al., 2009^1^; Desai et al., 2013^2^; Wilson & Gibbs, 2007^3^)Observed neural activation in motor cortex when reading the predicate metaphors (2)Observed using linguistic stimuli (e.g., *grasp,1–3)*Adjective metaphorsAn adjective initially used to describe one domain is used to describe another domain(Citron & Goldberg, 2014^1^; Lacey et al., 2012^2^)Observed neural activation in sensory cortex when reading the adjective metaphors (1–2)Nominal metaphorsA is B while A and B are from disparate semantic domains(Al-Azary & Katz, 2021^1^; Liu & Lupyan, 2023^2^; Tourangeau & Sternberg, 1981^3^, (Tourangeau, et al., 1982)^4^; Zhu et al., 2024^5^)When metaphors are novel, it activates the embodied feature of the source domain (e.g., after reading my lawyer is a shark, people’s response to embodied feature of shark, e.g., bite may be boosted due to embodied simulation) (1)Observed using linguistic stimuli (1–5); Explained by alignment on a set of semantic dimensions (fast-slow, sharp-blunt, etc.) 
(2–4)Body-related metaphorsThere’s non-arbitrary connection between body parts and symbolic meanings across cultures(Holmes et al., 2018^1^)e.g., ‘idea’ in Chinese can be literally translated as ‘heart cut’, and non-Chinese speakers have non arbitrary intuitions about these compound words (1)Observed using linguistic stimuli (1)Discourse-level metaphorDiscourse-level metaphorical framing impacts people’s perception of the target domain.(Flusberg et al., 2018^1^; Hendricks et al., 2018^2^; Thibodeau & Boroditsky, 2011^3^)Observed using linguistic stimuli (e.g., read a story about crime in a city, 1–3)Relational mappingsPeople map across two patterns based on relational structure rather than object attributes(Casasola, 2005^1^; Christie & Gentner, 2014^2^; Gentner et al., 2013^3^; Loewenstein & Gentner, 2005^4^; Simms & Gentner, 2019^5^)Relational language facilitates abstraction of relational information (1–5)
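
The SNARC effect described before Table 2 is a response-time pattern, so it can be summarized with a single number per participant. A minimal sketch, assuming one common convention that is not taken from this review: compute the right-minus-left response-time difference for each number and regress it on number magnitude, so that a negative slope indicates the left-small, right-large pattern. The function name `snarc_slope` and all response times below are hypothetical and invented for illustration.

```python
def snarc_slope(numbers, rt_left, rt_right):
    """Least-squares slope of (right RT - left RT) regressed on number magnitude.

    A negative slope means right-side responses become relatively faster
    as numbers get larger, which is the direction of the SNARC effect.
    """
    diffs = [r - l for l, r in zip(rt_left, rt_right)]
    n = len(numbers)
    mean_x = sum(numbers) / n
    mean_y = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in numbers)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(numbers, diffs))
    return sxy / sxx


# Invented mean response times (ms) for one hypothetical participant.
numbers = [1, 2, 8, 9]
rt_left = [500, 505, 540, 545]    # left-side responses: faster for small numbers
rt_right = [540, 535, 505, 500]   # right-side responses: faster for large numbers

slope = snarc_slope(numbers, rt_left, rt_right)
print(slope < 0)  # True for these invented data
```

The same calculation would apply to the STEARC effect if the numbers were replaced by a temporal ordering (e.g., past to future), again as an illustrative assumption rather than a procedure reported here.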

To better isolate generalizable mechanisms, we focus on studies of healthy, neurotypical individuals, omitting work on individuals with neurological impairments (e.g., focal lesions: Cardillo et al., 2018; neurodegeneration: Klooster et al., 2020; Parkinson’s disease: Monetta & Pell, 2007; Alzheimer’s disease: Papagno, 2001) and traits like synesthesia (e.g., Hubbard et al., 2005; Ramachandran & Brang, 2008; Ramachandran & Hubbard, 2001). We also omit work focusing on iconicity and sound symbolism (Delgado et al., 2020; Emmorey, 2014; McCormick et al., 2021; Ozturk et al., 2013; Strickland et al., 2015, 2017) despite the likely convergence between these phenomena and those underlying cross-sensory/cross-conceptual mappings (Perniss et al., 2010; Sidhu & Pexman, 2018; Westbury et al., 2018).

Overview of cross-sensory and cross-conceptual mappings

Experimental paradigms for studying cross-domain mappings

The most direct approach to probing cross-domain mappings is to have participants evaluate the compatibility of specific pairings. For instance, researchers might ask participants to assess whether a high pitch goes better with a bright or a dim light. This can be done explicitly with a matching task: participants are presented with a stimulus from one domain and asked to choose which of several options from another domain is best aligned with it (e.g., Wang & Spence, 2017). Another task is to present participants with a perceptual stimulus (a color, a shape, a piece of music, etc.) or a word and ask them to rate it on a series of scales anchored by pairs of opposites (e.g., good/bad, high/low, slow/fast; i.e., the so-called semantic differential technique; Osgood et al., 1957). This allows researchers to evaluate the overall similarity between items from different sensory or conceptual domains within a common space (e.g., Liu & Lupyan, 2023; L. Walker et al., 2012; Whiteford et al., 2018). Figure 1 shows some of these tasks.

Fig. 1. Schematic illustration of tasks for evaluating people’s judgment of compatibility between cross-domain items. (a) A matching task for cross-sensory (in this case, pitch-lightness) mapping. (b) A matching task for cross-conceptual (e.g., valence-space) mapping, in which participants are asked to choose whether the alien on the left-hand or right-hand side is more positive/negative. (c)-(d) Participants indicate how well they think two stimuli go with each other using a Likert scale. (e)-(f) Participants rate a range of cross-sensory (e.g., taste and shape) or cross-conceptual (e.g., animals and jobs) items on a set of dimensions. (Color figure online)
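
To make the idea of a "common space" concrete, the semantic differential technique can be sketched as follows: each item, whatever its domain, is represented by its mean ratings on the same bipolar scales, and cross-domain similarity is computed between those rating profiles. This is a minimal sketch under stated assumptions: the three scales, the ratings, and the choice of cosine similarity are invented for illustration and are not the data or the analysis of any study cited here.

```python
import math

# Hypothetical mean ratings on three bipolar scales, each from -3 to +3.
# Scale order: bad..good, low..high, slow..fast. All numbers are invented.
profiles = {
    "doctor": [2.5, 1.5, 0.0],
    "piano": [2.0, 1.0, 0.5],
    "drum": [0.5, -1.0, 2.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two rating profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(target, candidates):
    """Return the candidate whose rating profile is most similar to the target's."""
    return max(
        candidates,
        key=lambda c: cosine_similarity(profiles[target], profiles[c]),
    )


print(best_match("doctor", ["piano", "drum"]))  # piano, for these invented ratings
```

Because every item is rated on the same scales, items from any two domains can be compared this way; other similarity measures (e.g., correlation or Euclidean distance) would fit the same design.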

However, these tasks are inherently subjective and open-ended, limiting the conclusions one can draw about the cognitive and perceptual consequences of a given mapping. An alternative approach is to establish an objective ground truth and observe how manipulating stimuli from one domain impacts participants’ experiences in another domain. For example, in a study by Shermer and Levitan (2014), participants rated the spiciness of salsas that varied in color and piquancy. The results showed that spiciness ratings were influenced by color: darker salsa was rated as spicier than lighter salsa despite having the same objective level of spice. Similarly, people estimated the same ambient temperature to be lower after recalling a socially exclusionary experience compared to recalling a socially inclusive experience (Zhong & Leonardelli, 2008; Fig. 2).

Fig. 2. Manipulating stimuli from one domain impacts participants’ experiences in another domain. (a) Participants rated the darker salsa as spicier. (b) Participants estimated the same ambient temperature as lower after recalling a socially exclusionary experience compared to recalling a socially inclusive experience. (Color figure online)

Another layer of subjectivity in such tasks is response bias. For example, participants may alter their response patterns to be more in line with what they believe the experimenter expects (Firestone & Scholl, 2014; Zizzo, 2010). Consequently, it remains unclear whether any consistent mappings revealed in the above ways are the result of representational overlap across domains or of participants simply responding in the way they think they are expected to respond. To address this, researchers have used tasks that are designed to elicit more automatic, less consciously controlled responses. One such task is speeded discrimination, in which participants classify a stimulus feature from one sensory domain on a directional scale (e.g., small-large, low-high) as fast as possible while a task-irrelevant stimulus from another sensory domain, with either congruent or incongruent directional features, is also present (e.g., Marks, 1987; Schubert, 2005). If congruent stimuli lead to faster, more accurate responses (i.e., a congruence effect), this supports the automaticity of the cross-domain mapping (Fig. 3a–b). Other paradigms intended to test such automatic responses include speeded detection (e.g., Chiou & Rich, 2012; Fig. 3c) and the Implicit Association Test (IAT; e.g., Anikin & Johansson, 2019; Fig. 3d–e). Memory tasks offer another approach to measure the objectivity of cross-domain mappings, as the participants’ ability to accurately recall or reproduce information is not likely to be consciously controlled (Casasanto & Boroditsky, 2008; Crawford et al., 2006; Fig. 3f–g). For example, in a study of the mapping between time duration and spatial length, participants were shown computer-generated animations of a line growing over time and were asked to reproduce the duration. The length of the lines was irrelevant to the task of duration estimation.
However, people could not disregard incompatible spatial information when reproducing time durations, as they tended to overestimate time when the line was long and underestimate it when the line was short (Casasanto & Boroditsky, 2008; see also Bylund & Athanasopoulos, 2017, for an analogous task probing duration-size mapping, where participants were asked to reproduce duration after seeing an animation of a container being filled gradually with liquid).

Fig. 3. (a) People are faster to classify a visual stimulus on the up/down side of the screen when hearing higher/lower pitch. (b) People are faster to detect visual targets in upper (or lower) spatial locations while hearing a higher (or lower) pitch sound. (c) People are faster to classify which group is more powerful when the more powerful group is displayed on the vertical position on the screen. (d)-(e) People match cross-sensory stimuli (e.g., sound and colors) or cross-conceptual stimuli (e.g., verticality-related words and emotional/rational words) with two response keys, and only one stimulus is present at a time. If the congruent stimuli (e.g., loud sound and dark color, up-related words and rational words) match the same key, they respond faster and more accurately than when they are incongruent. (f) Participants were shown computer-generated animations of a line growing over time and were asked to reproduce the duration or reproduce the time. They tended to overestimate time when the line was long and underestimate it when the line was short. (g) Participants were shown valenced pictures at different vertical locations; afterwards, they recalled positive images as appearing in higher locations relative to negative images, reflecting a “GOOD is UP” mapping. (Color figure online)
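The congruence effect used in the speeded tasks above can be made concrete with a short sketch. The function below is only an illustration under simplifying assumptions: the trial format, the field names (`congruent`, `rt_ms`, `correct`), and the data are hypothetical, and averaging response times over correct trials only is one common analysis choice rather than the procedure of any study cited here.

```python
from statistics import mean


def congruence_effect(trials):
    """Return (rt_effect_ms, accuracy_effect) for a list of trial dicts.

    Each trial has the keys 'congruent' (bool), 'rt_ms' (float), and
    'correct' (bool). Response times are averaged over correct trials only.
    Positive values mean congruent trials were faster / more accurate.
    """
    def summarize(flag):
        subset = [t for t in trials if t["congruent"] == flag]
        mean_rt = mean(t["rt_ms"] for t in subset if t["correct"])
        accuracy = mean(1.0 if t["correct"] else 0.0 for t in subset)
        return mean_rt, accuracy

    rt_congruent, acc_congruent = summarize(True)
    rt_incongruent, acc_incongruent = summarize(False)
    return rt_incongruent - rt_congruent, acc_congruent - acc_incongruent


# Hypothetical trials, invented purely for illustration.
trials = [
    {"congruent": True, "rt_ms": 480.0, "correct": True},
    {"congruent": True, "rt_ms": 500.0, "correct": True},
    {"congruent": False, "rt_ms": 540.0, "correct": True},
    {"congruent": False, "rt_ms": 560.0, "correct": False},
]
rt_effect, acc_effect = congruence_effect(trials)
```

A positive response-time difference together with a positive accuracy difference is the pattern taken as evidence of an automatic mapping; real analyses would typically also model participants and items.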

What makes cross-domain mappings possible?

Evidence of cross-domain mappings goes beyond tasks requiring simply matching stimuli across domains, extending to tasks thought to tap into more automatic processes, with tell-tale signs of facilitation and impairments in accuracy and RTs. But why?

Spence (2011) discusses three mechanisms underlying cross-sensory mappings. Structural correspondence results from common neural coding across modalities. For example, loud sounds may map onto bright lights because both are coded through increases in neural firing rate. Statistical correspondence results from repeated co-occurrences (e.g., a mapping between small size and high pitch can arise from experiences associated with hearing objects fall: small objects ping; large objects thud). Linguistic correspondence arises from the use of the same words or phrases across domains, such as when “high” and “low” are used for both height and pitch. Importantly, these mechanisms are not mutually exclusive. For example, the cross-sensory mapping between pitch and elevation can be explained by both statistical correspondence and linguistic correspondence. More recent work has introduced other explanations, such as valence matching, where sensations are aligned based on the similarity of their emotional appeal or repulsion (Motoki et al., 2022; Saluja & Stevenson, 2018); emotion mediation, which aligns sensations based on shared emotional content (Palmer et al., 2013; Salgado Montejo et al., 2015; Schifferstein & Tanudjaja, 2004); and semantic mediation, which aligns sensations based on shared semantic meanings (Velasco et al., 2016; P. Walker, 2012, 2016).

Similarly, several mechanisms have been proposed to explain cross-conceptual mappings. Conceptual metaphor theory (Lakoff & Johnson, 1980a, 1980b) suggests that we understand abstract concepts via concrete experiences through embodied simulation. A theory of magnitude (ATOM; Walsh, 2003) suggests there is a domain-general magnitude system responsible for processing different quantities and enabling comprehension of ‘more’ or ‘less’ across various domains. The semantic mediation account (Dolscheid et al., 2013; Lakens, 2012; Liu & Lupyan, 2023) suggests that the similarities in the way things are represented in an abstract semantic space, lexicalized, or shaped by cultural-linguistic factors such as linguistic markedness, may systematically connect two conceptual domains. Again, these mechanisms are not mutually exclusive; the same phenomenon, such as mapping between space and time, could be explained by experiential correlations between space and time, the domain-general magnitude system, and the linguistic mechanism of using spatial terms to describe time.

It is already evident that the proposed mechanisms of how people make cross-sensory and cross-conceptual associations are very similar. In the following section, we organize these mechanisms into four categories: first, cross-domain mappings emerge from statistical learning from the natural environment; second, cross-domain mappings are matched by magnitude; third, cross-domain mappings are matched by valence; fourth, cross-domain mappings are mediated by semantics. As we will argue, these non-mutually exclusive mechanisms provide a robust framework for understanding how humans make reliable mappings across domains.

Common mechanisms of cross-sensory and cross-conceptual mappings

Statistical learning from the natural environment

Our experiences of the world around us are inherently multisensory. For example, when we eat, we not only taste the food but also perceive its smell, texture, temperature, and appearance. The frequency with which two stimuli occur together can strengthen the activated neural connections, leading to the formation of cross-sensory mappings, a core principle of learning through association, often summarized by the phrase “cells that fire together, wire together” (Hebb, 1949).
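The associative principle summarized above can be illustrated with a textbook Hebbian update, in which a connection strengthens in proportion to the product of the two units' activity. This is a deliberately minimal sketch and not a model proposed in the work reviewed here: the function names, the learning rate, and the exposure histories are all hypothetical, and the rule omits decay and normalization.

```python
def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Basic Hebbian rule: the weight grows by learning_rate * pre * post."""
    return weight + learning_rate * pre * post


def learn_association(events, learning_rate=0.1):
    """Accumulate a connection weight over (pre, post) activity pairs."""
    weight = 0.0
    for pre, post in events:
        weight = hebbian_update(weight, pre, post, learning_rate)
    return weight


# Hypothetical exposure histories: 1 = feature present, 0 = absent.
# Feature A could stand for a smell and feature B for a taste.
often_together = [(1, 1)] * 8 + [(1, 0)] * 2
rarely_together = [(1, 1)] * 2 + [(1, 0)] * 8

w_often = learn_association(often_together)
w_rare = learn_association(rarely_together)
```

Under these assumptions, the pair that co-occurs more often ends up with the stronger connection, which is the sense in which frequent co-occurrence could give rise to a cross-sensory mapping.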

Some universal (or near-universal) cross-conceptual mappings are believed to be grounded in such experiential correlations. One example is the mapping between time and space as reflected in both language and thought (a “long” walk generally takes more time than a “short” walk while also traversing a longer distance; Casasanto & Boroditsky, 2008). In natural phenomena such as the sun’s journey from east to west and the progression from dawn to dusk, the passage of time is likewise accompanied by visual-spatial changes in our surroundings (Boroditsky & Gaby, 2010). More broadly, statistical learning from the natural environment explains cross-domain mappings found widely in various species. Macaque monkeys, for instance, associate vocal tract resonances with visual size, matching large-sounding coos to larger faces and small-sounding coos to smaller faces (Ghazanfar et al., 2007).

The role of statistical learning in shaping cross-domain mappings might be underestimated. Studies that identify consistent cross-domain mappings often disregard statistical learning as an explanation, citing the absence of corresponding real-world regularities. For example, statistical learning was ruled out as an explanation for brightness–pitch mappings because bright objects do not inherently produce higher-pitched sounds than dark objects (Ludwig et al., 2011). However, such associations do occur: alarms, for example, pair loud, high-pitched sounds with bright flashing lights. Even if such first-order associations were entirely absent from a learner’s experience, statistical learning could be based on higher-order associations (possibly first described by William James, 1890, as the law of dissociation by varying concomitants; see also Fiser & Aslin, 2002a, 2002b; Tighe & Tighe, 1966). Brightness and high pitch can become conjoined if both occur in similar contexts (e.g., both are used as effective attention-getters even if never at the same time). Furthermore, crossmodal correspondences can be transitive. Thus, the scope of statistical learning in shaping cross-domain mappings likely includes both first-order and higher-order associations, though the exact contribution of different orders remains to be empirically investigated.

Statistical learning, however, may not always be an adequate explanation of cross-domain mappings. First, some cross-domain mappings are observed even in neonates who have extremely limited experience with the relevant first- or higher-order associations. For instance, neonates aged 0–3 days showed prolonged attention to simultaneous increases (or decreases) in spatial extent, duration, or numerical quantity, but not when these dimensions varied in opposite directions (one increased while the other decreased; de Hevia et al., 2014). Another challenge to statistical learning as an explanation is structural cross-domain mappings such as understanding the structure of an atom by drawing insight from the structures of a blueberry muffin, a solar system, or a cloud. In these cases, cross-domain mappings might require processes like analogical reasoning, which transfer relational structures from one domain to another (Gentner, 1983).

Magnitude matching

Another explanation for cross-domain mappings is magnitude matching: matching more to more, and less to less. One example is Walsh’s (2003) ATOM, which seeks to explain how people represent fundamental domains of experience such as length, area, numerosity, and temporal duration through a generalized magnitude system. ATOM is supported by observed behavioral interference between cross-domain magnitude dimensions (Bueti & Walsh, 2009; Lindemann et al., 2008). Interference and congruence effects between different magnitudes have been found to be modulated by activity in the parietal lobe, the neuronal substrate proposed for domain-general magnitude processing (Belin et al., 1998; Bueti & Walsh, 2009; Cohen Kadosh & Walsh, 2009).

The generalized magnitude system is proposed to be involved not only when matching along properties such as size, distance, and duration, but also the intensity of various stimuli (Cohen Kadosh et al., 2008; Hartmann & Mast, 2017; Lindemann et al., 2008; Pinel et al., 2004; Vierck & Kiesel, 2010). The relationship between intensity of stimulus and magnitude of sensation was explored by S. S. Stevens (1957), who proposed two types of sensory continua. Prothetic continua differ in quantity, while metathetic continua differ in quality. For instance, the emotions of happiness and ecstasy differ in intensity (prothetic), whereas happiness and sadness differ in type (metathetic). Stevens demonstrated that the relationship between stimulus intensity and sensation magnitude for prothetic dimensions can be mathematically expressed using what is generally known as Stevens’ power law. Magnitude matching therefore also explains cross-domain mappings between stimuli with similar levels of arousal or excitement. For example, loud sounds and bright colors have a similarly high level of excitation of their sensory domains (Marks, 1974, 1987; J. C. Stevens & Marks, 1965); auditory loudness maps onto the intensity of a gustatory stimulus (Smith & Sera, 1992); and color saturation is mapped onto tastant concentration (Saluja & Stevenson, 2018; Shermer & Levitan, 2014).
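Stevens’ power law states that sensation magnitude grows as a power function of stimulus intensity, ψ = k · I^a, where the exponent a depends on the sensory continuum. The sketch below shows how this form yields a cross-modal match: two stimuli are matched when their predicted sensation magnitudes are equal. The function names are hypothetical, and the exponents are illustrative placeholders rather than values reported in the studies cited here.

```python
import math


def sensation_magnitude(intensity, exponent, k=1.0):
    """Stevens' power law: psi = k * intensity ** exponent."""
    return k * intensity ** exponent


def matched_intensity(intensity_a, exponent_a, exponent_b, k_a=1.0, k_b=1.0):
    """Intensity on continuum B that feels as strong as intensity_a on A.

    Solves k_b * x ** exponent_b == k_a * intensity_a ** exponent_a for x.
    """
    psi = sensation_magnitude(intensity_a, exponent_a, k_a)
    return (psi / k_b) ** (1.0 / exponent_b)


# Illustrative exponents only (assumed for this example).
EXPONENT_A = 0.6
EXPONENT_B = 0.3

match = matched_intensity(100.0, EXPONENT_A, EXPONENT_B)
round_trip = sensation_magnitude(match, EXPONENT_B)
target = sensation_magnitude(100.0, EXPONENT_A)
```

Because the exponent of continuum B is half that of continuum A in this example, a given sensation on A requires a much larger physical change on B to match it, which is why matches are defined over sensation magnitudes rather than raw intensities.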

Magnitude matching has been proposed as one way to explain cross-domain mappings evident in the earliest days of life. It is nonverbal and operational from birth or early infancy, before the development of higher-level conceptual abilities (de Hevia et al., 2014; Lewkowicz & Turkewitz, 1980; Mondloch & Maurer, 2004; P. Walker et al., 2014) and is a shared trait between humans and animals, with non-human primates and birds also demonstrating sensitivity to analogous magnitude relations (Adachi, 2014; Drucker & Brannon, 2014; Ghazanfar & Maier, 2009; Ludwig et al., 2011; Merritt et al., 2010; Rugani et al., 2010, 2015). However, magnitude matching struggles to fully explain mappings between more complex stimuli. This includes mappings involving metathetic dimensions without clear magnitude arrangement (e.g., hue and taste). Similarly, when it comes to stimuli varying along multiple dimensions, such as matching music with colors, only a fraction of dimensions, like loudness of sound or color saturation, are magnitude-relevant. Additionally, magnitude matching alone doesn’t easily account for mappings involving abstract symbols like numbers, which lack inherent perceptual hierarchies of more or less.

Valence matching

Evaluating valence is a basic ability already present in infancy (Quinn et al., 2011; Ruba et al., 2019; Steiner et al., 2001) and observed widely across the animal kingdom (Berridge & Kringelbach, 2015), suggesting that it serves an important role in helping organisms assess potential threats. Interestingly, people automatically evaluate the valence not only of clearly valenced stimuli, such as guns or roses, but also of objects often considered neutral in everyday contexts, such as mugs or ketchup (Bradley & Lang, 1999; Lebrecht et al., 2012). Valence matching is proposed to align cross-domain stimuli based on a common evaluative or hedonic system, which is underpinned by neural activity not only in multimodal regions known for processing emotion content, such as the amygdala and orbitofrontal cortex (LeDoux, 2003), but also within the primary sensorimotor cortices (Bestelmeyer et al., 2017; Gao & Shinkareva, 2021; Miskovic & Anderson, 2018; Sievers et al., 2013, 2021).

Cross-domain mappings based on valence matching (“good” stimuli in one domain go with “good” stimuli in another domain) have been proposed to explain consistency in mapping tastes to music genres (e.g., Motoki et al., 2022), timbres (e.g., Crisinel & Spence, 2010b), colors (e.g., Saluja & Stevenson, 2018), and pitches (e.g., Wang et al., 2016), as well as associating odors with colors (e.g., Y.-J. Kim, 2008). For example, the association between sweetness and the color pink may arise from both being linked to pleasantness, whereas bitterness and blackness might both be linked with negative valence (Palmer & Schloss, 2010; Spence & Levitan, 2021).

Beyond cross-sensory mappings, valence matching also provides insights into cross-conceptual mappings, particularly when perceptions of good or bad in more intangible domains align with the valence of more concrete domains. One example is the conceptual metaphorical frame AFFECTION IS WARMTH, which is rooted in the sense of touch/temperature and gives rise to numerous metaphorical expressions, such as “a warm hug” or “a cold-hearted person.” Some research indicates that social and physical warmth may indeed activate similar neural pathways associated with reward and pleasure (Inagaki & Eisenberger, 2013).1 Processing fluency, particularly motor fluency, serves as another key mechanism in forming valence associations (Reber et al., 1998; Winkielman & Cacioppo, 2001). An example is the association between motor fluency and the valence of horizontal spatial orientation, where right-side dominance in most individuals leads to a positive association with the right side. This bias is visible not only in language, as in the term “right-hand man,” but also in non-linguistic behaviors, with right-handers associating positively valenced concepts with rightward space and negatively valenced concepts with leftward space, a pattern that is reversed in left-handers (Casasanto, 2009). In turn, valence can also activate motor fluency simulation and bias perceptual judgment. For example, presenting positive words before a task requiring people to bisect vertical lines biases their bisections toward their dominant side (Milhau et al., 2017).

Semantic mediation

Statistical learning from natural environments, magnitude matching, and valence matching often bypass the need for a deeper analysis of similarities in cross-domain mappings. However, in what way is time like money, jealousy like green, or thinking like a storm in the brain? We propose that these mappings rely on shared meanings and structures, a mechanism we term semantic mediation. In the sections that follow, we explore three ways in which semantic mediation enables or moderates cross-domain mappings.

Semantic dimensions scaffold structural mappings

When faced with stimuli that lack overt similarities—a common challenge to cross-domain mappings—people can still align domains based on their common structure. At its core, structural mapping is about alignment on common dimensions across divergent sensory or conceptual domains. Relational language has been argued to be an important element in establishing some cross-domain mappings (Christie et al., 2007; Gentner, 2010; Gentner & Christie, 2010). Relational terms (words that describe relationships between things, like “over,” “under,” “between,” etc.) can facilitate the process of finding structural similarities between different domains. For example, children benefit from instructions that use these relational terms to guide alignment, aiding them in discerning the relational correspondences between two distinct domains (Christie & Gentner, 2014; L. Yuan et al., 2017).

Beyond facilitating explicit relational alignment, semantic dimensions also facilitate implicit structural mapping by aligning elements within a shared semantic space. The semantic coding hypothesis, proposed by Martino and Marks (1999), holds that during cross-sensory mapping, experiences from different modalities are transformed from their sensory representations into more abstract, semantic codes. These codes are accessible to both our perceptual and conceptual systems, allowing for semantic alignment across domains. In line with this hypothesis, linguistic cues, much like perceptual cues, are potent drivers of cross-sensory congruence. For example, words related to lightness, like black vs. white, or semantically related words such as day vs. night, can trigger congruence effects with high- or low-pitched sounds in ways that parallel congruence effects involving pairs of nonlinguistic stimuli (Martino & Marks, 1999; the paradigm was similar to that shown in Fig. 3a, with word stimuli used instead of a dot).

Semantic differentials, first introduced by Osgood in 1957, are a valuable tool for examining how semantic dimensions mediate cross-domain mappings. This method asks participants to rate stimuli based on scales anchored by polar adjectives, like “loud-quiet” or “small-large” (as shown in Fig. 1e–f). Pioneering work by Karwoski et al. (1942) demonstrated that even basic perceptual features, such as visual brightness or auditory pitch, possess rich, domain-general conceptual connotations. Subsequent work by L. Walker et al. (2012) attributes consistent cross-sensory mappings to the cross-activation between dimensions of shared connotative meaning. Likewise, when matching sensory stimuli like music with colors, participants chose colors that matched the music based on semantic meaning, such as whether the music and colors were complex or simple, lively or dreary (Palmer et al., 2013; Whiteford et al., 2018). Cross-conceptual mappings could similarly be influenced by their alignment on shared abstract dimensions. For example, when asked “If a flute were a job, what job would it be?”, people showed a surprising degree of consensus in their responses: 20% answered ‘teacher’, significantly above the baseline probability of 7% for listing ‘teacher’ as a type of job. These kinds of mappings between concepts from disparate semantic domains like animals, jobs, and musical instruments were best accounted for when using alignment on semantic dimensions such as speed, valence, and genderedness as predictors of similarity (Liu & Lupyan, 2023).
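One way to picture alignment on shared semantic dimensions is to treat each item as a vector of ratings on the same bipolar scales and to pick the cross-domain candidate whose rating profile is most similar. The sketch below is an assumption-laden illustration: the ratings are invented, cosine similarity is our choice of similarity measure, and the procedure is not claimed to be the analysis used in the studies cited above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_match(target, candidates):
    """Name of the candidate whose profile is most similar to the target."""
    return max(candidates, key=lambda name: cosine_similarity(target, candidates[name]))


# Invented mean ratings on three bipolar scales, each from -3 to +3.
# The scale names and all values are hypothetical.
scales = ["slow..fast", "bad..good", "simple..complex"]
flute = [1.0, 2.0, 2.0]
jobs = {
    "teacher": [0.5, 2.0, 1.5],
    "soldier": [1.5, -0.5, -2.5],
    "accountant": [-1.0, 0.5, 0.0],
}

choice = best_match(flute, jobs)
similarity = cosine_similarity(flute, jobs[choice])
```

With these made-up profiles, the instrument is mapped to the job with the closest rating pattern, even though instruments and jobs share no perceptual features; the scales act as the common space.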

Lexical mediation

Languages abound with phrases like “loud colors,” “sweet sound,” and “high pitch” (Winter, 2016). This type of language is so ubiquitous that we often overlook its metaphorical roots. One possibility is that these expressions merely label pre-existing associations between different sensory domains and are not necessary for identifying the cross-domain similarity. For instance, the Kreung people of a remote tribe in northeastern Cambodia, whose language does not use spatial language for pitch, still associate pitch with elevation (Parkinson et al., 2012).

However, the view that metaphors simply reflect pre-existing associations is challenged by evidence that the strength of cross-domain mappings can be influenced by their frequency of use in a specific language (Casasanto et al., 2003; Fernandez-Prieto et al., 2017; Holler et al., 2022). For example, Dutch and English use “high” and “low” to describe pitch, while Farsi uses “thin” and “thick.” Although height-pitch and thickness-pitch correspondences are found in prelinguistic infants (Dolscheid et al., 2012; P. Walker et al., 2010), Dutch speakers incorporate irrelevant height information and ignore irrelevant thickness information when estimating pitch, whereas Farsi speakers incorporate irrelevant thickness information and ignore irrelevant height information (refer to Fig. 2a for a similar paradigm). Dutch speakers, after being trained to linguistically describe pitch as thick/thin, have demonstrated nonlinguistic thickness–pitch mappings similar to Farsi speakers (Dolscheid et al., 2013), suggesting language can play a causal role in shaping nonlinguistic mental representations of pitch. One explanation is that shared labels across domains may invite speakers to align sensory representations, drawing out similarities through structural alignment (Christie & Gentner, 2010; Gentner, 1983) in a way that gradually becomes consistent with their language.

In a similar vein, cross-conceptual mappings may benefit from the reuse of words. In the same way that we perceive and express the relationship between height and pitch or temperature and color, we may use similar metaphoric structures to understand abstract concepts. For example, we often employ spatial metaphors to describe power dynamics (“higher status”; “lower class”) and time (“looking forward to the future”; “leaving the past behind”). The habitual use of these metaphorical expressions could shape our conceptual frameworks, aligning them more closely with the linguistic patterns present in our language. For example, Swedish spatializes time in terms of length (long/short), while Spanish spatializes time in terms of amount (much/small-bit), a difference reflected in how much interference was created by spatial cues consisting of length vs. amount. Crucially, Swedish-Spanish bilinguals show varied interference effects depending on language context, indicating that the representation of time may be flexible and depend on the linguistic framework employed at the moment (Bylund & Athanasopoulos, 2017).

In addition to inviting speakers to align representations across domains in line with how they are coded in the language, lexical mediation may also influence the directionality of cross-domain mappings. For example, bidirectional mappings between time and space are observed in neonates (de Hevia et al., 2014; Lourenco & Longo, 2010) and some non-human primates (Merritt et al., 2010), suggesting an intuitive ability to associate duration with distance without the influence of language. However, in human adults, these mappings become asymmetric. For example, incongruent spatial information interferes with time estimation, but not vice versa (Casasanto & Boroditsky, 2008). A parallel asymmetry is evident in language: metaphors describe time in terms of space (e.g., long/short time) more frequently than the reverse. Because human adults have this linguistic experience whereas neonates and non-human primates do not, the asymmetry in linguistic metaphors is a candidate cause of the cognitive asymmetry. Nonetheless, the causal relationship between linguistic asymmetry and cognitive asymmetry in cross-domain mappings remains to be more rigorously investigated through empirical studies.

Linguistic markedness

Evidence for cross-domain mappings often involves correspondence between binary dimensions such as high/low, small/large, and good/bad. When individuals are explicitly asked to map between these binary options, they tend to generate parallel associations for both poles. For instance, a higher pitch tends to be paired with brighter colors, whereas a lower pitch is paired with darker colors. However, when the automaticity of associations is tested, such as with speeded discrimination tasks, nonparallel correspondences between dimensions emerge. For example, while people are faster and more accurate in recognizing high-pitched tones emitted from high spatial locations, low-pitched tones are not recognized more quickly when emitted from lower spatial locations (Bonetti & Costa, 2018). Likewise, there is a congruence effect for positive valence and higher spatial positions, but this effect is less pronounced or absent between negative valence and lower positions (Huang & Tse, 2015; Lakens, 2012; Lynott & Coventry, 2014). One proposed explanation for this nonparallel dimensional interaction is semantic mediation via linguistic markedness.

Linguistic markedness refers to an asymmetrical relationship between elements where one element is considered more basic or default (unmarked) while the other is more specialized or derivative (marked). This concept applies across linguistic domains—for example, oral (unmarked) vs. nasal vowels in phonology, singular (unmarked) vs. plural nouns in morphology, and active (unmarked) vs. passive voice in syntax. Our focus here is on semantic or conceptual markedness, which applies to scalar adjectives and evaluative concepts. In pairs like tall/short, big/small, good/bad, and high/low, the first term is typically unmarked—serving as the default end of a dimension—while the second is marked, indicating contrast, limitation, or absence.

Everyday language reflects this asymmetry: asking how tall someone is does not imply that they are tall, while asking how short someone is does imply shortness. Markedness is also used to explain why negation applies to one end but not the other (e.g., happy/unhappy, but not sad/*unsad). It is suggested that the default, unmarked pole has a processing advantage over the marked pole (Clark, 1969; Clark & Brownell, 1975; Hommel et al., 2001; Seymour, 1974). When the unmarked pole of conceptual meaning (e.g., positive valence) and the unmarked pole of perceptual features (e.g., high vertical location) of a stimulus overlap, people’s processing is boosted. There is no processing benefit otherwise, i.e., when the two marked poles overlap (e.g., negative valence and low vertical location) or when an unmarked pole is paired with a marked pole (e.g., negative valence and high vertical location). To further test how linguistic markedness could be manipulated and affect cross-domain mappings, Lakens et al. (2012) increased the salience of negative words by making them more frequent, turning the negative end of the valence dimension into the default or unmarked pole. The results showed that increasing the frequency of negative words did indeed eliminate the congruence effect for positive words and high vertical locations.2

Appeals to linguistic markedness do not, however, explain cross-domain mappings involving metathetic dimensions that typically do not have linguistically marked and unmarked poles. For example, metathetic qualities like sweet, sour, and bitter don’t fit into a marked/unmarked dichotomy, so the markedness argument does not apply. In addition, linguistic markedness is not sufficient to elicit cross-domain congruence effects on its own. For example, when experiments used spatial schemas other than high/low (e.g., front/back, big/small), there was no congruence effect arising between space and pitch, despite the overlap of unmarked ends (Dolscheid & Casasanto, 2015).

To summarize, semantics can mediate cross-domain mappings in several ways. First, semantic dimensions can serve as a scaffold, providing structure to abstract alignment. Second, using the same words across domains (e.g., “long” for both time and distance) can help establish or reinforce these mappings. Lastly, the structural overlap in polarities could modify the automaticity of cross-domain mappings.

Cross-domain mappings are both perceptual and conceptual

On hearing a melody, one might perceive its pitch, loudness, timbre, and tempo. The same melody may also evoke the sensation of a gentle breeze or the image of a serene moonlight—none of which are directly perceived. Similarly, when observing colors, attributes like saturation, hue, and brightness are immediately discernible. Yet colors also bear deeper conceptual associations; a particular shade of deep blue may conjure feelings and images associated with the expanse of the night sky. Are cross-domain mappings such as these experienced from aligning sensations like timbre and saturation, or are they conceived more in the abstract manner of associating melody and color similarly to the ambiance of a moonlit night?

The distinction between perception and conception is contentious. Some have argued that perception is “cognitively impenetrable”—and is strictly concerned with modal attributes: colors, shapes, smells, vibration. These perceptual inputs then feed into higher-level systems for further processing that is outside of perception proper (Fodor, 1983; Pylyshyn, 1999; Tye, 1995). Others (including some critics of cognitive penetrability) have argued that perceptual experience also encompasses higher-level content: seeing an object can include recognizing its category (Bayne, 2009), we can perceive causal relations (Kominsky & Scholl, 2020; Rolfs et al., 2013) as well as social and physical ones (Hafri & Firestone, 2021). We take the stance that there is a continuum from perceptual to higher-level conceptual processing with considerable interaction between lower-level and higher-level processes (Goldstone & Barsalou, 1998; Lupyan, 2015, 2017; Quinn & Eimas, 1997).

Some cross-domain mappings align more closely with the perceptual end of the continuum, and display rapid and automatic associations that are phenomenologically stubborn even when contradicted by explicit knowledge. A classic example is the McGurk effect, where an individual’s speech perception is markedly altered by seeing a speaker’s mouth make a different sound (e.g., seeing a mouth utter /ga/ while hearing /ba/ causes us to perceive /da/; McGurk & MacDonald, 1976). This effect shows how visual and auditory cues are rapidly integrated to form a coherent perceptual experience, often leading to a different interpretation than when auditory and visual cues are experienced in isolation. Similar phenomena include the ventriloquism effect, where visual cues influence the perceived location of sounds (Parise & Spence, 2008, 2009; usually explained by statistical learning), the interference of spatial information with numerical estimation (Dormal & Pesenti, 2007; usually explained by magnitude matching), and the alteration of taste experience by visual pleasantness (Ohla et al., 2012; usually explained by valence matching). These phenomena are not restricted to human adults; they have been observed in human infants and across animal species, suggesting the ability to perceptually integrate information from multiple sensory domains may have deep evolutionary roots. (For more evidence favoring different perceptual-level explanations, see Table 1 and Table 2.)

In contrast, some mappings are not directly derived from sensory input but are constructed through abstract reasoning. For example, metaphoric expressions like “Juliet is the sun” require the ability to draw meaningful parallels between two disparate domains that transcend perceptual correspondence (i.e., Juliet does not emit infrared radiation or have things orbiting around her). The partial projection between conceptual spaces is extensively discussed in Turner and Fauconnier’s theory of conceptual integration (Fauconnier & Turner, 1998, 2008). As another example, people use cross-domain mappings when trying to understand the model of an atom through an analogy between planetary systems and electrons orbiting around the nucleus, or when trying to understand an increase in crime through analogy to rampaging animals or a spreading virus (Thibodeau & Boroditsky, 2011). Unlike perceptual mechanisms, which support cross-domain mappings that are more consistent, stable, and universal, conceptual mechanisms support mappings that are more dynamic and variable, adapting to different frames in culture, language, and personal experience. As a result, people become more sensitive to cross-domain mappings encoded in their language (Dolscheid et al., 2013). While statistical learning, magnitude matching, and valence matching are primarily proposed to explain lower-level perceptual mechanisms,3 semantic mediation is primarily proposed to explain higher-level conceptual mechanisms. (For more evidence favoring conceptual-level explanations, see Table 1 and Table 2.)

Relatively lower-level mechanisms based on perceptual integration yield forms of cross-domain mappings similar to those observable in neonates or non-human animals, while higher-level mechanisms bring cross-domain mappings to their full-blown forms as observed in human adults. Through two case studies, we look into how cross-domain mappings we encounter in the real world are sculpted by both lower and higher-level mechanisms.

Cross-sensory mappings are more conceptual than you think: The case of emotion-mediated mappings

One explanation for the patterns of associations between stimuli like colors, shapes, music, and odors is emotion mediation (Arnheim, 1986; Levitan et al., 2014; Palmer et al., 2013; Whiteford et al., 2018). On this view, particular types of music and colors are associated to the extent that both evoke similar emotions. For example, happy music is aligned with happy colors (i.e., colors that evoke joy). But are such emotion-mediated mappings best thought of as being more on the perceptual or conceptual end?

Emotions have been argued to be perceivable entities by some researchers and conceptual constructs by others, representing two distinct theoretical traditions in understanding emotional processing. On the one hand, infants and animals—from rodents to primates—can differentiate emotional cues in facial/body expressions and vocalizations (e.g., Briefer, 2012; Grossmann, 2010). Cross-cultural studies have demonstrated that some basic emotions like happiness, sadness, anger, fear, disgust, and surprise are recognized with above-chance accuracy across diverse cultures (Ekman & Friesen, 1971; Elfenbein & Ambady, 2002). Such evidence has been used in support of emotions being rooted in some universal, perceivable components. On the other hand, it has also been shown that emotion recognition is partly mediated by language (Lindquist et al., 2006; Souter et al., 2021). People who speak different languages show some differences in emotion perception (Gendron et al., 2014). Such findings led researchers to argue for emotion categories being conceptual constructs rather than a “readout” of perceptual categories. To reconcile these two views, we can consider emotions as arising from varying activation levels along core affective dimensions like valence (an evaluative continuum) and arousal (a magnitude continuum; Barrett & Satpute, 2013; Lindquist et al., 2013). Meanwhile, language helps to abstract diverse physiological and behavioral patterns into more discrete emotion categories at higher conceptual levels. Discrete emotions, mediated by emotion concepts, likely involve semantic processing both when emotion categories are initially learned and when emotions are experienced and interpreted in real time (Jackson et al., 2019; Lindquist et al., 2006, 2015; Satpute & Lindquist, 2019, 2021).

Given that emotions function as both perceptual entities and conceptual constructs, we can understand cross-sensory mappings as operating at two levels. At a lower level, the correspondence between cross-domain sensory stimuli can be linked to attributes such as magnitude and valence in a relatively universal manner. At a higher level, the correspondence might be explained by alignments on semantic dimensions or linguistically encoded emotion categories. This hypothesis is supported by evidence showing that mappings between cross-sensory stimuli display both cultural universality and variability. Cross-culturally, consistent cross-sensory associations are observed when stimuli vary along core affective dimensions. For example, participants from both the U.S. and Mexico consistently matched fast-tempo, major-mode music to more saturated, lighter, and yellower colors. Colors and music strongly aligned on core dimensions like positive/negative and strong/weak (Palmer et al., 2013). When emotional content was statistically controlled for, the correlations between perceptual features (e.g., faster tempo to redder colors) of music and colors disappeared, with two latent affective factors—valence and arousal—accounting for most of the variance (Whiteford et al., 2018).
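The logic of statistically controlling for emotional content can be illustrated with a partial correlation. The sketch below is only an illustration under stated assumptions, not the analysis reported by Whiteford et al. (2018): the data are synthetic, a single latent factor (labeled arousal) stands in for the two affective factors, and the variable names tempo and redness are hypothetical. If two perceptual features are related only because both track the same affective dimension, their raw correlation should be sizable while their partial correlation, with that dimension held constant, should be close to zero.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic data (an assumption for illustration): both features depend
# on the same latent affective dimension plus independent noise.
rng = random.Random(0)
n = 2000
arousal = [rng.gauss(0, 1) for _ in range(n)]
tempo = [a + rng.gauss(0, 0.5) for a in arousal]    # hypothetical music feature
redness = [a + rng.gauss(0, 0.5) for a in arousal]  # hypothetical color feature

raw = pearson(tempo, redness)
controlled = partial_corr(tempo, redness, arousal)
print(f"raw correlation: {raw:.2f}")
print(f"partial correlation controlling for arousal: {controlled:.2f}")
```

With real ratings, one would control for both valence and arousal at once, for example by regressing each feature on both factors and correlating the residuals.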

At the same time, participants from similar cultural and linguistic backgrounds tend to be more aligned in their associations between sensory stimuli and emotions. Research on music-emotion associations demonstrates that while certain basic emotions (e.g., happy/sad) are recognized across cultures, Western European participants (Germany and Norway) showed similar recognition patterns to each other, as did Asian participants (Korea and Indonesia) among themselves, with an in-group advantage for recognizing emotions from one’s own cultural background (Argstatter, 2016). In a study that included participants from 30 countries, linguistic and geographic proximity significantly predicted similarity in color-emotion associations, with linguistic distance being a stronger predictor than geographic distance (Jonauskaite et al., 2020). This interplay between perceptual mechanisms and conceptual mechanisms predicts both a strong universal consistency and local variations specific to people who speak the same language. Which specific color-music associations vary across cultures and languages remains an open question.

Cross-conceptual mappings are more perceptual than you think: The case of metaphors

Traditionally, metaphors have been understood as mappings between abstract and categorical conceptual representations. This means that metaphors are processed not by drawing direct parallels with literal, concrete features, but rather through more abstract associations. For instance, in the metaphor “time is a thief,” the focus is on the abstract qualities of time and theft (taking something away from us), rather than on a literal interpretation involving time wearing a mask or carrying a bag of stolen goods. This abstraction-focused approach to metaphor is supported by evidence suggesting that metaphor comprehension either requires inhibiting literal sensorimotor representations of the source (McGlone & Manfredi, 2001), or bypassing them entirely through direct access to abstract meanings (Glucksberg, 2008; Keysar, 1989).

However, where do the abstract qualities of a concept come from in the first place? Embodiment theory suggests that abstract metaphors are structured by our sensorimotor experiences (Ackerman et al., 2010; Boot & Pecher, 2010; Gibbs, 2006; Wilkowski et al., 2009; see also Desai, 2021; Khatin-Zadeh et al., 2023, for reviews). In conceptual metaphor theory, a major strand of embodied cognition, mappings usually run from richer, highly structured, experientially concrete domains (space, motion, force, bodily sensation) to sparser, abstract ones (time, emotion, social relations). For example, space is three-dimensional and directionless, while time is one-dimensional and directional. This asymmetry in structural properties allows time to “borrow” spatial structure in flexible ways, reflected in the linguistic patterns we use to talk (and think) about it. In English, time is often spatialized in two contrasting frames. In ego-moving expressions (e.g., “we’re coming up on Summer”), time is stationary and we move through it. In time-moving expressions (e.g., “Summer flew by”), time itself is in motion while we remain still. Bodily actions can bias temporal interpretation in line with these spatial frames. For instance, after pulling a chair toward themselves, participants are more likely to interpret “The meeting on Wednesday has been moved forward two days” as moving to Friday (ego-moving) rather than to Monday (time-moving; Boroditsky, 2000; Boroditsky & Ramscar, 2002). Cross-linguistic evidence further shows that languages vary in how they spatialize time, conceptualizing it as flowing from left to right, right to left, front to back, top to bottom, west to east, or even along the course of rivers. Consistent with their spatial frame of time, speakers of different languages mentally represent and reason about temporal sequences differently (Fuhrman & Boroditsky, 2010; R. E. Núñez & Sweetser, 2006; Santiago et al., 2007; Torralbo et al., 2006).

In support of the embodiment theory, behavioral studies show that performing, observing, and even imagining compatible physical actions (such as grasping) facilitates the processing of conceptually related metaphors (such as “grasp the concept”; Gibbs, 2006; Gibbs et al., 2006; Horchak et al., 2014; Wilson & Gibbs, 2007). Conversely, simulating metaphors can also influence subsequent bodily judgments and behaviors (Gibbs, 2013; Perlman et al., 2014; Slepian & Ambady, 2014). For example, participants who heard about a successful relationship “moving along in a good direction” subsequently walked both longer in time and further in distance compared with those who heard about an unsuccessful relationship with the same metaphorical frame (Gibbs, 2013). At the neural level, it has been shown that processing texture and taste metaphors such as “she had a rough day” (Lacey et al., 2012) and “she looked at him sweetly” (Citron & Goldberg, 2014) activates corresponding sensorimotor cortices. While such sensorimotor activations during metaphor comprehension could partly reflect associative spread from related literal senses and offer only correlational evidence, stronger causal evidence comes from studies using interference methods: for instance, disrupting motor cortex with transcranial magnetic stimulation (TMS) selectively impairs the comprehension of action-related metaphors (e.g., “grasp an idea”; Reilly et al., 2019; Willems et al., 2011), suggesting that sensorimotor systems might play a functional role in metaphor understanding (Casasanto, 2022).

The degree of sensorimotor involvement changes as metaphors become familiar. The structure-mapping theory (Gentner, 1983) and career-of-metaphor theory (Bowdle & Gentner, 2005) provide a possible mechanism for how repeated use of metaphoric sense leads to abstraction: Novel metaphors require active analogical mapping between source and target domains, often recruiting sensorimotor simulations of the source. Over time, repeated metaphorical uses crystallize into abstracted senses that can be accessed directly, without reactivating the literal, embodied features of the source domain. Supporting this mechanism, it was shown that novel metaphors more strongly activate sensorimotor features compared with familiar ones (Al-Azary & Katz, 2021; Desai et al., 2011). For example, when comprehending nominal metaphors (e.g., my lawyer is a shark), participants respond more quickly to literal, perceptual features of the source domain (e.g., bite) when the metaphor is novel, but respond more quickly to abstract, conventionalized features (e.g., killer) when the metaphor is familiar (Al-Azary & Katz, 2021).

Based on this embodied view of metaphor, cross-conceptual mappings, like cross-sensory mappings, are grounded in perceptual experiences. Concepts often have an embodied basis that is somewhat consistent across cultures. This is evident in widespread conceptual mappings such as those between time and space, good and up, or affection and warmth, which can be explained by experiential correlations in the environment, magnitude matching, or valence matching. Building on this perceptual foundation, cultures and languages play a significant role in mediating and elaborating these mappings, leading to constrained variation in how relatively abstract concepts are understood in different cultures. While frequently used abstractions may develop more independent representations over time, their perceptual basis continues to provide a foundation for understanding, especially when learning novel abstract concepts.

Future directions

The empirical findings reviewed in this article show that two phenomena previously treated as remote, cross-sensory mapping and cross-conceptual mapping, are perhaps two manifestations of a global mechanism. Cross-domain mappings, in general, rely on a process that operates at the level of both perception and conception.

What insights can we draw from studying cross-sensory mapping and cross-conceptual mapping together?

We have discussed a number of similarities between cross-sensory mappings and cross-conceptual mappings. However, empirical research uniting these mappings remains scarce. Should the mechanisms underlying these mappings be homogeneous, we might expect a positive transfer from one type of mapping to another. For instance, if individuals can adeptly comprehend novel metaphors or show strong analogical reasoning abilities, would they also exhibit greater consistency in cross-sensory mappings compared to individuals who struggle with metaphor comprehension or solving analogies? Moreover, do people tend to rely on the same mechanisms for different types of cross-domain mappings? Could bias in making one kind of cross-domain mapping predict people’s bias in making another kind of cross-domain mapping? For example, in music-color mappings, some individuals might rely more on statistical cues (e.g., certain genres of music often co-occur with specific colors in the environment, like rock music being associated with darker attire, or classical music being associated with yellow or deep red due to warm concert hall lighting), while others might depend more on semantic associations (e.g., happy music paired with bright, cheerful colors; sad music paired with dark, muted colors; Liu et al., 2024). Similarly, in cross-conceptual mappings, such as associating jobs with animals, one might project a police officer onto a dog given that police officers often work with dogs, whereas another might project a police officer onto a lion, relying more on shared semantic attributes like authority and power. When statistical and semantic biases lead to different mappings, would a person’s bias in one type of cross-domain mapping predict their bias in another type of mapping? For example, would those who associate sad classical music with yellow (statistical bias) or blue (semantic bias) tend to map police officers to dogs or lions, respectively?

Can people’s cross-domain mappings tell us about cognitive abilities such as abstraction, creativity and inductive reasoning?

Our ability to identify similarities across seemingly disparate domains is a cognitive skill that broadly manifests in various similarity-based tasks previously used to investigate general cognitive abilities such as abstraction, creativity, and inductive reasoning. Consider, for instance, the classic “odd-one-out” paradigm for inductive reasoning. Individuals must identify the item that does not belong to a given set based on shared characteristics or conceptual relationships. An example set is horse, cow, and milk, where either milk or horse could be the odd one out, depending on whether the relevant feature is category membership (mammal) or context (milk as a product of cows; Duan & Lupyan, 2023). Ad hoc categorization tasks require participants to group ostensibly unrelated things under a novel category. For example, when asked to create a category for “items that bring comfort,” individuals need to identify abstract similarities that transcend typical categorization schemes and group items like a blanket, tea, and a book. Similarly, director-matcher tasks like Codenames (a popular word-guessing game) also rely on players’ ability to identify connections between words (e.g., Rissman et al., 2023): players give one-word clues to help their teammates identify specific words from a grid, such as using the clue “water” to link the words “ocean,” “fish,” and “tears.” These tasks all involve drawing connections between seemingly disparate domains and probing our conceptual space in flexible and creative ways.

Preliminary evidence suggests that proficiency in cross-sensory and cross-conceptual mapping may be part of a domain-general creative ability. For instance, synesthesia, which is disproportionately observed among artists and other highly creative individuals (Dailey et al., 1997; Domino, 1989; Root-Bernstein & Root-Bernstein, 1999), is associated with not only more frequent use of cross-sensory language but also more creative use of metaphors (Turner & Littlemore, 2023). This raises intriguing questions about whether creative skills in cross-domain mapping might transfer to other cognitive tasks. For example, might people who excel at cross-domain mapping also demonstrate greater creativity and flexibility in tasks like ad hoc categorization? Do people who favor certain strategies or biases in mapping (e.g., relying on statistical co-occurrences) exhibit the same biases in tasks like odd-one-out? Are individuals who align closely in their cross-domain associations (e.g., both associating Bach with blue or nurses with violins) also more effective at communicating about clues in a word-guessing game? Could there be a causal relationship whereby training people to do novel cross-domain mapping (e.g., “How is a tree like a family?”) enhances their domain-general abstract reasoning and cognitive flexibility, such as improving performance in tasks requiring ad hoc categorization, transferring solutions across contexts to solve new problems, and enhancing communication in collaborative settings?

Does language play a causal role in cross-domain mappings?

Throughout this review, we have seen tantalizing hints of linguistic influences on cross-domain mappings, but the causal role of language remains murky. Does language simply reflect pre-existing mappings or does it actively shape how we connect disparate domains of experience?

Some cross-domain mappings are more automatic than others. For example, mappings between space and time are symmetrically automatic in early life (e.g., Srinivasan & Carey, 2010; see Table 1 and Table 2 for more examples) but become more asymmetrical in adulthood, with inconsistent spatial information interfering with time estimation but not the reverse (Casasanto & Boroditsky, 2008; Winter et al., 2015). This is paralleled by an asymmetry in linguistic metaphors, which use space to describe time more often than the reverse. However, to what extent does language drive this asymmetry? Longitudinal studies could track the development of specific cross-domain mappings from infancy through childhood, examining the temporal precedence of acquiring spatial metaphors for time in relation to changes in mapping symmetry. One could also manipulate the frequency of cross-domain linguistic mappings or teach novel cross-domain expressions, then test for changes in the automaticity of nonlinguistic mappings.

In addition, semantic mediation, one of the main mechanisms for cross-domain mapping, is usually operationalized by measuring alignment on semantic dimensions that are defined by linguistic anchors. But what is the relationship between language and the semantic dimensions that align cross-domain constructs? One possibility is that semantic space is reducible to basic constructs like valence and magnitude, with language merely reflecting these universal dimensions. An alternative possibility is that language not only reflects the semantic space but also programs it. For example, the emotion literature has shown that the development of language supports the growth of emotion concept representations from a simple “positive or negative” dichotomy in childhood to a more multidimensional structure in adulthood (Nook et al., 2017). Emotion concepts attached to emotion words (such as “surprise,” “grateful,” “envy,” “nostalgia,” and “anxiety”) help us not only articulate but also perceive discrete emotions beyond a simple positive-negative scale (Gendron et al., 2012; Lindquist et al., 2014, 2015; Lindquist & Gendron, 2013). Similarly, language goes beyond simply labeling quantities as “more” or “less”: it refines our understanding of magnitude by providing the vocabulary needed to express, operate over, and memorize (Frank et al., 2008), as well as represent (Frank et al., 2012; Pitt et al., 2022), large and abstract magnitudes (e.g., social status) with finer precision and detail. However, the causal role of language in semantic alignment between domains remains an empirical question. One can examine whether people speaking different languages perform differently on cross-domain mappings that require semantic mediation. For example, if color semantics differ across languages, people might associate the same music with different colors. Such evidence would be correlational, though. A causal role could be supported by experimentally manipulating language and examining the change in cross-domain mapping, such as asking multilingual people to do the same task in different linguistic contexts, or teaching people to use novel linguistic descriptors for previously unconnected cross-domain stimuli and seeing whether they prefer different mappings when their linguistic system is modified. To further disentangle the effects of long-term linguistic experience in shaping semantics from the immediate, online influence of language on probing semantics, one can use verbal interference tasks to disrupt linguistic processing and observe the effects on the pattern of cross-domain judgments. For example, do people speaking different languages behave more or less alike when they cannot use language for semantically mediated cross-domain mappings? Are people more or less consistent when they cannot use language?

Another question arises from observations that individuals can perform cross-domain mappings even in the absence of direct sensory experience. For example, despite their inability to visually perceive color, congenitally blind individuals are able to map colors onto semantic dimensions in a manner similar to sighted individuals (Lenci et al., 2013; Saysani, 2019; Saysani et al., 2018, 2021; Shepard & Cooper, 1992). This capability is presumably due to their ability to acquire knowledge and deduce the conceptual structure of color space from language (J. S. Kim et al., 2021; Lupyan et al., 2020; Lupyan & Winter, 2018; Saysani et al., 2018, 2021; van Paridon et al., 2021). Research could further explore how language enables cross-domain mappings, especially in cases where individuals lack the appropriate sensory inputs for learning relevant domain knowledge perceptually. This exploration might involve studying populations with varying perceptual impairments to understand how they achieve conceptual mappings in the absence of typical sensory experiences. Similarly, examining individuals with language impairments could reveal insights into how deficits in linguistic capabilities affect the ability to form cross-domain mappings.

Finally, the remarkable cross-domain mapping abilities of large language models (LLMs) offer a new frontier for exploration (Motoki et al., 2024; Yehudai et al., 2024). Language models (e.g., Devlin, 2018; OpenAI, 2023; Radford et al., 2019) trained solely on text can generate sensible mappings between disparate domains. This hints at the wealth of cross-domain information encoded in language itself. By comparing LLMs trained on different language corpora to human performance across cultures, we might identify which mappings are learnable from language alone and which require direct sensory experience.

Conclusion

We sought to bring together two important areas of cognitive science that have traditionally been studied separately: cross-sensory mappings and cross-conceptual mappings. By examining these phenomena side by side, we argue that they are better understood as two manifestations of common underlying mechanisms. Specifically, we identified four key mechanisms underlying both types of cross-domain mappings: statistical learning from environmental regularities, matching based on magnitude, matching based on valence, and mediation through semantic associations.

While statistical learning, magnitude matching, and valence matching arise early in development and are shared with other species, semantic mediation involves higher-order processes such as abstract reasoning and symbolic interpretation. Crucially, we argue that what may appear as purely perceptual associations involve conceptual mediation, while seemingly abstract conceptual mappings are grounded in perceptual experiences. The interplay between perceptual and conceptual mechanisms is exemplified in phenomena like emotion-mediated mappings, where basic dimensions like valence and arousal interact with linguistically mediated emotion concepts to produce rich and nuanced associations between disparate sensory domains. Similarly, conceptual metaphors exemplify how abstract concepts are often scaffolded upon and shaped by embodied, perceptual experiences.

Looking ahead, we propose integrating research on cross-sensory and cross-conceptual mappings to empirically examine shared cognitive mechanisms and explore how cross-domain mappings may inform our understanding of broader cognitive skills such as abstraction and creativity. Crucially, we advocate for rigorous investigation into the mechanisms through which language might causally influence cross-domain mappings, which will contribute to the broader discussion of how linguistic experience not only reflects mental representations but potentially plays an active role in shaping and programming them.

Bibliography

1. Akmajian, A., Demers, R. A., Farmer, A. K., & Harnish, R. M. (2001). Linguistics: An introduction to language and communication. MIT Press.
2. Boroditsky, L. (2008). Do English and Mandarin speakers think differently about time? Proceedings of the Annual Meeting of the Cognitive Science Society, 30(30).
3. Bottini, R., & Casasanto, D. (2013). Space and time in the child’s mind: Metaphoric or ATOMic? Frontiers in Psychology, 4. https://www.frontiersin.org/article/10.3389/fpsyg.2013.00803
4. Bradley, M., & Lang, P. (1999). Affective Norms for English Words (ANEW): Instruction manual and affective ratings. https://www.semanticscholar.org/paper/Affective-Norms-for-English-Words-(ANEW)%3A-Manual-Bradley-Lang/c765eb0a31849361d829b24e173a37bab0919892
5. Brinton, L. J. (2000). The structure of modern English: A linguistic introduction. John Benjamins. https://books.google.com/books?hl=en&lr=&id=7Zyz0A6bXWEC&oi=fnd&pg=PR13&dq=The+Structure+of+Modern+English&ots=sATmGQfZ4h&sig=DKFp0_sxNMuUKxinwRB-F9aX_VY
6. Casasanto, D. (2022). Embodied semantics. In F. T. Li (Ed.), Handbook of cognitive semantics. Brill. http://www.casasanto.com/papers/Casasanto%20Embodied%C2%A0Semantics%C2%A02022.pdf
7. Casasanto, D., Phillips, W., & Boroditsky, L. (2003). Do we think about music in terms of space? Metaphoric representation of musical pitch. Proceedings of the Annual Meeting of the Cognitive Science Society, 25(25).
8. Christie, S., Gentner, D., Vosniadou, S., & Kayser, D. (2007). Relational similarity in identity relation: The role of language. Proceedings of the Second European Cognitive Science Conference, 601–666.