CGCMA: Conditionally-Gated Cross-Modal Attention for Event-Conditioned Asynchronous Fusion
Yunxiang Guo

TL;DR
This paper introduces CGCMA, a model for asynchronous multimodal fusion that explicitly reasons about the freshness and trustworthiness of external context. It is evaluated on cryptocurrency market data, where it improves trading performance.
Contribution
The paper proposes CGCMA, an attention mechanism that separates text-conditioned grounding from lag-aware trust control, and introduces CMI, a new asynchronous multimodal dataset for evaluation.
Findings
CGCMA achieves a higher Sharpe ratio than every baseline it is compared against on the crypto news dataset.
The model's gains are not solely due to web scalar features or simple freshness heuristics.
Results support the effectiveness of asynchronous multimodal reasoning in noisy, real-world settings.
Abstract
We study asynchronous alignment, a first-class multimodal learning setting in which a dense primary stream must be fused with sporadic external context whose value depends on when it arrives. Unlike standard multimodal benchmarks that assume structural synchrony, this setting requires models to reason explicitly about freshness and trust. We focus on the event-conditioned case in which continuous market states are paired with delayed web intelligence, and we use high-frequency cryptocurrency markets only as a timestamped, high-noise stress test for this broader problem. We propose CGCMA (Conditionally-Gated Cross-Modal Attention), whose central design principle is to separate text-conditioned grounding from lag-aware trust control. Text first attends over price sequences to identify event-relevant market states, after which a conditional gate uses modality agreement, web features, and…
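The two-stage idea in the abstract (text attends over the price sequence, then a conditional gate decides how much of that grounded signal to trust) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. The abstract is truncated before it finishes listing the gate inputs, so the inputs used here (a cosine-similarity agreement score, the web features, and a log-scaled lag, chosen because the abstract calls the trust control lag-aware) are assumptions. The function names, parameter names, shapes, and the residual fusion rule are likewise hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d, w, seed=0):
    # Hypothetical parameter set: three projections plus a linear gate.
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(d)
    return {
        "W_q": rng.normal(size=(d, d)) * scale,
        "W_k": rng.normal(size=(d, d)) * scale,
        "W_v": rng.normal(size=(d, d)) * scale,
        "w_g": rng.normal(size=(w + 2,)) * 0.1,  # agreement + web feats + lag
        "b_g": 0.0,
    }


def gated_cross_modal_attention(text_emb, price_seq, web_feats, lag, params):
    """text_emb: (d,), price_seq: (T, d), web_feats: (w,), lag: seconds >= 0."""
    d = text_emb.shape[0]

    # Stage 1 (grounding): the text embedding is the query, price states are
    # keys and values, so attention picks out event-relevant market states.
    q = params["W_q"] @ text_emb
    k = price_seq @ params["W_k"].T
    v = price_seq @ params["W_v"].T
    attn = softmax(k @ q / np.sqrt(d))  # (T,), sums to 1
    grounded = attn @ v                 # (d,)

    # Stage 2 (trust control): a scalar gate in (0, 1).
    # ASSUMPTION: agreement is cosine similarity; lag enters as log1p(lag).
    denom = np.linalg.norm(grounded) * np.linalg.norm(text_emb) + 1e-8
    agreement = grounded @ text_emb / denom
    gate_in = np.concatenate([[agreement], web_feats, [np.log1p(lag)]])
    gate = sigmoid(params["w_g"] @ gate_in + params["b_g"])

    # ASSUMPTION: residual fusion onto the most recent price state.
    fused = price_seq[-1] + gate * grounded
    return fused, gate, attn


rng = np.random.default_rng(1)
params = init_params(d=8, w=3)
fused, gate, attn = gated_cross_modal_attention(
    rng.normal(size=8), rng.normal(size=(16, 8)), rng.normal(size=3), 30.0, params
)
print(fused.shape, attn.shape)
```

Keeping the gate separate from the attention weights means a stale or low-agreement news item can be suppressed without changing which price states it was grounded to.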
