ECHO: Event-Centric Hypergraph Operations via Multi-Agent Collaboration for Multimedia Event Extraction

Hailong Chu; Hongbing Li; Yunlong Chu; Shutai Huang; Xingyue Zhang; Tinghe Yan; Jinsong Zhang; Shuo Zhang; Lei Li

arXiv:2603.06683 · cs.CV · April 7, 2026

TL;DR

ECHO is a multi-agent framework that explicitly models multimedia events as hypergraphs, enabling iterative refinement and improved accuracy in multimedia event extraction tasks.

Contribution

ECHO introduces a hypergraph-based formulation that makes intermediate event structures explicit and revisable. It pairs this with a Link-then-Bind strategy that decouples event-argument linking from role binding.
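The idea of decoupling linking from binding can be illustrated with a minimal two-stage sketch. Stage 1 (Link) decides only *which* candidate arguments, whether text spans or image regions, participate in an event. Stage 2 (Bind) assigns schema roles to the linked arguments only. Everything here is a hypothetical illustration: `Candidate`, `SCHEMA`, `link`, `bind`, the threshold, and the toy score tables are not from the paper, and in ECHO these decisions are presumably made by agents rather than fixed score functions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate argument: a text span or an image region."""
    mention: str
    modality: str  # "text" or "image"

# Hypothetical event schema: roles allowed for each event type.
SCHEMA = {"Conflict.Attack": ["Attacker", "Target", "Instrument", "Place"]}

def link(trigger, candidates, link_score, threshold=0.5):
    """Stage 1 (Link): keep candidates that belong to the event, ignoring roles."""
    return [c for c in candidates if link_score(trigger, c) >= threshold]

def bind(event_type, linked, role_score):
    """Stage 2 (Bind): give each linked argument its best-scoring schema role."""
    roles = SCHEMA[event_type]
    return {c: max(roles, key=lambda r: role_score(c, r)) for c in linked}

# Toy scores standing in for model or agent judgments (assumed values).
candidates = [
    Candidate("soldiers", "text"),
    Candidate("tank", "image"),
    Candidate("weather", "text"),
]
link_scores = {"soldiers": 0.9, "tank": 0.8, "weather": 0.1}
role_table = {("soldiers", "Attacker"): 0.9, ("tank", "Instrument"): 0.7}

linked = link("attacked", candidates, lambda t, c: link_scores[c.mention])
bound = bind("Conflict.Attack", linked, lambda c, r: role_table.get((c.mention, r), 0.0))

for cand, role in bound.items():
    print(cand.mention, cand.modality, role)
```

Separating the stages means a wrong role choice in `bind` cannot silently drop an argument that `link` already attached. Each stage can also be inspected or revised on its own.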

Findings

01. ECHO outperforms previous methods by 7.3 F1 points on event mention extraction.

02. ECHO improves argument role prediction by 15.5 F1 points.

03. The explicit hypergraph and iterative refinement improve interpretability and robustness.

Abstract

Multimedia event extraction (M2E2) aims to predict triggers, ground arguments across text and images, and then assemble them into schema-consistent event records. Recent LLM-based approaches have shown strong potential for M2E2, but their intermediate event hypotheses often remain implicit, and event-argument linking is still tightly coupled with role binding. This leaves little opportunity to inspect or revise intermediate event hypotheses and makes predictions brittle to early errors. To bridge this gap, we present ECHO, a multi-agent framework that reframes M2E2 as iterative refinement over an explicit Multimedia Event Hypergraph (MEHG). Instead of relying on implicit linear generation, ECHO performs auditable atomic updates over a shared hypergraph, making intermediate event structures explicit and revisable. Furthermore, we introduce a Link-then-Bind strategy that decouples…
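To make "auditable atomic updates over a shared hypergraph" concrete, here is a minimal sketch under stated assumptions. Nodes are trigger or argument mentions from either modality. Each event is a hyperedge joining its trigger to any number of argument nodes with roles. Every change goes through one small operation that appends to an audit log. The class `EventHypergraph` and its methods are hypothetical and do not reproduce the paper's actual MEHG definition or operation set.

```python
from dataclasses import dataclass, field

@dataclass
class EventHypergraph:
    """Toy event hypergraph (hypothetical; not ECHO's actual MEHG)."""
    nodes: dict = field(default_factory=dict)   # node_id -> (modality, mention)
    events: dict = field(default_factory=dict)  # event_id -> {"type", "trigger", "args": {node_id: role}}
    log: list = field(default_factory=list)     # append-only trail of atomic operations

    def _record(self, op, **details):
        self.log.append((op, details))

    def add_node(self, node_id, modality, mention):
        if node_id in self.nodes:
            raise ValueError(f"duplicate node {node_id}")
        self.nodes[node_id] = (modality, mention)
        self._record("add_node", node_id=node_id)

    def add_event(self, event_id, event_type, trigger):
        if trigger not in self.nodes:
            raise KeyError(trigger)
        self.events[event_id] = {"type": event_type, "trigger": trigger, "args": {}}
        self._record("add_event", event_id=event_id, event_type=event_type)

    def set_argument(self, event_id, node_id, role):
        """Attach an argument or revise its role; the previous role is logged."""
        if node_id not in self.nodes:
            raise KeyError(node_id)
        args = self.events[event_id]["args"]
        previous = args.get(node_id)
        args[node_id] = role
        self._record("set_argument", event_id=event_id, node_id=node_id,
                     role=role, previous=previous)

    def remove_argument(self, event_id, node_id):
        role = self.events[event_id]["args"].pop(node_id)
        self._record("remove_argument", event_id=event_id, node_id=node_id, previous=role)

    def hyperedge(self, event_id):
        """Nodes joined by the event's hyperedge: the trigger plus all arguments."""
        ev = self.events[event_id]
        return {ev["trigger"], *ev["args"]}

g = EventHypergraph()
g.add_node("t1", "text", "attacked")
g.add_node("a1", "text", "soldiers")
g.add_node("v1", "image", "region: tank")
g.add_event("e1", "Conflict.Attack", "t1")
g.set_argument("e1", "a1", "Target")
g.set_argument("e1", "v1", "Instrument")
g.set_argument("e1", "a1", "Attacker")  # revision of an early role error

print(g.events["e1"]["args"])  # {'a1': 'Attacker', 'v1': 'Instrument'}
print([op for op, _ in g.log])
```

Because the intermediate structure is explicit and every edit is logged with its previous value, an early mistake (here, `soldiers` first bound as `Target`) can be found and corrected rather than propagated through a single linear generation.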