ECHO: Event-Centric Hypergraph Operations via Multi-Agent Collaboration for Multimedia Event Extraction
Hailong Chu, Hongbing Li, Yunlong Chu, Shutai Huang, Xingyue Zhang, Tinghe Yan, Jinsong Zhang, Shuo Zhang, Lei Li

TL;DR
ECHO is a multi-agent framework that explicitly models multimedia events as hypergraphs, enabling iterative refinement and improved accuracy in multimedia event extraction tasks.
Contribution
It introduces a hypergraph-based approach with explicit intermediate structures and a decoupled linking and binding strategy, advancing multimedia event extraction methods.
Findings
ECHO outperforms previous methods with 7.3 F1 point gains on event mention extraction.
ECHO achieves 15.5 F1 point improvements on argument role prediction.
The hypergraph and iterative refinement approach enhances interpretability and robustness.
Abstract
Multimedia event extraction (M2E2) aims to predict triggers, ground arguments across text and images, and then assemble them into schema-consistent event records. Recent LLM-based approaches have shown strong potential for M2E2, but their intermediate event hypotheses often remain implicit, and event-argument linking is still tightly coupled with role binding. This leaves little opportunity to inspect or revise intermediate event hypotheses and makes predictions brittle to early errors. To bridge this gap, we present ECHO, a multi-agent framework that reframes M2E2 as iterative refinement over an explicit Multimedia Event Hypergraph (MEHG). Instead of relying on implicit linear generation, ECHO performs auditable atomic updates over a shared hypergraph, making intermediate event structures explicit and revisable. Furthermore, we introduce a Link-then-Bind strategy that decouples…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
