Multi-modal Motion Prediction using Temporal Ensembling with Learning-based Aggregation
Kai-Yin Hong, Chieh-Chih Wang, Wen-Chieh Lin

TL;DR
This paper presents a novel temporal ensembling method with learning-based aggregation for trajectory prediction, improving diversity and accuracy by leveraging multi-frame predictions and traffic context, validated on Argoverse 2.
Contribution
Introduces Temporal Ensembling with Learning-based Aggregation, combining multi-frame predictions and traffic context for more accurate and diverse trajectory predictions.
Findings
4% reduction in minADE
5% decrease in minFDE
1.16% reduction in miss rate
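The three metrics above follow the usual multi-modal forecasting definitions. The sketch below is illustrative only and is not the authors' evaluation code: the function names are hypothetical, and the 2.0 m miss threshold is an assumption based on the common Argoverse convention, which this page does not state.

```python
import math

def displacement(p, q):
    """Euclidean distance between two (x, y) waypoints."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def min_ade(modes, truth):
    """Smallest average displacement over all predicted modes."""
    return min(
        sum(displacement(p, q) for p, q in zip(mode, truth)) / len(truth)
        for mode in modes
    )

def min_fde(modes, truth):
    """Smallest final-waypoint displacement over all predicted modes."""
    return min(displacement(mode[-1], truth[-1]) for mode in modes)

def miss_rate(scenarios, threshold=2.0):
    """Fraction of scenarios whose best endpoint is farther than threshold.

    threshold=2.0 (metres) is an assumed default, not taken from the paper.
    """
    misses = sum(1 for modes, truth in scenarios if min_fde(modes, truth) > threshold)
    return misses / len(scenarios)

# Toy data: one scenario where a mode matches exactly, one where none does.
truth = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
good_modes = [truth, [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]]
bad_modes = [[(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]]

print(min_ade(good_modes, truth), min_fde(good_modes, truth))
print(miss_rate([(good_modes, truth), (bad_modes, truth)]))
```

Because each metric takes the minimum over modes, a predictor that omits the true behavior entirely is penalized on all three, which is the failure the paper targets.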
Abstract
Recent years have seen a shift towards learning-based methods for trajectory prediction, with challenges remaining in addressing uncertainty and capturing multi-modal distributions. This paper introduces Temporal Ensembling with Learning-based Aggregation, a meta-algorithm designed to mitigate the issue of missing behaviors in trajectory prediction, which leads to inconsistent predictions across consecutive frames. Unlike conventional model ensembling, temporal ensembling leverages predictions from nearby frames to enhance spatial coverage and prediction diversity. By confirming predictions from multiple frames, temporal ensembling compensates for occasional errors in individual frame predictions. Furthermore, trajectory-level aggregation, often utilized in model ensembling, is insufficient for temporal ensembling due to a lack of consideration of traffic context and its tendency to…
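The pooling idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: it assumes one waypoint per frame, all function names are hypothetical, and the aggregation step shown is a simple endpoint-based non-maximum suppression, i.e. the kind of trajectory-level baseline the abstract calls insufficient. The paper's learning-based aggregator, which also uses traffic context, is not reproduced here.

```python
import math

def align_to_current(trajectory, frames_ago):
    """Drop waypoints that have already elapsed so an older prediction
    lines up with one made at the current frame (one waypoint per frame)."""
    return trajectory[frames_ago:]

def pool_candidates(history, horizon):
    """Collect time-aligned modes from several frames.

    history: list of (frames_ago, modes); modes: list of (score, trajectory).
    Candidates too short to cover the horizon after alignment are skipped.
    """
    pooled = []
    for frames_ago, modes in history:
        for score, trajectory in modes:
            aligned = align_to_current(trajectory, frames_ago)
            if len(aligned) >= horizon:
                pooled.append((score, aligned[:horizon]))
    return pooled

def endpoint_nms(candidates, k, radius):
    """Baseline aggregation: keep up to k highest-scoring candidates whose
    endpoints are more than radius apart from every candidate already kept."""
    kept = []
    for score, trajectory in sorted(candidates, key=lambda c: c[0], reverse=True):
        if all(math.dist(trajectory[-1], other[-1]) > radius for _, other in kept):
            kept.append((score, trajectory))
        if len(kept) == k:
            break
    return kept

# Toy data: the current frame predicts only "straight"; the previous frame
# also predicted a turn that the current frame missed.
history = [
    (0, [(0.9, [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)])]),
    (1, [(0.6, [(0.0, 0.0), (1.0, 0.0), (2.0, 0.1)]),
         (0.5, [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])]),
]
pooled = pool_candidates(history, horizon=2)
kept = endpoint_nms(pooled, k=6, radius=1.0)
print(len(pooled), len(kept))
```

In this toy case the near-duplicate "straight" mode from the previous frame is suppressed, while the turn that the current frame missed is retained, which illustrates how nearby frames can restore a missing behavior.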
Taxonomy
Topics: Video Analysis and Summarization · Video Surveillance and Tracking Methods · Human Pose and Action Recognition
