HLTCOE Evaluation Team at TREC 2025: VQA Track
Dengjia Zhang, Charles Weng, Katherine Guerrerio, Yi Lu, Kenton Murray, Alexander Martin, Reno Kriz, Benjamin Van Durme

TL;DR
This paper introduces a listwise learning framework for video question answering that reranks generated candidate answers using a novel Masked Pointer Cross-Entropy Loss with Rank Weights, and reports improved accuracy and ranking stability on the TREC VQA 2025 Answer Generation task.
Contribution
The paper presents a listwise reranking approach trained with a Masked Pointer Cross-Entropy Loss with Rank Weights, aimed at better answer generation and ranking stability in multimodal video question answering.
Findings
Improved accuracy in answer generation.
Enhanced ranking stability for temporal reasoning questions.
Consistent gains demonstrated in TREC VQA 2025 results.
Abstract
The HLTCOE Evaluation team participated in TREC VQA's Answer Generation (AG) task, for which we developed a listwise learning framework that aims to improve semantic precision and ranking consistency in answer generation. Given a video-question pair, a base multimodal model first generates multiple candidate answers, which are then reranked using a model trained with a novel Masked Pointer Cross-Entropy Loss with Rank Weights. This objective integrates pointer-based candidate selection, rank-dependent weighting, and masked cross-entropy under vocabulary restriction, enabling stable and interpretable listwise optimization. By bridging generative modeling with discriminative ranking, our method produces coherent, fine-grained answer lists. Experiments reveal consistent gains in accuracy and ranking stability, especially for questions requiring temporal reasoning and semantic…
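The abstract names three ingredients of the objective (pointer-based candidate selection, rank-dependent weighting, and masked cross-entropy under vocabulary restriction) but does not give a formula here. The following is a minimal sketch of one way such a loss could be assembled, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, the DCG-style weights `1 / log2(k + 2)`, the step-by-step removal of already selected candidates, and the normalization by the sum of weights.

```python
import math


def masked_log_softmax(logits, allowed):
    """Log-softmax computed over the allowed vocabulary ids only."""
    m = max(logits[i] for i in allowed)
    log_z = m + math.log(sum(math.exp(logits[i] - m) for i in allowed))
    return {i: logits[i] - log_z for i in allowed}


def rank_weights(n):
    """Hypothetical rank-dependent weights: earlier ranks count more."""
    return [1.0 / math.log2(k + 2) for k in range(n)]


def masked_pointer_ce(step_logits, pointer_ids, target_order):
    """Illustrative listwise loss (assumed form, not taken from the paper).

    step_logits:  one logit vector over the full vocabulary per ranking step
    pointer_ids:  vocabulary id of the pointer token for each candidate
    target_order: candidate indices sorted from best to worst
    """
    weights = rank_weights(len(target_order))
    remaining = set(range(len(pointer_ids)))
    total = 0.0
    for k, cand in enumerate(target_order):
        # Vocabulary restriction: only pointer tokens of candidates
        # that have not been selected yet may receive probability mass.
        allowed = [pointer_ids[c] for c in sorted(remaining)]
        log_probs = masked_log_softmax(step_logits[k], allowed)
        # Rank-weighted cross-entropy against the correct pointer token.
        total += -weights[k] * log_probs[pointer_ids[cand]]
        remaining.discard(cand)
    return total / sum(weights)


if __name__ == "__main__":
    pointer_ids = [1, 3, 4]  # hypothetical pointer-token ids in a vocab of 6
    logits = [
        [0.0, 5.0, 0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 5.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 5.0, 0.0],
    ]
    print(masked_pointer_ce(logits, pointer_ids, [0, 1, 2]))
    print(masked_pointer_ce(logits, pointer_ids, [2, 1, 0]))
```

In this sketch the first call scores a target ordering that agrees with the logits and the second scores the reversed ordering, so the first loss is the smaller of the two. Logits at non-pointer vocabulary positions never enter the computation, which is the effect the masking is meant to illustrate.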
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech and dialogue systems
