# Needle in a haystack: Coarse-to-fine alignment network for moment retrieval from large-scale video collections

**Authors:** Lingwen Meng, Fangyuan Liu, Mingyong Xin, Siqi Guo, Fu Zou

PMC · DOI: 10.1371/journal.pone.0320661 · PLOS One · 2025-05-15

## TL;DR

This paper introduces the coarse-to-fine alignment network (CFAN), which finds query-relevant moments in large video collections by splitting the task into two subtasks: video retrieval from the collection and moment retrieval within a single video.

## Contribution

The novel coarse-to-fine alignment network (CFAN) decomposes moment retrieval into two subtasks for improved efficiency and accuracy.

## Key findings

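- CFAN outperforms existing methods on three public video datasets: ActivityNet Captions, Charades-STA and DiDeMo.
- Exchanging multi-level alignment information between the two subtasks lets the model use both the global context of each video and the fine-grained alignment between videos and queries.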
- The method is efficient and scalable for large-scale video collections.

## Abstract

Moment retrieval from large-scale video collections aims to search for and localize the temporal boundaries of a video moment in a collection of numerous videos, according to a given natural language query. Existing methods for moment retrieval in a single video are too time-consuming to scale directly to this task because of their sophisticated network architectures. In this paper, we decompose the original problem into two mutually boosting subtasks: video retrieval from video collections and moment retrieval in a single video. We propose the coarse-to-fine alignment network (CFAN), which includes a video alignment module, a cross-modal interaction module and a flow of multi-level coarse-to-fine alignment information. Through the interaction of the multi-level information from the two subtasks, our method makes full use of the global contextual information in videos and the fine-grained alignment information between videos and queries. We perform extensive experiments on three public datasets (ActivityNet Captions, Charades-STA and DiDeMo), and the evaluation results demonstrate the effectiveness of the proposed CFAN method.
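
The two-subtask decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' CFAN: it assumes queries and clips are already embedded as plain vectors, replaces the learned alignment modules with cosine similarity over mean-pooled features, and omits the interaction between the two subtasks. All function names (`retrieve_videos`, `localize_moment`, `coarse_to_fine`) and parameters (`top_k`, `max_len`) are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_pool(vectors):
    """Average a list of equal-length vectors, dimension by dimension."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def retrieve_videos(query, collection, top_k):
    """Coarse stage (assumption): rank videos by query similarity to the
    mean of their clip features and keep the top_k video ids."""
    scored = [(cosine(query, mean_pool(clips)), vid)
              for vid, clips in collection.items()]
    scored.sort(reverse=True)
    return [vid for _, vid in scored[:top_k]]


def localize_moment(query, clips, max_len):
    """Fine stage (assumption): score every contiguous clip span of up to
    max_len clips and return (score, start, end), with end exclusive."""
    best = (-2.0, 0, 0)  # cosine is always >= -1, so any span beats this
    for start in range(len(clips)):
        last_end = min(start + max_len, len(clips))
        for end in range(start + 1, last_end + 1):
            score = cosine(query, mean_pool(clips[start:end]))
            if score > best[0]:
                best = (score, start, end)
    return best


def coarse_to_fine(query, collection, top_k=2, max_len=3):
    """Run the fine stage only on videos that survive the coarse stage and
    return the best (score, video_id, start, end) found."""
    results = []
    for vid in retrieve_videos(query, collection, top_k):
        score, start, end = localize_moment(query, collection[vid], max_len)
        results.append((score, vid, start, end))
    return max(results)
```

The point of the split is cost: the exhaustive span search runs on only `top_k` videos instead of the whole collection, which is the kind of saving the decomposition is meant to provide. In CFAN the two stages also exchange alignment information, which this sketch does not attempt to model.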

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12080789/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12080789/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/PMC12080789/full.md

---
Source: https://tomesphere.com/paper/PMC12080789