Learning a Grammar Inducer from Massive Uncurated Instructional Videos
Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, and Jiebo Luo

TL;DR
This paper introduces a novel grammar induction model trained on large-scale uncurated instructional videos, effectively handling loose text-video correspondence and outperforming previous methods across multiple datasets.
Contribution
The authors develop a new model that learns video-span correlation from massive uncurated videos without the manually designed features used in previous work, improving grammar induction accuracy when text and video correspond only loosely.
Findings
Model trained only on large-scale YouTube data, with no text-video alignment, shows strong and robust performance across three unseen datasets.
Outperforms previous state-of-the-art systems trained on in-domain data.
Handles domain shift and noisy labels effectively.
Abstract
Video-aided grammar induction aims to leverage video information to find more accurate syntactic grammars for the accompanying text. While previous work focuses on building systems that induce grammars on text that is well aligned with video content, we investigate the scenario in which text and video are only in loose correspondence. Such data can be found in abundance online, and the weak correspondence is similar to the indeterminacy problem studied in language acquisition. Furthermore, we build a new model that can better learn video-span correlation without the manually designed features adopted by previous work. Experiments show that our model, trained only on large-scale YouTube data with no text-video alignment, achieves strong and robust performance across three unseen datasets, despite domain shift and noisy label issues. Furthermore, our model yields higher F1 scores than the…
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
