NoteIt: A System Converting Instructional Videos to Interactable Notes Through Multimodal Video Understanding
Running Zhao, Zhihan Jiang, Xinchen Zhang, Chirui Chang, Handi Chen, Weipeng Deng, Luyao Jin, Xiaojuan Qi, Xun Qian, Edith C.H. Ngai

TL;DR
NoteIt is a system that automatically converts instructional videos into interactive, customizable notes by extracting hierarchical and multimodal information, improving note quality and user experience.
Contribution
It introduces a novel pipeline for faithful extraction of hierarchical and multimodal information from videos to generate interactive notes, addressing limitations of existing tools.
Findings
High performance in objective metrics
Positive user feedback in usability study
Effective extraction of hierarchical and multimodal information
Abstract
Users often take notes on instructional videos to access key knowledge later without revisiting long videos. Automated note generation tools enable users to obtain informative notes efficiently. However, notes generated by existing research or off-the-shelf tools neither comprehensively preserve the information conveyed in the original videos nor satisfy users' expectations for diverse presentation formats and interactive features when using notes digitally. In this work, we present NoteIt, a system that automatically converts instructional videos to interactable notes using a novel pipeline that faithfully extracts hierarchical structure and multimodal key information from videos. With NoteIt's interface, users can interact with the system to further customize the content and presentation formats of the notes according to their preferences. We conducted both a technical…
