Lecture video indexing using boosted margin maximizing neural networks
Di Ma, Xi Zhang, Xu Ouyang, Gady Agam

TL;DR
This paper introduces a boosted deep convolutional neural network system for lecture video indexing, effectively matching slide images to video frames despite noise, occlusion, and perspective distortions.
Contribution
It proposes a novel neural network architecture combined with boosting for improved slide-to-video frame matching in lecture indexing.
Findings
Enhanced robustness to noise and occlusion
Superior performance over existing methods
Effective handling of spatial transformations
Abstract
This paper presents a novel approach for lecture video indexing using a boosted deep convolutional neural network system. The indexing is performed by matching high quality slide images, for which text is either known or extracted, to lower resolution video frames with possible noise, perspective distortion, and occlusions. We propose a deep neural network integrated with a boosting framework composed of two sub-networks targeting feature extraction and similarity determination to perform the matching. The trained network is given as input a pair of slide image and a candidate video frame image and produces the similarity between them. A boosting framework is integrated into our proposed network during the training process. Experimental results show that the proposed approach is much more capable of handling occlusion, spatial transformations, and other types of noises when compared…
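The abstract describes indexing as scoring each (slide, frame) pair and using the resulting similarity to match slides to frames. The sketch below illustrates only that outer matching step and is not the authors' implementation. The names `index_frames`, `slide_segments`, `toy_similarity`, and the `min_score` threshold are hypothetical, and the Jaccard word overlap is merely a stand-in for the trained network's similarity output.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def toy_similarity(slide: set, frame: set) -> float:
    """Stand-in for the trained network: Jaccard overlap of word sets (assumption)."""
    union = slide | frame
    return len(slide & frame) / len(union) if union else 0.0


def index_frames(
    slides: Sequence,
    frames: Sequence,
    similarity: Callable[[object, object], float],
    min_score: float = 0.5,  # hypothetical rejection threshold, not from the paper
) -> List[Tuple[Optional[int], float]]:
    """For each frame, return (index of best-matching slide or None, best score)."""
    if not slides:
        raise ValueError("at least one slide is required")
    results = []
    for frame in frames:
        scores = [similarity(slide, frame) for slide in slides]
        best = max(range(len(slides)), key=scores.__getitem__)
        matched = best if scores[best] >= min_score else None
        results.append((matched, scores[best]))
    return results


def slide_segments(
    assignments: Sequence[Tuple[Optional[int], float]]
) -> List[Tuple[int, int, int]]:
    """Collapse per-frame matches into (slide, first_frame, last_frame) runs."""
    segments: List[Tuple[int, int, int]] = []
    for i, (slide, _) in enumerate(assignments):
        if slide is None:
            continue  # frame matched no slide well enough
        if segments and segments[-1][0] == slide and segments[-1][2] == i - 1:
            segments[-1] = (slide, segments[-1][1], i)  # extend the current run
        else:
            segments.append((slide, i, i))  # start a new run
    return segments
```

Under these assumptions, the segments give a simple index from each slide to the frame ranges in which it appears. Because the slide text is known or extracted, such an index could then be searched by text.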
Taxonomy
Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging
