What Can We Learn from Harry Potter? An Exploratory Study of Visual Representation Learning from Atypical Videos

Qiyue Sun; Qiming Huang; Yang Yang; Hongjun Wang; Jianbo Jiao

arXiv:2508.21770 · cs.CV · September 9, 2025

TL;DR

This study explores how exposing models to atypical videos, such as sci-fi and animation, during training can enhance open-world visual representation learning, improving out-of-distribution (OOD) detection, novel category discovery (NCD), and zero-shot action recognition (ZSAR).

Contribution

The paper introduces a new atypical video dataset and demonstrates that such data improves open-world learning tasks, highlighting the importance of semantic diversity in the atypical samples.

Findings

01

Atypical videos improve OOD detection performance.

02

Semantic diversity in atypical data enhances generalization in NCD and ZSAR.

03

Training with fewer but more diverse atypical samples outperforms training with larger, typical datasets.

Abstract

Humans usually show exceptional generalisation and discovery ability in the open world when shown uncommon new concepts. Whereas most existing studies focus on common, typical data from closed sets, open-world novel discovery remains under-explored in videos. In this paper, we ask: what if atypical, unusual videos are exposed during the learning process? To this end, we collect a new video dataset consisting of various types of atypical data (e.g., sci-fi, animation). To study how such atypical data may benefit open-world learning, we feed them into the model training process for representation learning. Focusing on three key tasks in open-world learning, namely out-of-distribution (OOD) detection, novel category discovery (NCD), and zero-shot action recognition (ZSAR), we found that even straightforward learning approaches with atypical data…
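
The abstract says atypical videos are fed into training but does not spell out the objective. Reviewer 03 notes the strategy is already known from text and image classification; one widely used instance is Outlier Exposure (Hendrycks et al., ICLR 2019), which adds a term pushing the model's predictions on auxiliary outliers toward the uniform distribution. The sketch below assumes that objective, an outlier weight of λ = 0.5, and maximum-softmax-probability (MSP) scoring at test time. It works on precomputed logits in plain Python, all function names are hypothetical, and it is not the authors' implementation.

```python
# Sketch assuming an Outlier Exposure-style objective; NOT the authors' code.
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of class logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Standard cross-entropy for one labelled in-distribution clip."""
    return -log_softmax(logits)[label]


def uniform_cross_entropy(logits: Sequence[float]) -> float:
    """Cross-entropy from the uniform distribution to softmax(logits).

    It reaches its minimum (log K for K classes) when the prediction is
    uniform, so minimising it pushes atypical clips toward no confident class.
    """
    log_probs = log_softmax(logits)
    return -sum(log_probs) / len(log_probs)


def outlier_exposure_loss(
    id_logits: Sequence[Sequence[float]],
    id_labels: Sequence[int],
    atypical_logits: Sequence[Sequence[float]],
    lam: float = 0.5,  # assumed weight on the atypical (outlier) term
) -> float:
    """Mean ID cross-entropy plus lam times the mean uniform CE on atypical clips."""
    id_term = sum(cross_entropy(z, y) for z, y in zip(id_logits, id_labels)) / len(id_logits)
    ood_term = sum(uniform_cross_entropy(z) for z in atypical_logits) / len(atypical_logits)
    return id_term + lam * ood_term


def msp_score(logits: Sequence[float]) -> float:
    """Maximum softmax probability; lower scores suggest an OOD input."""
    return math.exp(max(log_softmax(logits)))


if __name__ == "__main__":
    id_logits = [[2.0, 0.1, -1.0], [0.2, 1.5, 0.3]]  # toy logits, 3 classes
    id_labels = [0, 1]
    atypical_logits = [[0.3, 0.2, 0.1]]  # toy logits for one atypical clip
    print(outlier_exposure_loss(id_logits, id_labels, atypical_logits))
    print(msp_score([0.0, 0.0, 0.0]))  # uniform prediction, about 0.333
```

Under this assumption only the training loss changes; the backbone and the test-time score stay the same. The weight λ trades in-distribution accuracy against how strongly atypical clips are flattened; 0.5 is the value Hendrycks et al. used for vision and serves only as a placeholder here.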

Peer Reviews

Decision · ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 4

Strengths

+ The paper is well-written and very easy to understand.
+ The experimental results of the paper are very good compared to the baseline.

Weaknesses

- The experimental results are insufficient.
- There is a lack of insight regarding the core atypical data.

Reviewer 02 · Rating 3 · Confidence 5

Strengths

It is the first paper to introduce a dataset containing atypical videos in the sci-fi and animation categories.

Weaknesses

1. Very limited experiments: fine-tuning only a vanilla ResNet on one in-distribution dataset and showing improvement there is not enough at all. There are many strong models in the existing literature that perform OOD detection with high robustness to outliers. To show the effectiveness of the proposed atypical dataset, much more extensive experiments on stronger models and more in-distribution datasets are needed.
2. Missing quantitative evaluations: Randomly combining some of the 2,3 categories

Reviewer 03 · Rating 1 · Confidence 4

Strengths

The method applies a strategy known to work in other problems, such as text and image classification, to video classification, showing that it works with the new dataset. It is nice to see experiments on how well different outlier methods work, showing how each supporting dataset separately contributes to accuracy. The paper is easy to read and clear about what it does.

Weaknesses

Major: The paper lacks novelty and applies known methods to known datasets. It would fit better in an applications track at a conference than in a general research track, since there is little novelty in either the method or the datasets. This does not rise to the level of novelty required for publication at ICLR or similar conferences. The authors need to cite Terry Boult’s work, where “atypical” samples are called “known unknowns”, aid in detection, and have been around even before this works cit

Reviewer 04 · Rating 1 · Confidence 5

Strengths

The observation that training a model to recognize out-of-distribution samples on more out-of-distribution samples improves its test-time performance makes sense. The paper is readable.

Weaknesses

Although the paper is readable, the writing quality is low (grammatical mistakes, convoluted writing). Overall, the presentation quality is low (organization of the manuscript, completeness of the captions, notation, clarity, etc.). The contribution is overclaimed in the abstract and introduction: the paper only shows results on OOD (and even then in an extremely narrow setting) but claims a contribution to "visual representation learning in the open world". The proposed "atypical" video dataset is

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning