Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering
Mark Beliaev, Victor Yang, Madhura Raju, Jiachen Sun, Xinghai Hu

TL;DR
This paper presents a novel prompt engineering approach to enhance GPT's zero-shot video classification performance, demonstrating significant improvements through policy simplification and a decomposition-aggregation technique, without additional model fine-tuning.
Contribution
The paper introduces a decomposition-aggregation prompt engineering method and shows that prompt optimization improves GPT's zero-shot video classification performance.
Findings
Simplifying complex policies significantly reduces false negatives.
Decomposition-aggregation prompts outperform traditional single-prompt methods.
Thoughtful prompt design substantially improves GPT's performance without additional fine-tuning.
Abstract
In this study, we tackle industry challenges in video content classification by exploring and optimizing GPT-based models for zero-shot classification across seven critical categories of video quality. We contribute a novel approach to improving GPT's performance through prompt optimization and policy refinement, demonstrating that simplifying complex policies significantly reduces false negatives. Additionally, we introduce a new decomposition-aggregation-based prompt engineering technique, which outperforms traditional single-prompt methods. These experiments, conducted on real industry problems, show that thoughtful prompt design can substantially enhance GPT's performance without additional fine-tuning, offering an effective and scalable solution for improving video classification.
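The abstract does not spell out the prompts, so the following is only a minimal sketch of what a decomposition-aggregation prompt could look like, not the authors' implementation. It assumes a policy can be split into independent yes/no sub-questions and that a video is flagged when any sub-question is answered "yes". The category name, the sub-questions, the prompt wording, the `ask_model` callable, and the any-yes aggregation rule are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical decomposition of one policy into simple yes/no sub-questions.
# The paper's actual categories and policy text are not reproduced here.
SUB_QUESTIONS: Dict[str, List[str]] = {
    "low_quality_visuals": [
        "Is the video blurry for most of its duration?",
        "Is the video heavily pixelated?",
    ],
}


def build_prompt(description: str, question: str) -> str:
    """Build one narrow prompt per sub-question (wording is an assumption)."""
    return (
        f"Video description: {description}\n"
        f"Question: {question}\n"
        "Answer yes or no."
    )


def classify(
    description: str,
    category: str,
    ask_model: Callable[[str], str],
) -> bool:
    """Decompose: ask each sub-question separately.
    Aggregate: flag the video if any answer starts with 'yes' (assumed rule)."""
    answers = [
        ask_model(build_prompt(description, question))
        for question in SUB_QUESTIONS[category]
    ]
    return any(answer.strip().lower().startswith("yes") for answer in answers)
```

Here `ask_model` stands in for whatever client sends a prompt to the model and returns its text reply, so the sketch can be exercised with a stub before wiring in a real API. An any-yes rule favors recall, which fits the stated goal of reducing false negatives; a stricter rule (for example, requiring a majority of "yes" answers) would trade recall for precision.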
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Image Processing Techniques · Machine Learning and Data Classification
