In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Understanding
Moucheng Xu, Evangelos Chatzaroulas, Luc McCutcheon, Abdul Ahad, Hamzah Azeem, Janusz Marecki, Ammar Anwar

TL;DR
This paper introduces an in-context ensemble learning approach that improves video-language models' ability to generate accurate, step-by-step Standard Operating Procedures from video demonstrations, especially in low-level workflow understanding.
Contribution
The paper proposes a novel in-context ensemble learning strategy that aggregates pseudo labels and enhances zero-shot SOP generation in video-language models.
Findings
In-context learning improves temporal accuracy of SOP generation.
Ensemble learning enhances model capabilities beyond context window limits.
The approach consistently outperforms baseline models in SOP tasks.
Abstract
A Standard Operating Procedure (SOP) defines a low-level, step-by-step written guide for a business software workflow. SOP generation is a crucial step towards automating end-to-end software workflows. Manually creating SOPs can be time-consuming. Recent advancements in large video-language models offer the potential for automating SOP generation by analyzing recordings of human demonstrations. However, current large video-language models face challenges with zero-shot SOP generation. In this work, we first explore in-context learning with video-language models for SOP generation. We then propose an exploration-focused strategy called In-Context Ensemble Learning to aggregate pseudo labels of multiple possible paths of SOPs. The proposed in-context ensemble learning also enables the models to learn beyond their context window limit with an implicit consistency regularisation. We…
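The aggregation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is truncated here and gives no algorithmic detail, so the structure below is an assumption. The sketch assumes that pseudo-labelled demonstrations are split into batches small enough for one context window, that a model call produces one candidate SOP per batch, and that a second call merges the candidates. The names `Example`, `chunk`, `in_context_ensemble`, `generate`, and `aggregate` are all hypothetical, and the two callables stand in for whatever video-language model is used.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types, not taken from the paper.
# An example pairs a video identifier with its pseudo-label SOP steps.
Example = Tuple[str, List[str]]
# `generate` stands in for a video-language model call with in-context examples.
Generate = Callable[[str, Sequence[Example]], List[str]]
# `aggregate` stands in for the step that merges several candidate SOPs.
Aggregate = Callable[[str, Sequence[List[str]]], List[str]]


def chunk(examples: Sequence[Example], size: int) -> List[Sequence[Example]]:
    """Split the examples into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [examples[i:i + size] for i in range(0, len(examples), size)]


def in_context_ensemble(
    query_video: str,
    examples: Sequence[Example],
    generate: Generate,
    aggregate: Aggregate,
    batch_size: int,
) -> List[str]:
    """Produce one candidate SOP per batch of examples, then merge them."""
    # Each batch fits in one context window (assumption); together the
    # batches cover more examples than a single prompt could hold.
    candidates = [
        generate(query_video, batch) for batch in chunk(examples, batch_size)
    ]
    return aggregate(query_video, candidates)
```

Passing the model calls in as plain callables keeps the sketch independent of any particular model API; how the candidates are actually merged, and how the consistency regularisation arises, is left to the full paper.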
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting
