The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios
Daocheng Fu, Jianbiao Mei, Rong Wu, Xuemeng Yang, Jia Xu, Ding Wang, Pinlong Cai, Yong Liu, Licheng Wen, Botian Shi

TL;DR
This paper introduces EvoEnv, a dynamic benchmarking environment for evaluating multi-modal large language models in realistic workplace scenarios, focusing on scheduling, exploration, and continual learning challenges.
Contribution
It presents a novel environment for assessing agent robustness in dynamic, real-world tasks, highlighting deficiencies of current models and promoting more reliable, adaptable AI systems.
Findings
Current agents perform poorly in dynamic, uncertain environments.
Active exploration reduces hallucinations and improves decision-making.
Continuous learning strategies enhance adaptability in evolving tasks.
Abstract
The rapid evolution of Multi-modal Large Language Models (MLLMs) has advanced workflow automation; however, existing research mainly targets performance upper bounds in static environments, overlooking robustness for stochastic real-world deployment. We identify three key challenges: dynamic task scheduling, active exploration under uncertainty, and continuous learning from experience. To bridge this gap, we introduce EvoEnv, a dynamic evaluation environment that simulates a "trainee" agent continuously exploring a novel setting. Unlike traditional benchmarks, EvoEnv evaluates agents along three dimensions: (1) context-aware scheduling for streaming tasks with varying priorities; (2) prudent information acquisition to reduce hallucination via active exploration; and (3) continuous evolution by distilling generalized strategies from rule-based, dynamically generated tasks.…
Taxonomy
Topics: Multimodal Machine Learning Applications · Machine Learning and Algorithms · Explainable Artificial Intelligence (XAI)
