cotomi Act: Learning to Automate Work by Watching You

Masafumi Oyamada; Kunihiro Takeoka; Kosuke Akimoto; Ryoma Obara; Masafumi Enomoto; Haochen Zhang; Daichi Haraguchi; Takuya Tamura

arXiv:2605.03231·cs.AI·May 6, 2026


TL;DR

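cotomi Act is a browser agent that executes multi-step web tasks and passively learns from the user's browsing, abstracting it into task boards and a wiki in a workspace that both user and agent can edit. It reaches 80.4% on the 179-task WebArena human-evaluation subset, above the reported 78.2% human baseline.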

Contribution

The paper introduces a browser-based computer-using agent that combines reliable multi-step task execution with persistent organizational knowledge learned passively from user behavior.

Findings

01 Achieves 80.4% success on the 179-task WebArena human-evaluation subset, exceeding the reported human baseline of 78.2%.

02 Passively abstracts the user's browsing into shared artifacts (task boards, wiki) that both user and agent can edit.

03 In a controlled proxy evaluation, task success improves as behavior-derived knowledge accumulates.

Abstract

What if a browser agent could learn your work simply by watching you do it? We present cotomi Act, a browser-based computer-using agent that combines reliable multi-step task execution with persistent organizational knowledge learned from user behavior. For execution, an agent scaffold with adaptive lazy observation, verbal-diff-based history compression, coarse-grained actions, and test-time scaling via best-of-N action selection achieves 80.4% on the 179-task WebArena human-evaluation subset, exceeding the reported 78.2% human baseline. For organizational knowledge, a behavior-to-knowledge pipeline passively observes the user's browsing and progressively abstracts it into artifacts (task boards, wiki) exposed through a shared workspace editable by both user and agent. A controlled proxy evaluation confirms that task success improves as behavior-derived knowledge accumulates. In our…
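The abstract names best-of-N action selection as its test-time scaling mechanism but does not describe how candidates are proposed or scored. The sketch below illustrates only the general technique: sample N candidate actions, score each, and keep the highest-scoring one. The names `best_of_n`, `toy_propose`, and `toy_score` are hypothetical, and the proposer and scorer are toy stand-ins, not the paper's components.

```python
from typing import Callable, Tuple


def best_of_n(
    propose: Callable[[str, int], str],
    score: Callable[[str, str], float],
    observation: str,
    n: int = 4,
) -> Tuple[str, float]:
    """Sample n candidate actions and return the highest-scoring one.

    `propose` and `score` are assumptions for illustration: in a real agent
    they would typically be model calls (an action sampler and a judge).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [propose(observation, i) for i in range(n)]
    scored = [(score(observation, action), action) for action in candidates]
    # max() keeps the first candidate on ties, so selection is deterministic.
    best_score, best_action = max(scored, key=lambda pair: pair[0])
    return best_action, best_score


def toy_propose(observation: str, seed: int) -> str:
    """Hypothetical proposer: cycles through a fixed list of actions."""
    options = [
        "click('Search')",
        "type('query', 'orders')",
        "scroll(down)",
        "click('Submit order')",
    ]
    return options[seed % len(options)]


def toy_score(observation: str, action: str) -> float:
    """Hypothetical scorer: counts observation words found in the action."""
    return float(sum(word in action for word in observation.split()))


if __name__ == "__main__":
    action, value = best_of_n(toy_propose, toy_score, "Submit order button", n=4)
    print(action, value)
```

Raising `n` trades extra proposer and scorer calls for a better chance that at least one candidate is good, which is the usual motivation for best-of-N at inference time.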
