FOCAL: Filtered On-device Continuous Activity Logging for Efficient Personal Desktop Summarization
Haoran Yin, Zhiyuan Wen, Jiannong Cao, Bo Yuan, Ruosong Yang

TL;DR
FOCAL is a privacy-first on-device system that transforms continuous desktop interaction streams into task-organized logs, reducing resource consumption while maintaining high summarization accuracy.
Contribution
FOCAL introduces a multi-agent architecture that combines noise filtering, text-only task attribution, selective visual reasoning, and task-isolated memory, enabling efficient, privacy-preserving summarization of desktop streams.
Findings
On DesktopBench, reduces total token consumption by 60.4% and VLM call count by 72.3% relative to the baseline.
Improves Key Information Recall (KIR) from 0.38 to 0.61.
Maintains high task accuracy and KIR under task interruptions.
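To put the reported percentages in perspective, the short calculation below converts them into the fraction of baseline cost that remains and the relative KIR gain. It uses only the figures stated above; the helper names `remaining_fraction` and `relative_gain` are illustrative and do not come from the paper.

```python
def remaining_fraction(reduction_pct: float) -> float:
    """Fraction of the baseline cost left after a percentage reduction."""
    return 1.0 - reduction_pct / 100.0


def relative_gain(before: float, after: float) -> float:
    """Relative improvement of a score with respect to its starting value."""
    return (after - before) / before


tokens_left = remaining_fraction(60.4)  # share of baseline tokens still consumed
calls_left = remaining_fraction(72.3)   # share of baseline VLM calls still made
kir_gain = relative_gain(0.38, 0.61)    # relative KIR improvement

print(f"{tokens_left:.3f} {calls_left:.3f} {kir_gain:.3f}")
# prints: 0.396 0.277 0.605
```

In other words, the reported reductions leave roughly 40% of the baseline tokens and 28% of the baseline VLM calls, while KIR rises by about 60% in relative terms.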
Abstract
Desktop interaction streams provide a continuous, privacy-sensitive record of interleaved user tasks. Transforming these streams into task-organized personal logs on-device faces two main challenges: exhaustive Vision-Language Model (VLM) processing strains local resources, and global stream processing causes cross-task context pollution. We present FOCAL (Filtered On-device Continuous Activity Logging), a privacy-first multi-agent system utilizing a unified filter-plan-log architecture. It cascades a lightweight Filter Agent for noise suppression, a text-only Brain Agent for task attribution, a Record Agent for selective visual reasoning, and a task-isolated Memory Agent for context-coherent summarization. Experiments on DesktopBench (comprising 2,572 screenshots across 420 complex sessions) show FOCAL reduces total token consumption by 60.4% and VLM call count by 72.3% versus a…
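The four-stage cascade described in the abstract can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Event` fields, the duplicate-title filter rule, the keyword-based task attribution, and the `vlm` callable are all hypothetical placeholders chosen to show the control flow (filter first, attribute from text only, call the vision model only on surviving events, store notes per task).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    """One desktop interaction sample (hypothetical schema)."""
    timestamp: float
    window_title: str   # text-only metadata
    screenshot_id: str  # stand-in for the captured image


def filter_agent(events: List[Event]) -> List[Event]:
    """Noise suppression. Assumed rule: drop an event whose window title
    matches the most recently kept event."""
    kept: List[Event] = []
    for ev in events:
        if kept and kept[-1].window_title == ev.window_title:
            continue
        kept.append(ev)
    return kept


def brain_agent(event: Event) -> str:
    """Text-only task attribution. Assumed rule: keyword match on the title."""
    title = event.window_title.lower()
    if "invoice" in title:
        return "billing"
    if "slides" in title:
        return "presentation"
    return "other"


def record_agent(event: Event, vlm: Callable[[str], str]) -> str:
    """Visual reasoning: the only stage that invokes the (costly) VLM."""
    return vlm(event.screenshot_id)


class MemoryAgent:
    """Task-isolated memory: notes for one task never mix with another's."""

    def __init__(self) -> None:
        self._notes: Dict[str, List[str]] = defaultdict(list)

    def add(self, task: str, note: str) -> None:
        self._notes[task].append(note)

    def log(self) -> Dict[str, List[str]]:
        return {task: list(notes) for task, notes in self._notes.items()}


def run_pipeline(events: List[Event], vlm: Callable[[str], str]) -> Dict[str, List[str]]:
    memory = MemoryAgent()
    for ev in filter_agent(events):
        task = brain_agent(ev)
        memory.add(task, record_agent(ev, vlm))
    return memory.log()


# Demo with a fake VLM that records which screenshots it was asked about.
calls: List[str] = []


def fake_vlm(screenshot_id: str) -> str:
    calls.append(screenshot_id)
    return f"described {screenshot_id}"


events = [
    Event(0.0, "Invoice 42 - Editor", "s0"),
    Event(1.0, "Invoice 42 - Editor", "s1"),  # unchanged title, filtered out
    Event(2.0, "Slides - Deck", "s2"),        # interruption by another task
    Event(3.0, "Invoice 42 - Editor", "s3"),  # user returns to billing
]
log = run_pipeline(events, fake_vlm)
```

In this toy run the fake VLM is called three times for four events, and the billing notes (`s0`, `s3`) stay in their own entry even though a presentation event interrupts them. This mirrors, in miniature, the two goals the abstract names: fewer vision calls and no cross-task context pollution.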
