Olympus: A Universal Task Router for Computer Vision Tasks

Yuanze Lin; Yunsheng Li; Dongdong Chen; Weijian Xu; Ronald Clark; Philip H. S. Torr

arXiv:2412.09612 · cs.CV · April 3, 2025

TL;DR

Olympus turns a Multimodal Large Language Model (MLLM) into a controller that routes diverse computer vision tasks to dedicated modules through instruction-based routing, without training heavy generative models.

Contribution

The paper introduces Olympus, a universal task router that lets existing MLLMs handle over 20 vision tasks across images, videos, and 3D objects by delegating them to specialized modules through instruction-based routing.

Findings

01. Achieves 94.75% routing accuracy across 20 tasks
02. Attains 91.82% precision in chained action scenarios
03. Demonstrates effective integration with existing MLLMs

Abstract

We introduce Olympus, a new approach that transforms Multimodal Large Language Models (MLLMs) into a unified framework capable of handling a wide array of computer vision tasks. Utilizing a controller MLLM, Olympus delegates over 20 specialized tasks across images, videos, and 3D objects to dedicated modules. This instruction-based routing enables complex workflows through chained actions without the need for training heavy generative models. Olympus easily integrates with existing MLLMs, expanding their capabilities with comparable performance. Experimental results demonstrate that Olympus achieves an average routing accuracy of 94.75% across 20 tasks and precision of 91.82% in chained action scenarios, showcasing its effectiveness as a universal task router that can solve a diverse range of computer vision tasks. Project page: http://yuanze-lin.me/Olympus_page/
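
The routing idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the `<task>prompt</task>` tag format, the names `parse_routes`, `run_chain`, and `REGISTRY`, and the stand-in modules that return strings instead of images. The sketch only shows the general pattern of a controller emitting routing instructions that are dispatched, in order, to dedicated modules.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical routing format: the controller wraps each delegated prompt in a
# task tag, e.g. <image_gen>a red bicycle</image_gen>. The real format may differ.
ROUTE_PATTERN = re.compile(r"<(?P<task>\w+)>(?P<prompt>.*?)</(?P=task)>", re.DOTALL)

# A module receives the prompt plus the previous step's output (None for the
# first step). Strings stand in for images, videos, or 3D assets.
Module = Callable[[str, Optional[str]], str]


def parse_routes(controller_output: str) -> List[Tuple[str, str]]:
    """Extract (task, prompt) pairs in the order the controller emitted them."""
    return [
        (m.group("task"), m.group("prompt").strip())
        for m in ROUTE_PATTERN.finditer(controller_output)
    ]


def run_chain(controller_output: str, registry: Dict[str, Module]) -> Optional[str]:
    """Dispatch each routed task to its module, feeding each result forward."""
    result: Optional[str] = None
    for task, prompt in parse_routes(controller_output):
        if task not in registry:
            raise KeyError(f"no module registered for task {task!r}")
        result = registry[task](prompt, result)
    return result  # None means the controller answered without routing


# Stand-in modules (hypothetical); real ones would call specialized models.
def fake_image_gen(prompt: str, prev: Optional[str]) -> str:
    return f"image({prompt})"


def fake_image_edit(prompt: str, prev: Optional[str]) -> str:
    return f"edit({prev}, {prompt})"


REGISTRY: Dict[str, Module] = {
    "image_gen": fake_image_gen,
    "image_edit": fake_image_edit,
}

if __name__ == "__main__":
    chained = "<image_gen>a red bicycle</image_gen><image_edit>make it blue</image_edit>"
    print(run_chain(chained, REGISTRY))  # edit(image(a red bicycle), make it blue)
```

Keeping the modules behind a plain registry means the controller never needs to be retrained when a module is swapped, which is one plausible reading of why routing avoids training heavy generative models.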

Code & Models

Repositories

yuanze-lin/Olympus
PyTorch · Official

Models

🤗 Yuanze/Olympus
model · 11 dl · ♡ 3

Datasets

Yuanze/Olympus
dataset · 111 dl

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization