HyperTASR: Hypernetwork-Driven Task-Aware Scene Representations for Robust Manipulation
Li Sun, Jiefeng Wu, Feng Chen, Ruizhe Liu, Yanchao Yang

TL;DR
HyperTASR introduces a hypernetwork-driven framework that dynamically modulates scene representations based on task and phase, improving robotic manipulation by emulating human perceptual adaptation.
Contribution
The paper presents a novel hypernetwork-based approach that adaptively transforms scene representations conditioned on task objectives and execution phase, with the aim of improving policy learning.
Findings
HyperTASR yields significant performance improvements on both simulated and real-world manipulation tasks.
Ablation studies indicate that the modulated representations focus more strongly on task-relevant scene features.
The resulting representations align more closely with human-like perceptual adaptation during manipulation.
Abstract
Effective policy learning for robotic manipulation requires scene representations that selectively capture task-relevant environmental features. Current approaches typically employ task-agnostic representation extraction, failing to emulate the dynamic perceptual adaptation observed in human cognition. We present HyperTASR, a hypernetwork-driven framework that modulates scene representations based on both task objectives and the execution phase. Our architecture dynamically generates representation transformation parameters conditioned on task specifications and progression state, enabling representations to evolve contextually throughout task execution. This approach maintains architectural compatibility with existing policy learning frameworks while fundamentally reconfiguring how visual features are processed. Unlike methods that simply concatenate or fuse task embeddings with…
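The core idea in the abstract, a hypernetwork that generates the parameters of a representation transform from the task specification and the progression state, can be illustrated with a small sketch. This is not the authors' architecture: the function names (`make_hypernetwork`, `generate_transform`, `modulate`), the linear hypernetwork, the scalar phase in [0, 1], the residual `tanh` form, and all dimensions are assumptions chosen only to keep the example short and dependency-free.

```python
import math
import random

def make_hypernetwork(task_dim, feat_dim, seed=0):
    """Hypothetical linear hypernetwork weights (random stand-ins for learned values)."""
    rng = random.Random(seed)
    in_dim = task_dim + 1                      # task embedding plus a scalar phase
    out_dim = feat_dim * feat_dim + feat_dim   # weight matrix and bias of the generated layer
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

def generate_transform(hyper_w, task_emb, phase, feat_dim):
    """Map (task embedding, phase) to the weight and bias of a feature transform."""
    cond = list(task_emb) + [phase]
    flat = [sum(w * c for w, c in zip(row, cond)) for row in hyper_w]
    n = feat_dim * feat_dim
    weight = [flat[i * feat_dim:(i + 1) * feat_dim] for i in range(feat_dim)]
    bias = flat[n:]
    return weight, bias

def modulate(features, weight, bias):
    """Apply the generated transform as a residual update (assumed form).

    The output has the same length as the input, so a downstream policy
    that consumes `features` would not need to change.
    """
    return [f + math.tanh(sum(w * x for w, x in zip(row, features)) + b)
            for f, row, b in zip(features, weight, bias)]

# Usage: same scene features and task, two different execution phases.
feat_dim = 4
hyper_w = make_hypernetwork(task_dim=3, feat_dim=feat_dim)
features = [0.5, -1.0, 0.25, 2.0]
task = [1.0, 0.0, 0.0]
early = modulate(features, *generate_transform(hyper_w, task, 0.1, feat_dim))
late = modulate(features, *generate_transform(hyper_w, task, 0.9, feat_dim))
```

In this sketch the task and phase never enter the feature vector directly. They only determine the parameters of the transform, which is the distinction the abstract draws against concatenating or fusing task embeddings with the features. Keeping the output the same shape as the input is one simple way to stay compatible with an existing policy network.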
