Domain-Invariant Per-Frame Feature Extraction for Cross-Domain Imitation Learning with Visual Observations
Minung Kim, Kawon Lee, Jungmo Kim, Sungho Choi, Seungyul Han

TL;DR
This paper introduces DIFF-IL, a novel imitation learning method that extracts domain-invariant features from individual frames and uses temporal labeling to improve cross-domain visual imitation tasks.
Contribution
The paper proposes a new IL approach that isolates domain-invariant features per frame and employs frame-wise time labeling for better behavior segmentation and reward assignment.
Findings
DIFF-IL outperforms existing methods in diverse visual environments.
DIFF-IL is effective at handling high-dimensional, noisy, and incomplete visual observations.
DIFF-IL improves imitation learning performance across different visual domains.
Abstract
Imitation learning (IL) enables agents to mimic expert behavior without reward signals but faces challenges in cross-domain scenarios with high-dimensional, noisy, and incomplete visual observations. To address this, we propose Domain-Invariant Per-Frame Feature Extraction for Imitation Learning (DIFF-IL), a novel IL method that extracts domain-invariant features from individual frames and adapts them into sequences to isolate and replicate expert behaviors. We also introduce a frame-wise time labeling technique to segment expert behaviors by timesteps and assign rewards aligned with temporal contexts, enhancing task performance. Experiments across diverse visual environments demonstrate the effectiveness of DIFF-IL in addressing complex visual tasks.
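The abstract does not say how the frame-wise time labels or the time-aligned rewards are computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each frame of an expert trajectory is labeled with its normalized position in the sequence, and that a per-frame "expert-likeness" score is scaled by a predicted time label so that frames resembling later stages of the expert behavior earn more reward. The function names `frame_time_labels` and `time_weighted_rewards`, and the multiplicative reward form, are hypothetical.

```python
from typing import List


def frame_time_labels(num_frames: int) -> List[float]:
    """Assign each frame a normalized time label in [0, 1].

    Assumption (not from the paper): the label is the frame's relative
    position in the trajectory, 0.0 for the first frame, 1.0 for the last.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if num_frames == 1:
        return [0.0]
    return [i / (num_frames - 1) for i in range(num_frames)]


def time_weighted_rewards(
    expert_scores: List[float], predicted_labels: List[float]
) -> List[float]:
    """Scale per-frame expert-likeness scores by predicted time labels.

    Assumption (not from the paper): the reward is the simple product of
    the two quantities, so progress toward later expert frames is rewarded.
    """
    if len(expert_scores) != len(predicted_labels):
        raise ValueError("inputs must have the same length")
    return [s * t for s, t in zip(expert_scores, predicted_labels)]


# Usage: label a 5-frame expert clip, then weight hypothetical scores.
labels = frame_time_labels(5)
rewards = time_weighted_rewards([1.0, 0.5], [0.5, 1.0])
```

In a full system, the predicted labels would come from a learned model applied to domain-invariant per-frame features rather than from frame indices. Here they are plain numbers so the sketch stays self-contained.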
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Vision and Imaging
