PASTA: Vision Transformer Patch Aggregation for Weakly Supervised Target and Anomaly Segmentation

Melanie Neubauer; Elmar Rueckert; Christian Rauch

arXiv:2604.09701·cs.CV·April 14, 2026

TL;DR

PASTA is a weakly supervised, Vision Transformer (ViT)-based method for real-time, pixel-precise target and anomaly segmentation in unstructured environments that cuts training time by 75.8% compared to baselines.

Contribution

The paper presents a novel weakly supervised pipeline that combines ViT features with semantic text prompts for zero-shot segmentation and outperforms domain-specific baselines.

Findings

01 Achieves up to 88.3% IoU for target segmentation.

02 Reduces training time by 75.8% compared to baselines.

03 Performs well on industrial and agricultural datasets.
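
For readers unfamiliar with the IoU figure quoted above, here is a minimal sketch of the standard Intersection over Union metric for binary masks. The function name `iou`, the nested-list mask format, and the toy masks are illustrative assumptions; the listing does not say how the paper's evaluation code is written.

```python
def iou(pred, target):
    """Intersection over Union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1  # pixel is foreground in both masks
            if p or t:
                union += 1  # pixel is foreground in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


# Toy 2x3 masks (hypothetical data, not from the paper).
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
score = iou(pred, target)
print(f"IoU = {score:.3f}")
```

Here two pixels overlap and four pixels are covered by at least one mask, so the toy score is 0.5. An IoU of 88.3% means the overlap covers 88.3% of the combined predicted and ground-truth area.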

Abstract

Detecting unseen anomalies in unstructured environments is a critical challenge for industrial and agricultural applications such as material recycling and weeding. Existing perception systems frequently fail to satisfy the strict operational requirements of these domains, specifically real-time processing, pixel-level segmentation precision, and robust accuracy, because they rely on exhaustively annotated datasets. To address these limitations, we propose 'Patch Aggregation for Segmentation of Targets and Anomalies' (PASTA), a weakly supervised pipeline for object segmentation and classification that uses image-level supervision. By comparing an observed scene with a nominal reference, PASTA identifies Target and Anomaly objects through distribution analysis in self-supervised Vision Transformer (ViT) feature spaces. Our pipeline utilizes semantic text-prompts via the…
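
The idea of comparing an observed scene with a nominal reference in patch-feature space can be illustrated with a rough sketch. This is not the authors' method: the abstract does not specify the distribution analysis, so the sketch substitutes a simple nearest-neighbor cosine distance with a fixed threshold. The names `cosine_distance` and `anomaly_mask`, the threshold value, and the 2-D toy features are all hypothetical. Real ViT patch embeddings have hundreds of dimensions.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def anomaly_mask(observed, reference, threshold):
    """Flag each observed patch whose nearest reference patch is farther
    than `threshold`. Stand-in for the paper's distribution analysis."""
    mask = []
    for row in observed:
        mask.append([
            1 if min(cosine_distance(f, r) for r in reference) > threshold else 0
            for f in row
        ])
    return mask


# Hypothetical patch features from a nominal (anomaly-free) reference scene.
reference = [[1.0, 0.0], [0.9, 0.1]]

# Hypothetical 2x2 grid of patch features from the observed scene.
observed = [
    [[1.0, 0.05], [0.0, 1.0]],
    [[0.8, 0.2], [0.95, 0.0]],
]

mask = anomaly_mask(observed, reference, threshold=0.2)
print(mask)
```

In this toy grid only the top-right patch points in a direction unlike any reference patch, so it is the only one flagged. Upsampling such a patch-level mask to pixel resolution would be a separate step that is not shown here.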
