Time-rEversed diffusioN tEnsor Transformer: A new TENET of Few-Shot Object Detection
Shan Zhang, Naila Murray, Lei Wang, Piotr Koniusz

TL;DR
This paper introduces TENET, a tensor transformer for few-shot object detection that captures multi-way feature occurrences and dynamically models query-support correlations, improving detection accuracy on PASCAL VOC, FSOD, and COCO.
Contribution
The paper proposes TENET, a tensor transformer with high-order representations and a Transformer Relation Head (TRH). It targets two weaknesses of existing FSOD methods: the information loss caused by average-pooled representations and the discarding of position information.
Findings
Achieves state-of-the-art results on PASCAL VOC, FSOD, and COCO datasets.
Effectively captures multi-way feature interactions for robust detection.
Improves sensitivity to positional variations of objects.
Abstract
In this paper, we tackle the challenging problem of Few-shot Object Detection (FSOD). Existing FSOD pipelines (i) use average-pooled representations that result in information loss; and/or (ii) discard position information that can help detect object instances. Consequently, such pipelines are sensitive to large intra-class appearance and geometric variations between support and query images. To address these drawbacks, we propose a Time-rEversed diffusioN tEnsor Transformer (TENET), which (i) forms high-order tensor representations that capture multi-way feature occurrences that are highly discriminative, and (ii) uses a transformer that dynamically extracts correlations between the query image and the entire support set, instead of a single average-pooled support embedding. We also propose a Transformer Relation Head (TRH), equipped with higher-order representations, which encodes correlations…
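The information loss attributed to average pooling can be illustrated with a toy example. The sketch below is not the TENET implementation. It only contrasts first-order (average) pooling with plain second-order pooling, i.e. the mean of outer products of feature vectors, which is one simple form of high-order representation. The function names and the toy regions are assumptions made for illustration.

```python
def average_pool(features):
    """First-order pooling: element-wise mean of N feature vectors of length d."""
    n = len(features)
    d = len(features[0])
    return [sum(f[i] for f in features) / n for i in range(d)]


def second_order_pool(features):
    """Second-order pooling: mean of the outer products f f^T (a d x d matrix).

    Entry (i, j) records how strongly channels i and j co-occur across
    the N feature vectors.
    """
    n = len(features)
    d = len(features[0])
    return [
        [sum(f[i] * f[j] for f in features) / n for j in range(d)]
        for i in range(d)
    ]


# Hypothetical toy regions: two locations, two channels each.
# In region_a the channels never fire together; in region_b they always do.
region_a = [[1.0, 0.0], [0.0, 1.0]]
region_b = [[0.5, 0.5], [0.5, 0.5]]

# Average pooling cannot tell the two regions apart.
print(average_pool(region_a), average_pool(region_b))

# Second-order pooling can: the off-diagonal co-occurrence terms differ.
print(second_order_pool(region_a))
print(second_order_pool(region_b))
```

Both regions share the same average-pooled vector, yet their second-order matrices differ in the off-diagonal entries, which is the kind of multi-way co-occurrence information the abstract says average pooling discards.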
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Layer Normalization
