Multidimensional Task Learning: A Unified Tensor Framework for Computer Vision Tasks
Alaa El Ichi, Khalide Jbilou

TL;DR
This paper proposes a unified tensor-based framework for computer vision tasks, enabling more flexible and expressive task configurations than traditional matrix-based methods.
Contribution
It introduces Generalized Einstein MLPs (GE-MLPs) operating on tensors, expanding the expressible space of vision tasks beyond matrix-based limitations.
Findings
Classification, segmentation, and detection are special cases of the proposed framework.
The tensor task space is strictly larger than matrix-based formulations.
The framework allows for complex task configurations like spatiotemporal and cross-modal predictions.
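The idea that tasks differ only in which dimensions are kept and which are contracted can be sketched with NumPy's `einsum`. This is an illustrative assumption, not the paper's construction: the shapes `H`, `W`, `C`, `K` and the weight tensors `W_cls` and `W_seg` are hypothetical names chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, K = 4, 4, 3, 5            # hypothetical height, width, channels, classes
X = rng.standard_normal((H, W, C)) # one image kept as a 3-way tensor

# Classification-like configuration (assumed): contract h, w and c,
# leaving one score per class.
W_cls = rng.standard_normal((K, H, W, C))
cls_scores = np.einsum("khwc,hwc->k", W_cls, X)

# Segmentation-like configuration (assumed): preserve h and w,
# contract only c, leaving one score vector per pixel.
W_seg = rng.standard_normal((K, C))
seg_scores = np.einsum("kc,hwc->hwk", W_seg, X)

print(cls_scores.shape, seg_scores.shape)
```

In this sketch the same input tensor yields a vector or a per-pixel map purely by changing the contraction pattern, with no flattening step.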
Abstract
This paper introduces Multidimensional Task Learning (MTL), a unified mathematical framework based on Generalized Einstein MLPs (GE-MLPs) that operate directly on tensors via the Einstein product. We argue that current computer vision task formulations are inherently constrained by matrix-based thinking: standard architectures rely on matrix-valued weights and vector-valued biases, requiring structural flattening that restricts the space of naturally expressible tasks. GE-MLPs lift this constraint by operating with tensor-valued parameters, enabling explicit control over which dimensions are preserved or contracted without information loss. Through rigorous mathematical derivations, we demonstrate that classification, segmentation, and detection are special cases of MTL, differing only in their dimensional configuration within a formally defined task space. We further prove that this…
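As a rough illustration of a layer built on the Einstein product, the sketch below assumes the standard definition (contract the last `n` modes of the first tensor with the first `n` modes of the second) and a ReLU nonlinearity. The names `einstein_product` and `ge_layer`, the activation, and all shapes are assumptions made for this example; the abstract does not specify the GE-MLP layer in this detail.

```python
import numpy as np

def einstein_product(A, B, n):
    """Contract the last n modes of A with the first n modes of B."""
    if A.shape[A.ndim - n:] != B.shape[:n]:
        raise ValueError("trailing modes of A must match leading modes of B")
    return np.tensordot(A, B, axes=n)

def ge_layer(X, W, b, n):
    """Hypothetical tensor layer: ReLU(W *_n X + b), with tensor-valued W and b."""
    return np.maximum(einstein_product(W, X, n) + b, 0.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4, 3))             # input kept as a 3-way tensor

# Tensor-valued weight and bias: the output is itself a 3-way tensor.
W_t = rng.standard_normal((4, 4, 5, 4, 4, 3))  # output modes, then input modes
b_t = rng.standard_normal((4, 4, 5))
Y = ge_layer(X, W_t, b_t, n=3)

# The matrix-based equivalent needs explicit flattening and un-flattening.
Y_flat = np.maximum(W_t.reshape(80, 48) @ X.reshape(48) + b_t.reshape(80), 0.0)

print(Y.shape, np.allclose(Y, Y_flat.reshape(4, 4, 5)))
```

When every input mode is contracted, the tensor layer matches a flattened matrix layer numerically; the difference is that the tensor form keeps the mode structure explicit, so individual modes can be left uncontracted.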
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Human Pose and Action Recognition
