MViT: A vision transformer with fractal path reordering and dynamic positional encoding

Bomin Liu; Linjun He; Yan Zhu; Anil Yaman

PMC · DOI: 10.1371/journal.pone.0340788 · January 16, 2026

Open Access

TL;DR

MViT is a new Vision Transformer that improves spatial coherence and structural adaptability using fractal path reordering and dynamic positional encoding.

Contribution

MViT introduces a recursive Moore curve and fractal-based components to enhance spatial continuity and structural modeling in Vision Transformers.

Findings

01

MViT improves classification accuracy by 0.52% on CIFAR-100 and 0.31% on ImageNet-21k compared to ViT-B/16.

02

The model achieves better PSNR and SSIM scores, indicating improved structural representation.

03

MViT shows robustness to rotation and maintains performance across different Transformer backbones and tasks.

Abstract

Vision Transformers have demonstrated remarkable performance in image classification and structural modeling; however, fixed patch partitioning and static positional encoding often disrupt spatial continuity, thereby limiting their ability to represent rotated structures and irregular boundary regions. To address these limitations, we propose the Moore-curve Vision Transformer (MViT), a Vision Transformer (ViT) framework based on a recursive Moore curve. The proposed framework comprises three key components. First, a multi-order fractal mapping is employed to optimize patch reordering and enhance the spatial coherence of the token sequence. Second, a 7×7 dynamic partitioning template together with a boundary compensation algorithm jointly optimizes dense structural representation and resolution adaptability. Third, a period-aware positional encoding module integrates fractal periodic…
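The patch reordering idea in the abstract can be illustrated with a minimal sketch. It generates a Moore curve from the standard L-system (axiom LFL+F+LFL) and uses the visiting order to permute the row-major patch indices of a square grid. This is not the authors' implementation: the function names moore_curve and moore_patch_order are hypothetical, the grid side is assumed to be a power of two (2^order), and the 7×7 dynamic template, boundary compensation, and multi-order mapping described in the abstract are not modeled.

```python
# Illustrative sketch only: Moore-curve ordering of a 2^order x 2^order patch grid.
# Function names and the power-of-two grid are assumptions, not taken from the paper.

def moore_curve(order):
    """Return the (x, y) cells visited by a Moore curve of the given order."""
    if order < 1:
        raise ValueError("order must be >= 1")
    rules = {"L": "-RF+LFL+FR-", "R": "+LF-RFR-FL+"}
    s = "LFL+F+LFL"                      # standard Moore-curve axiom
    for _ in range(order - 1):
        s = "".join(rules.get(c, c) for c in s)

    x, y = 0, 0
    dx, dy = 0, 1                        # start heading "up"
    pts = [(x, y)]
    for c in s:
        if c == "F":                     # move one cell forward
            x, y = x + dx, y + dy
            pts.append((x, y))
        elif c == "+":                   # turn 90 degrees counter-clockwise
            dx, dy = -dy, dx
        elif c == "-":                   # turn 90 degrees clockwise
            dx, dy = dy, -dx
        # "L" and "R" are placeholders and draw nothing

    # Shift so that all coordinates start at 0
    min_x = min(p[0] for p in pts)
    min_y = min(p[1] for p in pts)
    return [(px - min_x, py - min_y) for px, py in pts]


def moore_patch_order(order):
    """Row-major patch indices rearranged into Moore-curve visiting order."""
    side = 2 ** order
    return [py * side + px for px, py in moore_curve(order)]


if __name__ == "__main__":
    perm = moore_patch_order(2)          # 4x4 grid of patches
    tokens = [f"patch{i}" for i in range(16)]
    reordered = [tokens[i] for i in perm]
    print(perm)
    print(reordered[:4])
```

Because consecutive cells on the curve are always grid neighbours, and the last cell is adjacent to the first, tokens that are adjacent in the reordered sequence are also adjacent in the image. That property is the spatial coherence a fractal reordering aims for, in contrast to row-major scanning, which jumps across the image at the end of every row.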

Taxonomy

Topics: Advanced Vision and Imaging · Face Recognition and Perception · Advanced Optical Imaging Technologies