PicT: A Slim Weakly Supervised Vision Transformer for Pavement Distress Classification
Wenhao Tang, Sheng Huang, Xiaoxian Zhang, Luwen Huangfu

TL;DR
PicT is a novel weakly supervised vision Transformer that improves pavement distress classification by leveraging patch-level pseudo labels and a patch refiner, achieving higher accuracy and efficiency.
Contribution
The paper introduces PicT, a Swin Transformer-based model with a Patch Labeling Teacher and Patch Refiner for enhanced pavement distress classification under weak supervision.
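As a rough illustration of what patch-level pseudo labeling under weak supervision can look like, the sketch below turns one image-level label plus a teacher's per-patch scores into per-patch labels. It is not the paper's rule: the function name `patch_pseudo_labels`, the fixed `threshold`, and the fallback to the single most confident patch are all assumptions made for this example.

```python
def patch_pseudo_labels(teacher_probs, image_label, threshold=0.5):
    """Assign a pseudo label to every patch of one image.

    teacher_probs: non-empty list of per-patch distress probabilities
                   produced by a teacher model.
    image_label:   0 for a normal image, a positive class id otherwise.
    threshold:     hypothetical cut-off, not a value taken from the paper.
    """
    if image_label == 0:
        # A normal image contains no distress, so every patch is normal.
        return [0] * len(teacher_probs)

    # Patches the teacher scores highly inherit the image-level class.
    labels = [image_label if p >= threshold else 0 for p in teacher_probs]

    if not any(labels):
        # A distressed image must contain at least one distressed patch,
        # so fall back to the patch the teacher is most confident about.
        best = max(range(len(teacher_probs)), key=teacher_probs.__getitem__)
        labels[best] = image_label
    return labels
```

For example, `patch_pseudo_labels([0.9, 0.2, 0.7], 3)` marks the first and third patches with class 3 and leaves the second as normal.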
Findings
Outperforms the second-best model by 2.4% in P@R (detection task) and 3.9% in F1 (recognition task).
Achieves 1.8x higher throughput and 7x faster training on the same computing resources.
Effectively exploits patch-level information for better classification accuracy.
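For readers unfamiliar with the two metrics quoted in the findings, here is a minimal sketch of how they can be computed. It assumes P@R means precision at a fixed recall level, which this page does not define; the function names and the handling of tied scores (ranked in arbitrary order) are choices made for this example.

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def precision_at_recall(scores, labels, target_recall):
    """Best precision over all cut-offs whose recall >= target_recall.

    scores: predicted distress scores, higher means more likely distressed.
    labels: ground truth, 1 = distressed, 0 = normal.
    """
    total_pos = sum(labels)
    # Rank samples from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    best, tp = 0.0, 0
    for k, (_, label) in enumerate(ranked, start=1):
        tp += label  # true positives among the top-k predictions
        if total_pos and tp / total_pos >= target_recall:
            best = max(best, tp / k)
    return best
```

Calling `precision_at_recall(scores, labels, 0.9)` would then give precision at 90% recall under these assumptions.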
Abstract
Automatic pavement distress classification facilitates improving the efficiency of pavement maintenance and reducing the cost of labor and resources. A recently influential branch of this task divides the pavement image into patches and addresses these issues from the perspective of multi-instance learning. However, these methods neglect the correlation between patches and suffer from a low efficiency in the model optimization and inference. Meanwhile, Swin Transformer is able to address both of these issues with its unique strengths. Built upon Swin Transformer, we present a vision Transformer named Pavement Image Classification Transformer (PicT) for pavement distress classification. In order to better exploit the discriminative information of pavement images at the patch level, the Patch Labeling Teacher is proposed to leverage a…
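The patch-based view mentioned in the abstract treats one pavement image as a bag of patches. A minimal sketch of that first step, splitting an image into non-overlapping patches, might look like the following. The nested-list image representation, the function name `split_into_patches`, and the requirement that the image size be a multiple of the patch size are assumptions for illustration only.

```python
def split_into_patches(image, patch_size):
    """Split a 2-D image (list of rows) into non-overlapping square patches.

    Assumes the height and width are multiples of patch_size.
    Patches are returned in row-major order.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Take patch_size rows, then patch_size columns from each row.
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches
```

In a multi-instance setting each returned patch would be one instance, with only the image-level label known for the whole bag.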
Taxonomy
Topics: Infrastructure Maintenance and Monitoring · Asphalt Pavement Performance Evaluation · Advanced Neural Network Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Residual Connection · Stochastic Depth · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam
