PicT: A Slim Weakly Supervised Vision Transformer for Pavement Distress Classification

Wenhao Tang; Sheng Huang; Xiaoxian Zhang; Luwen Huangfu

arXiv:2209.10074·cs.CV·September 22, 2022

TL;DR

PicT is a novel weakly supervised vision Transformer that improves pavement distress classification by leveraging patch-level pseudo labels and a patch refiner, achieving higher accuracy and efficiency.

Contribution

The paper introduces PicT, a Swin Transformer-based model with a Patch Labeling Teacher and Patch Refiner for enhanced pavement distress classification under weak supervision.

Findings

1. Outperforms previous models by 2.4% in P@R and 3.9% in F1 score.
2. Achieves 1.8x higher throughput and 7x faster training.
3. Exploits patch-level information effectively to improve classification accuracy.
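
The two metrics named in the findings can be illustrated with a minimal sketch. It assumes binary labels (1 = distressed, 0 = normal) and reads P@R as precision at a fixed recall level; the function names and the 0.9 default recall are assumptions for illustration, not values taken from this page.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def precision_at_recall(y_true, scores, target_recall=0.9):
    """Best precision among score cut-offs whose recall reaches target_recall.

    target_recall=0.9 is an assumed default. Each rank position is treated
    as one cut-off, so tied scores are not grouped.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    best = 0.0
    tp = 0
    for k, (_, label) in enumerate(ranked, start=1):
        tp += label  # positives among the top-k scored samples
        if tp / total_pos >= target_recall:
            best = max(best, tp / k)
    return best
```

Raising `target_recall` forces the cut-off further down the ranking, so the reported precision can only stay the same or drop.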

Abstract

Automatic pavement distress classification facilitates improving the efficiency of pavement maintenance and reducing the cost of labor and resources. A recently influential branch of this task divides the pavement image into patches and addresses these issues from the perspective of multi-instance learning. However, these methods neglect the correlation between patches and suffer from a low efficiency in the model optimization and inference. Meanwhile, Swin Transformer is able to address both of these issues with its unique strengths. Built upon Swin Transformer, we present a vision Transformer named Pavement Image Classification Transformer (PicT) for pavement distress classification. In order to better exploit the discriminative information of pavement images at the patch level, the Patch Labeling Teacher is proposed to leverage a…
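
The patch-based, multi-instance framing that the abstract attributes to earlier methods can be sketched as follows. This is not PicT itself, which builds on Swin Transformer: it only shows how an image is cut into non-overlapping patches and how per-patch scores might be pooled into one image-level score. The helper names are hypothetical, and max pooling is assumed as one common aggregation choice.

```python
def split_into_patches(image, patch_size):
    """Split an H x W image (list of rows) into non-overlapping square patches.

    Patches are returned in row-major order: left to right, then top to bottom.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            rows = image[top:top + patch_size]
            patches.append([row[left:left + patch_size] for row in rows])
    return patches


def mil_image_score(patch_scores):
    """Assumed max-pooling rule: the image scores as high as its most suspicious patch."""
    return max(patch_scores)


# Toy 4 x 4 "image" with pixel values 0..15, cut into four 2 x 2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)
```

Under this pooling rule each patch is scored independently, which is the kind of setup the abstract criticizes for neglecting the correlation between patches.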

Code & Models

Repositories

dearcaat/pict (PyTorch, official)

Taxonomy

Topics: Infrastructure Maintenance and Monitoring · Asphalt Pavement Performance Evaluation · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Residual Connection · Stochastic Depth · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam