# C-ViT: An Improved ViT Model for Multi-Label Classification of Bamboo Chopstick Defects

**Authors:** Waizhong Wang, Wei Peng, Liancheng Zeng, Yue Shen, Chaoyun Zhu, Yingchun Kuang

PMC · DOI: 10.3390/s26030812 · Sensors (Basel, Switzerland) · 2026-01-26

## TL;DR

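This paper introduces C-ViT, an improved ViT model for multi-label classification of bamboo chopstick defects. It replaces the standard patch embedding with a convolutional feature extraction module (CFE) and adds a Hard Examples Contrastive Loss (HCL), reaching 94.3% mAP on the self-built BCDD dataset.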

## Contribution

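Proposes C-ViT, which replaces the traditional ViT patch embedding with a convolutional feature extraction module (CFE), and a novel Hard Examples Contrastive Loss (HCL) that dynamically selects hard examples for contrastive learning in multi-label classification.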
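
To make the contrast between patch embedding and a convolutional front end concrete, here is a minimal sketch that only does shape arithmetic. It is not the paper's CFE design: the input size, kernel sizes, strides, and the function names are all illustrative assumptions. It compares the token grid produced by fixed 16×16 patches with the grid produced by a small stack of strided convolutions on an elongated image.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution along one axis (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_embedding_grid(height: int, width: int, patch: int = 16) -> tuple:
    """Token grid of a ViT-style embedding with non-overlapping square patches."""
    return height // patch, width // patch


def conv_stem_grid(height: int, width: int, layers) -> tuple:
    """Token grid after a stack of (kernel, stride_h, stride_w, padding) convolutions."""
    for kernel, stride_h, stride_w, padding in layers:
        height = conv_out(height, kernel, stride_h, padding)
        width = conv_out(width, kernel, stride_w, padding)
    return height, width


# Hypothetical elongated input, standing in for a chopstick image.
IMAGE_H, IMAGE_W = 64, 1024

# Hypothetical stem: three 3x3 convolutions that stride harder along the long axis.
STEM = [(3, 2, 2, 1), (3, 1, 2, 1), (3, 2, 4, 1)]

patch_grid = patch_embedding_grid(IMAGE_H, IMAGE_W)
stem_grid = conv_stem_grid(IMAGE_H, IMAGE_W, STEM)

print("patch embedding grid:", patch_grid)  # 4 x 64 tokens
print("conv stem grid:      ", stem_grid)   # 16 x 64 tokens
```

With square patches the short axis collapses to 4 tokens, while the per-axis strides of a convolutional stem let the grid be chosen independently along each dimension. Whether C-ViT uses anisotropic strides is not stated in this summary.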

## Key findings

- C-ViT achieves 92.8% mAP on the BCDD dataset, outperforming ViTS by 1.2%.
- Adding HCL raises the mAP further, to 94.3% on the same dataset.
- HCL is validated on the VOC2012 dataset, showing effectiveness in multi-label tasks.
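
The mAP figures above are averages of per-label average precision (AP). The sketch below shows one common definition, in which AP is the mean of the precision measured at the rank of each positive sample and mAP averages AP over labels. The paper's exact evaluation protocol is not given in this summary, so this definition and the toy scores are assumptions for illustration only.

```python
def average_precision(scores, targets):
    """AP for one label: mean precision at the rank of each positive sample."""
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, target_matrix):
    """mAP: AP averaged over label columns; each row is one sample."""
    num_labels = len(score_matrix[0])
    aps = []
    for j in range(num_labels):
        column_scores = [row[j] for row in score_matrix]
        column_targets = [row[j] for row in target_matrix]
        aps.append(average_precision(column_scores, column_targets))
    return sum(aps) / num_labels


# Made-up predictions for 4 samples and 2 defect labels.
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
toy_targets = [[1, 0], [0, 1], [1, 1], [0, 0]]

toy_map = mean_average_precision(toy_scores, toy_targets)
print(f"toy mAP = {toy_map:.4f}")
```

In the toy data the first label has one positive ranked below a negative, so its AP is below 1, while the second label is ranked perfectly.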

## Abstract

The quality of disposable bamboo chopsticks directly affects consumers’ usage experience and health safety. Therefore, quality inspection is particularly important, and multi-label classification of defects can better meet the refined demands of actual production. While ViT has made significant progress in visual tasks, it has limitations when dealing with extreme aspect ratios like bamboo chopsticks. To address this, this paper proposes an improved ViT model, C-ViT, introducing a convolutional neural network feature extraction module (CFE) to replace traditional patch embedding, making the input features more suitable for the ViT model. Moreover, existing loss functions in multi-label classification tasks focus on label prediction optimization, making hard labels difficult to learn due to their low gradient contribution. Therefore, this paper proposes a Hard Examples Contrastive Loss (HCL) function, dynamically selecting hard examples and combining label and feature correlation to construct a contrastive learning mechanism, enhancing the model’s ability to model hard examples. Experimental results show that on the self-built bamboo chopstick defect dataset (BCDD), C-ViT improves the mAP by 1.2% to 92.8% compared to the ViTS model, and can reach 94.3% after adding HCL. In addition, we further verified the effectiveness of the proposed HCL function in multi-label classification tasks on the VOC2012 public dataset.
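
The abstract describes HCL only in words: select hard examples dynamically, then build a contrastive objective from label and feature correlation. The sketch below is one possible reading of that description, not the paper's formula. It assumes that hard examples are the `k` samples with the highest binary cross-entropy, that label correlation is the Jaccard overlap of multi-hot labels, and that feature correlation is temperature-scaled cosine similarity. All function names, `k`, `tau`, and the toy batch are hypothetical.

```python
import math


def bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over the labels of one sample."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Label overlap between two multi-hot vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def select_hard(probs, labels, k):
    """Indices of the k samples with the highest BCE (assumed hardness criterion)."""
    losses = [bce(p, y) for p, y in zip(probs, labels)]
    return sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)[:k]


def hard_contrastive_loss(features, probs, labels, k=2, tau=0.1):
    """Label-weighted, InfoNCE-style loss computed only for hard anchors."""
    total, used = 0.0, 0
    for i in select_hard(probs, labels, k):
        others = [j for j in range(len(features)) if j != i]
        logits = [cosine(features[i], features[j]) / tau for j in others]
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(z - peak) for z in logits))
        weights = [jaccard(labels[i], labels[j]) for j in others]
        if sum(weights) == 0:
            continue  # no other sample shares a label with this anchor
        total -= sum(w * (z - log_z) for w, z in zip(weights, logits)) / sum(weights)
        used += 1
    return total / used if used else 0.0


# Made-up batch: 4 samples, 2 labels, 2-D features.
labels = [[1, 0], [1, 0], [0, 1], [0, 1]]
probs = [[0.9, 0.1], [0.4, 0.5], [0.2, 0.8], [0.6, 0.3]]
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

loss_aligned = hard_contrastive_loss(aligned, probs, labels)
loss_misaligned = hard_contrastive_loss(misaligned, probs, labels)
print(loss_aligned < loss_misaligned)
```

In this toy batch the two poorly predicted samples are chosen as anchors, and the loss is small when their features sit close to samples that share their labels and large when they sit close to samples with different labels. In training, a term like this would typically be added to the classification loss with a weighting factor, which is also an assumption here.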

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12899101/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12899101/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12899101/full.md

---
Source: https://tomesphere.com/paper/PMC12899101