BeetleFlow: An Integrative Deep Learning Pipeline for Beetle Image Processing

Fangxun Liu; S M Rayeed; Samuel Stevens; Alyson East; Cheng Hsuan Chiang; Colin Lee; Daniel Yi; Junke Yang; Tejas Naik; Ziyi Wang; Connor Kilrain; Elijah H Buckwalter; Jiacheng Hou; Saul Ibaven Bueno; Shuheng Wang; Xinyue Ma; Yifan Liu; Zhiyuan Tao; Ziheng Zhang; Eric Sokol; Michael Belitz; Sydne Record; Charles V. Stewart; Wei-Lun Chao

arXiv:2511.00255 · cs.CV · March 30, 2026

TL;DR

BeetleFlow is a comprehensive deep learning pipeline that automates beetle detection, cropping, and segmentation in tray images, significantly aiding large-scale entomological data analysis.

Contribution

The paper introduces a novel multi-stage pipeline combining transformer-based detection and segmentation models specifically tailored for beetle image processing.

Findings

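01. Fine-tuned two variants of a transformer-based segmentation model on 670 manually labeled beetle images to achieve fine-grained beetle segmentation.

02. Developed an iterative detection process that combines a transformer-based open-vocabulary object detector with a vision-language model.

03. Enhanced efficiency in processing large-scale beetle image datasets.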

Abstract

In entomology and ecology research, biologists often need to collect a large number of insects, among which beetles are the most common group. A common way for biologists to organize beetles is to place them on trays and photograph each tray. Given images of thousands of such trays, an automated pipeline is needed to process the large-scale data for further research. Therefore, we develop a 3-stage pipeline to detect all the beetles on each tray, sort and crop the image of each beetle, and perform morphological segmentation on the cropped beetles. For detection, we design an iterative process utilizing a transformer-based open-vocabulary object detector and a vision-language model. For segmentation, we manually labeled 670 beetle images and fine-tuned two variants of a transformer-based segmentation model to achieve fine-grained segmentation of beetles with…
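The second stage described above (sort and crop) can be illustrated with a minimal sketch. The abstract does not say how the beetles are ordered, so the row-grouping heuristic below is an assumption rather than the authors' method. The box format `(x_min, y_min, x_max, y_max)`, the function names `sort_reading_order` and `crop`, and the `row_tol` parameter are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def sort_reading_order(boxes: Sequence[Box], row_tol: float = 0.5) -> List[Box]:
    """Order boxes top-to-bottom, then left-to-right within each row.

    Assumed heuristic (not from the paper): a box joins the current row when
    its vertical centre lies within row_tol * height of the previous box.
    """
    if not boxes:
        return []
    # Sort by vertical centre so boxes in the same tray row end up adjacent.
    by_y = sorted(boxes, key=lambda b: (b[1] + b[3]) / 2)
    rows: List[List[Box]] = [[by_y[0]]]
    for box in by_y[1:]:
        prev = rows[-1][-1]
        prev_cy = (prev[1] + prev[3]) / 2
        cy = (box[1] + box[3]) / 2
        if abs(cy - prev_cy) <= row_tol * (prev[3] - prev[1]):
            rows[-1].append(box)
        else:
            rows.append([box])
    # Within each row, order by the left edge.
    return [b for row in rows for b in sorted(row, key=lambda b: b[0])]


def crop(image: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Cut the box region out of an image stored as a list of pixel rows."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


# Four detections on a 2x2 tray layout, given in arbitrary order.
detections = [(60, 2, 80, 22), (5, 0, 25, 20), (10, 50, 30, 70), (55, 48, 75, 68)]
ordered = sort_reading_order(detections)

# A tiny 4x5 stand-in for a tray image; each pixel encodes row*10 + column.
tiny_image = [[r * 10 + c for c in range(5)] for r in range(4)]
patch = crop(tiny_image, (1, 1, 3, 3))
```

In a real pipeline the boxes would come from the detection stage and the crops would be passed on to the segmentation stage. A fixed row tolerance is a simplification that may need tuning for trays with irregular layouts.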
