D-FINE-seg: Object Detection and Instance Segmentation Framework with multi-backend deployment
Argo Saakyan, Dmitry Solntsev

TL;DR
D-FINE-seg extends the transformer-based D-FINE object detector to real-time instance segmentation. It improves F1-score over Ultralytics YOLO26 on the TACO dataset at competitive latency and comes with a multi-backend deployment pipeline.
Contribution
It introduces an instance segmentation extension of D-FINE with a lightweight mask head and segmentation-aware training, plus an end-to-end pipeline for training, export, and optimized inference across ONNX, TensorRT, and OpenVINO.
Findings
Improves F1-score over Ultralytics YOLO26 on the TACO dataset under a unified TensorRT FP16 end-to-end benchmarking protocol
Maintains competitive latency in real-time segmentation
Provides an open-source multi-backend deployment framework
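For reference, F1 is the harmonic mean of precision and recall. The snippet below is a minimal sketch, assuming predictions have already been matched to ground-truth instances and counted as true positives, false positives, and false negatives. The matching rule (for example, the IoU threshold) is not given in this summary, and the function name `f1_score` is illustrative rather than taken from the released framework.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from raw match counts."""
    # Guard against empty denominators (no predictions or no ground truth).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    print(round(f1_score(tp=8, fp=2, fn=2), 3))
```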
Abstract
Transformer-based real-time object detectors achieve strong accuracy-latency trade-offs, and D-FINE is among the top-performing recent architectures. However, real-time instance segmentation with transformers is still less common. We present D-FINE-seg, an instance segmentation extension of D-FINE that adds a lightweight mask head and segmentation-aware training, including box-cropped BCE and dice mask losses, auxiliary and denoising mask supervision, and an adapted Hungarian matching cost. On the TACO dataset, D-FINE-seg improves F1-score over Ultralytics YOLO26 under a unified TensorRT FP16 end-to-end benchmarking protocol, while maintaining competitive latency. The second contribution is an end-to-end pipeline for training, exporting, and optimized inference across ONNX, TensorRT, and OpenVINO for both object detection and instance segmentation tasks. This framework is released as open-source…
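The box-cropped BCE and dice mask losses named in the abstract can be illustrated with a small, dependency-free sketch. This is not the released implementation. It assumes that both losses are evaluated only on pixels inside a box given as `(x0, y0, x1, y1)` with exclusive upper bounds, that masks are plain nested lists, and that the dice term uses a small smoothing constant `eps`. All names here are hypothetical.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def box_cropped_mask_losses(logits, target, box, eps=1e-6):
    """Return (bce, dice) computed only over pixels inside `box`.

    logits, target: H x W nested lists (raw scores and 0/1 labels).
    box: (x0, y0, x1, y1) in pixel coordinates, x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = box
    bce_sum = inter = pred_sum = tgt_sum = 0.0
    count = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            z, t = logits[y][x], target[y][x]
            # Stable BCE-with-logits: max(z, 0) - z*t + log(1 + exp(-|z|)).
            bce_sum += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
            p = _sigmoid(z)
            inter += p * t
            pred_sum += p
            tgt_sum += t
            count += 1
    bce = bce_sum / max(count, 1)
    # Soft dice loss: 1 - 2|P∩T| / (|P| + |T|), smoothed by eps.
    dice = 1.0 - (2.0 * inter + eps) / (pred_sum + tgt_sum + eps)
    return bce, dice


if __name__ == "__main__":
    # Toy 4x4 example: object occupies the central 2x2 region.
    target = [[1.0 if 1 <= x < 3 and 1 <= y < 3 else 0.0 for x in range(4)]
              for y in range(4)]
    logits = [[10.0 if t else -10.0 for t in row] for row in target]
    print(box_cropped_mask_losses(logits, target, box=(1, 1, 3, 3)))
```

Restricting the loss to the box means background pixels far from the instance do not contribute to the mask gradient. Whether the dice term is cropped in the same way in D-FINE-seg is not stated in the abstract.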
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
