A lightweight Transformer-based model for fish landmark detection
Alzayat Saleh, David Jones, Dean Jerry, Mostafa Rahimi Azghadi

TL;DR
This paper introduces MFLD-net, a lightweight convolution-based Transformer model for fish landmark detection that performs well on small datasets and is suitable for mobile applications.
Contribution
The paper presents a novel lightweight model combining CNNs and ViT elements for fish landmark detection, effective in low-data regimes and suitable for embedded devices.
Findings
MFLD-net achieves keypoint estimation accuracy on par with, or better than, some state-of-the-art CNNs on a fish image dataset.
The model does not require pre-training and generalizes well on small datasets.
It is lightweight and suitable for mobile and embedded systems.
Abstract
Transformer-based models, such as the Vision Transformer (ViT), can outperform Convolutional Neural Networks (CNNs) in some vision tasks when there is sufficient training data. However, CNNs have a strong and useful inductive bias for vision tasks (i.e., translation equivariance and locality). In this work, we developed a novel model architecture that we call the Mobile Fish Landmark Detection network (MFLD-net). We built this model using convolution operations based on ViT components (i.e., patch embeddings and multi-layer perceptrons). MFLD-net can achieve competitive or better results in low-data regimes while being lightweight, and is therefore suitable for embedded and mobile devices. Furthermore, we show that MFLD-net can achieve keypoint (landmark) estimation accuracies on par with, or even better than, some state-of-the-art CNNs on a fish image dataset. Additionally, unlike ViT, MFLD-net does…
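The abstract mentions building ViT-style patch embeddings out of convolution operations. As a rough illustration of that general idea (not the authors' implementation), the sketch below shows why a convolution whose kernel size equals its stride behaves like a ViT patch embedding: each non-overlapping patch is flattened and projected by the same set of filters. The function name `patch_embed`, the single-channel image, and the toy filters are all assumptions made for this example; none of them come from the paper.

```python
def patch_embed(image, filters, patch):
    """Embed non-overlapping patches of a single-channel image.

    Equivalent to a convolution with kernel size == stride == `patch`:
    every patch is flattened and projected by each filter.
    `image` is a list of rows; each filter is a flat list of patch*patch weights.
    (Hypothetical helper for illustration only.)
    """
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height - patch + 1, patch):
        row_tokens = []
        for left in range(0, width - patch + 1, patch):
            # Flatten the patch in row-major order.
            flat = [image[top + i][left + j]
                    for i in range(patch) for j in range(patch)]
            # One dot product per filter gives one embedding dimension.
            row_tokens.append([sum(w * x for w, x in zip(f, flat))
                               for f in filters])
        tokens.append(row_tokens)
    return tokens


# Toy 4x4 image with pixel values 0..15 (assumed data, not from the paper).
image = [[r * 4 + c for c in range(4)] for r in range(4)]

# Two toy filters for 2x2 patches: one sums the patch,
# the other picks out its top-left pixel.
filters = [[1, 1, 1, 1],
           [1, 0, 0, 0]]

tokens = patch_embed(image, filters, patch=2)
for row in tokens:
    print(row)
```

Each entry of `tokens` is the embedding of one patch, so a 4x4 image with 2x2 patches yields a 2x2 grid of two-dimensional tokens. In a real model the filters would be learned and the input would have multiple channels; a deep learning framework's strided 2D convolution layer would normally replace the explicit loops.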
Taxonomy
Topics: Water Quality Monitoring Technologies · Ichthyology and Marine Biology · Fish Ecology and Management Studies
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Softmax · Absolute Position Encodings · Dropout · Dense Connections · Convolution
