TL;DR
UVid-Net is a CNN architecture for semantic segmentation of UAV aerial video that embeds temporal information directly in the encoder, improving accuracy and efficiency without a separate temporal module such as an LSTM or optical flow.
Contribution
This work introduces UVid-Net, an encoder-decoder CNN that incorporates temporal data within the encoder, improving segmentation accuracy and efficiency for UAV videos.
Findings
Achieved a mean intersection over union (mIoU) of 0.79 on the ManipalUAVid dataset.
Outperformed existing state-of-the-art algorithms.
Showed promising results with transfer learning on urban street scenes.
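For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of how it is commonly computed: the intersection over union is taken per class and then averaged. The function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration; the paper's exact evaluation protocol may differ.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes present in pred or target.

    pred and target are flat sequences of integer class labels, one per pixel.
    This is an illustrative helper, not the paper's evaluation code.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither labeling
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with 3 classes and 6 pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(round(score, 3))
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. A reported mIoU of 0.79 is the same kind of average, taken over the dataset's classes.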
Abstract
Semantic segmentation of aerial videos has been extensively used for decision making in monitoring environmental changes, urban planning, and disaster management. The reliability of these decision support systems depends on the accuracy of the video semantic segmentation algorithms. Existing CNN-based video semantic segmentation methods extend image semantic segmentation methods with an additional module, such as an LSTM or optical flow, to compute the temporal dynamics of the video, which adds computational overhead. The proposed research work modifies the CNN architecture by incorporating temporal information to improve the efficiency of video semantic segmentation. In this work, an enhanced encoder-decoder based CNN architecture (UVid-Net) is proposed for UAV video semantic segmentation. The encoder of the proposed architecture embeds temporal information…
Taxonomy
Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net · Sigmoid Activation · Tanh Activation · Long Short-Term Memory
