Lightweight Transformers make strong encoders for underwater object detection
Article written by: Cui, Jinrong ; Liu, Hailong ; Zhong, Haowei ; Huang, Cheng ;
Abstract: Underwater object detection methods are widely used in ocean exploration tasks, and precise center localization helps users find objects of interest accurately and quickly. In recent years, underwater detectors based on convolutional neural networks (CNNs) have achieved great success. However, because convolution is a local operation, CNN-based detectors usually struggle to model long-range dependencies explicitly. Transformers, by contrast, can capture global context, but they substantially reduce the detector's inference speed because they require a large amount of memory and computation. In this paper, we propose the CSPTCenterNet underwater detector, which uses a proposed lightweight Transformer to extract global context, improving detection performance while maintaining real-time detection. In the upsampling stage, we fuse the encoded feature maps with the high-resolution feature maps from the backbone network to restore the spatial details that Transformers lack. Finally, we train the network with a GIoU loss and a multi-sample strategy to improve the detector's regression accuracy. Extensive experiments on an underwater dataset and the PASCAL VOC dataset demonstrate the effectiveness of the proposed method. Our method achieves the best detection performance while running inference 2 to 10 times faster than other state-of-the-art methods.
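The abstract names a GIoU loss but does not spell it out. As a minimal sketch, the standard Generalized IoU definition for two axis-aligned boxes can be written as below. The function names `giou` and `giou_loss`, the `(x1, y1, x2, y2)` box layout, and the `1 - GIoU` loss form are assumptions for illustration, not details taken from the paper.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2, so both boxes have positive area.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero area when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # Penalize the part of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def giou_loss(pred_box, target_box):
    """Assumed loss form: 1 - GIoU, which lies in [0, 2)."""
    return 1.0 - giou(pred_box, target_box)
```

Unlike plain IoU, this quantity still varies when the boxes do not overlap, which is why it is commonly used as a regression loss: two disjoint boxes get a lower score the farther apart they are.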
Language:
English