Mobile_ViT: Underwater Acoustic Target Recognition Method Based on Local-Global Feature Fusion

Haiyang Yao

Tian Gao

Yong Wang

Haiyan Wang and Xiao Chen

Abstract

To overcome the inadequate representation and ineffective information exchange that stem from feature homogenization in underwater acoustic target recognition, we introduce a hybrid network named Mobile_ViT, which combines the MobileNet and Transformer architectures. The network begins with a convolutional backbone that embeds a coordinate attention mechanism to enhance the local details of the inputs. This mechanism captures the long-term temporal dependencies and precise frequency-domain relationships of signals, focusing the features on time-frequency positions. The Transformer's Encoder is then integrated at the end of the backbone to provide global characterization, compensating for the convolutional neural network's weakness in capturing long-range feature dependencies. Evaluation on the ShipsEar and DeepShip datasets yields accuracies of 98.50% and 94.57%, respectively, a substantial improvement over the baseline. The proposed method also shows clear separation coefficients, indicating more effective clustering, and is lighter than other Transformers.
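The coordinate attention step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-channel time-frequency input, omits the learned 1x1 convolutions that a real coordinate attention block applies to the pooled vectors, and uses hypothetical names (`coordinate_attention`, `spec`) throughout. It shows only the core idea of pooling separately along the time and frequency axes and using the two resulting gates to reweight each time-frequency position.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def coordinate_attention(spec):
    """Reweight a time-frequency map with per-frequency and per-time gates.

    spec: list of rows (frequency bins), each a list of values over time frames.
    Simplification (assumption): no learned weights, the pooled averages
    are passed straight to a sigmoid.
    """
    n_freq = len(spec)
    n_time = len(spec[0])
    # Directional pooling: one average per frequency bin, one per time frame.
    freq_pool = [sum(row) / n_time for row in spec]
    time_pool = [sum(spec[f][t] for f in range(n_freq)) / n_freq
                 for t in range(n_time)]
    # Gates in (0, 1) that encode position along each axis.
    freq_gate = [sigmoid(v) for v in freq_pool]
    time_gate = [sigmoid(v) for v in time_pool]
    # Each cell is scaled by the gate of its row and the gate of its column.
    return [[spec[f][t] * freq_gate[f] * time_gate[t] for t in range(n_time)]
            for f in range(n_freq)]


# Toy input: 2 frequency bins x 3 time frames with one energetic cell.
spec = [[0.0, 0.0, 0.0],
        [0.0, 4.0, 0.0]]
out = coordinate_attention(spec)
```

The output keeps the input shape, so in a hybrid design of the kind the abstract describes, such a block could sit inside the convolutional backbone before the feature map is flattened into tokens for a Transformer encoder.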

Keywords

underwater acoustic target recognition - attention mechanism - feature fusion - MobileNet - Transformer

Access

PAGES

pp. 0 - 0

ISSUE

Volume: 12 Issue: 4 (2024)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION

DOI

https://doi.org/10.3390/jmse12040589
