Abstract
A 3D mesh is a complex data structure that provides an effective shape representation for 3D objects, but because mesh data is irregular and unordered, convolutional neural networks are difficult to apply to it directly. In addition, heavy reliance on convolution kernels and pooling layers, which focus on local features, can lose spatial information and the dependencies among low-level features. In this paper, we propose MixFormer, a self-attentive convolutional network for 3D mesh models. By defining 3D convolution kernels and vector self-attention mechanisms applicable to 3D mesh models, the network is able to learn features of 3D meshes. By combining the strengths of convolutional and transformer networks, it attends to both local detail features and long-range dependencies between features, achieving good learning results without stacking many layers and with lower computational overhead than pure transformer architectures. We conduct classification and semantic segmentation experiments on the SHREC15, SCAPE, FAUST, MIT, and Adobe Fuse datasets. Experimental results show that the network achieves 96.7% classification accuracy and better segmentation results while using fewer parameters and network layers.
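To make the term "vector self-attention" concrete, the following is a minimal sketch and not the MixFormer implementation. It assumes one feature vector per mesh element (for example, per face), a hypothetical neighborhood list, and a Hadamard-product relation between queries and keys; the projection matrices `Wq`, `Wk`, `Wv` and the function names are illustrative assumptions. Unlike scalar dot-product attention, each feature channel receives its own softmax weights over the neighborhood.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vector_self_attention(feats, neighbors, Wq, Wk, Wv):
    """Per-channel (vector) attention over each element's neighborhood.

    feats:     list of C-dimensional feature vectors, one per mesh element.
    neighbors: neighbors[i] lists the element indices that element i attends to.
    Wq, Wk, Wv: square C x C projection matrices (assumed for illustration).
    """
    q = [matvec(Wq, f) for f in feats]
    k = [matvec(Wk, f) for f in feats]
    v = [matvec(Wv, f) for f in feats]
    out = []
    for i, nbrs in enumerate(neighbors):
        y = []
        for c in range(len(v[i])):
            # Assumed relation: Hadamard product q_i[c] * k_j[c], one score per channel.
            scores = [q[i][c] * k[j][c] for j in nbrs]
            m = max(scores)  # subtract the max for a numerically stable softmax
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            # Channel c of the output is a weighted average of channel c of the values.
            y.append(sum(e / z * v[j][c] for e, j in zip(exps, nbrs)))
        out.append(y)
    return out

# Usage: two elements with two channels each, both attending to both.
identity = [[1.0, 0.0], [0.0, 1.0]]
result = vector_self_attention(
    [[0.0, 1.0], [2.0, 3.0]], [[0, 1], [0, 1]], identity, identity, identity
)
```

Because the weights are computed per channel, different channels of the same element can emphasize different neighbors, which is the property that distinguishes vector attention from a single scalar weight per neighbor.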