Resumen
Moving object tracking techniques using machine and deep learning require large datasets for neural model training. New strategies need to be invented that utilize smaller data training sizes to realize the impact of large-sized datasets. However, current research does not balance the training data size and neural parameters, which creates the problem of inadequacy of the information provided by the low visual data content for parameter optimization. To enhance the performance of moving object tracking that appears in only a few frames, this research proposes a deep learning model using an abundant encoder?decoder (a high-resolution transformer (HRT) encoder?decoder). An HRT encoder?decoder employs feature map extraction that focuses on high resolution feature maps that are more representative of the moving object. In addition, we employ the proposed HRT encoder?decoder for feature map extraction and fusion to reimburse the few frames that have the visual information. Our extensive experiments on the Pascal DOC19 and MS-DS17 datasets have implied that the HRT encoder?decoder abundant model outperforms those of previous studies involving few frames that include moving objects.