Abstract
In this paper, we use data from the Microsoft Kinect sensor, which processes the captured image of a person and extracts the positions of the body joints in every frame. We then propose building a single image from all the sequential frames of a gesture's movement, which facilitates training a convolutional neural network (CNN). We trained the CNN using two strategies: combined training and individual training. Both strategies were evaluated on the MSRC-12 dataset, obtaining an accuracy of 86.67% with combined training and 90.78% with individual training. The trained network was then used to classify data captured with the Kinect from a person, obtaining an accuracy of 72.08% with combined training and 81.25% with individual training. Finally, we use the system to send commands to a mobile robot in order to control it.
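The idea of turning a whole gesture into one CNN input image can be illustrated with a minimal sketch. It assumes each frame supplies 3-D coordinates for a fixed set of joints (for example, the 20 joints tracked by the first-generation Kinect). The function names `sequence_to_image` and `resample_columns`, the joint-by-frame layout, the per-axis min-max normalization, and the nearest-column resampling are all hypothetical choices for illustration, not the encoding used in this work.

```python
import numpy as np


def sequence_to_image(frames: np.ndarray) -> np.ndarray:
    """Encode a (T, J, 3) joint sequence as a (J, T, 3) uint8 image.

    Hypothetical encoding: rows are joints, columns are frames, and the
    x, y, z coordinates become the three colour channels after min-max
    normalization of each axis over the whole gesture.
    """
    frames = np.asarray(frames, dtype=np.float64)
    if frames.ndim != 3 or frames.shape[2] != 3:
        raise ValueError("expected shape (frames, joints, 3)")
    # Per-axis minimum and maximum over all frames and joints.
    lo = frames.min(axis=(0, 1), keepdims=True)
    hi = frames.max(axis=(0, 1), keepdims=True)
    # Avoid division by zero when an axis never changes.
    span = np.where(hi - lo > 0, hi - lo, 1.0)
    scaled = (frames - lo) / span  # values in [0, 1]
    image = np.round(scaled * 255).astype(np.uint8)
    # Swap axes so that each row is one joint's trajectory over time.
    return image.transpose(1, 0, 2)


def resample_columns(image: np.ndarray, width: int) -> np.ndarray:
    """Pick `width` evenly spaced frame columns (nearest neighbour) so
    gestures of different lengths yield images of the same size."""
    idx = np.linspace(0, image.shape[1] - 1, width).round().astype(int)
    return image[:, idx, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gesture = rng.normal(size=(40, 20, 3))  # 40 frames, 20 joints, xyz
    img = resample_columns(sequence_to_image(gesture), 32)
    print(img.shape, img.dtype)
```

Fixing the image width is one simple way to give the CNN a constant input size. Interpolating between frames or padding shorter gestures would be equally plausible alternatives.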