Abstract
Contrastive learning, an unsupervised technique, has become a prominent method for time series representation learning and a viable response to the scarcity of annotated data. However, the data augmentation applied during training can distort the distribution of the raw data. As a result, the representations learned from augmented data in contrastive learning differ from those obtained through supervised learning, and the trained encoder captures the information in the real data only incompletely. We refer to this disparity between the two sets of learned representations as the data augmentation bias (DAB). To mitigate the influence of DAB, we propose a DAB-aware contrastive learning framework for time series representation (DABaCLT). The framework uses a raw features stream (RFS) to extract features from raw data, which are then combined with augmented data to create positive and negative pairs for DAB-aware contrastive learning. Additionally, we introduce a DAB-minimizing loss function (DABMinLoss) within the contrasting module to minimize the DAB of the extracted temporal and contextual features. We evaluate the proposed method on three time series classification tasks: sleep staging classification (SSC) and epilepsy seizure prediction (ESP), both based on EEG, and human activity recognition (HAR) based on sensor signals. The experimental results show that DABaCLT achieves strong performance in self-supervised time series representation, with accuracy improvements of 0.19% to 22.95% for SSC, 2.96% to 5.05% for HAR, and 1.00% to 2.46% for ESP, and that it performs comparably to the supervised approach. The source code for our framework is open source.