Abstract
This work demonstrates the capability of unsupervised machine learning techniques to detect impending anomalies by extracting hidden trends in fuel economy and emissions datasets for light-duty vehicles (LDVs), which consist of cars and light-duty trucks. The case study used vehicles' fuel economy and emissions testing datasets for vehicle model years 2015 to 2023, with a total of 34,602 data samples on LDVs from major vehicle manufacturers. Three unsupervised techniques were used: principal component analysis (PCA), K-Means clustering, and self-organizing maps (SOM). The results show that some clusters of data exhibit trends not represented by the dataset as a whole. CO vs. Fuel Economy has a negative correlation in the whole dataset (r = -0.355 for LDVs of model year 2022), but a positive correlation in certain sample clusters (e.g., for LDVs of model year 2022, r = +0.62 in a K-Means cluster where the slope is around 0.347 g-CO/mi/MPG). A time series analysis of the clustering results indicates that Test Procedure and Fuel Type, specifically Test Procedure 11 and Fuel Type 26 as defined by the US EPA, could be the contributors to the positive correlation between CO and Fuel Economy. This peculiar CO-vs.-Fuel Economy trend is an impending anomaly, as the use of Fuel 26 in emissions testing with US EPA Test Procedure 11 has been increasing over the years. Because the clustered data samples with positive CO-vs.-Fuel Economy correlation all came from vehicle manufacturers that independently conduct the standard testing procedures, and none came from US EPA testing centers, it was concluded that the chemistry of using Fuel 26 in performing Test Procedure 11 should be re-evaluated by the US EPA.
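The central observation, that a cluster can show a correlation of opposite sign to the dataset as a whole, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data points are synthetic (not taken from the US EPA datasets), the helper names `pearson_r` and `kmeans` are hypothetical, and the small two-cluster K-Means is a plain-Python stand-in rather than the implementation used in the study. Features are left unscaled for brevity, which a real analysis would likely not do.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm on 2-D points with caller-supplied initial centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                updated.append(centroids[k])
        centroids = updated
    return labels


# Synthetic (fuel economy in MPG, CO in g/mi) pairs, invented for illustration.
# Majority group: CO falls as fuel economy rises.
group_a = [
    (float(m), 3.0 - 0.05 * m + 0.05 * ((i % 3) - 1))
    for i, m in enumerate(range(20, 40))
]
# Minority group: high fuel economy, low CO overall, but CO rises with MPG.
group_b = [
    (float(m), 0.1 + 0.05 * (m - 50) + 0.02 * (1 if i % 2 else -1))
    for i, m in enumerate(range(50, 58))
]
points = group_a + group_b

# Deterministic initialisation: the first and the last point.
labels = kmeans(points, [points[0], points[-1]])

overall_r = pearson_r([p[0] for p in points], [p[1] for p in points])
cluster_r = {}
for k in sorted(set(labels)):
    members = [p for p, lab in zip(points, labels) if lab == k]
    cluster_r[k] = pearson_r([m[0] for m in members], [m[1] for m in members])

print("overall r:", round(overall_r, 3))
for k, r in cluster_r.items():
    print("cluster", k, "size", labels.count(k), "r:", round(r, 3))
```

In this synthetic setup the overall correlation is negative while the smaller cluster has a positive one, which mirrors the sign reversal described above without reproducing any of the reported values.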