Future Internet, Vol. 14, Part 5 (2022)
ARTICLE
TITLE

Missing Data Imputation in the Internet of Things Sensor Networks

Benjamin Agbo    
Hussain Al-Aqrabi    
Richard Hill and Tariq Alsboui    

Abstract

The Internet of Things (IoT) has had a tremendous impact on the evolution and adoption of information and communication technology. In the modern world, data are generated by individuals and collected automatically by physical objects that are fitted with electronics, sensors, and network connectivity. IoT sensor networks have become integral to environmental monitoring systems. However, data collected from IoT sensor devices are often incomplete for reasons such as sensor failures, drifts, network faults and other operational issues. Incomplete or missing values can substantially affect the calibration of on-field environmental sensors. The aim of this study is to identify efficient missing data imputation techniques that will ensure accurate calibration of sensors. To achieve this, we propose an efficient and robust technique based on k-means clustering that is capable of selecting the best imputation technique for the missing data. We then evaluate the accuracy of our proposed technique against other techniques and test their effect on various calibration processes for data collected from on-field low-cost environmental sensors in urban air pollution monitoring stations. To test the efficiency of the imputation techniques, we simulated missing data rates of 10% to 40% and also considered missing values occurring over consecutive periods of time (1 day, 1 week and 1 month). Overall, our proposed BFMVI model recorded the best imputation accuracy (RMSE of 0.011758 at 10% missing data and 0.169418 at 40% missing data) compared with the other techniques (k-Nearest Neighbour (kNN), Regression Imputation (RI), Expectation Maximization (EM) and MissForest) when evaluated using different performance indicators. Moreover, the results show a trade-off between imputation accuracy and computational complexity: the benchmark techniques have lower computational complexity than our proposed technique, but at the expense of accuracy.
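The evaluation protocol described in the abstract (remove known values at a chosen rate or over a consecutive period, impute them, and score the result with RMSE) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic hourly signal, the function names, and the two stand-in imputers (mean and linear interpolation). It is not the BFMVI model or any of the benchmark techniques from the study, and it does not reproduce the reported figures.

```python
import math
import random
from bisect import bisect_left


def make_series(n=24 * 60, seed=0):
    """Synthetic hourly 'sensor' signal: daily cycle plus noise (illustrative only)."""
    rng = random.Random(seed)
    return [10 + 5 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, 0.5)
            for t in range(n)]


def mask_random(n, rate, seed=0):
    """Indices removed at random so that a fraction `rate` of the values is missing."""
    rng = random.Random(seed)
    return set(rng.sample(range(n), int(round(rate * n))))


def mask_block(n, start, length):
    """Indices of one consecutive gap, e.g. length=24 for one day of hourly data."""
    return set(range(start, min(start + length, n)))


def impute_mean(values, missing):
    """Stand-in imputer: replace each missing value with the mean of the observed ones."""
    observed = [v for i, v in enumerate(values) if i not in missing]
    mean = sum(observed) / len(observed)
    return [mean if i in missing else v for i, v in enumerate(values)]


def impute_linear(values, missing):
    """Stand-in imputer: linear interpolation between the nearest observed neighbours.

    Gaps touching either end of the series copy the nearest observed value.
    Values at missing positions are never read.
    """
    observed = [i for i in range(len(values)) if i not in missing]
    out = list(values)
    for i in missing:
        pos = bisect_left(observed, i)
        left = observed[pos - 1] if pos > 0 else None
        right = observed[pos] if pos < len(observed) else None
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            w = (i - left) / (right - left)
            out[i] = (1 - w) * values[left] + w * values[right]
    return out


def rmse(truth, imputed, missing):
    """Root-mean-square error computed only over the positions that were removed."""
    return math.sqrt(sum((truth[i] - imputed[i]) ** 2 for i in missing) / len(missing))


series = make_series()
results = {}

# Random missingness at the rates named in the abstract.
for rate in (0.10, 0.20, 0.30, 0.40):
    missing = mask_random(len(series), rate, seed=1)
    results[rate] = {
        "mean": rmse(series, impute_mean(series, missing), missing),
        "linear": rmse(series, impute_linear(series, missing), missing),
    }

# One consecutive gap of a week (168 hourly samples); the start index is arbitrary.
week_gap = mask_block(len(series), start=200, length=24 * 7)
results["1 week"] = {
    "mean": rmse(series, impute_mean(series, week_gap), week_gap),
    "linear": rmse(series, impute_linear(series, week_gap), week_gap),
}

for key, scores in results.items():
    print(key, scores)
```

Scoring only the removed positions keeps the metric focused on imputation quality rather than on values the imputer never had to estimate. Swapping in a different imputer only requires a function with the same `(values, missing)` signature.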


Similar articles

       
 
Christos Tzimopoulos, Kyriakos Papadopoulos, Nikiforos Samarinas, Basil Papadopoulos and Christos Evangelides    
In this work, a novel fuzzy FEM (Finite Element Method) numerical solution describing the recession flow in unconfined aquifers is proposed. In general, recession flow and drainage problems can be described by the nonlinear Boussinesq equation, while th...
Journal: Hydrology

 
Menna Ibrahim Gabr, Yehia Mostafa Helmy and Doaa Saad Elzanfaly    
Data completeness is one of the most common challenges that hinder the performance of data analytics platforms. Different studies have assessed the effect of missing values on different classification models based on a single evaluation metric, namely, a...

 
Xing Su, Wenjie Sun, Chenting Song, Zhi Cai and Limin Guo    
With the rapid development of the economy, car ownership has grown rapidly, which causes many traffic problems. In recent years, intelligent transportation systems have been used to solve various traffic problems. To achieve effective and efficient traff...

 
Li Cai, Cong Sha, Jing He and Shaowen Yao    
Traffic flows (e.g., the traffic of vehicles, passengers, and bikes) aim to reveal traffic flow phenomena generated by traffic participants in traffic activities. Various studies of traffic flows rely heavily on high-quality traffic data. The taxi GPS tr...

 
Hatef Dastour and Quazi K. Hassan    
Having a complete hydrological time series is crucial for water-resources management and modeling. However, this can pose a challenge in data-scarce environments where data gaps are widespread. In such situations, recurring data gaps can lead to unfavora...
Journal: Hydrology