Abstract
Reinforcement learning (RL) has received much attention in recent years for its adaptability to unpredictable events such as harvested energy and workload, especially in the context of edge computing for Internet-of-Things (IoT) nodes. Because IoT nodes have limited resources, achieving such self-adaptability is difficult. This paper studies the poor online reactivity caused by a fixed learning rate in the linear actor-critic (LAC) algorithm for transmission duty-cycle control. We propose LAC-AB, which introduces the adaptive learning-rate method Adam into the actor update of LAC to achieve better adaptability. We also introduce a definition of "convergence" that enables quantitative analysis of convergence. Simulation results using one year of real-life solar irradiance data indicate that, unlike the conventional settings of Adam's two decay rates $\beta_1$ and $\beta_2$, smaller values of $\beta_1$ such as 0.2–0.4 are suitable for power-failure-sensitive applications and 0.5–0.7 for latency-sensitive applications, with $\beta_2 \in [0.1, 0.3]$. Compared with LAC, LAC-AB shortens the reactivity time by 68.5–88.1% in our application; it also fine-tunes the initial learning rate for the initial state and shortens the fine-tuning time by 78.2–84.3%. Moreover, the number of power failures is drastically reduced, to zero or only a few occurrences over 300 simulations.
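For orientation, the abstract's decay rates $\beta_1$ and $\beta_2$ are the standard Adam parameters controlling the exponential moving averages of the gradient and of its square. The sketch below shows one Adam step applied to an actor parameter; the function and variable names are illustrative and the default decay rates are merely taken from the ranges reported above, not from the paper's actual implementation.

```python
import numpy as np

def adam_actor_step(theta, grad, m, v, t, alpha=0.01,
                    beta1=0.3, beta2=0.2, eps=1e-8):
    """One standard Adam update step for an actor parameter.

    The defaults beta1=0.3 and beta2=0.2 fall inside the ranges the abstract
    reports as suitable (beta1 in 0.2-0.4, beta2 in [0.1, 0.3]); they are
    illustrative only, not the paper's exact configuration.
    """
    m = beta1 * m + (1.0 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # second-moment (uncentered variance) estimate
    m_hat = m / (1.0 - beta1 ** t)              # bias correction, t starts at 1
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return theta, m, v
```

Smaller $\beta_1$ and $\beta_2$ make these moving averages forget old gradients faster, which is the mechanism behind the improved reactivity to sudden changes in harvested energy or workload.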