Long-Term Visitation Value for Deep Exploration in Sparse-Reward Reinforcement Learning

Simone Parisi

Davide Tateo

Maximilian Hensel

Carlo D'Eramo

Jan Peters and Joni Pajarinen

Abstract

Reinforcement learning with sparse rewards is still an open challenge. Classic methods rely on feedback from extrinsic rewards to train the agent, and when such feedback occurs very rarely the agent learns slowly or cannot learn at all. Similarly, if the agent also receives rewards that create suboptimal modes of the objective function, it will likely stop exploring prematurely. More recent methods add auxiliary intrinsic rewards to encourage exploration. However, auxiliary rewards lead to a non-stationary target for the Q-function. In this paper, we present a novel approach that (1) plans exploration actions far into the future by using a long-term visitation count, and (2) decouples exploration and exploitation by learning a separate function assessing the exploration value of the actions. Contrary to existing methods that use models of reward and dynamics, our approach is off-policy and model-free. We further propose new tabular environments for benchmarking exploration in reinforcement learning. Empirical results on classic and novel benchmarks show that the proposed approach outperforms existing methods in environments with sparse rewards, especially in the presence of rewards that create suboptimal modes of the objective function. Results also suggest that our approach scales gracefully with the size of the environment.
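The two ideas named in the abstract (a long-term visitation value and a separate exploration function) can be illustrated with a minimal tabular sketch. This is not the authors' algorithm: the chain environment, the count bonus 1/sqrt(n), the optimistic initialization of W, the weight kappa, and every name below are assumptions chosen only for illustration. Q is learned from extrinsic reward alone, W is learned from the count bonus with its own bootstrapped target so that visitation information propagates across states, and the behavior policy acts greedily on Q + kappa * W.

```python
import math
import random


def run_chain(n_states=8, episodes=200, horizon=30, alpha=0.5,
              gamma=0.95, kappa=1.0, seed=0):
    """Hypothetical sparse-reward chain: reward 1 only at the right end."""
    rng = random.Random(seed)
    n_actions = 2  # 0 = move left, 1 = move right
    # Q: exploitation value, trained on extrinsic reward only.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    # W: exploration value, initialized at its upper bound (assumption)
    # so that unvisited state-action pairs look attractive.
    w_max = 1.0 / (1.0 - gamma)
    W = [[w_max] * n_actions for _ in range(n_states)]
    # N: state-action visitation counts.
    N = [[0] * n_actions for _ in range(n_states)]

    def step(s, a):
        s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        done = s2 == n_states - 1
        return s2, (1.0 if done else 0.0), done

    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            # Behavior policy: greedy on the combined score, random ties.
            scores = [Q[s][a] + kappa * W[s][a] for a in range(n_actions)]
            best = max(scores)
            a = rng.choice([i for i in range(n_actions) if scores[i] == best])
            s2, r, done = step(s, a)

            N[s][a] += 1
            bonus = 1.0 / math.sqrt(N[s][a])  # count-based bonus (assumption)

            # Two separate off-policy, model-free TD updates.
            q_next = 0.0 if done else max(Q[s2])
            w_next = 0.0 if done else max(W[s2])
            Q[s][a] += alpha * (r + gamma * q_next - Q[s][a])
            W[s][a] += alpha * (bonus + gamma * w_next - W[s][a])

            s = s2
            if done:
                break
    return Q, W, N


if __name__ == "__main__":
    Q, W, N = run_chain()
    print("visit counts per state:", [sum(row) for row in N])
```

Because the bonus never enters the Q update, the target for Q stays tied to the extrinsic reward only, which is the decoupling the abstract describes; how the paper actually defines and initializes its exploration function is not stated in this record.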

Keywords

reinforcement learning - sparse reward - exploration - upper confidence bound - off-policy

ISSUE

Volume: 15 Issue: 3 (2022)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/a15030081
