Algorithms, Vol. 17, Issue 2 (2024)

Learning State-Specific Action Masks for Reinforcement Learning

Ziyi Wang    
Xinran Li    
Luoyang Sun    
Haifeng Zhang    
Hualin Liu and Jun Wang    

Abstract

Efficient yet sufficient exploration remains a critical challenge in reinforcement learning (RL), especially for Markov Decision Processes (MDPs) with vast action spaces. Previous approaches have commonly involved projecting the original action space into a latent space or employing environmental action masks to reduce the action possibilities. Nevertheless, these methods often lack interpretability or rely on expert knowledge. In this study, we introduce a novel method for automatically reducing the action space in environments with discrete action spaces while preserving interpretability. The proposed approach learns state-specific masks with a dual purpose: (1) eliminating actions with minimal influence on the MDP and (2) aggregating actions with identical behavioral consequences within the MDP. Specifically, we introduce a novel concept called Bisimulation Metrics on Actions by States (BMAS) to quantify the behavioral consequences of actions within the MDP, and we design a dedicated mask model that ensures the learned masks are binary. Crucially, we present a practical learning procedure for training the mask model, leveraging transition data collected by any RL policy. Our method is designed to be plug-and-play and adaptable to all RL policies. To validate its effectiveness, we integrate it into two prominent RL algorithms, DQN and PPO. Experimental results obtained from Maze, Atari, and μRTS2 reveal a substantial acceleration in the RL learning process and noteworthy performance improvements facilitated by the introduced approach.
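
The two purposes of a state-specific mask described above can be illustrated with a small sketch. This is not the BMAS-based mask model from the paper: it assumes a deterministic toy MDP whose transitions are fully known and stored in a dictionary, and it uses exact equality of outcomes in place of a learned metric. The names `build_mask` and `masked_greedy`, and the data layout, are hypothetical.

```python
import math


def build_mask(transitions, state, n_actions):
    """Return a 0/1 mask over actions for `state`.

    `transitions` maps (state, action) -> (next_state, reward) for a
    deterministic toy MDP. This layout is an assumption made for the
    sketch, not something taken from the paper.
    """
    mask = [0] * n_actions
    seen_outcomes = set()
    for action in range(n_actions):
        next_state, reward = transitions[(state, action)]
        # Purpose (1): drop actions that have no effect in this state
        # (state unchanged and zero reward).
        if next_state == state and reward == 0.0:
            continue
        # Purpose (2): actions with an identical outcome are aggregated
        # by keeping only the first one encountered.
        if (next_state, reward) in seen_outcomes:
            continue
        seen_outcomes.add((next_state, reward))
        mask[action] = 1
    # If every action was filtered out, fall back to allowing all of them
    # so the agent can still act.
    if not any(mask):
        return [1] * n_actions
    return mask


def masked_greedy(q_values, mask):
    """Pick the greedy action among the actions the mask allows."""
    scored = [q if m else -math.inf for q, m in zip(q_values, mask)]
    return max(range(len(scored)), key=scored.__getitem__)


# Toy state "s0" with four actions: 0 bumps into a wall, 1 and 2 lead to
# the same place, 3 reaches a rewarding state.
toy_transitions = {
    ("s0", 0): ("s0", 0.0),
    ("s0", 1): ("s1", 0.0),
    ("s0", 2): ("s1", 0.0),
    ("s0", 3): ("s2", 1.0),
}
toy_mask = build_mask(toy_transitions, "s0", 4)
toy_action = masked_greedy([5.0, 0.2, 9.0, 0.1], toy_mask)
```

In this toy case the agent chooses only between actions 1 and 3, so early, inaccurate value estimates for the no-op and the duplicate action cannot attract exploration. The same masking step could sit in front of a DQN argmax or a PPO action distribution, which is the plug-and-play integration the abstract refers to.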
