A Survey on Transformers in Reinforcement Learning
Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Transformers are widely used in NLP and CV, mostly in supervised settings. Transformers are increasingly being used in reinforcement learning, where they face unique design choices and challenges. This paper reviews the motivations for and progress on using Transformers in RL, provides a taxonomy of existing work, and discusses future prospects.

Paper Content

Introduction
Reinforcement learning (RL) is a mathematical formalism for sequential decision-making.
RL can be used to acquire intelligent behaviors automatically.
Deep neural networks can serve as high-capacity function approximators.
Deep reinforcement learning (DRL) has advanced tremendously in recent years.
Sample efficiency remains an issue for DRL in real-world applications.
Inductive bias can be introduced into the DRL framework to improve sample efficiency.
Choosing the architecture of the function approximator is an important form of inductive bias.
Architecture choices in RL have often been motivated by advances in supervised learning (SL).
Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are common choices in DRL.
The Transformer architecture has revolutionized the learning paradigm across SL tasks.
Transformers have been applied to RL to extract relations between entities and to capture multi-step temporal dependencies.
Offline RL has attracted attention because it can leverage large-scale offline datasets.
Transformers can also serve directly as a model for sequential decisions.
Transformer-based architectures often suffer from high computational and memory costs.

Problem scope

Reinforcement learning
RL is formulated as learning in a Markov Decision Process (MDP).
The goal of RL is to learn a policy that maximizes the expected discounted return.
Topics in RL include meta-RL, multi-task RL, and multi-agent RL.
Offline RL does not allow interaction with the environment during training; the agent learns only from a fixed dataset.
Goal-conditioned RL extends the standard RL problem to a goal-augmented setting.
Model-based RL learns an auxiliary dynamics model of the environment.

Transformers
The Transformer is a neural network architecture for modeling sequential data.
Its self-attention mechanism captures dependencies within long sequences.
Inputs are mapped to queries, keys, and values by linear transformations.
The output of a self-attention layer is a weighted sum of all values, with the weights computed from the similarity between queries and keys.
Multi-head attention and residual connections help Transformers learn expressive representations and model long-term interactions.

Combination of Transformers and RL
Transformers can be used as a component of RL algorithms.
Transformers can also be used as a whole sequential decision-maker.

Network architecture in RL
Early progress in network architecture design for RL faced challenges.
Techniques of neural networks (e....
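The RL objective stated in the problem scope, maximizing the expected discounted return, rests on the discounted return of a single trajectory, G = r_0 + γ r_1 + γ² r_2 + …. The minimal Python sketch below computes that quantity and a simple Monte Carlo average over sampled episodes as an estimate of the expectation. The function names, the example rewards, and the value of γ are hypothetical and chosen only for illustration; no policy or environment is modeled here.

```python
from collections.abc import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return G_0 = sum_t gamma**t * r_t for one finite trajectory.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    # Backward recursion G_t = r_t + gamma * G_{t+1}, starting from G_T = 0.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def estimate_expected_return(episodes: Sequence[Sequence[float]], gamma: float) -> float:
    """Monte Carlo estimate: average discounted return over sampled episodes.

    Assumes the episodes were collected by running the policy being evaluated.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(discounted_return(ep, gamma) for ep in episodes) / len(episodes)


if __name__ == "__main__":
    # Three rewards of 1.0 with gamma = 0.5 give 1 + 0.5 + 0.25.
    print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1.75
```

The backward recursion is used instead of summing powers of γ because it visits each reward once and directly yields the return from every timestep if the intermediate values of g are kept.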
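The self-attention computation summarized under Transformers (inputs projected to queries, keys, and values; output as a weighted sum of all values) can be sketched in plain Python for a single head. This sketch assumes the standard scaled dot-product form, softmax(QKᵀ/√d_k)V; the weight matrices, dimensions, and function names are hypothetical, and multi-head attention, masking, and residual connections are deliberately left out.

```python
import math
from collections.abc import Sequence

Matrix = list[list[float]]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention (hypothetical sketch).

    x: (n x d_model) input sequence, one row per token.
    w_q, w_k: (d_model x d_k) projections; w_v: (d_model x d_v) projection.
    Returns (output, weights): output is (n x d_v), weights is (n x n).
    """
    # Linear maps from inputs to queries, keys, and values.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_q[0])
    # Similarity of query i to key j, scaled by sqrt(d_k).
    scores = [[sum(qi[t] * kj[t] for t in range(d_k)) / math.sqrt(d_k) for kj in k]
              for qi in q]
    # Each row of weights is a probability distribution over all positions.
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted sum of all value rows.
    output = matmul(weights, v)
    return output, weights


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    eye = [[1.0, 0.0], [0.0, 1.0]]
    out, w = self_attention(x, eye, eye, eye)
    print(w[0])
    print(out[0])
```

Because every token attends to every other token in one step, the dependency between two distant positions is modeled directly rather than passed through a recurrent state; the cost is an n x n weight matrix, which is one source of the computational and memory costs the survey mentions.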