Awesome Reinforcement Learning
A curated list of resources dedicated to reinforcement learning.
We have pages for other topics: awesome-rnn, awesome-deep-vision, awesome-random-forest
Maintainers: Hyunsoo Kim, Jiwon Kim
We are looking for more contributors and maintainers!
Contributing
Please feel free to submit pull requests.
Table of Contents
Codes
 Codes for examples and exercises in Richard Sutton and Andrew Barto's Reinforcement Learning: An Introduction
 Simulation code for Reinforcement Learning Control Problems
 MATLAB Environment and GUI for Reinforcement Learning
 Reinforcement Learning Repository - University of Massachusetts, Amherst
 Brown-UMBC Reinforcement Learning and Planning Library (Java)
 Reinforcement Learning in R (MDP, Value Iteration)
 Reinforcement Learning Environment in Python and MATLAB
 RL-Glue (standard interface for RL) and RL-Glue Library
 PyBrain Library - Python-Based Reinforcement Learning, Artificial Intelligence, and Neural Network Library
 Maja - Machine learning framework for problems in Reinforcement Learning in Python
 TeachingBox - Java-based Reinforcement Learning framework
 Implementation of RL algorithms in Python/C++
 Policy Gradient Reinforcement Learning Toolbox for MATLAB
 PIQLE - Platform Implementing Q-LEarning and other RL algorithms
 BeliefBox - Bayesian reinforcement learning library and toolkit
 Deep Q-Learning with TensorFlow - A deep Q-learning demonstration using Google TensorFlow
Theory
Lectures
 [UCL] COMPM050/COMPGI13 Reinforcement Learning by David Silver
 [UC Berkeley] CS188 Artificial Intelligence by Pieter Abbeel
 [Udacity (Georgia Tech.)] Machine Learning 3: Reinforcement Learning (CS7641)
 [Stanford] CS229 Machine Learning - Lecture 16: Reinforcement Learning by Andrew Ng
Books
 Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction [Book] [Code]
 Csaba Szepesvari, Algorithms for Reinforcement Learning [Book]
 David Poole and Alan Mackworth, Artificial Intelligence: Foundations of Computational Agents [Book Chapter]
 Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming [Book (Amazon)] [Summary]
 Mykel J. Kochenderfer, Decision Making Under Uncertainty: Theory and Application [Book (Amazon)]
Surveys
 Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, Reinforcement Learning: A Survey, JAIR, 1996. [Paper]
 S. S. Keerthi and B. Ravindran, A Tutorial Survey of Reinforcement Learning, Sadhana, 1994. [Paper]
 Jens Kober, J. Andrew Bagnell, Jan Peters, Reinforcement Learning in Robotics, A Survey, IJRR, 2013. [Paper]
 Littman, Michael L. "Reinforcement learning improves behaviour from evaluative feedback." Nature 521.7553 (2015): 445-451. [Paper]
 Marc P. Deisenroth, Gerhard Neumann, Jan Peters, A Survey on Policy Search for Robotics, Foundations and Trends in Robotics, 2014. [Book]
Papers / Thesis

Foundational Papers
 Marvin Minsky, Steps toward Artificial Intelligence, Proceedings of the IRE, 1961. [Paper]
 discusses issues in RL such as the "credit assignment problem"
 Ian H. Witten, An Adaptive Optimal Controller for Discrete-Time Markov Environments, Information and Control, 1977. [Paper]
 earliest publication on the temporal-difference (TD) learning rule.

Methods
 Dynamic Programming (DP):
 Christopher J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989. [Thesis]
 Monte Carlo:
 Temporal-Difference:
 Richard S. Sutton, Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44, 1988. [Paper]
 Q-Learning (Off-policy TD algorithm):
 Chris Watkins, Learning from Delayed Rewards, Cambridge, 1989. [Thesis]
 Sarsa (On-policy TD algorithm):
 R-Learning (learning of relative values)
 Andrew Schwartz, A Reinforcement Learning Method for Maximizing Undiscounted Rewards, ICML, 1993. [Paper-Google Scholar]
 Function Approximation methods (Least-Squares Temporal Difference, Least-Squares Policy Iteration)
 Policy Search / Policy Gradient
 Richard Sutton, David McAllester, Satinder Singh, Yishay Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS, 1999. [Paper]
 Jan Peters, Sethu Vijayakumar, Stefan Schaal, Natural Actor-Critic, ECML, 2005. [Paper]
 Jens Kober, Jan Peters, Policy Search for Motor Primitives in Robotics, NIPS, 2009. [Paper]
 Jan Peters, Katharina Mulling, Yasemin Altun, Relative Entropy Policy Search, AAAI, 2010. [Paper]
 Freek Stulp, Olivier Sigaud, Path Integral Policy Improvement with Covariance Matrix Adaptation, ICML, 2012. [Paper]
 Nate Kohl, Peter Stone, Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, ICRA, 2004. [Paper]
 Marc Deisenroth, Carl Rasmussen, PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, 2011. [Paper]
 Scott Kuindersma, Roderic Grupen, Andrew Barto, Learning Dynamic Arm Motions for Postural Recovery, Humanoids, 2011. [Paper]
 Hierarchical RL
 Deep Learning + Reinforcement Learning (A sample of recent works on DL+RL)
 V. Mnih, et al., Human-level Control through Deep Reinforcement Learning, Nature, 2015. [Paper]
 Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, Xiaoshi Wang, Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, NIPS, 2014. [Paper]
 Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel, End-to-End Training of Deep Visuomotor Policies. ArXiv, 16 Oct 2015. [ArXiv]
 Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, Prioritized Experience Replay, ArXiv, 18 Nov 2015. [ArXiv]
 Hado van Hasselt, Arthur Guez, David Silver, Deep Reinforcement Learning with Double Q-Learning, ArXiv, 22 Sep 2015. [ArXiv]
Applications
Game Playing

Traditional Games

Computer Games
 Human-level Control through Deep Reinforcement Learning (Mnih, Nature 2015) [Paper] [Code] [Video]
 Flappy Bird Reinforcement Learning [Video]
 MarI/O - learning to play Mario with evolutionary reinforcement learning using artificial neural networks (Stanley, Evolutionary Computation 2002) [Paper] [Video]
Robotics
 Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion (Kohl, ICRA 2004) [Paper]
 Robot Motor Skill Coordination with EM-based Reinforcement Learning (Kormushev, IROS 2010) [Paper] [Video]
 Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (Hester, ICRA 2010) [Paper] [Video]
 Autonomous Skill Acquisition on a Mobile Manipulator (Konidaris, AAAI 2011) [Paper] [Video]
 PILCO: A Model-Based and Data-Efficient Approach to Policy Search (Deisenroth, ICML 2011) [Paper]
 Incremental Semantically Grounded Learning from Demonstration (Niekum, RSS 2013) [Paper]
 Efficient Reinforcement Learning for Robots using Informative Simulated Priors (Cutler, ICRA 2015) [Paper] [Video]
Control
 An Application of Reinforcement Learning to Aerobatic Helicopter Flight (Abbeel, NIPS 2006) [Paper] [Video]
 Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods (Bagnell, ICRA 2001) [Paper]
Operations Research
 Scaling Average-reward Reinforcement Learning for Product Delivery (Proper, AAAI 2004) [Paper]
 Cross Channel Optimized Marketing by Reinforcement Learning (Abe, KDD 2004) [Paper]
Human Computer Interaction
 Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System (Singh, JAIR 2002) [Paper]
Tutorials / Websites
 Mance Harmon and Stephanie Harmon, Reinforcement Learning: A Tutorial
 Short introduction to some Reinforcement Learning algorithms
 C. Igel, M.A. Riedmiller, et al., Reinforcement Learning in a Nutshell, ESANN, 2007. [Paper]
 UNSW - Reinforcement Learning
 ROS Reinforcement Learning Tutorial
 POMDP for Dummies
 Scholarpedia articles on:
 Repository with useful MATLAB Software, presentations, and demo videos
 Bibliography on Reinforcement Learning
 UC Berkeley - CS 294: Deep Reinforcement Learning, Fall 2015 (John Schulman, Pieter Abbeel) [Class Website]
 Blog posts on Reinforcement Learning, Parts 1-4 by Travis DeWolf
Online Demos
 Real-world demonstrations of Reinforcement Learning
 Deep Q-Learning Demo - A deep Q-learning demonstration using ConvNetJS
 Deep Q-Learning with TensorFlow - A deep Q-learning demonstration using Google TensorFlow