Rl-Agents

SIRL - Space Invader RL

Q-learning is a simple yet quite powerful algorithm to create a cheat sheet for our agent. This helps the agent figure out exactly which action to perform.

But what if this cheat sheet is too long? Imagine an environment with 10,000 states and 1,000 actions per state. That would create a table of 10 million cells, and things would quickly get out of control!

It is pretty clear that we can’t infer the Q-value of new states from already explored states. This presents two problems:

First, the amount of memory required to save and update that table grows as the number of states increases. Second, the amount of time required to explore each state and build the required Q-table would be unrealistic.
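To put rough numbers on the memory problem, here is a tiny NumPy sketch using the state and action counts from the example above; the counts are illustrative, not taken from Space Invaders itself.

```python
import numpy as np

# Hypothetical tabular setup matching the example above:
# 10,000 states x 1,000 actions, one float per state-action pair.
n_states, n_actions = 10_000, 1_000

q_table = np.zeros((n_states, n_actions), dtype=np.float64)

print(q_table.size)          # 10,000,000 cells
print(q_table.nbytes / 1e6)  # ~80 MB, just to store zeros for this toy example
```

A pixel-based game like Space Invaders has vastly more states than this, which is why the project approximates the Q-function with a neural network instead of a table.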

Model Summary

(model summary figure)

BO - Break Out RL

A longstanding goal of artificial intelligence is the development of algorithms capable of general competency in a variety of tasks and domains without the need for domain-specific tailoring. To this end, different theoretical frameworks have been proposed to formalize the notion of “big” artificial intelligence (e.g., Russell, 1997; Hutter, 2005; Legg, 2008). Similar ideas have been developed around the theme of lifelong learning: learning a reusable, high-level understanding of the world from raw sensory data (Thrun & Mitchell, 1995; Pierce & Kuipers, 1997; Stober & Kuipers, 2008; Sutton et al., 2011). The growing interest in competitions such as the General Game Playing competition (Genesereth, Love, & Pell, 2005), the Reinforcement Learning competition (Whiteson, Tanner, & White, 2010), and the International Planning competition (Coles et al., 2012) also suggests the artificial intelligence community’s desire for the emergence of algorithms that provide general competency.

Model Summary

(model summary figure)

Autonomous Taxi - Numpy Q-learning from Scratch

A Q-table is just a fancy name for a simple lookup table in which we store the maximum expected future reward for each action at each state. Basically, this table guides us to the best action at each state.

Each Q-table entry is the maximum expected future reward the agent will get if it takes that action in that state. This is an iterative process: the Q-table improves with every update.

But the questions are: how do we calculate the values of the Q-table, and how do we update them?

(Q-learning algorithm figure)
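As a rough sketch of that loop, the snippet below implements tabular Q-learning with NumPy. It assumes the Gym Taxi-v3 environment and the pre-0.26 Gym API (where reset() returns the state directly); the hyperparameters are illustrative, not the exact values used in this project.

```python
import numpy as np
import gym  # assumes gym < 0.26, where reset() returns the state and step() returns 4 values

env = gym.make("Taxi-v3")
n_states, n_actions = env.observation_space.n, env.action_space.n

q_table = np.zeros((n_states, n_actions))  # one row per state, one column per action
alpha, gamma, epsilon = 0.1, 0.99, 0.1     # learning rate, discount factor, exploration rate

for episode in range(5000):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: explore occasionally, otherwise take the best known action.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))

        next_state, reward, done, info = env.step(action)

        # Q-learning update: move Q(s, a) toward the bootstrapped target.
        target = reward + gamma * np.max(q_table[next_state])
        q_table[state, action] += alpha * (target - q_table[state, action])

        state = next_state
```

After training, acting greedily with respect to q_table (always picking np.argmax(q_table[state])) gives the learned policy.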

FlappyDQN - Deeper Q-Learning

So, what are the steps involved in reinforcement learning using deep Q-learning networks (DQNs)?

1. All the past experience is stored by the agent in a replay memory.
2. The next action is determined by the maximum output of the Q-network.
3. The loss function is the mean squared error between the predicted Q-value and the target Q-value, Q*. This is basically a regression problem. However, we do not know the target (actual) value here, as we are dealing with a reinforcement learning problem. Going back to the Q-value update equation derived from the Bellman equation, we have:

Q(S_t, A_t) ← Q(S_t, A_t) + α [ R_{t+1} + γ max_a Q(S_{t+1}, a) − Q(S_t, A_t) ]

The term R_{t+1} + γ max_a Q(S_{t+1}, a) represents the target. We can argue that the network is predicting its own value, but since R is the unbiased true reward, the network updates its gradients using backpropagation and eventually converges.
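To make those steps concrete, here is a minimal sketch of a DQN training step with experience replay, written with tf.keras. The network shape, buffer size, and hyperparameters are illustrative assumptions, not the exact setup used in FlappyDQN (whose input is a stack of game frames fed to a convolutional network).

```python
import random
from collections import deque

import numpy as np
from tensorflow import keras

STATE_DIM, N_ACTIONS, GAMMA = 8, 2, 0.99  # illustrative sizes only

# Q-network: maps a state to one Q-value per action (regression, MSE loss).
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(STATE_DIM,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(N_ACTIONS, activation="linear"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")

replay_memory = deque(maxlen=50_000)  # stores (state, action, reward, next_state, done)

def act(state, epsilon=0.1):
    """Epsilon-greedy: the next action comes from the maximum output of the Q-network."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    q_values = model.predict(state[None, :], verbose=0)[0]
    return int(np.argmax(q_values))

def train_step(batch_size=32):
    """One gradient step on a random minibatch sampled from replay memory."""
    if len(replay_memory) < batch_size:
        return
    batch = random.sample(replay_memory, batch_size)
    states = np.array([s for s, _, _, _, _ in batch])
    next_states = np.array([s2 for _, _, _, s2, _ in batch])

    q_current = model.predict(states, verbose=0)
    q_next = model.predict(next_states, verbose=0)

    for i, (_, action, reward, _, done) in enumerate(batch):
        # Target: R + gamma * max_a Q(s', a), or just R at the end of an episode.
        target = reward if done else reward + GAMMA * np.max(q_next[i])
        q_current[i, action] = target

    # Regress the predicted Q-values toward the bootstrapped targets (mean squared error).
    model.fit(states, q_current, epochs=1, verbose=0)
```

In a full agent, each environment step appends a (state, action, reward, next_state, done) tuple to replay_memory and then calls train_step, so the network learns from a decorrelated sample of past experience rather than only the latest transition.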

Model Summary

dqn