
Building a video game bot

Let's learn how to build a video game bot that plays a car racing game. Our objective is for the car to move forward without getting stuck on obstacles or hitting other cars.

First, we import the necessary libraries:

import gym
import universe # register universe environment
import random

Then we create our car racing environment using the make function and reset it to fetch the initial observation:

env = gym.make('flashgames.NeonRace-v0')
env.configure(remotes=1) # automatically creates a local Docker container
observation_n = env.reset()

Let's create the variables for moving the car:

# Move left
left = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowLeft', True),
        ('KeyEvent', 'ArrowRight', False)]

# Move right
right = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowLeft', False),
         ('KeyEvent', 'ArrowRight', True)]

# Move forward
forward = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowRight', False),
           ('KeyEvent', 'ArrowLeft', False), ('KeyEvent', 'n', True)]
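
Each action in Universe is simply a list of VNC events: a ('KeyEvent', key, pressed) tuple holds the key down when pressed is True and releases it when False. As an illustration only (brake_left is a hypothetical action, not used in this chapter), any key combination can be defined the same way:

# Hypothetical example: brake while steering left.
# ('KeyEvent', key, True) holds the key down; False releases it.
brake_left = [('KeyEvent', 'ArrowDown', True),
              ('KeyEvent', 'ArrowLeft', True),
              ('KeyEvent', 'ArrowUp', False),
              ('KeyEvent', 'ArrowRight', False)]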

We will initialize some other variables:

# We use the turn variable to decide whether to turn or not
turn = 0

# We store all the rewards in the rewards list
rewards = []

# We will use buffer_size as a threshold
buffer_size = 100

# We initially set the action to forward, which just moves the car
# forward without any turn
action = forward

Now, let's allow our game agent to play in an infinite loop that continuously performs an action based on its interaction with the environment:

while True:
    turn -= 1
    # Initially we take no turn and just move forward.
    # If turn is less than or equal to 0, there is no need
    # to turn and we keep moving forward.
    if turn <= 0:
        action = forward
        turn = 0

Then we use env.step() to perform an action (moving forward for now) for a single time step:

    action_n = [action for ob in observation_n]
    observation_n, reward_n, done_n, info = env.step(action_n)

For each time step, we record the results in the observation_n, reward_n, done_n, and info variables:

  • observation_n: The state of the car
  • reward_n: The reward gained by the previous action; for example, a positive reward if the car successfully moves forward without getting stuck on obstacles
  • done_n: A Boolean; it will be set to True if the game is over
  • info: Used for debugging purposes
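
One practical caveat: while the Docker container is still starting up, Universe typically returns None entries in observation_n. A minimal sketch of guarding against this before acting on an observation (based on Universe's asynchronous startup behaviour, so treat it as an assumption):

# Skip frames until the remote environment is ready;
# Universe returns [None] observations while the container boots.
while observation_n[0] is None:
    observation_n, reward_n, done_n, info = env.step([forward])
    env.render()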

Obviously, the agent (car) cannot simply move forward throughout the game; it has to turn in order to avoid obstacles and other vehicles. But it has to determine whether it should take a turn and, if so, in which direction it should turn.

First, we will calculate the mean of the rewards we have obtained so far; if it is 0, it is clear that we got stuck somewhere while moving forward and we need to take a turn. But in which direction should we turn? Do you recollect the policy functions we studied in Chapter 1, Introduction to Reinforcement Learning?

Referring to the same concept, we have two policies here: one is turning left and the other is turning right. We will follow a random policy here, calculate the reward, and improve upon that.

We will generate a random number; if it is less than 0.5, we will turn right, otherwise we will turn left. At every step we append the current reward to the rewards list, so once the list holds buffer_size entries we can check its mean. Later, we will update our rewards and, based on them, learn which direction is best:

    rewards += [reward_n[0]]

    if len(rewards) >= buffer_size:
        mean = sum(rewards)/len(rewards)

        if mean == 0:
            turn = 20
            if random.random() < 0.5:
                action = right
            else:
                action = left

        rewards = []
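
As a design note, the rewards list here is filled up and then cleared each time it reaches buffer_size; a sliding window with collections.deque would achieve a similar effect without manual clearing. This is a variation of my own, not the chapter's code:

from collections import deque

rewards = deque(maxlen=buffer_size) # old entries are dropped automatically

# Inside the loop, instead of rewards += [reward_n[0]] and rewards = []:
rewards.append(reward_n[0])
if len(rewards) == buffer_size and sum(rewards) == 0:
    turn = 20
    action = right if random.random() < 0.5 else left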

Then, at every time step, we render the game screen using env.render():

    env.render()

The complete code is as follows:

import gym
import universe # register universe environment
import random

env = gym.make('flashgames.NeonRace-v0')
env.configure(remotes=1) # automatically creates a local Docker container
observation_n = env.reset()

## Declare actions
# Move left
left = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowLeft', True),
        ('KeyEvent', 'ArrowRight', False)]

# Move right
right = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowLeft', False),
         ('KeyEvent', 'ArrowRight', True)]

# Move forward
forward = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowRight', False),
           ('KeyEvent', 'ArrowLeft', False), ('KeyEvent', 'n', True)]

# Determine whether to turn or not
turn = 0

# Store rewards in a list
rewards = []

# Use buffer_size as a threshold
buffer_size = 100

# Set the initial action to forward
action = forward

while True:
    turn -= 1
    if turn <= 0:
        action = forward
        turn = 0

    action_n = [action for ob in observation_n]
    observation_n, reward_n, done_n, info = env.step(action_n)
    rewards += [reward_n[0]]

    if len(rewards) >= buffer_size:
        mean = sum(rewards)/len(rewards)

        if mean == 0:
            turn = 20
            if random.random() < 0.5:
                action = right
            else:
                action = left

        rewards = []

    env.render()

If you run the program, you can see how the car learns to move without getting stuck or hitting other vehicles.
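
Since the loop above runs forever, you may also want to stop the bot cleanly. A minimal sketch (assuming you run the script from a terminal) is to catch KeyboardInterrupt and close the environment, which also shuts down the local container:

try:
    while True:
        # ... the loop body shown above ...
        pass
except KeyboardInterrupt:
    env.close() # stop the environment and its Docker container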