Build a Reinforcement Learning Terran Agent with the PySC2 2.0 Framework

USEFUL LINKS FOR THE TUTORIAL

BEFORE STARTING

Throughout this tutorial, you will build a smart Terran agent capable of learning a better strategy over time, through a reward system based on the actions it takes and the states that result from them.

To follow along, you will need:

  • The pandas Python library
  • The NumPy Python library
  • Basic Python programming skills

1.- IMPORTS

Import the libraries at the top of the file as follows:
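A sketch of the import block is below. The module paths follow the pysc2 2.0 package layout; the PySC2 imports are wrapped in a try/except only so this fragment can be read and run without StarCraft II installed — in your actual agent file, drop the guard and let any ImportError surface.

```python
import random

import numpy as np
import pandas as pd

try:
    # PySC2 2.0 imports used throughout the tutorial
    # (install the framework with `pip install pysc2`)
    from absl import app
    from pysc2.agents import base_agent
    from pysc2.env import run_loop, sc2_env
    from pysc2.lib import actions, features, units
except ImportError:
    # Guard only so this sketch runs without the game installed;
    # remove it in the real agent file.
    pass
```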

2.- CREATE A QTABLE FOR REINFORCEMENT LEARNING

We must define the algorithm for our machine learning agent, and this is where the QLearningTable comes into action: it is a simplified form of reinforcement learning.

Choose Action Method

The main method of the learning table chooses the action to perform. The e_greedy parameter means it will choose the preferred (highest-valued) action 90% of the time, and a random action the remaining 10% of the time, in order to explore other possibilities.
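The idea can be sketched as a standalone function; the dict-based q_values here is just a stand-in for one row of the agent's Q-table.

```python
import numpy as np

def choose_action(q_values, actions, e_greedy=0.9):
    """Epsilon-greedy selection: exploit 90% of the time, explore 10%.

    q_values is a dict mapping each action name to its current Q estimate.
    """
    if np.random.uniform() < e_greedy:
        # exploit: pick the best-known action, breaking ties at random
        best = max(q_values[a] for a in actions)
        candidates = [a for a in actions if q_values[a] == best]
        return np.random.choice(candidates)
    # explore: ignore the table and pick any action at random
    return np.random.choice(actions)
```

Raising e_greedy toward 1.0 makes the agent exploit its table more; lowering it makes it explore more.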

Learn Method

The next important method here is learn. It takes as parameters:

  • s is the state the bot was in before taking the action
  • a is the action that was performed in that state
  • r is the reward that was received after taking the action
  • s_ is the state the bot landed in after taking the action
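Putting the two methods together, a minimal QLearningTable can be sketched as follows. The hyperparameter names and defaults (learning_rate, reward_decay, e_greedy) are conventional Q-learning choices rather than anything PySC2 mandates, and the 'terminal' sentinel marks the end of an episode.

```python
import numpy as np
import pandas as pd

class QLearningTable:
    def __init__(self, actions, learning_rate=0.01,
                 reward_decay=0.9, e_greedy=0.9):
        self.actions = actions
        self.lr = learning_rate
        self.gamma = reward_decay
        self.epsilon = e_greedy
        # one row per state seen so far, one column per action
        self.q_table = pd.DataFrame(columns=actions, dtype=np.float64)

    def check_state_exist(self, state):
        # lazily add unseen states with all-zero Q values
        if state not in self.q_table.index:
            self.q_table.loc[state] = [0.0] * len(self.actions)

    def choose_action(self, state):
        self.check_state_exist(state)
        if np.random.uniform() < self.epsilon:
            q_values = self.q_table.loc[state]
            # exploit: best-known action, breaking ties at random
            return np.random.choice(
                q_values[q_values == q_values.max()].index)
        # explore: any action at random
        return np.random.choice(self.actions)

    def learn(self, s, a, r, s_):
        self.check_state_exist(s)
        self.check_state_exist(s_)
        q_predict = self.q_table.loc[s, a]
        if s_ != 'terminal':
            # Q-learning target: reward plus discounted best future value
            q_target = r + self.gamma * self.q_table.loc[s_].max()
        else:
            q_target = r  # no future value once the episode has ended
        self.q_table.loc[s, a] += self.lr * (q_target - q_predict)
```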

3.- DEFINE A BASE AGENT

Then we must define a base agent: an agent that both our random and smart agents will build on. It contains all of the actions that can be performed, plus a few other methods both agents can share to keep them a little simpler.

Helper Functions

To perform these actions, the bot will need the helper functions described below:

  • Return a specific set of our units (applies to both buildings and troops)
  • Return only the units that are finished, not the ones still under construction (applies to both buildings and troops)
  • Calculate the distances between a list of units and a specified point

Specific Actions

And finally the actions that can be done are:

  • A method that sends an idle SCV back to a mineral patch
  • Methods that construct buildings (for further, deeper reinforcement learning you can consider adding more building types)
  • Methods that train an army of marines and send them to attack
  • A step method that works out where our base is placed, plus a do_nothing action (no operation to perform)

4.- RANDOM AGENT

We choose an action at random from our predefined list, then use Python's getattr, which essentially converts the action name into a method call, passing in the observation as an argument.
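A runnable sketch of that dispatch is below; the two stub methods stand in for the base agent's real action methods.

```python
import random

class RandomAgent:
    """Picks one of the base agent's actions uniformly at random each step."""
    actions = ("do_nothing", "attack")  # in the tutorial this list is inherited

    # stand-ins for the base agent's action methods
    def do_nothing(self, obs):
        return "no_op"

    def attack(self, obs):
        return "attack_order"

    def step(self, obs):
        action = random.choice(self.actions)
        # getattr turns the action name into the bound method of the same
        # name, which we then call with the current observation
        return getattr(self, action)(obs)
```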

5.- SMART AGENT

The Smart Agent is like the random agent, but with the machine learning pieces added: this is where we initialize the QLearningTable once the agent is created. It receives the actions of the Base Agent, which is how the QLearningTable knows which actions it can choose from and then perform.

New Game Method

Here we start a new game once the current one is finished, by simply resetting a few values, such as where the base is and the previous state and action.
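Together, the constructor and this reset can be sketched as follows; the QLearningTable is stubbed to keep the fragment self-contained, and the action list is an illustrative subset of the base agent's.

```python
class QLearningTable:
    """Minimal stub so this sketch runs standalone; see section 2 for the
    full version used by the real agent."""
    def __init__(self, actions):
        self.actions = actions

class SmartAgent:
    # action names exposed by the base agent (illustrative subset)
    actions = ("do_nothing", "harvest_minerals", "build_supply_depot",
               "build_barracks", "train_marine", "attack")

    def __init__(self):
        # hand the base agent's action names to the Q-table so it knows
        # what it may choose from
        self.qtable = QLearningTable(self.actions)
        self.new_game()

    def new_game(self):
        # forget everything episode-specific when a new match starts
        self.base_top_left = None
        self.previous_state = None
        self.previous_action = None
```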

Get State Method

This method essentially takes all the values of the game that we find useful and important (for example, how many barracks, supply depots, or idle SCVs we have) and returns them as a tuple that can be fed into our machine learning algorithm, so it knows the current state of the game at any given point.

If you want to add more types of units or buildings and store their values, check the links at the top to find the related functions.
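A trimmed-down sketch of the idea is below. In the real agent each count comes from the helper functions of section 3 (for example, the length of the list returned for marines), and the full tutorial state also tracks enemy units; the dict of counts here is a stand-in for those calls.

```python
def get_state(obs_counts):
    """Compress the observation into a hashable tuple the Q-table can index.

    obs_counts is a dict of the counts we care about; in the real agent
    these come from the helper functions of section 3.
    """
    return (
        obs_counts.get("command_centers", 0),
        obs_counts.get("scvs", 0),
        obs_counts.get("idle_scvs", 0),
        obs_counts.get("supply_depots", 0),
        obs_counts.get("barracks", 0),
        obs_counts.get("marines", 0),
        obs_counts.get("free_supply", 0),
    )
```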

Step Action Method

It gets the current state of the game and chooses an action: it feeds the state into the QLearningTable, which picks either the best-known action or a random one, and finally returns it.

6.- MAIN METHOD

At the end we have the method that runs the game so we can watch what happens in real time. Here we create our SmartAgent and our RandomAgent, set them as the players, and pass both into the run loop so it controls the two agents instead of one. Once it starts, two windows will open, one for each agent.
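A sketch of this runner is below. The environment settings (the Simple64 map, raw action space, step_mul, raw resolution) are common choices rather than required values; SmartAgent and RandomAgent are the classes defined above. The PySC2 imports sit inside main here only so the fragment parses without the game installed — in the real script they belong at the top of the file.

```python
def main(unused_argv):
    # In the real script these imports live at the top of the file; they
    # are inside main here only so this sketch parses without PySC2.
    from pysc2.env import run_loop, sc2_env
    from pysc2.lib import actions, features

    agent1 = SmartAgent()
    agent2 = RandomAgent()
    try:
        with sc2_env.SC2Env(
                map_name="Simple64",
                players=[sc2_env.Agent(sc2_env.Race.terran),
                         sc2_env.Agent(sc2_env.Race.terran)],
                agent_interface_format=features.AgentInterfaceFormat(
                    action_space=actions.ActionSpace.RAW,
                    use_raw_units=True,
                    raw_resolution=64),
                step_mul=48,
                disable_fog=True) as env:
            # drive both agents until the episode cap or Ctrl+C
            run_loop.run_loop([agent1, agent2], env, max_episodes=1000)
    except KeyboardInterrupt:
        pass
```

In the full script you would finish with an `if __name__ == "__main__": app.run(main)` block using absl's app module, as imported in section 1.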

CONCLUSIONS

When you initially run this, both agents will do much the same thing: random actions such as training a few marines and attacking, building at random, or other odd behavior. This is simply because the smart agent is not smart enough yet; it needs more games to learn from before its Q-table starts favoring better actions.

References