Let’s make an AI that destroys video games: Crash Course AI #13

Jabril: John-Green-bot, are you serious?! I made this game and you beat my high score?
John-Green-bot: Pizza!
Jabril: So John-Green-bot is pretty good at Pizza Jump, but what about this new game we made, TrashBlaster?
John-Green-bot: Hey, that's me!
Jabril: Yeah, let's see what you've got.
John-Green-bot: That's not fair, Jabril!!
Jabril: It's okay, John-Green-bot, we've got you covered. Today we're gonna design and build an AI program to help you play this game like a pro.

INTRO

Hey, I'm Jabril and welcome to Crash Course
AI! Last time, we talked about some of the ways
that AI systems learn to play games. I’ve been playing video games for as long
as I can remember. They’re fun, challenging, and tell interesting
stories where the player gets to jump on goombas or build cities or cross the road or flap
a bird. But games are also a great way to test AI
techniques because they usually involve simpler worlds than the one we live in. Plus, games involve things that humans are
often pretty good at like strategy, planning, coordination, deception, reflexes, and intuition. Recently, AIs have become good at some tough
games, like Go or Starcraft II. So our goal today is to build an AI to play
a video game that our writing team and friends at Thought Cafe designed called TrashBlaster! The player’s goal in TrashBlaster is to
swim through the ocean as a little virtual John-Green-bot, and destroy pieces of trash. But we have to be careful, because if John-Green-bot
touches a piece of trash, then he loses and the game restarts. Like in previous labs, we’ll be writing
all of our code using a language called Python in a tool called Google Colaboratory. And as you watch this video, you can follow
along with the code in your browser from the link we put in the description. In these Colaboratory files, there’s some
regular text explaining what we’re trying to do, and pieces of code that you can run
by pushing the play button. These pieces of code build on each other,
so keep in mind that we have to run them in order from top to bottom, otherwise we might
get an error. To actually run the code and experiment with
changing it, you’ll have to either click “open in playground” at the top of the
page or open the File menu and click "Save a Copy to Drive". And just an FYI: you'll need a Google account
for this. So to create this game-playing AI system,
first, we need to build the game and set up everything like the rules and graphics. Second, we’ll need to think about how to
create a TrashBlaster AI model that can play the game and learn to get better. And third, we’ll need to train the model
and evaluate how well it works. Without a game, we can’t do anything. So we’ve got to start by generating all
the pieces of one. To start, we’re going to need to fill up
our toolbox by importing some helpful libraries, such as PyGame. Steps 1.1 and 1.2 load the libraries, and step 1.3 saves the game so we can watch it later. This might take a second to download. The basic building blocks of any game are
different objects that interact with each other. There’s usually something or someone the
player controls and enemies that you battle — All these objects and their interactions
with one another need to be defined in the code. So to make TrashBlaster, we need to define
three objects and what they do: a blaster, a hero, and trash to destroy. The blaster is what actually destroys the
trash, so we’re going to load an image that looks like a laser-ball and set
some properties. How far does it go, what direction does it
fly, and what happens to the blast when it hits a piece of trash? Our hero is John-Green-bot, so now we’ve
got to load his image, and define properties like how fast he can swim and how a blast appears when he uses his blaster. And we need to load an image for the trash pieces, and then code how they move and what happens if they get hit by a
blast, like, for example, total destruction or splitting into 2 smaller pieces. Finally, all these objects are floating in the ocean, so we need a piece of code to generate the background. The shape of this game's ocean is toroidal, which means it wraps around: if any object flies off the screen to the right, then it will immediately appear on the far left side. Every game needs some way to track how the player's doing, so we'll show the score too.
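That wrap-around behavior boils down to a modulo operation on each object's position. Here's a tiny sketch in Python (the screen dimensions and function name are made up for illustration, not the lab's actual code):

```python
# Toroidal wrap: positions are taken modulo the screen size, so an object
# that drifts past one edge reappears on the opposite side. The screen
# dimensions here are hypothetical.
SCREEN_W, SCREEN_H = 800, 600

def wrap_position(x, y):
    """Return (x, y) wrapped back onto the toroidal screen."""
    return x % SCREEN_W, y % SCREEN_H

# An object 10 pixels past the right edge reappears 10 pixels from the left:
print(wrap_position(810, -20))  # (10, 580)
```

Python's modulo always returns a non-negative result for a positive divisor, which is why the same one-liner also handles objects flying off the left or top edges.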
we can actually build the game and decide how everything interacts. The key to how everything fits together is
the run function. It’s a loop of checking whether the game
is over; moving all the objects; updating the game; checking whether our hero is okay;
and making new trash. As long as our hero hasn’t bumped into any
trash, the game continues. That’s pretty much it for the game mechanics. We’ve created a hero, a blaster, trash,
and a scoreboard, and code that controls their interactions. Step 2 is modeling the AI’s brain so John-Green-bot
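Sketched in Python, that run loop might look something like this (the `Thing` class and its helpers are hypothetical stand-ins, not the lab's actual code):

```python
# Simplified skeleton of the run loop: each tick we move every object,
# check whether the hero touched trash, and keep score. Blasts and new
# trash spawning are left out to keep the sketch short.

class Thing:
    def __init__(self, x, y, vx, vy):
        self.x, self.y, self.vx, self.vy = x, y, vx, vy

    def move(self):
        self.x += self.vx
        self.y += self.vy

    def collides_with(self, other, radius=10):
        return abs(self.x - other.x) < radius and abs(self.y - other.y) < radius

def run(hero, trash_list, max_steps=100):
    score = 0
    for _ in range(max_steps):
        for obj in [hero] + trash_list:   # move all the objects
            obj.move()
        if any(hero.collides_with(t) for t in trash_list):
            return score                  # hero bumped into trash: game over
        score += 1                        # survived another tick
    return score

# A hero sitting still while one piece of trash drifts toward him:
hero = Thing(0, 0, 0, 0)
trash = Thing(50, 0, -1, 0)
print(run(hero, [trash]))  # 40 ticks until the trash gets within 10 pixels
```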
Step 2 is modeling the AI's brain so John-Green-bot can play! And for that, we can turn back to our old friend, the neural network. When I play games, I try to watch for the
biggest threat, because I don't want to lose. So let's program John-Green-bot to use a similar strategy. For his neural network's input layer, let's consider the 5 pieces of trash that are closest to his avatar. (And remember, the closest trash might actually be on the other side of the screen!) Really, we want John-Green-bot to pay attention to where the trash is and where it's going. So we want the X and Y positions relative to the hero, the X and Y velocities relative to the hero, and the size of each piece of trash. That's 5 inputs for each of the 5 pieces of trash, so our input layer is going to have 25 nodes. For the hidden layers, let's start small and create 2 layers with 15 nodes each. This is just a guess, so we can change it later if we want. Because the output of this neural network is gameplay, we want the output nodes to be connected to the movement of the hero and shooting blasts. So there will be 5 nodes total: an X and Y for movement, an X and Y direction for aiming the blaster, and whether or not to fire the blaster.
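Here's one way to sketch that architecture with NumPy. The layer sizes match the ones we just chose; everything else, like the `tanh` activation and the function names, is an assumption for illustration, not the lab's actual code:

```python
import numpy as np

# A sketch of John-Green-bot's brain: 25 inputs (5 features for each of
# the 5 nearest trash pieces), two hidden layers of 15 nodes, and 5
# outputs (move X/Y, aim X/Y, and whether to fire the blaster).
layer_sizes = [25, 15, 15, 5]

def make_brain():
    # Weights start at zero, which is why the untrained bot just sits there.
    return [np.zeros((n_in, n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(brain, inputs):
    a = np.asarray(inputs, dtype=float)
    for w in brain:
        a = np.tanh(a @ w)  # tanh is an assumed activation choice
    return a                # the 5 output values

brain = make_brain()
print(forward(brain, np.random.rand(25)))  # all zeros: an empty brain does nothing
```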
To start, the weights of the neural network are initialized to 0, so the first time John-Green-bot plays, he basically sits there and does nothing. To train his brain with regular supervised learning, we'd normally say what the best action is at each timestep. But because losing TrashBlaster depends on lots of collective actions and mistakes, not just one key moment, supervised learning might not be the right approach for us. Instead, we'll use reinforcement learning strategies to train John-Green-bot based on all the moves he makes from the beginning to the end of a game, and we'll evolve a better AI using a genetic algorithm, which
is commonly referred to as a GA. To start, we'll create some number of John-Green-bots with empty brains (let's say 200), and we'll have them play TrashBlaster. They're all pretty terrible, but because of luck, some will probably be a little bit less terrible. In biological evolution, parents pass on most of their characteristics to their offspring when they reproduce. But the new generation may have some small differences, or mutations. To replicate this, we'll use code to take the 100 highest-scoring John-Green-bots and clone each of them as our reproduction step. Then, we'll slightly and randomly change the weights in those 100 cloned neural networks, which is our mutation step. Right now, we'll program a 5% chance that any given weight will be mutated, and randomly choose how much that weight mutates (so it could be barely any change or a huge one). And you could experiment with this if you like. Mutation affects how much the AI changes overall, so it's a little bit like the learning rate that we talked about in previous episodes. We have to try and balance steadily improving each generation with making big changes that might be really helpful (or harmful). After we've created these 100 mutant John-Green-bots, we'll combine them with the 100 unmutated original models (just in case the mutations were harmful) and have them all play the game. Then we evaluate, clone, and mutate them over and over again.
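The whole evaluate-clone-mutate cycle can be sketched in a few lines of Python. The brains here are simplified to flat lists of weights, and the function names are made up for illustration:

```python
import random

# One generation of the genetic algorithm: sort by fitness, keep the
# best half, then combine those parents with mutated clones.
MUTATION_RATE = 0.05  # 5% chance that any given weight mutates

def mutate(brain):
    # Each weight has a small chance of a random change, big or tiny.
    return [w + random.gauss(0, 1) if random.random() < MUTATION_RATE else w
            for w in brain]

def next_generation(population, fitnesses):
    ranked = [brain for _, brain in sorted(zip(fitnesses, population),
                                           key=lambda pair: pair[0],
                                           reverse=True)]
    parents = ranked[:len(population) // 2]      # the 100 fittest
    clones = [mutate(list(p)) for p in parents]  # 100 mutant copies
    return parents + clones                      # back to 200 models

population = [[0.0] * 10 for _ in range(200)]      # 200 empty brains
fitnesses = [random.random() for _ in population]  # stand-in game scores
population = next_generation(population, fitnesses)
print(len(population))  # 200
```

Because the unmutated parents are kept alongside their clones, a bad batch of mutations can never wipe out the best brain found so far.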
Over time, the genetic algorithm usually makes AI that are gradually better at whatever they're being asked to do, like play TrashBlaster. This is because models with better mutations will be more likely to score high and reproduce in the future. ALL of this stuff, from building John-Green-bot's neural network to defining mutation for our genetic algorithm, is in this section of code. After setting up all that, we have to write
code to carefully define what doing “better” at the game means. Destroying a bunch of trash? Staying alive for a long time? Avoiding off-target blaster shots? Together, these decisions about what “better”
means define an AI model’s fitness. Programming this function is pretty much the
most important part of this lab, because how we define fitness will affect how John-Green-bot’s
AI will evolve. If we don’t carefully balance our fitness
function, his AI could end up doing some pretty weird things. For example, we could just define fitness
as how long the player stays alive, but then John-Green-bot’s AI might play TrashAvoider
and dodge trash instead of TrashBlaster and destroy trash. But if we define the fitness to only be related
to how many trash pieces are destroyed, we might get a wild hero that’s constantly
blasting. So, for now, I'm going to try a fitness function that keeps the player alive and blasts trash. We'll define the fitness as +1 for every second that John-Green-bot stays alive, and +10 for every piece of trash that is zapped. But it's not as fun if the AI just blasts everywhere, so let's also add a penalty of -2 for every blast he fires.
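As Python, that fitness function is just one line of arithmetic (the function name and arguments here are hypothetical, not the lab's exact code):

```python
# The fitness function described above: +1 per second alive, +10 per
# piece of trash zapped, -2 per blast fired.
def fitness(seconds_alive, trash_zapped, blasts_fired):
    return 1 * seconds_alive + 10 * trash_zapped - 2 * blasts_fired

# Surviving 30 seconds, zapping 8 pieces of trash, and firing 20 blasts:
print(fitness(30, 8, 20))  # 30 + 80 - 40 = 70
```

Tweaking these three constants is exactly the kind of experiment the lab encourages, since each change pushes the evolved behavior in a different direction.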
The fitness for each John-Green-bot AI will be updated continuously as he plays the game, and it'll be shown on the scoreboard we created earlier. You can take some time to play around with this fitness function and watch how John-Green-bot's AI can learn and evolve differently. Finally, we can move on to Step 3 and actually train John-Green-bot's AI to blast some trash! So first, we need to start up our game. And to kick off the genetic algorithm, we
have to define how many randomly-wired John-Green-bot models we want in our starting population. Let's stick with 200 for now. If we waited for each John-Green-bot model to start, play, and lose the game… this training process could take DAYS. But because our computer can multitask, we can use a multiprocessing package to make all 200 AI models play separate games at the same time, which will be MUCH faster. And this is all part of the training.
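With Python's built-in multiprocessing module, that parallel evaluation might be sketched like this (`play_game` here is a stand-in that fakes a score instead of running the real game):

```python
from multiprocessing import Pool

def play_game(brain_id):
    # Stand-in for one full game of TrashBlaster: the real version would
    # build a brain, run the game loop, and return the model's fitness.
    return brain_id % 7

def play_all(n_models=200):
    # A Pool spreads the games across all available CPU cores.
    with Pool() as pool:
        return pool.map(play_game, range(n_models))

if __name__ == "__main__":
    scores = play_all()
    print(len(scores))  # one score per model, computed in parallel
```

`pool.map` returns the scores in the same order as the inputs, so each score can be matched back to the brain that earned it.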
This is where we'll code in the details of the genetic algorithm, like sorting John-Green-bots by their fitness and choosing which ones will reproduce. Now that we have the 100 John-Green-bots that
we want to reproduce, this code will clone and mutate them so we have that combined group
of 100 old and 100 mutant AI models. Then, we can run 200 more games for these
200 John-Green-bots. It just takes a few seconds to go through
them all thanks to that last chunk of code. And we can see how well they do! The average score of the AI models that we
picked to reproduce is almost twice as high as the overall average. Which is good! It means that the John-Green-bot is learning
something. We can even watch a replay of the best AI. Uh… even the best isn’t very exciting
right now. We can see the fitness function changing as
time passes, but the hero’s just sitting there not getting hit and shooting forward
– we want John-Green-bot to actually play, not just sit still and get lucky. We can also see visual representation of this
specific neural network, where higher weights are represented by the redness of the connections. It’s tough to interpret what exactly this
diagram means, but we can keep it in mind as we keep training John-Green-bot. Genetic algorithms take time to evolve a good
model. So let’s change the number of iterations
in the loop in STEP 3.3, and run the training step 10 times to repeatedly copy, mutate, and test the fitness of these AI models. Okay, now I’ve trained 10 more iterations. And if I view a replay of the last game, we
can see that John-Green-bot is doing a little better. He’s moving around a little and actually
sort of aiming. If we keep training, one model might get lucky,
destroy a bunch of trash, has a high fitness, and gets copied and mutated to make future
generations even better. But John-Green-bot needs lots of iterations
to get really good at TrashBlaster. You might consider changing the number of
iterations to 50 or 100 times per click… which might take a while. Now here’s an example of a game after 15,600
training iterations just look at John-Green-bot swimming and blasting trash like a pro. And all this was done using a genetic algorithm, raw luck, and a carefully crafted fitness function. Genetic algorithms tend to work pretty well
on small problems like getting good at TrashBlaster. When the problems get bigger, the random mutations
of genetic algorithms are sometimes… well, too random to create consistently good results. So part of the reason this works so well is
because John-Green-bot’s neural network is pretty tiny compared to many AIs created
for industrial-sized problems. But still, it’s fun to experiment with AI
and games like TrashBlaster. For example, you can try to change the values
of the fitness function and see how John-Green-bot’s AI evolves differently. Or you could change how the neural network
gets mutated, like by messing with the structure instead of the weights. Or you could change how much the run function
loops per second, from 5 times a second to 10 or 20, and give John-Green-bot superhuman
reflexes. You can download the clip of your AI playing
TrashBlaster by looking for game_animation.gif in the file browser on the left-hand side
of the Colaboratory file. You can also download source code from Github
to run on your own computer if you want to experiment (we’ll leave a link in the description). And next time, we’ll start shifting away
from games and learn about other ways that humans and AI can work together in teams. See ya then. Crash Course AI is produced in association
with PBS Digital Studios. If you want to help keep Crash Course free
for everyone, forever, you can join our community on Patreon. And if you want to learn more about genetics
and evolution check out Crash Course Biology.
