By far the biggest programming project that I pursued outside of school was creating a bot that could learn to play Rocket League by analyzing replays of human gameplay.

A close match between my bot and a Psyonix Rookie bot.

Rocket League is a competitive physics-based game of car soccer with a very high skill ceiling. Even experienced players struggle to perform well consistently, which is one reason why the vast majority of bot creators opt for defined routines rather than machine learning. When I started, I was not aware that many brilliant members of the RLBot community had already attempted this and found human replay data to be insufficient for model training.

Thus, I began writing scripts to collect and process thousands of publicly available replay files, and after months of programming and tinkering with neural networks, I feel I have seen some encouraging results. This is where I will try to explain the approach I took, documenting my failures, my successes, and the work that remains to be done.

If you want to skip straight to the code, you can find it in this repository.

Data Collection

Initially, my ambitious self wanted to gather as many of my own replays as I could. I hoped to create a bot that could perform at the rank of Grand Champion. However, I quickly realized a problem.

Whenever the bot inevitably makes a poor decision, it will likely end up in a situation that a GC player would never find themselves in, and it will struggle to recover. This sort of scenario occurs in the video above, at around the 6:40 mark. The bot drives too far into the goal and lands awkwardly. That happens. The problem is that it then spends a lot of time powersliding for no apparent reason, rather than defending.

To limit this sort of thing, I saw two necessary changes. First, I needed as much data as possible: the more matches at our disposal, the more situations we can recover from. Second, I decided to lower my standards and gather replays from a much lower rank, like Silver. I figured the increase in diversity would be worth the additional player mistakes.

The obvious source of this data was ballchasing.com, where users can upload their replays for future analysis. Using the API, I discovered that hundreds of Silver 1v1 matches were being added every day, and I wrote a couple of Python scripts to download them automatically until reaching the hourly API limit. I currently have just over 15,000 replay files stored on a dedicated microSD card.
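For reference, the download loop looks roughly like this. This is a sketch against the ballchasing.com API; the endpoint paths and filter names are from memory and should be checked against the official API docs, and the `build_query` helper is just illustrative:

```python
# Sketch of a replay downloader for the ballchasing.com API.
# Endpoint and parameter names are assumptions; verify against the API docs.
import time

API = "https://ballchasing.com/api"

def build_query(playlist="ranked-duels", min_rank="silver-1",
                max_rank="silver-3", count=200):
    """Query parameters for the replay-list endpoint."""
    return {"playlist": playlist, "min-rank": min_rank,
            "max-rank": max_rank, "count": count}

def download_replays(token, out_dir, query):
    """Fetch a page of replay metadata, then download each replay file."""
    import requests
    headers = {"Authorization": token}
    listing = requests.get(f"{API}/replays", headers=headers, params=query)
    listing.raise_for_status()
    for meta in listing.json()["list"]:
        replay_id = meta["id"]
        resp = requests.get(f"{API}/replays/{replay_id}/file", headers=headers)
        if resp.status_code == 429:   # hourly rate limit reached; stop for now
            break
        with open(f"{out_dir}/{replay_id}.replay", "wb") as f:
            f.write(resp.content)
        time.sleep(0.5)               # stay well under the rate limit
```

Running this on a schedule is what slowly filled up the microSD card.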

Replay Processing

Next came the very important step of figuring out what data I needed and how to extract it. I could not make any sense of the replay file format, but thankfully there are people who've made parsers like Rattletrap that convert them to JSON. Slowly but surely, I learned how things were represented and developed additional Python scripts to organize the data with class structures and arrays.

At this point I realized how limited the replays actually are. The data is saved at a resolution of 30 frames per second, which might be acceptable if every frame actually contained all the physics information. Unfortunately, one car may have a new state defined at frame 189, while the other car does not. These gaps present a big problem, and right now my only solution is simple linear interpolation—which really isn't so simple with quaternions and rotation matrices.
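For anyone curious, here is a minimal sketch of that gap-filling, using spherical linear interpolation (slerp) for the quaternion part. The `fill_gap` helper and the state layout are illustrative, not my exact code:

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    dot = np.dot(q0, q1)
    if dot < 0.0:            # take the shorter path around the 4D sphere
        q1, dot = -q1, -dot
    if dot > 0.9995:         # nearly parallel: plain lerp is numerically safer
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def fill_gap(state_a, state_b, frame_a, frame_b, frame):
    """Estimate a car state at `frame`, between known frames a and b."""
    t = (frame - frame_a) / (frame_b - frame_a)
    return {
        "position": (1 - t) * state_a["position"] + t * state_b["position"],
        "rotation": slerp(state_a["rotation"], state_b["rotation"], t),
    }
```

Positions and velocities interpolate linearly; only the rotations need the quaternion-aware path.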



The next great obstacle came when I discovered that replays do not actually contain all of the controller inputs from the player. I found the data for boost, jump, steering, and throttle, and everything was going well. I just needed to find the roll, pitch, and yaw inputs for aerials. It turns out those don't exist.

Thankfully Sam Mish figured out a way to solve for those inputs based on the change in angular velocities. I was able to implement his solution and test it in Rocket League by giving a bot a script of inputs to follow and seeing if they matched the replay. It wasn't perfect, but maybe it would be good enough. I wanted to get on with training.
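My rough reconstruction of the idea: in the car's local frame, angular acceleration on each axis is a torque term driven by the stick input plus a damping term, and for pitch and yaw the damping scales with 1 − |input|. Inverting that model per axis recovers the input. The torque and damping constants below are the ones published in Sam Mish's physics notes; the solve itself is my own sketch and may differ from his implementation:

```python
import numpy as np

# Torque (T) and damping (D) coefficients from Sam Mish's
# Rocket League physics notes, in the car's local frame.
T = {"roll": -36.07956616966136, "pitch": -12.14599781908070,
     "yaw": 8.91962804287785}
D = {"roll": -4.47166302201591, "pitch": -2.798194258050845,
     "yaw": -1.886491900437232}

def solve_input(axis, alpha, omega):
    """Recover the stick input u on one axis from the local angular
    acceleration `alpha` and angular velocity `omega`.

    Model: alpha = T*u + D*omega              (roll)
           alpha = T*u + D*omega*(1 - |u|)    (pitch, yaw)
    """
    t, d = T[axis], D[axis]
    if axis == "roll":
        u = (alpha - d * omega) / t
    else:
        # |u| makes the equation piecewise linear: try the u >= 0 branch,
        # and fall back to the u < 0 branch if the result is inconsistent.
        u = (alpha - d * omega) / (t - d * omega)
        if u < 0:
            u = (alpha - d * omega) / (t + d * omega)
    return float(np.clip(u, -1.0, 1.0))
```

In practice `alpha` comes from finite differences of the angular velocities between replay frames, which is part of why the recovered inputs aren't perfect.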

Model Training

I set up my replay processing script to run in parallel and concatenated the rows into a series of numpy arrays with roughly 30 input columns and 7 output columns. The input columns describe the state of the game for a particular frame: ball position, player rotations, velocities, boost amounts, and the like. The output columns are what the player's inputs were for the next frame. That may sound contradictory, but controller inputs are the outputs that we want.
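Concretely, the one-frame offset just means shifting the two arrays against each other. A simplified sketch (my real pipeline also has to respect per-replay boundaries):

```python
import numpy as np

def make_training_rows(states, inputs):
    """Pair the game state at frame t with the controls at frame t+1.

    states: (n_frames, ~30) array of game-state features
    inputs: (n_frames, 7) array of controller inputs
    """
    X = states[:-1]   # state at frame t ...
    y = inputs[1:]    # ... predicts the player's inputs at frame t+1
    return X, y
```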

After following some quick multi-output regression tutorials and setting up RLBot, I was able to see an autonomous car out on the pitch. And it was hilarious. I watched it flail around and do donuts for minutes on end. I remember my roommate saying, "Yup, looks like a silver player to me!"

But as entertaining as it was, this was nowhere near the desired outcome. None of my models could make any sense of the quaternions and coordinate system, and they essentially converged to the mean of all the rows. Thus, I tried to handle more of the computation up front and provide a more informative input.

I ended up stretching the input layer out to 92 parameters per frame. I abandoned the quaternions in favor of rotation matrices, and added new copies of every variable in the reference frame of the car's orientation. No longer was I just telling the bot that the ball was at the middle of the field; I was also telling it that the ball was behind it, and that the opponent was moving away from it. I spent a long time debugging the rotation matrices before discovering that the replays seem to switch back and forth between left-handed and right-handed representations of the z-axis. Once I got that sorted out and normalized all the values, the new model was at least driving around rather than flopping around.
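The core of the reference-frame change is a single matrix transform. A sketch, assuming the rotation matrix's columns are the car's forward, right, and up axes expressed in world coordinates:

```python
import numpy as np

def to_local(car_position, car_rotation, world_point):
    """Express a world-space point in the car's reference frame.

    car_rotation: 3x3 matrix whose columns are the car's forward,
    right, and up axes in world coordinates.
    """
    return car_rotation.T @ (world_point - car_position)
```

Under this convention, a positive first component means the point is in front of the car, which is exactly the "ball is behind you" signal the raw world coordinates never made explicit. The same transform applies to velocities (minus the translation).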

One of the unique challenges of this machine learning problem is the mix of output types. Some, like boost and jump, are boolean, resolving to either 0 or 1, while others, like steering and throttle, are analog, falling anywhere in the range [-1, 1]. I was completely new to machine learning theory, but I came to understand that these call for two different methods of computing loss and activation, so I originally had two different models. Recently I merged the two into one network whose output layer has a combination of activation and loss functions.

The first set of neurons has a sigmoid activation and uses weighted binary cross-entropy for the loss, while the rest use a hyperbolic tangent activation and RMSE for the loss. While initially confusing to set up, I think this is a much better configuration because it allows the outputs to be computed all at once, leading to just one associated loss.
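In numpy terms, the combined loss looks something like this. This is a sketch of the idea rather than my framework code; the column counts, the binary/analog split, and the `bce_weight` knob are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def combined_loss(z, targets, n_binary=2, bce_weight=1.0):
    """Loss for a mixed output layer: the first `n_binary` units are
    sigmoid + binary cross-entropy (e.g. jump, boost); the remaining
    units are tanh + RMSE (e.g. steer, throttle, pitch, yaw, roll).

    z: raw pre-activation outputs, shape (batch, 7)
    targets: matching ground-truth controller inputs
    """
    p = sigmoid(z[:, :n_binary])          # boolean outputs in (0, 1)
    a = np.tanh(z[:, n_binary:])          # analog outputs in (-1, 1)
    t_bin, t_ana = targets[:, :n_binary], targets[:, n_binary:]
    eps = 1e-7                            # avoid log(0)
    bce = -np.mean(t_bin * np.log(p + eps) + (1 - t_bin) * np.log(1 - p + eps))
    rmse = np.sqrt(np.mean((a - t_ana) ** 2))
    return bce_weight * bce + rmse
```

The single scalar result is what makes it one network with one loss, rather than two models trained separately.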



After tons of trial and error with training parameters, my best model so far is what starred in the video at the top of the page. But after many more tests I have tragically forgotten what settings led to that model. I think I was using an SGD optimizer with learning_rate=0.01 and momentum=0.99, and I know the network size wasn't too large—maybe a few hidden layers with size 800. As far as batch size and training time, I'm not sure. But I haven't had the patience to train for lots of epochs because it takes a very long time to go through 50 gigabytes of memory-mapped data. I hope to do a better job of logging my experiments and maybe I can include some TensorBoard charts here in the future.
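For the memory-mapped data itself, the batching looks roughly like this. A simplified sketch; `batch_stream` and its arguments are illustrative of the approach, not my exact pipeline:

```python
import numpy as np

def batch_stream(path, n_rows, n_cols, batch_size=4096, dtype=np.float32):
    """Yield shuffled mini-batches from a memory-mapped training array
    without loading the whole file into RAM at once."""
    data = np.memmap(path, dtype=dtype, mode="r", shape=(n_rows, n_cols))
    order = np.random.permutation(n_rows)
    for start in range(0, n_rows, batch_size):
        idx = np.sort(order[start:start + batch_size])  # sorted reads are kinder to the disk
        yield np.asarray(data[idx])                     # copy just this batch into RAM
```

The trade-off is that every epoch still has to page tens of gigabytes through the OS cache, which is why the epochs are so slow.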

Thoughts

This project has reached a state far beyond my initial expectations. Although my creation lost to a Psyonix Rookie bot, and is still inferior to other models that were trained on more consistent data sources, it has been exciting to see some correct behaviors emerge! However, there is still certainly room for refinement in both network parameters and replay processing.

From what I have read about machine learning, there is this notion that more data will always improve performance—that enough rows can help a model cut through the noise and find overall trends. But I have a hunch that such generalization may actually produce negative side effects for this project.

Consider the case where an attacking player loses possession and has low boost. Do you try to pressure the ball, or turn back and shadow defend? The answer depends on the person's playstyle. Some go for boost, some go for ball. And if your dataset is so large that it is split close to 50/50, the bot will converge on a decision that is neither push nor retreat. It might end up doing something in between, which is undoubtedly the worst course of action. With the 250 million rows that I have, I may already be dealing with these effects. My validation losses show no signs of overfitting, so I see no reason to fill more storage space with data that might not actually help.
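The averaging effect is easy to demonstrate: under squared-error loss, the best prediction a model can make for two equally likely, opposite behaviors in the same game state is their mean:

```python
import numpy as np

# Half the players in an identical situation push toward the ball (+1),
# half retreat (-1).
targets = np.array([1.0, -1.0, 1.0, -1.0])

def loss(c):
    """Squared error of a constant, state-independent prediction c."""
    return np.mean((c - targets) ** 2)

# The minimizer of squared error is the mean of the targets:
# commit to neither option.
best_constant = targets.mean()
```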

Moving forward, I plan on conducting more structured experiments with respect to output weights, learning rate, network size, batch size, and training time. This would be expedited if I had a separate computer to train on, since I can't be giving up all my RAM for days at a time. I may try to remove some complexity by reducing the penalty for an incorrect output when that output had no real effect, such as trying to use handbrake while mid-air.
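One way to sketch that penalty reduction is a per-output loss weight that shrinks whenever the output is irrelevant. The handbrake column index and the weight value below are hypothetical:

```python
import numpy as np

def masked_squared_error(pred, target, in_air):
    """Down-weight the handbrake output while the car is airborne,
    where the input has no effect on the game.

    pred, target: (batch, 7) controller-output arrays
    in_air: (batch,) boolean array, True when the car is airborne
    """
    err = (pred - target) ** 2
    weights = np.ones_like(err)
    HANDBRAKE = 6                      # hypothetical column index
    weights[in_air, HANDBRAKE] = 0.1   # small, not zero
    return np.mean(weights * err)
```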

Of course if you have any questions or advice you'd like to offer, email me! I will try to update this page with any further findings.

4 February 2021