The Witcher 3 is a massive game. It packs in 35 hours of dialogue, each line of which was voice acted and motion captured. If I had been in charge of orchestrating all the moving parts of the game’s development, I would’ve had a breakdown a month in and the dialogue system would’ve ended up more like Facade. Thankfully, the much more capable Piotr Tomsinski was in charge, and he gave an enlightening talk at GDC on Friday about how much work went into making the characters move and speak so naturally.
The problem going into The Witcher 3 was obvious: they were making a vast, non-linear, fully-voiced RPG. CD Projekt wanted decisions in The Witcher 3 to feel meaningful, and for them to feel meaningful players needed to form emotional attachments with the characters. They wanted to be able to sell drama by showing it, not by telling you up front a scene was supposed to be emotional. Writing 101, essentially.
Doing individual motion capture work for every dialogue scene and then animating them all by hand would’ve been impossible, or taken up ridiculous resources (Tomsinski showed that a team of only 14 worked on the cinematic dialogue system, including programmers, animators, and QA—other hands likely pitched in, but that seems to be the core team). So CD Projekt built a number of systems, and a huge library of data in the form of reusable and easily modified animations, that could be combined together to create The Witcher 3.
With the systems they created, designers could make their own dialogue scenes without needing to pull models into a tool like Maya to do heavy duty animation. When he first showed off their Timeline tool, it looked overwhelmingly complicated—like a more complex version of Logic Pro or Adobe Premiere. But it’s actually not so bad: there are different rows for animations, ‘lookats’ (which is where the characters in the scene are looking), placement (location in 3D space), and a few other elements.
The real magic comes in how they generated the dozens of hours of dialogue scenes using an algorithm, and then went into the timeline to hand-tune each one instead of building it from scratch.
“It sounds crazy, especially for the artist, but we do generate dialogues by code,” Tomsinski said. “The generator’s purpose is to fill the timeline with basic units. It creates the first pass of the dialogue loop. We found out it’s much faster to fix or modify existing events than to preset every event every time for every character. The generator works so well that some less important dialogues will be untouched by the human hand.”
That’s right: a bunch of math determined how most of the dialogue in The Witcher 3 was arranged and animated. So how did it work?
“The generator requires three different types of inputs: information about the actors, [some cinematic instructions], and finally the extracted data from voiceovers. We use an algorithm to generate markers, or accents, from the voiceovers, so later we can match the events in animation with the sound. It generates camera movement and placement, facial animation, body animations, and the lookats.”
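To make that a little more concrete, here is a rough Python sketch of what a first-pass generator along those lines might look like. Nothing here comes from CD Projekt's actual tools: the `Event` type, the function names, the loudness threshold, and the idea of finding accents as peaks in a loudness envelope are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    track: str    # hypothetical timeline row: "camera", "lookat", "animation"
    start: float  # seconds from the start of the spoken line
    name: str


def find_accents(envelope, frame_sec=0.05, threshold=0.6):
    """Return times (in seconds) of local loudness peaks at or above threshold.

    `envelope` is assumed to be one loudness value (0..1) per audio frame.
    """
    accents = []
    for i in range(1, len(envelope) - 1):
        is_peak = envelope[i - 1] < envelope[i] >= envelope[i + 1]
        if is_peak and envelope[i] >= threshold:
            accents.append(round(i * frame_sec, 3))
    return accents


def first_pass(speaker, listener, envelope, gestures):
    """Fill a timeline with basic events for one line of dialogue."""
    events = [
        Event("camera", 0.0, f"closeup:{speaker}"),
        Event("lookat", 0.0, f"{speaker}->{listener}"),
        Event("lookat", 0.0, f"{listener}->{speaker}"),
    ]
    # Drop one body gesture on each vocal accent, cycling through the list.
    for n, t in enumerate(find_accents(envelope)):
        events.append(Event("animation", t, gestures[n % len(gestures)]))
    return events


envelope = [0.1, 0.7, 0.2, 0.3, 0.9, 0.4, 0.5, 0.55, 0.1]
timeline = first_pass("Geralt", "Yennefer", envelope, ["gesture_hand", "nod"])
```

The point of a pass like this is not quality but coverage: every line gets a camera, a pair of lookats, and gestures that land on the stressed words, which a designer can then adjust on the timeline.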
The Witcher 3 has some of the best-looking character interaction in any game, and most of that started with procedural generation. If the animators weren’t happy with a scene, they could simply press a button to regenerate it, and the algorithm would conjure up something new with a slightly altered mix of camera movements and animations. Tomsinski showed off some side-by-side examples, and it was easy to see the small distinctions between them: subtle differences in head and body movements, and in the pauses between them.
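One plausible way to get that "press a button, get a slightly different take" behavior is to drive the generator's choices from a seeded random number generator. The Python sketch below assumes exactly that; the shot names, gesture names, jitter range, and `generate_variant` function are hypothetical and are not taken from the talk.

```python
import random

# Hypothetical pools the generator can draw from.
CAMERA_SHOTS = ["closeup", "over_shoulder", "medium"]
GESTURES = ["gesture_hand", "nod", "shrug", "shift_weight"]


def generate_variant(accent_times, seed):
    """Build one take of a scene; a different seed gives a different take."""
    rng = random.Random(seed)  # private generator, so takes are reproducible
    camera = rng.choice(CAMERA_SHOTS)
    gestures = []
    for t in accent_times:
        # Nudge each gesture slightly off its accent so pauses vary per take.
        jitter = rng.uniform(-0.1, 0.1)
        start = round(max(0.0, t + jitter), 3)
        gestures.append((start, rng.choice(GESTURES)))
    return {"camera": camera, "gestures": gestures}


take_a = generate_variant([0.5, 1.2], seed=1)
take_b = generate_variant([0.5, 1.2], seed=2)
```

Seeding also means a take an animator likes can be recreated exactly from its seed rather than lost on the next press of the button.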
Of course, they didn’t let the algorithm run and call it a day. The thing both scenes had in common was that they looked a bit amateurish—really, like awkward actors stumbling over a scene in a film, or the not-quite-natural animation of games that started to really explore cinematic character interactions (i.e. almost everything pre-Mass Effect). Most of the time, the animators would take what the generator had created, then go into the timeline to tweak it by hand, which could deliver a much better scene in just a few minutes. In some cases, they’d add in more elaborate camera movements, reposition characters and facial expressions, and so on, but they already had a great, unpolished base to work from.
The finished example Tomsinski showed added a lingering camera shot to the end of the scene for a more cinematic transition, and the character Geralt had been talking to made a subtle facial expression as the witcher walked away. It doesn’t sound like much, but it’s amazing how much more life that gave the scene.
The building blocks for all those scenes were a set of 2400 dialogue animations. That sounds like a lot, but divide it among the various types of characters (men, women, dwarves, elves, children, etc.) and the different poses (standing, kneeling, and sitting), and the number available for any one combination gets significantly smaller. The animations needed to be reusable.
Tomsinski gave an example: a simple gesture Geralt makes with his hand while standing. What if they wanted Geralt to make that gesture while sitting? They could try adding that animation to the timeline after inserting Geralt in a sitting pose, but that doesn’t work—he suddenly appears stood up and waves. So they created a system for additive animations, where only the key part of the body will move—in this case, his arm—allowing animations to be combined. Bam! Geralt is sitting down, but making the same gesture. Other tools, like masking, let them further tweak the movement of specific limbs. In this example, they made sure his legs looked natural as he moved.
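For the curious, the general idea behind additive animation and masking can be sketched in a few lines of Python. This is a deliberately simplified illustration, not CD Projekt's implementation: it assumes each pose is just one angle in degrees per joint (real engines blend full 3D rotations), and the joint names and functions are made up.

```python
def make_additive(gesture_pose, reference_pose):
    """Store a gesture as its difference from the pose it was authored on."""
    return {joint: gesture_pose[joint] - reference_pose[joint]
            for joint in gesture_pose}


def apply_additive(base_pose, delta, mask=None):
    """Layer a gesture delta on top of any base pose.

    `mask` maps joint -> weight in 0..1; joints not listed get full weight.
    """
    mask = mask or {}
    return {joint: angle + delta.get(joint, 0.0) * mask.get(joint, 1.0)
            for joint, angle in base_pose.items()}


standing = {"hip": 0.0, "knee": 0.0, "shoulder": 10.0, "elbow": 20.0}
standing_wave = {"hip": 2.0, "knee": 0.0, "shoulder": 70.0, "elbow": 90.0}
sitting = {"hip": 90.0, "knee": 90.0, "shoulder": 10.0, "elbow": 30.0}

wave_delta = make_additive(standing_wave, standing)
# Mask out the lower body so the gesture cannot disturb the seated legs.
sitting_wave = apply_additive(sitting, wave_delta, mask={"hip": 0.0, "knee": 0.0})
```

Because only the difference from the standing pose is stored, the arm motion carries over to the seated pose, while the mask keeps the small hip shift from the standing version out of the legs.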
There were other key elements to the system, like how they designed the lookat animations with attached poses, so characters would lean on one arm when looking in a certain direction, and how the timeline could dynamically scale for localization to account for longer or shorter dialogue in different languages. But to recap: holy cow, the cinematic dialogue in The Witcher 3 is amazing, and now we know why.
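As a footnote on the localization point: the simplest way to picture a timeline that scales with the voiceover is to stretch every event's start time by the ratio of the two recordings' lengths. The Python sketch below assumes that uniform approach, with events as made-up `(start, name)` pairs; the talk, as described here, didn't spell out how the real system does it.

```python
def scale_timeline(events, source_duration, localized_duration):
    """Stretch or squeeze event start times to fit a localized voiceover.

    `events` is a list of (start_seconds, name) pairs timed against the
    source-language recording. Uniform scaling is an assumption.
    """
    factor = localized_duration / source_duration
    return [(round(start * factor, 3), name) for start, name in events]


english = [(0.0, "camera_closeup"), (1.0, "gesture_hand"), (2.0, "lookat_away")]
# A hypothetical localized line that runs 5 seconds instead of 4.
localized = scale_timeline(english, source_duration=4.0, localized_duration=5.0)
```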