TOASTblag

Saturday, March 9, 2013

More Music

As an experiment in learning streaming technology, and having some fun with hanging the Kinect from the ceilng, I live-recorded some music on twitch. Despite the uncomfortableness I have about being on camera, here you are: http://twitch.tv/ventare

It's worth getting used to, in any case. Who knows, maybe someone'll like it :p

Wednesday, February 13, 2013

Works in Progresses

Rather than deliberate and continue waiting to post anything, I'll post a small update without giving it much thought:

I'm planning on making a gameplay video of the watchtower scenario; it's old now but it does show some scenario tech that I never made a video for. It also has the pathing speedups and large unit count that we didn't have before the last video (which was 8 months ago or something crazy).

We've been tweaking game parameters and class mechanics pretty wildly. A few 'features' are totally cut out at the moment, to see how the game plays without them (pressure/engagement speed). We're playing with unit count, overall speed, number of abilities, comboing maneuvers and actions, etc.

We have a system for making things easily, so I guess it's time to play with it :p

But mostly I'm working on the new terrain engine. It's complicated. I'll do a post on it soon as its integrated into the actual game.

Video soon, I hope :)

Sunday, January 6, 2013

Agh

AGH.

-1 % 2 != 1

WHY DIDN'T I ALREADY KNOW THIS

Thursday, November 15, 2012

Pathing Headaches; Bug Hunting

Sometimes on this Blag I like to just talk about programming, and some of the challenges I've come across throughout this project. This is one such post.

The 'crowning achievement' of a few months ago, within World, was updates to the pathfinding system. The fixes and speedups here literally made the game work, compared to the terrible game breaking behaviors beforehand. A result of this update was a particularly rare exception thrown on the server. This exception resulted in an invalid path, but when recomputed raw (without the path cache) the answer was correct. I've run for months with the bug in play, but just ignored. The game runs fine without it being fixed; at least, as far as is obvious to see. Nonetheless an outstanding bug (a crash bug) in this glorious system could not be sustained.

First an explanation of the pathing and mapping system. Part of me wants to say "it's not very complicated", but I'll leave that analysis to others. The map, as far as the server cares, is a connected directed graph. Each vertex is a possible location for an entity to exist. Edges are walkable/movable paths, the length of which determines the time to walk the edge. The graph is built up as a hexagonal grid, but in actuality the hexagonal limitation is not implied by the graph, it's merely a consequence of the graph's construction. As far as the mapping system cares, it's a generic graph with no basis on the game geometry, terrain, or environment.

Finding paths, then, is a fairly straightforward application of well known computer science algorithms. Any game programmer worth their sodium knows the A* pathfinding algorithm. It's a natural fit in this case, and so it is used.

The game, by its nature, requests a ridiculous number of paths per second. Due to constant formation and range adjustments, as well as navigating around other entities, paths are needed very frequently. Compute time is important in this case. I opted to build a caching system for common types of paths. Turns out the cache hit rate is about 99%, so it's well worth it.

The bug in question is a caching bug, so a bit more detail there is relevant. Each map-vertex stores a set of known target cells, and which neighbor to move to to get to it. That is, a cell 'A' may remember paths to 'B', 'C', and 'D', which lead through its neighbor 'B', 'C', and 'C' respectively. The paths are incremental. Rebuilding a path from cache is a simple linear sequence of hashtable lookups to determine the next step.

When a path is not available from the cache, A* is executed to build a path. However, at each A* vertex inspection, if there's a path directly to the end target, we can shortcut the remainder of the A* traversal and simply append the remaining known path to the newly calculated derived path. At this point the derived path's entries' caches are updated with both the tail of the derived path and the known path. Thereby the cache is augmented in the case of a found partial path.

That's it. Where's the bug?

The bug is that somehow, through some sequence of queries, a new path is requested that is broken. The cache implies a sequence to reach the target, but one of the vertices along the way has no cache entry to the target. What's missing?

Throughout the last months when the exception would pop up I'd inspect the cache data. However, inspecting 10000 vertices in a small map isn't exactly tractable. Confirming the cache coherency after all changes would make the game unplayably slow, and so never exhibit the bug. Then, some perfectly reasonable person suggested I just brute force the damn thing and find the problematic sequence.

So I rigged up a unit test to build a 10x10 map cell grid and exhaustively checked all possible sequences of 3 queried paths, and then test the cache coherency. 2 hours later I realized that I'd just executed an O(n^6) algorithm with n=100, broke into the debugger, and realized that I'd be waiting a few years for results, and slapped myself in the face.

So I rigged up a unit test to build a 10x1 map cell grid and bla bla bla. It ran, and it.. didn't crash. Figures.

So I rigged up a unit test to build a 4x4 map cell grid instead. 35 minutes later an exception came up. I checked the pattern of queries. It was not what I expected to see. I was expecting a bug in the partial-path re-caching behavior. It's the most complicated thing in the system and therefore the most likely to fail. What I saw took another 30 minutes to just understand. I made pictures.

Those are the relevant nodes. Actually the node above b is relevant too but I forgot when I made the image. The first queried path was from a to h:

The second path was from b to i:

When I first saw this path I was briefly confused. It seemed curious that the path through e to h would differ between the two. However, given A* on this graph it is reasonable because of the differing endpoint. g is closer in cartesian distance to its target than f, so the path is subtly different. It seemed weird but possibly a problem. I didn't know how yet though.

The third path was from c to h:

This path has a partial-cache hit at d. That path is d, e, g, h, the purple path. However, you'll note that the path obtained (in orange) is d, e, f, h! Well, it turns out the cache is a single store, and so only the first path from e to h is stored. d knows that it can get to h through e (and historically this path went through g). However, when the path resolution steps into e, e says the next step is f, because that was the first path from e to h stored.

This bit of inconsistency result in some confusion when the breaking query fires, from c to f:

The queried path should be entirely in cache, since it lies along the orange, previously queried and cached, path. However, when the path cache steps into d, d has no known paths to f. d has only ever pathed through g, and so the cache read has failed.

Why did this fail? It's because of a combination of three assumptions that are not always correct:

Given a point A and B, there will only ever be one path determined from A to B. This is true if B is the final destination, but not if a vertex beyond B is the target.
A partial cache hit, already being written to the cache, does not need to be written to the cache. This is true if the earlier (incorrect) assumption is, but since the known path can differ internally, the newly found path needs to be written to the cache as well. When this is written the c to f path is correctly written when the c to g path is finalized.

Bug fixed. :)

Next time, the new terrain engine. Here's a preview.

Friday, October 12, 2012

TOASTcapture!

So; Motion Capture!

A few weeks ago I got the bright idea that we could use a Kinect to record animation data to use for the game. That sounded like fun so I bought one and started working on building a recording program. It sorta started working, so we decided that, while a friend was in town, we'd invite a pile of friends over, stick them in front of the camera, and have them record some silliness.

Now, it took a few weeks of difficult math that's still being worked on, but the translation to our skeletal system is good enough to get a decent preview of now, hence this post. First, some numbers, a video link, then some discussion on challenges.

30 mocap sessions recorded. Each of these is about 2.5MB of raw floating point data; 21 points in 3d space per frame, a timestamp per frame, somewhere around 30 frames per second. Each session consisted of a sequence of typical animations we use for the game. Most of the glorious participants recorded a set for a specific class or enemy type, so we have a wide variety of styles here.

6.5 hours to run the skeletal solver for all the recorded mocap data. I'll be the first to admit this could be improved, but it sounds somewhat impressive or something.

60 MB of additional animation data required to be downloaded by the client (for now). Given that the entire rest of the content is about a quarter of that, it's a noticable hit. This data is horribly horribly de-compressed so it can be fixed, and the keyframe data should be smoothed anyway, but again; DATA GRAR.

27:38 of recorded animation data. Each person recorded just under a minute of live-time; even though each session took about 10 minutes to perform. The script was set up to give some time to think of what to do, as well as some lead-in and lead-out time per animation.

The Video is here. It's... WAY WAY longer than I was expecting honestly. I don't expect people to watch 30 minutes of random animating; but it could be fun to skip through and see how much we did :) OR, if you were HERE, find your own set to see what it looked like!

There are a pile of problems with the data. The most obvious is that some of the recordings just flat out come out wrong. Anything that isn't mostly standing up, moves to fast, rotates a lot, hides arms behind (my scout pose), hides legs behind the body (kneeling), etc, all totally fail to be recorded correctly. This is completely expectable given the nature of the camera, but it does mean we need to adjust what we can do with it. In some cases we could rotate the actor (and invert it when processing).

Others are problems with the translation; if you look at the data even briefly you'll see the magical swimming torso problem... MATH IS HARD. The hands and feet aren't tuned yet, the rest poses weren't all recorded correctly, etc.

But its fun! And it was educational. Once the keyframe compression is managed (either manually or automatedly) I'll be able to realistically reference the animations in the game. There's also a matter of time syncing; most of the actions recorded don't conform to the timing standards of the rest of the game, so they'd appear really late or never actually appear to swing.

Always more to do :)

Saturday, October 6, 2012

Math is Delicious!

Alright, I totally was gonna spend tonight making an awesome movie showing some new fun stuff, like the scenario with 200 dudes, and then knocking them all off cliffs, and stuff like that.

But instead I spent 5 hours wrangling the Jacobian transpose iteration method for syncing our skeletal data to that provided by the Kinect SDK. And massaging the data into proper form, and getting every last damn transform right. It's complicated! But its almost almost there now. I had a serious breakthrough in solving the main problems today. The method is working very well now, except for the hip and torso joints which still corner out for some reason. Once that's figured out there's some keyframe compression required, but then I'll have some nice motion captured imagery to share with the blag.

I never really talked mocap here, did I? Oh well. I'll talk about it when it works :)

Friday, September 28, 2012

Cleaning up the World

Last time on the World Update was meta-systems programming, so what's this time?

The last two months have brought primarily 3 things: A new scenario, massive performance improvements, and some new unit action design in progress. It's been a surprisingly meaningful set of improvements, worth pondering how it came to be. So let the prose commence: :)

1. The Watchtower Scenario

I wanted to make a defensive scenario; one where the players hold an area for a known amount of time under considerable attack, and succeed if their target lives long enough, and otherwise fail. Turns out there was a lot of new tech required to really pull it off well enough to play.

First, Map triggers.

To be clean, I implemented a somewhat generic triggering system for the scenario editor, and tied the win condition of this scenario type to a trigger. In this case, it's just a timer. I also wanted to be able to control the start conditions and start time of the attack, so there's another trigger required, and one that the user has to be able to execute. I then needed to be able to control waves of enemies spawning, so, more triggers.

It turns out trigger/response systems are pretty nice; they're the obvious alternative to some kind of embedded scripting language. Lots of games go this route and for good reason; it's simple, powerful, and not error prone. I set up trigger possibilities for timers (triggered from other triggers), for the scenario starting, for entering a region/map cell, and for entities that are triggerable. Oh, I also implemented entity grouping, so that I can trigger off an entire group being killed.

The next problem when building the scenario is that there are lots and lots of enemies. The first version had 495 spawns on the map. Managing that number was so impossible that I just deleted them all and started over when I wanted a change. In fact, instead of rebuilding the scenario's enemies to adjust difficulty, I just changed the spawn mechanism to allow spawning multiple units. Yay! Now it's just a number.

So now spawns can make many units; turns out that complicates a lot of logic that watches spawn counts and spawners. Figures.

So while I was at that, I decided to make it harder and make the spawn count variable. No sense making it easy on myself. Variable in what way? Well, how bout adding a spawn count for each allied player, and for every enemy player? That gives us a way to scale difficulty with player count. It also lets us make allies contingent on player count, which is neat (base line 4 spawns, -2 per player, =2 spawns when alone, none if 2 or more players). On top of that, let's add a difficulty parameter to the scenario, let the user control the difficulty when triggering the attack, then multiply the spawn count using that. Fun!

Well, briefly fun. I hit the hardest mode I set up and the first wave spawned 164 easy kobolds, and 30 allies for my team. Turns out the game can't handle that kind of load, and it.. well, it doesn't work. In fact, nothing moves at all since its so slow. On the plus side, my base was still alive after 10 minutes, so I won. :)

In any case, this brings us to:

2. Performance

Alright, actually I'm lying. The very very first time I made the scenario, with 15 kobolds, it completely died. Yeah, it didn't take 140 to bring it to its knees. 15 lousy kobolds was enough. Why now? Because the kobolds are attacking, rather than waiting for you to get near, so the path calculations are very long, and very slow, and actually just completely broken as it turns out.

I spent a good month just working on performance, and probably all but a week on one singular task. The first profiling yielded some useful information; 99% of the time was being spent on map traversal code. This includes evaluating area of effect, moving to range, estimating action range, and most importantly, path finding. (Somewhat surprisingly, the AoE calculation was 35% of the time, because every unit pulses a 20 hex radius aoe effect every 2 seconds, as part of the code game mechanics. More later).

So path finding. Lots of literature on speeding it up. A* is the common algorithm, but we were already using it, so...

Actually it turns out we were only 'technically' using it. We were actually using 'it badly'. Okay, that didn't work. Whatever. The algorithm was broken on many levels. If I could count the ways I would but I don't want to because I'd feel stupid.

I made some unit tests to help iron out the kinks. That wasn't enough, I updated teh scenario editor to show detailed debugging information on the paths. I wrote heuristics and double checked. I allocated 40 or 50 megs of ram to caching pathing information (oh, and wrote the caching). I bla bla bla bla fixed it.

Before, my test path took 60ms to compute. (Hopefully you're thinking 'Wow that's terrible!', and not 'Wow Nick is a terrible programmer!'). After, it takes .1 ms. Also, the cache hits about 98% of the time over the new scenario playthrough, and the cache lookup is faster than that even. Needless to say it's better now.

So it's faster. Turns out it's WAY more than that. There were some interesting side effects.

First, the old algorithm would scan the entire map if a path couldn't be found. This sounds uncommon, but note that any time any unit is on their destination spot, there is no valid path! In this case the units would take the directest route possible. In addition to making it super super slow, this meant that units would not path around their allies, leading to single file lines of extremely stupid enemies that get stuck a lot. This worked against the player's units in interestingly broken ways too.

Secondly, because of this fix some aspects of the autoattack algorithm became obvious. When things can path around each other successfully the AI's decision making because more obvious. It turns out the AA would only ever try to attack the highest threat, even if it was inaccessible. This made the game feel incredibly slow, as everything shifted around (unsuccessfully) to try and act, and inevitably didn't. The game would even get into 100% stalemate situations with 20 units all in a jumble trying to move around each other. LAME.

So I rewrote auto attack. While I was at it, rewrote the movement control AI for players and enemies, and the special attack code (same as auto, really...) Also rewrote the trigger for the AA. Previously it was timed (instead of using attack speed), and now it actually acts when the Attacks come off cooldown.

Honestly its amazing these things went so long without even being noticed. The game is COMPLETELY DIFFERENT after these fixes. It just... behaves! It's.. .like it actually works! It's 100% amazing to me how much better the game plays now. I'm really struggling to convey JUST HOW satisfying it is when the behaviors start working like they should, and JUST HOW stupid you feel when you realize how long there've been seriously broken issues that just weren't quite explainable. The game always.. felt right before? But it really wasn't.

Now it is.

One last note...

The defense scenario, the first time I ran it, was extremely easy. After all these fixes, the attacks are faster, the enemies close better, the ranged attacks work correctly, and your units are more dependent than ever on positioning. The game got a lot lot harder, but harder in lots of really good ways.

I did more than just these main things, and we've got more stuff we're trying (I did mentioned a third thing up there, didn't I...), but that 'Now it is' line was just too good not to (try to) end on.

Peace. More soon :) It really needs a movie to show the unit motion and count... Tomorrow night maybe.