Saturday, December 5, 2009

A programming aside: SRP in World's code

It's a bit different from most recent posts, but I felt like talking about good code, bad code, and some of World's code design that has paid off, in theory.

Divide-and-conquer is a pretty well-known strategy in any field. Any task which is sufficiently complicated has to be broken into pieces to be completed. Needless to say, almost any real programming task is too large to hold in your head all at once. Breaking the problem into pieces is a fundamental task in programming, and most advances in programming, historically, have been about making that division easier.

Now, there are many ways to divide things up. Or rather, there are many ways to combine the pieces together. Some of them are bad. For example, introductory object-oriented programming teaches us to split things up by classes, and further by their respective nouns (data) and verbs (functions). This is bad! Very bad. In the long run, you end up with a class holding a dizzying variety of tasks, data, and code, none of which may be terribly related to each other. Furthermore, the noun/verb class abstraction often breaks down in terrible ways. They say a square is a rectangle, but try to implement a square class and a rectangle class with that relationship embedded and you'll have trouble, I promise you.
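
To make that concrete, here's a minimal sketch (in Python, my own example, not anything from World): if Square inherits from Rectangle and callers may set width and height independently, the 'is-a' relationship quietly breaks code that was perfectly correct for rectangles.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def set_width(self, width):
        self.width = width

    def set_height(self, height):
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    """A square 'is a' rectangle... except it must keep width == height."""
    def __init__(self, side):
        super().__init__(side, side)

    def set_width(self, width):
        self.width = self.height = width   # forced to change both dimensions

    def set_height(self, height):
        self.height = self.width = height


def stretch(rect):
    # Perfectly reasonable code for any Rectangle...
    rect.set_width(4)
    rect.set_height(5)
    assert rect.area() == 20


stretch(Rectangle(1, 1))  # fine
stretch(Square(1))        # AssertionError: the square is now 5x5
```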

Anyway. We all agree that dividing the work up is a necessity. However, once the work is done, we presumably integrate the pieces back together to form the whole. But, and here's the point, if we have to go back to it later, how do we redivide it? We broke up the work, but only once, and then we threw the division away! That seems silly. Why not retain the division somehow, so that if we have to come back to the code a year later, it is already divided into understandable components?

This has been codified into a programming principle called SRP, the Single Responsibility Principle. When following this principle, the rule is that any given piece of code must only 'care' about a single aspect of the program. Defining 'care' is tricky, of course, but I take it to mean anything I can understand without outside reference. If the code brings two complicated systems together, and each is too complicated to keep in my brain, that's too much responsibility, and the offending code must be further divided (and conquered).

An anecdotal example. In Twilight 1, each potential action a unit could perform was a discrete class. Each individual class had a large body of knowledge required to perform its job. A 'rob that building' action needed to know how to move, how to respond to danger, how to rob, and how to return with the results. Not only was each action very difficult to get working, but each inevitably had bugs. The code was too complicated, because it knew too much.

To fix this, in Twilight 2 I implemented an action sequencing system. Robbery was defined as 'Move to Building', 'Rob', and 'Return', and these were reusable pieces. How could this single element, the sequence, possibly be incorrect? It can't. There are no possible bugs. The sub-components can be wrong, but that's not the worry of this action, so it MUST work. I never had to touch the sequences again. (Actually, since I implemented my own scripting language to build these sequences, I overdid it greatly and suffered from second-system syndrome, but that's a topic for another post.)
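
Something along these lines (a hypothetical Python sketch of the idea; Twilight 2's real version lived in its own scripting language): each piece knows one thing, and the sequence only sequences.

```python
class Unit:
    def __init__(self, position):
        self.position = position
        self.loot = 0


class MoveTo:
    """Knows only how to move. Pathing and danger response would live here."""
    def __init__(self, destination):
        self.destination = destination

    def perform(self, unit):
        unit.position = self.destination


class Rob:
    """Knows only how to rob whatever the unit is standing next to."""
    def __init__(self, haul):
        self.haul = haul

    def perform(self, unit):
        unit.loot += self.haul


class Sequence:
    """Runs its sub-actions in order. That is ALL it does, so it cannot be wrong."""
    def __init__(self, *steps):
        self.steps = steps

    def perform(self, unit):
        for step in self.steps:
            step.perform(unit)


# Robbery is just a composition of reusable pieces: move, rob, return.
robbery = Sequence(MoveTo("bank"), Rob(100), MoveTo("hideout"))
thief = Unit("hideout")
robbery.perform(thief)
print(thief.position, thief.loot)  # hideout 100
```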

So yes, Single Responsibility is good. It makes for a HUGE number of classes, but each one is individually understandable, debug-able, and iterate-able. Ironically, the only good way to get SRP out of traditional Object-Oriented Programming is to invert the traditional approach. Classes are no longer nouns with function verbs. Classes are now verbs that operate on classes that are nouns. This separates data storage from activity. Each potentially complicated verb is split into its own responsibility, and life is good.
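
In sketch form (hypothetical names, just to show the inversion): the noun class is nothing but data, and each verb gets its own tiny class that operates on it.

```python
from dataclasses import dataclass


@dataclass
class Health:      # a 'noun': pure data, no behavior
    current: int
    maximum: int


class Damage:      # a 'verb': one responsibility, operating on the noun
    def __init__(self, amount):
        self.amount = amount

    def apply(self, health):
        health.current = max(0, health.current - self.amount)


class Heal:        # another verb, equally small and equally testable
    def __init__(self, amount):
        self.amount = amount

    def apply(self, health):
        health.current = min(health.maximum, health.current + self.amount)


hp = Health(current=80, maximum=100)
Damage(30).apply(hp)
Heal(10).apply(hp)
print(hp.current)  # 60
```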

World's design, entirely by accident, absolutely enforces this paradigm. I didn't even realize it until I had been working with it for a year.

When working with the game entities in World, there are only three concepts available for use: Attributes (just data), States (usually implemented as a State Machine), and Actions (which are just functions, but with some attached metadata). When constructing a new action, you either use one of the action classes already implemented or implement a new one. An action implementation has a known definition and limited data availability. It cannot possibly do more than an Action is supposed to do. Each action definition, therefore, is extremely easy to understand, debug, or copy.
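
The shape of it is roughly this (a hypothetical Python sketch, not World's actual engine code): an action declares what it needs, is handed only that, and so physically cannot reach beyond its responsibility.

```python
class Entity:
    """Attributes: just data, stored by name."""
    def __init__(self, **attributes):
        self.attributes = attributes


class Action:
    """Actions: just a function, plus some metadata describing what it needs."""
    def __init__(self, name, required_attributes, func):
        self.name = name
        self.required_attributes = required_attributes
        self.func = func

    def perform(self, entity):
        # The action only ever sees the attributes it declared.
        args = {key: entity.attributes[key] for key in self.required_attributes}
        return self.func(**args)


# A new action definition is tiny: declare what you need, say what you do.
shout = Action(
    name="Shout",
    required_attributes=["name", "volume"],
    func=lambda name, volume: f"{name} shouts at volume {volume}!",
)

goblin = Entity(name="Goblin", volume=11, gold=3)
print(shout.perform(goblin))  # the 'gold' attribute is invisible to this action
```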

States are slightly more complicated, since state machines are not a simple construct. A state machine is a set of states (just a number, really), each of which has a set of transitions (conditions for moving to a new state), and behaviors (just some operation again, like an Action). If new code is needed, the only places to hook in are at these two types. Either you make a new transition, or a new behavior. And, like actions, these consist of a very small set of functions and responsibilities. They are simple. There are a lot of them, but each one is simple.
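
Again roughly, and again a hypothetical sketch rather than World's real code: a state is just an identifier, a transition is just a condition plus a target state, and a behavior is just an operation.

```python
class Transition:
    """A condition plus a target state. That is the whole job."""
    def __init__(self, condition, target_state):
        self.condition = condition
        self.target_state = target_state


class StateMachine:
    def __init__(self, initial_state):
        self.state = initial_state
        self.transitions = {}   # state -> [Transition, ...]
        self.behaviors = {}     # state -> behavior function

    def add_transition(self, state, transition):
        self.transitions.setdefault(state, []).append(transition)

    def set_behavior(self, state, behavior):
        self.behaviors[state] = behavior

    def update(self, entity):
        # Run this state's behavior, then check its transitions. Nothing more.
        behavior = self.behaviors.get(self.state)
        if behavior:
            behavior(entity)
        for transition in self.transitions.get(self.state, []):
            if transition.condition(entity):
                self.state = transition.target_state
                break


# Example: idle until an enemy is visible, then attack.
machine = StateMachine("Idle")
machine.set_behavior("Attack", lambda entity: print("swinging at", entity["target"]))
machine.add_transition("Idle", Transition(lambda e: e["target"] is not None, "Attack"))
machine.add_transition("Attack", Transition(lambda e: e["target"] is None, "Idle"))

npc = {"target": None}
machine.update(npc)          # stays Idle
npc["target"] = "player"
machine.update(npc)          # transitions to Attack
machine.update(npc)          # swinging at player
```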

So as it turns out, actions can only act, and transitions can only transit. But perhaps more importantly, only actions can act, and only transitions can transit. This means that nothing else can possibly care about acting or transiting, and that, in effect, simplifies everything else.

Perhaps I should redefine SRP. Single Responsibility Principle: Any given piece of code must only 'care' about a single aspect of the program, and ONLY that piece of code is allowed to 'care' about it. Divided and Conquered.

Wednesday, December 2, 2009

Laziness

Some have said that laziness is a virtue of a good programmer. Perhaps this is because a lazy programmer will often do a bit of extra work up front to be sure he doesn't have to work hard later. We'd rather write an automatic solution than exert manual effort.

Now, I think that's kind of a fallacy, really. A truly lazy programmer won't bother with the efficient solution, and will just plod along doing whatever works. Perhaps there is a Ballmer-peak-esque point where laziness yields maximal efficiency, however. Indulge me in a brief anecdote from the engagement implementation.

In Skirmish, there are two tables of hostile-entity information retained. First is engagement, which is a constant (non-decaying) value, recalculated every 2 seconds. Second is damage-threat, which is additive, but decays at a rate of 5% per second. The only important yield from all this data is knowing which entity is on top of each table, and then combining the two tables to know which entity is the overall 'winner'. Remember, the entire system is event-based, so we won't be asking who is on top and reacting. Rather, we need the system to tell us when the top entity changes. Furthermore, we apply a threshold, and define this threshold to be 10%. Only when a new entity reaches 110% of the current maximal entity will the event fire.
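
The bookkeeping looks roughly like this (a hypothetical sketch with names of my own invention; the real system is wired into World's event-driven entity framework):

```python
class ThreatTracker:
    """Tracks engagement and damage-threat per entity, and fires a callback
    when the overall 'winner' changes by more than the 10% threshold."""

    THRESHOLD = 1.10    # a challenger must reach 110% of the current top
    DECAY_RATE = 0.95   # damage-threat keeps 95% of its value per second

    def __init__(self, on_top_changed):
        self.engagement = {}       # entity -> constant value, reassigned every 2 s
        self.damage_threat = {}    # entity -> value as of last_update (decays implicitly)
        self.last_update = {}      # entity -> time of last damage-threat change
        self.top = None
        self.on_top_changed = on_top_changed

    def current_damage_threat(self, entity, now):
        elapsed = now - self.last_update.get(entity, now)
        return self.damage_threat.get(entity, 0.0) * (self.DECAY_RATE ** elapsed)

    def total_threat(self, entity, now):
        # The 'winner' takes the greater of engagement and decayed damage-threat.
        return max(self.engagement.get(entity, 0.0),
                   self.current_damage_threat(entity, now))

    def set_engagement(self, entity, value, now):
        self.engagement[entity] = value
        self._check_top(entity, now)

    def add_damage_threat(self, entity, amount, now):
        self.damage_threat[entity] = self.current_damage_threat(entity, now) + amount
        self.last_update[entity] = now
        self._check_top(entity, now)

    def _check_top(self, challenger, now):
        # Event-driven: only evaluated when a value actually changes.
        # (The case where the current top's own value drops is omitted here.)
        if self.top is None:
            self.top = challenger
            self.on_top_changed(challenger)
        elif challenger != self.top and (self.total_threat(challenger, now)
                >= self.total_threat(self.top, now) * self.THRESHOLD):
            self.top = challenger
            self.on_top_changed(challenger)
```

Hang set_engagement and add_damage_threat off the combat events, and the rest of the game only has to listen for on_top_changed.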

So there are three cases. First, a newly applied engagement value trumps the current top. This is extremely easy. We are informed when engagement changes, so we can easily fire the event if the math is correct. The damage-decay is easy to manage, since at any given time we can compute the new value correctly and easily. Equally easy is when a newly applied engagement value was the previous top and is now no longer on top. Very easy to implement.

Second, a newly incoming damage-threat rises above the current top. Also easy, for the same reason.

The only tricky case is handling transitions due to damage-threat decay. Herein there are several sub-cases as well. First, one damage-threat drops below another due to decay. But HA! This is easy. Each threat decays at the same exponential rate, so their ratio never changes and they are guaranteed to never cross, no matter how much time passes! Piece of cake, unless someone decides they want damage-threat to decay at different rates per class. We'll just pretend that'll never happen for now.
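
If you want to convince yourself (throwaway arithmetic, not game code):

```python
# Two damage-threat values decaying at the same 5% per second.
a, b = 200.0, 150.0
for _ in range(10):
    a *= 0.95
    b *= 0.95
    # The ratio never changes, so the ordering never changes either.
    assert abs(a / b - 200.0 / 150.0) < 1e-9
print(a > b)  # True, and it always will be
```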

And now the point. What if damage-threat decays below 90.9% of the top engagement threat, so that engagement is now 10% greater than damage-threat? Since the decay is implicit (that is, it is never calculated and assigned a new value), we would have to compute the future time at which the decayed damage-threat falls to 90.9% of the top engagement value, set a timer, and wait until then. While the scheduling system World is written on makes that sort of thing easy, I think you can guess my reaction. I decided that was too much effort, got lazy, and forgot about it.
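
For the record, the math I was avoiding is just one logarithm (a sketch with made-up numbers; hooking the result into World's scheduler is left out):

```python
import math

def seconds_until_crossing(damage_threat, top_engagement,
                           decay_rate=0.95, threshold=1.10):
    """Time until damage_threat decays below top_engagement / threshold,
    i.e. until engagement is 10% greater. Returns 0 if it already is."""
    target = top_engagement / threshold
    if damage_threat <= target:
        return 0.0
    # Solve damage_threat * decay_rate**t == target for t.
    return math.log(target / damage_threat) / math.log(decay_rate)

print(seconds_until_crossing(damage_threat=120.0, top_engagement=110.0))
# ~3.6 seconds: you would set a timer that far in the future and wait.
```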

Later I grumbled a bit and decided I'd polish the implementation off and get it right. But I realized a curious thing. The timer will NEVER FIRE. That's right. If I implemented the delayed decay transit system, it would never be relevant.

A few simple facts combine to put a constraint on the time the decay would require. First, know that we have a 10% threshold for the top damage-threat to be relevant. Then note that damage-threat decays at 5% per second, and that engagement is recalculated every 2 seconds. In the 2 seconds before engagement is reassigned, there is only about 10% of decay possible. So it'll never transit, or if it does, it'll only be minutely before the engagement ping would fire anyway. Hooray for laziness!

Actually, all that is wrong.

The 10% threshold is for overcoming the current top's threat. It could be that at the 2-second mark, entity A has 100 damage-threat and entity B has 109 engagement, and entity A is still the 'top' entity, since B has not crossed the threshold. In this case the transit would need to fire in about 0.2 seconds in order to be correct.
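
Checking that arithmetic with the same decay assumption as above:

```python
import math

damage_threat = 100.0   # entity A, currently on top
engagement = 109.0      # entity B, not yet 10% greater

# A only has to decay below 109 / 1.1 ~= 99.1 before B should take over.
target = engagement / 1.10
t = math.log(target / damage_threat) / math.log(0.95)
print(round(t, 2))  # ~0.18 seconds
```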

But you know what? It doesn't matter. The end effect is that in extremely rare cases, damage-threat will not be overcome by engagement immediately, and will instead have to wait a worst case of 2 seconds. If that becomes a gameplay disaster, then you can call me wrong and make me write the bloody algorithm.

Guess I'm Lazy.

Skirmish Update

  • Implemented engagement value calculations and broadcast, and nullification when unconscious.
  • Added per-class engagement data.
  • Implemented Engagement table, which tallies up incoming engagement values, and tracks which entity is on top of the list.
  • Implemented Threat-by-damage tracking, with an appropriate table and top-tracking as well.
  • Added a new type of AI state called a Behavior. The only existing one is called 'Attack by Aggro'. The main scenario has been rewritten to start this state on the hostile NPCs. As expected, this state starts idle, and invokes an Attack Target whenever the top entity in threat (taking the greater of damage and engagement threat) changes.
  • Aggro works! The game is fundamentally different to play, and works massively better.