Friday, April 22, 2011

The Importance of Profiling

Ah, profiling. A tool that I've neglected to use for far too long until forced to do so recently at work. No I don't mean that! No not that either!! Profiling is a tool which is used to examine how long each part of a given program executes to find out why it's slower than molasses in Antarctica.

A major tool that I support is a test automation framework that supports remotely exercising APIs under test via a scripting language. I had just finished the initial work of porting this system to support testing of some new software when I discovered a problem.

Under normal circumstances the program can run over one hundred commands per second depending on the time taken by the command itself. An early test however revealed that I was barely managing one or two commands per second. Something was wrong!

Worse, examining the code provided no obvious answers. There were no poorly made algorithms, no pointless sleeps, no redundant work, nothing. Embarrassingly, this was the first time I've actually used a profiling tool to solve a problem. All other times I've simply muddled around with the code until I found some obvious problem like the ones I mentioned. Whelp, no time like the present, right?

The tool I was using provided two ways to profile a program's internals. Both involve what's called instrumentation. What that means is the tool generates some extra code in the program for the purpose of measuring your code's behavior. The first profiling method, and by far the simplest, is to count the number of times every function is called. This is helpful in finding candidates for improvement or potential bugs that cause more calls to some code than necessary. The second measures the amount of time spent in each function in addition to the number of times it's called. This is better since it can tell you where your program is getting stuck longest.

In my case a function which waited for a command stood out as the long runner. Great! Except all it does is call a function to receive data. Digging through the function I found some informational prints, and a call to recv. Not much to go on. Prints are used in a lot of different places in the program and none of those calls took very long at all, so that's out. Time to put on my google Shinobi Shozoku and find out if there are situations that make recv slow.

The result? Yes! However it's not recv that's the problem. It's a little something called Nagle's Algorithm, which is part of the TCP/IP protocol stack. It's designed to reduce network traffic by attempting to combining multiple small packets and sending them at once. It does this by holding on to a small packet and waiting for a period of time before sending it, in case more data is coming. Since the tool was designed to minimize bandwidth and be very lightweight in design it would send encoded commands in the form of very small packets, typically a few dozen bytes being large for the tool. Disabling Nagle's Algorithm did the trick.

Without profiling I might have found the issue in question, but I would have wasted significantly more time either in trial and error. Or worse, not realizing I've made an error, and wasting hours trying to fix something that isn't the problem with negligible improvement.

Sunday, April 17, 2011

Component Based Architecture

This is a relatively old presentation now, but it's a useful one.
http://cmpmedia.vo.llnwd.net/o1/vault/gdccanada09/slides/marcinchadyGDCCanada.ppt

It describes some of the architecture of the game Prototype. It has a very nice description of what goes wrong with simple OO solutions for games when as the scale of a project grows.

Saturday, April 16, 2011

Looking Forward to this One

Well it's been... far too long since the last time I've posted here so I'm going to try to get back into the habit again.

Short story: I'm married, and have been since August 2010, work varies between satisfyingly challenging to insane, I have a pet cat named Doom, and I've barely had any time to do real hobby development work.

I'm going to keep things short so that I'm less likely to get overly lazy and stop posing for months at a time. I'm also not going to restrict my posts to anything in particular... beyond avoiding certain particulars of my job due to NDAs.

During work, I'll spend a smallish amount of time browsing the web. This helps to prevent me from going into mental vapor-lock and keeps me working faster than I would otherwise. I have a few sites I tend to go to that don't take much time to read, and relate just enough to what I'm doing that I don't switch out of the appropriate mindset for coding. One of these sites is the TIGsource blog. This is where I discovered a game that I'm very very excited to see. Hawken.

The game is a mech-sim like game with a pace more on the level of a classic multiplayer deathmatch. Either way, it's been far too long since I got a real fun mech game (years since I got Mechwarrior 4) and Hawken has a very nice feel based on the released gameplay footage. I also quite like its futuristic high tech shantytown setting. Hope the game turns out as awesome as it looks.