Performance Optimization: Doing Science

I couldn’t hold a candle to Brian Greene on topics such as quantum entanglement, the Higgs boson, or grand unified theory (despite having a B.A. in physics), but I can apply the scientific method to improving the performance of your software.

The Scientific Method (sciencebuddies.org)

In this article I will explain a basic but often overlooked foundation for improving the performance of any software application.

Much of software development is an art, but performance tuning is a science.  I’ve seen a lot of good developers waste significant amounts of time on performance with little to show for it or, just as bad, improve performance without knowing exactly which change had the desired effect.

Do you remember talking about the Scientific Method from your high school science class?  The diagram on the right is a refresher.  The scientific method is the repeatable process on which all scientific exploration is based.  It gives scientists across the world a common language and framework to compare the process and outcomes of experiments.

The scientific method provides a few important points that can be applied to software performance optimization:

  1. Repeatable process – use the same process for every performance enhancement you make.
  2. Modify only one variable at a time – do not make multiple tweaks at once.
  3. Record results – track what you changed and how much it helped.
Performance Optimization Method (clickonchris.com)

This sounds simple, right?  It is.  The hard part for software developers is never breaking these rules during a round of optimizations.  To the right I’ve also included a more detailed diagram of what the scientific method looks like when applied to performance optimization.  Let’s call it the Performance Optimization Method.
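If you want to bake the method into your workflow, a small harness helps keep every round identical. Below is a minimal Python sketch, assuming your workload can be wrapped in a zero-argument function; the names `Experiment`, `median_runtime`, and `run_experiment` are hypothetical, not from any library. It times the baseline and the candidate with the same procedure (the median of several runs) and records the single change that was made.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Experiment:
    """One round of the method: a single change, measured before and after."""
    change: str      # description of the ONE variable that was modified
    before_s: float  # median runtime without the change, in seconds
    after_s: float   # median runtime with the change, in seconds

    @property
    def improvement_pct(self) -> float:
        return 100.0 * (self.before_s - self.after_s) / self.before_s


def median_runtime(fn: Callable[[], object], runs: int = 5) -> float:
    """Run fn several times and return the median wall-clock time in seconds.

    The median is less sensitive to one noisy run than a single measurement.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def run_experiment(change: str,
                   baseline: Callable[[], object],
                   candidate: Callable[[], object],
                   runs: int = 5) -> Experiment:
    """Measure baseline and candidate with exactly the same procedure."""
    return Experiment(change,
                      median_runtime(baseline, runs),
                      median_runtime(candidate, runs))


# Hypothetical workload: a hand-written loop versus the built-in sum().
data = list(range(200_000))


def loop_sum() -> int:
    total = 0
    for x in data:
        total += x
    return total


log: List[Experiment] = []
log.append(run_experiment("replace manual loop with sum()", loop_sum, lambda: sum(data)))
for e in log:
    print(f"{e.change}: {e.before_s:.4f}s -> {e.after_s:.4f}s ({e.improvement_pct:.0f}% faster)")
```

Each call to `run_experiment` is one trip around the loop. If a change doesn’t help, revert it before trying the next one so the baseline stays clean.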

But I know what I’m doing!  Why shouldn’t I make multiple tweaks at once?

Let’s say you do make two changes at once.  You optimize two queries and drop the page load time from 3 s to 0.1 s.  Do you know how much relative impact each change had?  Did each one account for the same share of the improvement (50%/50%)?  Did one query account for most of the cost (75%/25%)?  Or did one of the changes have no impact at all (100%/0%)?  What if the two changes were somehow interdependent?  For the most part these questions are impossible to answer unless you use a repeatable process and modify only one variable at a time.

There are exceptions (there are always exceptions).  If you have a good profiling tool that tells you exactly what two different method calls cost, and you are absolutely sure they are not related, then you could cut a corner and make multiple changes at once.  If the results do not turn out as expected, you still need to go back and make the changes one at a time.  By the way, I hope you are testing against the volume of data you expect in production.
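To make the attribution questions concrete, here is a small Python sketch of what it takes to answer them: measure the baseline, each change on its own, and both together (a 2x2 grid of runs), then compare. The two page models are hypothetical stand-ins for real measurements, with numbers picked to match the 3 s to 0.1 s example; `attribute`, `independent_page`, and `coupled_page` are illustrative names only.

```python
from typing import Callable, Dict

# A measure callable takes two switches (is change A applied? is change B applied?)
# and returns the page load time in seconds.
Measure = Callable[[bool, bool], float]


def attribute(measure: Measure) -> Dict[str, float]:
    """Measure each change on its own and both together, then split the credit."""
    baseline = measure(False, False)
    gain_a = baseline - measure(True, False)  # effect of change A alone
    gain_b = baseline - measure(False, True)  # effect of change B alone
    total = baseline - measure(True, True)    # effect of both together
    # If the changes are independent, the separate gains add up to the total.
    # A non-zero interaction means they overlap or depend on each other.
    interaction = total - (gain_a + gain_b)
    return {
        "total_gain_s": total,
        "share_a": gain_a / total,
        "share_b": gain_b / total,
        "interaction_s": interaction,
    }


# Hypothetical page: 0.05 s of fixed overhead plus two slow queries.
def independent_page(fix_a: bool, fix_b: bool) -> float:
    query_a = 0.04 if fix_a else 2.20
    query_b = 0.01 if fix_b else 0.75
    return 0.05 + query_a + query_b


# Hypothetical coupled case: both changes add the same missing index,
# so either one alone delivers the entire speedup.
def coupled_page(fix_a: bool, fix_b: bool) -> float:
    return 0.10 if (fix_a or fix_b) else 3.00


for name, page in [("independent", independent_page), ("coupled", coupled_page)]:
    r = attribute(page)
    print(f"{name}: total {r['total_gain_s']:.2f}s, "
          f"A {r['share_a']:.0%}, B {r['share_b']:.0%}, "
          f"interaction {r['interaction_s']:+.2f}s")
```

In the independent case the credit splits roughly 74%/26% and the interaction is essentially zero. In the coupled case each change appears to deliver 100% of the gain on its own, and the large negative interaction is the tell that the changes overlap; measuring only the combined result would have hidden that.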

Don’t forget to record the result of each optimization.  Then you can put your results into a table and, with a little explanation of the process and results, turn it into a report for management so they can see how you’re spending their budget (and how good you are at science).  Metrics reports like these also make it easy for stakeholders to justify the time spent on performance optimization.
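Recording results doesn’t need special tooling. Here is a minimal Python sketch that turns an optimization log into CSV you can paste into a spreadsheet for the report. It assumes the rounds are chained (each round’s “before” time is the previous round’s “after” time), so the cumulative column is measured against the very first baseline. `Result` and `to_csv` are hypothetical names, and the two rows are placeholder numbers, not real measurements.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    change: str      # the single change made in this round
    before_s: float  # measured time before the change, in seconds
    after_s: float   # measured time after the change, in seconds


def to_csv(results: List[Result]) -> str:
    """Render the optimization log as CSV, one row per round."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["round", "change", "before_s", "after_s",
                     "gain_pct", "cumulative_gain_pct"])
    original = results[0].before_s  # baseline before any optimization
    for i, r in enumerate(results, start=1):
        gain = 100 * (r.before_s - r.after_s) / r.before_s      # this round only
        cumulative = 100 * (original - r.after_s) / original    # since the start
        writer.writerow([i, r.change, f"{r.before_s:.3f}", f"{r.after_s:.3f}",
                         f"{gain:.1f}", f"{cumulative:.1f}"])
    return buf.getvalue()


# Placeholder log with hypothetical numbers.
log = [
    Result("add index on orders.customer_id", 3.0, 0.84),
    Result("cache product lookup", 0.84, 0.10),
]
print(to_csv(log))
```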

The law of diminishing returns applies to performance enhancements.  At some point you will have picked all of the low-hanging fruit, and each further enhancement gets progressively more expensive.  Stakeholders need insight into how this is progressing on your project so they can decide how much more to spend on performance.  Metrics reports should give them enough detail to make those decisions.
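One way to make diminishing returns visible is to log the effort spent on each round alongside the time it saved, and report the seconds saved per hour of work. Below is a minimal Python sketch with hypothetical names and placeholder numbers; the cutoff `min_s_per_hour` is an assumption standing in for whatever threshold your stakeholders choose.

```python
from typing import List, NamedTuple, Optional


class Round(NamedTuple):
    change: str
    hours: float    # developer effort spent on this round
    saved_s: float  # response time removed by this round, in seconds


def marginal_returns(rounds: List[Round]) -> List[float]:
    """Seconds of response time saved per hour of effort, for each round."""
    return [r.saved_s / r.hours for r in rounds]


def first_round_below(rounds: List[Round], min_s_per_hour: float) -> Optional[int]:
    """Index of the first round whose return fell below the cutoff, or None."""
    for i, rate in enumerate(marginal_returns(rounds)):
        if rate < min_s_per_hour:
            return i
    return None


# Placeholder log: each round takes more effort than the last and saves less time.
rounds = [
    Round("add index on orders.customer_id", 2, 2.16),
    Round("cache product lookup", 6, 0.74),
    Round("rewrite JSON serializer", 16, 0.05),
]
for r, rate in zip(rounds, marginal_returns(rounds)):
    print(f"{r.change}: {rate:.3f} s saved per hour")
print("first round below cutoff:", first_round_below(rounds, min_s_per_hour=0.05))
```

The cutoff is a business decision, not a technical one; the point of the numbers is to let stakeholders see where the curve flattens.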

Ultimately you will end up with a faster application and a clear story of how you got there.  Isn’t science fun?