Jun 17, 2011


[Figure: Illustration of constrained optimization. Image via Wikipedia.]

Abridged version here.

By now everybody and their screensaver have heard the Optimization Mantra: Don’t Do It! This is commonly wrapped in a three-rule package (I suspect there is a word for that), the first two of which are copies of the mantra, while the third adds the wise word “Yet” to the one-and-only true rule and addresses it to the “expert.” I suspect the middle “rule” didn’t exist originally and was later added for effect, and perhaps to bring the total to the magic number of three.

I can almost imagine Knuth after figuring out a single-character bug in a bunch of code, with coffee mugs and burger wraps (or whatever it was that was popular in the ’60s) scattered around the desk… eyes bloodshot, sleepless and edgy, declaring “Premature optimization is the root of all evil.” (But in my mind he uses more graphic synonyms for ‘evil’.)

Knuth later attributed that bit about premature optimization to Tony Hoare (the author of QuickSort), thereby distorting my mental image of young Knuth swearing as he fixed his code, only later to be saved by Hoare himself, who apparently doesn’t remember uttering or coining such words. (Somebody’s got a bad memory… maybe more than one.)

Smart Aleck

Premature optimization: most of us have been there. And that’s what makes those words so familiar. Words of wisdom, if you will. We’ve all decided to do a smart trick or two before fleshing out the algorithm and even checking whether it compiles, let alone checking the result, only to be dumbfounded by the output. I can figure it out! we declare… and after half a day, we’d be damned if we rewrote that stupid function from scratch. No chance, bub.

Probably the smarter amongst us would learn from the experience of such dogged persistence and avoid trickery the next time around. While few would admit to the less-intelligent decisions they took in the past, at least some will have learned a lesson or two when the next opportunity knocked.

The aforementioned trickery doesn’t have to be optimization trickery, mind you. Some people (read: everyone) like to be a smart-ass every so often and show off. Sure, many end up shooting themselves in the foot and making fools of themselves. But that doesn’t stop the kids from doing a crazy jump while waving to their friends, iPod on and eating crackers, just to impress someone… who typically turns around precisely when they shouldn’t. (Did I mention skateboards?)

The rules are sound. No doubt. Another rule of optimization, when the time does come, is to use profilers and never, ever, make costly assumptions. And any assumption is probably costly. That, too, is sound. These are words of wisdom, no doubt. But, taken at face value, they could cause some harm.

Let’s take it from the top, leaving aside the smaller projects we might have started and tended and developed over the years. Most projects involve multiple developers and typically span generations of them. They are legacy projects, in the sense of having a long and rich history. No one person can tell you that whole history, let alone describe all parts of the code. On such a project, if performance is an issue, you shouldn’t go about shooting in the dark and spending days or even weeks on your hunches. Such an approach will not only waste time, it will add a lot of noise and pollute the code-base and source repository (if you commit to the trunk, which you should never do until the work is done and ready to merge).

In such a case, one must use a profiler, consult with others and especially talk with module owners, veteran developers and the architects before making any changes. The change-set must be planned, designed and well managed. The larger the project, the more this stage becomes important. No funny tricks, please.

Efficient Code != Premature Optimization

Among the standard interview questions we often ask (and get asked) are those on data-structures and algorithms (discrete math and information theory). I typically ask candidates to compare the data-structures in terms of performance, which should cover both internal details and complexity characteristics (big-O). It’s also a good opportunity to see how organized their thoughts are. We use arrays, lists and maps/dictionaries quite often, and not having a good grasp of their essence is a shortcoming. As a follow-up I typically ask how they decide which to use. This isn’t an easy question, I realize. Some things are hard to put into words, even when we have a good understanding of them in our minds. But interviews aren’t meant to be easy.

The worst answer I ever got was “I use List, because that’s what I get.” To which I had to ask “Where?” Apparently, the candidate worked on a legacy project that used Lists almost exclusively and, bizarrely, she never had a need for anything else. The best answer typically gives a description of the use-case. That is, they describe the algorithm they’ll implement, and from that, they decide which container to use.

The best answer isn’t merely a more detailed or technical answer. Not just. It’s the best answer because it’s the only answer that gives you reasons. The candidate must have thought about the problem and decided on an algorithm to use, they must’ve quantified the complexity of the algorithm (big O) and they must’ve known the performance characteristics of the different containers for the operations their algorithm needs. They have thought about the problem and their solution thoroughly before choosing containers, designing classes and whatnot.
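To make that reasoning concrete, here’s a minimal Java sketch (the class and method names are mine, purely illustrative): the use-case is a duplicate check, the algorithm needs fast membership tests, and that requirement, not habit, is what picks the container. A `HashSet` gives O(1) expected `add`/`contains`, where a `List` would make each check an O(n) scan.

```java
import java.util.HashSet;
import java.util.Set;

public class ContainerChoice {
    // Use-case: report whether a batch of IDs contains duplicates.
    // The algorithm needs fast membership tests, so a HashSet
    // (O(1) expected add/contains) beats scanning a List (O(n) contains).
    static boolean hasDuplicates(int[] ids) {
        Set<Integer> seen = new HashSet<>();
        for (int id : ids) {
            if (!seen.add(id)) {  // add() returns false if already present
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicates(new int[]{1, 2, 3}));    // false
        System.out.println(hasDuplicates(new int[]{1, 2, 2, 3})); // true
    }
}
```

The same data in a `List` with `contains()` inside the loop would be O(n²) overall; the candidate who can explain that trade-off has given the “best answer” above.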

The traditional wisdom tells us to avoid premature optimization and when absolutely necessary, we should first use a profiler. But both of these can also be interpreted as follows: it’s OK to write inefficient and bloated code, and when necessary, we’ll see what the profiler comes up with.

Performance as an afterthought is very costly. Extremely so. But I don’t recommend premature optimization either. There is a very thin line between the well-thought-out, designed code you’d expect a professional to produce and student toy-project coding. The latter focuses on getting the problem-of-the-moment solved, without any regard for error handling or performance or indeed maintenance. We’ve all done it: multiple similar functions; the same function reused for different purposes, with too many responsibilities; unreasonable resource consumption and horrible performance characteristics that the author is probably oblivious to. And so on.

It’s not premature optimization to use a dictionary/map instead of a list, or whatever the most common container in your language of choice is, when we have to look up items most of the time. It’s not premature optimization to use an O(n) algorithm instead of an O(n²) one that isn’t much simpler than what we’d use (if not an O(log₂ n) algorithm). It’s not premature optimization to refactor a function so it doesn’t handle multiple unrelated cases. Similarly, moving invariant data outside a loop isn’t premature optimization. Nor is caching the results of a very complex calculation that we don’t need to redo.
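As an illustration of the O(n²)-versus-O(n) point, here is a hypothetical Java sketch (the names are mine): both functions compute the same list intersection, the faster one is barely more code, and choosing it up front is efficiency, not premature optimization.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Intersection {
    // O(n*m): scans the whole second list for every element of the first.
    static List<String> intersectNaive(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        for (String s : a) {
            if (b.contains(s)) out.add(s);  // contains() is O(m) on a List
        }
        return out;
    }

    // O(n + m): one pass to build the set, one pass to probe it.
    static List<String> intersectFast(List<String> a, List<String> b) {
        Set<String> lookup = new HashSet<>(b);  // O(1) expected contains()
        List<String> out = new ArrayList<>();
        for (String s : a) {
            if (lookup.contains(s)) out.add(s);
        }
        return out;
    }
}
```

On a few dozen items nobody will notice the difference; on a few million, the naive version is the kind of “hot-spot” a profiler shouldn’t have been needed to find.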

Regex object construction is typically an expensive operation, due to the parsing and optimizations involved. (Some languages also allow compiling the expression at runtime for further optimization.) If the expression string isn’t modified, creating a new instance of this object over and over isn’t smart. In C# the wasteful pattern is constructing a new Regex object (especially with RegexOptions.Compiled) on every use, and in Java it’s calling matches() on a String, which invokes Pattern.compile() each time. Making the compiled object a static member is the smarter solution, and hardly a premature optimization. And the list goes on.
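In Java, for instance, that might look like the following sketch (the class name and the deliberately simplistic pattern are mine, for illustration only):

```java
import java.util.regex.Pattern;

public class EmailValidator {
    // Compiled once, at class-load time, and reused for every call.
    // Pattern instances are immutable and thread-safe in Java.
    private static final Pattern EMAIL =
        Pattern.compile("[\\w.+-]+@[\\w-]+\\.[\\w.]+");

    static boolean isEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    // The wasteful alternative: String.matches() recompiles
    // the pattern on every single call.
    static boolean isEmailSlow(String s) {
        return s.matches("[\\w.+-]+@[\\w-]+\\.[\\w.]+");
    }
}
```

Both methods return the same answers; the first simply refuses to pay the compilation cost more than once.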

As much as I’d hate to have a pretentious show-off on my team, who’d go around “optimizing” code by making wild guesses and random changes, without running a profiler or talking with their colleagues, I’d hate it even more if the team spent their time cleaning up after one another. It’s easy to write code without thinking more than a single step ahead. It’s easy to type some code, run it, add random trace logs (instead of properly debugging), tweak the code, run again, and repeat until the correct output is observed.

I don’t know about you, but to me, writing and modifying code instead of designing and working out the algorithm beforehand is simply counterproductive. It’s not fun either. Similarly, debugging is much more interesting and engaging than adding random trace logs until we figure out what’s going on.

I’m not suggesting that the extreme worst case I’ve described is the norm (although you’d be surprised to learn just how common it is). My point is that there is a golden mean between “premature optimization” and “garbage coding.”

The Cost of Change

When it’s time to spend valuable resources on optimization (including the cost of buying profilers), I don’t expect us to discover that we needed a hash-table instead of an array after all. Rather, I’d expect the profiler to come up with more subtle insights: information that we couldn’t easily guess (and indeed shouldn’t need to). I’d expect the seniors on the team to have a good grasp of the performance characteristics of the project, its weak points and its limitations. Surely the profiler will give us accurate information, but unless we are in a position to make informed and educated guesses, the profiler won’t help us much. Furthermore, understanding and analyzing the profiler’s output isn’t trivial. If we have no clue what to change, and how our changes would affect performance, we’ll use the profiler much like the student who prints traces of variables and repeatedly makes random changes until the output is the one expected. In short, the profiler just gives us raw data; we still have to interpret it, design a change-set and have a reasonably sound expectation of improved performance. Otherwise, profiling will be pure waste.

It’s well documented that the cost of change increases exponentially the later a project is in its development cycle. (See, for example, Code Complete.) This means a design issue caught during designing or planning will cost next to nothing to fix. However, try to fix that design defect when you’re performing system testing, after having most modules integrated and working, and you’ll find that the change cascades over all the subsequent stages and work completed.

This cost is sometimes overlooked, thanks to the Rule of Optimization. The rule strongly discourages thinking about performance, when one should give it at least some serious thought once the business and technical requirements are finalized (as far as design is concerned) and an initial design is complete. The architecture should answer the performance question. And at every step of the development path, developers must consider the consequences of their choices of algorithms and data-structures.

This doesn’t suggest optimization-oriented development. Rather, having a conscious grasp of the performance implications can avoid a lot of painful change down the road. As we’ve already iterated, designing and writing efficient code doesn’t necessarily mean premature optimization. It just means we’re responsible and we are balancing the cost by investing a little early and avoiding a high cost in the future. For a real-life example see Robert O’Callahan’s post linked above.

I know there is a camp that is probably laughing at my naïve thinking by this point. By the time I finish optimizing, or even just writing efficient and clean code, they’ll say, their product will have shipped and the customer will have bought the latest and fastest hardware, offsetting their disadvantage in performance. “What’s the point?” they add. While this is partially true (and it has happened before), given the same data, the better-performing product will still finish sooner on the same hardware. In addition, now that processors have stopped scaling vertically, code designed for concurrent scalability (horizontal scaling) will outperform even the best single-threaded algorithm. Not to mention that data outgrows hardware any day.


Premature optimization is a major trap. We learn by falling, getting up, dusting off and falling again. We learn by making mistakes. The wisdom of the community tells us to avoid experimenting on our production code and to postpone optimization as much as possible. Only when the code is mature, and only when necessary, should we, with the aid of a profiler, identify the hot-spots and then, and only then, very carefully optimize the code.

This strategy encourages developers to come up with inefficient, thoughtless and, often, outright ugly code. All in the name of avoiding premature optimization. Furthermore, it incorrectly assumes that profiling is a magic solution to improving performance, and it neglects to mention how involved profiling is. Those who have no clue as to why their code is bogged down won’t know what to change, even if the profiler screamed out where the slowest statement is.

There is no excuse for writing inefficient code when the alternative is available at little or no cost. No excuse for not thinking through the algorithm before typing. No excuse for leaving old experimental bits and pieces in because we might need them later, or because we’ll clean up later when we optimize. The cost of poor design and badly performing code is very high: it’s the least maintainable code, and it can be very expensive to improve.

Let’s optimize later, but let’s write efficient code, not optimal, just efficient, from the get-go.



  5 Responses to “The Second Law of Optimization”

  1. Agreed wholeheartedly. While you only discussed the use of containers, another good example of non-premature optimization is when I discovered that your webapp project contained code as follows:

    1. Download the full table from database (a list)
    2. From a sublist (java.util.List.subList(int,int)) parametrized using the paging variables (offset + count) …
    3. … get the database identifier of each element into a Set
    4. Add the set somewhere else

    This also happened to be the largest table we have. People often blame ORMs for being slow, but Hibernate and PostgreSQL worked happily together, delivering sub-10-second page loads and, of course, never any memory problems, as it was only the developer testing it out.

    We were lucky to find this before shipping it to our clients, where two users at the same time would have set the server in flames.

    The reason behind all this was, in the developer’s words: “i didn’t want to do any premature optimization … and forgot about it”. In reality, he must have known that you can project the results, e.g. select only the columns you need, and do the paging in the RDBMS, which happens to be very optimized for that sort of job. He was just being lazy and calling it “premature optimization”.

    This could’ve ended up much worse: suppose the query had been more complex and had, for that reason, been implemented in code rather than SQL. That would have meant a serious oversight in planning our database model, and could’ve been very expensive to fix later on. That’s why you must use the right tools right from step 0, until you know there are not going to be any problems.


    • This is precisely why this article exists. There are many cases that shouldn’t need a profiler to find, nor should the customer have to complain before developers take action. At some point it’s hard to say whether the developer deliberately wrote slow and bad code or is really that dull.

      If you’re reading a whole table in memory, and you can’t predict the table size, then you’re doing it wrong. No need to wait for that customer who’s got 100M rows to tell you the product is broken. There are many cases like that. And if a team-member needs a profiler for such a case, then I think we should consider switching teams.


    • @reisi.

      The ORM mantra is that the database is just something you use to save your business model, and that the 2nd-level cache and lazy loading take care of the performance.

      Most ORMs and code generators even come bundled with “select all” queries for free, as a feature. It is no surprise your developers use these, then.

      They just do what the software design steers them toward: use ORM magic and manipulate everything naively in Java.


  2. Some remarks:

    “I use List, because that’s what I get.”: This is a wiser response than it might appear. The cost of converting a data structure is typically high, and most of the time we just need to iterate or find a value in a very small dataset. Often you send your structure back to the API, and so need to convert again. Stuffing your code with lots of conversions is not good either.

    “It’s not premature optimization to …”:

    Taking care of the best data-structures and algorithms is hard (they are dependent on each other). Trying to find the best data-structure can give worse results than using the most natural one.

    Big-O notation:
    Quicksort’s worst-case complexity is O(n²); most other sorting algorithms are O(n log n), but quicksort is still faster in practice. This is even more important as most algorithms we use on a daily basis are run on very small values of n, to validate user entries or process a few dozen results from the database.

    Using a cache:
    I remember a project where a “global” cache was added as a premature optimization to increase performance:
    – the thing didn’t need a cache anyway, as it was fast and not consuming much time.
    – the caching system was very inefficient, because most of the time data was accessed only once.
    – it introduced bugs because of stale data.

    “Making the object a static member”. I remember that. This was the first bug I found in my current assignment. Someone thought using a static member for a NumberFormat would eliminate the cost of creating a new one each time. Problem is, NumberFormat is not thread-safe… Not only was the performance of parsing numbers irrelevant to the overall performance of the application (like 0.0000001%), but this introduced random errors under heavy load. (I know Pattern is thread-safe in Java.)

    “While this is partially true, (and it has happened before,) given the same data, the better performing product will still finish sooner on the same hardware.”

    It only matters if you optimize what impacts the performance most, and if performance is part of the requirements.

    If your application layer is 10X slower than it could be but is using only 10% of the processing time, who cares that you managed to go from 100ms to 10, when the network and browser layout together consume 1 second?

    I understand what you are trying to say. Performance is important. Architecture is important. The fact is, the experienced and skillful developer/architect will be able to come up with a simple solution to the problem that is efficient and scalable (if needed). This architecture will address the key points with whatever is needed, but will stay intuitive and simple (KISS) for whatever has no strong constraints. The beginner or the “hacker” will do complex things to please himself and impress others. The management will use Sonar to force everybody to change new Integer() to Integer.valueOf everywhere, failing to see that the compiler could perform autoboxing more elegantly and that the performance impact is negligible (but the Sonar stats would improve).


    • Good points. Let me add some of my own remarks…

      > “I use List, because that’s what I get.” : Is a wiser response than it might appear. Cost of converting data structure is typically high and most of the time we just need to iterate or find a value of a very small dataset. […]

      That’s good if we really need no other type of operation. But is that a good excuse for not even knowing when to use other containers? I mean, could it be that they never needed any other container precisely because they don’t know what the others are for?

      > Quick sort worse complexity is O(n²) […]

      If the worst-case condition is common, then QuickSort is a bad choice. However, if it isn’t then it’s a better choice than the home-brewed alternative. The point is to know your case, and not just pick the algorithm with the best complexity.

      > Using cache: […]

      Perfect example of premature optimization. But I couldn’t say the same for a web-server that uses out-of-the-box memcached. Disabling it will probably be a bother. Even if you don’t need it, it’s probably a good idea to leave it enabled by default. Your case however implies that the coders went out of their way to design and write a caching system. That’s not “an obvious improvement that requires little or no more effort.” That’s a huge design decision that isn’t backed with data.

      > Problem is NumberFormat is not threadsafe… […]

      Again, knowing your problem and your code is the key. There are traps both ways, and any example can be shot down by counter-examples. I mean, I can find cases where using multiple local objects caused a bunch of problems, because the instances were internally accessing the same server, which had a limit on connections, and so it barfed every so often. Perhaps the best way to put it is: there is no default position; all decisions have consequences for both performance and correctness. We need to balance them.

      Your point about boxing is a good one. It kills me when management goes about doing such obviously self-defeating stunts. But it hurts more to see devs wasting their valuable time on such things. I’m more concerned with scalability issues and the design decisions that fall between architecture and boxing: the point where the author of a class or a particular function needs to make decisions regarding containers, algorithms and so on. That responsibility seems to be lost on everyone. No, architecture shouldn’t dictate how each function is written, unless that functionality is key and globally consequential. And no, we shouldn’t worry about boxing, unless our whole data-processing pipeline has to suffer it. (Yes, a project by another team at a previous job failed because they ported a data-parser that processed millions of numbers per day, and boxing killed the performance. Primitive types were all they had to deal with, so I guess they used the wrong tools.)

