May 30, 2011

The story I’m about to tell is the worst case of leaky abstraction that I’ve encountered and had to resolve. Actually, it’s the most profound example that I know of. Profound here is used in a negative sense, of course. This isn’t one of the performance issues Joel cites as examples of leaky abstractions. Nor is it a case of getting a network exception while working with files because the folder is now mapped to a network drive, nor of getting an out-of-disk-space error while writing to a file-backed memory block. This is a case of frustration and a very hard-to-catch defect. A case of a project that came too close to failing altogether, had we not figured it out in time.

Image via Wikipedia: HTTP request from browser via web server and back.

Background

Our flagship project worked with 3rd party servers which for technical reasons had to be installed on the same machine as our product. This was an obvious shortcoming and had to change. The requirements came in and we were asked to move the interfaces with all 3rd party servers into lightweight distributed processes. These new processes were dubbed Child processes, while the central process, well, Central. A Child would run on a different machine on the network, communicating with the Central process to get commands, execute them and return the results.

The design was straightforward: all processes would run as services listening on configurable ports. The commands were simple application-level message objects, each with its own type, serialized on the wire. The operations were synchronous and we didn’t need progress updates or heartbeats. We left the network design synchronous as well, for simplicity.
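For illustration, the framing for such typed messages might look roughly like this. This is a sketch only, with made-up names, not our actual wire format:

#include <cstring>
#include <vector>

// Hypothetical framing for typed command messages: a fixed header
// carrying the message type and payload size, followed by the
// serialized payload bytes.
enum MessageType { MsgExecuteCommand = 1, MsgCommandResult = 2 };

struct MessageHeader
{
    unsigned short type;   // one of MessageType
    unsigned int   length; // payload size in bytes
};

std::vector<char> Frame(MessageType type, const std::vector<char>& payload)
{
    MessageHeader header;
    header.type = static_cast<unsigned short>(type);
    header.length = static_cast<unsigned int>(payload.size());

    std::vector<char> buffer(sizeof(header) + payload.size());
    std::memcpy(&buffer[0], &header, sizeof(header));
    if (!payload.empty())
        std::memcpy(&buffer[0] + sizeof(header), &payload[0], payload.size());
    return buffer;
}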

The developer who was assigned the communication layer had a background in web services and proposed to use the standard HTTP protocol. We thought about it and declared that while it would have some small overhead, the simplicity of reusing an existing library would be a plus. After all, HTTP is a standard protocol and, riding on TCP, gives us data recovery for free. And if we really cared about overhead, we’d use UDP, which has no duplicate detection, data recovery or even ordered packet transmission. Plus, the developer who’d work on this feature was comfortable with HTTP. So why not?

As it turns out, HTTP was the single worst decision made on this particular project.

Since we were now transmitting potentially sensitive data, the requirements were amended to include data encryption to protect our customers’ data from network sniffers. We used standard asymmetric encryption for extra security. This meant that we had to generate a pair of public and private keys each time we connected. We devised a protocol to communicate the key the Child needed using a symmetric encryption algorithm. We were confident this was secure enough for our needs, and it wasn’t overly complicated to implement.
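The detail that matters for the rest of this story is that the negotiated keys are per-connection state. Something along these lines has to live on the Child for every accepted socket (an illustrative sketch; the types and names are assumptions, not our code):

#include <vector>

// Illustrative per-connection state on the Child. A fresh socket
// starts with keysNegotiated == false and must complete the key
// exchange before any encrypted request can be deciphered.
struct ConnectionState
{
    int               socket;         // the accepted socket handle
    std::vector<char> sessionKey;     // negotiated at connect time
    bool              keysNegotiated; // false until the exchange completes

    ConnectionState() : socket(-1), keysNegotiated(false) {}
};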

Trouble

The project was complete when I took the product for a final round of developer white-box testing. This is something I learned to do before shipping any product: as I have the responsibility of designing the features, I also feel responsible for looking under the hood in case there is some potential mechanical or technical issue. Much like your car mechanic would do before you go on an off-road trip.

That’s when things started to fall apart. All worked fine, except every now and then I’d get errors and the Child would disconnect. Inspecting the logs showed data-encryption exceptions. The deciphering function was failing. Every single developer who ran the code verified that it worked fine without any problems whatsoever. I asked them to pay attention to this issue. They came back saying all was perfectly fine. It happened only on my machine!

Mind you, I learned not to blame before I eliminate every possible culprit. And the prime suspect is, always, a bug. A developer error. So I started sniffing around, revisiting the design back and forth. Nothing explained the issue. The code works, almost all the time. Then it fails. Reconnect again, and it works fine… until it fails again.

Troubleshooting this issue wasn’t fun, precisely because it wasn’t fruitful. No amount of debugging helped, or, in fact, could ever help. The puzzle had to be solved by reason. Experimentation showed as much as I had already gathered from the logs. Still, I tried different scenarios. One thing was for sure: you couldn’t tell when it’d fail next.

The Leak

Remember that HTTP is a connectionless protocol. That is, it’s designed to communicate a single request and its response, then disconnect. That is the typical scenario. It holds no connections and no state; therefore, it has no sessions. On the web, sessions are realized by the HTTP server. An HTTP server would typically create some unique key upon login, or upon the first request missing a key, and it would track all subsequent requests by receiving said key either in the URL or in cookies. In any event, even though a web service may have support for sessions, the underlying protocol is still connectionless and stateless.

To improve performance, connection reuse was added as an afterthought. This is typically called Keep-Alive. The idea is that a flag is added to the HTTP header which tells the server not to close the connection immediately after responding to the request, anticipating further requests. This is reasonable, as a web page typically loads multiple images and embedded items from the same server. A client and server supporting Keep-Alive reuse the same connection for several requests, until one of them closes the connection. What is most important in this scenario is that if either party doesn’t respect this hint, nothing breaks. In fact, nothing works any differently, except, of course, for the extra connections and disconnections that would occur for each request.
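On the wire, the hint is just a header. In HTTP/1.0 a client asks for it explicitly; in HTTP/1.1 persistent connections are the default and either side opts out with a Connection: close header. An illustrative request:

GET /command HTTP/1.0
Host: child-host:8080
Connection: Keep-Alive

Either side remains free to drop the connection at any time; the header is a hint, not a contract.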

Since the implementor of this feature was a savvy web developer, he always had this flag set. And so, as long as the connection wasn’t interrupted, or indeed the underlying library we were using didn’t decide to close the connection on a whim, all was well and we had no problems. However, when a new request went out on a new connection, rather than an existing one, the Child’s server would accept a new socket, on a new port, rather than use the previously open socket. This is what was happening in my test environment. Perhaps it was the fact that I was testing across VM images that triggered the disconnections. In any case, this newly opened socket on the Child had no encryption details associated with it. It’s a brand-new connection. It should expect an encryption key exchange. But due to implementation details, the request would arrive with its ‘encrypted’ flag set, and the Child wouldn’t mind that we had negotiated no cryptographic keys. It’d go ahead and try to decipher the request; only, it couldn’t. The result: the logged encryption exception, followed by disconnection.

Post Mortem

Once the issue was figured out, the solution was simple, albeit costly. The HTTP abstraction had leaked an ugly property that we had assumed was abstracted away. At design time, we didn’t care what protocol we used to carry our bits. Encryption was added almost as an afterthought. True, encryption does require state. However, looking at our code, the socket-level connection was abstracted by layers and layers of library code. In fact, all we had was a single static function which took a URL string for a request. We had serialized the request message, encoded it in base-64 and appended it to the URL, which contained the server hostname/IP and port; a standard web request, really.

In the communication layer, we had this single URL construction and the request call. In the data layer, we had the encryption, serialization and data-manipulation logic. In the application layer, well, there were no network details whatsoever. Most of the previous code which worked locally had remained the same, with the implementation changed to interface with the new network layer. So in a sense the code evolved and adapted to its final form, and it wasn’t anywhere near apparent that we had leaked a major problem into our code.

In hindsight, we should’ve taken matters into our own hands and implemented a session-based protocol directly. This would have made sense because we’d have been in complete control of all network matters. For one, with HTTP we couldn’t change the sockets to use async logic, nor could we change the buffer sizes and timeouts. Perhaps we didn’t need to, but considering the gigabytes per hour we expected to transfer, sooner or later we’d have to optimize and tune the system for performance. But the developer assigned was inexperienced and we couldn’t afford the time overhead. Personally, I feared things would get too complicated for him to handle, so I let him pick the protocol he was most comfortable with. And that’s the real danger of leaky abstraction: everyone is tricked, including the experienced.

Indeed, we ended up rewriting the communication layer. First the HTTP code was replaced with plain sockets using TCP/IP. Next, sessions were added, such that disconnections were recoverable. That is, the data layer didn’t care whether communication was interrupted or not. We weren’t going to rely on the fact that we controlled the sockets; disconnections were made irrelevant by design. And finally, our protocol required a strict sequence of initialization and handshake that ensured correct state. Once the code was working as expected, we changed the sockets to use an async interface for maximum performance.
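The shape of the replacement, roughly (a sketch under assumptions; the names are illustrative, not the shipped code): every connection walks a strict state machine, and nothing is deciphered before the machine says so.

// Illustrative connection states on the Child. A fresh socket can
// only ever be at AwaitingHello; commands are rejected until the
// handshake and key exchange have completed, in order.
enum SessionState
{
    AwaitingHello,       // fresh socket: only a Hello message is legal
    AwaitingKeyExchange, // Hello accepted: negotiating crypto keys
    Established          // keys agreed: commands may flow
};

On reconnect, the client presents its session id and the Child restores the session state, so a dropped connection is a non-event rather than a lurking failure.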

Overall, we spent an extra two man-months and, as a result, the data and communication layers were sped up several times over. Still, this was one hell of a case of leaky abstraction.

Update:
Many are asking: why not use SSL? The answer is that HTTP was the wrong choice in the first place.
We weren’t building a web service. This was a backend server communicating with the main product. We didn’t want to limit the protocol, its features or its extensibility by the choice of communication details. SSL would resolve the encryption issue, but we’d have had to implement an HTTPS server. In addition, with HTTP, whatever application protocol we eventually realized had to be connectionless. The encryption layer simply uncovered this design implication that we had overlooked, hence the leak in the communication-protocol abstraction. At that point we didn’t have any control messages, nor did we have requests that needed state; later we added both. In fact, we added a command to iterate over a large set of data, returning one item at a time. HTTP/S would’ve made this harder to implement, as the state would have had to be sent with each request. Control messages and heartbeats would’ve been pointless.
In short, HTTP gave us very little, and its connectionless nature caused us a lot of trouble. We got rid of both. We exchanged a wrong solution for the right one, hardly reinventing the wheel, if you ask me.

May 02, 2011

Data sharing between threads is a tricky business. Anyone with any kind of experience with multi-threaded code will give you 1001 synonyms for “tricky,” most of which you probably wouldn’t use in front of your parents. The problem I’m about to present, however, has zero to do with threading and everything to do with data sharing and leaky abstraction.

This is a pattern that comes up very often when one object is used symmetrically at the beginning and end of another’s lifetime. That is, suppose we have a class that needs to get notified when a certain other class is created, and then again when it’s destroyed. One way to achieve this is to simply set a flag once to true and a second time to false, in the constructor and destructor of the second object, respectively.

This particular example is in C++ but that’s just to illustrate the pattern.

class Object
{
public:

    Object(SomeComponent& comp) : m_component(comp)
    {
        m_component.setOnline(true); // We're online.
    }

    ~Object()
    {
        m_component.setOnline(false); // Offline.
    }

private:
    SomeComponent& m_component; // shared component; not owned by us
};

This looks fool-proof, as there is no way the flag will not get set, so long as Object is created and destroyed as intended. Typically, our code will be used as follows:

Object* pObject = new Object(component);
// component knows we are online and processing...

delete pObject; // Go offline and cleanup.

Now let’s see how someone might use this class…

// use smart pointer to avoid memory leaks...
std::auto_ptr<Object> apObject;

// Recreate a new object...
apObject.reset(new Object(component));

See a problem? The code fails miserably! And it’s not even obvious. Why? Because there are implicit assumptions and a leaky abstraction at work. Let’s dice the last line…

Object* temp_object = new Object(component); // construct the new instance
  Object::Object();
    component.setOnline(true);  // was already true!
delete apObject.ptr;            // reset() destroys the old instance
  Object::~Object();
    component.setOnline(false); // OUCH! new instance is now marked offline
apObject.ptr = temp_object;     // only now does auto_ptr own the new instance

See what happened?

Both authors wrote pretty straightforward code. They couldn’t have done better without making assumptions beyond the scope of their work. This is a pattern that is very easy to run into, and it’s far from fun. Consider how one could have detected the problem in the first place. It’s not obvious. The flag was set correctly, but sometimes it would fail! That is, whenever there is an Object instance and we create another one to replace the first, the flag ends up being false. The first time we create an Object, all works fine. The second time, component seems to be unaware of us setting the flag to true.

Someone noticed the failure, assumed the flag wasn’t always set, or was perhaps incorrectly set, reviewed the class code and sure enough concluded that all was correct. Looking at the use-case of Object, we don’t necessarily run through the guts of auto_ptr. After all, it’s a building block; a pattern; an abstraction of a memory block. One would take a quick look, see that an instance of Object is created and stored in an auto_ptr. Again, nothing out of the ordinary.

So why did the code fail?

The answer is on multiple levels. First and foremost, we had shared data that wasn’t reference counted. This is a major failing point. The shared flag is a liability because it lives outside the abstraction of object instances, the very abstraction auto_ptr assumes: that it points to an independent memory block. We challenged the assumptions that auto_ptr makes and failed to safeguard our implicitly-shared data.

In other words, we had two instances of Object at the same time, but the flag we were updating had only two states: true and false. It therefore had no way of tracking anything beyond a single piece of information; in our case, whether we were online or not. The author of Object made very dangerous assumptions. First and foremost, the assumption that the flag’s state is equivalent to Object’s lifetime proved very misleading, because it raises the question of whether or not more than one instance of Object can exist. Asking that question would have avoided a lot of problems down the road; however, it wasn’t obvious and perhaps never occurred to anyone.

Second, even if we assume that there can logically be only one instance of Object, without making it impossible to create second instances by means of language features, we are bound to see misuse, as clearly happened here. And we can’t blame the cautious programmer who used auto_ptr either.

If something shouldn’t happen, prevent it by making it impossible to happen.

Solutions

The solutions aren’t that simple. An obvious solution is to take the flag-setting calls out of Object and make them manually. However, this defeats the point of having them where one couldn’t possibly forget or miss calling them. Consider the case where we should set the flag to false when Object is destroyed, but the destruction happens due to an exception, which automatically destroys the Object instance. In such a case, we would have to catch the exception and set said flag to false ourselves. This, of course, is never as straightforward as one would like, especially in complex and mature production code. Indeed, using the automatic guarantees of the language (in this case, calling the ctor and dtor automatically) is clearly a huge advantage that we can’t afford to ignore.
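To see why, compare the manual version to the original. This is an illustrative sketch; doWork is a hypothetical stand-in for any code that might throw:

void goOnlineAndWork(SomeComponent& component)
{
    component.setOnline(true);
    try
    {
        doWork(); // may throw
    }
    catch (...)
    {
        component.setOnline(false); // must remember this on every path
        throw;
    }
    component.setOnline(false); // and this one too
}

Every call site has to repeat this dance, and missing one path means a stale flag; the ctor/dtor pattern gets it right automatically.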

One possible solution is to prevent the creation of Object more than once at a time. But this can be very problematic. Consider the case when we have multiple component instances, and we are interested in a different Object per component, not a globally unique Object instance.

As I said, no easy solution. The solution I’d use is the next best thing to preventing instance creation, namely, to count the number of instances. However, even if we reference count the Objects, or even the calls that set the flag, we must in all events redefine the contract. What does it mean to have multiple instances of Object and multiple calls to set the flag to true? Does it mean we still have one responsible object, and what guarantees that? What if there are other constraints? Might some other code assume only one instance of Object exists when that flag is set?
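Here is a minimal sketch of the counting approach, assuming a single shared component; a real version would need thread-safety and a per-component count rather than a class-wide one:

class Object
{
public:
    Object(SomeComponent& comp) : m_component(comp)
    {
        if (s_instances++ == 0)
            m_component.setOnline(true); // first instance: go online
    }

    ~Object()
    {
        if (--s_instances == 0)
            m_component.setOnline(false); // last instance: go offline
    }

private:
    SomeComponent& m_component;
    static int s_instances; // count of live Object instances
};

int Object::s_instances = 0;

With this, the reset() overlap above is harmless: during the swap two instances exist, the count goes from 1 to 2 and back to 1, and the flag stays true throughout. But notice the contract change: the flag now means “at least one Object exists,” which the rest of the code may or may not expect.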

All of the questions that flow from our suggested solutions demand that we define, or redefine, the contracts and assumptions of our objects. And whatever solution we agree on, it will have its own set of requirements and perhaps even assumptions, if we’re not too careful.

Conclusion

Using design patterns and best practices is, without a doubt, highly recommended. Yet, ironically, sometimes they may lead to the most unexpected results. This is no criticism of adopting such recommendations from experienced specialists and industry leaders; rather, it’s a result of combining abstractions in such a way that not only hides some very fundamental assumptions in our design and/or implementation, but even creates situations where some of the implicit assumptions of our code are challenged. The case presented is a good example. Had the developers not used the ctor/dtor pattern for setting said flag, or had they not used auto_ptr, no such problem would’ve arisen. Albeit, they would have had other failure points, as already mentioned.

Admittedly, without experience it’s near impossible to catch similar cases simply by reading code or, preferably, while designing. And inexperience has no easy remedy. But if someone figures out a trick, don’t hesitate to contact me.

Apr 17, 2011

With the introduction of .Net and a new, modern framework library, developers understandably were very cheerful. A shiny new ecosystem with a mint library, designed without any backwards-compatibility curses or legacy support. Finally, a library to take us into the future without second-guessing. Well, those were the hopes and dreams of the often too-optimistic and naive.

However, if you’d grant me those simplistic labels, you’d understand my extreme disappointment when the compiler barfed on my AddRange call on HttpWebRequest with a long parameter. Apparently, HttpWebRequest lacks a 64-bit AddRange member.

Surely this was a small mistake, a relic from the .Net 1.0 era. Nope, it’s missing in 2.0 as well. Right then, I should be using 3.5; what’s wrong with me using 2.0, such an outdated version? Wrong again, I am using 3.5. But I need to resume downloads on 7GB+ files!

To me, this is truly a shocking goof. After all, .Net is supposed to be all about the agility and modernity that is the web. Three major releases of the framework and no one put a high-priority tag on this missing member? Surely my panic was exaggerated. It must be. There is certainly some simple workaround that everyone is using that makes this issue really a low-priority one.

Right then, the HttpWebRequest is a WebRequest, and I really don’t need a specialized function to set an HTTP header. Let’s set the header directly:

HttpWebRequest request = WebRequest.Create(Uri) as HttpWebRequest;

request.Headers["Range"] = "bytes=0-100";

To which, .Net responded with the following System.ArgumentException:

This header must be modified using the appropriate property.

Frustration! Luckily, somebody ultimately took notice of this glaring omission and added the AddRange(long, long) overload to .Net 4.0.

So where does this leave us? Seems that I either have to move to .Net 4.0, write my own HttpWebRequest replacement or avoid large files altogether. Unless, that is, I find a hack.

Different solutions do exist to this problem on the web, but the most elegant one was this:

        /// <summary>
        /// Sets an inclusive range of bytes to download.
        /// </summary>
        /// <param name="request">The request to set the range to.</param>
        /// <param name="from">The first byte offset, -ve to omit.</param>
        /// <param name="to">The last byte offset, less-than from to omit.</param>
        private static void SetWebRequestRange(HttpWebRequest request, long from, long to)
        {
            string first = from >= 0 ? from.ToString() : string.Empty;
            string last = to >= from ? to.ToString() : string.Empty;

            string val = string.Format("bytes={0}-{1}", first, last);

            Type type = typeof(WebHeaderCollection);
            MethodInfo method = type.GetMethod("AddWithoutValidate", BindingFlags.Instance | BindingFlags.NonPublic);
            method.Invoke(request.Headers, new object[] { "Range", val });
        }

Since there were apparently copied pages with similar solutions, I’m a bit hesitant to give credit to any particular page or author for fear of crediting a plagiarizer. In return, I’ve improved the technique and put it into a flexible function. In addition, I’ve wrapped WebResponse into a reusable Stream class that plays better with non-network streams. In particular, my WebStream supports reading the Length and Position members and returns correct results. Here is the full source code:

// --------------------------------------------------------------------------------------
// <copyright file="WebStream.cs" company="Ashod Nakashian">
// Copyright (c) 2011, Ashod Nakashian
// All rights reserved.
// 
// Redistribution and use in source and binary forms, with or without modification,
// are permitted provided that the following conditions are met:
// 
// o Redistributions of source code must retain the above copyright notice, 
// this list of conditions and the following disclaimer.
// o Redistributions in binary form must reproduce the above copyright notice, 
// this list of conditions and the following disclaimer in the documentation and/or
// other materials provided with the distribution.
// o Neither the name of the author nor the names of its contributors may be used to endorse
// or promote products derived from this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
// OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT 
// SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 
// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// </copyright>
// <summary>
//   Wraps HttpWebRequest and WebResponse instances as Streams.
// </summary>
// --------------------------------------------------------------------------------------

namespace Web
{
    using System;
    using System.IO;
    using System.Net;
    using System.Reflection;

    /// <summary>
    /// HTTP Stream, wraps around HttpWebRequest.
    /// </summary>
    public class WebStream : Stream
	{
		public WebStream(string uri)
            : this(uri, 0)
		{
		}

        public WebStream(string uri, long position)
		{
			Uri = uri;
			position_ = position;
		}

        #region properties

        public string Uri { get; protected set; }
        public string UserAgent { get; set; }
        public string Referer { get; set; }

        #endregion // properties

        #region Overrides of Stream

        /// <summary>
        /// When overridden in a derived class, clears all buffers for this stream and causes any buffered data to be written to the underlying device.
        /// </summary>
        /// <filterpriority>2</filterpriority>
        public override void Flush()
        {
        }

        /// <summary>
        /// When overridden in a derived class, sets the position within the current stream.
        /// </summary>
        /// <returns>
        /// The new position within the current stream.
        /// </returns>
        /// <param name="offset">A byte offset relative to the <paramref name="origin"/> parameter.</param>
        /// <param name="origin">A value of type <see cref="T:System.IO.SeekOrigin"/> indicating the reference point used to obtain the new position.</param>
        /// <filterpriority>1</filterpriority>
        /// <exception cref="NotImplementedException"><c>NotImplementedException</c>.</exception>
        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotImplementedException();
        }

        /// <summary>
        /// When overridden in a derived class, sets the length of the current stream.
        /// </summary>
        /// <param name="value">The desired length of the current stream in bytes.</param>
        /// <filterpriority>2</filterpriority>
        /// <exception cref="NotImplementedException"><c>NotImplementedException</c>.</exception>
        public override void SetLength(long value)
        {
            throw new NotImplementedException();
        }

        /// <summary>
        /// When overridden in a derived class, reads a sequence of bytes from the current stream and advances the position within the stream by the number of bytes read.
        /// </summary>
        /// <returns>
        /// The total number of bytes read into the buffer. This can be less than the number of bytes requested if that many bytes are not currently available, or zero (0) if the end of the stream has been reached.
        /// </returns>
        /// <param name="buffer">An array of bytes. When this method returns, the buffer contains the specified byte array with the values between <paramref name="offset"/> and (<paramref name="offset"/> + <paramref name="count"/> - 1) replaced by the bytes read from the current source.</param>
        /// <param name="offset">The zero-based byte offset in <paramref name="buffer"/> at which to begin storing the data read from the current stream.</param>
        /// <param name="count">The maximum number of bytes to be read from the current stream.</param>
        /// <filterpriority>1</filterpriority>
        /// <exception cref="System.ArgumentException">The sum of offset and count is larger than the buffer length.</exception>
        /// <exception cref="System.ArgumentNullException">buffer is null.</exception>
        /// <exception cref="System.ArgumentOutOfRangeException">offset or count is negative.</exception>
        /// <exception cref="System.NotSupportedException">The stream does not support reading.</exception>
        /// <exception cref="System.ObjectDisposedException">Methods were called after the stream was closed.</exception>
		public override int Read(byte[] buffer, int offset, int count)
		{
            if (stream_ == null)
            {
                Connect();
            }

            try
            {
                if (stream_ != null)
                {
                    int read = stream_.Read(buffer, offset, count);
                    position_ += read;
                    return read;
                }
            }
            catch (WebException)
            {
                Close();
            }
            catch (IOException)
            {
                Close();
            }

            return -1;
		}

        /// <summary>
        /// When overridden in a derived class, writes a sequence of bytes to the current stream and advances the current position within this stream by the number of bytes written.
        /// </summary>
        /// <param name="buffer">An array of bytes. This method copies <paramref name="count"/> bytes from <paramref name="buffer"/> to the current stream.</param>
        /// <param name="offset">The zero-based byte offset in <paramref name="buffer"/> at which to begin copying bytes to the current stream.</param>
        /// <param name="count">The number of bytes to be written to the current stream.</param>
        /// <filterpriority>1</filterpriority>
        /// <exception cref="NotImplementedException"><c>NotImplementedException</c>.</exception>
        public override void Write(byte[] buffer, int offset, int count)
        {
            throw new NotImplementedException();
        }

        /// <summary>
        /// When overridden in a derived class, gets a value indicating whether the current stream supports reading.
        /// Always returns true.
        /// </summary>
        /// <returns>
        /// true if the stream supports reading; otherwise, false.
        /// </returns>
        /// <filterpriority>1</filterpriority>
        public override bool CanRead
        {
            get { return true; }
        }

        /// <summary>
        /// When overridden in a derived class, gets a value indicating whether the current stream supports seeking.
        /// Always returns false.
        /// </summary>
        /// <returns>
        /// true if the stream supports seeking; otherwise, false.
        /// </returns>
        /// <filterpriority>1</filterpriority>
        public override bool CanSeek
        {
			get { return false; }
        }

        /// <summary>
        /// When overridden in a derived class, gets a value indicating whether the current stream supports writing.
        /// Always returns false.
        /// </summary>
        /// <returns>
        /// true if the stream supports writing; otherwise, false.
        /// </returns>
        /// <filterpriority>1</filterpriority>
        public override bool CanWrite
        {
			get { return false; }
        }

        /// <summary>
        /// When overridden in a derived class, gets the length in bytes of the stream.
        /// </summary>
        /// <returns>
        /// A long value representing the length of the stream in bytes.
        /// </returns>
        /// <exception cref="T:System.ObjectDisposedException">Methods were called after the stream was closed.</exception>
        /// <filterpriority>1</filterpriority>
        public override long Length
        {
            get { return webResponse_.ContentLength; }
        }

        /// <summary>
        /// When overridden in a derived class, gets or sets the position within the current stream.
        /// </summary>
        /// <returns>
        /// The current position within the stream.
        /// </returns>
        /// <filterpriority>1</filterpriority>
        /// <exception cref="NotSupportedException"><c>NotSupportedException</c>.</exception>
        public override long Position
        {
			get { return position_; }
			set { throw new NotSupportedException(); }
        }

        #endregion // Overrides of Stream

        #region operations

        /// <summary>
        /// Reads the full string data at the given URI.
        /// </summary>
        /// <returns>The full contents of the given URI.</returns>
        public static string ReadToEnd(string uri, string userAgent, string referer)
        {
            using (WebStream ws = new WebStream(uri, 0))
            {
                ws.UserAgent = userAgent;
                ws.Referer = referer;
                ws.Connect();

                using (StreamReader reader = new StreamReader(ws.stream_))
                {
                    return reader.ReadToEnd();
                }
            }
        }

        /// <summary>
        /// Writes the full data at the given URI to the given stream.
        /// </summary>
        /// <returns>The number of bytes written.</returns>
        public static long WriteToStream(string uri, string userAgent, string referer, Stream stream)
        {
            using (WebStream ws = new WebStream(uri, 0))
            {
                ws.UserAgent = userAgent;
                ws.Referer = referer;
                ws.Connect();

                long total = 0;
                byte[] buffer = new byte[64 * 1024];
                int read;
                while ((read = ws.stream_.Read(buffer, 0, buffer.Length)) > 0)
                {
                    stream.Write(buffer, 0, read);
                    total += read;
                }

                return total;
            }
        }

        #endregion // operations

        #region implementation

        protected override void Dispose(bool disposing)
        {
            base.Dispose(disposing);

            if (stream_ != null)
            {
                stream_.Dispose();
                stream_ = null;
            }
        }

        private void Connect()
        {
            Close();

            HttpWebRequest request = WebRequest.Create(Uri) as HttpWebRequest;
            if (request == null)
            {
                return;
            }

            request.UserAgent = UserAgent;
            request.Referer = Referer;
            if (position_ > 0)
            {
                SetWebRequestRange(request, position_, 0);
            }

            webResponse_ = request.GetResponse();
            stream_ = webResponse_.GetResponseStream();
        }

        /// <summary>
        /// Sets an inclusive range of bytes to download.
        /// </summary>
        /// <param name="request">The request to set the range to.</param>
        /// <param name="from">The first byte offset, -ve to omit.</param>
        /// <param name="to">The last byte offset, less-than from to omit.</param>
        private static void SetWebRequestRange(HttpWebRequest request, long from, long to)
        {
            string first = from >= 0 ? from.ToString() : string.Empty;
            string last = to >= from ? to.ToString() : string.Empty;

            string val = string.Format("bytes={0}-{1}", first, last);

            Type type = typeof(WebHeaderCollection);
            MethodInfo method = type.GetMethod("AddWithoutValidate", BindingFlags.Instance | BindingFlags.NonPublic);
            method.Invoke(request.Headers, new object[] { "Range", val });
        }

        #endregion // implementation

        #region representation

        private long position_;
        private WebResponse webResponse_;
        private Stream stream_;

        #endregion // representation
	}
}

I hope this saves someone some frustration and perhaps even time writing this handy class. Enjoy.

Apr 10, 2011

The WordPress plugin I use to manage my reading list is an updated version of the popular Now Reading plugin, called Now Reading Reloaded.

Original NRR book management page.

The original Now Reading plugin was developed by Rob Miller, who stopped maintaining it at around WP 2.5. Luckily, Ben Gunnink at heliologue.com picked up where Rob left off and gave us Now Reading Reloaded (NRR). Unfortunately, it seems that the new maintainer no longer has time to donate to the project and has ceased maintaining it as well.

As one can see, the plugin is a great way to organize and share reading lists. This is exactly what I was looking for when I searched the WP plugin database. Up until then I was tracking my reading lists using simple lists in WP posts and pages. The database back-end of NRR gives it much more potential than one can hope to achieve with simple lists. This is not to say anything of the Amazon search-n-add feature with link and thumbnail of the book cover. All very welcome features.

Limitations

Unfortunately, NRR is far from perfect. One can run into its quirks multiple times during a single book update. But, for the price, I can’t complain. None of the issues was big enough to annoy me into getting my hands dirty debugging and patching the code. None, that is, except for a little obvious feature conspicuously missing. By the time I had added most of the books on my lists and started updating what I read and adding currently-reading items, I wished I didn’t have to jump from page to page until I found the book I had just finished to update its status. That is, I wished I could simply sort my library by status, started-reading or finished-reading dates. Even better, I wished the library showed me, by default, the latest books I started reading, which I was most probably interested in updating.

While open-source programs are great for too many reasons to list here, they all suffer from the same problem. Namely, the freedom to update the code also, by definition, forks and diverges it from the source. This immediately adds the overhead of merging any updates or bug-fixes added to the source into your hand-modified version. However, since it seems that NRR is no longer maintained, I had no reason to fear such a hassle, and I still had all the reasons to get sorting functionality into the admin library page, also known as the Manage Books page.

Figuring out the code

First test with new sorting code.

First I had to figure out the wheres and the hows of the code. I wasn’t familiar with NRR, so I had some detective work ahead of me. First, I browsed the page in question: wp-admin/admin.php?page=manage_books. I paged through it and noticed how the URL was formed. Apparently, the page to display is selected by a ‘p’ argument holding the page index. For example, the second page would be /wp-admin/admin.php?page=manage_books&p=3. Next I had to find the PHP file responsible for generating this page. Since each plugin is stored in its own separate folder, the first place to look was the NRR plugin folder within my WP installation. On my WP 3.1 installation, this folder is at /wp-content/plugins/now-reading-reloaded/admin. Within this folder a few PHP files exist with the “admin” prefix. The file admin-manage.php seemed a reasonable start. I looked for any table-construction code that resembled the Manage Books page, and sure enough, it was right there.

Patching

Page sorting works.

(Note: Breaking the PHP files may leave your WP in limbo. Make sure you have a backup of your site before you make manual changes, and that you can replace broken files via ssh or ftp.)

The key to adding sorting is to figure out how the database query is generated. This, as it turns out, wasn’t very difficult to find. Within the admin-manage.php file, the call to the get_books function contains the complete query:

$books = get_books("num=-1&status=all&orderby=status&order=desc{$search}{$pageq}{$reader}");

From this, it’s quite obvious which filter is responsible for what. The ‘orderby’ filter selects the sorting column, ‘order’ decides the direction of sorting, and the rest are for searching, pagination and so on. Instead of using hard-coded values, we need to get these values from the browser. Let’s add a couple of optional parameters to the page:

            if ( empty($_GET['o']) )
                $order = 'desc';
            else
                $order = urlencode($_GET['o']);

            if ( empty($_GET['s']) )
                $orderby = 'started';
            else
                $orderby = urlencode($_GET['s']);

So ‘o’ will decide the ‘order’ and ‘s’ the ‘orderby’ value. Before I moved on, I had to test that this worked in practice and not just in theory. Manually loading the page with the newly added parameters gave the expected results: /wp-admin/admin.php?page=manage_books&s=author&o=asc loaded, as expected, the table of books sorted by author name in ascending order. Now all we have to do is add links to the column headers and we get a working page.

			if ( $order == 'asc' )
				$new_order = 'desc';
			else
				$new_order = 'asc';

			$book_link = "{$nr_url->urls['manage']}&p=$page&s=book&o=$new_order";
			$author_link = "{$nr_url->urls['manage']}&p=$page&s=author&o=$new_order";
			$added_link = "{$nr_url->urls['manage']}&p=$page&s=added&o=$new_order";
			$started_link = "{$nr_url->urls['manage']}&p=$page&s=started&o=$new_order";
			$finished_link = "{$nr_url->urls['manage']}&p=$page&s=finished&o=$new_order";
			$status_link = "{$nr_url->urls['manage']}&p=$page&s=status&o=$new_order";

            echo '
				<table class="widefat post fixed" cellspacing="0">
					<thead>
						<tr>
							<th></th>
							<th class="manage-column column-title"><a class="manage_books" href="'. $book_link .'">Book</a></th>
							<th class="manage-column column-author"><a class="manage_books" href="'. $author_link .'">Author</a></th>
							<th><a class="manage_books" href="'. $added_link .'">Added</a></th>
							<th><a class="manage_books" href="'. $started_link .'">Started</a></th>
							<th><a class="manage_books" href="'. $finished_link .'">Finished</a></th>
							<th><a class="manage_books" href="'. $status_link .'">Status</a></th>
						</tr>
					</thead>
					<tbody>
			';

Notice that I didn’t forget to invert the current sorting order in the links. This is pretty straightforward. I reloaded the page, tested the header links, and sure enough, all was well. One thing was missing, though: the page-selection links didn’t obey the current sorting. They had to be patched as well:

                $pages .= " <a class='page-numbers prev' href='{$nr_url->urls['manage']}&p=$previous&s=$orderby&o=$order'>«</a>";
                $pages .= " <a class='page-numbers' href='{$nr_url->urls['manage']}&p=$i&s=$orderby&o=$order'>$i</a>";
                $pages .= " <a class='page-numbers next' href='{$nr_url->urls['manage']}&p=$next&s=$orderby&o=$order'>»</a>";

New default book management page view.

Download

Perfecto! Works great.

One last thing I thought would be useful was to reorder the columns, such that it’s “Book, Author, Status, Started, Finished, Added” instead of the default “Book, Author, Added, Started, Finished, Status”. Because, frankly, I don’t care much when I added a book; I care most about its current status and when I started reading it. Once I’m done, I want to set the Finished date and move on to the next. After reordering the columns, I set the default sorting on the Started date, in descending order, as you can see in the screenshot.

Download NRR v5.1.3.2 with the sorting patch.

Mar 11, 2011

A product of ours had a serious problem. We could reproduce it by running the main application for several hours, the result of which was that a back-end service would crash. Initial investigation showed excessive memory usage. Leaks were hunted, and we tried to patch and improve every situation that might result in a leak.

This isn’t as simple as I make it sound. There are many cases of exceptions being thrown from the network layer, and these exceptions unwind the whole stack. Sure, we’re using smart pointers all over, but still, there are some tricky situations with cached objects, COM, network buffers and whatnot.

Memory leaks aside, I noticed something more disturbing: handle leaks. Memory leaks affect the ill application, but the system is more or less immune until the culprit crashes in heaps of smoke. But if you have kernel handles leaking, then you are bound to have an unstable system. This is the type of leak that one must resolve ASAP. Apparently, we were leaking kernel handles at an alarming rate.

Windbg screenshot

Third Party?

An immediate suspect was a 3rd-party library we were using to handle some MAPI chores. Could that be the source of the leaks? Sure, it’s an easy call, but without the right research it’s no use. Plus, replacing that library is not trivial. And who’s to say we’d do a better job rewriting it ourselves? If leaks are such a hard issue to notice and fix, who’s to say we won’t add a bunch more ourselves? After all, rewriting the library functions means we have to rewrite both the good and the bad ones. There is no marker to hint to us where the leak is, either. So really, we aren’t better off rewriting anything ourselves.

In addition, it’s not very difficult to prove whether or not the said library is to blame. Some investigation was warranted. The first thing was to get some stack traces on the leaks. With a stack trace we can tell where each leaked object’s construction was coming from. This can be done using windbg, the great debugger, which has a handle-tracing extension called htrace. After loading our service, run as a console application for simplicity, I hit “!htrace -enable” in windbg, then “g” to run the application. I ran it through a cycle of data processing to let it do its necessary allocations and settle. Back in windbg: Break. Now we can set a marker to track the handles with “!htrace -snapshot”, then “g” again to resume. Another cycle of data processing, and we break again in windbg. Time to collect the data with “!htrace -diff”.
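The whole dance, condensed (these are the windbg commands just described; Ctrl+Break breaks into the debugger):

!htrace -enable      (start recording stack traces for handle operations)
g                    (run; exercise one data-processing cycle, then break)
!htrace -snapshot    (mark the currently open handles)
g                    (run another cycle, then break again)
!htrace -diff        (list handles opened since the snapshot, with stacks)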

Sure enough, I got a rather long list of leaked objects. I copied the list into my favorite text editor for eyeballing. It looked like a bunch of them were duplicates. I looked very carefully at where the handles were allocated; I was looking for evidence that the 3rd-party library was allocating, and hence leaking, them. Well, it turns out there was very little reason to think that the leaks were from the unnamed 3rd-party library, and every reason to suspect MAPI was the culprit. These lines sort of gave it away:

0x7c80ea2b: kernel32!CreateMutexA+0x00000065
0x35f7452f: MSMAPI32!SzGPrev+0x00000293

 

The Leak

OK, so we suspect MAPI32 to be the source. Now what? We need to prove it. But before that, let’s examine the leaks. Procexp, the great task-manager replacement by Russinovich, is all we need. I fired it up and ran the application again. In procexp I noted the number of handles before and after each cycle. Sure enough, the number was growing. In procexp one can see new items and deleted items in (the default) green and red colors. This is called highlighting. The highlight duration is set to 2 seconds by default; I maxed it out to 9 seconds to give myself enough time to pause procexp and prevent it from updating the data. The space bar is the shortcut to pause/resume refreshing.

Ready. Now we need to select the application from the process list and bring up the handle view. Here we see all the handles allocated by the application. Anything new will show in green, anything deleted in red. Right now we’re interested in the green stuff that never goes away (even though the color will revert to the default after 9 seconds).

I ran another data cycle, and sure enough, a bunch of reds and greens showed up, and a few of them remained. Three, to be exact. Looking at the types, they were “Mutant”, “Mutant” and “Section”. Obscure? Not really. Mutant is the internal name for a Mutex, and a Section is a memory block backed by a file, most probably a mapped file view.

These aren’t exactly light-weight objects to leak.

But our work isn’t over yet; we need to figure out a workaround. Let’s fire up Visual Studio and type some code. What we need to do is simulate the execution of the key MAPI functions that we know the intermediate library must execute to carry out the operations we ask of it. To make things easy for me, I assigned a number to each operation, such that on the console I could hit a number and get an operation executed. I added both the operations and their cleanups as separate commands. Here is some code:

#include <tchar.h>
#include <conio.h>
#include <iostream>
#include <mapix.h>
#include <mapiutil.h>

int _tmain(int argc, _TCHAR* argv[])
{
	HRESULT hr;

	while (true)
	{
		char c = _getch();

		switch (c)
		{
			case '0':
			{
				std::cout << "Calling MAPIUninitialize." << std::endl;
				::MAPIUninitialize();
			}
			break;

			case '1':
			{
				std::cout << "Calling MAPIInitialize." << std::endl;
				MAPIINIT_0 mapiinit = { MAPI_INIT_VERSION, MAPI_NT_SERVICE | MAPI_MULTITHREAD_NOTIFICATIONS };
				hr = ::MAPIInitialize(&mapiinit); // takes a pointer to the init options
			}
			break;

			case '2':
			{
				std::cout << "Calling MAPILogonEx." << std::endl;
				LPMAPISESSION lpMAPISession = NULL;

				ULONG ulFlags = MAPI_NO_MAIL |
								MAPI_NEW_SESSION |
								MAPI_EXPLICIT_PROFILE |
								MAPI_EXTENDED |
								//MAPI_NT_SERVICE |
								MAPI_LOGON_UI;

				hr = ::MAPILogonEx(0,
					  NULL,//profile name
					  NULL,//password - This parameter should ALWAYS be NULL
					  ulFlags,
					  &lpMAPISession);//handle of session
				if (FAILED(hr))
				{
					std::cout << "Failed to logon " << hr << std::endl;
				}

				UlRelease(lpMAPISession);
			}
			break;
		}
	}

	::CoUninitialize();
	return 0;
}

I omitted the other cases, since they’re redundant; you get the idea. Basically, after adding about 10 cases, I started playing around with the different commands to simulate a run of our application. Procexp was monitoring the number of handles while I hit the num-pad to drive the simulation. Soon I noticed a trend, and exactly the trend I was looking for: MAPIInitialize was allocating a lot of handles, and MAPIUninitialize wasn’t releasing all of them.

Lo and behold, it turns out they are 3 handles, of the same types as those I traced in the application!

Now that we strongly suspect MAPIInitialize to be the source of the leak, let’s put the final nail in the proof’s proverbial coffin; of course, using windbg. This time we load the test application which helped us suspect MAPIInitialize in the first place. Same drill: enable htrace, run the application, call MAPIInitialize by hitting ‘1’, call MAPIUninitialize by hitting ‘0’, break in windbg, take a snapshot, resume the application, hit ‘1’ then ‘0’ again, break in windbg and finally get a diff from the snapshot. Here is the full run:

Microsoft (R) Windows Debugger Version 6.11.0001.404 X86
Copyright (c) Microsoft Corporation. All rights reserved.

CommandLine: C:\Test\MapiTest\Debug\MapiTest.exe
Symbol search path is: SRV*C:\WINDOWS\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
ModLoad: 00400000 0041f000   MapiTest.exe
ModLoad: 7c900000 7c9b2000   ntdll.dll
ModLoad: 7c800000 7c8f6000   C:\WINDOWS\system32\kernel32.dll
ModLoad: 61e00000 61e1f000   C:\WINDOWS\system32\MAPI32.dll
ModLoad: 77dd0000 77e6b000   C:\WINDOWS\system32\ADVAPI32.dll
ModLoad: 77e70000 77f02000   C:\WINDOWS\system32\RPCRT4.dll
ModLoad: 77fe0000 77ff1000   C:\WINDOWS\system32\Secur32.dll
ModLoad: 7e410000 7e4a1000   C:\WINDOWS\system32\USER32.dll
ModLoad: 77f10000 77f59000   C:\WINDOWS\system32\GDI32.dll
ModLoad: 774e0000 7761d000   C:\WINDOWS\system32\ole32.dll
ModLoad: 77c10000 77c68000   C:\WINDOWS\system32\msvcrt.dll
ModLoad: 77120000 771ab000   C:\WINDOWS\system32\OLEAUT32.dll
ModLoad: 10480000 10537000   C:\WINDOWS\system32\MSVCP100D.dll
ModLoad: 10200000 10372000   C:\WINDOWS\system32\MSVCR100D.dll
(a98.864): Break instruction exception - code 80000003 (first chance)
eax=00251eb4 ebx=7ffdf000 ecx=00000003 edx=00000008 esi=00251f48 edi=00251eb4
eip=7c90120e esp=0012fb20 ebp=0012fc94 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202
ntdll!DbgBreakPoint:
7c90120e cc              int     3
0:000> !htrace -enable
Handle tracing enabled.
Handle tracing information snapshot successfully taken.
0:000> g
ModLoad: 76390000 763ad000   C:\WINDOWS\system32\IMM32.DLL
ModLoad: 3fde0000 40221000   C:\WINDOWS\system32\MSI.DLL
ModLoad: 35f70000 360bd000   C:\Program Files\Common Files\SYSTEM\MSMAPI\1033\MSMAPI32.DLL
ModLoad: 74720000 7476c000   C:\WINDOWS\system32\MSCTF.dll
ModLoad: 3fde0000 40221000   C:\WINDOWS\system32\msi.dll
ModLoad: 35e80000 35f40000   C:\Program Files\Common Files\SYSTEM\MSMAPI\1033\MAPIR.DLL
ModLoad: 773d0000 774d3000   C:\WINDOWS\WinSxS\x86_Microsoft.Windows.Common-Controls_6595b64144ccf1df_6.0.2600.5512_x-ww_35d4ce83\Comctl32.dll
ModLoad: 77f60000 77fd6000   C:\WINDOWS\system32\SHLWAPI.dll
ModLoad: 77c00000 77c08000   C:\WINDOWS\system32\version.dll
ModLoad: 755c0000 755ee000   C:\WINDOWS\system32\msctfime.ime
(a98.ac): Break instruction exception - code 80000003 (first chance)
eax=7ffdf000 ebx=00000001 ecx=00000002 edx=00000003 esi=00000004 edi=00000005
eip=7c90120e esp=00a4ffcc ebp=00a4fff4 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000246
ntdll!DbgBreakPoint:
7c90120e cc              int     3
Missing image name, possible paged-out or corrupt data.
0:001> !htrace -snapshot
Handle tracing information snapshot successfully taken.
0:001> g
ModLoad: 3fde0000 40221000   C:\WINDOWS\system32\msi.dll
ModLoad: 35e80000 35f40000   C:\Program Files\Common Files\SYSTEM\MSMAPI\1033\MAPIR.DLL
(a98.c08): Break instruction exception - code 80000003 (first chance)
eax=7ffdf000 ebx=00000001 ecx=00000002 edx=00000003 esi=00000004 edi=00000005
eip=7c90120e esp=00a4ffcc ebp=00a4fff4 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000246
ntdll!DbgBreakPoint:
7c90120e cc              int     3
Missing image name, possible paged-out or corrupt data.
Missing image name, possible paged-out or corrupt data.
0:001> !htrace -diff
Handle tracing information snapshot successfully taken.
0xdd new stack traces since the previous snapshot.
Ignoring handles that were already closed...
Outstanding handles opened since the previous snapshot:
--------------------------------------
Handle = 0x000001bc - OPEN
Thread ID = 0x00000d9c, Process ID = 0x00000a98

0x7c80ea2b: kernel32!CreateMutexA+0x00000065
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Program Files\Common Files\SYSTEM\MSMAPI\1033\MSMAPI32.DLL -
0x35f74552: MSMAPI32!SzGPrev+0x000002b6
0x35f74476: MSMAPI32!SzGPrev+0x000001da
0x35f76c46: MSMAPI32!FOpenThreadImpersonationToken+0x00000fb9
0x35f76bc9: MSMAPI32!FOpenThreadImpersonationToken+0x00000f3c
0x35f76ba3: MSMAPI32!FOpenThreadImpersonationToken+0x00000f16
0x35f7662e: MSMAPI32!FOpenThreadImpersonationToken+0x000009a1
--------------------------------------
Handle = 0x000001c4 - OPEN
Thread ID = 0x00000d9c, Process ID = 0x00000a98

0x7c80ea2b: kernel32!CreateMutexA+0x00000065
0x35f7452f: MSMAPI32!SzGPrev+0x00000293
0x35f74476: MSMAPI32!SzGPrev+0x000001da
0x35f76c46: MSMAPI32!FOpenThreadImpersonationToken+0x00000fb9
0x35f76bc9: MSMAPI32!FOpenThreadImpersonationToken+0x00000f3c
0x35f76ba3: MSMAPI32!FOpenThreadImpersonationToken+0x00000f16
0x35f7662e: MSMAPI32!FOpenThreadImpersonationToken+0x000009a1
--------------------------------------
Handle = 0x000001cc - OPEN
Thread ID = 0x00000d9c, Process ID = 0x00000a98

0x7c80955f: kernel32!CreateFileMappingA+0x0000006e
0x35f744d4: MSMAPI32!SzGPrev+0x00000238
0x35f74476: MSMAPI32!SzGPrev+0x000001da
0x35f76c46: MSMAPI32!FOpenThreadImpersonationToken+0x00000fb9
0x35f76bc9: MSMAPI32!FOpenThreadImpersonationToken+0x00000f3c
0x35f76ba3: MSMAPI32!FOpenThreadImpersonationToken+0x00000f16
0x35f7662e: MSMAPI32!FOpenThreadImpersonationToken+0x000009a1
--------------------------------------
Displayed 0x3 stack traces for outstanding handles opened since the previous snapshot.

Well, there they are: “0x3 stack traces for outstanding handles”, all coming solidly from within MSMAPI32.dll. Now that we know who’s responsible for the allocations, let’s see if it’s any of our settings that is causing this. In other words, is there some flag that we can add or remove that might fix this issue?

Fortunately, there are only 3 flags that can be set in MAPIINIT_0. Since MAPI_NT_SERVICE is absolutely necessary, we are left with only two others: MAPI_NO_COINIT and MAPI_MULTITHREAD_NOTIFICATIONS. Trying the combinations of these flags, in addition to the default when MAPIINIT_0 is not set, unfortunately, doesn’t do away with the leak.

Another important fact is that I was using an old MAPI provider. Since we support Outlook 2003 as the minimum version for a MAPI provider, that’s what I was using. I had no service packs installed. The relevant DLLs and their respective version numbers that resulted in the aforementioned behavior are the following:

  • MSMAPI32.dll – 11.0.5601.0
  • MSO.dll – 11.0.5606.0
  • MSPST32.dll – 11.0.5604.0

Conclusion

Tracking leaks can be painful, but with the help of the right tools it can be very organized and rewarding. With the help of windbg we easily tracked the handle leak to its source, which turned out to be the MAPI DLL. Unfortunately, the only thing we might be able to do about the leak itself is avoid calling MAPIInitialize more than once per process. Another option is to upgrade the MAPI DLL, in the hope that Microsoft has fixed these issues. But until then, our clients would appreciate it if we found workarounds. Understanding and pinpointing the problem gives the developers the edge necessary to devise a workaround for our customers. And a workaround we implemented.
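A sketch of what “once per process” can look like in practice (illustrative, not our production code; a real version would need thread-safety around the first call):

#include <mapix.h>

// Pay MAPIInitialize's per-init handle cost exactly once, instead
// of once per work cycle, and never call MAPIUninitialize until
// process exit.
HRESULT EnsureMapiInitialized()
{
	static bool initialized = false;
	static HRESULT hr = S_OK;
	if (!initialized)
	{
		MAPIINIT_0 mapiinit = { MAPI_INIT_VERSION,
								MAPI_NT_SERVICE | MAPI_MULTITHREAD_NOTIFICATIONS };
		hr = ::MAPIInitialize(&mapiinit);
		initialized = true;
	}
	return hr;
}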

Update (April 02, 2011):

While trying to figure out workarounds for this issue, I ran into The Intentional Memory Leak by Stephen Griffin, the author of MfcMapi. Apparently the leak was left in because some ancient applications (going back to the early 1990s) were accessing freed memory. To avoid penalizing bad code, it was decided to make everyone else pay the price in leaks. This solution is neither new nor specific to MAPI; many other cases exist at the OS level as well as on other platforms. Still, it never fails to frustrate.

Mar 012011
 

Lately my colleagues have been commenting on my relentless notes on code styling in our internal code reviews. Evidently I’ve been paying more and more attention to what could rightly be called cosmetics. Admittedly, coding conventions are just that: conventions. No matter how much we debate tabs vs. spaces and how much we search for truth, they remain subjective and very much a personal preference. Yet one can’t deny that some conventions do in fact improve the overall quality of the code.

But code quality is not what worries me here. Not as much as eye strain and accidental misreadings, both of which are also factors in overall code quality, no doubt. It’s not rare that we find ourselves agonizing over a mistake that, were it not for haste and misreading, could’ve been avoided entirely.

C Sample Code: Function Parameter Duplication.

Image via Wikipedia

Developers don’t just write code. We spend the majority of our time adding, removing and editing small pieces of code; most of the time we make small changes in an otherwise large code-base. Rarely do we write new, independent pieces of code, unless of course we are starting a project from scratch. Most projects are legacy projects in the sense that they are passed from one generation of developers to the next. We add new functions to implement new features, rewrite some and edit others to meet our needs. All of which means reading a great deal of code to understand which statement is doing what, and deciding on the best change-set to get the output we plan. But even when we work on a completely independent module or class, once we have the smallest chunk of code written, we already need to go back and forth to interweave the remaining code.

Oddly, as I grew in experience I also spent increasingly more time reading code before making additions or changes. This seems reasonable in hindsight: it’s much easier to write new code than to understand existing code and find ways to adapt it, but experience tells us that the easiest route is not necessarily the best in the long run. Sure, we might get faster results if we write that case-insensitive string comparison function for the 30th time whenever we need one, or so we might think. The fact is, chances are we’ll get a few incorrect results as well, which we might not even notice until it’s too late. With a little effort, we could find a good implementation to reuse and learn how to use it. Indeed, that implementation might very well be in our own code-base. My string-comparison example is not all that artificial, considering that some languages don’t come with standard string comparison functions with Unicode, case and locale options. Some languages do have standardized (or at least canonical) functions, but you need to explicitly include the library reference in your projects, and that might involve SCM and build-server issues. Not fun when the deadline is looming, and when exactly was the last time we didn’t have an imminent deadline? (Notice that I do not mean to imply that DIY coding is a deadly sin. It probably is in most cases, but surely there are good reasons to write your own. Like premature optimization, the first two rules are: don’t do it. But that’s another topic for another day.)

Another great side-effect to reading code is that one will eventually find bugs and at least report them, if not fix them on the spot (and send for review). So instead of creating new sources of bugs, reuse reduces code duplication and possibly resolves old bugs.

Only when we anticipate and reasonably accurately estimate the overhead of the possible bugs in our home-grown functions, scattered around the code-base, and compare it with the overhead of learning and incorporating a higher-quality library function, do we stand a chance of making reasonable decisions. Unfortunately, understanding the difficulty of such comparisons, and especially grasping the perils of code duplication combined with the Dunning–Kruger effect when estimating the expected bugs in one’s own code, requires, paradoxically, experience to become obvious.

The whole issue of maintaining legacy code vs. writing from scratch is a different battle altogether. I will only say that improving legacy code at any pace is almost always superior to the demolition approach: it’s less risky, you still have working bits and pieces, and you have reference code right on your plate. And here is my point: to avoid rewriting, one must learn to read existing code, and read it well.

To make things more concrete, here are some of my top readability points:

  1. Consistency.
    I can read code in virtually any convention, no problem. I’ve grown to learn to adapt. If I need to read code, then I’d better accept that I may very well read code not written in my favorite style. One thing that is nearly impossible to adapt to, however, is code that looks like it was written by a 5-year-old who just switched from a Qwerty keyboard to Dvorak.
    Whatever convention you choose, or are forced to use, just stick to it. Make sure it’s what you use everywhere in that project.

  2. Consistency.
    There is no way to overstress how important this is. People experiment with braces-on-next-line and braces-on-same-line and forget that their experiments end up in the source repository, to the frustration of everyone else on the team. Seriously, experimentation is fine. Finding your preferred style is perfectly OK. Just don’t commit the experiments. Not to the trunk, at least.

  3. Space out the code.
    Readable code is like readable text. Everyone hates reading walls of text, and developers hate reading sandwiched code just as much. Put empty lines between all scopes. An end-brace is a great place to add that extra line, making it easier to find the start of the next scope. Need to comment? Great. Just avoid newspaper-style coding, where the code is turned into a two-column format: code on the left, comments on the right. There are very few cases where one needs to do that. Comments go above the code block or statement, not next to it, and they are more readable when preceded by a blank line. There is no shortage of disk space. Break long functions, break mega-classes and even break long files. Keep your functions reasonable in size. Use cyclomatic-complexity metrics to know when your code is too complex, and hence hard to read and understand. Refactor.

  4. Stop banner-commenting.
    If you need to add 5 lines of boxed comments within a function, it’s a good indication you need to start a new function. The banner can go above the new function. As much as I’d rather not have it there either, it’s much better than having 1000+ lines in a single function with half a dozen banners. After all, if you need to signal the start of a new logical block, functions do that better. The same goes for breaking long functions, classes and files. (Unlike what the image above may imply, that image is just an image of some code; I do not endorse the style it adopts.)

  5. Don’t use magic numbers.
    Constants are a great tool for naming what otherwise is a seemingly magic number that makes the code work. Default values, container limits, timeouts and user messages are just some examples of good candidates for constants (see the sketch after this list). Just avoid converting every integer into a constant. That’s not the purpose of constants, and it reduces code readability.

  6. Choose identifier names wisely.
    There has been no shortage of literature on naming conventions and the perils of choosing names that lack significant differences. Yet some still insist on using very ambiguous names and reusing them across logical boundaries. Identifiers not only need to be meaningful and unambiguous, they also need to be consistent across functions, classes, modules and even the complete project. At least an effort to make them consistent should be made. If you need to use acronyms, make sure you use the same case convention. Have database or network connections? Use the same identifier names everywhere. It’s much, much more readable to see an identifier and immediately know its type; incidentally, this was the original purpose of Hungarian notation (original paper). The same goes for function names. ‘Get’ can be used for read-only functions that have no side-effects. Avoid using verbs and nouns in an identifier and then implementing logic that contradicts the natural meaning of those words. It’s downright inhumane. If a function claims to get some value, then it should never modify any other value (with the possible exception of updating caches, which should only boost performance without affecting correctness). Prefer proper names in identifiers. Meaningful names, nouns and verbs can be a great help; use them to their limit.

  7. Don’t chain declarations.
    Chaining variable declarations is a very common practice in C. This is partially because ANSI C didn’t allow declaring variables after any executable statement: developers had to declare everything up-front, at the top of the function, so all the variables of a given type ended up in one comma-separated chain. Instead, declare variables in the innermost scope possible and as close to the usage statement as possible. Avoid recycling variables, unless it’s for the very same purpose. Never re-purpose variables; declare new ones instead (see the sketch after this list).
    A similar practice is chaining calls, where a single function call that takes one or more arguments is fed by first calling other functions and passing their return values as its arguments. These two types of chaining make reading painful. Some people nest 3 or 4 function calls within one another, thinking it’s more compact and therefore somehow better. It’s not. Not when someone needs to scan the code for side-effects to find a hard bug. A possible exception is chaining calls one after another: some classes are designed so that each call returns a reference to the original instance, ready for the next member call. This is often done on string and I/O classes. That type of chaining, if not used excessively and kept short, can be helpful and improve readability.

  8. Limit line and file lengths.
    Scrolling horizontally is a huge pain. No one can reasonably be expected to read anything while scrolling left and right for each line. Some people have larger screens than others; some have weaker eyesight, so they use larger fonts. I’ve seen code apparently aligned to 120 characters per line, with occasional 150+ character lines, back when 17″ screens were the norm. Some don’t even care about, or keep track of, line length. This is very unfortunate, as it makes other people’s lives very hard, especially those with smaller screens, lower resolutions and large fonts (read: bad eyesight). Code can’t be word-wrapped like article text (without the help of advanced editors).
    A good starting point is the legacy standard of 80 characters per line (cpl). If we can stick to that, great; 80 characters is not so narrow that things get garbled, and we should make an effort to minimize line length. For teams who can all work with 100 or 120 cpl, the extra width can give quite a bit of flexibility and improve readability. But again, be careful: make the lines too long and readability suffers yet again. Think of how newspapers and magazines have narrow columns and how readable they are. Our wide screens have large footprints, and moving our heads to read across the line, as opposed to moving just our eyes, is an extra physical strain and certainly a speed bump that we can do without.

  9. Make scopes explicit.
    Languages often have implicit scopes. In C-style languages, the first statement after a conditional or a loop implicitly falls under it, but consecutive statements do not. It’s much harder to eyeball where a statement ends and to figure out which statements belong to the implicit scope. Making these scopes explicit makes them visible without effort: adding braces, and an empty line after the closing brace, even around single-statement scopes, improves readability quite a bit (see the sketch after this list).

  10. Comment.
    Code is very well understood by machines, even when obscure and mangled. Humans, on the other hand, need to understand purpose and reason, beyond seeing what a piece of code does. Without understanding the reasons behind particular constructs, operations and values, no developer can maintain the code in any sensible way. Comments are a great tool for communicating the less obvious aspects of the code. Explain the business needs, implicit assumptions, workarounds, special requests and temporary solutions. Talk to your fellow developers through comments. But, as every rule comes with exceptions, avoid over-commenting at all costs. Resist the temptation to declare that an error is about to occur right before throwing an exception; the throw statement is self-commenting. Don’t repeat yourself. Rather, explain why the case is considered unrecoverable at that point, if it’s not obvious. Prefer to write code that speaks for itself. Self-descriptive code is the most readable code.
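
To make points 5, 7 and 9 concrete, here is a small before-and-after sketch in C++; the function, names and values are all made up for illustration:

#include <vector>

// Before: chained declarations, magic numbers, implicit scopes.
//
//   int i, attempts, max = 5, timeout = 30000;
//   for (i = 0; i < max; ++i)
//       if (Send(buffer, timeout))
//           return true;

// After: named constants, declaration at first use, explicit scopes.
const int MaxSendAttempts  = 5;           // retry budget, not a bare '5'
const int SendTimeoutMilli = 30 * 1000;   // was an opaque '30000'

// Stub standing in for a real transport call.
bool Send(const std::vector<char>&, int) { return true; }

bool SendWithRetry(const std::vector<char>& buffer)
{
    // Loop counter declared where it is used, in the innermost scope.
    for (int attempt = 0; attempt < MaxSendAttempts; ++attempt)
    {
        if (Send(buffer, SendTimeoutMilli))
        {
            return true;
        }
    }

    return false;
}

int main()
{
    std::vector<char> payload(16, 'x');
    return SendWithRetry(payload) ? 0 : 1;
}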

As a corollary, some code should be made hard to read. You read that right. Code that depends on hacks, bad practices or temporary solutions should be made harder to read. Perhaps my best example is casting. It’s not rare that we need to cast instances to specific types, and while this is standard practice, it isn’t exactly highly recommended either. Making the cast a little less readable is a good way to force the reader to squint and make sure it’s not bogus. This isn’t unlike italic text, which makes reading harder and therefore makes the text stand out. In the same vein, temporary solutions and hacks should be made to stand out. Adding TODO and FIXME markers is a good start; BUGBUG is another marker for known issues or limitations. Actually, if there is any place for the banner comment, this would be it. Make known issues stand out like a sore thumb.

As with most things, the above is mostly a guideline, rules of thumb if you will. They are not set in stone, and they aren’t complete by any stretch of the imagination; many obviously important cases are left out. Languages, ecosystems, frameworks and libraries come with their own nuances, and what’s clear in one language might be very ambiguous in another, and therefore frowned upon. The important thing is to write with the reader in mind, to avoid confusing styles, and especially to be mindful of the context: know when an explanatory comment is due, and when refactoring is the better choice.

Conclusion

Good developers write code fast and with a low bug-rate. Great developers read existing code, reuse and improve it, producing higher-quality code with no duplication and a reduced bug-rate for both new and existing code. We should write readable code, written for humans and not just for machines. And we should read code and improve it, rather than rewrite it.

Mar 282009
 

Recently I’ve been interested in web dev and web standards after disappearing from the scene for about a decade. Back in ’97 I was enjoying the ride of web technologies: design, server- and client-side programming, browser wars and what not. But all that had to end someday, and that day came in mid-2000. I have a craze for performance and wanted to dive deep into the world of desktop applications and servers, particularly highly scalable architecture, low-level (read: assembly) optimization and advanced algorithms. So I left the world wide web, only to come back and rediscover it a decade later.

Finally, the interactive web is here. Finally! Looking back at static HTML, how boring it looked, how hard and complicated it was just to get feedback from users and, even with that much primitive functionality, how incompatible the web browsers were, gives me a shiver.

Now I have to cover a lot of ground. So much has changed and there’s so much to catch up with. I have some experience with Python; honestly, I love it. I’ve been using it on and off for a few years now, so I thought I’d start with Python-based solutions: some simple Python HTTP server (web.py). I do enjoy reading code, so I started by reading web.py. Fun stuff.

Django tutorials? How about one better: screencasts! If reading code is good, reading working-site code should be better. Last, but never least, I had to get dirty with some code, so I created a toy wiki in Django. I should say the learning curve was rather smooth. Django has some quirks (queries), and some things must be done in a very specific way (projects/apps, models/views/urls …), but the thing I find a bit annoying (especially during development) is the lack of support for importing/creating initial data in the DB. The syncdb command is handy, but doesn’t work as expected when you change existing models. It probably works for some trivial cases, but add or remove a column, or mark one as an index (or remove an index), and it will silently ignore you.

It’s come a long way since ’05 though, so I’ll cut it some slack. Plus, it’s probably the best Python web framework. Not to say that it’s perfect, far from it: version 1.0 is in the very recent past (the current stable version is 1.0.2, with the 1.1 beta just released).

Django and Python, what next? jQuery, JSON and Ajax. Then jump to CSS, Selectors and HTML 5.0. Not that they all work properly on all modern browsers, but exciting technologies nonetheless.

Jul 222008
 

Here is a story from the trenches…

Almost 3 years ago, during a job interview, one of the interviewers made a comment about how they never hire MBAs for managerial positions. I didn’t get a chance to ask why, when she continued to highlight the advantages (brag, really) of working with a superior who had worked as a programmer and climbed his/her way up the food chain. “We never have to talk to an MBA around here,” she added, and finished with “everybody knows the code.”

The Ideal Project Manager?

You know, she was talking about the kind of person who would know how to make time estimations and complexity assessments because, well, they had worked on similar things themselves. Not only that, imagine a supervisor who has actually worked on the very same projects you’d be working on! No more lengthy meetings discussing time estimates and project plans, no more hopelessly trying to convince the manager why we need that extra person to pitch in to finish on time, no more wasted time searching for analogies for the “spaghetti code” and hacks that must be refactored away before we add the next big thing. More importantly, you’d have someone up the managerial food chain getting more realistic requirements and deadlines right off the bat, before you even hear about the next project. After all, they’d know as much as the team about the codebase and software engineering… or so I thought!

I got hired, all right. At first I was rather amused by the idea; almost everyone had worked on what I was working on at some time, and everybody seemed to have some idea about it. I imagined talking to the project manager about some project and how he’d point out some tricky part, suggesting I should probably add in an extra week just to make sure it’s covered. I imagined hearing anecdotes and funny stories. I thought I would never have to talk a manager into anything; s/he would know what I’m talking about, and many times I expected to be told they’d extend the deadline because they knew more testing would be needed. They know the code!

Enter Mr. MBA

Well, before I tell you what you probably expect anyway, let me tell you about an MBA who got hired at a time when there were no candidates for the position of PM. They needed a PM, and they needed one immediately. So they hired Mr. MBA. Unlike what I had expected from an MBA, and unlike the typical Mr. MBA I ran into throughout my career, this one was different. For a start, his estimates were very realistic, often more accurate than the team’s! He was also very understanding. He’d get feedback from the team, make his own calculation (almost always to the team’s benefit) and take it up the chain. Often he’d take my estimates and add 20 or 30%, with a comment to the effect of “QA is too busy, they’ll probably delay this project” or “we have an external dependency; you can never know how effectively they’ll communicate”. And almost always he was right on the money! The only times he’d push for a shorter schedule or demand more aggressive results were when the success of the project directly depended on those factors and he thought there was a way to optimize the project plan. And he’d find a way to reorganize and optimize things to get the right results.

How well do you think the other managers did? Needless to say, they were a huge disappointment. They were the typical MBA I had in mind: extremely rigid in their methods, very opinionated and argumentative, snobbish. Typically cutting any estimate by half and exclaiming why it should take that long to get such a trivial task done. Complaining and whining at every barrier or issue. Complete babies. And at the end of the day, they cared more about proving they were right than about getting things done.

Why?

Here is why I think things turned out the opposite of what I expected. I think the managers with technical backgrounds failed because they almost always overestimate their own abilities and undervalue their peers’ assessments. You see, where the MBA doesn’t try to imagine himself working on the tasks and projects, relying instead on historical data, experience, statistics and plain common sense, the tech manager tries to guess how long it would take him to do it himself.

Programmers are not the most humble bunch. It’s easy to overestimate one’s abilities, especially when you don’t want to look like you’re on the slow side in front of your once-peers, now subordinates. This is exacerbated by the fact that the manager in question probably hasn’t typed a single line of code in months. But even when they have, they still almost always undervalue the feedback of the team. In short, it’s very hard for someone who’s done a job to play a different role and assume that s/he knows nothing about the subject. After all, their advantage was supposed to be the very fact that they had been programmers; they are expected to use their experience! But those who can’t get over the feeling of competing with their subordinates are doomed to do a bad job. While the tech manager was wasting energy competing with his/her subordinates, Mr. MBA was solving real problems and working with the team.

Do you have a similar story? What’s your take on the subject? Are managers with programming experience the better software PMs, or do MBAs know best?

Jul 172008
 

I was researching Web Services and Web APIs when I came across this cunningly-named protocol: REST*.

I’ll put together the key points and pointers I collected in my journey and try to give an introduction to the whole affair.

What Services?

Web Services is the fancy name behind the simple idea of making documented requests to load/store some data from/onto a web server. The idea is simple because you could (and some did) do much of the same thing, way before the name was invented, if you were smart and determined enough. Consider fetching some HTML page and parsing the data out of it. The same would be true if one knew the parameters of the search queries, generated the POST or GET commands over HTTP and processed the returned HTML files. Through hackery, one could do quite a bit. But any change to the web site in question and your code would break; you’d have to re-hack it.

The main advantage of web services I’ve already mentioned: they are documented. This is not a trivial fact. The details of the resources (or services, to use the new terminology) are well defined and documented (no need to guess the POST details or the URI structure and naming), and not only that, but the results come in a well-formatted and documented form (typically XML, but YAML is making headway). So the whole thing can now safely be called an API (Application Programming Interface). In addition, more of the internal functionality is usually exposed, where before one would be limited to whatever public services were available.

At some point some web sites decided it would be good for their business growth to give more power to users and 3rd-party vendors, so they documented their service interface. More accurately, they added a 3rd-party API to their internal system (typically as an afterthought). Many followed suit. Now it’s usually designed into new web sites and web applications from the start.

Not surprisingly, there are many ways to go about designing, implementing and accessing web services. The roughest approach is some form of RPC, while a more mature form would be something like SOAP. REST, on the other hand, is somewhere in between. Apparently there was a huge debate about it (I’m not sure it’s settled).

According to Wikipedia:

REST strictly refers to a collection of network architecture principles which outline how resources are defined and addressed. The term is often used in a looser sense to describe any simple interface which transmits domain-specific data over HTTP without an additional messaging layer such as SOAP or session tracking via HTTP cookies. These two meanings can conflict as well as overlap.

So what is REST anyway?

REST (Representational State Transfer) is an architecture for structuring, organizing and managing (at runtime) a set of resources and services. REST overloads the existing resource identifiers to create, read, modify and delete those resources, and to pass state information from one to another.

REST Principles

Here are the main REST principles in a nutshell:

  1. URIs are the API: create the missing, view/edit the existing; delete is terminal editing (see the example after this list).
  2. The server is stateless; state transitions are in the URIs.
  3. XML is not necessarily the format for input and output; use whatever is handy.
  4. HTTP is typically used for the communication layer; while canonical, HTTP is replaceable.
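
For illustration, a hypothetical article service following these principles might map its operations like this (the paths are made up, but the verb-to-operation pattern is the canonical one):

POST   /articles        creates a new article (the missing resource)
GET    /articles/42     reads article 42
PUT    /articles/42     edits article 42
DELETE /articles/42     deletes article 42 (terminal editing)

Each request carries everything the server needs; no session state is kept between calls.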

Conclusion

After reading many debates and comparisons between the REST approach, RPC and SOAP (and the other WS-* protocols), I’m of the opinion that REST is the way to go. It has so many advantages that appeal to me that it’s hard to see why one wouldn’t use it. WS-* seems to have its use cases too, but rather limited ones.

REST is very simple and a natural fit for the web: even the most static web site is a rudimentary form of REST. To have the simplest working WS-* protocol, however, you’d have to invest an untold amount of time and energy. In addition, REST is the simplest thing that works, which translates into flexibility. The more rigor and formality you put into a system, the more robust it might seem, but also the harder it becomes to adapt and change. REST derives its rigor from high-level principles that, when followed, give the system high predictability.

It’s like your classic monolithic concrete building; once you pour the concrete into the casts, you can’t change it without blowing it up and starting over. A more modular and componentised approach, on the other hand, gives you the flexibility to change the shape and form of your structure over time. As things mature and experimentation fades into a working system, one will undoubtedly need to make non-trivial changes. Any of the gorilla-sized WS-* architectures would be a liability at that point, while REST is ready for change.

* Generally speaking REST is how the web works. It’s not a novelty, just a fancy name given to the architecture.


Mar 292008
 

Programmers are by and large lucky to have the luxury of practicing what they enjoy virtually anywhere. Compare that to most engineering disciplines and you’ll see that you can’t go far in your garage. Maybe electrical engineering comes next, but that’s about the closest you get. Programmers, on the other hand, can code anything. I mean anything. The only costs are the hardware (dirt cheap, by the way) and the time to do it. Want to create your own game? No problem. Try an OS! Crazy, but doable. And the thing that really tops it all: you don’t even need to move from your chair, or bed!

A selection of programming language textbooks ...

Image via Wikipedia

This is not a tiny advantage. I’d argue it is the number one reason behind the boom the field has experienced, and it’s still going strong. There was a time, though, when things weren’t any different from rocket science. But that was before the 80’s and the affordable generations of PCs.

So, since anything is possible, let’s see what the top problems are that any serious programmer should solve at least once (in many cases a dozen times!). Here is my list of the top 5 problems that I think give the maximum benefit per unit of time spent solving them. They have the advantage of being limited in scope, they touch on many areas and sub-issues, and they are potentially useful. Also, they require only the most basic knowledge and tools to solve. You can solve any and all of them in text mode; even the games may be simulated with text-based interfaces (although that way you won’t learn about graphics).

1) Calculator

This is really simple. Just write a program that does the 4 basic math operations: add, subtract, multiply and divide. Then add some more. The point of the exercise is, of course, parsing and operator precedence. You are exposed to the world of parser design, expression trees and possibly symbol tables (if you add variables or unknowns to the mix). What I like about this problem is that it’s very easy to get a basic version running and then add complexity. Also, you get to define the rules (such as precedence) and the complexity (whether there are parens and variables or not). It pays off real quick.
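
To give a taste of what’s involved, here is a minimal recursive-descent sketch in C++ (digits-only input, no parens, no whitespace or error handling; the names are made up for illustration). Precedence falls out of the call structure: expr calls term, term calls factor:

#include <cctype>
#include <cstdio>

// Grammar: expr = term (('+'|'-') term)* ;
//          term = factor (('*'|'/') factor)* ;
//          factor = number.
static const char* p;  // cursor into the input

double factor()
{
    double value = 0;
    while (std::isdigit(*p))
        value = value * 10 + (*p++ - '0');
    return value;
}

double term()
{
    double value = factor();
    while (*p == '*' || *p == '/')
    {
        char op = *p++;
        double rhs = factor();
        value = (op == '*') ? value * rhs : value / rhs;
    }
    return value;
}

double expr()
{
    double value = term();
    while (*p == '+' || *p == '-')
    {
        char op = *p++;
        double rhs = term();
        value = (op == '+') ? value + rhs : value - rhs;
    }
    return value;
}

int main()
{
    p = "2+3*4-6/2";
    std::printf("%g\n", expr());  // prints 11
    return 0;
}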

2) Arcade Games

I’d say Tetris, but it’s not the only one that qualifies. There are so many games one can write that share the same basic features: Pacman, Arkanoid, Snakes and Sokoban, just to name a few. The main benefit is designing a state machine, real-time processing, AI, graphics, collision detection and much more. Yet another problem that may start with very limited features and be extended as much as necessary, 3D worlds not being an exception. Besides being fun, games are so multi-faceted that you have to touch upon so many domains and problems that they are very educational. The experience is just unparalleled.
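
At the heart of all of these sits the game loop. Here is a minimal fixed-timestep sketch in C++; the Update/Render hooks are empty stubs standing in for the real work, purely for illustration:

#include <chrono>
#include <thread>

// Stubs standing in for the real work: input, AI, physics, drawing.
void PollInput() {}
void UpdateWorld(double /*dtSeconds*/) {}   // state machine, collisions...
void RenderFrame() {}
bool GameOver() { static int frames = 0; return ++frames > 600; }

int main()
{
    using clock = std::chrono::steady_clock;
    const double dt = 1.0 / 60.0;   // simulate in fixed 60 Hz steps

    auto previous = clock::now();
    double accumulator = 0.0;

    while (!GameOver())
    {
        auto now = clock::now();
        accumulator += std::chrono::duration<double>(now - previous).count();
        previous = now;

        PollInput();

        // Catch up the simulation in fixed steps; render once per frame.
        while (accumulator >= dt)
        {
            UpdateWorld(dt);
            accumulator -= dt;
        }
        RenderFrame();

        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return 0;
}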

3) Maps and Trees

You can’t go far without arrays, lists, queues and the numerous other data structures. Most programmers end up writing similar structures indirectly and unintentionally; still, many try to dodge writing them by using standard libraries or 3rd-party code. There is nothing like writing an abstract map based on a hash table and another based on a binary search tree, then putting them head-to-head in a performance comparison. Try programming a map that beats your best alternative in both speed and space. Balancing trees is yet another complex problem, with many algorithms and many trade-offs. The benefit of this problem is learning more about the basic tools of the trade while learning all about complexity and efficiency along the way. Did I mention that you will use lists and arrays in your maps and trees? Writing highly efficient maps and trees is a challenging task that isn’t even apparent until you start thinking about it.
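
A quick head-to-head harness is easy to set up against the standard containers as baselines (a sketch; a home-grown map would be benchmarked the same way, as a third contender):

#include <chrono>
#include <cstdio>
#include <map>
#include <unordered_map>

// Fill a map with n keys, then time n lookups.
template <typename Map>
double MeasureLookups(Map& m, int n)
{
    for (int i = 0; i < n; ++i)
        m[i] = i;

    const auto start = std::chrono::steady_clock::now();
    long long sum = 0;
    for (int i = 0; i < n; ++i)
        sum += m[i];
    const auto end = std::chrono::steady_clock::now();

    std::printf("  (checksum %lld)\n", sum);  // keeps the loop observable
    return std::chrono::duration<double>(end - start).count();
}

int main()
{
    const int N = 1000000;
    std::map<int, long long> tree;            // typically a red-black tree
    std::unordered_map<int, long long> hash;  // bucketed hash table

    std::printf("tree lookups: %.3fs\n", MeasureLookups(tree, N));
    std::printf("hash lookups: %.3fs\n", MeasureLookups(hash, N));
    return 0;
}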

4) Database

There are multi-million-dollar databases out there, but can you create a home-grown version that has any number of the features the big guys have? Try something as flexible as storing any data type. Then add efficient searching. Then try adding relations, hierarchies, indexing, even a query language! You’ll learn all about file formats, on-the-wire data types, serialization, complex data structures, sorting and searching, data manipulation and data mining, to name a few. Furthermore, you’ll appreciate what it takes to create efficient and scalable data structures on and off the disk.
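
The embryo of such a project can be as small as a fixed-width record file with positional access. A made-up sketch of the idea (no error handling, and a real engine would add variable-width records, indexes and much more):

#include <cstdio>

// Fixed-width on-disk record: the offset of record i is i * sizeof(Record).
struct Record
{
    int  id;
    char name[28];
};

int main()
{
    // Append two records.
    std::FILE* f = std::fopen("toy.db", "wb+");
    Record r1 = { 1, "alpha" };
    Record r2 = { 2, "beta" };
    std::fwrite(&r1, sizeof(Record), 1, f);
    std::fwrite(&r2, sizeof(Record), 1, f);

    // Fetch record #1 (zero-based) by seeking straight to it: no scan needed.
    Record fetched;
    std::fseek(f, 1 * sizeof(Record), SEEK_SET);
    std::fread(&fetched, sizeof(Record), 1, f);
    std::printf("id=%d name=%s\n", fetched.id, fetched.name);  // id=2 name=beta

    std::fclose(f);
    return 0;
}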

5) Programming languages

This is one of the most complex problems of the programming world: compilers and language design. Try writing an interpreter for a popular language or a dialect of one, say Basic or Pascal. Try Lisp. Why not even go as far as inventing your own? You’ll learn about language design, parsing, symbol tables, optimization, simulation, automata and many, many more. This is not only fun, it’s also a very challenging task. I can’t promise your language will become the next big thing; in fact, it will probably only ever be used by you. But this is one of those challenging problems that may be limited in scope and extended as needed. And if you feel crazy, write a debugger for your interpreter.

I’m sure one may add other problems, replace some in my list or simply say there is no such list. Anyway, here you have it. This is what I’d recommend for practice.
