Two most obscure bugs in my life

Wed
14
Nov 2018

Fixing bugs is a significant part of software development, and a very emotional one - from frustration when you have no idea what is happening to euphoria when you find the cause and you know how to fix it. There are many different kinds of bugs and reasons why they are introduced to the code. (Conducting a deep investigation of how each bug happened might be an interesting idea - I will write a separate blog post about it someday...) But the most frustrating ones are probably these that occur only in some specific conditions, like only on one of many supported platforms or only in "Release" and not in "Debug" configuration. Below you will find description of the two most obscure bugs I've found and fixed in my life, for your education and amusement :) Both were in large, multiplatform, C++ codebases.

1. There was this bug reported causing incorrect program behavior, occurring on only one of many platforms where the program was built and used. It was probably Mac, but I can't remember exactly. It was classified as a regression - it used to work before and stopped working, caught by automated tests, after some specific code change. The strange thing was that the culprit submit to the code repository introduced only new comments and nothing else. How can change in a code comment introduce a new bug?!

It turned out that the author of this change wanted to draw a nice "ASCII art" in his comment, so he's put something like this:

// HERE STARTS AN IMPORTANT SECTION
//==================================================================\\
Function1();
Function2();
(...)

Have you noticed anything suspicious?...

A backslash '\' at the end of line in C/C++ means that the logical line is continued to next line. This feature is used mostly when writing complex preprocessor macros. But a code starting from two slashes '//' begins a comment that spans until the end of line. Now the question is: Does the single-line comment span to next line as well? In other words: Is the first function call commented out, or not?

I don't know and I don't really care that much about what would "language lawyers" read from the C++ specification and define as a proper behavior. The real situation was that compilers used on some platforms considered the single-line comment to span to the next line, thus commenting out call to Function1(), while others didn't. That caused the bug to occur on some platforms only.

The solution was obviously to change the comment not to contain backslash at the end of line, even at the expense of aesthetics and symmetry of this little piece of art :)

2. I was assigned a "stack overflow" bug occurring only in "Debug" and not in "Release" configuration of a Visual Studio project. At first, I was pretty sure it would be easy to find. After all, "stack overflow" usually means infinite recursion, right? The stack is the piece of memory where local function variables are allocated, along with return addresses from nested function calls. They tend not to be too big, while the stack is 1 MB by default, so there must be unreasonable call depth involved to hit that error.

It turned out not to be true. After few debugging sessions and reading the code involved, I understood that there was no infinite recursion there. It was a traversal of a tree structure, but the depth of its hierarchy was not large enough to be of a concern. It took me a while to realize that the stack was bloated to the extent that exceeded its capacity by one function. It was a very long function - you know, this kind of function that you can see in corporate environment which defies any good practices, but no one feels responsible to refactor. It must have grown over time with more and more code added gradually until it reached many hundreds of lines. It was just one big switch with lots of code in each case.

void Function(Option op)
{
  switch(op)
  {
  case OPTION_FIRST:
    // Lots of code, local variables, and stuff...
    break;
  case OPTION_SECOND:
    // Lots of code, local variables, and stuff...
    break;
  case OPTION_THIRD:
    // Lots of code, local variables, and stuff...
    break;
  ...
  }
}

What really caused the bug was the number and size of local variables used in this function. Each of the cases involved many variables, some big like fixed-size arrays or objects of some classes, defined by value on the stack. It was enough to call this function recursively just few times to exhaust the stack capacity.

Why "stack overflow" occurred in "Debug" configuration only and not in "Release"? Apparently Visual Studio compiler can lazily allocate or alias local variables used in different cases of a switch instruction, while "Debug" configuration has all the optimizations disabled and so it allocates all these variables with every function call.

The solution was just to refactor this long function with a switch - to place the code from each case in a separate, new function.

What are the most obscure bugs you've met in your coding practice? Let me know in the comments below.

Comments | #debugger #visual studio #c++ Share

Comments

[Download] [Dropbox] [pub] [Mirror] [Privacy policy]
Copyright © 2004-2024