Tuesday, October 20, 2009

Software Complexity Crisis

[This rather large post is almost entirely a personal rant. I complain a lot about wxWidgets in here, because it's fresh in my mind and a very good example of a specific type of software engineering crisis. Note that I'm still using the WX libraries for the Alter Aeon client project; clearly it has value to me in spite of its faults, and I appreciate the effort the WX team has put in. That said, if I could easily move to any other library that met my constraints, I would do it in a heartbeat.]

Various recent attempts on my part to use large software libraries have made me re-examine the issue of the general software-engineering crisis. I've run up against typical software crisis problems many times in the past, so I tend to keep my eyes open when I see material related to the topic. This is one of the reasons that Vernor Vinge's book "A Deepness in the Sky" caught my attention.

One of the basic premises of this book is that complexity failure can be sufficient to bring down entire societies. This was put very succinctly in a blog posting by Jeremy Bowers on his iRi Blog, part of which I quote here:

"One of the less well known concepts which informs his sci-fi writings is one possible fate of societies that do not or can not end in a "singularity", which is the eventual unavoidable collapse of the society in a cascading failure state brought on by excessive, uncontrollable complexity in the ever-more-sophisticated systems that drive the society. In this case, take "system" in the broad sense, including not just software, but business practices, government, and societal mores. A failure occurs somewhere, which brings down something else, which brings down two other something elses, and perhaps quite literally in the blink of an eye, you are faced with a growing complex of problems beyond the ability of any one human to understand or contain."

This relates to software in that I'm beginning to see more and more examples of how this can occur. Two software platforms in particular come to mind - IBM's WebSphere, which I was peripherally involved with a decade ago, and wxWidgets, which I am involved with today.

Both of these platforms are very complex. Both form an abstraction layer which builds on top of other layers - in the case of wxWidgets, there is a huge amount of API reuse from the lower layers of Windows, or GTK, or X, depending on which configuration it's built for. Each of these layers is built upon other layers, and other layers, sometimes with very deep call trees.

The most egregious example of this kind of layering that I can recall was in WebSphere. A friend had asked me to take a look at a stack trace from a WebSphere crash; somewhere deep in Java land, a 'null object exception' had been thrown. (Thank god it wasn't a NULL pointer, that would have been much worse!) The exception handler that caught it was basically the main loop, because apparently no other layer could be bothered to check for failure conditions along the way.

There were over 160 stack frames to walk through. Not one of them was due to a recursive algorithm or function. I don't know about you, but that level of stack depth is quite frankly beyond my ability to manage or debug. I don't care what it does.

wxWidgets is clearly beginning to show stress of its own, of a different character: it's becoming harder and harder to guarantee consistency across platforms. The WX guys have made tremendous progress in this regard, so that most of the core features work right, but there are simply too many details to keep track of and too many paths that will never be tested.

Here's a couple of examples of this, one of which I'm STILL fighting:

-------------------------------------------

When I first started switching the Alter Aeon client project over to WX, I initially used the wxTextCtrl class, which is built out of Microsoft system libraries in the Win32 world, or built on the GTK libraries in my development environment. I had hoped to use the class for both the input window, and for the main display; with a small amount of effort, I got the client running and working, but there were minor, very persistent issues.

The first of these was the input window. Various events, such as backspacing when empty, cause a system beep/bell under Windows. They don't cause a bell under Linux. And further, there's no way to disable this. I don't know about you, but I'll be damned if I'm going to ship a product that beeps every time someone hits backspace.

I managed to take care of some of this by trapping out various keystrokes in the CHAR handler. It seemed like a poor hack at the time, but at least it helped. However it didn't help enough; a number of keystrokes simply don't generate CHAR events, yet they still fucking beep. I finally ended up writing a raw keyboard event handler, which tracks nearly all of the keyboard state, to trap out events that would generate a beep when passed to the lower layer. In the time it took me to disable beeping, I could have written and debugged a keyboard handler from scratch, with exactly the desired behaviour.
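To give a feel for the kind of filtering described above, here is a minimal sketch of the decision such a raw keyboard handler has to make. It is not the client's actual code: the Key enum, the InputState struct, and shouldSwallow are hypothetical names, and the rule that Delete at the end of the line also beeps is an assumption (the only case named above is backspacing when empty). In a real wxWidgets handler the key code would come from wxKeyEvent::GetKeyCode(), and the event would only be passed down with event.Skip() when the check says it is safe.

```cpp
#include <cstddef>

// Hypothetical key codes for illustration only; a real handler would map
// wxKeyEvent::GetKeyCode() values such as WXK_BACK and WXK_DELETE onto these.
enum class Key { Backspace, Delete, Other };

// Snapshot of the input line, tracked by the handler itself rather than
// queried from the native control.
struct InputState {
    std::size_t length;  // characters currently in the input line
    std::size_t caret;   // caret position, in the range 0..length
};

// Returns true when the key should be swallowed because the native control
// would have nothing to act on, which is the assumed cause of the beep.
bool shouldSwallow(Key key, const InputState& s) {
    switch (key) {
        case Key::Backspace: return s.caret == 0;         // nothing to the left
        case Key::Delete:    return s.caret >= s.length;  // nothing to the right (assumption)
        case Key::Other:     return false;                // everything else goes through
    }
    return false;
}
```

The point of keeping this as a pure function is that it can be tested without a window, which is more than can be said for the control it works around.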

While beeping has largely been taken care of, other issues with this so-called standard class have not. The biggest one is that the color of text displayed in the class occasionally reverts to black, depending on the versions of the system DLLs on the particular Windows installation. My first attempts at fixing this were effective on all my development environments, but failed on about half of the release environments - text typed in the window would occasionally simply vanish.

By adding forced color setting in various places where it shouldn't be needed, I eventually managed to fix this problem for about 90% of my users. Out of sheer disgust at this point, I did some extremely vicious forced color setting in various event handlers, and this appears to have fixed 'most' of the problem. I still am receiving sporadic reports of it happening on current client builds, but at least the problem goes away now and seems to be triggered at random.

When attempting to use this same class for the main window display, I ran into what seemed to be minor issues regarding the scroll bars and scrolling of text in the window. No matter what I tried, I never did find a way to get reliable scroll positioning for this class across all platforms.

After fighting this off and on for several months, I became desperate. I finally wrote my own text display class from the ground up, using nothing more than bitmaps and font drawing routines. The total from-scratch implementation time was less than the time I had previously wasted trying to get the scrollbars to work properly. It's also faster, especially for very large data sets.
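For illustration, the scroll-position half of a from-scratch text display can be very small once the native scrollbar is out of the picture. The sketch below is an assumption about how such a class might be organized, not the client's real implementation; Scrollback, scrollBy, and visible are hypothetical names, and the bitmap and font drawing is left out entirely. The scroll position is just a count of lines back from the newest one, so it behaves identically on every platform.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scrollback model: stores lines and decides which ones a
// window of `rows` text rows should draw. Rendering is out of scope here.
class Scrollback {
public:
    Scrollback(std::size_t maxLines, std::size_t rows)
        : maxLines_(maxLines), rows_(rows) {}

    void append(std::string line) {
        lines_.push_back(std::move(line));
        if (lines_.size() > maxLines_) lines_.pop_front();  // drop the oldest line
        if (offset_ > 0) ++offset_;  // stay on the same text while scrolled back
        clamp();
    }

    // Positive delta scrolls toward older text, negative toward newer text.
    void scrollBy(long delta) {
        long next = static_cast<long>(offset_) + delta;
        offset_ = next < 0 ? 0 : static_cast<std::size_t>(next);
        clamp();
    }

    // Lines to draw, oldest first.
    std::vector<std::string> visible() const {
        std::size_t end = lines_.size() - offset_;
        std::size_t begin = end > rows_ ? end - rows_ : 0;
        return std::vector<std::string>(
            lines_.begin() + static_cast<std::ptrdiff_t>(begin),
            lines_.begin() + static_cast<std::ptrdiff_t>(end));
    }

private:
    // Never scroll past the point where the oldest line sits on the top row.
    void clamp() {
        std::size_t maxOffset = lines_.size() > rows_ ? lines_.size() - rows_ : 0;
        offset_ = std::min(offset_, maxOffset);
    }

    std::deque<std::string> lines_;
    std::size_t maxLines_;
    std::size_t rows_;
    std::size_t offset_ = 0;  // lines scrolled back from the newest line
};
```

With a two-row window holding "a", "b", "c", the view shows "b" and "c"; scrolling back one line shows "a" and "b", and it stays there when new text arrives.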

-------------------------------------------

This, my friends, is the software engineering crisis in action. Each layer, while hiding some of the problems of the lower layers, introduces its own; the overall result is a system with fewer catastrophic issues, but exponentially more minor issues.

Those minor issues are surely tolerable, are they not? To an extent, yes - but at what point do you die the death of a thousand cuts?

Catastrophic issues might be catastrophic and obvious failures, but that's one of the best things about them: they're catastrophic and obvious. They HAVE to be fixed. They must be understood, they must be cleaned up and dealt with. The minor problems on the other hand, can just keep accumulating. They just keep getting worse, they just keep getting more obscure, more complicated, more difficult to find and rectify. And worse, they compound each other.

In a good scenario over the long term, they become so prevalent that the system is no longer usable. In a bad scenario, the system becomes critical and unmaintainable. It's a swiss cheese of buggy modules and misunderstood patches.

Is that really what you want to build critical infrastructure out of? Is this where we're headed? I certainly hope not.

Sunday, October 11, 2009

DClient 0.982 Released!

After a lot of hard work, intermediate releases, bug reports and beta testing, the latest version of the Alter Aeon game client is now available! It's only been about a month since the last release, but the number of updates and new features is pretty impressive. Here's some of the major things that have been updated:

  • Grouping status bars added.

  • A new version of 'left hand layout' is now available.

  • Added basic support for color themes and backgrounds.

  • Side buttons now use the same color scheme and format as popup and function key buttons.

  • (For builders) 'connect' command allows connecting to arbitrary ports.

  • Redraw flicker reduced in popup windows.

  • Add support for GMUD color scheme.

  • Smoother scrolling for the automap.


In addition, there's been a pile of minor bug fixes, as well as a lot of work to clean up and improve the interface. You might also notice things like removal of scrollbars and word wrap on popup windows that don't need them, fewer buttons to be confusing for newbies, and improvements in the input text bar. Here are some examples of a couple of possible configurations:

Alter Aeon Client Screenshot

Alter Aeon Client Screenshot

The basic theme support and the overall interface cleanup should help us a lot with new players. The first 30 seconds of pretty buys you the next 30 seconds, etc. It seems to take forever, but incremental improvements do really add up over time.

Again, the download is at http://www.dentinmud.org/AlterAeon.exe. Snag a copy and check it out!

Wednesday, October 7, 2009

Yet Another wxWidgets Fail

I've been trying off and on for a while to get transparency to work in my Linux/GTK builds, and for whatever reason it's just not working (at all.) After running some experiments last night, the best I can determine is that there simply is no alpha channel available in the wxBitmap/wxMemoryDC that I'm trying to use.

It's not that I didn't try to get one. Supposedly all you need to do is construct bitmaps with 32 bpp and everything else will take care of itself; however, even simple operations like putting a pixel or drawing onto the DC return 0xFF for the alpha channel every time. The alpha simply disappears. Because the alpha is gone, when I attempt to draw or blit the bitmaps, I get a full overwrite instead of a proper alpha blend.
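To make the difference between an overwrite and a blend concrete, here is a minimal sketch of the standard "source over" blend for one pixel, assuming 8-bit non-premultiplied RGBA and an opaque destination. Rgba, blendChannel, and blendOver are hypothetical names for illustration and have nothing to do with the wxWidgets API. Note that when the source alpha is stuck at 0xFF, the formula hands back the source pixel unchanged, which is exactly the full-overwrite symptom described above.

```cpp
#include <cstdint>

// Hypothetical pixel type: 8 bits per channel, alpha not premultiplied.
struct Rgba {
    std::uint8_t r, g, b, a;
};

// Blend one channel: source weighted by alpha, destination by (255 - alpha),
// with +127 so the integer division rounds to nearest.
inline std::uint8_t blendChannel(std::uint8_t src, std::uint8_t dst, std::uint8_t alpha) {
    return static_cast<std::uint8_t>((src * alpha + dst * (255 - alpha) + 127) / 255);
}

// "Source over" blend onto an opaque destination; the result stays opaque.
inline Rgba blendOver(Rgba src, Rgba dst) {
    return Rgba{
        blendChannel(src.r, dst.r, src.a),
        blendChannel(src.g, dst.g, src.a),
        blendChannel(src.b, dst.b, src.a),
        255
    };
}
```

An alpha of 0 leaves the destination untouched, and anything in between mixes the two in proportion.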

Once again, it's beyond me how something so simple can silently fail so catastrophically while giving no indication of error or dysfunction. I'm using the modules according to the published API, and the expected results do not occur.

Naturally, there is another issue coupled to this: wxWidgets does not have very good drawing/shading and blending primitives. I could probably draw what I want using the tools, but it will be costly, both in time and processing power.

The obvious two solutions present themselves:

1) Find another library, such as OpenGL, that I can integrate into the toolkit, or

2) Write my own graphics subsystem.

Size is, in my mind, at a premium here, so I would prefer not to bloat out the executable with another five megabytes of drawing and 3D utilities. It also just so happens that I've done a lot of game and graphics programming in the past. Graphics programming that included various primitive libraries.

Now, the only issue is how to get direct access to wxBitmap data so that I can do my drawing elsewhere, then copy into the bitmap for blitting. It just so happens that there's an undocumented 'wxRawBitmap' class that looks like it might do the job. Hooray for undocumented features. What the hell.
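The "draw elsewhere" half of that plan might look something like the sketch below: a plain off-screen pixel buffer with one clipped drawing primitive. This is an assumption about the general shape, not the client's code; Surface and fillRect are hypothetical names, and the packed 0xAARRGGBB layout is an arbitrary choice for the example. The final copy into the wxBitmap is deliberately left out, since it depends on whatever the raw-access interface turns out to provide.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical off-screen surface: packed 0xAARRGGBB pixels, row-major,
// initialized to fully transparent black.
struct Surface {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;

    Surface(int w, int h)
        : width(w), height(h),
          pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0) {}

    // No bounds checking; callers are expected to clip first.
    std::uint32_t& at(int x, int y) {
        return pixels[static_cast<std::size_t>(y) * static_cast<std::size_t>(width)
                      + static_cast<std::size_t>(x)];
    }
};

// Fill a rectangle, clipping it to the surface so out-of-range coordinates
// are ignored instead of writing outside the buffer.
void fillRect(Surface& s, int x, int y, int w, int h, std::uint32_t color) {
    int x0 = std::max(x, 0);
    int y0 = std::max(y, 0);
    int x1 = std::min(x + w, s.width);
    int y1 = std::min(y + h, s.height);
    for (int yy = y0; yy < y1; ++yy)
        for (int xx = x0; xx < x1; ++xx)
            s.at(xx, yy) = color;
}
```

Because the buffer owns its own alpha byte, nothing in the toolkit can quietly throw it away before the final copy.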

Saturday, October 3, 2009

Client Customization

As part of my quest to improve the client, I accidentally embarked on sanitizing the button color scheme and making various colors consistent across the codebase. When this was done, it occurred to me to fiddle with them, and I ended up with some very interesting results, which you can find in the dentinmud.org screenshots directory.

Here's some direct links:

Red Alter Aeon Screenshot

Blue Alter Aeon Screenshot

Default Color Alter Aeon Screenshot

I personally like the blue a lot, but that's just me. A lot of players expressed concern at the blue layout or found it ugly. The red one was almost universally disliked, which I found surprising. Attempts to tweak it were really unproductive; for whatever reason, it seems very difficult to come up with red background color schemes that don't look terrible.

The current plan for the next day or so is to at least make these run-time selectable, though most players disable all of the buttons when they play anyway.

Strange as it may seem, I think the blue adds a whole new level of cool factor to the initial impression of the client. It's definitely eye catching, which is probably a good thing given the stunningly high dropout rates of people who try the client cold.