Sunday, March 29, 2009

Clients and Newbies

I recently released a new client update for Alter Aeon, one that includes (but is not limited to) the following cool changes:

- vastly upgraded automap
- vote menu
- 'last command select'
- various settings now save
- function key alias mapping

This really is a big step forward in the evolution of the client, and I've received a lot of compliments on it. Unfortunately, there is a problem with this client release, and it all goes back to the problem of newbie retention. But before we go into that, here's some background on the current state of newbies on the game.


There appear to be three primary types of new player right now:

1) Sighted players from mudlists, who generally already have a client. These players usually don't download the custom client, and if they did they probably would go back to their own client anyway.

2) Blind players who use blind clients. They generally don't use the AA client, as it has crap for blind player support.

3) True new players who probably have never mudded before. These people are the long-term future, as the number of mudlist players is shrinking and blind players will eventually get their eyes fixed. This is also by far the largest market to tap. You don't compete with World of Warcraft unless you're hitting up ordinary sighted people.

The mudlist players will pretty much do their own thing. There's a limit to how much you can do with them, as they have their own opinions already. Many are unrecoverable, as they have been damaged by some other mud.

Blind players we seem to be doing reasonably well with. I have no complaints here, other than that making the AA client work really well with screen readers would probably help a good amount.

The third category, by far the biggest market, is the problem. Since they have no prior mudding experience, odds are very good they downloaded the client, which shows up easily in my stats. I can also track logins based on web page hits. If this class of player is sticking around, I'll see it by looking for newbie players using the client.

This is exactly what I don't see. I see the expected players log in with the client, just as they should. There's a large initial dropout rate, but a good percentage of them level and play for a while. After that they all disappear. We're not keeping people entertained.

I suspect what's going on is they play for a while, run out of stuff to do or don't see the point, and bail. That pretty much describes my first mudding experiences; I got a low level character and basically just wandered around looking at stuff. None of the text meant anything in particular to me, and I would wander ridiculous places because I didn't know where to go or what to do. I didn't even know what the point of the game was.

I clearly need to add a lot more statistics tracking and try to see where the dropoffs are occurring - what level ranges, number of logins, etc. With this information in hand we can improve the introductory areas; but so far, my initial numbers indicate that the dropoffs happen in newbie areas that are well tested and considered in good shape. So what is really going on?
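As a rough illustration of the kind of tracking described above, here is a minimal Python sketch that counts inactive client-using characters by level range and by number of logins. Everything in it is an assumption made for illustration: the PlayerRecord fields, the 30-day inactivity cutoff, and the 5-level bucket width are hypothetical and are not taken from the actual server.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PlayerRecord:
    """Hypothetical per-character summary; all field names are assumptions."""
    name: str
    max_level: int        # highest level the character reached
    login_count: int      # total number of logins
    days_since_last: int  # days since the character last logged in
    used_client: bool     # True if the custom client was seen at login


def level_bucket(level, width=5):
    """Return a label such as '1-5' or '6-10' for a level."""
    low = ((level - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def dropped_out(rec, inactive_days):
    """A client-using character counts as dropped after a long absence."""
    return rec.used_client and rec.days_since_last >= inactive_days


def dropoff_by_level(records, inactive_days=30, width=5):
    """Count dropped characters per level range."""
    counts = Counter()
    for rec in records:
        if dropped_out(rec, inactive_days):
            counts[level_bucket(rec.max_level, width)] += 1
    return counts


def dropoff_by_logins(records, inactive_days=30):
    """Count dropped characters per total number of logins."""
    counts = Counter()
    for rec in records:
        if dropped_out(rec, inactive_days):
            counts[rec.login_count] += 1
    return counts
```

A spike in one level bucket, or at a particular login count, would point at where to look first.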


The final piece of the puzzle revolves around the massively improved automap function in the new client. I tried really hard to play with the mouse for a while, and it worked as well as could be expected. Unfortunately, when using and concentrating on exploring using the map, the entire text window (which contains everything that's important) disappears. Not from the client, but from your mind. It simply isn't as important as the automap. Looking back at it takes effort.

So say you're exploring with the map. You see unexplored spots on the map, you know you can go that way, and you click the button to go there. Nothing happens. You click the button again - you just want to go explore that part of the map - and nothing happens again. How irritating.

Nothing happens because you're fighting. But unless you take your eyes off the map and look back to the main window, you don't notice it. You just get irritated that you didn't move when you clicked the button. And all the scrolling crap in the main window, highlighted and all, really doesn't mean anything to you.

I've got a few things in the works to combat this, including audio cues for fighting, changing the buttons in combat, and changing the area description to list various positions such as "Combat!", sleeping, resting, etc. That will have to wait until next week though, as I've got a lot of web work to get caught up on as well.

It's definitely a good problem to chew on.

Sunday, March 22, 2009

Statistics can lie

Today's topic is the Alter Aeon "asshole" detector. We've had code in the game for years that tries to do a rough estimation of how big an asshole people are, and after multiple years of tweaks and updates, we have something that produces pretty valid results that correlate spectacularly with bad behavior.

The problem is that it correlates with the wrong bad behavior.

Take a look at the following asshole numbers:

10.249 Demon
-1.848 Cow
-6.941 Woem

In this example, Woem is a very well-liked socialite, and is known for being a nice, helpful person. At the other extreme is Demon, who currently holds the top seven slots on the all-time asshole list. As of this writing, he has no competitors for top slot.

However, Cow is the real problem here. This guy is a real asshole, and is extremely discriminatory toward blind players (and other groups). I have considered several times explicitly flagging him as an asshole. Both fortunately and unfortunately, most of his bad behaviour happens on private or clan channels that I don't or cannot monitor - fortunately, it stays away from the general population; unfortunately, it festers and breeds hatred in his social circles.

After thinking about it for a while, the explanation appears to be related to spam. Demon, and his predecessor Gamlin before him, are both prolific and irritating spammers. They would both constantly pester people and spam channels with inane and generally low-brow conversation. Even I have ignored them for periods of time simply to stop the stupidity and prevent them from bothering me. On the other hand, I cannot recall ever having ignored Cow.

The overall point is that people feel free to ignore spam because it's meaningless, irritating, and usually undirected. On the other hand, ignoring the purveyor of a pointed, directed and possibly malicious attack seems like a bad idea. If I wouldn't do it, why would I expect anyone else to do so? Cow remains off the asshole list because people don't dare ignore him. He's too dangerous.

Somehow, I need to find a way to either counteract this, or allow people to mark or flag assholes. This is of course the age-old problem of automated board moderation, and I don't expect there to be an easy answer.

On the plus side, the current algorithm could simply be renamed the 'spammer factor' instead of the 'asshole factor'. At least then people wouldn't have misconceptions about what it meant.

Thursday, March 12, 2009

Web 2.0

Begrudgingly, I am being dragged into the 21st century. First it was XML formatted output of various pieces of data; now it's XML RSS feeds.

That's right! Alter Aeon now has its first RSS feed on the changelog page:

Alter Aeon Changes and Updates Page

It just so happens that I have a player that's a web developer for a living, which is really nice - he helps me work through protocol and presentation issues that otherwise would take me quite a while to figure out. He tells me that I've not quite got the right idea for the descriptions in the current feed, but I should have that fixed in the next day or two.

I'm also planning on having the player boards in both XML and RSS formats by tomorrow, now that most of the infrastructure is in place. One of the big showstoppers for doing work on the boards (which are the logical place for an RSS feed anyway) was the lack of unique message IDs. As of yesterday, this is no longer an issue.
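To show why unique message IDs matter for a board feed, here is a minimal Python sketch that builds an RSS 2.0 document in which each item carries a guid derived from the message ID. This is not the game's actual feed code: the function name, the message dictionary keys, and the "board-id" guid format are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_board_feed(board_name, board_url, messages):
    """Return an RSS 2.0 string for a board (hypothetical message format).

    Each message is a dict with 'id', 'subject' and 'body' keys. The guid
    lets a feed reader tell whether it has already seen an item, which
    only works if the ID never repeats.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = board_name
    ET.SubElement(channel, "link").text = board_url
    ET.SubElement(channel, "description").text = f"Messages posted to {board_name}"

    for msg in messages:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = msg["subject"]
        ET.SubElement(item, "description").text = msg["body"]
        # isPermaLink="false" marks the guid as an opaque ID, not a URL.
        guid = ET.SubElement(item, "guid", isPermaLink="false")
        guid.text = f"{board_name}-{msg['id']}"

    return ET.tostring(rss, encoding="unicode")
```

Without a stable ID, a reader has no reliable way to distinguish a new post from one it has already shown.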

Honestly though, I'm not sure how much of this is relevant or useful. How many people will use the feeds? Will anyone other than my resident web guru bother to use the XML output?

Either way, Alter Aeon is rapidly heading into the world of new media, and it's really hard to tell where we're going to make the most gain. An excellent example of this is my recent experiment with Twitter.

A couple of weeks ago, I set up a Twitter just for the hell of it and linked it up to the main web pages. I've been using it to just drop periodic status reports about what I'm working on, and hooked up the game to post reboot notifications there. In virtually no time, I have 8 followers there, while this blog currently has only one, despite being active for months. Additionally, I've received quite a few web hits from new people through Twitter, also more than from here.

You never know what's going to work, and just to make things complicated, things work differently for different numbers of users. As the game grows, I'll have to constantly be adding and reorganizing. At least it'll be fun.

Sunday, March 1, 2009

Chaos seems to reign

The field of software arguably contains some of the largest and most complex functional systems ever created by man. Even relatively simple systems can demonstrate incredibly complex behavior, and can hide potentially major flaws through indefinitely long periods of use, only to demonstrate them at inconvenient times. The Alter Aeon server codebase is no exception.

Pretty much the most complicated discrete object in the server is the socket stack. It handles a huge number of protocols and a bunch of different filtering layers, with hooks in various places. The last major redesign of the socket stack was around 1998; it is a testament to the design of that module that it has required no major modification in that entire time period, though many minor modifications have been made to it.

Unfortunately, the server is not made of such lovely discrete objects. Often, obscure bugs lurk in the sea of wild code that makes up the substructure of the system. This weekend I had the incredible good luck to catch and haul two of these bugs to the surface.

For years, there have been a handful of issues that occurred seemingly at random. In the early, old days, some of these would even cause a server crash and reboot the game; but after being unable to find and fix them, the crashes were gradually protected against and ways were found to ignore these spurious events.

Examples of the two most common of these are deceptively simple: sometimes, the get_room function would be asked to find a room but be given no instructions on how to do so, and other times a character (a monster or player, it was difficult to tell) would die and simply 'get stuck'.

In reality, these are monstrously difficult. The get_room function lacking instruction was mindboggling - it happened perhaps twice a year, the reports were never the same, and there were thousands of places that could be calling the function. Even as our debug facilities improved, no progress was made - there just seemed to be no rhyme or reason to it.

The dead character bug was just as bad. The entire destruct and event handling process was revisited and inspected several times, but no holes were ever found. Until Thursday.


In the course of trying to add some new restrictions to the 'charm' spell, I noticed that the destruct sequences for the charm, possession, and entangling roots spells looked a little goofy. We didn't check or clear certain important things, but there were comments indicating that we didn't need to because something else over there would take care of it. Keep in mind that this was entirely my code; I had written this well over a decade ago.

In the course of looking at this, I suddenly had a realization: there were no 'holes' in the logic for charm or possession, but maybe, just maybe, there was a hole in entangling roots. Entangling roots did come later, and quite frankly the code for it was a complete hackjob.

Within ten minutes, I had my answer. There was indeed a conflict between entangling roots and the 'special' code that made possession and charm work. The problem then became, how exactly do you fix such a mess? After about two hours of thinking about it and five more of carefully backing things out and reorganizing, I got what appears to be a stable fix. There's still some debug logging in it, but this bug appears to be properly killed. The new code is simpler, the checks are stronger, and we don't rely on obscure handlers to clean up messes. I hope.


I thought that was the end of my troubles in the short term, but then Glorida showed up with some obscure problem of his own. He'd been working on mob programs, and had built something rather complex that simply was not working. Not only wasn't it working, but it was doing something weird, and it was doing it reliably.

This is another one of those 'fairly complex' pieces of the system. It took me about two hours off and on to get it loaded into my brain so I could really think about it. It then took me probably another hour to really understand what was going on. And then it occurred to me:

This explains a lot of those debug log reports over the years!

The symptoms he uncovered showed up as a very unusual sort of 'doing things before other things have completed' recursive issue. One example of it is that a monster would 'say' something to trigger another monster, and the second monster would perform its action before the first monster could fully complete its 'say' command. In obscure cases, this chaining could be several layers deep.

For simple things like monsters talking to each other, the worst that can happen is that some things get out of order, and you might not understand why it works sometimes but not others. But monsters do substantially more than just talk to each other.

And this is where the problem arose. In the course of doing more complicated actions, those actions could be interrupted mid-stream by other monsters trying to complete their triggers. One such set of actions would cause the get_room function to be passed trash. Another such set of actions would damage one of the monsters so that it could never properly die. Both of these sets of actions, and a number of others with similar strange effects, were possible and implemented in monsters on the game; and they were sufficiently rare to explain the infrequent bug reports.

This bug was easier to fix than the first, taking only about three hours to really think about and put together a proper solution. This fix also appears to work, though it does break a dozen or so special monsters that relied on the old behaviour to function.
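The ordering problem can be reproduced in a few lines. The Python sketch below is a toy model, not the server's code: the World class, the trigger function, and the monster names are all hypothetical. In "immediate" mode a trigger runs in the middle of the speaker's 'say' command, as in the bug described above. In "deferred" mode triggers are queued until the current command finishes, which is one common way to avoid this kind of reentrancy. Whether the real fix works this way is an assumption.

```python
from collections import deque


class World:
    """Toy trigger dispatcher; every name here is hypothetical."""

    def __init__(self, deferred):
        self.deferred = deferred  # True: queue triggers instead of nesting them
        self.listeners = []       # callables invoked when something is said
        self.pending = deque()    # triggers waiting for the current command
        self.log = []             # what happened, in order

    def say(self, speaker, text):
        self.log.append(f"{speaker} starts say")
        for listener in self.listeners:
            if self.deferred:
                self.pending.append((listener, speaker, text))
            else:
                # Runs while the speaker's command is still in progress.
                listener(self, speaker, text)
        self.log.append(f"{speaker} finishes say")

    def run_command(self, command):
        """Run one command, then drain any triggers it queued."""
        command(self)
        while self.pending:
            listener, speaker, text = self.pending.popleft()
            listener(self, speaker, text)


def guard_trigger(world, speaker, text):
    """Second monster: answers when the priest says hello."""
    if speaker == "priest" and text == "hello":
        world.say("guard", "welcome")


def demo(deferred):
    world = World(deferred)
    world.listeners.append(guard_trigger)
    world.run_command(lambda w: w.say("priest", "hello"))
    return world.log
```

With demo(False) the guard's say is nested inside the priest's; with demo(True) the priest finishes first. As noted above, changing the ordering like this can break anything that depended on the old nesting.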


There are several morals to this story, which all software engineers worth their salt will immediately recognize:

1) Never assume you'll remember anything about code you've written. When you start working on objects so complicated that it takes you an hour every day to load them into your head so you can go to work, what makes you think you'll remember every detail after a year?

2) Think about the design of any halfway complex system and stick with it. The charm/entangling roots bug was caused entirely by undesigned/spaghetti code for the character destruct process. It was never properly designed because I wasn't experienced enough at the time to know how. It's better now.

3) Never underestimate the power of race conditions and call trees. When even simple/obvious actions can invoke arbitrarily complicated effects capable of invoking other effects, you're walking on very, very dangerous ground.

Software continues to become more complex as time goes on. It pushes at the limits of our minds, bringing programmers to the limit of what they can understand and then begging them to add one more thing. It allows arbitrary expression, but with that comes the cost that our comprehension is limited even for structured objects, to say nothing of more arbitrary and abstract ones.

Where does the future of software lie? Undoubtedly toward increasing complexity. But we will need either better tools, or better brains, to be able to manage it. We are such a young species.