
On Being an Angel

In the last month or so I've received a number of links to Life With Alacrity as a venture capital blog, and to myself as a venture capitalist.

However, I don't consider myself a venture capitalist. Instead, I am what is known as an "angel investor".

This week has also seen a new topic enter the blog zeitgeist: the topic of reforming or reinventing venture capital. This topic was initially raised by Dave Winer, followed by Robert Scoble, Doc Searls, Jeff Nolan, Michael Arrington, Thatedeguy and many more.

All types of venture investment -- seed, angel, venture, and institutional alike -- carry with them great risks and great rewards. But before we can reinvent venture capital and related venture funding methods like angel capital, we need to understand how they work.

So What is a Venture Capitalist?

A venture capitalist is a partner or associate in a venture capital management firm, which manages money on behalf of large institutional investors.

Basically, a large institutional investor (such as a pension fund or an insurance company) can statistically afford to invest a small part of their portfolio -- perhaps from 1% to as much as 5% -- in high-risk, long-term investments. If they lose the money outright, their other more stable investments have a good chance of making up the loss. But if the high risk investment does well, they can substantially improve their IRR (internal rate of return). To a certain extent they can't lose if they are careful. So these institutional investors invest in a number of types of high-risk funds, including such investments as venture capital funds.

A venture capital management company will manage one or more of these funds, investing in private companies. These VC management firms operate off of a management fee, from 2% to 3% of the capital invested to date. Thus all of the salaries for the staff of a VC management firm are paid, even if the investments are a failure. In addition, if any of the investments are successful, the VC management company earns 20% off of the top of the gain (called a "carry"), which is distributed to all of the full partners in the VC management firm, and sometimes a little of it to the associates.
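To make the fee structure concrete, here is a minimal sketch of how a VC management firm gets paid. It assumes the management fee is charged annually at a flat rate and that the carry applies only to positive gains; the function name, its parameters, and the dollar figures in the example are all hypothetical, and real fund agreements vary.

```python
def vc_firm_income(capital_invested, gain, fee_rate=0.02, carry_rate=0.20, years=1):
    """Return (management_fees, carry) for a VC management firm.

    Assumptions (not from any real fund agreement): the fee is a flat
    annual percentage of capital invested, and carry is only earned on gains.
    """
    # Fees are paid whether or not the investments succeed.
    management_fees = capital_invested * fee_rate * years
    # Carry is a share of the gain; a losing fund earns no carry.
    carry = max(gain, 0) * carry_rate
    return management_fees, carry


# Hypothetical fund: $100M invested, $50M gain, one year at a 2% fee.
fees, carry = vc_firm_income(100_000_000, 50_000_000)

# Same fund with a $30M loss: salaries are still covered, but there is no carry.
fees_loss, carry_loss = vc_firm_income(100_000_000, -30_000_000)
```

The second call shows the point made above: the fee still arrives when the investments fail, and only the carry depends on success.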

It is the VC associates that do the brunt of the work for a VC management firm. They make a good salary, but the real return is if they are able to do well in identifying, managing, and selling new startups; then they are invited to become a partner the next time the VC management firm raises a fund. Then if the fund that they are a partner in does well, they can make a true fortune, or even start their own VC management firm.

However, the odds are against the VC associate. It's common wisdom that an associate can't easily manage more than 7 firms at a time. Other common wisdom says that 1 in 5 investments will survive to break even and that 1 in 20 will "make the fund", i.e. pay for all the losses in the other 19 investments. Some newer firms say 1 in 10, but I'll go with the older more conservative numbers. Thus associates are incentivized to try to manage more than 7 investments and to be smarter than their peers in the firm, so that at least one of their investments will be the 1 in 20 that makes the fund. This makes it easier for the associate to become a partner in the future, as at best 1 in 3 or 1 in 5 associates becomes a partner. Cutthroat competition between associates exists in some firms. This pressure often adds to the perception that associates don't give enough attention to companies in their portfolio; they want their startups to do well, but the odds are it is another company that will make it, or a startup managed by peer associates. So they divide their attention. This is not unrelated to the Dunbar Triage problem.

Another problem that VC management firms face is the number of investments they are able to effectively handle. If there are 5-6 associates and 2-4 partners, there is probably a max of 50 investments that they have time to manage. If they are managing a $500 million fund, that means that they have to invest at least $10M in a company, but in fact that is more likely to be $25M over time. If 1 in 20 makes the fund, that $25M has to give a return of $250M. Thus when entrepreneurs complain that VCs will not invest in their company, it is often because the VCs can't figure out how to invest a minimum of $25M and turn out at the end with $250M. A related problem is that a startup that might have a successful business model that could grow into a profitable $50M in annual revenues will be encouraged to take a more risky route so that it can go public, which requires a minimum of $100-200M in annual revenues.
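The arithmetic in that paragraph can be sketched in a few lines. The function and parameter names are hypothetical, and the 10x target multiple is simply read off the $25M-to-$250M figures above rather than taken from any formal model.

```python
def fund_math(fund_size_m, max_investments, dollars_per_company_m, target_multiple):
    """Back-of-the-envelope VC fund arithmetic, all amounts in $ millions."""
    # Smallest average check that still deploys the whole fund.
    min_check_m = fund_size_m / max_investments
    # How many companies the fund can back at the larger per-company amount.
    companies_funded = fund_size_m / dollars_per_company_m
    # What a fund-making exit has to return on a single company.
    required_return_m = dollars_per_company_m * target_multiple
    return min_check_m, companies_funded, required_return_m


min_check, companies, required = fund_math(500, 50, 25, 10)
print(min_check, companies, required)  # 10.0 20.0 250
```

Note that at $25M per company the fund ends up holding about 20 companies, so "1 in 20" amounts to a single winner.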

There is a lot of variety in VC management firms; some VCs have smaller funds under management, others give their associates more of a share, others have different management fees or carry percentages, and most specialize in some way: either vertically in a particular field, or horizontally in a particular stage of investment. For instance, there are some VC firms known as mezzanine firms that only invest in your company right before they think it can go public.

This is the way most VC management firms work. Periodically a new VC management firm will explore and push the limits of the above boundary conditions, but the more edges it attempts, the more likely it is to fail.

My Three Angels

So what is an angel investor? I learned a lot of what I know from the 3 angel investors that invested in my software startup, Consensus Development.

Gifford Pinchot -- Partner Angel

Gifford Pinchot, with his wife Libba, was my first angel investor in Consensus Development. We met at a Maxis meeting where Gifford had been asked to facilitate the formation of a new startup to create simulation software. We left frustrated with the results of the meeting, but Gifford liked what he heard about my broader vision. Gifford flew me to San Diego, where we walked the beach and discussed my vision for collaborative software. He liked what he heard, and later in the month flew me to his home in Connecticut, where I stayed for a month in a barn guest house near his home while we worked on our first business plan.

Gifford only invested a low 5 figures, which got me started. However, it wasn't his money that was his most valuable contribution -- it was his time. Over the years he probably put 5-10% of his time into serving as Chairman of Consensus Development: working with me, talking to me, advising me, and coaching me. When our first software effort, InfoLog (a folksonomy tagging program like del.icio.us that was a decade too early), failed, he didn't walk away and instead encouraged me to continue. I dug deeper into the problem, discovered that trust and security were a key obstacle, and created a profitable consulting business. Gifford also encouraged me when I said we were going to take the risk of dropping all of our profitable consulting and focusing on a product, SSL Plus. Later, when this company was being shopped around to various buyers, Gifford spent lots of time doing due diligence, and ultimately came on half-time as CEO so that I could concentrate on selling the business.

In the end, Gifford earned probably 7 figures on his initial 5 figure investment, close to a hundred-fold return on the dollars he invested. However, his real investment was the time he spent with me -- almost 10 years of never giving up.

Scott Loftesness -- Seed Angel

I met Scott Loftesness when he was the executive vice president at Visa International. We learned of each other through CompuServe, where we both were sysops in the 80s. I did some consulting for him at Visa in the groupware area over a couple of years and we grew to respect and trust each other. I came to him when I branched out from groupware consulting and began to include consulting on cryptographic security. I'd seen an opportunity--I had a potential contract from RSA Data Security to be a distributor of RSAREF--but in order to take advantage of this opportunity I needed some seed capital.

Scott invested over twice what Gifford invested, but still 5 figures. However, like Gifford, what I gained from my association with Scott was a lot more than the seed capital. He had a respected name in the industry -- a friend at Visa USA told me "Scott is where all innovation at Visa flows from." He joined my board of directors, supported our risky choice to drop all groupware and cryptographic consulting to focus on our SSL project, helped tremendously in doing due diligence on potential buyers, and was pivotal to the negotiations to close our final sale of Consensus Development.

In the end, Scott Loftesness also did quite well in his investment in Consensus Development. His involvement on the day-to-day operation of Consensus Development was significantly less, but he was always around to support and advise us when we needed him.

Jim Bidzos -- Hands-Off Angel

Jim Bidzos was the CEO of RSA Data Security, whose firm had a critical patent on almost all meaningful cryptographic security. Over the years I did a lot of consulting for him to support various projects like RSAREF in standards, to create client tools for their Certificate Services Division, and to help with the founding of Verisign.

One day I told Jim that RSAREF would never be successful in his goal of promoting the RSA algorithm in security standards as long as it could only be sold through RSA salespeople. They preferred to sell RSA's premiere toolkit, BSAFE. I somewhat jokingly proposed that maybe Consensus Development should sell it instead. To my surprise, he agreed.

A couple of years later I leveraged the fact that Consensus Development had the only RSA toolkit available other than RSA's own to get the contract to develop the reference implementation of SSL 3.0 for Netscape. I took this Netscape contract back to Jim and said that I needed some investment to make this succeed. He invested a middle six figures in Consensus Development in return for a percentage that was roughly equivalent to that of Gifford and Scott, but because of his involvement as CEO of RSA Data Security he could not be on our board of directors.

After this investment, Jim had very little to do with Consensus Development. In fact, he had spread his angel money so widely in the cryptographic security industry that he was also invested in a couple of our competitors. In the end his investment was worth roughly 10 times what he invested, but the cachet of being able to tell others that Jim Bidzos was an investor made Consensus Development much more "legitimate", which also added significant value to us.

Founding of Alacrity Ventures

After I left Certicom, the company that had purchased my firm, Consensus Development (see Bad Business of Fear for more info), I wondered what I should do next. I could theoretically retire if I abandoned the Bay Area, but I was not ready for that and I thought I had maybe enough capital to start one more business of my own instead. Under a non-compete from Certicom, I was not sure what type of non-cryptographic business I wanted to start. So I decided that one thing I could do was some angel investing. In part this was to make money, but a larger part of it was that I enjoyed working with entrepreneurs. I wanted to do for others what Gifford Pinchot had done for me.

I did some study about how venture economics works and how angels and venture capital firms invest, and became concerned. I saw that being an angel investor in many ways is much harder than being a venture capitalist.

One of the biggest challenges is that angels share all the problems of the institutional investor, of the VC management firm, and of the VC associate.

The first challenge is deciding how much to invest. The institutional investors only risk 1%-5% of their capital. If I limited myself to that amount I could maybe invest in a couple of companies. I decided I was still young and could risk investing more.

The second problem was no management fee -- unlike a VC firm, angels don't get a management fee to cover salaries, legal fees, other expenses.

The third problem was my time. Most angels still work for a living -- being an angel investor is part-time, while a venture capitalist typically works full-time. If only 1 in 20 investments "make the fund", but I could at most manage 7 investments, that meant that I had roughly a 2/3 chance of losing my entire investment. I might be able to argue that for some kinds of businesses I might be more informed than the average VC, and thus might be able to make better choices, but not that much better.

The key, I decided, was to work with at least 2 other angel investors. That would theoretically allow us to invest in 21 companies, diversify our portfolios, and split the work. I approached my first angel investor, Gifford Pinchot, and he agreed to be one of the partners. The second was Harold Shattuck, who had done some due diligence and operations consulting for Consensus Development, and had been a VC once before, but enjoyed being closer to the actual building of a new company with some operating interaction. I was the managing partner for files and accounting, but we all brought to the table our "deal flow", performed due diligence together, and worked closely with each other.
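The odds behind that decision can be checked with a quick sketch. It assumes each investment is an independent 1-in-20 shot at "making the fund", which is a simplification (real outcomes are correlated with the market); the function name is hypothetical.

```python
def chance_of_no_fund_maker(num_investments, hit_rate=1 / 20):
    """Probability that none of the investments 'makes the fund'.

    Assumes every investment succeeds independently with the same hit rate.
    """
    return (1 - hit_rate) ** num_investments


solo = chance_of_no_fund_maker(7)       # one angel managing 7 investments
pooled = chance_of_no_fund_maker(21)    # three angels sharing 21 investments

print(f"7 investments:  {solo:.0%} chance of no fund-maker")
print(f"21 investments: {pooled:.0%} chance of no fund-maker")
```

Under these assumptions, going alone leaves about a 70% chance of never holding a fund-maker, while pooling three portfolios cuts that to about a third.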

Lessons from Alacrity Ventures

Alacrity Ventures is over 6 years old, and I have learned many lessons from it.

First, I feel that we did a good job selecting our investments, during a time in which being an angel investor was very difficult. I discovered that Gifford, Harold and I were really good at due diligence; our differing skills, Gifford's in coaching and evaluating the management team, Harold's in operations and business models, and mine in technology truly complemented each other.

For a long time I could say that the good news was that out of 13 investments, all but 1 were still in business. However, we were never able to invest in the 21 investments that we planned because we discovered a significant problem in angel investing: the VC.

The angel investor can only really afford to invest early on, as a seed investor, or in an early investment round such as series A. However, the firms we invested in needed more money along the way; in fact, almost all firms need money at more than one point. The venture climate at the time was such that the VCs required in their term sheets that previous investment rounds lose their liquidation preferences, and ultimately their investment.

Let me give a specific example -- we invested in a first round of an enterprise software company in 2000 that is still around today. In 2002 they needed more money, and because of the difficulty in getting VC investment, the lead VC insisted that the preferences from the previous rounds be removed, effectively making us common stock, unless we participated in this subsequent round. We reluctantly did invest some more, but because we don't have the funds that a VC has, we were only able to protect some of our preferred stock. A year and a half later, the software company needed more money, and the VC did it again. This time, all our stock was converted to common. Now it is 2006, and the company might be acquired this year; however the VCs, because of their liquidation preferences, will get the first $65 million (or more). As I doubt the firm is worth more than $50M, we will not get anything, nor will any of the other founders that are no longer involved with the firm.
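A minimal sketch shows why the common holders end up with nothing in this example. It assumes a single senior preference stack that is paid in full before common stock sees a dollar; real term sheets (participation rights, multiple classes, caps) are more complicated, and the function name is hypothetical.

```python
def split_sale_proceeds(sale_price_m, preference_stack_m):
    """Split acquisition proceeds between preferred and common, in $ millions.

    Assumes one senior liquidation preference that is paid before common.
    """
    # Preferred holders take everything up to the size of their preference.
    to_preferred = min(sale_price_m, preference_stack_m)
    # Common stock (converted angels, departed founders) gets the remainder.
    to_common = sale_price_m - to_preferred
    return to_preferred, to_common


print(split_sale_proceeds(50, 65))   # (50, 0)
print(split_sale_proceeds(100, 65))  # (65, 35)
```

With a $65M preference, a $50M sale leaves nothing for common; the company would have to sell for well over $65M before converted early investors saw any return.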

This has repeated itself over and over again. We made a decent choice and did our due diligence well, but subsequent VC investors have pushed us out. A few of our ventures have failed outright. That is understandable given our original 1 in 20 expectations. But what we didn't expect was how difficult it was going to be to participate in the upside. Yes, we had preferences in our early rounds that should have protected us, but they didn't.

So of our 13 investments, only 2 remain that may "make the fund": a very innovative high-tech titanium powder manufacturer ITT, and a high-tech manufacturer of ceramic devices Vapore. But even as these two investments survive, they are still vulnerable to requiring additional investment and possibly forcing us out.

Of the rest: one of our early investments sold to VeriSign at a 50% premium, our investment in Salon.com will give us a small return, MG Taylor paid off its loan, and Skotos may someday pay back its original investment. The other 8 are being written off as a loss.

Advice to Angels

So in spite of the odds, you still want to become an angel investor? Here is some advice...

Collaborate with other angels: Going it alone is dangerous -- there are a number of angel investor networks, such as Gathering of Angels, Band of Angels and others listed in the Directory of Angel-Investor Networks. Be careful, though: the enthusiasm of others can be contagious -- don't always go with the herd.

Do your own due diligence: I can't emphasize this enough. Talk to the entrepreneurs and meet their staff. Read their business plan and tear it apart. Find the hidden assumptions. Understand their business model. It needs to feel realistic. Try to get more eyes on the job: different people see different things. Don't follow others; they may have different investment criteria than your own.

Be an advisor first: If the entrepreneurs don't listen to your advice, don't invest. If you have to invest to become an advisor, invest only a small amount, or have part of the money be contingent on a meaningful goal.

Guard your upside: When negotiating terms, don't worry about the downside. It is the VCs that need items on the term sheet for when things go wrong -- what you need to guard is for when things go right. Watch for changes in the executive staff -- they may be incentivized differently than you are.

Consider a secured loan: Somewhat contrary to the "guard your upside" advice, rather than investing only in stock, consider investing via a secured loan as well. The security can be not only hard company assets but also intellectual property such as copyrights, trademarks or patents. Your return will be lower on the loan, but if you can get all of your investment back early and get a small percentage of the company, it can be a good way to balance risk. Just remember to file the property documents to make sure that the assets are properly secured, and be prepared that someday you may own that asset.

Save $2 for every $1: Almost every company you invest in, even if successful, will need additional funding. Make sure that you keep on hand $2 for every $1 initially invested. This will also help you from being squeezed out by later VC investors.

Invest in acquisition targets: Let the VCs take companies public -- the companies that you should be interested in are the companies that will eventually be acquired. Creating an acquisition target requires the management to think differently -- coach them to do so.

Understand the founder's dilemma: There are many founder's dilemmas; however, one is particularly important to the angel investor. A founder may be incentivized to sell sooner than his early investors. Remember that most often, the only significant asset that a founder has is his company. If the founder has an opportunity to sell early and buy a house, he might, even if it may not be enough return on investment for the risk that the angel took. Find ways to keep your interest aligned with that of the founders, which may even include buying some stock directly from the founder.

Consider alternative exits: There are lots of boutique opportunities that are too small for VCs. I know of a local Berkeley software company that was number one in their market, but too small to go public. They had $20M in annual revenues, and profits of almost $10M, but little opportunity for growth -- early investors could have gotten their money back in dividends rather than sale of the company.

Time the cycle: We didn't invest at the ideal time for the angel investor. We picked well considering the times, but had we waited for a few years it would have been easier. Not to say that timing is everything; we'd have lost our titanium powder opportunity if we'd waited for better market timing.

Respect people: Treat the people you invest in like a paying client. Respect their time and concerns.

Be prepared that the plan will change: I've never been involved with a business where the business plan doesn't significantly change. As an angel investor you need to help your businesses to plan for those changes.

Advice to Entrepreneurs

So you want investment from an angel investor? Some advice...

Recognize the odds: The angel investor is taking a substantial risk investing in your company -- you need to be able to show a scenario where the investor might be able to make 10x or 20x their investment. So if you are looking for $100K, you need to show how the angel can ultimately have stock worth $1M to $2M.

Consider their advice: Angel investors may not always be right, but show them that you are listening. If you use angels for more than just a source of money, you'll get a lot more value from them.

Draft your business plan: An angel investor does not need as complete a business plan as a VC does, but they need to see how you think. You should clearly identify what the product or service is, who is going to buy it, what is the marketplace that those buyers may find it in, what differentiates your product or service and why your team is good enough to deliver. Angel investors know that your plan will change, probably drastically, but if they understand your thinking process they can be more confident that your company will survive change.

It takes time: Don't count on the money from an angel investor (or any investor) until you get the check. Investors are always selecting from a number of choices, often very competitive choices. No matter how optimistic you are, it will likely take 6 months or more to raise angel money.

Team with Many Hats: Angel investors don't recruit new team members for you. You don't necessarily have to have your whole team in place, but there at least needs to be someone who has experience managing, someone with development experience, someone with marketing experience, and someone with sales experience. Whatever team is there, they need to be able to juggle all of those hats. Financial, HR, and administrative positions can all be part-time or farmed out.

Advice to Venture Capitalists

Value the angel investor: The angel investor serves a point in the marketplace that you are not able to serve. Rather than driving them out, find some way for them to continue to participate so that they can find other ventures for you.

Angels are not VCs: The angel investor can't afford to invest in later rounds -- their model is different than yours. It may make sense to force participation in subsequent rounds by other VCs, but carve out some room for angels.

The future of Alacrity Ventures

Though I've enjoyed some aspects of being an angel investor, I enjoy working with creative people to innovate new products more. I expect to spend most of my time in the next few years continuing to explore social software and collaboration tools, and the new product opportunities that may evolve from them.

Thus I expect that any future angel investments I make will be more along the lines of Gifford's style of investment in Consensus Development: a small investment of money and a large investment of time. Harold and Gifford both feel the same way. Currently we plan to continue monitoring our existing investments, but don't plan any new investments unless we can take a more active role in the firm -- for instance Harold is a board member in Vapore.

Gifford is now dedicating his life to building a better world by transforming business education. He is a co-founder and President of the Bainbridge Graduate Institute, which provides an MBA program integrating sustainability, green economics, the internet, and open source within a traditional MBA program. As an open source school, he helps other schools to use BGI’s curriculum. Check out his blog entry on Angel Philanthropy.

If there's one thing we've learned from six years of angel investing, one thing that may be more valuable than all the nuts and bolts I describe here, it's that Gifford Pinchot's partner-style of angel investment is what suits our investing style, not Jim Bidzos' style of hands-off angel investing, and that's a lesson that we're going to carry forward with Alacrity Ventures.

Posted on January 31, 2006 at 06:17 PM in Business, Social Software, Web/Tech

Collective Choice: Competitive Ranking Systems

by Christopher Allen & Shannon Appelcline

[This is the third in a series of articles on collective choice, co-written by my colleague Shannon Appelcline. It will be jointly posted in Shannon's Trials, Triumphs & Trivialities online games column at Skotos.]

In our first article on collective choice we outlined a number of different types of choice systems, among them voting, polling, rating, and ranking. Since then we've been spending some time expanding upon the systems, with the goal being to create both a lexicon of and a dialogue about systems for collective choice.

This time we're going to dig more into comparison ranking systems, by focusing on competitive rankings and looking more in depth at the ELO Chess Ranking System and the other systems that we briefly mentioned previously. Our goal is to explicate these systems, to better address their flaws, to begin detailing the purposes of ranking systems, and to show how those purposes are critical in the design of ranking systems.

Subjective vs. Objective Rankings

In our original article we discussed rating systems as being largely subjective and ranking systems as being objective, but the situation isn't nearly as simple as that. In truth, there's a clear spectrum of ratings and rankings with varying amounts of subjectivity and objectivity in each collective choice system.

The Bowl Championship Series (BCS) for college football is a good example of a ranking system that explicitly allows a subjective component. It involves a complex mathematical formula that includes things like win/loss ratios, but also sportswriters' and coaches' ratings.

However, public opinion continues to show that people don't necessarily like seeing true ranking systems having subjective components, because they expect them to be "fair". The BCS formula has come under attack several times in the last few years precisely due to its subjective basis. Cal Berkeley was one of several teams denied a bowl position in 2004 when many felt that they were worthy.

The ATP tennis rankings and the official world golf rankings also have a subjective component, but it is much more subtle. Each tournament is worth a certain number of points, and the allocation of those points is relatively arbitrary, based upon the "prestige" of each tournament and the quality of players who have traditionally played in it. The subjectivity isn't quite as near to the surface as that of the college bowls, but it's still something that can have a notable, and perhaps unwarranted, effect upon the final results.

Algorithmic Rankings

This brings us back to the ELO system, a ranking system originally designed for chess which is fairly well-known and well-understood. As we said in our overview article, "[ELO] builds a simple distribution of player ratings around a norm (typically 1500 points), then awards or deducts points based upon wins and losses, with the total sum of all points in the system staying constant. Players are then ranked according to their comparative scores."

The big difference between this and the previously discussed systems is that it's almost entirely objective; in fact it uses a statistical basis to create an underlying mathematical model for rankings, rather than allowing human subjectivity to get in the way.

The simplest formulation for an ELO rating looks like this:

R' = R + K * (S - E)

R' is the new rating
R is the old rating
K is a maximum value for increase or decrease of rating (16 or 32 for ELO)
S is the score for a game
E is the expected score for a game

Much of the trick is in figuring out what the (E)xpected score of a game is. ELO uses the following formulas for players A and B:

E(A) = 1 / [ 1 + 10 ^ ( [R(B) - R(A)] / 400 ) ]
E(B) = 1 / [ 1 + 10 ^ ( [R(A) - R(B)] / 400 ) ]
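The two formulas above translate directly into a short function. This is only a sketch; the function name and the sample ratings are hypothetical.

```python
def expected_score(rating, opponent_rating):
    """Expected score E for a player, per the ELO formula above."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


# Evenly matched players are each expected to score one half.
even = expected_score(1500, 1500)

# The two expectations always sum to 1: what one player is
# expected to win, the other is expected to lose.
e_a = expected_score(1600, 1400)
e_b = expected_score(1400, 1600)
```

Because E(A) and E(B) always add up to 1, only one of them ever needs to be computed.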

It's a good model because the two formulas mean that a great player gains little from beating an average player, but an average player gains a lot from beating a great player. Take the following example:

R(A) = 1900
R(B) = 1500
E(A) = 1 / [ 1 + 10 ^ ( [1500 - 1900] / 400 ) ]
     = 1 / [ 1 + 10 ^ ( -400 / 400) ]
     = 1 / [ 1 + 10 ^ ( -4 / 4 ) ]
     = 1 / [ 1 + 10 ^ -1 ]
     = 1 / [ 1 + .1 ]
     = .91
     = 91%

E(B) = 1 / [ 1 + 10 ^ ( [1900 - 1500] / 400) ]
     = 1 / [ 1 + 10 ^ ( 400 / 400 ) ]
     = 1 / [ 1 + 10 ^ 1 ]
     = 1 / 11
     = .09
     = 9%

Player A is expected to score .91 in an average game, which is to say he should win 91% of the time, and will be punished accordingly if he loses to player B:

R' = 1900 + 32 * (0 - .91)
R' = 1900 - 29.12
R' = 1871

Conversely a win nets him very little:

R' = 1900 + 32 * (1 - .91)
R' = 1900 + 32 * .09
R' = 1900 + 2.88
R' = 1903
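The worked example can be reproduced with a small sketch of the update rule R' = R + K * (S - E). The function names are hypothetical, and K defaults to 32 as in the example.

```python
def expected_score(rating, opponent_rating):
    """Expected score E for a player against a given opponent."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def updated_rating(rating, opponent_rating, score, k=32):
    """New rating after one game; score is 1 for a win, 0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent_rating))


a_after_loss = updated_rating(1900, 1500, 0)  # the favorite is upset
a_after_win = updated_rating(1900, 1500, 1)   # the favorite wins as expected
b_after_win = updated_rating(1500, 1900, 1)   # the underdog's side of the upset

print(round(a_after_loss), round(a_after_win))  # 1871 1903
```

Working with the unrounded expected score gives a swing of about 29.09 points rather than 29.12, but the rounded ratings match. The points player A loses in the upset are exactly the points player B gains, which is why the total in the system stays constant.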

ELO is almost entirely mathematical. Players can gain or lose different amounts of points based upon playing different players, but this is all part of the formula. The only slightly subjective element is the definition of K -- how much a player can win or lose from a particular game. The most widely used ELO systems for chess break K down into two values: 16 for masters and 32 for everyone else. So there is a subjective decision that a master's score should move less with each game than other players' scores do.

That's a very minor element in an otherwise objective system, but as we'll see, more recent systems by Days of Wonder and Microsoft first reduce, then eliminate even this subjectivity.

Variations on a Theme: Days of Wonder

ELO is probably the most used ranking system in the world. You can find it in use for Go, Tantrix, and many other games. Days of Wonder, producers of Gang of Four, Ticket to Ride, and many other games, use a variant of the system which they describe on their website.

They identify three core problems with ELO:

  1. New players can take a long time to ascend or descend to their correct levels.
  2. Highly ranked players can be hesitant to play with provisional players whose ranking might be much more uncertain.
  3. There are no allowances for games with more than two players.

Days of Wonder resolved the first problem by creating a new formula for provisional players, allowing them to rise and fall in the rankings much more quickly.

Conversely, when playing against provisional players, regular players can only lose a maximum of K*n/20 points, where n is the number of games that the provisional player has played--rather than the normal maximum loss of K. For example, playing someone who has just played one game can only result in a loss of 1/20th of the regular K value, and so it really doesn't matter if the provisional player's ranking is wildly out of whack.
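Here is a minimal sketch of that K*n/20 limit. Treating it as a simple clamp on the normal ELO loss, and leaving gains untouched, are assumptions made for illustration; Days of Wonder's actual formula may apply the limit differently, and the function names are hypothetical.

```python
def max_loss_vs_provisional(k, opponent_games, provisional_games=20):
    """Largest loss a regular player can take: K * n / 20."""
    # Once the opponent has played 20 games the normal maximum of K applies.
    games = min(opponent_games, provisional_games)
    return k * games / provisional_games


def capped_change(raw_change, k, opponent_games):
    """Clamp a regular player's rating change against a provisional opponent.

    Assumption: only losses are limited; gains pass through unchanged.
    """
    return max(raw_change, -max_loss_vs_provisional(k, opponent_games))


# A regular player would normally lose about 29 points for an upset,
# but against an opponent with one game played the loss is limited to K/20.
limited = capped_change(-29.09, k=32, opponent_games=1)
normal = capped_change(-29.09, k=32, opponent_games=20)
```

With K at 32, the first call limits the loss to 1.6 points, while the second leaves the full 29.09-point loss in place because the opponent is no longer provisional.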

Both of these new formulas are set up to converge toward a normal ELO formula as a provisional player's number of games approaches 20 (making them a normal player at Days of Wonder).

(It should be pointed out that using the number "20" to define a provisional player, and making a player less provisional in clean 5% steps, inevitably introduces yet another small, subjective element into this mathematical formula; as we'll see momentarily Microsoft has more recently incorporated the idea of provisional uncertainty into their core mathematical model, much as the whole ELO system originally turned subjective win and loss statistics into tighter mathematics.)

Finally, to resolve the situation of multiple players, Days of Wonder considers each game to be a set of duels, as described here:

There are 4 players in a Gang of Four game. Let's name A the winning player, B the second one, C the third one and D the last one. We consider that there were 6 duels: A won against B, C and D. B won against C and D. C won against D. We compute independently the new scores for each duel, and then we average the values for each player.

It's a fairly elegant answer that not only rewards or penalizes all players separately, but also encourages playing for second place, or even third, if first isn't possible.
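As a rough sketch of the duel idea, the following Python treats every pair of finishers as a two-player game and averages each player's results. It assumes a plain ELO update with a fixed K for every duel; the function names are hypothetical and the details of Days of Wonder's actual formula may differ.

```python
from itertools import combinations


def expected_score(rating, opponent_rating):
    """Standard ELO expected score for the first player (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def multiplayer_update(ratings_in_finish_order, k=32):
    """Return new ratings for players listed from first place to last.

    Each pair of players is scored as an independent duel won by the
    higher finisher, then each player's duel results are averaged.
    """
    ratings = list(ratings_in_finish_order)
    if len(ratings) < 2:
        raise ValueError("need at least two players")
    deltas = [[] for _ in ratings]
    for i, j in combinations(range(len(ratings)), 2):  # i finished ahead of j
        change = k * (1.0 - expected_score(ratings[i], ratings[j]))
        deltas[i].append(change)
        deltas[j].append(-change)
    return [r + sum(d) / len(d) for r, d in zip(ratings, deltas)]
```

For four equally rated players this produces six duels, and the second-place finisher still comes out ahead of third and fourth, which is the incentive to play for position described above.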

There have been continued discussions of the Days of Wonder ELO variant in their forums, and the questions raised there are common to many different ranking systems. Some players wanted unranked games, while others thought that having unranked games would discourage people from playing good competitors except in unranked games. There has also been a lot of discussion regarding Ticket to Ride, a strategy game that supports 2-5 people, and whether the ELO variant system discourages multiperson play.

The various lessons learned at Days of Wonder underline two basic ideas about rankings. First, even with a well-studied system like ELO, there's still a lot to understand, and, second, any ranking system needs to reflect the specifics of what it's ranking -- and what its purpose is.

Variations on a Theme: XBox 360 Live

An even more recent large-scale ranking system is the TrueSkill system developed by Microsoft for use with the XBox 360. It appears to be an expanded variant of the Glicko ranking system used by the Free Internet Chess Server.

Many of the problems identified by Microsoft were the same as those already noted by Days of Wonder and others, including: the uncertainty of provisional ratings and the need to rank players in multiplayer games. However, the TrueSkill system notably expands both issues. Ranking uncertainty is now defined as a mathematical concept and the rankings now support not just multiple players, but also multiple teams.

TrueSkill explicitly includes two values in any ranking: a skill level and an uncertainty level. The first, like the more common ELO ranking, tells how good a player is. The second states how sure that ranking is. The uncertainty rating is effectively a margin of error, similar to those we saw in polling systems. If a first-time player has a skill rating of 25 with an uncertainty rating of 8.3, that means that his skill is probably somewhere in the range of 16.7 to 33.3, a pretty wide range, but then this is a totally untested player. According to benchmarks that Microsoft produced, 99.99% of actual skill levels were within 3x of the uncertainty rating, and 100% were within 4x.
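The arithmetic behind that range is simple enough to write down. The helper below is a hypothetical illustration of reading a skill and uncertainty pair as a band; it is not part of Microsoft's system.

```python
def skill_range(skill, uncertainty, multiples=1):
    """Return the (low, high) band lying within `multiples` uncertainty ratings of the skill."""
    spread = multiples * uncertainty
    return skill - spread, skill + spread


# The first-time player from the text: skill 25, uncertainty 8.3.
low, high = skill_range(25, 8.3)
# The wider band used in the benchmark figures: three times the uncertainty.
wide_low, wide_high = skill_range(25, 8.3, multiples=3)
```

At one multiple the band is roughly 16.7 to 33.3, and at three multiples it stretches across almost the whole scale from about 0 to 50, which shows how little the system claims to know about a brand-new player.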

The rest of TrueSkill's innovations are built around this model of uncertainty. All players win or lose skill points, based upon how many players they beat or lose to, and they also decrease their uncertainty rating as they play more games. However, uncertainty is decreased more for players toward the middle of a pack within a game than those around the edges (because on the edges the players could actually be much better or much worse than it is possible to see from a specific game). In addition, TrueSkill is only a zero-sum ranking system for players at the exact same level of uncertainty. The more uncertainty that an opponent possesses, the smaller the weighting of any gain or loss (much like the simpler system that Days of Wonder uses, which bases weightings of games against provisional players as n/20).

Overall TrueSkill is a somewhat complex system that is described more fully at Microsoft's web site. Some of their expansions had already been considered by others, but still their system is notably innovative in two ways:

  • Expanding a competitive ranking system to include concepts of teams.

  • Incorporating the uncertainty of ratings further into the core mathematical model, rather than using a somewhat more subjective model such as that described by Days of Wonder for provisional players.

The TrueSkill calculations are a bit complex. In general, that's not a problem for a computer-based ranking model because you can have a computer doing all the computations, and players only need to understand the results. However, the two-part ranking system used by TrueSkill, which notes both skill level and uncertainty, does offer a potential problem on this latter point. Can players understand it? In general, the concept of uncertainty will not be understood by people other than statisticians, thus raising a real user-interface question with the TrueSkill system -- and the exact sort of thing that designers of new ranking systems will need to consider.

Variations on a Theme: A Tale in the Desert

The online game, A Tale in the Desert, identified a different problem with the ELO system: cheating. This is a uniquely Internet-based problem, because users there can create fake accounts, then defeat those accounts to win points. This can also be done more subtly, by having multiple additional accounts build up the rating of that fake account before the fake account is defeated. So a totally new ranking system, called the eGenesis Ranking System, was created.

Each player is ranked through a 256-bit vector, half of which is initially set to 0 and half of which is set to 1 (therefore creating an average ranking of 128). Whenever a match occurs between players a hash function based on the players' names mathematically selects 32 of those bits, 8 of which are then randomly selected. Among those bits, any 1s in the loser's vector which correspond to 0s in the winner's vector are "transferred".

This simple design corresponds in some ways to ELO's more complex formula. A good player will have more 1s and thus more to lose, and he will lose correspondingly more to a poor player who has more 0s in his vector.

However, the system also prevents the collusion earlier noted. Statistically, a single player will only ever gain about 8 ranking points from another new player, since out of the 32 hashed bits only eight will, on average, be in the correct 0-1 configuration. Expanding a group of players expands the number of points that can potentially be gained, but within real limits.
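To make the mechanism concrete, here is a minimal Python sketch of the bit-vector transfer as we've described it. Several details are assumptions made purely for illustration: the initial 1s and 0s are shuffled at random, SHA-256 of the two names stands in for the unspecified hash function, and all of the function names are hypothetical.

```python
import hashlib
import random

VECTOR_BITS = 256    # size of each player's vector
HASHED_BITS = 32     # positions fixed by the pair of names
BITS_PER_MATCH = 8   # positions examined in any one match


def new_vector(rng):
    """A new player's vector: 128 ones and 128 zeros in an assumed random order."""
    bits = [1] * (VECTOR_BITS // 2) + [0] * (VECTOR_BITS // 2)
    rng.shuffle(bits)
    return bits


def pair_positions(name_a, name_b):
    """The 32 positions tied to this pair of players (stand-in hash: SHA-256)."""
    key = "|".join(sorted((name_a, name_b))).encode("utf-8")
    seed = int.from_bytes(hashlib.sha256(key).digest(), "big")
    return random.Random(seed).sample(range(VECTOR_BITS), HASHED_BITS)


def play_match(winner_name, winner_bits, loser_name, loser_bits, rng):
    """Move 1s from loser to winner at 8 randomly chosen pair positions."""
    moved = 0
    for p in rng.sample(pair_positions(winner_name, loser_name), BITS_PER_MATCH):
        if loser_bits[p] == 1 and winner_bits[p] == 0:
            loser_bits[p], winner_bits[p] = 0, 1
            moved += 1
    return moved


def rating(bits):
    """A player's rating is the number of 1s in the vector."""
    return sum(bits)
```

Because the 32 positions are fixed for a given pair of names, beating the same fake account over and over stops paying off once those positions are exhausted: the total gain from any one opponent can never exceed 32 points, and will typically be near 8 for two new players.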

In fact, the eGenesis system prevents cheating by measuring the size of social networks, then limiting the number of ranking points that can be earned within a social network. It's not necessarily the only way to measure social network size, but its methodology points toward social software as an interesting area for additional study of ranking systems.

As with XBox's TrueSkill, the eGenesis algorithms are overall fairly sophisticated and confusing, perhaps more so than TrueSkill itself. However, unlike TrueSkill the output is very simple: a skill number between 0 and 255. The intricacies are hidden by the system.

Competitive Ranking Goals

Ultimately, as we mentioned when discussing Days of Wonder, any ranking system has to be measured by what it's trying to do and how well it does that. ELO and similar numerical, long-term ranking systems, are most likely trying to achieve one of three goals:

Hierarchy: Players are divided into hierarchies of success, giving players goals to constantly strive for and ways to measure their success (or failure).

Matching: Players can play with other players at their same skill level, rather than having to play beginners who are much weaker or experts who are much stronger than they are. This generally increases everyone's enjoyment. For computer games, the complexity of a matching system can be largely moderated by the computer, thus ensuring better competition.

Handicapping: If players do play against others of different skill levels, the better players can be handicapped in automatic, appropriate ways for the game in question, again increasing the fairness of games and everyone's enjoyment. For instance, someone ranked 3-kyu in Go playing a less experienced 7-kyu player would give him a starting 4 stone advantage to make for better competition.

The ELO system may be a good matching system, which allows players to easily find other players of their same skill level and play against them. However it doesn't provide any way to handicap players, nor would the ELO method necessarily be a good one to analyze handicaps (and conversely a golf handicap might not do a good job of finding like players nor measuring players' ability in a hierarchy).

More recently the XBox system has stated that it's explicitly for matchmaking, with the goal being to always try and match up players at nearly the same skill level. It's also used for hierarchy (or "leaderboards" as it's described in the TrueSkill docs), but that's clearly a subsidiary purpose.

All of these systems would be ineffective for measuring a winner in a live event, which is a very different goal:

Tourney: A single player is listed as an absolute winner, the "King of the Hill". Often, second, third, and fourth place winners are measured too.

And, the systems we've discussed thus far may not be useful for measuring privileges, yet another goal:

Threshold: The best ranks of players can be given special privileges, including the ability to create games and form tournaments. Alternatively, they can be given privileges totally outside the game, again giving them something extra to strive for.

For each of these additional goals we may need to consider very different ranking systems, not just variations of ELO.

Different Themes: Tourneys

There are a number of well-known tournament types which can be used to create a "King of the Hill" ranking.

The simplest is the single-elimination tournament, where the winner of each competition moves on to compete with other winners, until there is only one. However, this style of tournament is quite cut-throat and is not suited very well to events where the competition may result in a draw, or where chance is a notable factor in the competition. It also has a very subjective factor in the initial seeding of the rounds. The single-elimination tournament also does not rank the losers. However, by having the losers compete with each other in a Swiss-style tournament, the relative strengths of the players can be ranked.

An improvement is the double-elimination tournament, which is now one of the best known tournament systems in sports. Players compete in a series of two-player matches, and a player has to lose twice before he's eliminated. This is done through a system of winner and loser brackets, wherein people drop from the winners' brackets to the losers' brackets when they lose once, and drop out altogether when they lose twice.

One problem with standard double-elimination is that there are unusual situations where a significantly inferior player can still make it to the final round, or the last player to remain undefeated can lose only once and still be eliminated. These can be addressed through variants such as face-off (requiring the last two remaining competitors to compete again if the undefeated team is defeated for the first time in the finals) or by reconfiguring the loser's brackets.
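The loss-counting logic of double elimination, including the face-off variant, can be sketched as follows. This is a deliberately simplified model rather than a real bracket: players are paired in list order, an odd player out gets a bye, and `play` is a hypothetical callback that returns the winner of a match.

```python
def play_round(bracket, play, losses):
    """Pair off a bracket in order; return (survivors, players who just lost)."""
    survivors, defeated = [], []
    for i in range(0, len(bracket) - 1, 2):
        a, b = bracket[i], bracket[i + 1]
        winner = play(a, b)
        loser = b if winner == a else a
        losses[loser] += 1
        survivors.append(winner)
        defeated.append(loser)
    if len(bracket) % 2:
        survivors.append(bracket[-1])  # odd player out gets a bye
    return survivors, defeated


def double_elimination(players, play):
    """Return (champion, loss counts); a player is out after two losses."""
    losses = {p: 0 for p in players}
    winners, losers = list(players), []
    while len(winners) + len(losers) > 1:
        if len(winners) == 1 and len(losers) == 1:
            # Final: the undefeated player meets the losers' bracket survivor.
            a, b = winners[0], losers[0]
            champion = play(a, b)
            beaten = b if champion == a else a
            losses[beaten] += 1
            if losses[beaten] >= 2:
                return champion, losses
            # Face-off variant: the undefeated player lost once, so both
            # finalists now have one loss and must play again.
            winners, losers = [], [a, b]
            continue
        winners, dropped = play_round(winners, play, losses)
        losers, _ = play_round(losers, play, losses)  # second loss: eliminated
        losers.extend(dropped)
    return (winners + losers)[0], losses
```

Calling `double_elimination(list(range(1, 9)), min)` treats the lower number as the stronger player, so player 1 wins without a loss and everyone else leaves with exactly two. Real brackets schedule the losers' side more carefully, which is where the reconfigured variants mentioned above come in.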

Round-robin tournaments, such as official Scrabble Tournaments, involve every player playing a set number of games (24 in the 2005 World Scrabble Championship), facing opponents with similar win-loss records. They then ultimately rank players by their win-loss ratios.

The advantage of these sorts of tournament over an ELO-style ranking is that they're easily understandable and seem fair. In addition, they measure ranking in a much more topical manner: how well someone is playing during a singular instant, rather than over a longer career. As a result they work much better for a live tournament.

Different Themes: Thresholds

As we discussed in our original article on Collective Choice, thresholds are ranking barriers above which members get a special ability--or alternatively levels below which members lose a special ability. They can also act as another goal for a ranking system.

In the game of Go there are both amateur and professional players. Although they aren't technically in the same hierarchy of rankings, the highest Go amateur ranking (7 dan) is approximately equal to the lowest Go professional ranking (1 dan), forming a de facto threshold.

Likewise the United States Chess Federation uses their ELO rankings to denote Chess Masters. Anyone who achieves 2200 USCF is given a National Master threshold ranking and anyone who maintains it for 300 games is given a Life Master threshold ranking.
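A threshold like this is just a check layered on top of an existing rating. The sketch below assumes that "maintains it for 300 games" means 300 consecutive games with a rating of at least 2200, and that a title is kept once earned; both are our reading of the rule rather than the federation's official wording, and the function name is hypothetical.

```python
def master_titles(ratings_after_each_game, master_rating=2200, life_games=300):
    """Return the set of threshold titles earned over a rating history."""
    titles = set()
    streak = 0  # consecutive games finished at or above the master rating
    for rating in ratings_after_each_game:
        if rating >= master_rating:
            titles.add("National Master")
            streak += 1
            if streak >= life_games:
                titles.add("Life Master")
        else:
            streak = 0  # assumed: dropping below the line restarts the count
    return titles
```

The interesting design point is that the underlying ELO number keeps moving after a title is awarded, while the threshold itself acts as a one-way gate.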

The American Contract Bridge League uses a threshold system where you have to win a certain number of tournaments and thus earn masterpoints in order to achieve official rankings such as "Sectional Master". Furthermore, players may earn different "colors" of masterpoints depending on the difficulty of the tournament, and some ranks require that you earn at least some specific colored masterpoints in order to meet the requirements for the next threshold.

These thresholds are fairly explicitly based on other hierarchical ranking systems, but this doesn't need to be the case. Since determining the purpose of a ranking system is often the first step in designing it, as we delve further into the area of thresholds we may well find that systems specifically dedicated toward measuring thresholds are more likely to do so well.

In our next article we'll consider among other things the Avogadro reputation system, which manages thresholds in such a way as to prevent cheating.

Conclusion

There's actually a lot of variety in ranking systems, and even though we'd like them to be totally objective, various subjective elements often creep into these systems. In addition, there's a lot of variety in what ranking systems can do. For competitive systems, hierarchy, privilege, matching, and handicapping are some of the top purposes of ranking. Determining what a ranking system is going to do is a necessary first step in designing the system, as different systems will accomplish various goals to a better or worse degree.

ELO, in several variants, is the best studied and most used competitive ranking system. It works particularly well as a matching system. However, even ELO has flaws in it, among them: issues with new player rankings; its core two-player basis; its lack of provisions for teams; a few minor subjective elements; and problems with cheaters. New systems continue to be rolled out on the Internet to resolve these issues, and overall, it's an area of interesting new study.

Tournament systems and threshold systems offer a few good examples of competitive ranking systems with very different purposes, underscoring the need to understand what you're doing before you do it.

Ranking systems also lie very near yet another type of Collective Choice: reputation systems. We briefly addressed reputation systems when talking about threshold systems and will return to this in our next article.


Related articles from this blog:

  • 2005-12: Systems for Collective Choice
  • 2005-12: Collective Choice: Rating Systems
  • 2006-08: Using 5-Star Rating Systems
  • 2007-01: Experimenting with Ratings

Related articles from Shannon Appelcline's Trials, Triumphs & Trivialities:

  • #192: Managing User Creativity, Part One
  • #193: Managing User Creativity, Part Two
  • #196: Collective Choice: Ratings, Who Do You Trust?
  • #198: Collective Choice: More Thoughts About Ratings

Posted on January 3, 2006 at 11:37 PM in Politics, Social Software, User Interface, Web/Tech