
December 12, 2005




I've run into my fair share of voting systems, but by far the worst I've seen is the one used at www.animenfo.com, a database of anime products with user ratings. When giving a product a rating, a user is also required to write a constructive review of the product. However, the reviews are moderated and often removed from the database as "unconstructive" by the limited number of moderators. The problem with this approach is that the moderators are inevitably biased. A bad rating, if reviewed by a moderator who personally likes the show being rated, is often removed from the database with the claim that the review was not constructive. Perhaps not entirely related, but an example of ratings gone awry all the same.


Brad Templeton

It's rarely used, but asking people to rank from -5 to +5 can be better than ranking from 0 to 10. You make it very clear that "0" means average/neutral. You won't entirely eliminate the positive bias in ratings but you can reduce it. And you bring everybody closer to the same mean.

I've concluded over time that in fact the response to a negative feedback on eBay should be disabled, as it contains no information. Who, after all, is going to give a positive feedback to somebody after they have slapped a negative on you? The likely choices there are no-feedback or neutral from a charitable person and of course a revenge negative.

So they could make it simpler. If the first person gives a negative, the transaction is considered sour. In your stats it would list the number of first-negatives you left for others. (The number of negs you leave for others is already visible but harder to find.) Ideally split buyer/seller.
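The first-negative rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything eBay implements: the `Feedback` record, its field names, and the `order` convention (1 for the first feedback left on a transaction, 2 for the reply) are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """Hypothetical feedback record; field names are assumptions for this sketch."""
    transaction_id: str
    author: str   # who left the feedback
    rating: str   # "positive", "neutral", or "negative"
    order: int    # 1 = first feedback on the transaction, 2 = the reply


def sour_transactions(feedback):
    """Return the ids of transactions whose first feedback was a negative."""
    return {
        fb.transaction_id
        for fb in feedback
        if fb.order == 1 and fb.rating == "negative"
    }


def first_negative_counts(feedback):
    """Count, per author, how many first-negatives they left for others."""
    counts = Counter()
    for fb in feedback:
        if fb.order == 1 and fb.rating == "negative":
            counts[fb.author] += 1
    return counts


# Made-up sample data: the reply to a first negative is ignored entirely.
sample = [
    Feedback("t1", "alice", "negative", 1),
    Feedback("t1", "bob", "negative", 2),   # revenge negative, carries no weight
    Feedback("t2", "bob", "positive", 1),
    Feedback("t2", "alice", "positive", 2),
    Feedback("t3", "bob", "negative", 1),
]
sour = sour_transactions(sample)
counts = first_negative_counts(sample)
```

Under this sketch only the first rating decides whether a transaction is sour, so a retaliatory reply changes nothing. Splitting the counts by buyer and seller role would need an extra field on the record.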

F. Randall Farmer

An interesting point missing from this good summary:

Many huge ratings sites see a clear bimodal distribution of scores when rating products (movies, CDs, books, etc.) with a simple scoring scheme, such as a single 5-star scale: the modes are at 1 and 5 stars, with the love-or-hate scores making up more than 80% of all scores.

Clearly, some of that is a function of the intrinsic cost vs. incentive structures of completing ratings and reviews. When there is a clear, direct benefit to the user (such as with Netflix, Slashdot, or Yahoo! Music's Launchcast), ratings tend to distribute a bit more evenly (but just a bit).

My point is that the actual end-user application of the rating has a large effect on the nature of the scores that will be created.
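One rough way to check a set of scores for the love-or-hate pattern described above is to measure what fraction of them sit at the two ends of the scale. The sketch below assumes a 1-to-5 star scale; the function name `extreme_share` and the sample scores are made up for illustration and are not data from any real site.

```python
from collections import Counter


def extreme_share(scores, low=1, high=5):
    """Return the fraction of scores that fall on either end of the scale."""
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(scores)
    return (counts[low] + counts[high]) / len(scores)


# Made-up sample: three 1-star and five 5-star scores out of ten.
sample_scores = [1, 1, 5, 5, 5, 5, 3, 5, 1, 4]
share = extreme_share(sample_scores)  # 0.8
```

A high share alone does not prove the distribution is bimodal (all the scores could be 5s), so in practice the counts at each end would be worth inspecting separately.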


I find that when people give a bad rating on anything from eBay to those hotel and restaurant review sites, it's usually the result of an extremely negative personal experience, so the rating tends to be exaggerated and not at all objective.

A blind system like the proposed solution to eBay's bilateralism issue is probably best.

I've been a mystery shopper for a handful of restaurants and the system works quite well. In this system the purpose is to behave exactly like a regular consumer. You sit down, ask about the specials, take mental note of everything from the bathroom sink to the temperature of the food (and a hundred other details), but you don't behave in any way that would alert the staff that you are a mystery shopper (for example, writing things down is strictly prohibited).

Upon returning home you complete a lengthy questionnaire about all the details you were supposed to notice. In this system the reward is for a thorough assessment, regardless of its positive or negative slant. The more detail you provide, the better the assignments you get next time around. Even if you totally slam a place, as long as you provide substantial detail you will be "promoted" to a "bar visit" or something much more rewarding than the typical lunch visit. So I also agree with this article's suggestion to rate the raters, akin to what Amazon does but without the bias towards volume.

I would also add that there is no middle ground on such ratings. Whether the choices range from Strongly Agree to Strongly Disagree, or from "shortest wait" to "longest wait", there is no 'neutral' choice; nothing is "just right" as in the story about the three bears and the porridge. Even the numerical scales always have an even number of choices, not odd, so there is no chance of ranking in the middle. I like that. I think it pushes you to be more realistic. Having a middle ground leaves the undecided the opportunity to not make a decision. A "middle" rating is of no use to anyone.


Researcher Paul Resnick has been looking at eBay, too. His blog is at http://www.livejournal.com/users/presnick/ and has posts like eBay Live Trip Report:

There are a few relevant articles as well, at http://www.si.umich.edu/~presnick/#publications

adrian Chan

Good work there, and very thorough. Can we distinguish the social/transactional from the object/informational aspects of a rating situation? I'm just wondering aloud. On the one hand is the fact that publishing one's rating of a transaction will reflect on the transaction partner and on oneself, and so is social. On the other hand is the fact that one can indeed rate an experience with some objectivity, assuming that the soft stuff has been bracketed out. I don't see how we can bracket out the social though, at least in a way that users will be able to participate in.
There's a thing about sincerity as a type of interaction: it's an attribute of expression that cannot be stated explicitly (say that you're being sincere and you throw your sincerity into question immediately). Ratings may fall into that kind of category of linguistic and metalinguistic acts.

Brian Hamlin

Wow, thorough and cogent.

Of all of the points, one missed opportunity occurred to me while reading, and that is on the eBay statistical validity criticism. The phrase 'thus one of the failures for eBay is that it tries to claim meaningfulness for users with very few ratings, where there's clearly no statistical basis' struck me as off.

Human transactions are not mechanical actions; they involve situations that often matter. Hence a particular negative may really be cause for serious concern, even if there are not yet many ratings. Also, beginners are not inherently less important than experienced traders. So, though it is difficult, credit or blame is rightly awarded even on the basis of just a few transactions. Exactly how is a subject for further refinement.

I read some of the eBay dissection papers a while back. This post is far more comprehensive, that is, wide ranging, and yet not missing clarity, accuracy or nuance. Master work.


I have an interesting brainstorm for you about the Netflix challenge:

cheers -
