Wednesday, September 4, 2013

Peer pressure brings wine scores toward the middle

I had the pleasure of judging wine last week at Mundus Vini, Germany's leading wine competition. My panel tasted 150 wines in three days and gave zero (0) gold medals.

As is often the case with wine competition results, one of the culprits was human behavior.

People occasionally complain that the so-called 100-point scale for wine critics is actually about a 15-point scale for publications, as the Wine Advocate and Wine Spectator rarely publish ratings below 85.

In wine competitions, the scale is a little wider, but it's still more limited than you'd expect from looking at the scoresheet. Scores below 70 are rare, but so are scores above 95, which is why Grand Gold medals rarely happen in Europe without statistical help.

I always try to follow the written rules of wine competitions: to score each wine on each attribute where it belongs. But there are unwritten rules also, and the chart below will show you that it didn't take me very long to learn and follow them -- even though I don't necessarily agree with them.

This chart is given to each taster every morning, showing your results from the day before. The dark line is my scores; the lighter line is the panel's scores.

The implication is that if you're far from the rest of the panel, you're wrong.

There is some logic to this. The point of having multiple tasters is to get a consensus. One taster might like a funky, unusual wine, but if others hate it, it won't win a medal, and that's probably best for consumers.

What struck me, in getting the quick feedback, was how much I was influenced by my peers in keeping my scores in a narrow range. I like to think that, while I listen to reason, I stick to my convictions if I continue to believe I'm right. Clearly, I do not.

As you can see, I hated the third wine, a German Pinot Gris. It was buttery, low-acid, unpleasant. I gave it 67. But I didn't pull this score out of thin air. I marked the individual attributes on the scoresheet at the top.

If you look closely, you'll see that a wine in the middle of every category -- average in every way -- should get a 74. If you decide that it's below average in a couple of categories, it could slip below 70. Probably should slip below 70.

Think of the famous example of a classroom full of Nobel-winning scientists, graded on the curve: Some of them will get C's and D's. But we're not talking about tête de cuvée Champagne here. On this day, on the chart above, we had German Pinot Gris and Portuguese reds from Setúbal. Some of them were pretty good. But shouldn't some of them be below average?

What happened was, I gave that wine a 67. And two other panelists, both from Germany, complained. One told me that the wine was unflawed, and that only a flawed wine should get a score that low. That's not what the scoresheet says.

But I liked Mundus Vini, and I didn't want to get disinvited for my scores. So look how quickly I went to the norm. I went below 70 again two wines later, and once again got flak for it. I didn't drop below 70 the rest of the day.

Looked at another way, if 74 is average, I graded none of the final 40 wines as below average. My classroom was full of Nobel Prize winners.

How does this affect the absence of gold medals? Our avoidance of extreme scores worked in both directions. Nobody wanted to use the second-from-right column -- below average -- because the scores would be too low. But we also didn't want to use the far left column -- excellent -- because the scores would be too high. And if you didn't go into the far left column, you couldn't score a wine above 87 -- a silver medal, also known as kissing your sister.

In fact, in three days, six of us judged 150 wines: 900 judging opportunities. And only one person (me) one time gave one wine 95 points, just enough for a Grand Gold. The jury chairman remarked on it. But just as when my scores were too low for the norm, I had to defend my very high score, which was also slightly uncomfortable.
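To put that in proportion, here is a quick back-of-the-envelope sketch in Python. It uses only the figures in the paragraph above (six judges, 150 wines, one 95-point score); the variable names are illustrative, and it assumes every judge scored every wine, which is what "900 judging opportunities" implies.

```python
# Figures taken from the post: six judges, 150 wines, one 95-point score.
judges = 6
wines = 150
grand_gold_scores = 1  # the single 95 mentioned above

# Assumption: every judge scored every wine.
opportunities = judges * wines
share = grand_gold_scores / opportunities

print(f"{opportunities} judging opportunities")
print(f"{share:.2%} of scores reached 95 points")
```

That works out to roughly one score in nine hundred, or about a tenth of one percent.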

The second day, my graph was much closer to the group as a whole. In my self-image, I am a proud independent American, not afraid to speak my mind and stand up for my beliefs. In fact, I'm demonstrably as malleable as everybody else. At least my belief that I'm willing to take my lumps in public when necessary is accurate.

Follow me on Twitter: @wblakegray and like The Gray Report on Facebook.


Jack Everitt said...

Perhaps if they dropped the 3-15 pts for COLOUR and CLARITY (both of which are of zero importance to, well, me) and replaced them with, oh, to be wild, PLEASURE? Or is that too subjective a thing for Wine Judging?

Robert Cartwright said...

"But I liked Mundus Vini, and I didn't want to get disinvited for my scores. So look how quickly I went to the norm. I went below 70 again two wines later, and once again got flak for it. I didn't drop below 70 the rest of the day".

It sounds like they browbeat you to toe the line. I thought this was a wine-judging competition, not Lord of the Flies.

W. Blake Gray said...

Jack: I think "pleasure" goes into "Global judgment" at the end, though it also heavily influences my judgment of flavor and aroma quality, which are the most important categories.

Every wine I judged got full points for clarity. Colour was a little more subjective. I didn't mark any wines further to the right than the second measure. But colour does make a bit of a difference, although admittedly less in the era of Mega Purple.

Robert: To be fair, I let myself be affected. I could have kept doing what I started doing.

Andrew Walter said...

Interesting post, Blake. Just to be fair about the 15-point scale of the WA/WS, I checked the Wine Spectator's online database: 63K wines at 90 or above; 83K wines < 85 (including 23K at 79 or below). My rough calculation suggests a bell-shaped distribution, which is more or less what you'd expect from blind, non-biased tasting (which, as you note in this post, is very difficult to do).

W. Blake Gray said...

Andrew: Wow, that's surprising to me, thanks for the info.

Good for Spectator. Say what you want about their palate preferences (and I do), they have not gone in for the extreme grade inflation of the Advocate.

Andrew Walter said...

Upon further reflection, WS is really doing a 50 pt system, but scoring a "perfect" wine 100 just sounds better than scoring it a 50!

Alex Conison said...


If you search by year for reviews under 80 points on Wine Spectator a different pattern suggests itself.

In 1994, WS gave 1236 wines less than 80 points and in 2002, 1308 wines.

But for the last decade, there has been a steady downward trend:

2007: 894
2008: 997
2009: 507
2010: 417
2011: 207
2012: 89
2013: 74 (through October issue)
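For anyone who wants to check the trend, here is a minimal sketch in Python that computes the year-over-year change from the counts listed above. The numbers are copied from this comment; the variable names are illustrative, and the 2013 figure is a partial year (through the October issue), so its change is not directly comparable.

```python
# Counts of Wine Spectator reviews under 80 points, as listed in the comment above.
# The 2013 figure covers only issues through October, so it is a partial year.
under_80 = {2007: 894, 2008: 997, 2009: 507, 2010: 417, 2011: 207, 2012: 89, 2013: 74}

years = sorted(under_80)

# Fractional change from each year to the next.
changes = {}
for prev, curr in zip(years, years[1:]):
    changes[curr] = (under_80[curr] - under_80[prev]) / under_80[prev]

for year, change in changes.items():
    note = " (partial year)" if year == 2013 else ""
    print(f"{year}: {under_80[year]:>4} ({change:+.1%} vs {year - 1}){note}")

# Overall decline between the first and last full years in the list.
overall_drop = 1 - under_80[2012] / under_80[2007]
print(f"2007 to 2012: down {overall_drop:.0%}")
```

By these counts, 2008 was actually up on 2007, so the decline is not perfectly steady, but every year after that is lower than the one before, and the 2012 count is about 90 percent below the 2007 count.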

Bob Henry (Los Angeles wine industry professional) said...


A clarifying question: Was there "table talk" between judges DURING the actual tasting, or only AFTER your flight rankings/ratings had been submitted to the event administrators?

Conversation while judges are in the middle of critiquing a flight of wines should be absolutely verboten. Likewise conspicuous "body language": grimaces, frowns, the dismissive "clearing" of one's throat, et cetera.

See my next comment about judging a competition -- and crossing swords with fellow judges over wine rankings.

~~ Bob

Bob Henry (Los Angeles wine industry professional) said...


I have judged just one California county fair. (Not that I am uninterested. Simply not invited. I must "run" in the wrong wine circles . . .)

At my table was a friendly acquaintance (a UCLA English classics professor-cum-wine enthusiast), and two winemakers from large commercial wineries located in the multi-county appellation being judged.

The poured glasses of wine were presented on a tray. Each wine "randomly" number coded. Each flight a specific type of wine.

We tasted in silence, recorded our "rankings" in silence, and at the end of the session submitted our individual written and signed rankings sheets to the administrators.

Only then did we discuss the wines at our table to assign medal rankings (Gold, Silver, Bronze, or no medal).

During the session of California Syrahs and Rhone varietal blends, the two commercial winemaker judges at my table insisted on disqualifying any wine that exhibited even the hint of "brett."

But Blake, as you pointed out in your 2011 Los Angeles Times article on Mourvèdre, that grape may intrinsically have a "funky" character. So a wine with a gamy/barnyard-y smell and flavor may be "true to type." And that's not a "defect."

I asked the commercial winemakers if they had much drinking experience with red Rhones from the home country.

"No," was their reply.

I asked if they had ever traveled and tasted through the Rhone Valley.

Once again, "no" was their reply.

So the UCLA English classics professor and I "agreed to disagree" with our counterparts across the table, and voted -- in good conscience -- for the "hint of brett" wines. (All two of them.)

As it turned out, other judges in the room likewise found favor with the wines, and they were awarded Gold medals at the end of the day.

At the conclusion of the tasting, a master list of all wines by number code was presented to each judge.

And after the tasting was over, as I drove down the California coast on my return to Los Angeles, I made a detour and visited the winery's tasting room.

I offered them an "off the record" report of their Gold Medal award -- and bought half a case of the wine for my own personal consumption.

Yes, I voted for the wine at the event, and voted with my wallet in the marketplace.

~~ Bob

Postscript. I haven't been invited back to judge any subsequent annual competitions. (I leave it to others to infer nefarious motives by the event organizers and/or the other judges.)