May 30, 2011

Calibration and Game Playing

I've been at the Technion for the past month visiting Shie Mannor. We've been discussing the notion of calibrate forecasting, which I have written about in a previous post. To briefly recall, we imagine a forecaster making a series of probability forecasts $p_1, p_2, p_3, \ldots \in \Delta_n$ from the set of distributions on, say, $n$ outcomes. After predicting $p_t$ the forecaster then observes $y_t \in [n]$, say the true outcome of the weather. The long term goal of the forecaster is to be "correct" in a kind of average sense. That is, if we consider all the times the forecaster says (roughly) 20% chance of rain, we should expect the rain to occur roughly 2 times out of 10 on these particular days. This is essentially what it means to be "calibrated" in an asymptotic sense. For more details see the older post.

There is much to dislike about this notion of "goodness" of a forecaster. As I said previously:
Despite the natural way to pose the concept of calibration, it is actually quite a weak notion of "goodness" of a forecaster. Let's take the weather example again, and imagine a weather forecaster who, despite the available data and the meteorological conditions, always predicts the base probability of rain. If the forecaster happens to know in advance that it rains about 13.2% of the time, then to remain calibrated she might as well predict this value every day. Hence, a well-calibrated but entirely useless forecaster.
Since being here in Israel I've had a change of heart -- Shie taught me a trick about calibration that rekindled my interest. He says it's "well-known" but it is new to me and so I'd like to share it.

The trick is this: you can use calibration as an optimal way to play repeated zero-sum games. Let's imagine we have a zero sum game with payoff matrix $M \in \mathbb{R}^{n \times m}$, where on each round $t$ player 1 choses a distribution $x_t \in \Delta_n$ over his actions, and player 2 then chooses an action $y_t \in \Delta_m$ (we can actually assume p2 plays deterministically). If the game is played repeatedly then the average payoff after $T$ rounds is $\frac 1 T \sum_{t=1}^T x_t M y_t$.

So let's assume player 1 has a calibrated forecasting algorithm for "guessing" p2's strategy, so let $y^?_t$ be the (randomized) guess of p2's action $y_t$. Then player 1 can "best respond" to this guess, that is by letting $x_t := \arg\max_{x \in \Delta_n} x M y^?_t$. Then under the calibrated forecasting assumption, this strategy is "optimal" in the limit, in that it earns at least the value of this zero-sum game. Why? While p1 might be guessing $y_t$ poorly, we know that because he is calibrated he's right "on average". So while he might be best responding to a bad forecast on each round, he is providing the best response "on average".

They key piece here, of course, is that we can achieve calibrated forecasting even when the outcome sequence is chosen by an adversary. For more details, see this paper by Sham Kakade and Dean Foster.