Indeed, one can define the notion of the "probability of an event" without appealing to some underlying assumption of randomness. A probability can instead be considered in terms of *gambling*, where $P(X)$ is the price at which one would be willing to buy or sell a hypothetical contract that pays out exactly 1 dollar if the outcome $X$ occurs. From Wikipedia:

> You must set the price of a promise to pay $1 if there was life on Mars 1 billion years ago, and $0 if there was not, and tomorrow the answer will be revealed. You know that your opponent will be able to choose either to buy such a promise from you at the price you have set, or require you to buy such a promise from your opponent, still at the same price. In other words: you set the odds, but your opponent decides which side of the bet will be yours. The price you set is the "operational subjective probability" that you assign to the proposition on which you are betting. This price has to obey the probability axioms if you are not to face certain loss, as you would if you set a price above $1 (or a negative price). By considering bets on more than one event de Finetti could justify additivity. Prices, or equivalently odds, that do not expose you to certain loss through a Dutch book are called coherent.

This brings me back to my previous post on forecasting. Without an underlying assumption on the nature of the world, for example the i.i.d. assumption, it becomes difficult to judge "how good is a forecaster?" In the calibration setting, on each round $t$ a forecaster guesses a probability value $p_t$ and nature reveals an outcome $z_t \in \{0,1\}$. Of course, for a single pair $(p_t, z_t)$ we have no way to answer the question "was the forecaster right?" The question becomes even more remote when we imagine that $z_t$ is also chosen by a potential adversary.
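The coherence argument above is easy to make concrete with a small sketch (the function name and scenario are mine, not from the quote): if you price an event $A$ and its complement so that the prices don't sum to 1, an opponent who gets to pick the side of each bet can lock in a riskless profit.

```python
def dutch_book_profit(p_a: float, p_not_a: float) -> float:
    """Guaranteed per-round profit available against incoherent prices.

    Exactly one of A and not-A pays out $1, so the opponent buys both
    contracts when they are jointly underpriced (total cost < 1) and
    sells both when they are jointly overpriced (total revenue > 1).
    """
    total = p_a + p_not_a
    if total < 1:          # buy both: pay `total`, collect exactly $1
        return 1 - total
    if total > 1:          # sell both: collect `total`, pay out exactly $1
        return total - 1
    return 0.0             # coherent prices: no sure profit either way

print(round(dutch_book_profit(0.6, 0.5), 3))  # overpriced pair: 0.1
print(round(dutch_book_profit(0.5, 0.5), 3))  # coherent: 0.0
```

This is exactly the "certain loss" de Finetti uses to justify additivity: any prices violating the axioms hand your opponent a Dutch book.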

So let us now return to the notion of "calibration", a measure of the performance of a forecaster, from the previous post. The concept of calibration can be posed something like "the probability predictions roughly match the data frequencies". But while this might seem nice, it doesn't give us a way to judge how "good" the forecaster is, so it's somewhat hard to interpret. On the other hand, if we view this through the lens of de Finetti, using the idea of betting rates, we arrive at what I view as a much more natural interpretation. I'll now give a rough sketch of this idea.
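The "predictions roughly match frequencies" reading can be checked empirically by binning predictions and comparing each bin's empirical frequency to its predicted value. A minimal sketch (the helper name and toy data are mine):

```python
from collections import defaultdict

def calibration_report(ps, zs, n_bins=10):
    """Empirical frequency of z = 1 within each predicted-probability bin.

    A (roughly) calibrated forecaster has each bin's frequency close to
    the predictions that fell in that bin.
    """
    bins = defaultdict(list)
    for p, z in zip(ps, zs):
        # clamp p = 1.0 into the top bin
        bins[min(int(p * n_bins), n_bins - 1)].append(z)
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}

# Toy data: predictions of 0.5 that come true only 1 time in 4 would show
# up as a badly miscalibrated bin, much like the weather anecdote below.
preds = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1]
outs  = [1,   0,   0,   0,   0,   0]
print(calibration_report(preds, outs))  # {1: 0.0, 5: 0.25}
```

The limitation flagged above still stands: a report like this says nothing about whether the predictions were *useful*, only whether they match frequencies.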

Let's say you're a forecaster and a gambler, and on each round $t$ you predict a probability $p_t$ and also promise to buy or sell a contract, at the price of $p_t$, that pays off 1 USD if the outcome $z_t = 1$. But you're worried that someone might come along and realize that your predictions have some inherent bias. (As mentioned in the last post, it has apparently been observed that when weather forecasters said a 50% chance of rain, it only rained 27% of the time!) Let's call an opponent a *threshold bettor* if he plans to buy (or sell) your contracts whenever the price is above (or below) a fixed value $\alpha$.

If we pose the prediction problem in this way, we can say that a forecaster is calibrated if and only if she loses no money, on average and in the long run, to any threshold bettor. So the forecaster's predictions may not be good, but they are at least robust to gamblers seeking to exploit fine-grained biases.
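Here is a small simulation of this betting game (the function name, the threshold value, and the seed are my own choices, not from the post). A threshold bettor, as defined above, buys the contract when its price is above $\alpha$ and sells when it is below; against the miscalibrated weather forecaster of the anecdote, selling 50-cent contracts on a 27%-frequency event is nearly free money, while against a calibrated forecaster the same bettor breaks even on average.

```python
import random

def threshold_bettor_profit(ps, zs, alpha):
    """Total profit of a threshold bettor: buy the $1-if-z_t=1 contract
    when its price p_t is above alpha, sell it when p_t is below alpha."""
    profit = 0.0
    for p, z in zip(ps, zs):
        if p > alpha:       # buy: pay p, receive $1 if z == 1
            profit += z - p
        elif p < alpha:     # sell: receive p, pay $1 if z == 1
            profit += p - z
    return profit

random.seed(0)
T = 100_000
# Outcomes: it "rains" 27% of the time.
zs = [1 if random.random() < 0.27 else 0 for _ in range(T)]

# Miscalibrated forecaster: always announces 50%. A bettor with
# alpha = 0.6 sells every contract and makes roughly 0.23 per round.
ps_biased = [0.5] * T
print(threshold_bettor_profit(ps_biased, zs, alpha=0.6) / T)

# Calibrated forecaster: announces the true frequency, 27%. The same
# bettor still sells every contract but now breaks even on average.
ps_cal = [0.27] * T
print(threshold_bettor_profit(ps_cal, zs, alpha=0.6) / T)
```

Of course a single threshold bettor only probes one bias; the "if and only if" above quantifies over all thresholds $\alpha$, which is what rules out fine-grained biases at every price level.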

**Older confusing explanation:**

(This idea came out of a discussion I had today with my advisor Peter Bartlett, whom I'd like to thank.)

I'm not sure I completely understand your notion of calibrated forecaster but your motivation and examples remind me of proper scoring rules in economics (or what Bob and I called "proper losses" in our ICML paper last year). If you look at Savage's "Elicitation of Personal Probabilities and Expectations" the derivation of proper scoring rules uses a similar trading/gambling framework.

For a more modern take, you may want to look at Lambert, Pennock, and Shoham's "Eliciting Properties of Probability Distributions" from EC'08 (and the follow-up paper for classification in EC'09) if you haven't already seen them.

Correct me if I am wrong, but doesn't this notion of calibrated imply that a weather forecaster who simply makes predictions using only the correct prior (while ignoring the current state of the weather) can claim to be calibrated? If that is the case, then an uncalibrated forecast is clearly bad, but a calibrated forecast isn't necessarily all that good either.

Exactly. In that sense, "calibrated" is a bit like "unbiased." An estimator that always guesses the mean is unbiased, but pretty useless.

Regarding the last couple of comments: it's not just being unbiased. It's being unbiased on *all* predictions. For a stationary distribution of outcomes, that's easy: as mentioned, simply predict the empirical mean! But what if the outcomes are adversarially chosen? In this case calibration can still be achieved, which I think is still pretty surprising.

This interpretation gives the game-theoretic probability of Shafer and Vovk (http://www.probabilityandfinance.com/). For the online learning setting this gives defensive forecasting (http://arxiv.org/abs/cs.LG/0505083).

Thanks to the anonymous commenter for those last two papers, they seem quite interesting!
