A two-player zero-sum game is defined simply by a (real-valued) utility function on a joint action space, say U(x,y). That is, when Player 1 (aka The Player) plays action $x$, and Player 2 (aka The Adversary) plays action $y$, the payoff to Player 1 is $U(x,y)$ and the payoff to Player 2 is $-U(x,y)$. I'll emphasize that the pair $(x,y)$ might lead to a complex outcome, the "goodness" of that outcome is just one real value, namely $U(x,y)$.

For simplicity, assume that each player has a finite number of actions available, $n$ and $m$, and that $x$ and $y$ are chosen from the $n$-simplex and the $m$-simplex, respectively. Further, assume $U(x,y)$ to be linear in both arguments. Thus, we imagine that each player may choose a randomized action, and Player 1's utility is the expected utility according to both players' random draws. Under these assumptions, Von Neumann's MInimax Theorem tells us:

\[\min_y \max_x U(x,y) = \max_x \min_y U(x,y) \]

In other words, the optimal utility in this game is independent of which player selects first. Indeed, either player can choose an optimal strategy and reveal this to his/her opponent.

Blackwell considered an alternative question: what if the utility function is vector valued? Or put another way, what if the payoff is disbursed through several currencies which are not exchangeable -- such as, for example, dollars and happiness? While an economist might say that everyone has a utility on any joint resource allocation, in the problem at hand we may not yet know the relative value of each resource, and thus we should not automatically reduce an outcome to a single parameter.

For vector-valued games $\mathbf{U}(x,y)$, Blackwell reformulated the question "what maximum payoff can I achieve in the worst case?", which only makes sense in scalar-games, to the question "what constraints can I hope to satisfy on my average payoff vector if I play the game arbitrarily many times?". That is, can I be certain the average payoff will lie in a particular convex set $K$? The latter point, where the game may need to be repeated indefinitely, is actually a crucial issue: whereas in scalar-games the Player has an oblivious strategy which guarantees at least the minimax expected payoff against any opponent, in vector-valued games the Player can do better by using an adaptive strategy. In Blackwell's theorem, success isn't only attained if you can explicitly guarantee the payoff vectors lie in $K$ -- it's good enough if you have an adaptive strategy ensuring that the average payoff vector gets arbitrarily close to $K$. Blackwell defined the concept of "approachability" of a set $K$, namely whether the Player has a strategy for choosing a sequence of $x_t$'s such that, for any sequence of $y_t$'s,

\[ dist\left(K, \frac{1}{T} \sum_{t=1...T} \mathbf{U}(x,y) \right) \to 0 \quad \text{ as } {T \to \infty}\]

where distance to a set is defined in the obvious way.

So what is the content of Blackwell's Approachability Theorem? Essentially, Blackwell shows that you can reduce the question of approachability in vector-valued games to questions about scalar-valued games. Notice that, given any vector $\mathbf{v}$, the vector-valued utility function $\mathbf{U}(x,y)$ induces a scalar-valued game $U(x,y) := \mathbf{v} \cdot \mathbf{U}(x,y)$, and we can easily ask what the minimax value of this game. Further, given some half-space $H = \{\mathbf{z} : \mathbf{z} \cdot \mathbf{v} \geq c\}$, parameterized by the vector $v$ and constant $c$, it is not difficult to see that "H is approachable" is equivalent to "the scalar game $U(x,y) := \mathbf{v} \cdot \mathbf{U}(x,y)$ has minimax value at least as big as $c$". This leads us to

**Blackwell's Approachability Theorem:**A convex set $K$ is approachable if and only if, for every half-space $H$ containing $K$, $H$ is approachable

The "only if" is trivial, but the "if" direction is actually surprising. And Blackwell's proof is notable in that it provides a concise construction for the Player's adaptive strategy.

(For a much more complete exposition of Blackwell's Theorem, see Chapter 7.7 of the excellent book by Nicolo Cesa-Bianchi and Gabor Lugosi, who provide many more details. You can also read Blackwell's paper from 1956 which isn't written in quite such a modern way, but is very readable nonetheless.)

The theorem is intriguing for it's simplicity, but also for it's utility. Indeed, it has been used to prove a range of results: the existence of "Hannan-consisten" forecasters (see aforementioned book), a new proof of "calibration", and the construction of no-internal-regret algorithms.

## No comments:

## Post a Comment