Nov 19, 2010

Universal Portfolio Selection

Back in the old days of 1991, long before there was significant research in non-stochastic learning/decision-making models, Thomas Cover, the guy who wrote the book on Information Theory, published a pretty major online learning result. Cover used the term "universal portfolio strategy" to refer to an algorithm for sequentially adjusting a portfolio with the goal of doing as well as the "best" fixed portfolio in hindsight. This is an ambitious goal, but it so happens that this is achievable with a rather simple algorithm. It's simple enough that I can sketch the approach here in a blog post.

Assume we have $n$ stocks (securities) in which we may invest, and on each trading period $t$ we receive a vector of "price relatives" $x_t$, that is the ratios of the price on the previous period to the current period. So if $x_t(i) = 2$ then this implies that the price of stock $i$ doubled on period $t$. Cover considered the notion of a "constant rebalanced portfolio" (CRP) for a distribution $w \in \Delta_n$, which is the investment strategy that ensures that the total money invested is distributed according to exactly $w$ at the beginning of each trading period. This is similar to a basic mutual fund strategy. Because stock prices change between rounds, the wealth distribution changes also and thus rebalancing (i.e. buying and selling shares) is required, hence the name CRP. Given a sequence of price relatives $x_1, \ldots, x_T$, we can compute the multiplicative return of a given CRP $w$ by
$$\text{Mult. wealth ratio of } w \quad := \quad \prod_{t=1}^T (w \cdot x_t)$$
We will generally use the log of this quantity. Of course, if we had the price relatives in advance, we could compute the best log wealth ratio in hindsight:

$$\text{Log-return of opt CRP} := \sup_{w \in \Delta_n} \sum_{t=1}^T \log(w \cdot x_t).$$

We'll use the notation $w^*$ to refer to the optimal $w$ in hindsight.

The algorithm Cover proposed is stunningly simple: invest your money uniformly in every CRP, and rebalance within each CRP accordingly. That is, for every $w \in \Delta_n$, invest an infinitesimal amount in $w$, and continue rebalance this infinitesimal amount to maintain the $w$ distribution. Actually maintaining this infinite array of portfolios is indeed computationally difficult, as it requires a complex integral (as noted by Cover). There's been much followup work on this issue, and I may try to get back to this in a later post.

Proving a performance bound for this algorithm is also pretty straightforward, and it's worth noting that it differs from the usual online learning regret bounds. To start, we'll need the notion of an $\epsilon$-optimal CRP: $w$ is $\epsilon$-optimal as long as $w$ contains a $1-\epsilon$ fraction of $w^*$. (Observe: this is not a symmetric relationship). We make two observations. First, the set of $\epsilon$-optimal CRPs contains at least a $\epsilon^{n}$ fraction of our initial wealth. This is easy to see, just notice that the set of $\epsilon$-optimal CRPs can be viewed as a $\epsilon$-sized simplex around $w^*$. Since we invested uniformly over all CRPs, our initial wealth in this set is exactly the volume of this mini-simplex. Second, on any given round the wealth ratio in every $\epsilon$-optimal CRP is at least that of $w^*$, times a discount factor of $1-\epsilon$. This follows by the definition of $\epsilon$-optimal.

With these two facts in mind, we can lower bound the total wealth earned just in this set of $\epsilon$-optimal CRPs. As we have $\epsilon^{n}$ initially invested in this set, and we lose at most $1-\epsilon$ relative to $w^*$ on each round, we have

$$\text{Wealth of alg.} \geq \epsilon^{n}(1-\epsilon)^T \text{Wealth}(w^*)$$

Of course $\epsilon$ is just a parameter of the analysis, so why not set it to $1/T$. Taking the log wealth, and noticing that $(1-T^{-1})^T \to e$ as $T \to \infty$, we have

$$\text{Log wealth of alg} \geq \text{Log-wealth}(w^*) - n \log T - o(1)$$

Now for a big caveat. While this log-T regret bound looks pretty good, it actually says that our wealth will be polynomially-worse in $T$ than the optimal portfolio, and the exponent here is $n$. Given roughly 250 trading days per year, it's doubtful that an investor is willing to gain $250^n$ less money than the best portfolio -- this feels like a pretty poor bound in practice. On the other hand, the wealth of the best CRP should grow exponentially in $T$ (assuming the stock market continues to increase in value), and hence in the long run this ought to be a good strategy. Also, folks who have run experiments using this approach have reported pretty good results. At the same time, I don't think it's made anyone rich.

This has become a long post, so I'll save a discussion of the various followup results for another day.