Since Jake has been reminding me of my duties as a good machine learning citizen, which include posting on the blog occasionally, I decided to write this post about something I recently worked on. In their UAI 2009 paper, Ganchev et al. from UPenn introduced the dark pools problem. Dark pools are stock exchanges created to facilitate trading large volumes of shares without putting the large institutional trader at a disadvantage. What is dark about dark pools is that the available liquidity of a share is not revealed to the market (or the tape, to be more precise). The Wikipedia article on this topic provides an excellent overview.

Ganchev et al. formulated the problem of a large trader who wants to sell a large volume of shares by trading a fraction at each of $K$ different dark pools. However, the trader only gets to see how many of the orders placed at a particular dark pool were actually executed. Thus the true liquidity of a pool is revealed only when the order is at least as large as that liquidity (in which case the number of shares actually sold equals the true liquidity). Suppose this trader wants to maximize the total number of shares sold over a horizon of several trading periods.

In our recent paper, we showed that we can formulate the problem as an online convex optimization problem. This was interesting for several reasons. First, the total number of shares we can trade at each round is allowed to change adversarially. Since we have a hard constraint that the net allocation cannot exceed the total available volume, this looks like a problem with adversarially changing constraint sets. Traditional approaches a la Herbster and Warmuth would suggest that we should incur a penalty based on how often the changes happen. However, we were able to avoid this penalty and obtain an exponentiated gradient descent style algorithm that achieves minimax optimal regret. The idea was simple: for each unit to be traded, we maintain a distribution over the $K$ venues. When we receive a volume constraint, we compute the expected number of shares to be traded at each venue according to the distributions on all units up to the constraint level.
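To make the expected-allocation step concrete, here is a minimal sketch (the names and the Dirichlet initialization are mine, not from the paper): we keep one distribution over venues per unit, and when the adversary caps the volume at $V$ units, the expected number of shares sent to venue $k$ is the sum of the $k$-th entries of the first $V$ distributions.

```python
import numpy as np

K = 3          # number of dark pools (venues)
max_units = 5  # largest volume we might ever be asked to trade

rng = np.random.default_rng(0)
# dists[i] is a distribution over the K venues for the i-th unit;
# a real algorithm would update these with exponentiated gradient steps.
dists = rng.dirichlet(np.ones(K), size=max_units)

def expected_allocation(dists, V):
    """Expected number of shares sent to each venue when the
    adversary caps the total volume at V units."""
    return dists[:V].sum(axis=0)

alloc = expected_allocation(dists, V=4)
# Each per-unit distribution sums to 1, so the expected
# allocations across venues always sum to the volume cap.
```

Note that this only computes the play at one round; the learning part (the exponentiated gradient updates to each `dists[i]`) is omitted.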

So what do bandits have to do with all this? It turns out that the strategy described above, based on playing expected values at each venue, assumes it is fine to trade 1.278 shares at venue 1, 2.593 shares at venue 2, and so on. Since that seems a little impractical, we need to restrict ourselves to playing integral values only. Now let us start with the special case where we have to trade only one share at one of $K$ venues at every round, and we find out whether or not the liquidity at that venue was non-zero at that round. But this is just the $K$-armed bandit problem restated, which we know how to solve using the Exp3 algorithm. Can we build on this to get a solution for larger volumes?
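For readers who have not seen it, here is a bare-bones sketch of Exp3 (exponential weights with uniform exploration and importance-weighted updates) applied to this one-share special case; the toy reward function, where only one venue ever has liquidity, is my own illustration.

```python
import numpy as np

def exp3(K, T, reward_fn, gamma=0.1, seed=0):
    """Minimal Exp3: mix exponential weights with uniform exploration,
    and update the played arm with an importance-weighted reward.
    reward_fn(t, arm) must return a reward in [0, 1]."""
    rng = np.random.default_rng(seed)
    w = np.ones(K)
    total = 0.0
    for t in range(T):
        p = (1 - gamma) * w / w.sum() + gamma / K
        arm = rng.choice(K, p=p)
        r = reward_fn(t, arm)  # e.g. 1 if the venue had liquidity, else 0
        total += r
        # Unbiased (importance-weighted) estimate of the reward vector,
        # folded into the exponential-weights update.
        w[arm] *= np.exp(gamma * r / (p[arm] * K))
    return total

# Toy run: venue 2 always has liquidity, the others never do.
# Exp3 should quickly concentrate its play on venue 2.
reward = exp3(K=3, T=2000, reward_fn=lambda t, a: float(a == 2))
```

The exploration rate `gamma` would normally be tuned as a function of $K$ and $T$; a constant is used here just to keep the sketch short.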

We were able to extend Exp3 to the case of arbitrary volumes by using probabilistic rounding between the floor and ceiling of the desired allocation level. Unfortunately, this approach yields an $O(T^{2/3})$ regret bound. As always, there is a computationally inefficient way of getting $O(\sqrt{T})$ regret by using Exp4. Many hours of thinking by me and some other smart people have convinced me that this is a hard problem, much like the other scenarios where improving $O(T^{2/3})$ to $O(\sqrt{T})$ has proved to be hard.
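The probabilistic rounding step itself is simple; here is a sketch (function name mine): round a fractional allocation $x$ up with probability equal to its fractional part, so the rounded value is integral yet equals $x$ in expectation.

```python
import math
import random

def randomized_round(x, rng):
    """Round x to floor(x) or ceil(x) so that E[output] = x."""
    lo = math.floor(x)
    frac = x - lo
    return lo + (1 if rng.random() < frac else 0)

# Averaging many independent roundings recovers the fractional
# allocation (e.g. the 2.593 shares at venue 2 from above).
rng = random.Random(0)
samples = [randomized_round(2.593, rng) for _ in range(100_000)]
avg = sum(samples) / len(samples)
```

Each rounded play is a legal integral allocation, and unbiasedness is what lets the regret analysis carry over, at the cost of the extra variance that leads to the $O(T^{2/3})$ rate.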