## Oct 30, 2009

### Adaptivity in online models?

Posted by
Peter B

One of the exciting features of online learning models is that some results for the accuracy of prediction methods in a probabilistic setting (for instance, with iid data) can be viewed as consequences of regret bounds for online methods in an online setting. See, for example, N. Cesa-Bianchi, A. Conconi, and C. Gentile On the generalization ability of on-line learning algorithms IEEE Transactions on Information Theory, 50(9):2050-2057, 2004. It looks like there is a deterministic core to these prediction problems, and the iid assumption is not central. But that is not universally true. In an iid setting, we have the luxury of splitting the sample: if we want to choose an appropriate value for a regularization parameter, or perform model selection - choose an appropriate model complexity - we can split the sample and use a validation set to make that choice. These kinds of techniques often lead to adaptivity properties: the method can automatically adapt to unknown properties of the prediction problem, and perform almost as well as if it had been told those properties in advance. There are examples of analogous results in an online setting, for instance, choosing a regularization parameter via an online optimization, to give adaptivity to the modulus of convexity of the losses in online convex optimization (see this NIPS paper). Are there generic approaches in an online setting (for instance, general online model selection methods), analogous to cross-validation in an iid setting?

## Oct 20, 2009

### Regret vs. Competitive Ratio, Part I

Posted by
Jake Abernethy

There are two very common metrics for judging the quality of an algorithm:

Given an algorithmic/decision problem with some abstract complexity n, assume that an algorithm A makes a sequence decisions and is revealed, one by one, a sequence of "outcomes" or "inputs" $x_1, \ldots, x_T$. There is some cost function associated with the algorithm's decision and the outcome. Now assume that there is some benchmark algorithm B (the "optimal offline" algorithm) which receives the data $x_1, \ldots, x_T$ in advance. The competitive ratio of A with respect to B is

\[

\max_{\{x_t\}} \frac{\sum_{t=1}^T \text{cost}_t(A(x_1, \ldots, x_{t-1}),x_t)}{\text{cost}_{1:t}(B,\{x_t\})}

\]

(The competitive ratio is defined more precisely by allowing an additive constant in the numerator, which is negligible as long as B's cost tends to infinity)

The regret of A with respect to B is, instead, the difference of the two terms:

\[

\max_{\{x_t\}} \sum_{t=1}^T \text{cost}_t(A(x_1, \ldots, x_{t-1}),x_t) - \text{cost}_{1:t}(B,\{x_t\})

\]

The competitive ratio metric is primarily used in algorithms and theory communities, where it originated. This makes sense, as it dovetails very nicely with the notion of approximation ratio; in general, folks in theory tend to like multiplicative comparisons (e.g., the price of anarchy).

The notion of regret came from Game Theory, at least that's where I believe the term originated, but became popular in the Learning Theory community. It appealed to learning theorists because, as far as I can tell, it became an alternative to "estimation error" in a model where the data is assumed to be i.i.d. from a fixed distribution D. We define estimation error as

\[

\mathbb{E}_{x\sim D}\mathbb{E}_{\{x_t\}\sim D} \left[ \text{cost}(A(x_1, \ldots, x_t), x) - \text{cost}(B(D),x)\right].

\]

Here, we can think of B as the algorithm that plays optimally with full knowledge of the distribution D.

The above is very similar to regret, except that the data is i.i.d. and the benchmark knows only the distribution (and not the data $x_1, \ldots, x_t$). What was apparently surprising to many learning theorists was that it was possible to get reasonable bounds on regret even though the data may be adversarially chosen and available to the offline comparator.

I'll stop here, but this post is only Part I of a series - I'll continue with more discussion on the relative advantages and disadvantages of regret vs. competitive ratio. Stay tuned!

**Regret**and**Competitive Ratio**. Conceptually they are quite similar measures, both relying on the relative cost of the algorithm, given a worst-case input, versus that of some ideal (but usually unattainable) benchmark. Surprisingly, these two are rarely discussed in tandem, and papers generally focus on either one or the other. Let me first try to define each as generally as possible.Given an algorithmic/decision problem with some abstract complexity n, assume that an algorithm A makes a sequence decisions and is revealed, one by one, a sequence of "outcomes" or "inputs" $x_1, \ldots, x_T$. There is some cost function associated with the algorithm's decision and the outcome. Now assume that there is some benchmark algorithm B (the "optimal offline" algorithm) which receives the data $x_1, \ldots, x_T$ in advance. The competitive ratio of A with respect to B is

\[

\max_{\{x_t\}} \frac{\sum_{t=1}^T \text{cost}_t(A(x_1, \ldots, x_{t-1}),x_t)}{\text{cost}_{1:t}(B,\{x_t\})}

\]

(The competitive ratio is defined more precisely by allowing an additive constant in the numerator, which is negligible as long as B's cost tends to infinity)

The regret of A with respect to B is, instead, the difference of the two terms:

\[

\max_{\{x_t\}} \sum_{t=1}^T \text{cost}_t(A(x_1, \ldots, x_{t-1}),x_t) - \text{cost}_{1:t}(B,\{x_t\})

\]

The competitive ratio metric is primarily used in algorithms and theory communities, where it originated. This makes sense, as it dovetails very nicely with the notion of approximation ratio; in general, folks in theory tend to like multiplicative comparisons (e.g., the price of anarchy).

The notion of regret came from Game Theory, at least that's where I believe the term originated, but became popular in the Learning Theory community. It appealed to learning theorists because, as far as I can tell, it became an alternative to "estimation error" in a model where the data is assumed to be i.i.d. from a fixed distribution D. We define estimation error as

\[

\mathbb{E}_{x\sim D}\mathbb{E}_{\{x_t\}\sim D} \left[ \text{cost}(A(x_1, \ldots, x_t), x) - \text{cost}(B(D),x)\right].

\]

Here, we can think of B as the algorithm that plays optimally with full knowledge of the distribution D.

The above is very similar to regret, except that the data is i.i.d. and the benchmark knows only the distribution (and not the data $x_1, \ldots, x_t$). What was apparently surprising to many learning theorists was that it was possible to get reasonable bounds on regret even though the data may be adversarially chosen and available to the offline comparator.

I'll stop here, but this post is only Part I of a series - I'll continue with more discussion on the relative advantages and disadvantages of regret vs. competitive ratio. Stay tuned!

## Oct 6, 2009

### A HelloWorld, of sorts

Posted by
Jake Abernethy

Hello there, and welcome. Being the first post of Inherent Uncertainty, a mission statement or raison d'etre is in order. Here goes.

In the last few years, a new medium of communication has been added to the mix: blogging. Blogs have become a secondary outlet for discussion on topics that are simply not suited for the conference/journal format. This is, I think, a very good thing. The review process is useful for assessing completed research and solved problems, but does not incentivize suggesting new research directions or abstract ideas. Nor does the usual publication setting provide the infrastructure for conversation or response by readers.

Several popular blogs have cropped up in both the theory and learning community -- see Hunch.Net, The Machine Learning Forum, Computational Complexity, In Theory, and probably others I'm missing -- and now we add one more to the mix: Inherent Uncertainty. Let the conversation begin!

**What:**InherentUncertainy will tend to focus on, but not be limited to, theoretical aspects of learning, decision-making, prediction and game-playing. There will be a significant emphasis on new directions and open problems.**Who:**The blog will be primarily managed by me, but I expect not to be the central author. Whether or not this will be achieved, I hope to outsource 75% of the posts to others, and I'd like the authorship to be broad - spanning various fields as well as institutions. I welcome volunteers as well, don't hesitate to contact me.**Why:**The process of academia, and in particular the way in which research findings are disseminated, has long been dominated by a single medium: the journal publication. As many have observed, that appears to be changing, but probably no faster than in Computer Science where most research appears to be released through conferences. This has its drawbacks, but it's certainly useful in a field that is changing so quickly.In the last few years, a new medium of communication has been added to the mix: blogging. Blogs have become a secondary outlet for discussion on topics that are simply not suited for the conference/journal format. This is, I think, a very good thing. The review process is useful for assessing completed research and solved problems, but does not incentivize suggesting new research directions or abstract ideas. Nor does the usual publication setting provide the infrastructure for conversation or response by readers.

Several popular blogs have cropped up in both the theory and learning community -- see Hunch.Net, The Machine Learning Forum, Computational Complexity, In Theory, and probably others I'm missing -- and now we add one more to the mix: Inherent Uncertainty. Let the conversation begin!

Subscribe to:
Posts (Atom)