While in Vancouver, I had an excellent chat with Paul Mineiro (he has a nice blog), who works for eHarmony, the online dating site. This followed another chat I had in August with Tom Quisel, who works at okCupid. Both of them had really interesting things to tell me.
For starters, I highly recommend the okTrends blog, published by some data scientists at okCupid. These folks take the treasure trove of data they have access to and analyze it in aggregate. For example, they can tell you which properties of your profile photo lead to more or fewer messages from other users. It is perhaps surprising that this doesn't cause major legal trouble over privacy, although because they mostly report "averages," individual users' information generally cannot be inferred. (When they include photos, for example, they ask permission.)
Online dating appears to be growing very quickly; the Boston Globe recently reported that "22 percent of heterosexual couples surveyed met online." And while people tend to believe that Facebook and Google have a vast amount of information on their users, these dating sites know far more about you. Paul mentioned that eHarmony users fill out a survey that typically takes 4 hours to complete, and that's even before they've invested in the site. And Tom mentioned that many users spend hours per day on the site.
If you play with any of these sites, as I have, you'll encounter a vast number of interesting learning problems. And whereas data labeling and annotation are usually quite expensive, users of dating sites are extremely happy to answer questions and give more info, as long as it leads to better matches. (Users also give away lots of information just through the profiles they browse and the messages they respond to or ignore.) While dating might seem like a simple problem of ranking the "most desired partners" for a given user, there is actually a challenging allocation problem: most people just want to be shown the "hotties," even though this isn't efficient, since a few highly desired users end up flooded with more messages than they can answer while everyone else gets overlooked.

And there are learning/decision problems that I would never have even thought about. Paul mentioned that eHarmony actually uses the survey data for price discrimination, i.e., setting plan prices based on a user's demographics. This presents a nice online bandit problem, since the user provides only an "accept/reject" response to the offer, rather than "no, but I'd pay X."
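To make the pricing problem concrete, here is a minimal sketch of a posted-price bandit, assuming a small fixed menu of candidate plan prices and one independent bandit per demographic segment. Thompson sampling with a Beta prior on each price's acceptance probability is just one standard way to handle binary accept/reject feedback; it is not a description of what eHarmony actually does. The class name `PostedPriceBandit`, the `simulate` helper, and all prices and acceptance rates below are hypothetical.

```python
import random


class PostedPriceBandit:
    """Thompson sampling over a fixed menu of prices, learning from accept/reject only."""

    def __init__(self, prices):
        self.prices = list(prices)
        # Beta(1, 1) (uniform) prior on the acceptance probability at each price.
        self.alpha = [1.0] * len(self.prices)
        self.beta = [1.0] * len(self.prices)

    def choose(self, rng):
        # Draw a plausible acceptance probability for every price and offer the
        # price whose sampled expected revenue, price * P(accept), is highest.
        scores = [
            price * rng.betavariate(a, b)
            for price, a, b in zip(self.prices, self.alpha, self.beta)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, accepted):
        # The only feedback is a binary accept/reject; the user's maximum
        # willingness to pay ("no, but I'd pay X") is never revealed.
        if accepted:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1


def simulate(prices, true_accept, rounds=5000, seed=0):
    """Run one segment's bandit against simulated users; return offer counts per price."""
    rng = random.Random(seed)
    bandit = PostedPriceBandit(prices)
    counts = [0] * len(prices)
    for _ in range(rounds):
        arm = bandit.choose(rng)
        # Hypothetical user: accepts with a fixed probability at each price.
        accepted = rng.random() < true_accept[arm]
        bandit.update(arm, accepted)
        counts[arm] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical plan prices and acceptance rates for one demographic segment.
    # Expected revenue per offer: 18.0, 28.0, 15.0, 6.5, so $35 is the best price.
    prices = [20, 35, 50, 65]
    true_accept = [0.9, 0.8, 0.3, 0.1]
    print(simulate(prices, true_accept))
```

Running it, most offers should end up at the $35 price. Treating each price as an independent arm is the simplest choice; a richer model could exploit the fact that someone who accepts at a price would presumably also accept any lower one, so feedback at one price carries information about its neighbors.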
I think problems in online dating are ripe for machine learning researchers, and I hope this becomes a big application in the future. And think of the benefits to the field: when people ask you "what do you work on?", you can say "I help people find love" rather than "I teach computers to distinguish between handwritten digits."