- Data compression: record-setting PAQ-based compressors are "context mixing" algorithms that combine several context/prediction models.
- Netflix Prize: the winner used a blend of many different predictors.
- IBM's Watson Jeopardy player: the DeepQA architecture evaluates many candidate hypotheses in parallel and answers with the one that has the highest confidence score, or declines to answer if the maximum confidence is too low.
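The common thread in the examples above is combining several predictors' outputs. Here is a minimal sketch of the "context mixing" idea from PAQ-style compressors: several simple models each emit a probability that the next bit is 1, and a logistic mixer combines them with weights trained online by gradient descent on log loss. The predictor set, learning rate, and class names here are illustrative, not PAQ's actual design.

```python
import math

def stretch(p):
    """Map a probability to the logit (log-odds) domain."""
    return math.log(p / (1 - p))

def squash(x):
    """Inverse of stretch: map a logit back to a probability."""
    return 1 / (1 + math.exp(-x))

class Mixer:
    """Combine n bit-probability predictions with learned weights."""

    def __init__(self, n, lr=0.02):
        self.w = [0.0] * n   # one weight per input model
        self.lr = lr

    def mix(self, probs):
        """Return the mixed probability that the next bit is 1."""
        self.x = [stretch(p) for p in probs]
        self.p = squash(sum(w * xi for w, xi in zip(self.w, self.x)))
        return self.p

    def update(self, bit):
        """After seeing the actual bit, nudge weights to reduce log loss."""
        err = bit - self.p   # gradient of log loss w.r.t. the mixed logit
        for i, xi in enumerate(self.x):
            self.w[i] += self.lr * err * xi
```

Mixing in the logit domain (rather than averaging probabilities) lets a confident model dominate a fence-sitting one, and the online weight update automatically learns which models to trust; PAQ additionally selects a separate weight set per context.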
In each case the solution is not one really good model but a mixture of several independent models. Maybe the reason is simply that inductive learning is an under-determined problem: for any given data set there are many possible explanatory models. Occam's razor (formalized by Kolmogorov complexity) tells us to assume the simplest model/program that could have generated the data. That assumption might be wrong, especially when data is limited, but it is still the only sane one to make.
For any data set the space of candidate models is usually far too large to enumerate and pick the simplest, so we have to sample it somehow. If we have prior knowledge of which models are good, we should use it; otherwise we can sample randomly (as random forests do).
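Random sampling of the model space can be illustrated with a toy bagged ensemble: instead of searching for one best model, train many weak one-feature threshold classifiers ("stumps"), each on a random bootstrap resample with a randomly chosen feature and threshold, and let them vote. This is a stripped-down sketch of the random-forest idea, not a real random forest; the dataset format and function names are made up for illustration.

```python
import random
from collections import Counter

def majority(labels, default=0):
    """Most common label in a list, or a default if the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def fit_stump(data):
    """Fit a one-feature threshold classifier on (features, label) pairs.

    Feature and threshold are chosen at random -- each stump is one
    random sample from the model space.
    """
    f = random.randrange(len(data[0][0]))
    t = random.choice(data)[0][f]
    left = majority([y for x, y in data if x[f] < t])
    right = majority([y for x, y in data if x[f] >= t])
    return lambda x: right if x[f] >= t else left

def fit_forest(data, n_models=25):
    """Bagging: each stump is trained on a bootstrap resample."""
    return [fit_stump([random.choice(data) for _ in data])
            for _ in range(n_models)]

def predict(forest, x):
    """Classify by majority vote over all stumps."""
    return majority([m(x) for m in forest])
```

Any single random stump is a poor model, but the vote over many of them concentrates on the signal they agree about, which is the point of sampling the model space rather than committing to one model.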
Matt Mahoney (PAQ inventor): "Modeling is provably not solvable." There is no computable procedure that finds the optimal model for any problem, and there's no way to tell if any model is the best/simplest.
Marvin Minsky: "The basic idea I promote is that you mustn't look for a magic bullet. You mustn't look for one wonderful way to solve all problems. Instead you want to look for 20 or 30 ways to solve different kinds of problems. And to build some kind of higher administrative device that figures out what kind of problem you have and what method to use."