Gastronomic lessons from a two-armed bandit

The two-armed bandit is a nice theoretical framework for a large class of problems we encounter in everyday life. You face a strange slot machine with two (or more) arms. You can play for free by pulling a lever, and you know each lever will pay you a random amount following a fixed - but unknown - distribution. You start pulling levers, but you realize that unfortunately, you only have an hour to squeeze the most money out of this machine… how should you play?
The exact answer depends on your prior on the payoff distribution of each arm, on your risk aversion, and on your time preference. Finding the optimal solution is rarely doable. However, most solutions follow the same pattern: you start by pulling both arms alternately - to gather information about the distribution of each arm - then you start pulling mostly the arm that provides you with the best risk/reward return.
Unfortunately, your time is scarce… you have to allocate your hour between two tasks: information gathering and money making. This is called the exploration/exploitation trade-off.
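To make the pattern above concrete, here is a minimal sketch of one very simple strategy, usually called explore-then-commit: alternate between the arms for a fixed number of pulls, then stick with the arm that paid best on average. It is not the optimal solution, and it ignores risk aversion and time preference entirely. The function name, the parameters (explore_pulls, total_pulls) and the Gaussian payoffs in the demo are all assumptions made for illustration.

```python
import random


def explore_then_commit(arms, total_pulls, explore_pulls, rng):
    """Alternate between arms for `explore_pulls` pulls, then commit to one arm.

    `arms` is a list of functions; each takes a random.Random and returns a payoff.
    Returns (total winnings, number of pulls per arm).
    """
    n = len(arms)
    if not n <= explore_pulls <= total_pulls:
        raise ValueError("explore_pulls must be between len(arms) and total_pulls")
    totals = [0.0] * n
    counts = [0] * n

    def pull(i):
        payoff = arms[i](rng)
        totals[i] += payoff
        counts[i] += 1
        return payoff

    winnings = 0.0
    # Exploration: pull the arms alternately to estimate each average payoff.
    for t in range(explore_pulls):
        winnings += pull(t % n)
    # Exploitation: commit to the arm with the best observed average.
    best = max(range(n), key=lambda i: totals[i] / counts[i])
    for _ in range(total_pulls - explore_pulls):
        winnings += pull(best)
    return winnings, counts


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical arms: Gaussian payoffs with means 1.0 and 1.5.
    arms = [lambda r: r.gauss(1.0, 1.0), lambda r: r.gauss(1.5, 1.0)]
    for explore_pulls in (2, 20, 200):
        winnings, counts = explore_then_commit(arms, 1000, explore_pulls, rng)
        print(explore_pulls, round(winnings, 1), counts)
```

Running the demo with different values of explore_pulls shows the tension: explore too little and you may commit to the wrong arm, explore too much and you waste pulls on the worse one.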
This trade-off is very common.
- Should patients be given new experimental treatments or more mature treatments?
- What should you study in college?
- Should you stay with your girlfriend?
- Should you quit your job?
All of these involve the decision to forgo a known payoff to discover a potentially greater unknown payoff.
I was recently surprised to realize I was playing this game very poorly when it comes to picking a restaurant or picking a dish on a menu. There are easily more than 10,000 restaurants in Manhattan, yet I found myself going to the same places over and over to order the same dishes over and over. I am not the only one. I observed many people displaying this behavior. The way people pick restaurants is generally reminiscent of a meta-heuristic called stochastic diffusion search. Everyone picks a few restaurants at random and from there discovers new restaurants when invited by friends who made a different initial pick. This method doesn’t work so well, since people tend to have lunch or dinner with the same people every time.
Generally speaking, the dish-picking or restaurant-picking behavior of most people seems to imply a ridiculously high risk aversion (I want a guaranteed lunch experience) or time preference (the benefits of discovering a better restaurant mostly accrue in the future, when I will make more informed choices).
Why is this? I think we have a strong conservative bias when it comes to food. Most food is marketed as “old-fashioned”, using “traditional recipes”. Menus from Chinese restaurants feature pagodas, not the Shanghai skyline, and menus from pizzerias often come with Renaissance illustrations. You don’t see Intel marketing its CPUs as made in the time-tested tradition of silicon wafer artisans.
One possible reason is that food conservatism used to be required for survival. Maybe the red berries are slightly tastier than the black berries; maybe they’ll kill me… I think I’ll stick with the black berries. This form of conservatism is still alive today, for example when people favor organic food. That may or may not be a rational thing to do; however, when it extends to picking a specific dish on a menu, I’m pretty sure it’s an undesirable bias.
Picking a dish is always a difficult experience for me. I am tempted by many dishes, but I always fear I will make a wrong decision and forgo the opportunity to have the most delicious dish. Of course, I could always go back to the restaurant, and I often do, but every time feels like it is the last opportunity for me to have the most likely best dish on the menu.
In the multi-armed bandit setting, it means I am always favoring exploitation over exploration. I recently decided to strongly favor exploration. I decided to pick restaurants solely based on customer ratings, not on previous experience. If I did go back to a restaurant I knew, I committed to always trying a dish I hadn’t tried before. While I had a few disappointments, I did discover that many of my previous restaurant choices and dish choices were sub-optimal. I have experienced a lot of new restaurants, and very often I have had the feeling “this place is great, I should come back here!”, only to realize this is the kind of thinking that led me to avoid the place in the first place. So whenever I like a place, I make a commitment not to come back there for some time.