Galvanize Week 2: Probability & Statistics... then Multi-armed Bandit!

Most of this week's material overlapped with topics we had either been tested on in the admissions process or had completed in the pre-course assignments: basic set theory, conditional probability, Bayes' theorem, distributions, A/B testing. It's a good week for those who haven't been exposed to probability & statistics, or who want to solidify their understanding of these concepts.

This week's assignments gave me more opportunities to practice plotting with matplotlib. Examples are much more impactful for my learning compared to just plain reading documentation, so in my quest for good material, I found this tutorial particularly helpful (and interesting): Matplotlib tutorial: Plotting tweets mentioning Trump, Clinton & Sanders. I also had some fun exploring and sharing matplotlib's color collection (cornsilk? papayawhip?) with other students :)
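As a tiny example of the kind of thing I was playing with, matplotlib accepts those named colors directly. This is just a toy plot of my own, not from the assignment:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), color='papayawhip', linewidth=4, label='papayawhip')
ax.plot(x, np.cos(x), color='cornsilk', linewidth=4, label='cornsilk')
ax.set_facecolor('slategray')  # darker background so the pale colors stand out
ax.legend()
plt.show()
```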

We tied together a lot of the probability & statistics material when discussing Frequentist vs. Bayesian approaches. Frequentists believe that a parameter has one fixed, true value; a confidence interval built from a sample is constructed so that, across many repeated samples, it will contain that true value some percentage of the time (typically 95%). Bayesians, on the other hand, believe that a parameter is a random variable with a distribution of possible values, and that this distribution can be updated as more information is received. For Bayesians, a parameter has some probability (also typically 95%) of lying within a given credible interval. This coin toss problem is a helpful example of how the approaches differ.

Roughly speaking,

  • Frequentists: the parameter is fixed; the confidence interval describes the probability that its (random) bounds contain the fixed parameter.

  • Bayesians: the parameter is variable; the credible interval describes the probability that the parameter falls within a fixed bound.
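
To make the contrast concrete, here's a small sketch (toy numbers of my own, not from class) that computes both kinds of intervals for the bias of a coin: a normal-approximation confidence interval and a Beta-Binomial credible interval.

```python
import numpy as np
from scipy import stats

# Toy data: 140 heads out of 200 flips (made-up numbers for illustration)
n, heads = 200, 140
p_hat = heads / n

# Frequentist: 95% confidence interval (normal approximation)
se = np.sqrt(p_hat * (1 - p_hat) / n)
ci_low, ci_high = p_hat - 1.96 * se, p_hat + 1.96 * se

# Bayesian: start from a uniform Beta(1, 1) prior, update with the data,
# then take the central 95% of the posterior as the credible interval
posterior = stats.beta(1 + heads, 1 + (n - heads))
cred_low, cred_high = posterior.ppf(0.025), posterior.ppf(0.975)

print(f"95% confidence interval: ({ci_low:.3f}, {ci_high:.3f})")
print(f"95% credible interval:   ({cred_low:.3f}, {cred_high:.3f})")
```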

A couple of things I like about Bayesian statistics:

  • Bayesian updating is your friend. You can start with little (or even wrong) knowledge about a system, aka your prior, and evolve your beliefs as you receive new information.

  • The standard application of Bayes' Rule highlights the effect of base rate neglect. Given a 99% accurate test for an infection, you might think that a person who tests positive is surely infected. But with the prior knowledge that the base rate of infection is only 0.5% of the population, it turns out that a person who tests positive actually has a higher probability of not being infected (the arithmetic is sketched just after this list).
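
Here's that calculation as a quick sketch, assuming "99% accurate" means the test is both 99% sensitive and 99% specific:

```python
# Base-rate example: 99% sensitive / 99% specific test, 0.5% prevalence
sensitivity = 0.99   # P(test positive | infected)
specificity = 0.99   # P(test negative | not infected)
prior = 0.005        # P(infected), the base rate

# Bayes' Rule: P(infected | positive) = P(positive | infected) * P(infected) / P(positive)
p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
posterior = sensitivity * prior / p_positive

print(f"P(infected | positive) = {posterior:.2f}")  # ~0.33, so ~67% chance of NOT being infected
```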

Our instructor wrapped up the topic with one of his favorite XKCD comics:

[XKCD: "Frequentists vs. Bayesians"]


We ended the week with a new topic: hello, multi-armed bandit. If a gambler faces multiple slot machines (multiple "one-armed bandits") and believes their win probabilities differ, how should he pick which machines to play?

Our introductory reading on the topic, 20 lines of code that will beat A/B testing every time, explains the simplest multi-armed bandit method, the epsilon-greedy method. In this method, our gambler "explores" by playing a random bandit a small fraction of the time (epsilon, typically 10%) and "exploits" the bandit he currently believes has the highest win probability the rest of the time. After each play, he updates his estimate of that bandit's win probability.
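
Here's a minimal sketch of the epsilon-greedy idea for Bernoulli (win/lose) bandits; the win probabilities and epsilon value are made up for illustration:

```python
import random

def epsilon_greedy(true_win_probs, n_plays=10_000, epsilon=0.1):
    """Play n_plays rounds of epsilon-greedy over Bernoulli bandits."""
    n_arms = len(true_win_probs)
    plays = [0] * n_arms   # times each arm was played
    wins = [0] * n_arms    # wins observed on each arm

    for _ in range(n_plays):
        if random.random() < epsilon:
            # Explore: pick a random arm
            arm = random.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best observed win rate so far
            rates = [wins[i] / plays[i] if plays[i] else 0.0 for i in range(n_arms)]
            arm = rates.index(max(rates))

        # Pull the arm and update our estimate for it
        plays[arm] += 1
        wins[arm] += random.random() < true_win_probs[arm]

    return [wins[i] / plays[i] if plays[i] else 0.0 for i in range(n_arms)]

# Made-up win probabilities for three machines
print(epsilon_greedy([0.03, 0.05, 0.08]))
```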

But! This means that even in the extremely lucky situation where our gambler is playing between two machines, one that always wins and one that always loses, AND he identifies the machine that always wins, he's still bound to lose some money: roughly half of his "exploring" plays (about epsilon/2 of all plays) go to the losing bandit.

This brings us to regret. Regret is the reward lost whenever a suboptimal bandit (one that doesn't have the highest win probability) is chosen instead of the best one. While the epsilon-greedy method inherently incurs some regret, there are other multi-armed bandit strategies that aim to minimize it. We were shown some of these algorithms (UCB1, Softmax, and Bayesian Bandit) to implement and compare in our paired programming exercise, but there didn't seem to be a clear outperformer across all scenarios. So, I'm reading more about these algorithms via these posts:

(Yes. You read that right. The same guy preaching the multi-armed bandit gospel comes back three years later with a warning: proceed with caution, and don't apply these algorithms blindly.)
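
Of the strategies we tried, the Bayesian Bandit (Thompson sampling) tied most neatly back to the Bayesian updating theme, so here's a rough sketch of the idea, assuming Bernoulli payouts and Beta(1, 1) priors. This is my own toy version, not our exercise code:

```python
import numpy as np

def bayesian_bandit(true_win_probs, n_plays=10_000, seed=0):
    """Thompson sampling over Bernoulli bandits with Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    n_arms = len(true_win_probs)
    alpha = np.ones(n_arms)  # prior "wins" + observed wins
    beta = np.ones(n_arms)   # prior "losses" + observed losses

    for _ in range(n_plays):
        # Sample a win probability for each arm from its current posterior,
        # then play the arm whose sample is highest
        samples = rng.beta(alpha, beta)
        arm = int(np.argmax(samples))

        # Observe a win or loss and update that arm's posterior
        win = rng.random() < true_win_probs[arm]
        alpha[arm] += win
        beta[arm] += 1 - win

    # Posterior mean win-rate estimates after all plays
    return alpha / (alpha + beta)

# Made-up win probabilities for three machines
print(bayesian_bandit([0.03, 0.05, 0.08]))
```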
