The Bayesian Observer

Category: Machine learning

Migrating Blog

This blog is being migrated to I will no longer be posting new articles here.

Walter Schloss: Ignoring Noise

I don’t seem to have very much influence on Walter. That is one of his strengths; no one has much influence on him.

– Warren Buffet, about Walter Schloss, value investor from the Graham-Dodd school.

Lack of price discovery in the Healthcare Market

Price discovery is

… the process of determining the price of an asset in the marketplace through the interactions of buyers and sellers.

Buyers and sellers of a good or service set the price by expressing how much they are willing to buy or sell for. When a very large number of buyers and sellers interact in this way, the final price of the good or service settles to one that incorporates all this information. However, in the healthcare market, this mechanism is absent.

When you are sick and you visit a doctor, you do not get to ask up front what you will be charged for seeing the doctor! Instead, you present your health insurance card to the receptionist, thereby disclosing information about your ability to pay, and creating information assymetry. The other party (i.e. the healthcare provider) can now select a price , taking into account the information you have just disclosed, and that too, after you have made use of their services, rather than before. This is akin to getting to know the price of a haircut in a barber shop, after you have had the haircut. There is no price discovery mechanism in the healthcare industry in US. I recommend:

  1. Providers should not be allowed to see the insurance coverage a patient has. Each patient should simply be treated. Claims and payments should be handled by a third party, such as a government agency, that acts as a mediator between the hospital and the patient.
  2. Providers should be mandated to disclose a menu of charges, like a restaurant or a barber.
  3. Providers should charge the same fee, irrespective of whether a patient has insurance or not, or how much insurance coverage she has.

Ostensibly, providers check a patient’s insurance card in order to make sure that the patient can pay for the services she is about to purchase. Instead, this function should be done by a ‘lookup’ over the internet on a server run by a government agency: Input = ‘name of the medical procedure’, Output = ‘Covered or not’. This will prevent the provider from learning about the extent of coverages available to the patient. Not learning about this will force providers to more honestly assess fees, and compete with each other more realistically, bringing down costs borne by the patients. I think this will also prevent providers from passing on large unnecessary fees to insurance companies.

A formal mechanism design formulation of the problem would be interesting to tackle. 4 players: patients, providers, insurance companies, government.

Non-linearities in Life

Conventional wisdom says that the marginal gains accrued per unit increase in effort are diminishing. In some cases this does not hold, and the gains seem to be super-linear, rather than sublinear in the amount of effort.

Consider for a moment that you are driving in your car, let’s call it Car 1, down a straight long road that has a series of traffic lights. A second car, lets call it Car 2, is driving near you at a speed that is only very slightly greater than yours. The other car is inching ahead of your car very slowly. At time t, the distance by which Car 2 is ahead of you is d(t).

As you near the next traffic signal, the light turns amber and you begin to slow down. But the driver of the other car doesn’t feel the need slow down, but instead makes it through the light without stopping, while you wait at the traffic signal. The gap between the two cars has suddenly increased. If the two cars were to maintain the same small different in speeds throughout the road, slowing and stopping only at traffic lights, then at time t, the distance d(t) between them will be a non-linear function, quite different from the linear function that it would have been, had there been no traffic lights. What do you think d(t) would look like?

It’s not hard to reason out the shape of d(t). At any given time, either both cars are moving,, in which case, they have a relative velocity of v2-v1, or car 1 is stopped at a signal (relative velocity of v2), or car 2 is stopped at a signal (relative velocity of -v1). So the plot of d(t) versus t would be piece wise linear with one of 3 possible slopes in any segment. In the figure below, each segment is labelled with which car(s) is moving during that segment.

Over a long enough time t, d(t) will be significantly above the dotted line, i.e. the value of d(t) that one would expect if there were no traffic lights. Therefore, in the long term, Car 2, which started out with a small velocity advantage ends up far far ahead of Car 1, thanks to non-linearities introduced by the traffic lights.

I have often mused about the similarity between the scenario above and other aspects of life, such as career growth. Just like the traffic lights, there are a number of thresholds in real life, e.g cut-off marks for the A+ grade in exams, a threshold for being accepted at a prestigious university or a job interview, or the difference between getting and not getting an award. Two individuals that differ by only a small amount, will, over the course of a long time interval, find themselves on different sides of these thresholds enough number of times. Thus, small differences in skill level and/or determination between very similar workers get amplified, leading to drastic differences in their achievements over a lifetime. Natural processes around us, including interactions between humans, are highly non-linear, and my suspicion is that humans do not accurately perceve the extent of the non-linearities. The fact that long term benefit can be superlinear in effort is a significant realization, because it means that putting in just a little bit of extra effort — e.g. an extra hour at work daily, an extra skill, extra effort in networking or maintaining relationships — can have a disproportionately positive effect in the long term. I’ll end with a passage from Richard Hamming‘s 1986 lecture at Bell Labs titled ‘You and Your Research’ that touches upon this:

.. Now for the matter of drive. You observe that most great scientists have tremendous drive. I worked for ten years with John Tukey at Bell Labs. He had tremendous drive. One day about three or four years after I joined, I discovered that John Tukey was slightly younger than I was. John was a genius and I clearly was not. Well I went storming into Bode’s office and said, “How can anybody my age know as much as John Tukey does?” He leaned back in his chair, put his hands behind his head, grinned slightly, and said, “You would be surprised Hamming, how much you would know if you worked as hard as he did that many years.” I simply slunk out of the office!

What Bode was saying was this: “Knowledge and productivity are like compound interest.” Given two people of approximately the same ability and one person who works ten percent more than the other, the latter will more than twice outproduce the former. The more you know, the more you learn; the more you learn, the more you can do; the more you can do, the more the opportunity – it is very much like compound interest. I don’t want to give you a rate, but it is a very high rate. Given two people with exactly the same ability, the one person who manages day in and day out to get in one more hour of thinking will be tremendously more productive over a lifetime. I took Bode’s remark to heart; I spent a good deal more of my time for some years trying to work a bit harder and I found, in fact, I could get more work done.

1 hour extra per day turns out to be equal to be an extra ~1.5 months of 8 hour work days per year!

Habit Formation, Part 2

This post builds upon an earlier post on modeling habit formation. To follow this post, please read this post first. Go on, I’ll wait.

The urn model in that post is not really a complete description of habit formation because the limit distribution of the system (i.e., fraction of white balls after a very large number of draws) is completely determined by the initial configuration of the urn (i.e., number of white and black balls). In reality, habits are formed from repeated actions, and our actions are based on not just our inclinations but also the conscious choices that we make, which can sometimes be against our inclinations. Free will is missing from this model. There is no way to break a bad habit if one start out with one.

Therefore, let us alter the basic urn model slightly to capture choice: At each step, before picking a ball, we explicitly declare which color we are interested in for that pick: a white ball ( = follow the desirable habit) or a black ball ( = follow the undesirable habit). Suppose we choose to aim for a white ball. A ball is then sampled from the urn at random, and if a white ball does indeed come up, then:

  1. A payoff of $W is received (We wanted white and we got white, so we are happy to the extent of $W).
  2. The white ball is placed back into the urn, along with an extra white ball (Reinforcement).

But if we decide to pick white and a black ball comes up, then there is no payoff, and the black ball is placed back in the urn without any extra ball. However, if we chose black, a black ball is guaranteed to show up, we get a payoff of #B, where # is a different currency from $, and the black ball is placed back along with an extra black. The extra black ball makes picking a white ball harder when a white is desired.

Consider the implications of this model:

  1. Suppose there are very few white balls compared to black. A white ball would rarely show up. With the decision to pick a white, most of the time there will be no payoff and no reinforcement. But with the decision to pick a black ball, there is guaranteed payoff of #B units, but at the cost of picking a white later on harder.
  2. As the bad habit is reinforced, the chance of reverting to the good habit never dies away completely, but diminishes as the number of black balls grows in comparison to the whites. (‘It’s never too late, but it does get harder’).
  3. The purpose of the different currencies is to model the fact that the pleasure obtained from following a bad habit is qualitatively different from the one obtained from following a good one. Since most habits are based on short-term pleasure at the expense of long term benefit (e.g. cigarette smoking, binge eating /drinking, putting off exercise),  currency # may correspond to lots of short term kicks, while $ may correspond to long term well being.

Short-term thrills

I believe human behavior is fundamentally driven by a desire to be happy. However, different individuals can have very different ideas about what will make them happy. Since most bad habits result from a focus on short term happiness, at the expense of long term happiness, we would do well to make the following tweak: # and $ can be added to give a total payoff, but each unit of # currency is worth only \epsilon < 1 unit, 1 iteration after it was earned. The other currency, $, however, does not decay with time. The question is what behavior emerges when the objective is simply maximization of total payoff? It is intuitive that the answer will depend upon whether the game has a finite horizon or an infinite horizon. Since players have finite lifetimes, it is appropriate to use a finite time horizon, but since a person does not know when he/she will die, the horizon must be a life expectancy with some uncertainty (variance). In other words, nobody knows when they will die, but everybody knows they will die. The only time it makes logical sense to pick a black ball is if the player believes his remaining lifetime is so small that the expected cumulative gain from deciding to pick whites is smaller than the expected cumulative gain (including the decay effect) from picking blacks. Of course this depends upon how he spent his past — i.e. whether or not he built up a high fraction of whites.

Breaking a bad habit

Suppose a person starts out with a predominant inclination towards an undesirable habit and wishes to switch to the good habit. It is clear that if she always chooses the white ball, then she will eventually succeed in creating a larger number of white balls and a good $ balance. But she will have to go many tries and no payoff before that happens. In practice, this requires determination, and she may be tempted to pick the black ball because it is so easily available.

Suppose D \in [0,1] is the fraction of times that (s)he will decide to aim for a white ball, i.e. the player’s determination. It is intuitive that a larger value  D would help her switch to the good habit in fewer iterations. It would be interesting to see if there is a threshold value of D below which the bad habit is never broken. In particular, I expect there to be a threshold value D(w,N-w), below which the bad hait is never broken, w.p. 1, where w is the number of whites in the urn and N-w is the number of blacks.

Game over when wealth < 1 unit

Now let us further suppose, that in the course of playing the game, a player can never have less than 1 currency units. In other words, the game stops if the total currency units with a player reaches zero. All other rules are the same: 1 unit earned from a white ball, 1 unit from a black ball but units earned from black balls decay to \epsilon of their value every time slot. Deciding to pick a black guarantees a black, and an extra black goes in. Deciding to pick a white, provides a white with a probability proportional to the fraction of whites in the urn, and an extra white goes in. With the new rule stipulating ‘game over’ when the player’s wealth falls below 1, the player is incentivized to pick a black if she ever approaches 1 unit of wealth. This is meant to model the inclination of individuals with low self worth and happiness levels to engage in activity that would yield short term happiness (alcohol, drugs, binge eating, etc. ) at the expense of long term prospects.

Document as a Database

The problem with conventional text documents is that you cannot easily query them. Suppose I have a 20 page piece of text explaining various things, perhaps with mathematical equations, and plots, etc. In order to figure out something, I need to patiently read through the document till I have my answer. When time is short, and usually time is short, I will skip some parts in a hurry to get to what I need. The skipped parts might contain intermediate facts or definitions, making it impossible for me to understand what I am looking for even when I do find it.

This situation is because information is being presented in the document in an order decided by the author. Perhaps this is the best order. But for a specific query (e.g. what is the main result of this paper? ), it would be nice to have a way to do better than having to patiently read the whole document. The best we have right now is Ctrl+F on words and phrases, which just doesn’t cut it for queries of reasonable complexity (e.g. ‘Why did the GDP of Germany double so quickly?’ while reading an article on the history of the German economy).

The computer doesn’t understand the document :-(

It would be pretty useful to somehow model documents as databases that can support complex queries. Especially when the documents are large, like books. Perhaps I am asking for the holy grail of NLP? Surely, if IBM can build a Jeopardy-winning Watson, it should be possible to build a question-answering service for individual documents? Outside information is permitted.

The Effect of Interruptions

Tasks that require a high cognitive load, such as thinking about a problem, reading a research paper or writing C++ code,  are very sensitive to interruptions. I find that when I am interrupted during such activity, say by a phone call, or by a co-worker asking me something, the net cost of the interruption is not just the amount of time I spend attending to the call or person. It’s as if I have a short term RAM in my head, and I have to re-load the entire context of what I was doing when I get back to it. And this can take a lot of time. What I like are large chunks of uninterrupted time, not several small chunks, even if they add up to the big chunk.

However, shutting oneself off from all interactions with others at one’s workplace is not the solution, because one runs the risk of not interacting enough with colleagues. Richard Hamming said about Bell Labs:

I notice that if you have the door to your office closed, you get more work done today and tomorrow, and you are more productive than most. But 10 years later somehow you don’t know quite know what problems are worth working on; all the hard work you do is sort of tangential in importance.

I have personally found that conversations over coffee and lunch have led to much more interesting directions in my work than solitary thinking. While Hamming was speaking about a research lab environment, I suspect his statement holds true in other creative domains as well.

The apparently contradictory requirements above can be resolved by planning out both solitary time and interaction time during the work week. Both are necessary. I feel creative work requires a combination of two distinct phases. One that allows focused concentration on a task at hand, and another for free discussions, exchange of ideas, and making oneself available to others as a sounding board. Exploit and Explore. An ideal workspace should provide for both. It is said of the original Bell Labs building, that its designer deliberately created long corridors, so that researchers would be forced to encounter one another. Steven Johnson explains in his book ‘Where Good Ideas Come From’ that coffee houses played a crucial role, because they were breeding grounds for collective thinking.

During my travels on the Internets, I have come across a number of writings on the impact of interruptions on productivity that I can attest to from my own experience:

Paul Graham writes about meetings:

I find one meeting can sometimes affect a whole day. A meeting commonly blows at least half a day, by breaking up a morning or afternoon. But in addition there’s sometimes a cascading effect. If I know the afternoon is going to be broken up, I’m slightly less likely to start something ambitious in the morning. I know this may sound oversensitive, but if you’re a maker, think of your own case. Don’t your spirits rise at the thought of having an entire day free to work, with no appointments at all? Well, that means your spirits are correspondingly depressed when you don’t. And ambitious projects are by definition close to the limits of your capacity. A small decrease in morale is enough to kill them off.

Joel Spolsky writes about context switching between different projects.

Some interruptions are self created. Compulsive checking of email for instance. Or deciding to multitask, or trying to handle too many disconnected projects. I find I can do one major thing per day, if I try to do 2 or more major things, I risk accomplishing none. Having smartphones set to beep when an email arrives doesn’t help. I switched my phone to not auto check email, long ago.

Habit Formation

An urn contain balls of two colors, white and black. A ball is drawn at random, and then put back in the urn along with a new ball of the same color. This process is repeated  many times. What is the ratio of white balls to black balls in the urn after n tries? What is the ratio as n \rightarrow \infty? The answer depends upon the initial number of white and black balls of course.

This is the Polya Urn problem, and I find it to be an interesting way to think about the process of habit building. Imagine that the two types of balls represent two opposing choices that we can make about a certain aspect or activity in our lives, e.g. waking up early or not waking up early, doing regular exercise or not, smoking a cigarette after dinner, or not, completing homework on time every evening vs. playing video games during that time, etc.

# Polya urn simulation in Python
import random, pylab as plt

nruns = niter = 200
whitenum = blacknum = 1
for i in range(0,nruns):
  frac = []
  nwhite = whitenum
  nblack = blacknum
  for j in range(0,niter):
    if random.random() < nwhite/float(nwhite+nblack):
       nwhite = nwhite + 1
       nblack = nblack + 1

Let us say that picking a white ball corresponds to the desirable activity in these examples, and picking a black ball corresponds to the undesirable one. The fraction of white balls in the urn at any given time represents our proclivity to pick the desirable activity at that time.

This model has some interesting properties. First, if the number of balls of each type to begin with are equal, then the fraction of white balls at any step, and in particular at n \rightarrow \infty is a random variable that behaves as follows:

Above: Fraction of whites, starting with 1 black and 1 white ball

Above: Histogram of the fraction of white balls after 20k iterations

Note that extreme outcomes are possible as n \rightarrow \infty if appropriate choices are enforced early on. Also the final value is attained after only a few iterations. Early iterations provide the opportunity for large swings in the final outcome, but there are hardly any swings after crossing about 50 iterations. Similarly, the that longer a habit has had to cement itself, that harder it is to change it.

But the model above converges to an almost uniform distribution on the fraction of white balls after a large number of tries. Perhaps starting with 1 white and 1 black isn’t quite right, because we are do not necessarily begin life with equal tendencies for picking each side of a binary choice. If we set the number of white balls to be greater than the number of black balls at the start, then the distribution of the fraction of white balls after many tries is, as expected, skewed in favor of white balls. Again, the fraction of white balls usually settles down to a fairly stable value that is typically  determined largely by the actions in the first few iterations.

Above: Fraction of whites, starting with 1 black and 15 white balls

Distribution of fraction of whites after 20k iterations starting with 5 whites, 1 black

If the initial number of balls is allowed to be fractional, then the limit distribution has most of its mass near 0 and 1. The plot below shows 500 iterations starting with 0.2 white and 0.2 black balls. Perhaps this better models habit formation in some people.

Fraction of white balls, starting with 0.2 whites and 0.2 blacks.
Distribution of fraction of whites, after 20k iterations, starting with 0.2 whites & blacks.

Habit building is not the only thing this model can describe. There are a number of phenomena in which reinforcement plays a role:

  1. Popularity of a brand: Among competing brands of similar quality, one that is slightly more popular may get picked more by customers, making it even more popular (“So many people choose this brand, so it must be good”). I’m sure marketing majors know this well.
  2. Rich get richer: It is easier to make $X when you have $100, compared to if you have only $10 to begin with.
  3. Short term market fluctuation in the price of a security on the stock exchange can sometimes have this property. A large number of buy orders will signal to the market a positive outlook on the security, resulting in even more buy orders. Ditto for sell orders. Until the market stabilizes. Similar to (1) above.
  4. Markets with network effects have a rather obvious reinforcement property: for e.g., the greater the number of users that facebook has, the greater the number of new users it is likely to attract, compared to, say, a competing network like Google+, because the marginal utility for a new user is greater when she join the larger network.
  5. Tragedy of the commons [PDF]: Defined by Wikipedia as: ‘The depletion of a shared resource by individuals, acting independently and rationally according to each one’s self-interest, despite their understanding that depleting the common resource is contrary to their long-term best interests’. Example: A grassy hill is shared by several farmer to graze their cows. If one farmer overgrazes, others have an incentive to quickly overgraze too, and eventually the utility of the hill is destroyed for all farmers.
  6. Hiring in a new company/group: If the organization has a good number of great/well-known people, it is easier to attract other good employees. Similar to (1).
  7. Thinking: Sometimes people make a choice about thinking about something in a certain way (e.g. so-and-so is a mean person), and this colors their interpretation of future observations, in effect reinforcing their prior notion.


1. In the form above, the urn model displays positive feedback. How can we change the urn model to display negative instead of positive feedback? Simple: put in an extra black ball if a white ball is picked and vice versa. This forces the fraction of white and black balls to remain close to 1/2.

2. Learning by rewards: There has been work in the psychology community that studies reinforcement learning in humans and animals, that is, given a set of choices and associated rewards, how does an animal learn which choice to pick? The Law of Effect is the hypothesis that the probability of picking a choice is proportional to the total reward accumulated from picking that choice in the past. Given a reward scheme, this translates to an Urn model with the colors representing rewards from the various possible actions. For a nice survey of reinforcement models, see: Robin Permantle, A survey of random processes with reinforcement, Probability Surveys, Vol. 4 (2007) .

3. The model above is the simplest urn model. There are several variations of this model, that have proved useful in modeling and learning tasks, other than reinforcement. E.g. using more than 2 colors, or using multiple urns, or allowing for the introduction of new colors. There are a number of interesting distributions for which these Pola Urn variations act as generating processes (Dirichlet-multinomial, Dirichlet Process, Chinese Restaurant Process), but that’s another post.

A paper is not scholarship

A word is not the same as the thing it represents, only a representation of it, meant to convey information about it. For e.g. the word ‘Castle’ is not the same as a Castle. This is obvious. Similarly, a map of a place is not the same as the place itself, it is only a map of it. This is also obvious.

An academic paper published at a conference or journal is not scholarship! It is a presentation of the scholarship. The scholarship itself is the research work. Somehow, this is not so obvious to a lot of people.

Comparing Cybersecurity and Financial Trading

  1. Both fields benefit from rigorous quantitative modeling for tasks such as detecting patterns and anomalies.
  2. Both financial trading and security operate in an adversarial environment. In finance, the adversary is the ‘market’, i.e. the rest of the participants. In security, the adversary is the set of attackers.
  3. Because of the adversarial environments, simple models do not work for very long, because the adversary evolves in response to actions taken by the other party, and it is hard for a model to capture the direction in which the adversary will evolve. Therefore both fields require that models be updated. Models are built using training data . The challenge therefore is to use the right mix of recent data and longer term historical data. How to do this is not obvious in both fields..
  4. Game theory is a useful tool for modeling problems in both fields because of the adversarial nature of the field.
  5. Modern financial instruments, apart from serving as investment vehicles, are designed for mitigating risk. The business of selling information security products or services is also one of mitigating risk — risk of information loss. There is no such thing as perfect security. Instead, organizations allocate an amount of money they consider appropriate to meet their risk appetite.
  6. One difference is that trading in financial markets often has the ‘self fulfilling prophesy’ property, i.e. it can act as a positive feedback system (if a certain fraction of people with opinion that the price of a security is likely to fall, sell their holdings, the rest start thinking its probably a good idea to sell, and the end result is that the price does fall), whereas there doesn’t seem to be an equivalent in cybersecurity.