The Bayesian Observer

Defining Intelligence

Intelligence is a slippery thing to define. The following definition recently struck me:

That which allows one to maximize happiness over the long term.

I like this definition because it is short (cf. MDL, Occam’s Razor), it makes logical sense, and it carries a lot of meaning without going into details of how to be intelligent. It is logical to me because of the following argument: Suppose a person is allowed to live two versions of his life starting from some fixed point. All events and circumstances in the two versions are the same, except for the actions taken by the person. Then he can be said to be more intelligent in that version of his life in which he achieves greater happiness over the rest of his life.

Intelligence is needed in order to understand what actions will make us happy, for how long, and whether those actions will have any effects on our future happiness. Making decisions to maximize cumulative happiness is certainly a non-trivial task. Sometimes one must put oneself through short-term adversity (e.g. graduate school at little or no stipend, or an athlete undergoing gruelling training for a race) to be able to do well later. Sometimes, one decides to undertake an action that provides short-term happiness, but at the cost of long-term happiness. It takes intelligence to learn to avoid such behaviour in the future.

Modern definitions of intelligence from the scientific and psychology communities are incredibly long-winded [Wikipedia]:

A very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts. Rather, it reflects a broader and deeper capability for comprehending our surroundings—”catching on,” “making sense” of things, or “figuring out” what to do.

and:

Individuals differ from one another in their ability to understand complex ideas, to adapt effectively to the environment, to learn from experience, to engage in various forms of reasoning, to overcome obstacles by taking thought. Although these individual differences can be substantial, they are never entirely consistent: a given person’s intellectual performance will vary on different occasions, in different domains, as judged by different criteria. Concepts of “intelligence” are attempts to clarify and organize this complex set of phenomena. Although considerable clarity has been achieved in some areas, no such conceptualization has yet answered all the important questions, and none commands universal assent. Indeed, when two dozen prominent theorists were recently asked to define intelligence, they gave two dozen, somewhat different, definitions.

The same Wikipedia page also lists various definitions given by individual researchers. The long-windedness of these definitions is somewhat excusable as an attempt to be all-inclusive and general. But in the end, the notion of intelligence is a man-made model, invented to try to explain phenomena. I think a focus on happiness as the central phenomenon to be explained goes a long way in simplifying our understanding of intelligence.

Average age of first-time Mothers

The average age of first-time mothers in the developed countries of the world has been rising for the last ~40 years.

Here is another plot that shows the rate of occurrence of Down Syndrome, a chromosomal defect, as a function of the age of the mother at the time of childbirth.

The curve really starts to shoot up at 30. In the UK, the average age of a first-time mother is 30 years. It is well known that the fertility rate in women decreases after the age of 30 and drops rapidly after 35. Older mothers are likely to find it harder to have a baby, and if they do, they run a higher risk of chromosomal defects. Given all these possible negative consequences, the increase in the average age is a bit disturbing. It seems like there is a hidden cost to more women working, and for longer.

Why is it that women are waiting longer to have their firstborn despite the risks? Most of my hypotheses (of which more than one, or none, may be true) have to do with women working:

  • Having invested significantly in an education, a greater number of women are entering the workforce with the desire to be financially independent.
  • There is greater financial pressure in families for women to work.
  • The policies of workplaces in these countries are not favourable to childbirth. I can see this being true in the US, but I doubt it holds for Western European countries, which I know have policies favourable to childbirth.

One source of further information is the following map, showing the absolute increase, in years, in the age of a first-time mother over the last 40 years, state by state in the US:

This number is highest in the northeastern states of NY, NJ, MA, CT, etc. The intensity of the colors in the map above correlates well with population density, and with economic activity in general (meaning more working women). Here are two more plots I came across in a US-based study done by P&G, which suggest that, at least in the US, employer policies may be responsible.

What would mothers value most from their employers?

How much guilt do mothers feel about work-life balance?

Perceived Corruption and per capita GDP correlate inversely

Perceived level of corruption in the public sector:


GDP per capita:

(World map of nominal GDP per capita, IMF, 2009)

Also, less surprisingly, GDP density, and the night lights as seen by a low Earth orbit NASA satellite are very well correlated:

(Figures: GDP density map; NASA night-lights composite, 2012)

What is the biggest problem in the world?

I have been posing this question to friends and acquaintances (and to myself) in one form or another for a while now. The answers I have received have varied significantly. I am not the first to pose this question, of course. Here is one of several online polls posing the same question, with 700+ responses so far. Here are some others. Some of the responses I have received personally, and gathered from various online postings like the ones above, include (in no particular order):

  • Environmental change
  • Poverty, hunger, clean drinking water
  • The P vs NP problem
  • Ego
  • War
  • Communication between people
  • Lack of tolerance
  • Jobs, the economy
  • Ignorance
  • Fear
  • Greed
  • Lack of genuine love, hatred
  • Religion
  • Racism
  • Moral decline
  • Energy shortage
  • Sin
  • Drugs
  • Terrorism
  • Apathy, lack of empathy
  • Anger
  • Pollution
  • Love for money
  • Politics
  • Forgetfulness of God
  • Overpopulation, limited world resources
  • Toxic waste
  • Consumerism
  • Death
  • Selfishness
  • HIV
  • All the -isms: nationalism, sexism, racism, ...
  • Cancer
  • Envy

One of the problems with the way the question is posed above is that it does not specify what ‘problem’ and ‘biggest’ mean.

Define problem. We will define problem as ‘That which brings suffering to humans’.

Define biggest.  Biggest could mean ‘one that affects the largest number of people’, ‘the scientific problem that would create the biggest impact if solved’, or ‘one with the greatest economic impact’, etc. I am interested in a specific version of this question, in which ‘biggest’ means ‘most fundamental’, i.e. one which can be said to be a root cause of many other problems.

Causal structure. A natural question to pose, in order to move toward an answer to my version of ‘biggest problem’, is: how many degrees of freedom are really present in the above responses (and what are they)? That is, are they all independent problems, or do they stem from a relatively small set (1-2) of root causes (with the others being effects)? For example, lack of tolerance and energy shortage can be said to be causes of war. It is also clear that not all the problems listed above are at the same level of generality — some seem intuitively more abstract or fundamental than others. For example, war seems more in the realm of effects or symptoms, compared to, say, anger, fear or greed. In other words, even though they are all problems, some of the items in the list above are really effects rather than causes, and I am interested in the causes. To restate the question properly:

What is the true causal structure of the world problems?

Here is a small toy example of what I mean by causal structure, with four edges: Overpopulation -> Energy shortage, Consumerism -> Energy shortage, Energy shortage -> War, and Lack of tolerance -> War.

An arrow from A to B indicates ‘A causes B’. In the above example, energy shortage is stated to be a cause of war, and lack of tolerance is also stated as a cause of war. Also, once energy shortage is taken into account as a cause of war, war is not additionally caused by overpopulation or consumerism directly. In other words, overpopulation and consumerism do lead to war, but only through energy shortage.

One correct answer. What strikes me most about the restated question above is that there must exist a definite answer to it. That is, there is an objective reality associated with the question. The causal structure is not a matter of subjective opinion. There is one true structure of cause and effect. I am not claiming that the number of independent root causes at the very top of the causal structure is 1 (though perhaps it is). All I am saying is that there is one definite causal structure. The ‘one correct answer’ aspect is interesting because while it is arduous to build a causal structure, checking whether a proposed structure makes sense should be much easier.

I am looking for this causal structure. I think that gaining an understanding of the causal structure can be more insightful than an understanding of each of the problems in isolation [1]. If you think you have a causal structure of even part of the list of problems above, please write to me or leave me a comment. If you contact me with a proposed causal structure, please use the following format:

Cause1 -> Effect1
Cause2 -> Effect2

and so on, with one cause-effect pair per line. For the above toy example, this would be:

Overpopulation -> Energy shortage
Consumerism -> Energy shortage
Energy shortage -> War
Lack of tolerance -> War

Think of this as a jigsaw puzzle in which the problems are the blocks (feel free to pick whatever set of problems you want, from the above list or otherwise; of course, the more complete the set, the better), and one has access to as many arrows as needed (the fewer the arrows, the better).
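To make the ‘checking is easier than building’ point concrete, here is a minimal sketch in Python (the edge list is just the toy example above; substitute your own). It treats a proposed structure as a directed graph, lists the candidate root causes, and checks that the proposal is acyclic, i.e. that no problem ends up indirectly listed as its own cause:

edges = [
    ("Overpopulation", "Energy shortage"),
    ("Consumerism", "Energy shortage"),
    ("Energy shortage", "War"),
    ("Lack of tolerance", "War"),
]

def is_acyclic(edges):
    """Check a proposed cause -> effect graph for cycles using Kahn's
    algorithm: repeatedly peel off nodes that have no remaining causes."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    for _, effect in edges:
        indegree[effect] += 1
    frontier = [n for n in nodes if indegree[n] == 0]  # current root causes
    removed = 0
    while frontier:
        node = frontier.pop()
        removed += 1
        for cause, effect in edges:
            if cause == node:
                indegree[effect] -= 1
                if indegree[effect] == 0:
                    frontier.append(effect)
    return removed == len(nodes)  # every node peeled off <=> no cycle

roots = {n for edge in edges for n in edge} - {effect for _, effect in edges}
print(sorted(roots))      # candidate root causes: nodes nothing points into
print(is_acyclic(edges))  # True for the toy example

For the toy example this prints the three root causes (consumerism, lack of tolerance, overpopulation) and confirms the structure is a valid DAG; a submission that fails the check contains a causal loop and can be rejected immediately.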

_____

Notes

[1] I think this may be true in general. In middle school, I recall homework and exam questions in various subjects asking us to fill in the blanks or to match entries in column A with entries in column B. I feel that explaining the causal structure among a set of things would make for a very instructive exercise in school, because it would force a student to think.

In the eye of the Beholder

Why is the picture on the right more appealing than the one on the left?

What is it that we find more interesting about the picture on the right, compared to the one on the left? The picture on the left contains more information, so we are certainly not looking for more information. One might say we don’t know how to interpret the image on the left as anything familiar, but it is familiar: it is television static. A more precise answer is given by Jürgen Schmidhuber, who argues convincingly that:

Artists (and observers of art) get rewarded for making (and observing) novel patterns: data that is neither arbitrary (like incompressible random white noise) nor regular in an already known way, but regular in a way that is new with respect to the observer’s current knowledge, yet learnable (that is, after learning, fewer bits are needed to encode the data).

This explains the pictures above. The picture on the left is not compressible because it is a matrix of uniformly random 0/1 pixels. The Monet on the right evokes familiar feelings, and yet adds something new. I think what Schmidhuber is saying is that the amount of compressibility should be neither too little nor too much. If something is not very compressible, then it is too unfamiliar. If something is too compressible, then it is basically boring. In other words, the pleasure derived first increases and then decreases with compressibility, not unlike this binary entropy curve.
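For reference, the binary entropy curve is the function

H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p), \qquad p \in [0, 1],

which is zero at p = 0 and p = 1 and peaks at p = 1/2. The analogy (mine, not a claim from Schmidhuber’s paper) is that appeal vanishes at the two extremes (fully predictable, fully random) and is maximized somewhere in between.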

Let us ask the same question again for the following pair of images (you have to pick one over the other):

My guess is that most people will find the image on the right more appealing (it is for me at least). Please drop me a comment with a reason if you differ. When I look at the image on the right, it feels a little more familiar, there are some experiences in my mind that I can relate to the image – for example looking straight up at the sky through a canopy of trees (white = sky, black=tree leaves), or a splatter of semisolid food in the kitchen.

In order for an object to be appealing, the beholder must have some side information about, or familiarity with, the object beforehand. I learnt this lesson the hard way. About 2 years ago, I gave a talk at a premier research institution in the New York area. Even though I had received compliments when I’d given this talk at other venues, to my surprise, this time the audience almost slept through my talk. I learnt later that I had made the following mistake: in the abstract I’d sent to the talk’s organizer, I had failed to signal that my work would likely appeal to an audience of information theorists and signal processing researchers. My audience had ended up being a bunch of systems researchers. The reason they dozed through my talk was that they had just a bit less than the required background to connect the dots I was showing them.

It is the same with cultural side information or context — the familiar portion of the object allows the observer to latch on. The extra portion is the fun. Without the familiar, there is nothing to latch on to. The following phrases suddenly take on a precise quantifiable meaning:

  • “Beauty lies in the eyes of the beholder”: the beholder carries a codebook that allows her to compress the object she is observing. Each beholder has a different codebook, and this explains ‘subjective taste’. (A small sketch of this idea follows the list below.)
  • “Ahead of its time”: Something is ahead of its time if it is very good but does not have enough of a familiar portion to it, to be appreciated by the majority of observers.
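The codebook metaphor can be made literal with a shared compression dictionary. Here is a minimal sketch using Python’s zlib, with made-up ‘familiar’ and ‘new’ texts: an observer whose preset dictionary (her codebook) overlaps with the object needs fewer bytes to encode it.

import zlib

# The beholder's codebook: background material she already knows.
# Both texts are made up purely for illustration.
codebook = b"water lilies, soft light, broken brush strokes, a pond at dusk, " * 20
artwork = b"soft light over water lilies at dusk, but with a strange new palette"

def compressed_size(data, zdict=None):
    """Bytes needed to encode data with DEFLATE, optionally using a preset
    dictionary, which plays the role of the observer's codebook."""
    if zdict is None:
        comp = zlib.compressobj(9)
    else:
        comp = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS,
                                zlib.DEF_MEM_LEVEL, zlib.Z_DEFAULT_STRATEGY,
                                zdict)
    return len(comp.compress(data) + comp.flush())

print(compressed_size(artwork))                  # observer with no codebook
print(compressed_size(artwork, zdict=codebook))  # observer who knows the style

With the shared dictionary, the familiar fragments cost almost nothing and only the genuinely new part must be paid for in full, so the second number comes out smaller. A codebook from an unrelated domain would help little: ‘subjective taste’ in miniature.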

I can think of lots of examples of art forms that deliberately incorporate partial familiarity into them — e.g. music remixes, Bollywood story lines. Even classical music first establishes a base pattern and then builds on top of it. In this TED talk, Kirby Ferguson argues that all successful creative activity is a type of remix, meaning that it builds upon something familiar.

Takeaways:

  1. When writing a paper or giving a talk, always make sure the audience has something familiar to latch on to first. Otherwise, even a breakthrough result will appear uninteresting.
  2. Ditto for telling a story or a joke, DJing music at a party, or building a consumer-facing startup: you need something familiar for people to latch on to.
  3. In some situations it may be possible to gauge the codebook of the audience (e.g. when having a dinner party conversation with a person you just met), to make sure you seem neither too familiar nor too obscure.

Climate change and The Burglar’s Stopping Problem

The climate change hypothesis is that global changes in climate, leading to a significantly higher number of severe weather events, are predominantly man-made, and in particular that the release of greenhouse gases such as carbon dioxide into the atmosphere is a leading cause. After conveniently escaping the national spotlight in the US during the presidential campaigns, climate change has once again appeared in the news, thanks to Hurricane Sandy. Munich Re, the reinsurance giant, released a report, somewhat presciently, on Oct 17, that says:

Nowhere in the world is the rising number of natural catastrophes more evident than in North America. The study shows a nearly quintupled number of weather-related loss events in North America for the past three decades, compared with an increase factor of 4 in Asia, 2.5 in Africa, 2 in Europe and 1.5 in South America.

Unambiguously proving that man-made climate change has a role to play in a specific event such as Sandy is more of an ideological debate than a statistical exercise. And so there are many, many people who can boldly claim that man-made climate change is fiction.

A few industrialized nations are responsible for the bulk of CO2 emissions.

Some of these nations have refused to ratify the Kyoto Protocol, which calls for reduction in CO2 emissions. No points for guessing which colors in the map below denote countries that have not ratified the treaty.

(Brown = No intention to ratify. Red = Countries which have withdrawn from the Protocol. Source: Wikipedia)

Most of the world apparently believes in man-made climate change. When will these other countries wake up? I can’t help but think of the following stopping problem:

Taken from G. Haggstrom (1966) “Optimal stopping and experimental design”, Ann. Math. Statist. 37, 7-29.

A burglar contemplates a series of burglaries. He may accumulate his larcenous earnings as long as he is not caught, but if he is caught during a burglary, he loses everything including his initial fortune, if any, and he is forced to retire. He wants to retire before he is caught. Assume that returns for each burglary are i.i.d. and independent of the event that he is caught, which is, on each trial, equally probable. He wants to retire with a maximum expected fortune. When should the burglar stop?
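Out of curiosity, here is how the problem behaves (a sketch of mine with made-up parameters, not Haggstrom’s solution). If the burglar survives each burglary with probability p and each haul has mean μ, a one-step lookahead comparison values continuing from fortune x at roughly p(x + μ) against keeping x, which favors continuing only while x < pμ/(1 - p). A quick Monte Carlo simulation in Python of such retire-at-threshold rules:

import random

p = 0.9    # probability of surviving any single burglary (made up)
mu = 1.0   # mean haul per burglary; hauls assumed exponential (made up)

# One-step lookahead: continue from fortune x only while x < p*mu/(1 - p).
threshold = p * mu / (1 - p)

def career(stop_at):
    """Simulate one burglary career under a 'retire at stop_at' rule."""
    fortune = 0.0
    while fortune < stop_at:
        if random.random() > p:                # caught: lose everything
            return 0.0
        fortune += random.expovariate(1 / mu)  # another successful haul
    return fortune                             # retired safely

trials = 100_000
for stop_at in (threshold / 2, threshold, 2 * threshold):
    avg = sum(career(stop_at) for _ in range(trials)) / trials
    print(f"retire at {stop_at:4.1f}: expected fortune ~ {avg:.2f}")

Retiring near the lookahead threshold does best in this simulation; stopping much earlier leaves money on the table, while pressing on much longer usually means losing everything. The analogy to the countries above writes itself.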

Traffic Lights: Regulation vs. free-markets

In the aftermath of Hurricane Sandy, many parts of the NY/NJ area have sustained power outages, and as a result, traffic lights in these areas are not functional. This requires drivers to approach a traffic junction as a multi-way stop sign. This got me thinking: what if, in place of traffic lights, we had just stop signs everywhere, and the rule was that the next car to go is the car at the head of the longest queue? I believe this is an optimal scheduling policy in a certain sense (it provides an optimal throughput x delay product — that is, for a given average delay at the intersection, it would provide the highest rate of cars going through [1]). In this policy, each driver is trusted to follow the scheduling policy faithfully. For argument’s sake, I am ignoring (1) the time spent by each driver having to figure out which queue is the longest at each step, (2) how the driver at the head of each queue gets information about the length of each queue, and (3) the loss in efficiency incurred by slowing down and starting.
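As a toy sanity check of the intuition (my sketch, not the network-scheduling result of footnote [1]), here is a small Python simulation of a two-street intersection with made-up arrival rates, comparing serve-the-longest-queue against a fixed-cycle traffic light. One car crosses per time step; since both policies eventually serve all arrivals, the time-averaged queue length is proportional to the average delay (Little’s law), so smaller is better:

import random

random.seed(0)
STEPS = 100_000
ARRIVAL = (0.3, 0.1)   # per-step arrival probability on each street (made up)

def simulate(policy):
    """Return the time-averaged total queue length under a policy that
    picks which street's head-of-queue car crosses at each step."""
    queues = [0, 0]
    total = 0
    for t in range(STEPS):
        for i in (0, 1):
            if random.random() < ARRIVAL[i]:
                queues[i] += 1
        i = policy(queues, t)
        if queues[i] > 0:
            queues[i] -= 1
        total += sum(queues)
    return total / STEPS

def longest_queue(q, t):
    return 0 if q[0] >= q[1] else 1   # serve the longer queue

def fixed_light(q, t):
    return (t // 30) % 2              # alternating 30-step green phases

print("longest-queue-first:", simulate(longest_queue))
print("fixed traffic light:", simulate(fixed_light))

The fixed light spends much of its green time on a near-empty street while cars pile up on the busier one, so its average queue (and hence delay) comes out several times larger in this run.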

Compared to this self-enforced scheduling policy, traffic lights can be very suboptimal. You know this if you have ever waited at a red light while the street with the green signal has no traffic. Why, then, do we have traffic lights? The problem is that under the self-enforced scheduling policy, there will be some drivers who free-load, i.e. they will not obey the rule and will simply take the turn for themselves, even if the turn belongs to someone else according to the scheduling rule. Further, when this happens, it will often result in collisions between the free-loader and the rightful owner of the turn. This is why traffic lights are necessary, even though they come at the expense of reduced overall efficiency.

There is a nice lesson embedded here that speaks, by way of analogy, to the need for government regulation: regulation is necessary to enforce fairness and safety by preventing free-loading and accidents, even though a free market might provide higher overall benefit if everyone were guaranteed to behave properly. Regulation is thus the price we must pay, in the form of reduced overall benefit, for the fact that market participants do not all play by the rules when left to themselves.

EDIT 1: The loss in overall utility when all participants are allowed to act selfishly, compared to the state where each participant acts for the overall good of the set of all participants, is called the price of anarchy. This is different from (but related to) the loss in overall utility from the imposition of regulations. A simple 2-player prisoner’s dilemma can exhibit the price of anarchy: all participants are worse off when allowed to act selfishly, compared to the overall optimum for the two players. In the traffic light example, when players act selfishly, they create unfairness and also end up endangering everyone (including themselves, though perhaps they don’t realize this bit). Hence the utility derived by each participant is lower, compared to if they all cooperated perfectly.
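To make this concrete, here is a tiny Python sketch with standard textbook prisoner’s dilemma payoffs (the numbers are made up, as usual): selfish best responses lead both players to defect, and the price of anarchy is the gap between the cooperative optimum and that equilibrium, computed below both as a ratio (the common convention) and as a difference (as in EDIT 2 below).

# payoff[a][b] = (player 1's utility, player 2's utility) when player 1
# plays a and player 2 plays b; action 0 = cooperate, 1 = defect.
payoff = [[(3, 3), (0, 5)],   # player 1 cooperates
          [(5, 0), (1, 1)]]   # player 1 defects

def best_response(opponent, player):
    """Selfish best action against a fixed action of the opponent."""
    utils = [payoff[a][opponent][0] if player == 0 else payoff[opponent][a][1]
             for a in (0, 1)]
    return utils.index(max(utils))

# Nash equilibria: profiles where each action is a best response to the other.
nash = [(a, b) for a in (0, 1) for b in (0, 1)
        if best_response(b, 0) == a and best_response(a, 1) == b]

def welfare(a, b):
    return sum(payoff[a][b])   # total utility of a strategy profile

optimum = max(welfare(a, b) for a in (0, 1) for b in (0, 1))
worst_equilibrium = min(welfare(a, b) for a, b in nash)

print("equilibria:", nash)                                  # [(1, 1)]: both defect
print("price of anarchy (ratio):", optimum / worst_equilibrium)       # 6 / 2 = 3.0
print("price of anarchy (difference):", optimum - worst_equilibrium)  # 6 - 2 = 4

Mutual cooperation yields total utility 6, but the only equilibrium of selfish play is mutual defection with total utility 2, so anarchy costs a factor of 3 in this toy game.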

EDIT 2: Regulation can be thought of simply as a mechanism designed to improve the utility received by players beyond what it would be in anarchy, by changing the (rules of the) game a little. Regulation typically doesn’t take the system all the way to the overall optimum of the original game (which corresponds to perfectly cooperating players). The ‘price of regulation’ (= utility of the overall optimum minus that achieved by regulation) should be less than the price of anarchy (= the overall optimum minus the state achieved by anarchy). Modern day regulators need to be really good at mechanism design!

EDIT 3: Perfect cooperation can be unstable against defection by free-loaders [2], because the utility a player derives by unilaterally defecting is greater than that obtained by cooperating. If everyone is well aware of the risk of an accident upon defecting, then this can serve as a disincentive, because the utility from defecting, after factoring in the probability of an accident, may no longer make defecting worthwhile. This suggests that simply increasing a misbehaving player’s awareness of the risks his misbehavior poses to himself might improve the overall equilibrium a bit. Of course, this requires that the defector bear extra personal risk.

___

[1] I know this because it holds true for scheduling packet transmissions in a class of communication networks [citation required].

[2] I experienced free-loaders first-hand during the last few days after Sandy, in two different contexts: people going out of turn at road intersections, and people trying to break into the long line at a gas station.