Alternative foundations for probability theory

Jaynes' derivation of probability theory is an attempt to show the necessity of probability theory from first principles. He wants to show that probabilities are the way to reason. His approach is to start by assuming that we wish to assign real number "credences" to arbitrary events, and then define an unknown function f such that P(A,B) = f(P(A), P(B|A)). He then asks what the form of f could be. He makes some simple assumptions about the sign of the relationships between these functions, and then draws heavily on the assumption that no matter which order the computations are performed, certain chains of computations must all yield the same result. From this he shows that f must be the multiplication operation, which gives the familiar product rule of probability. He is then able to derive essentially all of probability theory.

So this is what one possible foundation for probability theory looks like. The project of building a foundation for probability theory is different to that of founding other branches of mathematics because the goal is to show that probability theory is not just one possible system for reasoning but is the uniquely correct way to reason. Merely showing that probability theory is internally consistent or even elegant is insufficient; one hopes to convince the reader that all intelligent reasoners ought to use probabilities. This gives the foundations of probability theory a very different character to that of other branches of mathematics, where the main goal is to come up with a model in which various theorems can be neatly and robustly expressed.

Here is an idea for a different way to found probability theory.

Ultimately, an intelligent reasoner must take actions in the world. Probability theory, or indeed any epistemology, must eventually output decisions, since whatever your epistemology says about evidence and its relationship to beliefs, if it is incapable of outputting any decisions then it must be a very poor epistemology.

Probability theory outputs decisions through expected value computations. Under elementary causal decision theory, one assigns probabilities to various possible future world states conditioned on taking certain actions, then the action with highest expected utility is chosen. Under more sophisticated decision theories, the expectations are not conditioned on simply taking certain actions, but rather on all instances of a certain decision algorithm outputting a certain value. Without getting too much into decision theory here, what is common to all these decision theories is that they follow this pattern:

  1. Assign probabilities to possible future world states
  2. Assign utilities to possible future world states
  3. Compute some set of expected values (the details of which depend on your decision theory)
  4. Find the maximum among those expected values
  5. Take an action determined by this optimal value (the details of which depend on your decision theory)
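As a concrete sketch, here is the pattern above rendered in Python under simple causal decision theory. The scenario and all numbers are invented for illustration.

```python
# The five-step pattern above, sketched under simple causal decision
# theory. The scenario and all numbers here are invented.

# Step 1: probabilities over future world states, conditioned on actions.
# probs[action][state] = P(state | action)
probs = {
    "take_umbrella": {"dry": 0.95, "wet": 0.05},
    "leave_umbrella": {"dry": 0.60, "wet": 0.40},
}

# Step 2: utilities over the same world states.
utility = {"dry": 10.0, "wet": -5.0}

# Step 3: one expected value per candidate action.
def expected_utility(action):
    return sum(p * utility[state] for state, p in probs[action].items())

# Steps 4 and 5: take the action with maximum expected utility.
best_action = max(probs, key=expected_utility)
```

More sophisticated decision theories change what the expectations are conditioned on, but the shape of the computation stays the same.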

Putting aside our intuitive understanding of the meaning of probabilities, what exactly does probability theory provide in the above? Principally it provides a way to assign real numbers to possible future world states such that if you take the inner product between probabilities and utilities over those world states then your action is determined by the argmax. Importantly, the probabilities are independent of the utilities: you could compute all your probabilities and write them down on a sheet of paper, forget all the reasoning that led you to accept those probabilities, then completely change your utility function, yet you could still use the probabilities you wrote down to determine rational actions under your new utility function. If you really wrote down probabilities for all possible future world states then no matter how many times you changed your utility function, you would never need to go back and reconsider the raw evidence from which you derived your probabilities. I call this the property of being actionable: one's probabilities screen off the evidence from which they were derived, in the sense that one's actions are functionally independent of the evidence given the probabilities.
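The actionability property can be seen directly in code: probabilities written down once continue to determine rational actions even after the utility function is replaced wholesale. This is a toy scenario with invented numbers.

```python
# Actionability: stored probabilities keep working when the utility
# function changes. The scenario and numbers are invented.

# Probabilities "written down on a sheet of paper" -- the evidence and
# reasoning behind them are gone; only the numbers remain.
probs = {
    "take_umbrella": {"dry": 0.95, "wet": 0.05},
    "leave_umbrella": {"dry": 0.60, "wet": 0.40},
}

def best_action(utility):
    # Argmax of the inner product between probabilities and utilities.
    return max(
        probs,
        key=lambda a: sum(p * utility[s] for s, p in probs[a].items()),
    )

# One utility function...
action_1 = best_action({"dry": 10.0, "wet": -5.0})

# ...and a completely different one (now we want to get rained on),
# answered from the very same stored probabilities.
action_2 = best_action({"dry": -1.0, "wet": 3.0})
```

The raw evidence never needs to be revisited; only `probs` is consulted.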

So this is my idea for an alternative foundation for probability theory: start with the assumptions above and ask what kind of system for assigning numbers to possible future world states has the property that one can derive optimal decisions from it under any utility function.

One hurdle in this project would be to express precisely what it means for a decision to be optimal without reference to probability theory, since assuming the concept of expected values would be assuming the consequent. It seems to me that this has a good chance of proving surmountable, since the idea of a "good" decision given a pre-defined utility function seems fundamentally concrete, and it should only be necessary to describe it in the most abstract terms.

In addition to actionability, probability theory has the desirable property of being updatable. Suppose you assigned probabilities to all future world states given the evidence available to you. Now you make one further observation. Probability theory allows you to compute new probabilities for future world states as a function of your old probabilities and a small set of likelihoods. This allows you to avoid storing all past observations and instead store just the latest set of probabilities, knowing that no matter which observations you receive in the future, you will always be able to update your probabilities correctly without needing to return to older observations. This is a second form of screening off: this time, probabilities conditioned on observations (x1, ..., xn, xn+1) are screened off from observations (x1, ..., xn) given the probabilities conditioned on (x1, ..., xn) and the likelihoods for xn+1.
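This screening-off can be sketched in a few lines: the new posterior is a function of the old posterior and the likelihoods for the latest observation alone. The hypotheses and numbers below are invented.

```python
# Updatability: new beliefs depend only on old beliefs plus likelihoods
# for the newest observation -- no stored observations needed.
# Hypotheses and numbers here are invented.

def update(posteriors, likelihoods):
    # Multiply old posteriors by likelihoods for the new observation,
    # then renormalize (division by the "evidence" term z).
    unnormalized = {h: posteriors[h] * likelihoods[h] for h in posteriors}
    z = sum(unnormalized.values())
    return {h: u / z for h, u in unnormalized.items()}

# Beliefs after observations (x1, ..., xn): all we need to keep around.
beliefs = {"fair_coin": 0.5, "biased_coin": 0.5}

# One new observation (a head), summarized by its likelihoods
# P(xn+1 | hypothesis):
beliefs = update(beliefs, {"fair_coin": 0.5, "biased_coin": 0.9})
```

Nothing about the earlier observations (x1, ..., xn) appears anywhere in the update; they are fully summarized by the stored beliefs.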

Once again, the properties that make probability theory nice here can be translated into desiderata for selecting among quantitative epistemologies: namely, in addition to actionability, we can demand updatability of any candidate system. Perhaps actionability and updatability are sufficient to completely specify probability theory: if so, we would have an even deeper understanding of the centrality of probability theory in epistemology. If not, perhaps we can find an example of a non-probabilistic system that has both of these properties. This latter outcome could be even more interesting than the positive one, since we could then ask what reason we have to prefer probability theory over the alternative. And if this project were to turn up a whole different quantitative epistemology, that would be a fascinating result on its own!

Chicken farmers become the chickens

I just watched a show about the plight of chicken farmers. It turns out that the companies that deliver meat to supermarkets are not simply middle men that buy chickens from farmers and sell the meat onwards, but actually own the chickens and pay farmers to raise them under contract. This, the show claimed, is terrible for the chicken farmers because the big chicken companies keep imposing new requirements on the farmers’ facilities, which makes it impossible for the chicken farmers to turn a profit. Furthermore, the chicken companies evaluate the farmers’ performance against one another and ruthlessly punish poor performers by refusing to renew their contracts.

My first reaction, of course, is to ask why the chicken farmers got into this business in the first place if it’s so bad for them. Surely the big chicken companies have little to gain by destroying all chicken farmers everywhere, so perhaps we’re just hearing from the farmers who are actually poor performers. And even though it’s painful for poor-performing chicken farmers to go out of business, it’s better overall that the resources invested in those farms are re-invested elsewhere than that they keep sub-optimally producing chickens. Most importantly, the proposed fix is not to simply keep a few chicken farmers from going out of business, or even to reshape the big chicken companies into kinder, friendlier entities, but to introduce regulation, which injects a whole host of complicated and often negative incentives into the situation, must be administered by people who are paid using tax dollars, and may eventually be captured by the very industry it was supposed to regulate. So rather than asking the government to change the rules of the contracts they freely entered into, perhaps the good chicken farmers of Mississippi should concentrate on chicken farming.

That’s one view, but it isn’t quite the whole picture. Perhaps the chicken farmers really are being exploited by the chicken companies. Perhaps the chicken companies came to their houses and had them sign long and complicated contracts that they could neither understand themselves nor afford competent legal advice on. Perhaps the chicken companies explained the contracts in plain English but made subtle and deniable omissions and embellishments, misleading the chicken farmers into thinking they were getting a reasonable deal when they weren’t. I have no idea whether this is accurate, but what should we do if it were? What if it becomes clear that the chicken companies can keep doing this and the farmers will never “wise up” because the economic disparity between them and the companies is simply too great? Exactly how far should we let the chicken companies grind the farmers down before stepping in?

We could certainly find ways to prevent the companies from fooling the farmers into bad contracts: we could subsidize legal advice for the farmers, pass laws preventing the companies from misrepresenting their contracts, even ban the most exploitative kinds of contracts. But this is really just an anti-fraud effort. It’s important for the government to prevent flagrant misrepresentation of contracts for the same reason that it’s important to prevent insurance scams, which, for that matter, is the same reason that it’s important to prevent bank robberies: because the smooth operation of the country relies on a robust rule of law.

The more interesting question is this: what if the big chicken companies are not fraudulent mega-corporations making money hand over fist by exploiting small-time chicken farmers, but are in fact themselves also squeezed to the brink by their own customers? Perhaps the chicken companies are not defrauding the chicken farmers at all, but are simply imposing brutally onerous terms that they calculate to be ever so slightly better than the chicken farmers’ next best option. Yes, the bottom 25% of chicken farmers each quarter are discarded in favor of better-performing farmers. Yes, this pits neighbor against neighbor in a brutal deathmatch to raise the fattest chickens. But the competition among the chicken companies for market share at the supermarket level is so intense that profit margins are razor thin and any company that spares a thought for the farmers will be demolished by some more ruthless company.

And what if this extends also to the supermarkets, who are themselves locked in a heated battle for market share among consumers?

Because the farmers’ situation does sound pretty terrible. I sometimes wonder how I’m going to be able to afford to raise a family, and I’m in my 20s making a software engineer’s salary in silicon valley, so I can scarcely imagine what it’s like to raise a family on a chicken farm in Mississippi, never mind also being locked in a brutal deathmatch against your neighbors, with your family’s livelihood riding on your performance at every quarterly chicken weigh-in.

It sounds like a rather terrible society-level outcome, and it makes you question whether the mechanism of organization that led to it isn’t flawed. Even if every contract along the way was entered into freely, and every decision was retrospectively endorsed by its maker, a world of merciless competition strikes me as fundamentally sub-optimal, and it seems that any mechanism of economic coordination that gives rise to such a world is also sub-optimal.

Would an infinitely wise, perfectly selfless dictator organize his domain under a mostly-free-market system with a few caveats, or something radically different? Experiments with radically non-free-market systems over the past hundred years have been disastrous, but the space of possibilities is large, and the failed approaches with which we are all familiar represent only a tiny fraction of this space. So perhaps it’s worth exploring further.

Reasons you might not be replaceable

Rob Mather is the founder and president of the Against Malaria Foundation, which is a nonprofit that raises money to distribute bed nets in areas that are at high risk of malaria. AMF seems to be doing excellent work on an important problem, and so it would seem that Rob himself is doing a very worthwhile thing with his life. But suppose someone put the following argument to him: "Rob, you're sure doing a great job with AMF, but before starting AMF you were a high-paid strategy consultant, so surely you could go back to doing that and earn enough to pay someone else to run AMF, and still have enough left over to fund many, many bed nets, and still have enough left over to live comfortably. If you really care about doing good, shouldn't you do that instead?"

Even if the premises of this argument were true (I don't actually know how much Rob Mather could counterfactually donate if he went back to consulting), I think it would probably still be bad for the world if Rob followed this advice. However, although I am going to criticize the argument above in this essay, I am very much focussing on the case where you're already a founder or key person within an organization whose mission you care about. I'm going to argue that in this case, you may be more difficult to replace than it first appears, so if you care about the organization's continuing success, you should take my argument into account before leaving to do something else. But I'm not criticizing the general earning-to-give argument, which is normally applied from the vantage point of someone who is choosing between working directly on a problem they care about versus working in a high-paid field and donating to that cause, but has not yet embarked on either path.

Here's a thought: In 2011, Larry Page returned to the CEO position within Google. But what are the chances that this person who started a successful web startup in the late 90s also just happens to be the most talented executive that Google could possibly recruit? Larry is clearly a brilliant engineer and entrepreneur, but the chance of the fastest sprinter in the world also just happening to be the fastest hurdler in the world is incredibly slim, and so it seems a priori unlikely that Larry is the single most accomplished executive that Google could possibly have recruited.

Yet I think Larry probably is the person best suited for this role -- not because he has some singularly unsurpassed skillset, but because he occupies a position of spiritual authority within the hearts and minds of the folks at Google. Compared to anybody else, Larry will have spent less time earning the trust and respect of Google's 50,000+ employees, and more time furthering the overall mission of Google. When making strategic decisions, he will have faced less resistance from skeptical board members or potential rivals, since who would even think of attempting to replace Larry as CEO? Even after making mistakes, he will have spent less time consolidating his position and instead more quickly moved on to the next problem. And overall this is a very good thing for Google.

Much of this comes down to "spiritual authority". If you are a founder or key early person within an organization, others probably look to you as a leader. And importantly, they predict that others do too, so are less likely to engage you in zero-sum games of rivalry. It's very difficult for someone to come into an organization and enjoy that same situation. Typically, when somebody new enters a position of leadership they must spend a great deal of time gaining the trust and respect of those within the organization, and, even if they succeed in this, they may never earn the "unthinkable to remove" status of a true founder. The existence of somebody -- anybody -- with spiritual authority is enormously valuable to an organization, since it allows the organization to coordinate in ways that would have otherwise been impossible.

So my advice to Rob Mather is: don't leave AMF! And my advice to you, if you're thinking of leaving an organization that you helped to build, and you think that it can carry on without you, is to factor in the cost of your organization losing spiritual leadership. It's not a matter of finding someone with the right skills to replace you, or even of finding someone that everybody else in the organization trusts and respects, but of finding someone who can become a definitely-the-right-person-to-be-running-things Schelling point in the hearts and minds of the entire organization. It's frightfully hard to replace such people.

Boards, factions, and democracy

When a board of directors takes a vote — a real vote, where a decision is disputed and its resolution hangs on the vote — you know something is wrong. A board of directors is a small group charged with both final accountability and final authority over a company. The board can fire and hire top-level executives, must approve major financial decisions such as a merger or acquisition by another company, and is held accountable for the lawfulness of the company’s finances. The directors themselves are company executives, major investors, and advisors recruited for their connections, insight, and wisdom.

But unless something is wrong, boards tend to agree unanimously on most issues, or are able to come to a consensus. How does this happen? Firstly, most decisions — even non-executive board-level decisions — are made or strongly influenced by the CEO. Good boards see themselves as an advisory and oversight committee to help the CEO do their job, and they rightly give a very great deal of authority to the CEO. After all, the CEO is paid full-time to run the company, so the directors correctly reason that almost always their best move is to make sure the company has a great CEO, and then let that person do their job. Typically board-level decisions are really made by the CEO, who communicates their advice to the board, who approve the decision. This is what a healthy board of directors looks like.

The more general explanation for why boards tend to come to consensus is that, under normal conditions, the board members are well aligned: they all wish to maximize the long-term financial success of the company, since each of them stands to personally gain or lose as the company succeeds or fails. So the only disagreement can be about what decisions will maximize that shared goal, and it’s much easier to come to agreement when everybody is motivated by the same goals (and everybody knows that, and everybody knows that, and so on), especially when each person is incentivized by significant financial stakes.

A board making decisions by explicit majority rule, then, is a sign of trouble, because while it could mean that the board simply has factual disagreements about which decision is best for the company, it’s also possible that the lack of consensus is a sign that the board has become misaligned: that, somehow, the board is no longer motivated by a uniform goal. More troubling still would be a board that has split into factions that are directly fighting each other across many issues, since that situation is less likely to arise due to factual disagreements, and more likely to be a result of misalignment.

An excellent example is the board of Hewlett-Packard between 2001 and 2006. During this time the board hired and fired four CEOs, the president of the board employed private investigators to spy on other board members, and board members were regularly removed from their posts and replaced with others more loyal to whichever faction had temporarily gained the upper hand. Needless to say, this was an awful situation from the perspective of HP the company, and indeed this period is widely seen as among the worst in HP’s history.

In the domain of politics, however, factions are the norm. The existence of a ruling majority party standing off against a minority opposition party is so familiar within democracies that it seems not just normal, but necessary. We are not talking here about what would ordinarily be considered "political dysfunction"; we are talking about the best that democracy offers: robust debates light up our parliaments and senates and congresses, and politicians from opposing parties square off over important issues. This, we normally think, is a sign of a well-functioning government.

But the very same pattern of factional trench warfare is a bad sign among a board of directors. When your board divides into two groups who then oppose each other on every issue, run for the fences! It doesn’t matter what forum the debate takes place in, or whether debate is encouraged or repressed, or whether it is carried out through dignified discourse or a bitter smear campaign, the very fact that it is possible to distinguish two groups that are consistently fighting is a dire sign of impending doom!

Why then this striking difference between democracies and companies? I think it’s ultimately because the two party system is a very strong attractor state for representational democracies, so every democracy naturally falls into the trap, and it’s been this way for so long that it seems not just normal, but necessary. 

Let’s look at why political factions form in the first place. No democracy was founded with the intention of dividing the country into two near-equal halves and having them fight for the rest of eternity. If you or I were creating a new democracy, we would not design things with this goal, and neither did the founders of modern democracies have this particular state of affairs in mind when our democracies were created. If you or I were creating a democracy, we might plan that each electorate would elect a person to represent their wishes, and that these representatives would then assemble and make well-reasoned decisions in the best interests of the country. Perhaps this is a close approximation to what does happen in democracies, perhaps not, but either way it’s not obviously necessary for the representatives to divide into exactly two groups. Yet in actual democracies this division into parties happens very consistently. Why?

Consider a government of five representatives, each hoping to maximize their influence over decision-making. Clearly, three of them could agree to always vote together, capturing one third of total influence each. But now the two representatives left out in the cold are also incentivized to band together and try to recruit one more to their ranks so that they may pull the same trick. In a larger government, imagine one party with 45% of all seats, and two smaller parties, each with half of the remaining seats. Clearly it’s in the interests of the two smaller parties to band together and capture a majority. This kind of thing happens all the time — it’s a perfectly normal part of democracy. So this is how a congress of individuals with no a priori intention to form a party system nevertheless ends up forming a party system.

If this agglomeration dynamic is so powerful that it has taken hold of literally every major democracy, why don’t we see it more often within corporate boards? We certainly see it in some corporate boards -- usually ones that are unhealthy -- but it’s not the norm. Why not? As always, it’s about incentives. Returning to the government-of-five example, suppose instead that five individuals were managing a pomegranate tree that they jointly owned. They agree that decisions will be made by majority vote. Suppose they hire a gardener, who advises them to invest in a certain amount of fertilizer. How do they respond? Most likely they check that the gardener is showing diligence and responsibility, and then take his advice. Perhaps they decide that the gardener is wildly incompetent and fire him. But in particular there is no incentive for any three of the owners to band together to grab a larger slice of the pie, since there is nothing to grab. Nobody — majority or otherwise — can disenfranchise anybody else of their share in the tree since each person’s property rights are protected. Perhaps three owners could vote to liquidate the tree and distribute the pomegranates to their families, but this is not even in their own best interests since they stand to gain more by ensuring the long-term viability of the tree and extracting pomegranates from it over a long period. Which is what the rest of the owners want as well.

There is a yawning divide between a well-aligned democracy and a poorly aligned one. When everybody is incentivized to pursue the same goals, a democratic board of directors can provide competent oversight, ensure that bad eggs can be expelled, and provide a stabilizing force against the shifting whims of any one individual. But when one group’s gain is another’s loss, democracy leads instead to the formation of parties that battle over each issue, and all the deadweight loss that we see in modern democratic governments.

On abstractions

My day job involves designing artificial intelligence systems that perceive things about their environment and are able to locate themselves within it. To do this, we spend a lot of time thinking about how our system should store and process information, and how its beliefs should be updated when various sensor measurements arrive. We only use a small set of sensors, and at the end of the day there are just a few quantities we’re interested in (position, orientation, velocity, and so on). Our job -- our only job for the past three years -- is to use the two or three sensors attached to our devices to estimate these quantities.

But if you strayed into our office and watched us for a few days, and especially if you looked at snapshots of our whiteboard over a few weeks, then you might think that we were working on ten completely different problems. Some days we draw diagrams with little circles and squares connected together, and our discussions center on likelihoods, posterior probabilities, and maximum likelihood inference, and we’ll worry about whether our system is over-confident. Other days we’ll draw axes with labels like “cost” and “hypothesis space”, and curves with points marked out like “estimate” and “linearization point”, and we’ll talk about iterative optimization and manifolds and gauge freedoms and worry about convergence properties. Other days still we’ll draw block diagrams and data schemas that show how the information flows between the components in our system, and we’ll worry about synchronization. Yet other days we’ll write down tables of algorithms with numbered steps one through N, and worry about computational complexity.

In fact we’re talking about the very same system every day, and not just the same system but the same parts of the same system, except we’re viewing the system through different lenses and from different vantage points. Each vantage point helps us to see some things clearly while other things are obscured. If we’re trying to solve a threading deadlock issue then no amount of time spent drawing graphical models will help to uncover the problem (though the block diagrams may help). And if the system is becoming overconfident and we’re not sure why then our block diagrams will have absolutely nothing helpful to say (but the factor graphs might). I do not know of any vantage point from which it is possible to see all properties of the system simultaneously.

It is quite dangerous to focus on just one of these abstractions for too long. Spend too much time thinking about computational complexity and you’ll end up building a very efficient system but it may no longer be probabilistically consistent. Spend too much time thinking about probabilistic consistency and your perfect reasoner may parallelize very poorly. Worse, if you think exclusively in terms of just one abstraction then you’re getting an incomplete picture of what’s really out there, and you’re failing to include important parts of the territory in your map.

Abstractions, though, are seductive. Factor graphs are a beautiful and powerful way to describe probabilistic reasoning. The language of convex optimization has more to say about how to actually implement reasoning systems than perhaps any other type of mathematics. Bayesianism, of course, illuminates great swaths of conceptual terrain in a single brush stroke. Being introduced to one of these abstractions for the first time is like realising that you have another sense that you hadn’t until now realised you could use. Especially if you’ve bumped up against these ideas many times before without realising quite what you were grappling with, the feeling of understanding a bigger part of the picture for the first time can be very exciting.

So exciting, in fact, that it’s tempting to try to explain everything through a single lens, and to declare anything not illuminated by this lens as unimportant or non-existent. Indeed it’s the most powerful abstractions that are most prone to ignite this tendency, since when a single hammer works on a thousand nails in a row, you might question whether the one nail that stands defiant is really a nail at all.

Worse still: a powerful abstraction can so captivate our attention that the parts of reality it fails to illuminate stop even registering as things in need of explanation.

This is not to say that abstractions are wrong, or bad, or even that they give “only one picture of reality”. Bayesianism tells us that if we want to combine multiple uncertain observations into a coherent set of beliefs, the way to do so is by multiplying the likelihood by the prior and dividing by the evidence. This is not “just one way to do it”, it is the only way, and Bayesianism tells us exactly why. Similarly, on the question: “what is the asymptotic running time of quicksort as the input size tends to infinity?”, the language of computational complexity tells us that there is one, and only one correct answer.
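That Bayesian computation can be written out in a few lines. The numbers below are invented purely for illustration: a hypothesis with a 1% prior, and an observation with a 90% likelihood under the hypothesis and a 5% likelihood otherwise.

```python
# Bayes' rule exactly as stated above:
# posterior = likelihood * prior / evidence.
# All numbers are invented for illustration.
prior = 0.01
p_obs_given_h = 0.90       # likelihood of the observation if the hypothesis holds
p_obs_given_not_h = 0.05   # likelihood of the observation otherwise

# The "evidence" is the total probability of the observation.
evidence = p_obs_given_h * prior + p_obs_given_not_h * (1 - prior)
posterior = p_obs_given_h * prior / evidence
```

There is no free parameter here to tune and no alternative formula to swap in; given the prior and the likelihoods, the posterior is fully determined.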

The point is that we should be on the lookout for questions that our best abstractions do not have an answer to, or have only a partially satisfying answer to. And rather than forcing them to fit into our abstractions or -- worse -- sweeping them into the dustbin of “unimportant” or “uninteresting”, we should look for new abstractions that jointly explain many of these questions together.

The problem with memes

Recently there has been a lot of talk about the snippets of language and culture that are known as memes. They pop up in the newspaper, on television, and most of all on the Internet. In this essay I’ll briefly discuss the explanatory role that meme theory is supposed to provide, then I’ll argue that there are some gaps in the evidence supporting this particular explanation.


Richard Dawkins originally introduced the concept of memes as an illustration of evolution outside the biological domain. He postulated that the framework of evolution by natural selection was not specific to biology, but could actually be applied wherever one found replicators, so long as two conditions were met:

  1. Heritability with error. The replicators must contain information that is passed on during replication, and errors must occasionally be introduced during this process.

  2. Selection. Some replicators must systematically produce more copies of themselves than others.

Biological genes meet these criteria since the information they embody is passed on via reproduction (with occasional copying errors), and the natural environment provides selection pressure in the biological domain. In The Selfish Gene, Dawkins took pains to explain that in the case of biological evolution it is genes, not organisms (which contain many genes), that are the replicators, and hence we should not be confused by apparent maladaptations such as the peacock’s tail, which are explained easily from the gene’s-eye view but create difficulty for the organism-centred view.

Dawkins also pointed out that his criteria are not merely necessary for evolution; they are sufficient. That is, if the two criteria are met then evolution will occur, no further questions. The reason for this is that any system that meets the criteria will naturally explore a search space via replication, and selection will bias that search towards regions representing greater propensity to replicate. Without any further ingredients, any such system will evolve.

This phenomenon has been put to use by computer scientists, who use it to solve abstract optimisation problems. Genetic algorithms, one family of evolutionary algorithms, solve problems by simulating artificial genes (typically represented by sequences of bytes in the computer’s memory) in an artificial “environment” defined by a fitness function. The programmer specifies a fitness function appropriate to the problem at hand; then, beginning with a randomly generated population, the algorithm repeatedly picks the most promising individuals (as defined by the fitness function), which, by analogy to biological reproduction, are “recombined” to create the next generation. Of course, this reproduction is entirely artificial – the algorithm just mixes the bytes from the fittest individuals to create the next generation – but artificial or not, this system meets the criteria for evolution, and indeed evolutionary algorithms can solve problems for which few other optimisation algorithms perform well.
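As a toy illustration of the loop just described, here is a minimal genetic-algorithm sketch in Python. The fitness function (count the 1-bits in a bitstring, the classic “OneMax” problem), the population size, and the mutation rate are all illustrative assumptions, not anything prescribed above; the point is only that heritability-with-error plus selection is enough to drive the population towards fitter genomes.

```python
import random

random.seed(0)

GENOME_LEN = 32       # length of each artificial "genome" (a list of bits)
POP_SIZE = 40
GENERATIONS = 60
MUTATION_RATE = 0.01  # per-bit chance of a copying error: heritability with error

def fitness(genome):
    # Toy fitness function ("OneMax"): count the 1-bits.
    return sum(genome)

def crossover(a, b):
    # Recombine two parents by splicing them at a random point.
    point = random.randrange(1, GENOME_LEN)
    return a[:point] + b[point:]

def mutate(genome):
    # Occasionally flip a bit, introducing errors into the copy.
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit
            for bit in genome]

# Start from a random population, then repeatedly let the fittest half
# reproduce: selection biases the search towards fitter genomes.
population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP_SIZE // 2]
    population = [mutate(crossover(random.choice(parents),
                                   random.choice(parents)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print(fitness(best))  # approaches GENOME_LEN after enough generations
```

A random genome starts with about half its bits set, so a final fitness near the maximum shows the two criteria doing real optimisation work, with no further ingredients.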

Another type of replicator that meets the criteria for evolution is ideas. Humans pass ideas from one to another, so every time you repeat a catch-phrase, concept, or idea in conversation you are participating in the “replication” of that idea. Of course, the idea will often come out slightly differently to how you heard it, so ideas meet the criterion of heritability with error. What about the second criterion? Well, consider how much more likely you are to pass on a funny joke or a catchy phrase or an insightful idea, in comparison to a dull, lame, or obvious one. Some ideas are stickier – they appeal to us, play to our sensibilities, resonate with us – and that is what constitutes selective pressure among ideas. So according to Dawkins’ argument, having seen that the relevant criteria are satisfied, we can conclude that ideas, or memes, will evolve to be ever stickier to the human mind.

The meme concept has been received with much applause, and has been applied with great vigour to explain many aspects of human culture, from phrases invented by internet communities, to religion, and even to scientific theories. But is this a valid conclusion to draw from the analysis of the previous paragraph? Dawkins’ argument demonstrates that evolution of memes will occur, but it doesn’t tell us which particular phenomena we can attribute to that evolutionary process. This is the crucial step in the argument that I believe requires further evidence: the step from “evolution will occur among replicator class X” to “phenomenon Z was caused by evolution among replicator class X”. Dawkins’ criteria can get us to the first statement but not to the second. To make the jump to the second we need additional evidence.

Consider the following (absurd) example. Pretty streams that flow through picturesque countryside naturally appeal to our sense of beauty. In time, people will tend to reshape other streams in the image of the prettiest streams they have seen in the past. Pretty streams will in this way reproduce themselves, with humanity’s sense of beauty providing the selection pressure. Therefore, the Nile – surely one of the prettiest streams in the world – must have been produced by evolution-of-beautiful-streams. Let’s call this exciting new phenomenon stremes!

The conclusion of the argument above is pure nonsense: the beauty of the Nile was produced by geological forces, not humans, and certainly not evolution. But if the conclusion is wrong then either one of our premises must be wrong, or else there must be a mistaken step somewhere in the argument. So what exactly went wrong? It is certainly true that people have a sense of beauty and that some streams are naturally more beautiful (to humans). It is also true that humans reshape their environment (in part) according to appealing environments they have seen before – the spread of landscaping trends over the past few hundred years bears witness to that. Despite the absurdity of the stremes example, it does satisfy Dawkins’ criteria for evolution, and indeed if we imagine a world in which the only force at work is humans industriously replicating their favourite streams, occasionally introducing errors, then, yes, we could expect an evolutionary process leading to ever prettier streams.

It is true, then, that stremes could evolve; the real problem with the argument lies in the final conclusion, that a specific phenomenon (the Nile river) was caused by this type of evolution. Perhaps something similar to the Nile could eventually come about as a result of evolution-of-stremes, given a sufficiently conducive environment and enough time, but that is different to the question of whether the actual Nile did come about as a result of evolution amongst stremes. Indeed we have very good reason to believe that the Nile came about for reasons completely unrelated to evolution. There is a gap between the statement that “stremes will evolve given the right circumstances” and the statement that “the Nile river was caused by evolution among stremes”. And that gap can only be bridged by extra evidence.

The same evidence gap exists in the case of memes. The argument for memetic evolution is valid: Dawkins’ criteria are satisfied and a straightforward argument tells us that memes will evolve. However, to conclude that any particular phenomenon such as Rick-Rolling or Buddhism or silicon chips was caused by evolution of memes requires further evidence.

At this point it may be worth returning to the case of biological evolution and asking on what grounds we conclude that the specific phenomenon of life on Earth is a result of biological evolution. Well, I’m glad you asked. One source of evidence is the fossil record, which shows a sequence of species that looks just as it should if evolution were the cause of life on Earth. Another supporting observation is that the range of species found on each continent looks just as it should assuming an evolutionary process on separate land masses occasionally connected by land bridges. These pieces of evidence are independent of Dawkins’ criteria for evolution (which merely show that genes will evolve), and it is because of them that we conclude that life on Earth was caused by an evolutionary process.

Similar evidence is required to validate memetic explanations for any specific phenomenon. If evolution amongst memes really was responsible for, say, Rick-Rolling, then we should expect to see a relatively continuous sequence of memes in historical records, analogous to the fossil record. We should also expect never to find irreducible complexity in memes (that is, memes that could not have been produced by evolution because removing any part would destroy the meme’s fitness). I have not collected evidence that supports or refutes memetic explanations for particular cultural phenomena; the thesis of this essay is simply that it is invalid to assume memetic explanations in the absence of such evidence.

Another piece of evidence that supports biological evolution as the cause for life on Earth is that there is no other process we are aware of that provides a plausible alternative. But is the same true in the case of memes? Are we forced to accept memetic explanations for lack of any alternative? No, we are not; there are many alternatives to memetic explanations. For instance, the word “mum” has come to mean “mother” probably because “mum” is an easy sound for young children to make given the biology of the mouth and throat, not because it arose from evolution of memes passing between infants. That is, our use of the word “mum” is explained by biological facts, not memetic evolution. Don’t be confused by the fact that our biology itself arose through an evolutionary process: although biological and memetic evolution could surely interact, they are fundamentally different processes since they operate on different substrates: DNA versus human discourse.

Another candidate explanation for cultural phenomena is optimisation by humans. When Einstein proposed the special theory of relativity he did not simply make random permutations to the ideas of those before him; rather he took the best ideas in physics at the time and made directed, purposeful modifications. A memetic explanation for how the theory of special relativity came about would posit many physicists making small random modifications to existing theories, followed by selection of the more promising theories for further modification. While the latter is quite plausible, the former is not: Einstein’s contribution to physics was anything but small and random; rather he used insight and rational inquiry to make contributions in a directed manner. That Einstein’s theory was significantly better than previous theories at explaining physical phenomena could only be accounted for in a memetic framework if there had also been literally billions of competitor theories that proved less useful, since in an evolutionary process a large increase in fitness comes about only as a result of a great many small increases, and an even greater number of small failures. Although there certainly were rivals to special relativity that were ultimately discarded, they number in the dozens, or at most in the hundreds: too small by many orders of magnitude for the memetic explanation.

If optimisation-by-humans is the cause of theories in physics then it is also a plausible explanation for more general phenomena such as Buddhism, Rick-Rolling, or silicon chips. I am not going to try to settle those particular cases in this article; the arguments herein are simply intended to demonstrate that optimisation-by-humans is at least as plausible as memetic evolution, and that therefore, in the absence of further evidence, it is unjustified to assume the memetic explanation.

The term “meme” has found widespread use in contemporary discourse, especially when discussing the public mindset, since the constitution of that mindset is what memetic evolution is supposed to explain. It is often used to refer to particularly catchy or trendy ideas, but I have argued that in many such cases there is little justification for assuming that an evolutionary process was responsible. Questioning whether the term “meme” should be applied in such cases is dangerously close to a vacuous quibble over semantics; however, a few cautions do bear mentioning. First, it is a mistake to think that a deeper understanding has been reached just by calling something a meme. In the absence of evidence for an evolutionary process, calling something a “meme” is no different to calling it an “idea” or “phrase”. No greater understanding of its nature or origin has been reached by invoking the term, nor does the term suggest any new ways that it might be manipulated, magnified, or minimised. To talk of “injecting into the meme pool” is no different than just talking about plain old “publicity”. Second, we may miss evidence of alternative explanations for the things we label as memes, since the term implies that the ideas have a “life of their own” (which they would have, but only if the memetic explanation were correct).

In this essay I have argued that although memetic evolution is a coherent concept, applying it as an explanation for specific phenomena requires extra evidence to corroborate its causal role in producing those phenomena. In the absence of such evidence we should be careful about using the term “meme” too liberally since we may make unjustified assumptions about the nature of the ideas we are dealing with.