Allied bombers were key to Britain’s air offensive against Germany during the Second World War. The RAF therefore wanted to armour its bombers to reduce the number being shot down. But armour is heavy: you cannot reinforce an entire bomber and still have it fly. So the statistician Abraham Wald was asked to advise on where armour should be placed.
After each wave of bombing, every returning aircraft was meticulously examined and the location of any damage from enemy fire was recorded. The image below conceptualises what Wald’s data might have looked like.
So what was Wald’s advice? Where should armour be added?
He essentially advised the RAF to add armour to the places where the returning planes showed no bullet holes. Wait… what?!
Wald wisely understood that the data was based only on planes that survived. The planes that did not survive were likely to have sustained damage on the areas where we do not observe bullet holes – such as around the engine or cockpit.
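Wald’s reasoning can be illustrated with a small simulation. The sketch below is not a reconstruction of his analysis: the section names and the per-hit loss probabilities in LETHALITY are made-up assumptions, chosen only so that hits to the engine and cockpit are more likely to bring a plane down. Every section is hit equally often, yet the survivors alone tell a different story.

```python
import random

# Hypothetical sections with an assumed probability that a single hit there
# brings the plane down. These numbers are illustrative, not historical.
LETHALITY = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.1, "wings": 0.1, "tail": 0.15}

def simulate(n_planes=20000, hits_per_plane=3, seed=0):
    """Return hit counts per section for all planes and for survivors only."""
    rng = random.Random(seed)
    sections = list(LETHALITY)
    all_hits = dict.fromkeys(sections, 0)
    survivor_hits = dict.fromkeys(sections, 0)
    for _ in range(n_planes):
        # Each hit lands on a section chosen uniformly at random.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        # The plane returns only if it survives every one of its hits.
        survived = all(rng.random() > LETHALITY[s] for s in hits)
        for s in hits:
            all_hits[s] += 1
            if survived:
                survivor_hits[s] += 1
    return all_hits, survivor_hits

def shares(counts):
    """Convert raw counts into each section's share of the total."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

all_hits, survivor_hits = simulate()
all_share = shares(all_hits)
surv_share = shares(survivor_hits)
for s in LETHALITY:
    print(f"{s:9s} all planes: {all_share[s]:.2f}   survivors only: {surv_share[s]:.2f}")
```

Under these assumptions, the engine and cockpit account for a noticeably smaller share of the holes seen on returning planes than of the holes across the whole fleet. The sections that look least damaged in the survivor data are the ones where damage was most often fatal.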
Survivorship and selection bias
Survivorship bias, which we see in this particular story, is an extreme form of selection bias. Selection bias arises when our sample (the subjects – be they individuals or planes – being studied) is not representative of the larger population of interest.
We cannot simply study successful companies (or people, or investment products) to draw conclusions about what a successful strategy might be. Unsuccessful companies may have used exactly the same strategy.
Nor should we look at our grandmothers’ Singer sewing machines and marvel at how well made and long-lasting things used to be (“They don’t make ’em like they used to”). The very items that have lasted until today are likely those that were well made to begin with. They tell us very little about how things in general were made in the past.
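A toy simulation makes the point concrete. In the sketch below, which uses invented numbers throughout (p_strategy and p_success are assumptions, not estimates from any real market), success is deliberately made independent of whether a company follows a given strategy. Looking only at the winners would still suggest that the strategy is what successful companies do.

```python
import random

def simulate_companies(n=50000, p_strategy=0.7, p_success=0.1, seed=1):
    """Generate (uses_strategy, succeeded) pairs for n hypothetical companies."""
    rng = random.Random(seed)
    companies = []
    for _ in range(n):
        uses = rng.random() < p_strategy
        # By construction, success does not depend on the strategy at all.
        succeeded = rng.random() < p_success
        companies.append((uses, succeeded))
    return companies

def strategy_rate(companies, succeeded):
    """Share of companies using the strategy within one outcome group."""
    group = [uses for uses, ok in companies if ok == succeeded]
    return sum(group) / len(group)

companies = simulate_companies()
rate_winners = strategy_rate(companies, True)
rate_losers = strategy_rate(companies, False)
print(f"strategy adoption among successes: {rate_winners:.2f}")
print(f"strategy adoption among failures:  {rate_losers:.2f}")
```

Most of the successful companies use the strategy, but so do most of the failed ones, at roughly the same rate. Only the comparison between the two groups shows that the strategy explains nothing here.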
After the polling industry got the result of the 2015 UK general election badly wrong, an inquiry was set up. The inquiry’s report concluded that the “primary cause of the polling miss in 2015 was unrepresentative samples”. The selection bias here arose from a fundamental difference between survey respondents and the general public.
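The effect of an unrepresentative sample can be sketched in a few lines. The example below does not model the 2015 polls or the inquiry’s findings. It assumes a hypothetical population split evenly into people who are easy or hard to reach, with made-up levels of support for one party in each group, and compares a representative poll with one that over-samples the easy-to-reach group.

```python
import random

def make_population(n=100000, seed=2):
    """Build (easy_to_reach, supports_party) pairs with assumed support levels."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        easy = rng.random() < 0.5
        # Assumed support: 35% among easy-to-reach people, 65% among the rest.
        p_support = 0.35 if easy else 0.65
        population.append((easy, rng.random() < p_support))
    return population

def poll(population, size, weight_easy, seed):
    """Estimate support from a sample in which easy-to-reach people are
    weight_easy times as likely to be drawn as everyone else."""
    rng = random.Random(seed)
    weights = [weight_easy if easy else 1.0 for easy, _ in population]
    sample = rng.choices(population, weights=weights, k=size)
    return sum(supports for _, supports in sample) / size

population = make_population()
true_share = sum(supports for _, supports in population) / len(population)
representative = poll(population, 5000, weight_easy=1.0, seed=3)
skewed = poll(population, 5000, weight_easy=3.0, seed=3)
skewed_large = poll(population, 50000, weight_easy=3.0, seed=4)

print(f"true support:             {true_share:.3f}")
print(f"representative poll:      {representative:.3f}")
print(f"skewed poll (n=5000):     {skewed:.3f}")
print(f"skewed poll (n=50000):    {skewed_large:.3f}")
```

In this sketch the skewed poll understates support, and making it ten times larger does not help: a bigger sample reduces random error, not the bias built into how respondents were selected.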
Lessons for business
When businesses try to expand their customer base, it is common to study who they are attracting (existing customers). But perhaps they should also study non-customers to understand why they are unengaged.
When analysing marketing (or product development) strategies, businesses should not simply examine current marketing campaigns. What ideas are being terminated by the marketing department at an early stage and so cannot be tested? What ideas have been killed off early because of poor short-term returns, and hence cannot be tested for long-term returns? Whilst it is impossible to test every single idea, a small selection of these ideas can be assigned a small budget and tested so that you can be satisfied that you are not missing something special.
Data analysis should always be conducted and reviewed with a statistician’s hat on, or you risk drawing incorrect conclusions. This applies not just to selection or survivorship bias, but also to problems such as spurious regression or a failure to think probabilistically.
Note: A later statistical analysis of Wald’s work can be found here. Warning, this is very technical.