Development Pathways (08.02.2018) Get your inner nerd out! The World Bank has launched a competition to help them better predict a household's poverty status based on easy-to-collect information and machine learning algorithms.
“Right now measuring poverty is hard, time consuming, and expensive,” the World Bank states. “By building better models, we can run surveys with fewer, more targeted questions that rapidly and cheaply measure the effectiveness of new policies and interventions. The more accurate our models, the more accurately we can target interventions and iterate on policies, maximising the impact and cost-effectiveness of these strategies.” Sounds great! Or does it?
The use of statistical models and machine learning has become omnipresent in recent years. Hedge funds, for example, use complex algorithms to try to predict the movements of stock markets. Banks use them to estimate the risk of default when deciding on loan applications, and insurance firms use them when setting premiums. Internet giants like Google and Facebook build detailed profiles of their users to better target adverts. The list is ever-growing.
Many social assistance programmes in low- and middle-income countries are using complex maths too, to ‘target the poor’. Their administrators feed data from national household surveys into a computer algorithm, which develops a formula that is used to predict whether a household is poor or not, based on certain characteristics such as family size, type of housing, and education. These ‘proxy means tests’ are a feature of the conditional cash transfer programmes in Latin America, and are also being introduced in countries in Africa and Asia, usually with technical support from multilateral donors.
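To make the mechanics concrete, here is a minimal sketch of how a proxy means test of this kind might be built: a regression fitted to household survey data, then reused as a scoring formula for applicants. All the variable names, coefficients, and the eligibility threshold below are hypothetical illustrations, not taken from any actual programme.

```python
# Illustrative sketch of a proxy means test (PMT).
# All characteristics, weights and thresholds here are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 5_000

# Synthetic 'survey': easy-to-collect household characteristics.
household_size = rng.integers(1, 10, n)
years_education = rng.integers(0, 16, n)
has_metal_roof = rng.integers(0, 2, n)

# Synthetic (log) consumption with noise; in reality this would come
# from a detailed national household survey.
log_consumption = (
    8.0 - 0.15 * household_size + 0.08 * years_education
    + 0.3 * has_metal_roof + rng.normal(0, 0.5, n)
)

X = np.column_stack([household_size, years_education, has_metal_roof])

# Step 1: fit the scoring formula on the survey sample.
model = LinearRegression().fit(X, log_consumption)

# Step 2: apply the formula to an applicant household and compare the
# predicted score against a (hypothetical) eligibility cut-off.
cutoff = np.quantile(model.predict(X), 0.2)  # bottom 20% deemed eligible
applicant = np.array([[7, 2, 0]])  # large household, little education
score = model.predict(applicant)[0]
print(f"PMT score: {score:.2f}, eligible: {score < cutoff}")
```

The key point is in step 2: once the formula is fixed, a household's access to support depends entirely on whether its predicted score falls below the cut-off, not on its actual circumstances.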
The main problem, though, is that algorithms can make troublingly unfair decisions. In our paper ‘Exclusion by Design’ we show that social assistance programmes that rely on statistical models to select beneficiaries suffer from high error rates, typically excluding at least half of the very poorest households they aim to reach. The economists Brown, Ravallion, and Van De Walle reach a similar conclusion based on data from nine African countries. While ‘econometric’ targeting can do a reasonable – but far from perfect – job of filtering out the most affluent households, it is not an accurate method for identifying the poorest households. What’s more, highly sophisticated models that use more detailed information and complex techniques do not perform much better.
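The exclusion error at issue here is easy to define: of the households a programme intends to reach, what share does the model fail to select? A small self-contained sketch, using entirely synthetic data and an assumed coverage target of the poorest 20 per cent, shows how noisy predictions translate into large exclusion errors even when the model is fitted correctly. The error rate it prints is illustrative only.

```python
# Sketch of an exclusion-error calculation: how many genuinely poor
# households does the targeting model miss? Data are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 5_000

# Synthetic household characteristics and noisy true consumption.
X = rng.normal(size=(n, 3))
true_log_consumption = X @ np.array([0.4, 0.3, 0.2]) + rng.normal(0, 0.8, n)

# Fit the targeting model, as a PMT would, on the survey data.
predicted = LinearRegression().fit(X, true_log_consumption).predict(X)

# Assume the programme aims to cover the poorest 20% of households.
truly_poor = true_log_consumption <= np.quantile(true_log_consumption, 0.2)
selected = predicted <= np.quantile(predicted, 0.2)

# Exclusion error: share of the truly poor whom the model fails to select.
exclusion_error = 1 - (truly_poor & selected).sum() / truly_poor.sum()
print(f"Exclusion error among the poorest 20%: {exclusion_error:.0%}")
```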
When the algorithms get it wrong, there is little chance of recourse. Every year, millions of families across the developing world who really do need support are denied access to social protection because a computer says they are not ‘poor’. Pathways’ recent research into social accountability found that few social assistance programmes have functioning grievance mechanisms, and those that do are better at collecting complaints than resolving them. Local officials or caseworkers are not empowered to override incorrect decisions. And no-one holds the algorithms accountable: I have not yet seen a social assistance programme that monitors how its algorithm performs and keeps track of errors.
Don’t get me wrong. I’m fascinated by the explosion of ‘big data’ and machine learning in different areas of society. And I think the World Bank has set an interesting challenge for data scientists. Robust data and statistics should play a critical role in decision-making, and efforts to improve and optimise household surveys are important. But let’s not trust machines too much, especially when deciding who has their right to social security fulfilled and who does not.
The causes of poverty are multi-faceted and household incomes and consumption are highly dynamic. Trying to squeeze that reality into a statistical model is challenging at best and dangerous at worst.