Proving Attributable Impacts – Experimental, Quasi-experimental, and Observational Designs
Impact evaluation is a means to measure the causal or attributable impact of an intervention. To attribute impacts to the intervention, we must compare ‘what happened with the programme’ (the factual) with ‘what would have happened in the absence of the programme’ (the counterfactual). To measure these two parts, we need a “treatment group” and a “control (or comparison or counterfactual) group”. The robustness of an impact evaluation design depends on how similar, or exchangeable, these two groups were before the intervention began.
Ideally, we need a counterfactual group that mimics what would have happened in the absence of the programme so well that the implementers would have been totally indifferent had we asked them to shift the programme to the control group instead (this is exchangeability). In practice, the rigor and validity of an impact evaluation differ by the method used to identify the control group or counterfactual outcome, as follows.
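To make the factual-versus-counterfactual logic concrete, here is a minimal sketch in Python using simulated, entirely made-up data; the outcome scale, the true effect of 3, and the self-selection rule are illustrative assumptions, not figures from any real programme. It shows why a naive comparison of non-exchangeable groups is biased, while a randomized split recovers the true effect.

```python
# A minimal sketch of the potential-outcomes idea, using simulated (hypothetical) data.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Each unit has two potential outcomes: without the programme (y0) and with it (y1).
baseline = rng.normal(50, 10, n)          # e.g. a wealth or health score (made up)
y0 = baseline + rng.normal(0, 5, n)       # outcome if NOT treated
y1 = y0 + 3                               # outcome if treated (true effect = 3, assumed)

true_effect = (y1 - y0).mean()

# Non-exchangeable groups: the programme reached the better-off units.
self_selected = baseline > 55
naive_estimate = y1[self_selected].mean() - y0[~self_selected].mean()

# Exchangeable groups: assignment by lottery (randomization).
randomized = rng.random(n) < 0.5
rct_estimate = y1[randomized].mean() - y0[~randomized].mean()

print(f"True effect:            {true_effect:.2f}")
print(f"Naive (self-selected):  {naive_estimate:.2f}")   # biased well above 3
print(f"Randomized comparison:  {rct_estimate:.2f}")      # close to the truth
```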
You can also view our webinar recordings on RCTs and Non-RCT methods.
EXPERIMENTAL APPROACH

1. Randomized Controlled Trials (RCTs) – From an exchangeability perspective, this is the gold standard: groups or people are randomly assigned to the treatment or control group. As the number of units assigned to each group increases, this ‘lottery’ ensures that, on average, the two groups are identical in both observable and unobservable characteristics. In fact, in theory, if the number of units (people or groups) randomly assigned is sufficiently large (say, hundreds), we can ‘assume’ balance or exchangeability at the baseline and not even conduct a baseline survey.
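The ‘balance’ property of the lottery can be illustrated with another small simulated sketch; all the variables and numbers below are hypothetical, including the deliberately unmeasurable ‘motivation’ variable.

```python
# A minimal sketch of balance under randomization, with simulated data:
# after a lottery, baseline characteristics are similar in both arms on average.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000

# Hypothetical baseline characteristics (observable and 'unobservable').
age = rng.normal(35, 10, n)
income = rng.lognormal(8, 0.5, n)
motivation = rng.normal(0, 1, n)   # something we could never measure in a survey

treat = rng.random(n) < 0.5        # random assignment

for name, x in [("age", age), ("income", income), ("motivation", motivation)]:
    diff = x[treat].mean() - x[~treat].mean()
    print(f"{name:>10}: treatment - control = {diff: .2f}")
# With a large enough n, all differences shrink toward zero -- the groups are exchangeable.
```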
However, this design is possible only if it is planned well in advance, before implementation starts, and if the programme design allows for randomization. RCTs are sometimes criticized as unethical, but we have rarely found merit in these arguments, because RCTs are used only to prove attribution or causality for interventions that are unproven relative to the usual or prevailing standard of care. What is unethical is implementing something that may do no good, waste money and time, or even cause harm purely on the basis of conviction or biased research – not subjecting such an intervention to a tough test of causality!
QUASI-EXPERIMENTAL APPROACH

When an RCT is not possible, quasi-experimental methods are used: the control group is identified non-experimentally, but treatment and control groups are still compared.
2. Regression Discontinuity Design is used when clear eligibility criteria differentiate participants from non-participants. It compares those just below and just above the eligibility cut-off, within a chosen tolerance (bandwidth) of the eligibility score rather than everybody, assuming that the only difference between them is programme participation.
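A minimal sketch of this design, again with simulated data and an assumed poverty-score cut-off of 40, might look as follows; the local linear fit around the cut-off is one common way to estimate the jump, not a prescription.

```python
# A minimal regression-discontinuity sketch with a hypothetical eligibility score:
# units scoring below the cutoff receive the programme; we compare outcomes
# just below and just above the cutoff, within a chosen bandwidth.
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

score = rng.uniform(0, 100, n)              # hypothetical poverty score
cutoff, bandwidth = 40, 5                   # eligible if score < 40 (assumed rule)
treated = score < cutoff

# Outcome depends smoothly on the score, plus an assumed programme effect of 2.
outcome = 20 + 0.1 * score + 2 * treated + rng.normal(0, 1, n)

# Keep only units close to the cutoff and fit a local linear regression:
# the coefficient on 'treated' is the jump in the outcome at the cutoff.
window = np.abs(score - cutoff) < bandwidth
X = np.column_stack([
    np.ones(window.sum()),
    treated[window].astype(float),
    score[window] - cutoff,
])
coef, *_ = np.linalg.lstsq(X, outcome[window], rcond=None)
print(f"RDD estimate at the cutoff: {coef[1]:.2f}")   # close to the assumed effect of 2
```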
3. Statistical matching involves matching people or groups with access to the programme to those without access, using pre-intervention data – either primary baseline survey data or secondary data such as a country’s Census. In theory, if the matching algorithm can mimic the programme allocation decisions well, there is no systematic difference between the “treatment group” and the “control group”, and the estimated change at the endline can be interpreted as causal. Propensity score matching is the most theoretically sound and widely applied method.
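A minimal propensity score matching sketch, assuming scikit-learn is available and using simulated covariates and outcomes (the coefficients and the assumed effect of 2 are made up), could look like this:

```python
# A minimal propensity-score-matching sketch with simulated (hypothetical) data:
# model the probability of programme access from baseline covariates, then match
# each treated unit to the comparison unit with the closest propensity score.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)
n = 4_000

# Baseline covariates that drive both programme access and the outcome.
X = rng.normal(size=(n, 3))
access_prob = 1 / (1 + np.exp(-(X @ [0.8, -0.5, 0.3])))
treated = rng.random(n) < access_prob
outcome = X @ [1.0, 1.0, 1.0] + 2.0 * treated + rng.normal(0, 1, n)  # assumed effect = 2

# Step 1: estimate propensity scores from the baseline covariates.
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: nearest-neighbour matching on the propensity score.
nn = NearestNeighbors(n_neighbors=1).fit(ps[~treated].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
matched_controls = outcome[~treated][idx.ravel()]

att = (outcome[treated] - matched_controls).mean()
print(f"Matched estimate of the programme effect: {att:.2f}")  # roughly recovers 2
```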
4. Difference-in-Differences comparisons are usually an additional feature once a control group has been identified using one of the two designs above. This design combines a before-after comparison with a with-without comparison. Any pre-existing difference in outcomes at the baseline is controlled for, or differenced out, so that the difference observed at the endline is considered causal, i.e. due only to the intervention. In this method, we must assume that, in the absence of the programme, the change over time would have been the same for participants and non-participants (the ‘parallel trends’ assumption).
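A minimal difference-in-differences sketch with simulated two-round data (the pre-existing gap, the common trend, and the extra gain for participants are all assumed numbers) shows how the baseline gap is differenced out:

```python
# A minimal difference-in-differences sketch with simulated (hypothetical) panel data:
# the pre-existing gap between groups is differenced out, so the estimate is the
# extra change over time among participants.
import numpy as np

rng = np.random.default_rng(4)
n = 1_000

participant = rng.random(n) < 0.5
# Participants start from a different level (assumed pre-existing gap of 4).
baseline = 30 + 4 * participant + rng.normal(0, 2, n)
# Both groups improve by 5 over time ('parallel trends'); participants gain 3 more.
endline = baseline + 5 + 3 * participant + rng.normal(0, 2, n)

change_participants = (endline - baseline)[participant].mean()
change_comparison = (endline - baseline)[~participant].mean()
did_estimate = change_participants - change_comparison

print(f"Difference-in-differences estimate: {did_estimate:.2f}")  # close to the assumed 3
```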
OBSERVATIONAL DESIGNS

Not to split hairs, but the quasi-experimental methods listed above are also observational, simply because they are not experimental. However, we distinguish the observational designs below as those where a control group is not explicitly identified, so we can claim association but not causality.
5. Before-and-After comparison involves comparing the same set of people or the same group of villages (or any clusters) before and after the intervention. This design simply assumes that any change over time is caused by the programme – a fair assumption for an output that can only be produced by the programme, such as piped water connections to all households, but not for a higher-order outcome such as a reduction in diarrhoea.
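A small simulated sketch (all numbers hypothetical) shows why the before-after change can misstate the programme’s contribution for a higher-order outcome:

```python
# A minimal before-after sketch with simulated (hypothetical) data: the change over
# time mixes the programme effect with everything else that happened in between.
import numpy as np

rng = np.random.default_rng(5)
n = 800

before = rng.normal(25, 5, n)      # e.g. diarrhoea episodes per 100 children (made up)
secular_trend = -2                 # improvement that would have happened anyway
programme_effect = -3              # assumed true contribution of the programme
after = before + secular_trend + programme_effect + rng.normal(0, 2, n)

print(f"Before-after change: {(after - before).mean():.2f}")  # about -5, not -3
# Without a comparison group, the -2 secular trend is wrongly attributed to the programme.
```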
6. Multivariate Regression involves statistical controls to estimate the ‘marginal’ change associated with the intervention while controlling for factors that could differ between participants and non-participants (ceteris paribus).
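A minimal sketch of such a regression, assuming statsmodels is available and using simulated data (the covariates and the assumed association of 2 are made up), could look like this:

```python
# A minimal multivariate-regression sketch with simulated (hypothetical) data:
# the coefficient on programme participation is the 'marginal' association,
# holding the included covariates constant.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 3_000

education = rng.normal(8, 3, n)
wealth = rng.normal(0, 1, n)
participation = (rng.random(n) < 1 / (1 + np.exp(-0.3 * wealth))).astype(float)
income = 2.0 * participation + 1.5 * education + 3.0 * wealth + rng.normal(0, 2, n)

# Regress the outcome on participation plus the control variables.
X = sm.add_constant(np.column_stack([participation, education, wealth]))
model = sm.OLS(income, X).fit()
print(f"Association with participation, net of controls: {model.params[1]:.2f}")
```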
Given the numerous assumptions underlying the quasi-experimental and observational methods, advanced analytical skills and experience with evaluation design are needed for robust inference. Sometimes these methods are combined with qualitative research, secondary data analysis, use of monitoring data, and reasoning from a programme theory to rule out alternative explanations that could deliver similar impacts, and thus to make causal arguments. Therefore, for a similar rigor of measurement (questionnaires), RCTs are often cheaper in terms of sample size and the time of senior researchers and analysts.
Our advice on which methods to use is simple –
• Not all programmes need evaluations, but all programmes need monitoring;
• Thinking about and integrating evaluation at the programme design stage will ensure a feasible, robust, and cost-effective design;
• Don’t be afraid of RCTs if you truly care about proving causality (which you should if your intervention is unproven). There are variations and procedures that have minimal effect on how you roll out the programme; and
• Not all programmes need a causal impact evaluation, and there are other types of evaluations that may be better suited to your needs. So it is better to do some evaluation using methods that are ‘feasible’ under your resource constraints than to do no evaluation at all.