November 4, 2020
Data Quality: 7 Back-Office Data Checks to Improve Its Quality
For high-quality data, the importance of properly training survey teams and running field-level data quality checks is undisputed. Back-office data checks are just as important for ensuring that the data is reliable and credible. Timely, planned back-office data quality assurance, combined with active feedback mechanisms, improves data quality with immediate effect.
People cite several reasons for not investing enough resources in back-office data quality checks: lack of time, failure to plan them in advance (before data collection starts), or lack of staff skilled in such checks. But today, technology has made back-office data quality assurance much easier. Once a dashboard is set up, a single click can surface most quality-related issues in the data.
The only investment is planning prior to data collection and setting up dashboards using basic Excel tools such as PivotTables and Slicers. You can also use other tools, or a combination of them: Google Sheets, Google Data Studio, Stata, SPSS, and R.
The following are 7 smart checks for back-office data quality assurance:
1. Internal consistency
Are answers to two or more inter-related questions logically consistent? For example, a woman reports the place of delivery as ‘Home’ but the type of delivery as ‘C-section’. This is possible in the rarest of cases, but it warrants a flag that must be investigated with the field enumerator. Ideally, your CAPI (computer-assisted personal interviewing) app or on-field scrutiny of data should capture such errors. Unfortunately, not many invest in developing a sound CAPI app. We recommend identifying the 7-8 logical inconsistencies that are most critical from an analysis perspective and checking them in the back office, regardless of what the CAPI app is able to assure.
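As an illustration, a consistency check of this kind could be sketched in Python as below. The field names (`place_of_delivery`, `type_of_delivery`), the records, and the rule name are assumptions made up for the example, not fields from any particular survey.

```python
# Hypothetical survey records; all field names and values are illustrative.
records = [
    {"id": "R001", "enumerator": "E01",
     "place_of_delivery": "Home", "type_of_delivery": "C-section"},
    {"id": "R002", "enumerator": "E01",
     "place_of_delivery": "Hospital", "type_of_delivery": "C-section"},
    {"id": "R003", "enumerator": "E02",
     "place_of_delivery": "Home", "type_of_delivery": "Normal"},
]

# Each rule is (name, function that returns True when a record is inconsistent).
RULES = [
    ("home_csection",
     lambda r: r["place_of_delivery"] == "Home"
     and r["type_of_delivery"] == "C-section"),
]

def flag_inconsistencies(rows, rules):
    """Return (record id, enumerator, rule name) for every rule a record breaks."""
    flags = []
    for row in rows:
        for name, is_inconsistent in rules:
            if is_inconsistent(row):
                flags.append((row["id"], row["enumerator"], name))
    return flags

flags = flag_inconsistencies(records, RULES)
print(flags)
```

Keeping the rules in one list makes it easy to add the remaining critical checks and to count flags by enumerator later.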
2. Duration of Interview

Ensure that you record the duration of every interview, and shortlist the interviews that were completed in a single visit. Then flag all ‘completed’ interviews whose duration is too short, which almost certainly indicates that the enumerator rushed through the questions. On the other hand, too long a duration suggests that the enumerator is not well trained or not capable of administering interviews efficiently. Recommend retraining such enumerators to the field managers.
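A minimal sketch of the duration check follows, assuming each record carries start and end timestamps, a visit count, and a status. The 20-minute and 90-minute cut-offs are placeholder assumptions; the plausible range depends on the questionnaire.

```python
from datetime import datetime

# Assumed thresholds in minutes; set them from pilot data for your own survey.
MIN_MINUTES = 20
MAX_MINUTES = 90
TIME_FORMAT = "%Y-%m-%d %H:%M"

# Hypothetical interview log; field names are illustrative.
interviews = [
    {"id": "R001", "enumerator": "E01", "visits": 1, "status": "completed",
     "start": "2020-11-04 10:00", "end": "2020-11-04 10:08"},
    {"id": "R002", "enumerator": "E01", "visits": 1, "status": "completed",
     "start": "2020-11-04 11:00", "end": "2020-11-04 11:45"},
    {"id": "R003", "enumerator": "E02", "visits": 1, "status": "completed",
     "start": "2020-11-04 09:00", "end": "2020-11-04 11:10"},
    {"id": "R004", "enumerator": "E02", "visits": 2, "status": "completed",
     "start": "2020-11-04 12:00", "end": "2020-11-04 12:05"},
]

def duration_minutes(row):
    """Interview length in minutes from the start and end timestamps."""
    start = datetime.strptime(row["start"], TIME_FORMAT)
    end = datetime.strptime(row["end"], TIME_FORMAT)
    return (end - start).total_seconds() / 60

def flag_durations(rows, low=MIN_MINUTES, high=MAX_MINUTES):
    """Flag completed single-visit interviews outside the plausible range."""
    flags = {}
    for row in rows:
        if row["status"] != "completed" or row["visits"] != 1:
            continue  # multi-visit interviews need a separate review
        minutes = duration_minutes(row)
        if minutes < low:
            flags[row["id"]] = "too_short"
        elif minutes > high:
            flags[row["id"]] = "too_long"
    return flags

duration_flags = flag_durations(interviews)
print(duration_flags)
```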
3. Overuse of skipping
Unfortunately, some enumerators may overuse skip and filter questions to complete an interview quickly. Count how many skip and filter questions in each interview are coded in a way that shortens it, and check the count against a plausible range. Tally such deviations for each enumerator and have a field supervisor carry out a thorough back-check investigation. Such enumerators rarely improve, and removing them may be the best option.
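One possible way to run this count is sketched below. The gateway questions, the answers assumed to trigger a skip, and the 80% ceiling are all hypothetical; in practice the plausible range would come from the sample as a whole or from prior surveys.

```python
from collections import defaultdict

# Hypothetical gateway questions and the answer that skips the follow-up block.
SKIP_TRIGGERS = {
    "has_children_under_5": "No",
    "owns_livestock": "No",
    "was_ill_last_month": "No",
}
MAX_PLAUSIBLE_SHARE = 0.8  # assumed upper bound on the share of skips taken

interviews = [
    {"id": "R001", "enumerator": "E01", "has_children_under_5": "No",
     "owns_livestock": "No", "was_ill_last_month": "No"},
    {"id": "R002", "enumerator": "E01", "has_children_under_5": "No",
     "owns_livestock": "No", "was_ill_last_month": "No"},
    {"id": "R003", "enumerator": "E02", "has_children_under_5": "Yes",
     "owns_livestock": "No", "was_ill_last_month": "Yes"},
    {"id": "R004", "enumerator": "E02", "has_children_under_5": "Yes",
     "owns_livestock": "Yes", "was_ill_last_month": "No"},
]

def skip_share_by_enumerator(rows, triggers):
    """Share of gateway questions coded with the skip-triggering answer."""
    taken = defaultdict(int)
    asked = defaultdict(int)
    for row in rows:
        for question, trigger in triggers.items():
            if question in row:
                asked[row["enumerator"]] += 1
                if row[question] == trigger:
                    taken[row["enumerator"]] += 1
    return {enum: taken[enum] / asked[enum] for enum in asked}

shares = skip_share_by_enumerator(interviews, SKIP_TRIGGERS)
flagged = sorted(e for e, s in shares.items() if s > MAX_PLAUSIBLE_SHARE)
print(shares, flagged)
```

A flag here is only a prompt for the supervisor's back check, since a village may genuinely have few eligible households.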
4. Straight-liners
Straight-lining is when an enumerator codes Likert-scale questions in a systematic pattern. There are two typical patterns:
- the same code/rating for all questions/statements (e.g., always selecting the code “3” on a scale of 1-5); and
- alternating between two codes/ratings in a predictable manner across a series (e.g., alternating between the ratings 2 and 4 on a 5-point scale).
It is very difficult to find straight-liners in a large dataset. However, you can shortlist 3-4 questions where this is likely and focus on those in back-office checks. Always look for such patterns by enumerator. Often they are due to the enumerator’s inability to read or explain questions and can be fixed through re-training in the field.
Another solution lies in CAPI design: randomly reverse the direction of the rating scale, from worst-to-best to best-to-worst.
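The two patterns described in this section can be detected with a short script. The sketch below is one possible approach, assuming each interview's Likert answers are available as an ordered list of integer codes; the sample data and the minimum series length of four are assumptions.

```python
from collections import Counter

def is_constant(ratings):
    """True when every item in the series has the same rating."""
    return len(ratings) > 1 and len(set(ratings)) == 1

def is_alternating(ratings):
    """True for a strict a, b, a, b, ... pattern with a != b."""
    if len(ratings) < 4:  # too short to call it a pattern (assumed minimum)
        return False
    a, b = ratings[0], ratings[1]
    if a == b:
        return False
    return all(r == (a if i % 2 == 0 else b) for i, r in enumerate(ratings))

def straight_line_pattern(ratings):
    """Return the name of the pattern found, or None."""
    if is_constant(ratings):
        return "constant"
    if is_alternating(ratings):
        return "alternating"
    return None

# Hypothetical (enumerator, ratings for one Likert battery) pairs.
responses = [
    ("E01", [3, 3, 3, 3, 3]),
    ("E01", [2, 4, 2, 4, 2]),
    ("E02", [4, 2, 5, 3, 4]),
    ("E01", [3, 3, 3, 3, 3]),
]

# Count suspicious interviews per enumerator.
counts = Counter(
    enum for enum, ratings in responses if straight_line_pattern(ratings)
)
print(counts)
```

A single straight-lined interview can be a genuine response, so it is the count per enumerator that matters.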
5. Background audio recording (Audio-audits)
Many CAPI software packages can record audio in the background during an interview. Use this feature for 2-3 questions where an incorrect reading can lead to wrong answers, or where asking the question is critical (say, the administration of consent). The back office can check:
- whether the audio recording matches the answer entered; and
- whether the enumerator read out or explained the question correctly.
We suggest that about 5% of forms receive this audio audit.
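If the back office picks the forms to audit itself, a reproducible random draw is one option. The sketch below assumes a list of form IDs and a fixed seed so the same selection can be regenerated; the ID format and seed value are made up for the example.

```python
import math
import random

def sample_for_audio_audit(form_ids, percent=5, seed=2020):
    """Draw about `percent`% of forms at random, with at least one form."""
    form_ids = list(form_ids)
    if not form_ids:
        return []
    n = max(1, math.ceil(len(form_ids) * percent / 100))
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    return sorted(rng.sample(form_ids, n))

# Hypothetical form IDs F001 .. F200.
form_ids = [f"F{i:03d}" for i in range(1, 201)]
selected = sample_for_audio_audit(form_ids)
print(len(selected), selected[:3])
```

Drawing the sample separately for each enumerator would ensure that everyone's work is audited, at the cost of a slightly larger sample.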
6. GPS Location

The CAPI app should automatically capture the GPS location whenever an interview starts. Use the location to make sure that the interview takes place in the sampled village only. For example, NEERMAN uses a proprietary app to record enumerator attendance and verify that enumerators are in the sampled village. The back office can estimate the distance between the expected and actual points of interview and flag suspicious cases. Note, however, that A-GPS often latches onto the location of the cell tower, and many smartphones cannot detect location accurately in rural areas. The back office must therefore be generous with its error allowance.
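The distance between the expected and the recorded location can be estimated with the haversine formula. The sketch below assumes each sampled village has a reference coordinate; the village code, coordinates, and the 2 km tolerance are illustrative assumptions, with the tolerance kept deliberately generous.

```python
import math

EARTH_RADIUS_KM = 6371.0
TOLERANCE_KM = 2.0  # assumed allowance; widen it where GPS fixes are poor

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical village reference points and interview GPS fixes.
villages = {"V01": (19.0, 73.0)}
interviews = [
    {"id": "R001", "village": "V01", "lat": 19.001, "lon": 73.001},
    {"id": "R002", "village": "V01", "lat": 19.1, "lon": 73.0},
]

def flag_far_interviews(rows, reference, tolerance_km=TOLERANCE_KM):
    """Return IDs of interviews recorded too far from the sampled village."""
    flagged = []
    for row in rows:
        ref_lat, ref_lon = reference[row["village"]]
        distance = haversine_km(ref_lat, ref_lon, row["lat"], row["lon"])
        if distance > tolerance_km:
            flagged.append(row["id"])
    return flagged

far = flag_far_interviews(interviews, villages)
print(far)
```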
7. Dealing with excessive “Don’t know” and “Other (Specify)” answers
Some enumerators do not explain or read questions properly and rush by coding the answer as “don’t know / cannot say”. Others do not know the pre-coded answer options well enough, so they choose “Others (specify)” and type the answer as text instead of selecting a pre-coded option. The back office should identify 15-20 such questions and count how many times these answers are coded. If the proportion exceeds 2-3% for a question, then either
- the question’s wording or options are not correct and need urgent investigation by the researchers; or
- the enumerator needs re-training or some disciplinary action.
Remember, back-office checks are the final defense against poor data quality. A high proportion or frequency of the above problems at the back office almost certainly indicates major shortcomings in the questionnaire design, CAPI app, field training, or field-level data quality assurance.
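The proportion check for these answers could be sketched as follows. The question names, answer labels, and data are hypothetical, and the 3% threshold is taken from the upper end of the range suggested above.

```python
# Answer labels treated as "don't know" or "other"; adjust to your codebook.
FLAG_CODES = {"Don't know", "Other (specify)"}
THRESHOLD = 0.03  # upper end of the 2-3% range

# Hypothetical data: 100 interviews with two monitored questions.
rows = [
    {"enumerator": "E01", "water_source": "Tap", "toilet_type": "Pit latrine"}
    for _ in range(100)
]
for i in range(5):
    rows[i]["water_source"] = "Don't know"
rows[0]["toilet_type"] = "Other (specify)"

def flagged_share_by_question(data, questions, codes=FLAG_CODES):
    """Share of answers per question that fall in the flagged codes."""
    shares = {}
    for question in questions:
        answers = [row[question] for row in data if question in row]
        if answers:
            shares[question] = sum(a in codes for a in answers) / len(answers)
    return shares

shares = flagged_share_by_question(rows, ["water_source", "toilet_type"])
over_threshold = sorted(q for q, s in shares.items() if s > THRESHOLD)
print(shares, over_threshold)
```

Running the same count per enumerator helps separate a badly worded question, which shows up for everyone, from a single enumerator who needs re-training.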