What determines the Sample Size for Cross-Sectional Surveys?
To call sampling an art guided by statistical theory is an understatement. Naturally, this part of the research design is usually entrusted to experts. However, those who commission and use survey research ought to know the basics so they can make informed decisions. We hope this blog explains in practical terms what determines the sample size for a descriptive cross-sectional survey. We focus on sampling to measure a proportion or prevalence, which is a more common variable of interest in social surveys than a continuous quantity. However, the concepts remain the same.
How the Sample Size is determined

Many of our readers might have come across the magical sample size of 384. With a sample of 384, we can measure a true population proportion of 50% (which, ironically, we don’t know without a census) with a 5-percentage-point error on either side (error range: 45% to 55%) with 95% confidence. This is the most common, standard set of assumptions; many use it (sometimes mechanically) and arrive at a sample size of 384, as denoted by the small blue stack in the middle panel of the left-hand figure.

We pay a premium in terms of a larger sample size if we want higher confidence. For example, if we needed 99% confidence that the true proportion is between 45% and 55%, we would need a sample of 664. We pay even more dearly for tighter, more precise estimates. For example, to measure a population proportion of 50% with a ±1-percentage-point error (49% to 51%) with 95% confidence, we would need a sample of 9,604.
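The figures above can be reproduced with the textbook formula for a proportion under simple random sampling, n = z² × p × (1 − p) / e². The sketch below is only an illustration: the function name `sample_size`, the table of rounded z-values, and rounding to the nearest whole number are our assumptions, not something the post prescribes (many practitioners round up instead).

```python
# Conventional rounded z-values for two-sided confidence levels (assumed table).
Z_VALUES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}


def sample_size(p=0.5, margin=0.05, confidence=0.95):
    """Simple-random-sample size to estimate a proportion p within +/- margin."""
    z = Z_VALUES[confidence]
    n = z ** 2 * p * (1 - p) / margin ** 2
    # Rounded to the nearest integer here; rounding up is the more cautious choice.
    return round(n)


if __name__ == "__main__":
    print(sample_size())                  # 50% proportion, +/-5 points, 95% confidence
    print(sample_size(confidence=0.99))   # same margin, 99% confidence
    print(sample_size(margin=0.01))       # +/-1 point, 95% confidence
    print(sample_size(margin=0.01, confidence=0.90))  # +/-1 point, 90% confidence
```

Under these assumptions the sketch gives 384 and 664 for the ±5-point cases. For a ±1-point margin it gives 9,604 at 95% confidence and 6,765 at 90% confidence, which shows how strongly both precision and confidence drive the sample size.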
The required sample size is largest when measuring a proportion of 50%, for a given precision (error) and confidence level, as shown in the right-hand figure. Often, prior to the survey, we do not know what the proportion of an outcome or indicator will be in the population, so it is safest to assume a 50% prevalence or proportion.
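The reason 50% is the worst case is that the term p × (1 − p) in the standard formula peaks at p = 0.5. A small sketch, assuming a ±5-point margin, 95% confidence (z = 1.96), and a hypothetical helper named `sample_size_for`, makes this visible:

```python
def sample_size_for(p, margin=0.05, z=1.96):
    """Simple-random-sample size for proportion p (assumed: z = 1.96, i.e. 95%)."""
    return round(z ** 2 * p * (1 - p) / margin ** 2)


# Illustrative proportions only; any value between 0 and 1 can be tried.
proportions = (0.1, 0.3, 0.5, 0.7, 0.9)
sizes = {p: sample_size_for(p) for p in proportions}

for p, n in sizes.items():
    print(f"assumed proportion {p:.0%}: n = {n}")
```

The required sample shrinks symmetrically as the assumed proportion moves away from 50% in either direction, so assuming 50% never leaves the survey under-powered on this count.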
Sampling Approaches
The sample size we estimate is for the simple random sampling method, which is practically never used or usable. Imagine having to sample 687 people out of 1.3 billion across India! We, instead, sample using a multi-stage clustered sample where, for example, we first sample a few districts, then a few villages, and then a few households or people. In clustered sampling, we multiply the estimated sample size by a “design effect”, which depends on something called the intra-cluster correlation (ICC); these concepts are a bit too technical for this blog. The design effect typically varies from, say, 1.2 to 5 for most descriptive social-sector surveys, but can be larger if the sample design is not done ‘cleverly’ (this is where true expertise in sampling comes in).
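For readers who want a feel for the numbers, a commonly used approximation (not given in this post) is design effect = 1 + (m − 1) × ICC, where m is the average number of respondents per cluster. The sketch below assumes equal-sized clusters; the function names and the example values of 20 respondents per cluster and an ICC of 0.05 are purely illustrative.

```python
import math


def design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample_size(srs_n, cluster_size, icc):
    """Inflate a simple-random-sample size by the design effect, rounding up."""
    return math.ceil(srs_n * design_effect(cluster_size, icc))


if __name__ == "__main__":
    # Hypothetical inputs: 384 from simple random sampling, 20 people per village.
    deff = design_effect(cluster_size=20, icc=0.05)
    print(f"design effect = {deff:.2f}")
    print("clustered sample size =", clustered_sample_size(384, 20, 0.05))
```

With these illustrative inputs the design effect is 1.95, so the sample of 384 grows to 749. Sampling more clusters with fewer respondents in each lowers m and hence the design effect, which is one way a sample design can be made ‘clever’.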
Another sampling approach is stratified sampling, where we seek to survey different sub-populations or strata. It is best to think of each stratum as a different study with its own proportion, error rate, and confidence level needs. If all sampling assumptions are the same for two strata, then we need twice the sample needed for a single stratum. Our advice here is to think carefully about whether you need analysis stratified by sub-population (by sex, by district, by income group, etc.), and what level of precision and confidence you truly need.
Key Point about Sample Size
Finally, please remember that the SAMPLE SIZE DOES NOT DEPEND ON THE POPULATION SIZE (well… in most cases). When someone advises you to take a 10% or 5% sample of the target population, it is basically a guess (perhaps an educated or experienced one). A finite population correction is applied when the target population is less than, say, 15,000. This avoids situations where the required sample size is 5,000 and the total target population is, say, 6,000 (we would just do a census instead of a sample survey, wouldn’t we?).
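One standard form of the finite population correction is n_adjusted = n / (1 + (n − 1) / N), where n is the uncorrected sample size and N is the target population. The sketch below is a minimal illustration; the function name `fpc_adjust` and the choice to round up are our assumptions.

```python
import math


def fpc_adjust(n, population):
    """Apply the finite population correction to an uncorrected sample size n."""
    return math.ceil(n / (1 + (n - 1) / population))


if __name__ == "__main__":
    # Population sizes taken from the examples in the text above.
    for n, population in [(384, 1_300_000_000), (384, 15_000), (5_000, 6_000)]:
        print(f"n = {n}, N = {population}: adjusted n = {fpc_adjust(n, population)}")
```

For a population of 1.3 billion the correction leaves 384 unchanged, and at 15,000 it trims the sample only slightly, to 375. When the uncorrected sample is 5,000 and the population is 6,000, the corrected sample falls to 2,728, which is why the correction matters mainly for small target populations.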
NEERMAN’s free web resources include an Excel-based Sample Size Calculator to help you estimate sample size based on basic assumptions. We keep adding more tools and knowledge resources to our website, so we invite you to stay engaged with us for more.
