###### Stepped wedge study design

We used the sample size calculations for a stepped wedge design proposed by Woertman et al. (endnote 1). Their approach takes the unadjusted sample size ($n_{unadjusted}$) required for an individually randomised controlled trial comparing two proportions and then adjusts it by the design effect ($DEFF$) expected in a stepped wedge design. Their equation for estimating the design effect of a stepped wedge study is as follows:

$$DEFF_{\text {stepped wedge}} = \frac {1 + p(ktn + bn - 1)}{1 + p \left(\frac {1}{2}ktn + bn - 1 \right )} \times \frac {3(1-p)}{2t \left (k-\frac{1}{k} \right )}$$

where

$k = \text {number of steps}$
$b = \text {number of baseline measurements}$
$t = \text {number of measurements after each step}$
$p = \text {intra-cluster correlation coefficient (ICC)}$
$n = \text {number of subjects per cluster}$

To estimate the unadjusted sample size ($n_{unadjusted}$) needed for the comparison of two proportions, we use the following formula (endnote 2):

$$n_{\text {unadjusted}} = { \left (Z_{\frac {\alpha}{2}} + Z_{\beta} \right )}^2 \times \frac {p_1(1-p_1) + p_2(1-p_2)}{{(p_1 - p_2)}^2}$$

where

$Z_{\frac {\alpha}{2}} = \text {level of significance (1.96 for 5}\% \, \text {significance)}$
$Z_{\beta} = \text {desired power (0.84 for 80}\% \, \text {power)}$
$p_1 = \text {proportion for control group}$
$p_2 = \text {proportion for intervention group}$

We calculated the sample size based on an expected decrease of 5 percentage points in MAM prevalence (from 20% to 15%), with 80% power to detect this difference at a 5% level of significance. These parameters provide the following unadjusted sample size:

\begin{align} n_{\text {unadjusted}} &= {(1.96 + 0.84)}^2 \times \frac {0.20(1 - 0.20) + 0.15(1 - 0.15)}{{(0.20 - 0.15)}^2} \\ \\ &= 7.84 \times \frac {0.20(0.80) + 0.15(0.85)}{{(0.05)}^2} \\ \\ &= 7.84 \times \frac {0.16 + 0.1275}{0.0025} \\ \\ &= 7.84 \times \frac {0.2875}{0.0025} \\ \\ &= 7.84 \times 115 \\ \\ &\approx 902 \end{align}
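The arithmetic above can be reproduced with a minimal Python sketch. The function name `unadjusted_n_per_group` is an illustrative choice, not something from the cited papers, and the defaults are the rounded deviates used in the text (1.96 and 0.84) rather than more precise normal quantiles.

```python
import math


def unadjusted_n_per_group(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Per-group sample size for comparing two proportions, ignoring clustering.

    z_alpha and z_beta default to the rounded values used in the text
    (5% two-sided significance, 80% power).
    """
    variance_term = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance_term / (p1 - p2) ** 2


# MAM prevalence expected to fall from 20% (control) to 15% (intervention).
n_raw = unadjusted_n_per_group(0.20, 0.15)
n_per_group = math.ceil(n_raw)   # round up to whole subjects
n_total = 2 * n_per_group        # two comparison groups

print(f"raw = {n_raw:.1f}, per group = {n_per_group}, total = {n_total}")
```

With these inputs the raw value is about 901.6, which rounds up to 902 per group and 1804 in total.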

This results in an unadjusted sample size of 902 per group, for a total unadjusted sample size of 1804. We then adjust this total using the DEFF estimator specified above.

In the original design, which was the basis for the initial submission to 3ie, we used the following parameters to calculate the DEFF applied to the unadjusted sample size:

$k = 3 \ \text {steps}$
$b = 1 \ \text {baseline measurement}$
$t = 2 \ \text {measurements after each step}$
$p = 0.034 \ \text {intra-cluster correlation coefficient}$
$n = 192$

These parameters were based on a study design with 1 baseline measurement, taken while all areas or clusters are at ‘baseline’ status (as defined previously), and a rollout of the intervention staged at 4-month intervals over a one-year period, hence 3 steps. We planned to conduct 2 measurements at each step for each cluster or area. The first measurement would be done 2 months after the start of each step (i.e., when some clusters turn into intervention areas) and the second 2 months after the first. This allows one measurement during the start-up phase of the intervention, when its organisation, logistics and protocols are still being refined and institutionalised, and another 2 months later, when the intervention is well established and may already have had an effect. We estimated the intra-cluster correlation coefficient to be around 0.034 (endnote 3) and aimed for a minimum cluster size of 192, which is the minimum sample size needed to estimate GAM prevalence with the required relative precision of 30% (endnote 4) using a PROBIT estimator (endnote 5).

Given these parameters and considerations, we arrived at the following sample size calculations for the initial stepped wedge study design:

\begin{align} n_{\text {stepped wedge}} &= 1804 \times \frac {1 + 0.034(3 \times 2 \times 192 + 1 \times 192 - 1)}{1 + 0.034 \left (\frac {1}{2} \times 3 \times 2 \times 192 + 1 \times 192 - 1 \right )} \times \frac {3(1-0.034)}{2 \times 2 \left (3 - \frac {1}{3} \right )} \\ \\ &= 1804 \times \frac {1 + 0.034(1343)}{1 + 0.034(767)} \times \frac {3(0.966)}{4 \left (\frac{8}{3} \right )} \\ \\ &= 1804 \times \frac {46.662}{27.078} \times \frac {2.898}{10.67} \\ \\ &= 1804 \times 1.72324396 \times 0.27160262 \\ \\ &\approx 844 \end{align}
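As a cross-check, the design effect expression quoted earlier can be sketched in Python. The function name `stepped_wedge_deff` and its argument names are illustrative assumptions; the formula itself is the one given above, with `icc` standing for $p$.

```python
def stepped_wedge_deff(k, b, t, icc, n):
    """Stepped wedge design effect, following the expression quoted in the text.

    k: number of steps, b: baseline measurements, t: measurements per step,
    icc: intra-cluster correlation coefficient, n: subjects per cluster.
    """
    cluster_term = (1 + icc * (k * t * n + b * n - 1)) / (
        1 + icc * (0.5 * k * t * n + b * n - 1)
    )
    step_term = 3 * (1 - icc) / (2 * t * (k - 1 / k))
    return cluster_term * step_term


N_UNADJUSTED = 1804  # total unadjusted sample size for both groups

# Original design: 3 steps, 1 baseline, 2 measurements per step.
deff_original = stepped_wedge_deff(k=3, b=1, t=2, icc=0.034, n=192)
n_original = N_UNADJUSTED * deff_original

print(f"DEFF = {deff_original:.4f}, n = {n_original:.1f}")
```

Evaluated without intermediate rounding, this gives a design effect of about 0.468 and a sample size of about 844.6. The worked calculation rounds $4 \times \frac{8}{3}$ to 10.67, which is why it arrives at a slightly lower figure. The same function can be re-evaluated with other values of `k` and `t` to see how the number of steps and measurements changes the requirement.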

For the original design, we estimated a total sample size requirement of 844 children 6-59 months old for the study (endnote 6). Given a minimum cluster size of 192 children (as mentioned in the outcome measures section), we would need about 5 clusters. We decided to bring this up to 6 so that the same number of clusters switches at each of the 3 steps of the study. With 6 clusters, 2 clusters switch over to being intervention areas at every step (4-month intervals). By the third step, all clusters would have become intervention areas.

However, the current time available for the study will not allow for such a study design. A full year of data collection will not be possible anymore. At best, we will only have about 9 months of data collection for the study. Given this timeframe, we reset the parameters of the study design as follows:

$k = 4 \ \text {steps}$
$b = 1 \ \text {baseline measurement}$
$t = 1 \ \text {measurement after each step}$
$p = 0.034 \ \text {intra-cluster correlation coefficient}$
$n = 192$

We now plan to roll out the programme in 4 steps at 2-month intervals, with 1 measurement made at each step. This design maintains the 2-monthly measurement intervals of the previous design but with a shorter gap between steps. It was chosen to stay as close as possible to the temporal resolution of the original design without inflating the sample size to an unreasonable level (endnote 7). Using these new parameters, we arrive at the following new sample size calculations:

\begin{align} n_{\text {stepped wedge}} &= 1804 \times \frac {1 + 0.034(4 \times 1 \times 192 + 1 \times 192 - 1)}{1 + 0.034 \left (\frac {1}{2} \times 4 \times 1 \times 192 + 1 \times 192 - 1 \right )} \times \frac {3(1-0.034)}{2 \times 1 \left (4 - \frac {1}{4} \right )} \\ \\ &= 1804 \times \frac {1 + 0.034(959)}{1 + 0.034(575)} \times \frac {3(0.966)}{2 \times 1 \left (\frac{15}{4} \right )} \\ \\ &= 1804 \times \frac {33.606}{20.55} \times \frac {2.898}{7.5} \\ \\ &= 1804 \times 1.635328 \times 0.3864 \\ \\ &\approx 1140 \end{align}

The new design requires a sample size of 1140 (endnote 8), which is 296 more than the original design. However, this does not increase the number of clusters needed, as six clusters of 192 each give a total of 1152, just enough for the new design. Because we had already raised the number of clusters to 6 in the original design, there is no net change in the planned number of clusters or in the total planned sample size.
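The cluster arithmetic behind this comparison can be checked with a short Python sketch. It takes the reported totals of 844 and 1140 as given; the variable names are illustrative.

```python
import math

CLUSTER_SIZE = 192      # minimum cluster size used in the text
PLANNED_CLUSTERS = 6    # 2 clusters switching per step in the original design

# Total sample sizes as reported for each design.
reported_totals = {"original": 844, "revised": 1140}

# Whole clusters needed to reach each total.
clusters_needed = {
    name: math.ceil(total / CLUSTER_SIZE)
    for name, total in reported_totals.items()
}

capacity = PLANNED_CLUSTERS * CLUSTER_SIZE           # sample available from 6 clusters
margin = capacity - reported_totals["revised"]       # headroom under the revised design
difference = reported_totals["revised"] - reported_totals["original"]

print(clusters_needed, capacity, margin, difference)
```

The original design needs 5 clusters and the revised design 6, and six clusters of 192 supply 1152 children, which leaves a margin of 12 over the revised requirement.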

###### Incidence sub-study

For the incidence sub-study, we apply the sample size calculation in person-years ($y_{\text {person-years}}$) proposed by Hayes and Bennett (endnote 9) for an individually randomised controlled trial, as follows:

$$y_{\text {person-years}} = \left (Z_{\frac {\alpha}{2}} + Z_\beta \right) ^ 2 \times \frac {\lambda_0 + \lambda_1}{(\lambda_0 - \lambda_1) ^ 2}$$

where

$\lambda_0 = \text {incidence rate in control group}$
$\lambda_1 = \text {incidence rate in intervention group}$

We use a value of $\lambda_0 = 0.32$ (assuming a prevalence rate of 20% in the control group) and a value of $\lambda_1 = 0.24$ (assuming a prevalence rate of 15% in the intervention group). This gives us a sample size for one arm of the incidence study of:

$$y_{\text {person-years}} = (1.96 + 0.84) ^ 2 \times \frac {0.32 + 0.24}{(0.32 - 0.24) ^ 2} \approx 686$$
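The person-years figure can be verified with a small Python sketch. The function name `person_years_per_arm` is an illustrative assumption, and the deviates default to the rounded values used in the text.

```python
def person_years_per_arm(rate_control, rate_intervention, z_alpha=1.96, z_beta=0.84):
    """Person-years per arm needed to compare two incidence rates.

    Implements the individually randomised formula quoted in the text,
    with rounded deviates for 5% significance and 80% power as defaults.
    """
    rate_sum = rate_control + rate_intervention
    rate_diff = rate_control - rate_intervention
    return (z_alpha + z_beta) ** 2 * rate_sum / rate_diff ** 2


y_per_arm = person_years_per_arm(0.32, 0.24)
y_total = 2 * round(y_per_arm)  # both study arms

print(f"per arm = {y_per_arm:.1f}, total = {y_total}")
```

With incidence rates of 0.32 and 0.24 this gives 686 person-years per arm, or 1372 across both arms.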

For both arms, we would therefore need a total of 1372 person-years. To calculate the number of clusters needed based on this sample size, we use the following formula:

$$n_{\text {clusters}} = 1 + \left (Z_{\frac {\alpha}{2}} + Z_\beta \right ) ^ 2 \times \frac {\frac {\lambda_0 \ + \ \lambda_1}{y_{\text {person-years}}} \ + \ {k ^ 2}({\lambda_0} ^ 2 \ + \ {\lambda_1} ^ 2)}{(\lambda_0 \ - \ \lambda_1) ^ 2}$$

where

$k = \text {intra-cluster correlation coefficient which we set at 0.034}$

The formula gives us:

$$n_{\text {clusters}} = 1 + (1.96 + 0.84) ^ 2 \times \frac {\frac {0.32 \ + \ 0.24}{1372} \ + \ 0.034 ^ 2 (0.32 ^ 2 \ + \ 0.24 ^ 2)}{(0.32 \ - \ 0.24) ^ 2} \approx 2$$

So, we will need a sample of 1372 person-years (686 per arm) from 2 clusters (one in each study arm).

###### Endnotes

1 See Woertman, Willem, Esther de Hoop, Mirjam Moerbeek, Sytse U Zuidema, Debby L Gerritsen, and Steven Teerenstra. “Stepped Wedge Designs Could Reduce the Required Sample Size in Cluster Randomized Trials.” Journal of Clinical Epidemiology 66, no. 7 (July 1, 2013): 752–58. doi:10.1016/j.jclinepi.2013.01.009.

2 As recommended by Hayes, R J, and S Bennett. “Simple Sample Size Calculation for Cluster-Randomized Trials.” International Journal of Epidemiology 28, no. 2 (April 1999): 319–26. doi:10.1093/ije/28.2.319.

3 Based on ICC recommendations in Kaiser, Reinhard, Bradley A Woodruff, Oleg Bilukha, Paul B Spiegel, and Peter Salama. “Using Design Effects From Previous Cluster Surveys to Guide Sample Size Calculation in Emergency Settings.” Disasters 30, no. 2 (May 31, 2006): 199–211. doi:10.1111/j.0361-3666.2006.00315.x.

4 As recommended by Prudhon, Claudine, and Paul B Spiegel. “A Review of Methodology and Analysis of Nutrition and Mortality Surveys Conducted in Humanitarian Emergencies From October 1993 to April 2004.” Emerging Themes in Epidemiology 4, no. 1 (2007): 10. doi:10.1186/1742-7622-4-10.

5 This is based on sample size simulations for a PROBIT estimator for GAM prevalence conducted by Brixton Health and Valid International (documentation available on request).

6 The same sample size will be needed for each of the target groups for each of the outcome measures to be assessed directly from the main study. These target groups are: 1) PLW (for measurement of prevalence among PLW); 2) MAM children 6-59 months old (for measurement of MAM treatment coverage); 3) children 6-23 months (for measurement of eBSFP or blanket FBPM coverage); and 4) children 6-23 months at risk (for measurement of targeted FBPM coverage).

7 Both the number of steps and the number of measurements per step affect the sample size in a stepped wedge design.

8 See endnote 6.

9 See Hayes, R J, and S Bennett. “Simple Sample Size Calculation for Cluster-Randomized Trials.” International Journal of Epidemiology 28, no. 2 (April 1999): 319–26. doi:10.1093/ije/28.2.319.