# Sample size

###### Stepped wedge study design

Given the further delay in starting the study, as described in the Design section, the sample size requirements for this study were revisited once more.

The previous target of starting data collection (baseline) in April 2016 is no longer tenable because no resources are available to start the baseline. The main implication of this delay is that the implementation window of the study is narrowing and the number of data collection rounds is potentially dwindling, which has major knock-on effects on sample size.

It has been decided to drop the baseline altogether and keep the start of interventions in May 2016. We would start the first round of incidence data collection in the first week of May 2016, lasting 2 weeks, continue with the second round of incidence data collection in June, and hold the first round of stepped wedge data collection by the end of June 2016. Keeping the number of steps at 4, the final stepped wedge data collection would take place in the last 2 weeks of December 2016. This option roughly maintains the amount of time previously allocated for data analysis and ensures that we deliver our outputs to 3ie by the March 2016 deadline.

However, dropping the baseline has a sample size implication. A baseline round has two benefits for the study: it reduces both the overall and the per-cluster sample size requirement, and it increases the power of the study to detect differences. In general, a baseline makes the study stronger. Losing the baseline therefore requires an increase in sample size to make up for the precision lost by giving it up. The sample size increase is reflected in the following calculations:

$ k = 4 \ \text {steps} $

$ b = 0 \ \text {baseline measurements} $

$ t = 1 \ \text {measurement after each step} $

$ p = 0.034 \ \text {intra-cluster correlation coefficient} $

$ n = 192 \ \text {cluster size} $

Given these parameters, we arrive at the following sample size:

$$ \begin{align}
n_{\text {stepped wedge}} &= 1804 \times \frac {1 + 0.034 \left (4 \times 1 \times 192 + 0 \times 192 - 1 \right )}{1 + 0.034 \left (\frac {1}{2} \times 4 \times 1 \times 192 + 0 \times 192 - 1 \right )} \times \frac {3(1-0.034)}{2 \times 1 \left (4 - \frac {1}{4} \right )} \\
&= 1804 \times \frac {1 + 0.034(767)}{1 + 0.034(383)} \times \frac {3(0.966)}{2 \left (\frac{15}{4} \right )} \\
&= 1804 \times \frac {27.078}{14.022} \times \frac {2.898}{7.5} \\
&= 1804 \times 1.93110826 \times 0.3864 \\
&\approx 1346
\end{align}$$

This sample size is 206 samples higher than in the original design. It corresponds to 7 clusters with a size of 192 each. It will still be possible to keep the 6 study cluster structure, but we will then need about 224 samples ($ 1346 / 6 \approx 224 $) within each of the study clusters.

This sample size increase is not excessively large and can be accommodated with minor adjustments in design. This option is also less disruptive than the previous option: it can be implemented without changing the study design considerably and without needing to negotiate yet another extension of the deadline.
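The stepped wedge arithmetic above can be reproduced with a short script. This is a minimal sketch, not the study's own code: the function name `stepped_wedge_design_effect` and the variable names are illustrative, and the value 1804 is taken as given from the calculation above. The second call assumes that the original design differed only in having one baseline measurement ($ b = 1 $), which is an assumption made here to illustrate the 206-sample difference.

```python
def stepped_wedge_design_effect(k, b, t, rho, n):
    """Design effect for a stepped wedge design, as used in the calculation above.

    k: number of steps, b: number of baseline measurements,
    t: measurements after each step, rho: intra-cluster correlation,
    n: cluster size.
    """
    cluster_term = (1 + rho * (k * t * n + b * n - 1)) / (
        1 + rho * (0.5 * k * t * n + b * n - 1)
    )
    step_term = 3 * (1 - rho) / (2 * t * (k - 1 / k))
    return cluster_term * step_term


n_unadjusted = 1804  # unadjusted sample size, taken as given from the text

# Design without a baseline round (b = 0), as in the calculation above.
n_no_baseline = n_unadjusted * stepped_wedge_design_effect(
    k=4, b=0, t=1, rho=0.034, n=192
)

# Assumption: the original design had one baseline measurement (b = 1).
n_with_baseline = n_unadjusted * stepped_wedge_design_effect(
    k=4, b=1, t=1, rho=0.034, n=192
)

print(round(n_no_baseline))                           # 1346
print(round(n_with_baseline))                         # 1140
print(round(n_no_baseline) - round(n_with_baseline))  # 206
print(n_no_baseline / 192)  # roughly 7 clusters of 192
print(n_no_baseline / 6)    # roughly 224 per cluster with 6 clusters
```

Changing `b`, `k` or `n` in the calls shows how sensitive the requirement is to the number of measurement rounds and to the cluster size.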

###### Incidence sub-study

For the incidence sub-study, we calculate the sample size in person-years, $ y_{\text {person-years}} $, using the approach proposed by Hayes and Bennett^{8} for an individually-randomised cluster controlled trial as follows:

$$ y_{\text {person-years}} = \left (Z_{\frac {\alpha}{2}} + Z_\beta \right) ^ 2 \times \frac {\lambda_0 + \lambda_1}{(\lambda_0 - \lambda_1) ^ 2} $$

where

$ \lambda_0 = \text {incidence rate in control group} $

$ \lambda_1 = \text {incidence rate in intervention group} $

We use a value of $ \lambda_0 = 0.32 $ (assuming a prevalence rate of 20% in the control group) and a value of $ \lambda_1 = 0.24 $ (assuming a prevalence rate of 15% in the intervention group). This gives us a sample size for one arm of the incidence study of:

$$ y_{\text {person-years}} = (1.96 + 0.84) ^ 2 \times \frac {0.32 + 0.24}{(0.32 - 0.24) ^ 2} \approx 686 $$
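The person-years calculation can be checked with a few lines of Python. This is a minimal sketch with an illustrative function name (`person_years_per_arm`); the default values 1.96 and 0.84 are the ones used in the formula above and are the usual standard normal values for a two-sided 5% significance level and 80% power.

```python
def person_years_per_arm(lambda_0, lambda_1, z_alpha_2=1.96, z_beta=0.84):
    """Person-years needed in each arm to detect a difference between
    the incidence rates lambda_0 (control) and lambda_1 (intervention)."""
    return (
        (z_alpha_2 + z_beta) ** 2
        * (lambda_0 + lambda_1)
        / (lambda_0 - lambda_1) ** 2
    )


y_arm = person_years_per_arm(lambda_0=0.32, lambda_1=0.24)

print(round(y_arm))      # 686 person-years per arm
print(round(2 * y_arm))  # 1372 person-years for both arms
```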

For both arms, we would therefore need a total of 1372 person-years. To calculate the number of clusters needed based on this sample size, we use the following formula:

$$ n_{\text {clusters}} = 1 + \left (Z_{\frac {\alpha}{2}} + Z_\beta \right ) ^ 2 \times \frac {\frac {\lambda_0 \ + \ \lambda_1}{y_{\text {person-years}}} \ + \ {k ^ 2}({\lambda_0} ^ 2 \ + \ {\lambda_1} ^ 2)}{(\lambda_0 \ - \ \lambda_1) ^ 2} $$

where

$ k = \text {intra-cluster correlation coefficient which we set at 0.034} $

The formula gives us:

$$ n_{\text {clusters}} = 1 + (1.96 + 0.84) ^ 2 \times \frac {\frac {0.32 \ + \ 0.24}{1372} \ + \ 0.034 ^ 2 (0.32 ^ 2 \ + \ 0.24 ^ 2)}{(0.32 \ - \ 0.24) ^ 2} \approx 2 $$

We will therefore need a sample of 1372 person-years (686 per arm) drawn from 2 clusters (one in each study arm).
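The cluster calculation can be sketched in the same way. The function name `clusters_needed` is illustrative. The sketch assumes the form of the cluster formula in which the $ k ^ 2 $ term is added to $ (\lambda_0 + \lambda_1) / y_{\text {person-years}} $ in the numerator, and it uses the total of 1372 person-years and $ k = 0.034 $ exactly as in the text above. The result is rounded up to a whole number of clusters.

```python
import math


def clusters_needed(lambda_0, lambda_1, y, k, z_alpha_2=1.96, z_beta=0.84):
    """Number of clusters, assuming the k^2 term is added to
    (lambda_0 + lambda_1) / y in the numerator of the formula."""
    numerator = (lambda_0 + lambda_1) / y + k ** 2 * (
        lambda_0 ** 2 + lambda_1 ** 2
    )
    return 1 + (z_alpha_2 + z_beta) ** 2 * numerator / (lambda_0 - lambda_1) ** 2


c = clusters_needed(lambda_0=0.32, lambda_1=0.24, y=1372, k=0.034)

print(round(c, 2))   # 1.73
print(math.ceil(c))  # 2 clusters after rounding up
```

With these inputs the between-cluster term is small, so the result stays below 2 before rounding; a larger value of `k` would increase the number of clusters required.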

###### Endnotes

^{1}

^{2}

^{3}

^{4}

^{5}

^{6}

^{7}

^{8}

^{9}