We outline below our preliminary ideas and plans for analysing the datasets that will be collected by the proposed impact evaluation study, as per the study design. We first present how we will analyse the indicators on which data will be collected. Then we describe how we will aggregate, summarise, present and compare these indicators so as to show any change that may be attributed to the FBPM programme in Sudan.

##### Indicator sets

###### GAM prevalence

GAM prevalence will be measured in each study area or cluster. GAM prevalence will be reported both for children under five years and for pregnant and lactating women (PLW). GAM prevalence will be estimated using a **PROBIT** estimator applied to the observed mean and standard deviation of the MUAC data collected at every data collection round of the stepped wedge study^{1}.

The **PROBIT** function is also known as the inverse cumulative distribution function or the quantile function. This function converts parameters of the distribution of an indicator (e.g., the mean and standard deviation of a normally distributed variable) into cumulative percentiles. This means that the **PROBIT** function can be applied to the mean and standard deviation of the MUAC data to estimate the proportion of under-five children or PLW falling below a given MUAC threshold.

For example, for data with a mean MUAC of 142 mm and a standard deviation of 14.5 mm, the output of the **PROBIT** function for a threshold of 125 mm (GAM cut-off) is 0.1205, which means that 12.05% of children are predicted to fall below the 125 mm threshold. This is the estimate of the prevalence of GAM for children under five years.

The same approach applies to MUAC data from PLW. For example, for data with a mean MUAC of 256 mm and a standard deviation of 28 mm, the output of the **PROBIT** function for a threshold of 210 mm is 0.0502, meaning that 5.02% of the PLW are predicted (or estimated) to fall below the 210 mm threshold.
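The two worked examples above can be reproduced with a minimal sketch, assuming (as the estimator itself does) that MUAC is normally distributed. The function name `probit_prevalence` is hypothetical and is used here for illustration only.

```python
from statistics import NormalDist


def probit_prevalence(mean_muac: float, sd_muac: float, threshold: float) -> float:
    """Proportion of a normal MUAC distribution that falls below the threshold."""
    return NormalDist(mu=mean_muac, sigma=sd_muac).cdf(threshold)


# Children under five: mean 142 mm, SD 14.5 mm, GAM cut-off 125 mm
child_gam = probit_prevalence(142, 14.5, 125)

# PLW: mean 256 mm, SD 28 mm, cut-off 210 mm
plw_gam = probit_prevalence(256, 28, 210)

print(f"Children: {child_gam:.2%}, PLW: {plw_gam:.2%}")
```

The only inputs are the mean, the standard deviation and the cut-off, which is why the approach needs a smaller sample than counting cases directly.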

Both the classic and the **PROBIT** methods can be thought of as estimating an area, as shown in Figure 1 below.

**Figure 1:** Comparison between classic and PROBIT estimators

The principal advantage of the **PROBIT** approach is that the required sample size is usually smaller than that required to estimate prevalence with a given precision using the classic method^{2}. The **PROBIT** method assumes that MUAC is a normally distributed variable. If this is not the case then the distribution of MUAC is transformed towards normality. The prevalence of SAM is estimated in a similar way to GAM. The prevalence of MAM is estimated as the difference between the GAM and SAM prevalence estimates as shown below.

$$ \text{MAM prevalence} = \text{GAM prevalence} - \text{SAM prevalence} $$
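As an illustration of how the three prevalence estimates relate, the sketch below derives GAM, SAM and MAM from a sample of child MUAC measurements. The SAM cut-off of 115 mm is an assumption (it is not stated in this section), the function name is hypothetical, and no transformation towards normality is attempted.

```python
from statistics import NormalDist

GAM_CUTOFF_MM = 125  # GAM cut-off for children, as given in the text
SAM_CUTOFF_MM = 115  # assumption: SAM cut-off is not stated in this section


def muac_prevalences(muac_mm):
    """PROBIT-style GAM, SAM and MAM estimates from raw MUAC values (mm)."""
    dist = NormalDist.from_samples(muac_mm)  # uses the sample mean and SD
    gam = dist.cdf(GAM_CUTOFF_MM)
    sam = dist.cdf(SAM_CUTOFF_MM)
    return {"gam": gam, "sam": sam, "mam": gam - sam}


# Hypothetical measurements, for illustration only
sample = [118, 126, 131, 135, 139, 142, 144, 148, 151, 155, 160, 167]
estimates = muac_prevalences(sample)
print({name: round(value, 4) for name, value in estimates.items()})
```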

###### GAM incidence

Given the changes in the study design^{3}, GAM incidence will not be estimated classically. Instead, we will proxy GAM incidence using a time-to-event analysis, in this case a time-to-undernutrition metric. A Kaplan-Meier survival curve analysis will be applied to the data collected for the incidence sub-study in order to report on the proportion of children becoming acutely undernourished at monthly intervals and on the average number of months before a child becomes acutely undernourished.
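A minimal sketch of the Kaplan-Meier product-limit estimator is given below, assuming follow-up time is recorded in months and that each child is either observed to become acutely undernourished (an event) or is censored. The function names are hypothetical, and the use of the median time as the summary measure is an assumption made for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time in months for each child
    events : True if the child became acutely undernourished, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all children whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def median_time(curve):
    """First time at which the survival probability drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Hypothetical follow-up data, for illustration only
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
for month, s in curve:
    print(f"month {month}: {1 - s:.0%} have become acutely undernourished")
```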

###### Programme coverage

Various programme coverage indicators will be measured as a nested survey in the GAM prevalence surveys. Eligibility to each of the various MAM treatment and prevention packages will be determined and then various coverage estimators will be assessed. Given that there are multiple intervention components of the MAM treatment and prevention packages, various coverage estimators will be used. Specifically, we will assess the following coverage indicators:

- MAM case-finding effectiveness for children – this is defined as children 6-59 months who are current MAM cases^{4} in TSFP out of the total number of children 6-59 months who are current MAM cases.

- MAM treatment coverage for children – this is defined as the children 6-59 months who are current or recovering MAM cases^{5} in TSFP out of the total children 6-59 months who are current and recovering MAM cases.

- MAM case-finding effectiveness for PLW – this is defined as PLW who are current MAM cases^{6} in TSFP out of the total PLW who are current MAM cases.

- MAM treatment coverage for PLW – this is defined as PLW who are current and recovering MAM cases in TSFP out of the total PLW who are current and recovering MAM cases.

- Targeted MAM prevention coverage for children – this is defined as children 6-23 months old who are at risk^{7} in targeted FBPM programme out of all children 6-23 months old who are at risk.

- Targeted MAM prevention coverage for PLW – this is defined as PLW who are at risk^{7} in targeted FBPM programme out of all PLW who are at risk.

- Blanket MAM prevention coverage for children – this is defined as children 6-23 months old in blanket FBPM out of all children 6-23 months old.

- Blanket MAM prevention coverage for PLW – this is defined as PLW in blanket FBPM out of all PLW.

- Home fortification coverage – this is defined as children 6-59 months old not eligible for TSFP or FBPM^{9} receiving home fortification out of all children 6-59 months old not eligible for TSFP or FBPM.

- SBCC coverage – this is defined as mothers and/or caregivers of children 6-59 months old and PLW who have received or participated in at least 1 appropriate education session and/or individual counselling session in the past month out of the total of mothers and/or caregivers of children 6-59 months old and PLW.

- Mothers groups coverage – this is defined as mothers and/or caregivers of children 6-59 months old and PLW enrolled in mothers clubs out of the total of mothers and/or caregivers of children 6-59 months old and PLW.

- Care groups coverage – this is defined as mothers and/or caregivers of children 6-59 months old and PLW enrolled in care groups out of the total of mothers and/or caregivers of children 6-59 months old and PLW.

All of the above coverage indicators will be calculated in each study area or cluster at each data collection round. We estimate that the sample size for each coverage indicator will be sufficient at the very least to classify coverage ($ n = 40 $) and at best to estimate coverage ($ n = 96 $).
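Each coverage indicator above is a simple proportion of eligible individuals who are in the relevant programme. The sketch below shows the two MAM treatment estimators for one cluster. The function and argument names are hypothetical, and it assumes that recovering cases can only be identified among those enrolled in TSFP, so that the denominator for treatment coverage is current cases in and out of the programme plus recovering cases in the programme.

```python
def case_finding_effectiveness(current_in: int, current_out: int) -> float:
    """Current MAM cases in TSFP out of all current MAM cases."""
    return current_in / (current_in + current_out)


def treatment_coverage(current_in: int, recovering_in: int, current_out: int) -> float:
    """Current or recovering MAM cases in TSFP out of all current and recovering cases.

    Assumption: recovering cases are only observed among those enrolled in TSFP.
    """
    in_programme = current_in + recovering_in
    return in_programme / (in_programme + current_out)


# Hypothetical counts for one cluster, for illustration only
cfe = case_finding_effectiveness(current_in=12, current_out=28)
tc = treatment_coverage(current_in=12, recovering_in=10, current_out=28)
print(f"Case-finding effectiveness: {cfe:.1%}; treatment coverage: {tc:.1%}")
```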

###### Cost-effectiveness

We will use activity-based costing^{10} with relevant costs, for both provider and participant, grouped by activity and organised by cost centres for analysis and calculation of the total costs for implementing the MAM treatment programme in Kassala and the total costs for implementing both the MAM treatment programme and the MAM prevention programme in the state.

- Provider costs
- Participant costs
- Allocation to cost centres
- Cost per at-risk child becoming no-risk
- Cost per DALY averted

Costs incurred by the service provider (i.e., WFP and implementing partner) will be collected using semi-structured key informant interviews with relevant programme and administrative staff at WFP and at the relevant implementing partners. Relevant cost data that will be gathered from the provider are 1) personnel costs; 2) programme supplies; and, 3) programme delivery.

For personnel, we will collect salary data for WFP and implementing partner staff involved in the implementation of the MAM treatment programme and the MAM prevention programme. Data on time spent by staff in implementing both programmes will also be collected. Costs for non-salaried personnel (whether or not incentivised) such as community health workers or community mobilisers will also be collected. For non-incentivised personnel, a shadow wage rate^{11} will be either estimated based on current labour markets in Kassala or taken from previous studies that have estimated this rate^{12}.

For supplies, costs of all supplies and materials which include the feeding product used for both treatment and prevention of MAM will be collected from programme budgets and programme staff.

For programme delivery, transport costs, trainings, rent and utilities will be collected through programme budgets and other related documentation and from programme staff.

Costs of participation by beneficiaries will be collected through the cross-sectional surveys for the stepped wedge study. These include direct costs, such as transport costs to access the treatment and prevention programmes together with time-to-travel information, and indirect costs, such as the opportunity cost incurred by the family and/or caregiver. We will perform spatial interpolation using the time-to-travel data and geo-location data collected by the study together with publicly available geographic data on elevation, roads, land use and water bodies in Kassala to create a raster-based cost surface at a resolution of at least 10 sq km. Cost will be measured as the time it takes to travel from a specific location on the raster map to the nearest health facility or distribution site. Then, using the collected data on average daily wages in Kassala, the time-to-travel metric will be converted into an opportunity cost, thereby creating an opportunity cost surface.
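The conversion from travel time to opportunity cost can be sketched as follows. The 8-hour working day, the round-trip assumption and all names are hypothetical choices made for illustration; the text does not specify them.

```python
def opportunity_cost(travel_minutes_one_way: float, daily_wage: float,
                     hours_per_workday: float = 8.0, round_trip: bool = True) -> float:
    """Wage value of the time spent travelling to a facility or distribution site.

    Assumptions: an 8-hour working day and a return journey of equal length.
    """
    hours = travel_minutes_one_way / 60 * (2 if round_trip else 1)
    return hours / hours_per_workday * daily_wage


# Apply the conversion cell by cell to a small travel-time raster (minutes)
travel_time_surface = [[30, 60], [90, 120]]
daily_wage = 80.0  # hypothetical average daily wage in local currency
cost_surface = [[opportunity_cost(minutes, daily_wage) for minutes in row]
                for row in travel_time_surface]
print(cost_surface)
```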

Cost centres will be developed and finalised in collaboration with WFP and relevant implementing partners. The estimated costs described above will then be categorised under the relevant cost centres.

Cost-effectiveness will be calculated using the programme outcomes data described in the outcome measures section of this document. The metrics for cost-effectiveness that will be calculated are:

- Cost per at-risk child becoming no-risk – this metric will assess the cost-effectiveness of MAM treatment and MAM prevention together in converting at-risk children into no-risk children.

- Cost per DALY averted – this metric will assess the cost-effectiveness of MAM treatment and MAM prevention together in averting DALYs.

To calculate DALY, we use the following formula as specified by Murray^{13}:

$$ \text{DALY} = \int_{x = a}^{x = a + L} D C x e^{-\beta x} e^{-r(x - a)} \, dx $$

where:

$ D = \text {disability weight e.g. premature death is 1, wasting is 0.127} $ ^{14}

$ r = \text {discount rate;} 0.03 $

$ C = \text {age-weighting correction constant;} 0.04 $

$ \beta = \text {parameter from the age-weighting function} $

$ a = \text {age of onset} $ ^{15}

$ L = \text {duration of disability or time lost due to premature mortality} $
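The integral above can be evaluated numerically. The sketch below uses composite Simpson's rule; the function name is hypothetical, the value of $ \beta $ in the example call is an assumed placeholder (no value is given above), and the age of onset and duration are illustrative only.

```python
import math


def daly(D: float, C: float, beta: float, r: float, a: float, L: float,
         steps: int = 1000) -> float:
    """Integrate D*C*x*exp(-beta*x)*exp(-r*(x - a)) from a to a + L (Simpson's rule)."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = L / steps

    def integrand(x: float) -> float:
        return D * C * x * math.exp(-beta * x) * math.exp(-r * (x - a))

    total = integrand(a) + integrand(a + L)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(a + k * h)
    return total * h / 3


# D, r and C as listed above; beta, a and L are assumed values for illustration
example = daly(D=0.127, C=0.04, beta=0.04, r=0.03, a=1.0, L=0.5)
print(example)
```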

###### Knowledge, attitude and practices

This will be measured using a specifically designed set of questions to be asked of mothers / caregivers of children 6-59 months old and PLW on topics covered by the SBCC component of the programme (i.e., healthy pregnancy, child health and healthcare, breastfeeding, complementary feeding, dietary diversity, food supplementation, use of micronutrient powder (MNP) and WASH). Wherever possible, standard question sets that have been developed and tested for KAP assessment will be used. The following are some of the standard question sets that we will consider using for this purpose:

- Healthy pregnancy – assess the knowledge of pregnancy danger signs among women of reproductive age (15-49 years old). There are 10 pregnancy danger signs. Women of reproductive age (15-49 years old) will be asked to identify the pregnancy danger signs that they know of. The number of danger signs they have identified correctly is recorded and the mean number of pregnancy danger signs can be used as the summary indicator / measure.

$$ \text {mean number of danger signs identified} = \frac {\sum_{i = 0}^{9} ds_{i + 1}}{10} $$

where $ ds = \text {danger signs} $.

- Child health and healthcare – mothers / caregivers of children 6-59 months will be asked whether their children have had an illness in the past 2 weeks to assess morbidity. For those who report an illness, a series of questions on what the mother / caregiver did in response to the illness will be asked to assess whether or not appropriate healthcare / treatment-seeking behaviour was exhibited.

- IYCF – a standard IYCF question set will be used to assess breastfeeding, complementary feeding and diet diversity adapted from standard guidelines^{16}. The indicators were adapted for simplicity and rapidity, as well as to the small sample size (i.e. compared to MICS, DHS, etc.) that will allow analysis at a local level (see next section on local and wide area level). The approach used is to produce a single indicator which defines good infant and young child feeding practices as either:

- Exclusive breastfeeding in children aged under six months.

- Age-appropriate feeding practices (defined in terms of continued breastfeeding, dietary diversity, and meal frequency) in older children.

Age-appropriate feeding practice is measured using an infant and child feeding index (ICFI) similar to that used in the 2000 DHS survey of Ethiopia and further developed by IFPRI and FANTA as a KPC2000+ indicator:

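**Table 1:** ICFI scoring by age group

| Age group (months) | 6 – 8 | | 9 – 11 | | 12 – 36 | | 36 – 60 | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | Value | Score | Value | Score | Value | Score | Value | Score |
| Breastfed | Yes | +2 | Yes | +2 | Yes | +1 | Yes | +0 |
| Food groups | 1 | +1 | 1 or 2 | +1 | 2 or 3 | +1 | 3 or 4 | +2 |
| | ≥2 | +2 | ≥3 | +2 | ≥4 | +2 | ≥5 | +3 |
| Meal frequency | 1 | +1 | 1 or 2 | +1 | 2 | +1 | 2 | +1 |
| | ≥2 | +2 | ≥3 | +2 | 3 | +2 | 3 | +2 |
| | | | | | ≥4 | +3 | ≥4 | +3 |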

The ICFI score is a measure of appropriate child feeding practices:

$$ ICFI = \text {Breastfeeding} + \text {Dietary Diversity} + \text {Meal Frequency} $$

using age-specific weighting for each item. Children receive a score between zero and six. Children receiving a score of six are classified as receiving good infant and young child feeding. The ICFI can be extended to include older children if required. The shaded areas in Table 1 represent this extension to the standard ICFI score to include children aged between 36 and 59 months.
- Women’s dietary diversity – a standard diet diversity questionnaire^{17} will be used to assess the dietary diversity of mothers with children 6-59 months old and PLW. Women’s dietary diversity score (WDDS) is assessed for mothers of the children sampled for IYCF. The WDDS indicator assesses the quality of the women’s diet and gives an indication of the micronutrient adequacy of the women’s diet. There is evidence to suggest that WDDS reflects household access to food. WDDS can be used to identify nutritionally at-risk women. There are 6 indicators that can be reported from the data collected on women’s dietary diversity.

- Women’s dietary diversity score (WDDS) is calculated based on the 10 food groups (see Table 2) that have been determined to be relevant and important to women. The potential score range for the WDDS is from zero to ten based on the number of food groups consumed by women out of the 10 food groups.

**Table 2:** Food groups relevant for women

| Food group | Description |
| --- | --- |
| FG1 | Starchy staples |
| FG2 | Dark green leafy vegetables |
| FG3 | Other vitamin A-rich fruits and vegetables |
| FG4 | Other fruits and vegetables |
| FG5 | Organ meat |
| FG6 | Meat and fish |
| FG7 | Eggs |
| FG8 | Legumes |
| FG9 | Nuts and seeds |
| FG10 | Milk and milk products |

- Mean WDDS is calculated as:

$$ \text {Mean WDDS} = \frac {\sum {WDDS}}{\text {Total number of women assessed}} $$

Validation studies of the WDDS done in 5 countries showed a mean WDDS of 4.7 with a standard deviation of 1.1 and a range of 2 to 9.

- Consumption of vitamin A-rich foods is based on whether women consumed vitamin A-rich foods in the past 24 hours. This indicator identifies women who are at risk of vitamin A deficiency.

- Consumption of iron-rich foods is based on whether women consumed iron-rich foods in the past 24 hours. This indicator identifies women who are at risk of iron deficiency.

- Food supplementation and MNP – we will build upon a set of questions we have developed and used for assessment of coverage of, and knowledge and practices on, the use of complementary food supplements in Eastern Ghana^{18}.

- WASH – we will use some components of the standard WASH indicator set^{19} that focus on WASH-related behaviours such as safe disposal of child’s faeces, water treatment practices and hand washing practices and other variations made in relation to WASH behaviours^{20}.
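Two of the scores described in the list above, the ICFI and the WDDS, can be sketched as follows. The ICFI rules are transcribed from the scoring table as read here, with the overlapping age bands resolved as 12-35 and 36-59 months (an assumption), and values below the lowest listed category scored as zero (also an assumption). All function names are hypothetical.

```python
def icfi_score(age_months, breastfed, food_groups, meals):
    """ICFI score (0 to 6) for a child aged 6-59 months; None outside that range."""
    if 6 <= age_months <= 8:
        bf = 2 if breastfed else 0
        fg = 2 if food_groups >= 2 else (1 if food_groups == 1 else 0)
        mf = 2 if meals >= 2 else (1 if meals == 1 else 0)
    elif 9 <= age_months <= 11:
        bf = 2 if breastfed else 0
        fg = 2 if food_groups >= 3 else (1 if food_groups >= 1 else 0)
        mf = 2 if meals >= 3 else (1 if meals >= 1 else 0)
    elif 12 <= age_months <= 35:
        bf = 1 if breastfed else 0
        fg = 2 if food_groups >= 4 else (1 if food_groups >= 2 else 0)
        mf = 3 if meals >= 4 else (2 if meals == 3 else (1 if meals == 2 else 0))
    elif 36 <= age_months <= 59:
        bf = 0  # breastfeeding scores +0 in the extended age group
        fg = 3 if food_groups >= 5 else (2 if food_groups >= 3 else 0)
        mf = 3 if meals >= 4 else (2 if meals == 3 else (1 if meals == 2 else 0))
    else:
        return None
    return bf + fg + mf


def good_feeding(age_months, breastfed, food_groups, meals):
    """A score of six is classified as good infant and young child feeding."""
    return icfi_score(age_months, breastfed, food_groups, meals) == 6


WOMEN_FOOD_GROUPS = {f"FG{i}" for i in range(1, 11)}  # FG1 to FG10


def wdds(groups_consumed):
    """Number of the 10 food groups consumed in the past 24 hours (0 to 10)."""
    return len(set(groups_consumed) & WOMEN_FOOD_GROUPS)


def mean_wdds(scores):
    """Sum of WDDS divided by the total number of women assessed."""
    return sum(scores) / len(scores)


print(icfi_score(10, True, 3, 3), wdds(["FG1", "FG2", "FG7"]))
```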

##### Levels of analysis

###### Local analysis

At each data collection round of the main stepped wedge study (one data collection round per step, 4 steps in total), we will report on all indicator sets (see section A) at a local area level (i.e., sub-locality or sub-study area / cluster) based on the hexagonal areas defined by the stage 1 spatial sample. At this level of aggregation, we expect to obtain a sufficient sample size per hexagonal area to classify the majority of the indicators. The classification results will be mapped per hexagonal grid area overlaid onto the map of the study areas or clusters (see example map in Figure 2).

**Figure 2:** Example indicator map for local hexagonal area analysis for Rural Kassala

In addition to this level of local area analysis, we will also perform spatial interpolation using inverse distance weighting (IDW) to estimate each of the indicators at a much higher spatial resolution (from at least 10 sq km up to 1 sq km). Spatial interpolation can be described as a process of smoothing data over space to create a surface map. There are various approaches and methods of spatial interpolation. The main differences between them lie in the weights applied to the point dataset to estimate values at each of the unknown points of the surface map. As the name implies, the IDW method uses weights that are inversely proportional to the distance of the point being estimated from the sampling point locations:

$$ \hat {\upsilon} = \frac {\sum_{i = 1} ^ {n} \frac {1}{d_{i} ^ {p}} \upsilon_{i}}{\sum_{i=1} ^ {n} \frac {1}{d_{i} ^ {p}}} $$

where $ d_1, \ldots , d_n $ are the distances from each of the $ n $ sampling point locations to the point being estimated, $ p $ is the power of the distance and $ \upsilon_1 , \ldots , \upsilon_n $ are the sample values^{21}. The power of the distance $ p $ is an important aspect of the IDW method for point estimation. The influence of $ p $ on the weights applied to the point estimation is such that as $ p $ approaches 0, the weights become more similar, so that the estimate approaches the simple average of the sample values. As $ p $ approaches $ \infty $, the weights become more different from each other, thereby giving more weight to the closest sample. The power of the distance $ p $ has traditionally been set at 2 for convenience and ease of calculation. In theory, given a set $ p $, IDW calculations can be performed manually with the aid of a spreadsheet and / or a calculator, as the method requires fewer calculations. For the spatial interpolation that we will perform, we will set $ p $ initially at 2 and then perform a cross-validation technique to optimise $ p $ to a value that minimises the estimation errors at each of the sampling point locations.
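The IDW estimator above can be written as a short function. This is a minimal sketch assuming planar coordinates and Euclidean distance; the function name is hypothetical. A target that coincides with a sampling location simply returns the observed value there, which avoids division by zero.

```python
import math


def idw_estimate(sample_points, sample_values, target, p=2.0):
    """Inverse distance weighted estimate at `target` from point samples."""
    numerator = 0.0
    denominator = 0.0
    for (x, y), value in zip(sample_points, sample_values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            return value  # target is a sampling location
        weight = 1.0 / d ** p
        numerator += weight * value
        denominator += weight
    return numerator / denominator


points = [(0.0, 0.0), (2.0, 0.0)]
values = [10.0, 20.0]

# Effect of the distance power on an estimate made closer to the first point
for power in (0, 2, 20):
    print(power, idw_estimate(points, values, (0.5, 0.0), p=power))
```

With `p=0` the estimate is the simple average of the two sample values, and as `p` grows it moves towards the value of the nearest sample.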

Cross-validation is a technique used to validate predictive models. It assesses how accurately a predictive model performs in practice. IDW is one of the simplest model-based interpolation methods available, but ideally still requires a form of cross-validation to determine the optimal value of the distance power $ p $. We will perform a two-fold cross-validation^{22} in which we randomly split the data points into two sets of equal size, with one set assigned as the validation data for testing the model and the other set as the training data. Values at the validation locations will then be predicted from the training data using the IDW method with an initial $ p $ of 2, and the resulting predictions will be compared with the observed validation values. Comparison will be made using the sum of the squared residuals between the predicted values and the observed values to report errors. Optimisation will then be performed by replicating the two-fold cross-validation process 100 times using randomly generated values for $ p $. Out of these replicates, the value of $ p $ that provides prediction results with the minimum errors will be selected as the distance power for the eventual interpolation^{23}.
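The optimisation of $ p $ described above can be sketched as below. The search range for the random values of $ p $, the seed and all names are assumptions made for illustration; each replicate draws a new random split, so the comparison between candidate values is itself subject to sampling noise.

```python
import math
import random


def idw(train, target, p):
    """IDW estimate at `target` from training rows of the form (x, y, value)."""
    numerator = denominator = 0.0
    for x, y, value in train:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            return value
        weight = 1.0 / d ** p
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def two_fold_sse(data, p, rng):
    """Sum of squared residuals at validation points predicted from training points."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    train, validation = shuffled[:half], shuffled[half:]
    return sum((idw(train, (x, y), p) - value) ** 2 for x, y, value in validation)


def optimise_power(data, replicates=100, p_range=(0.5, 6.0), seed=1):
    """Start at p = 2, then keep the random p with the smallest error."""
    rng = random.Random(seed)
    best_p, best_sse = 2.0, two_fold_sse(data, 2.0, rng)
    for _ in range(replicates):
        p = rng.uniform(*p_range)  # assumed search range
        sse = two_fold_sse(data, p, rng)
        if sse < best_sse:
            best_p, best_sse = p, sse
    return best_p, best_sse


# Hypothetical indicator values on a 5 x 5 grid of sampling points
grid = [(float(x), float(y), float(x + y)) for x in range(5) for y in range(5)]
print(optimise_power(grid, replicates=20, seed=3))
```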

Results of the spatial interpolation will then be mapped to show the spatial variation in the indicators within study areas / clusters and between study areas / clusters. An example interpolation map for one of the study clusters can be seen in Figure 3.

**Figure 3:** Example interpolation map for local area analysis of Rural Kassala

###### Cluster-level analysis

At each data collection round of the main stepped wedge study (one data collection round per step, 4 steps in total), we will report on all indicator sets (see section on indicators) for each of the study areas / clusters. We estimate that across most of the indicator sets, we will have enough sample size per study area / cluster to estimate with good precision. In addition, we will use a non-parametric computational technique to estimate the various indicators at the study cluster / area level.

The sample collected for the study is complex in the sense that it is an unweighted cluster sample. The data analysis procedures to aggregate results to produce an overall result need to account for the sample design. Model-based / parametric procedures are available to do this. We will, however, use resampling (i.e. bootstrap) techniques. Resampling techniques are proposed because they make no strong assumptions about the sampling distributions of indicators and allow a broader range of statistics than model-based methods.

We will use a blocked weighted bootstrap (BWB). It is blocked in that the block corresponds to the primary sampling unit (PSU), which in this case is the sampled village. Only the PSUs are resampled; observations within the PSUs remain unchanged. It is weighted in that a posterior weighting procedure is required, given that population proportional sampling is not used in selecting the PSUs. We will use the roulette wheel algorithm^{24} (see Figure 4) to weight (i.e. by population) the selection probability of PSUs in bootstrap replicates.

A total of $ m $ PSUs are sampled with replacement for each bootstrap replicate (where $ m $ is the number of PSUs in the survey sample). A large number (e.g. 1999) of replicates are taken.

The required statistic is applied to each replicate. The reported estimate consists of the 0.025th (95% LCL), 0.5th (point estimate), and 0.975th (95% UCL) quantiles of the distribution of the statistic across all replicates. This technique can be easily extended to provide hypothesis testing should this be required.
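The blocked weighted bootstrap described above can be sketched as follows. The data layout (one list of observations per PSU), the simple index-based quantile lookup, the seed and all names are assumptions made for illustration.

```python
import random


def roulette_wheel(weights, rng):
    """Select an index with probability proportional to its weight."""
    spin = rng.uniform(0, sum(weights))
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if spin < cumulative:
            return index
    return len(weights) - 1  # guard against floating point round-off


def blocked_weighted_bootstrap(psus, populations, statistic, replicates=1999, seed=1):
    """Resample whole PSUs, weighted by population, and summarise the statistic.

    psus        : list of lists, the observations within each PSU
    populations : population of each PSU, used as selection weights
    Returns (95% LCL, point estimate, 95% UCL).
    """
    rng = random.Random(seed)
    m = len(psus)
    results = []
    for _ in range(replicates):
        resample = []
        for _ in range(m):  # m PSUs drawn with replacement
            resample.extend(psus[roulette_wheel(populations, rng)])
        results.append(statistic(resample))
    results.sort()

    def quantile(prob):
        return results[round(prob * (replicates - 1))]

    return quantile(0.025), quantile(0.5), quantile(0.975)


def proportion(observations):
    return sum(observations) / len(observations)


# Hypothetical binary indicator (1 = covered) in three villages
villages = [[1, 1, 0], [0, 0, 0, 1], [1, 0, 1, 1]]
village_populations = [100, 200, 50]
lcl, estimate, ucl = blocked_weighted_bootstrap(
    villages, village_populations, proportion, replicates=199)
print(lcl, estimate, ucl)
```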

**Figure 4:** Graphical representation of the roulette wheel algorithm

##### Comparison and hypothesis testing analysis

The next level of analysis is determining the change (if any) that can be attributed to the FBPM programme. This analysis will be performed for both the stepped wedge data and the incidence data using different techniques.

- Stepped wedge study
- Incidence sub-study

For the stepped wedge study, we propose an analytical approach similar to that of the THRio study on the effectiveness of a tuberculosis preventive therapy intervention^{25}. We will analyse the main outcome measure (GAM prevalence) at any given month among the study areas or clusters and then combine the results over the period of the study, accounting for within-locality correlation. This approach is meant to adjust the data for secular trend. We will condition the analysis on each month of the study and compare the GAM prevalence in the study clusters or areas that receive the intervention with that in those that have not yet received the intervention. A mathematical approach similar to that of the Cox proportional hazards regression model will be used.

It is possible that at the start of each step the full effect of the intervention will not be detectable, given that the first couple of weeks may be used for preparatory activities such as getting supplies in place and starting mobilisation. We will deal with this in a similar way to the THRio study by using an intervention covariate that is not simply binary (0 for no intervention and 1 for intervention). Instead, depending on when the intervention is in full swing, we will use a fraction based on how far into the 8-week implementation period the actual intervention started. For example, in the first three intervention areas for this study, we have just learned that the actual intervention will only start fully by the 15th of May, which is 2 weeks into the implementation period. Instead of giving these areas an intervention covariate of 1 (for full intervention), we will give them a fraction of 6 out of 8, as the intervention started 2 weeks into the 8-week implementation period.
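The fractional covariate in the example above amounts to the share of the implementation period during which the intervention was fully running. A minimal sketch, with a hypothetical function name:

```python
def intervention_covariate(weeks_delayed: float, period_weeks: float = 8.0) -> float:
    """Fraction of the implementation period with the intervention fully running."""
    weeks_delayed = min(max(weeks_delayed, 0.0), period_weeks)
    return (period_weeks - weeks_delayed) / period_weeks


# Intervention fully started 2 weeks into the 8-week implementation period
covariate = intervention_covariate(2)
print(covariate)
```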

We will apply the Kaplan-Meier survival analysis method to the incidence data. A survival model will be fitted to the data from the intervention group and from the control group, and survival curves will be plotted for each. Corresponding descriptive statistics for each arm can then be reported. The survival curves for the intervention and control groups can then be compared and tested for a statistically significant difference using the log-rank test (also known as the Mantel-Cox test), a non-parametric test used when the data being analysed are right-skewed and censored. This will detect any significant difference between the time it takes for a child in the intervention group to become acutely undernourished and that for a child in the control group.
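The two-group log-rank comparison can be sketched as below. The sketch compares observed and expected events in the first group at each distinct event time and refers the resulting statistic to a chi-square distribution with one degree of freedom. The function name and the example data are hypothetical.

```python
import math


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical months to acute undernutrition, all events observed
control = [1, 2, 3, 4, 5]
intervention = [10, 11, 12, 13, 14]
all_observed = [True] * 5
statistic, p_value = log_rank_test(control, all_observed, intervention, all_observed)
print(statistic, p_value)
```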

###### Endnotes

^{1}

^{2}

^{3}

^{4}

^{5}

^{6}

^{7}

^{8}

^{9}

^{10}

^{11}

^{12}

^{13}

^{14}

^{15}

^{16}

^{17}

^{18}

^{19}

^{20}

^{21}

^{22}

^{23}

^{24}