This tutorial provides step-by-step instructions on how to run an analysis workflow written in R AnalyticFlow. Specifically, this tutorial will show how to run the analysis workflow for the 3ie Impact Evaluation of Sudan’s Food-based Prevention of Moderate Acute Malnutrition (FBPM) programme.

R AnalyticFlow is software that supports simple to complex data analysis through the drawing of analysis flowcharts. The key advantage of analysis flowcharts is that they make it easy to share data analysis processes in multi-user, multi-developer or team collaboration contexts. R AnalyticFlow is one of a number of integrated / interactive development environments (IDEs) for R. An IDE is a software application that provides comprehensive facilities to computer programmers for software development.

The R AnalyticFlow software is developed and made available without charge for any purpose by Ef-prime, Inc.

This tutorial assumes that R is already installed on your computer (see tutorial on Installing R), that you are familiar with what R is (see Introduction to R) and that R AnalyticFlow is also installed on your computer (see tutorial on Installing R AnalyticFlow). It also assumes that you have downloaded the 3ie workflow. If not, download it here. Finally, it assumes that you have access and permission to use the data from this 3ie impact evaluation. To seek access and permission, contact Valid Measures.
 

1. Launching R AnalyticFlow

Launch R AnalyticFlow by double-clicking on the shortcut to R AnalyticFlow or through the Programs in the Start Menu.
 


 

2. Select the 3ie bookmark

The Projects dialog box will then show up. If you have followed the other tutorial on opening the 3ie workflow in R AnalyticFlow, you should now have a bookmark for the 3ie analysis project (1). Select and click on this bookmark to open the 3ie workflow.

If you don’t have this bookmark, follow the tutorial on opening an R AnalyticFlow file.
 


 

The 3ie workflow named steppedWedge should now open and you should be able to see the various components of the workflow in the workflow generator compartment of the R AnalyticFlow screen (2).

The version that you should have is version 0.20 dated 21 November 2016. This is the latest version of the 3ie analysis workflow.

To start using the workflow, click on the star icon labeled Step 0 (3).

Note: The workflow has been annotated so that the user is guided as to what each component of the workflow does. The best way to find out what a specific analysis step does is to select the star icon of that step and read the annotation shown in the view section of the IDE (4).
 


 

3. Step 0: Install packages

Once you select the star icon labeled Step 0 (5), you will see the explanation of this step in the viewer below (6).

Step 0 installs the R packages / libraries required by the workflow to perform the analysis. This step has been labeled ‘Step 0’ because it is a preparatory step that needs to be done only once per computer performing the analysis. Once the required packages have been installed into R on the computer being used, there is no need to run this step again.

Note that for packages to be installed using this step, the computer must be online, because this step downloads the necessary packages from the online R package repository before installing them. If no internet connection is available, this step will most likely show errors. The alternative is to download the packages separately and then install the packages from the downloaded files.

For a guide on how to install R packages, click here.
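As a rough illustration of the kind of check a step like Step 0 can perform, the base R sketch below reports which packages from a list are not yet installed, so that installation only happens when needed. The helper name missing_packages is hypothetical and is not taken from the 3ie workflow; the actual list of required packages is defined inside the workflow itself.

```r
# Hypothetical helper: return the packages in `required` that are not installed.
missing_packages <- function(required) {
  installed <- rownames(installed.packages())
  setdiff(required, installed)
}

# "stats" and "utils" ship with R, so nothing should be reported as missing.
missing_packages(c("stats", "utils"))

# With an internet connection, missing packages could then be installed with:
#   install.packages(missing_packages(required))
# Without a connection, a previously downloaded source file could be installed with:
#   install.packages("path/to/package.tar.gz", repos = NULL, type = "source")
```

The two install.packages() calls are left as comments so that running the sketch does not change anything on your computer.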
 


 

4. Step 1: Setup

This step specifies the R properties, libraries and dependencies required by this analysis, bespoke functions written specifically for this dataset, and various utilities and specifications that will be called upon in later sections of this workflow.

To run this node, select the star icon for Step 1 (7) and then press ‘Run’ (8) or type CTRL-SHIFT-R.
 


 

You will know that the script is running when you see lines of code appearing in the R console section of the R AnalyticFlow IDE (9).
 


 

Step 1 also creates additional folders within the working directory (10). These folders are created to organise the various outputs that the analysis workflow will produce. These additional folders are:

  • checks – this folder is where CSV files of the data checks performed are saved. These data checks are performed in Step 1a (below) during the data cleaning stage and pertain specifically to the tabulation of the various ‘others’ responses to the questions on coverage.
  • codebook – this folder will contain the codebook (in CSV format) produced by the script during the data cleaning stage (Step 1a). The codebook documents the variables collected for this study as a future reference for anyone who wants to use the data.
  • data – this folder will contain the cleaned data produced during the data cleaning stage (Step 1a). These data are then used in the subsequent stages of the data analysis.
  • grids – this folder will contain the interpolation grid that will be used in the spatial interpolation steps (Step 5 and Step 10). These files are in ESRI Shapefile format.
  • report – this folder will contain the output of the reporting stage (Step 12). These reporting outputs are in HTML format and organise all the results in a way that is easy to navigate and view.
  • results – this folder will contain the results of the various analyses for the various indicators, organised in sub-folders. The main types of outputs are: 1) tables in CSV format containing the estimates for each of the indicators; 2) charts in PNG format which visualise certain indicators using appropriate graphics; and 3) maps in PNG format which visualise the spatial distribution of the indicators.
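The folder set-up described above can be sketched in base R as follows. This is only an illustration of the idea, not the code used inside Step 1: the helper name create_output_folders is hypothetical, and the example writes to a temporary directory rather than to the real working directory.

```r
# Folder names as listed in this tutorial
output_folders <- c("checks", "codebook", "data", "grids", "report", "results")

# Hypothetical helper: create each folder under `base_dir` if it does not exist yet
create_output_folders <- function(base_dir, folders = output_folders) {
  paths <- file.path(base_dir, folders)
  for (p in paths) {
    if (!dir.exists(p)) dir.create(p, recursive = TRUE)
  }
  invisible(paths)
}

# Try it in a temporary directory rather than the real working directory
demo_dir <- file.path(tempdir(), "steppedWedge_demo")
create_output_folders(demo_dir)
list.files(demo_dir)
```

Checking dir.exists() before calling dir.create() means the helper can be run repeatedly without producing warnings about folders that already exist.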

 


 

5. Step 1a: Clean data

This step is an interim step or sub-step. Raw data for each round is cleaned and processed, and the clean data is then saved in the folder named ‘data’ in the working directory. Codebooks are also produced (based on round 1 data only) for dataset documentation purposes and as a reference. Data cleaning need only be done once for each round, as the clean data is used for subsequent steps.

Note that, unlike the previous step (and the steps after this one), the nodes for the clean data step are not connected to each other. This was done so that the user can choose which data cleaning node to run. Note also that when running a data cleaning node, the user will be asked to provide or point to the raw data that will be cleaned. There are three sets of raw data per round of data collection.

  1. Administrative data – this data is named adminDataRound1.csv (for round 1 data; for other rounds, the number in the file name changes accordingly);
  2. Mother data – this data is named motherDataRound1.csv (for round 1 data; for other rounds, the number in the file name changes accordingly);
  3. Children data – this data is named childDataRound1.csv (for round 1 data; for other rounds, the number in the file name changes accordingly).

When a data cleaning node is run, it will first ask for the administrative data, and then the mother data and then the child data. Make sure that you provide the data to the analysis workflow in this sequence and for the appropriate round. If not, then errors will most likely occur which will stop the analysis workflow.
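To help keep the sequence straight, the small base R sketch below lists the raw data file names for a given round in the order in which the data cleaning node asks for them. The helper name raw_data_files is hypothetical, and the file names for rounds other than round 1 are an assumption based on the naming pattern described above.

```r
# Hypothetical helper: raw data file names for a given round, in the order
# in which the data cleaning node asks for them (admin, mother, child)
raw_data_files <- function(round) {
  stopifnot(is.numeric(round), length(round) == 1, round >= 1)
  sprintf("%sDataRound%d.csv", c("admin", "mother", "child"), as.integer(round))
}

raw_data_files(1)
```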

To run this step, select the data cleaning node for the data round that you want to clean. Then click on ‘Run’ (11) or type CTRL-SHIFT-R.

The cleaned data will be saved in CSV format inside the ‘data’ folder created inside the working directory (12).
 


 

6. Step 2: Recode child data

This step recodes corresponding data to allow for calculations of the various child indicators.

This step first asks the user to provide the corresponding data, which in this case is the cleaned-up child data for whichever round of the study you want to analyse. As in the previous step, a prompt will appear, and the user should point to the cleaned-up child data for the respective round inside the ‘data’ folder in the working directory (13), where the cleaned-up data has been saved.
 


 

7. Step 3: Estimate child indicators

This step estimates the various child indicators using bootstrapping techniques.

This is one of the longest-running steps in the whole analysis workflow because the bootstrapping technique used resamples the data 1999 times.

To run this step, select the star icon for Step 3 and then click on ‘Run’ (14) or type CTRL-SHIFT-R.

This step produces tabular results of estimates of each indicator with confidence limits. These results are formatted in CSV and are saved in the ‘results’ folder (15) under the sub-folder ‘table’.
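To give a feel for what resampling the data 1999 times involves, here is a minimal base R sketch of a generic percentile bootstrap. It is not the estimator used in the workflow, which this tutorial does not show: the helper name boot_ci, the use of percentile confidence limits and the made-up 0/1 data are all assumptions for illustration only.

```r
# Hypothetical helper: percentile bootstrap estimate with confidence limits
boot_ci <- function(x, statistic = mean, replicates = 1999, level = 0.95) {
  stopifnot(length(x) > 0, replicates > 0)
  # Resample the data with replacement and recompute the statistic each time
  boots <- replicate(
    replicates,
    statistic(x[sample.int(length(x), replace = TRUE)])
  )
  # Take the lower and upper percentiles of the resampled statistics
  alpha <- (1 - level) / 2
  limits <- quantile(boots, probs = c(alpha, 1 - alpha), names = FALSE)
  c(estimate = statistic(x), lcl = limits[1], ucl = limits[2])
}

# Made-up example data: 30 cases (1) and 70 non-cases (0)
set.seed(1)
x <- rep(c(1, 0), times = c(30, 70))
res <- boot_ci(x)
res
```

Because the statistic is recomputed once per replicate, running time grows with both the number of replicates and the number of indicators, which is why this kind of step takes a while.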
 


 

8. Step 4: Create charts

This step draws relevant charts for specific indicator sets that will benefit from such visualisation.

To run this step, select the star icon for Step 4 and then click on ‘Run’ (16) or type CTRL-SHIFT-R.

The results produced by this step are saved in the ‘results’ folder (17) under the sub-folder ‘charts’.
 


 

9. Step 5: Spatial interpolation

This step performs spatial interpolation on the child indicator sets, which allows the indicator sets to be mapped at a resolution of 1 km x 1 km.

To run this step, select the star icon for Step 5 and then click on ‘Run’ (18) or type CTRL-SHIFT-R.

This step produces ESRI Shapefile format files. The results produced by this step are saved in the ‘results’ folder (19) under the sub-folder ‘grids’.
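The tutorial does not say which interpolation method the workflow uses. Purely to illustrate what interpolating onto a 1 km x 1 km grid means, the base R sketch below uses inverse distance weighting, chosen here only because it is simple to write. The helper name idw_grid, the coordinates (assumed to be projected, in metres) and the indicator values are all made up.

```r
# Hypothetical helper: inverse distance weighted interpolation onto a regular grid
idw_grid <- function(x, y, z, cell = 1000, power = 2) {
  # Build a regular grid of nodes covering the sampled points
  gx <- seq(min(x), max(x), by = cell)
  gy <- seq(min(y), max(y), by = cell)
  grid <- expand.grid(x = gx, y = gy)
  grid$z <- vapply(seq_len(nrow(grid)), function(i) {
    d <- sqrt((x - grid$x[i])^2 + (y - grid$y[i])^2)
    # A grid node that coincides with a sampled point takes that point's value
    if (any(d == 0)) return(mean(z[d == 0]))
    w <- 1 / d^power
    sum(w * z) / sum(w)
  }, numeric(1))
  grid
}

# Made-up sampling points at the corners of a 5 km x 5 km square
px <- c(0, 5000, 0, 5000)
py <- c(0, 0, 5000, 5000)
pz <- c(0.1, 0.2, 0.3, 0.4)
g <- idw_grid(px, py, pz)
head(g)
```

Each row of the result is one grid node with its interpolated value; a real workflow would go on to write such a grid to a spatial file format.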
 


 

10. Step 6: Map child indicators

This step maps the various child indicator sets based on the interpolated dataset from Step 5.

To run this step, select the star icon for Step 6 and then click on ‘Run’ (20) or type CTRL-SHIFT-R.

The results produced by this step are saved in the ‘results’ folder (21) under the sub-folder ‘maps’.
 


 

11. Step 7: Recode mother data

This step recodes corresponding data to allow for calculation of the various mother indicators.

This step first asks the user to provide the corresponding data, which in this case is the cleaned-up mother data for whichever round of the study you want to analyse. As in Step 2, a prompt will appear, and the user should point to the cleaned-up mother data for the respective round inside the ‘data’ folder in the working directory (22), where the cleaned-up data has been saved.
 


 

12. Step 8: Estimate mother indicators

This step estimates the various mother indicators using bootstrapping techniques.

This is one of the longest-running steps in the whole analysis workflow because the bootstrapping technique used resamples the data 1999 times.

To run this step, select the star icon for Step 8 and then click on ‘Run’ (23) or type CTRL-SHIFT-R.

This step produces tabular results of estimates of each indicator with confidence limits. These results are formatted in CSV and are saved in the ‘results’ folder (24) under the sub-folder ‘table’.
 


 

13. Step 9: Create charts – mother data

This step draws relevant charts for specific indicator sets that will benefit from such visualisation.

To run this step, select the star icon for Step 9 and then click on ‘Run’ (25) or type CTRL-SHIFT-R.

The results produced by this step are saved in the ‘results’ folder (26) under the sub-folder ‘charts’.
 


 

14. Step 10: Spatial interpolation – mother data

This step performs spatial interpolation on the mother indicator sets, which allows the indicator sets to be mapped at a resolution of 1 km x 1 km.

To run this step, select the star icon for Step 10 and then click on ‘Run’ (27) or type CTRL-SHIFT-R.

This step produces ESRI Shapefile format files. The results produced by this step are saved in the ‘results’ folder (28) under the sub-folder ‘grids’.
 


 

15. Step 11: Map mother indicators

This step maps the various mother indicator sets based on the interpolated dataset from Step 10.

To run this step, select the star icon for Step 11 and then click on ‘Run’ (29) or type CTRL-SHIFT-R.

The results produced by this step are saved in the ‘results’ folder (30) under the sub-folder ‘maps’.
 


 

16. Step 12: Reporting

This step produces reporting files in HTML format that organise the results to allow for easy browsing. These files are saved in the ‘report’ folder (31).

To run this step, select the star icon for Step 12 and then click on ‘Run’ (29) or type CTRL-SHIFT-R.
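As a very simplified picture of what organising results into HTML can look like, the base R sketch below writes a single page that links to every CSV file found under a results folder. It is not the reporting code used in Step 12: the helper name write_results_index, the page layout and the example file are all hypothetical.

```r
# Hypothetical helper: write an HTML page that links to every CSV file in `results_dir`
write_results_index <- function(results_dir, out_file) {
  csv_files <- list.files(results_dir, pattern = "\\.csv$", recursive = TRUE)
  links <- sprintf(
    '<li><a href="%s">%s</a></li>',
    file.path(results_dir, csv_files), csv_files
  )
  html <- c("<html><body>", "<h1>Results</h1>", "<ul>", links, "</ul>", "</body></html>")
  writeLines(html, out_file)
  invisible(out_file)
}

# Made-up example: one CSV table in a temporary results folder
demo_results <- file.path(tempdir(), "results_demo")
dir.create(file.path(demo_results, "table"), recursive = TRUE, showWarnings = FALSE)
write.csv(
  data.frame(indicator = "example", estimate = 0.5),
  file.path(demo_results, "table", "example.csv"),
  row.names = FALSE
)
index_file <- file.path(tempdir(), "index_demo.html")
write_results_index(demo_results, index_file)
```

Opening the resulting file in a web browser shows a bulleted list with one link per table.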
 

 
 
