Data collection, management and analysis

Data will ideally be collected using an electronic data entry system based on the Open Data Kit (ODK) standard, which runs on the Android® operating system (OS) for mobile devices. The study instrument will be encoded as an electronic form on this platform and served from a local computer server. Each study team will be provided with Android® mobile devices configured with an application that receives the electronic form. All measurements and respondents' answers will be recorded on these devices and transmitted to the local server whenever a mobile phone and/or Wi-Fi signal is available.
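The "record locally, transmit when a signal is available" behaviour described above is a store-and-forward pattern. ODK's own mobile client handles this internally; the sketch below is only an illustration of the idea under stated assumptions, not ODK's implementation. The class `SubmissionQueue`, the `send` callback and the `has_signal` flag are hypothetical names introduced here.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Record = Dict[str, object]


class SubmissionQueue:
    """Hypothetical store-and-forward queue: every record is saved on the device
    first and only sent to the local server when a connection is available."""

    def __init__(self, send: Callable[[Record], bool]) -> None:
        self._pending: Deque[Record] = deque()
        self._send = send  # assumed to return True once the server acknowledges a record

    def record(self, rec: Record) -> None:
        # Always store locally first, regardless of connectivity.
        self._pending.append(rec)

    def flush(self, has_signal: bool) -> int:
        """Send pending records in order; stop at the first failure and keep the rest."""
        sent = 0
        if not has_signal:
            return sent
        while self._pending:
            if not self._send(self._pending[0]):
                break  # leave this record queued for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._pending)


# Usage with a stand-in for the local server.
server: List[Record] = []


def fake_send(rec: Record) -> bool:
    server.append(rec)
    return True


q = SubmissionQueue(fake_send)
q.record({"id": 1, "weight_kg": 61.5})
q.record({"id": 2, "weight_kg": 70.2})
first_flush = q.flush(has_signal=False)  # no signal: nothing sent, both records stay queued
second_flush = q.flush(has_signal=True)  # signal available: both records sent in order
```

Stopping at the first failed transmission keeps submissions in the order they were recorded and guarantees nothing is dropped from the device before the server has accepted it.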

Data checks will be implemented using the validation features built into the ODK system. Study supervisors will carry out spot checks in the field to ensure that enumerators take measurements correctly, administer the study instrument correctly and enter all data accurately. The survey manager will perform further checks on the data, as needed, as they are received by the local server.
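ODK forms can enforce checks at entry time (for example through the `constraint` and `required` settings of an XLSForm). The survey manager's checks on received data can apply the same kind of rules. The sketch below is a minimal illustration of such range and completeness checks; the field names (`respondent_id`, `age_years`, `weight_kg`) and their limits are hypothetical and are not taken from the study instrument.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Check:
    field: str
    rule: Callable[[object], bool]  # returns True if the value is acceptable
    message: str
    required: bool = True


# Hypothetical fields and plausible-range limits, for illustration only.
CHECKS: List[Check] = [
    Check("respondent_id", lambda v: isinstance(v, str) and v.strip() != "", "empty ID"),
    Check("age_years", lambda v: isinstance(v, int) and 0 <= v <= 120, "age out of range"),
    Check("weight_kg", lambda v: isinstance(v, (int, float)) and 1 <= v <= 250, "weight out of range"),
]


def check_record(rec: Dict[str, object], checks: List[Check] = CHECKS) -> List[str]:
    """Return a list of problems found in one received record (empty if it passes)."""
    problems: List[str] = []
    for c in checks:
        value = rec.get(c.field)
        if value is None:
            if c.required:
                problems.append(f"{c.field}: missing")
            continue
        if not c.rule(value):
            problems.append(f"{c.field}: {c.message}")
    return problems


ok = check_record({"respondent_id": "A01", "age_years": 34, "weight_kg": 61.5})
# ok == []
flagged = check_record({"respondent_id": "A02", "age_years": 150})
# flagged == ["age_years: age out of range", "weight_kg: missing"]
```

Records that return a non-empty list would be referred back to the field team for verification rather than corrected silently.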

Data will be backed up daily, and the backups will be stored on three different devices. To protect the privacy and confidentiality of the data, neither the data nor their backups will be sent to a remote or cloud server.
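A daily backup to three local devices, with no remote upload, could be scripted along the lines below. This is a minimal sketch assuming the server's data are held in a single file and that each of the three destination folders is a mount point on a separate physical device; the file name `survey.db`, the folder names and the function `daily_backup` are hypothetical. Each copy is verified against a SHA-256 checksum of the source.

```python
import hashlib
import shutil
import tempfile
from datetime import date
from pathlib import Path
from typing import List


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def daily_backup(source: Path, destinations: List[Path], day: date) -> List[Path]:
    """Copy `source` into three distinct local destinations and verify each copy.

    Assumption: each destination sits on a separate physical device.
    Nothing here uploads data to a remote or cloud server.
    """
    if len(set(destinations)) != 3:
        raise ValueError("expected three distinct backup destinations")
    expected = sha256(source)
    copies: List[Path] = []
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        # Date-stamped file name so each day's backup is kept separately.
        target = dest_dir / f"{source.stem}_{day.isoformat()}{source.suffix}"
        shutil.copy2(source, target)
        if sha256(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies


# Usage with temporary folders standing in for the three devices.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    db = root / "survey.db"
    db.write_bytes(b"example submissions")
    devices = [root / "device_a", root / "device_b", root / "device_c"]
    copies = daily_backup(db, devices, date(2024, 1, 15))
    backup_names = sorted(p.name for p in copies)
    copies_ok = all(sha256(p) == sha256(db) for p in copies)
```

Verifying the checksum immediately after copying catches a failed or partial write on any one device while the source is still available.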

If the proposed electronic data collection system proves unfeasible (because of cost or other logistical reasons), we will revert to paper forms, and the data will then be entered into a data entry system built on EpiData¹.

Data analysis will be performed on the final dataset once it has been cleaned and cleared. Analysis will be carried out in the R language for statistical computing² using bespoke analytic scripts assembled into a workflow in the R AnalyticFlow integrated development environment (IDE)³. This workflow will allow external reviewers to inspect the algorithms that performed the analysis and to replicate the results themselves as a form of verification and validation. The workflow prepared for this study will be designed to produce reports automatically, including maps, figures and tables of results.


1 Lauritsen JM. EpiData Data Entry, Data Management and Basic Statistical Analysis System. Odense, Denmark: EpiData Association; n.d.

2 R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2014.

3 Suzuki R, Nagai T. R AnalyticFlow 3: An Environment for Data Analysis with R. March 28, 2014.