Developing and field testing a remoteness indicator in Malawi


Malawi is a country prone to humanitarian disasters, during which humanitarian actors often struggle to reach all people in need. The 510 initiative is working with the Malawi Red Cross to identify the country’s most vulnerable areas. A workshop and mapping exercise were held on data collection and sharing with different stakeholders in Malawi. Vulnerability data contributes to the Red Cross’ data preparedness initiative (see also previous blog), yet availability is very limited. A team of twenty Malawi Red Cross Society (MRCS) volunteers, in cooperation with 510, have developed and successfully tested a new method to determine vulnerabilities based on a remoteness indicator. This method can now be deployed and fine-tuned in other countries.

‘Remoteness’ as a proxy indicator for vulnerability

For the MRCS and the Red Cross movement, the ability to find the most vulnerable communities is a priority. Vulnerability is also an important indicator used in the Community Risk Assessment & Prioritization (link) developed by 510 as explained in two previous blog posts (here and here). However, crucial data on vulnerability is often missing. Whereas vulnerability is regularly measured using a set of tools called the ‘Vulnerability and Capacity Assessments’ or VCAs (link), here we develop and test a complementary method as outlined below.

As a proxy for ‘social vulnerability’, so-called “remoteness indicators” are being developed at community level within Malawi. Remoteness of communities is identified through several parameters: (i) distance, such as to health facilities, sanitation points, schools, and public and private facilities; (ii) geographic properties, such as ruggedness of the landscape; and (iii) density figures, of the population, houses, roads and other structures. Most indicators are created with the use of OpenStreetMap data, and can therefore be applied to more countries once proven valid. These proxy indicators can fill in for areas where vulnerability assessments are absent or incomplete.

Lack of data collaboration within Malawi is the main obstacle to obtaining accurate and timely data, as NGOs and governments have different ways to collect, store and share data. This often results in data being spread out across time and space, which subsequently causes information gaps. Proxy indicators are a powerful means of filling the gaps between these different data sets.

Below is an example output of a remoteness indicator developed for access to hospitals in Malawi. Survey outcomes indicate that the analysis of travel times to hospitals was accurate for citizens travelling by car or motorbike under normal circumstances. The algorithm is then modified to fit the average travel time including those community members travelling to hospitals by public transport, bike or on foot. A more elaborate description of the indicator will follow in a separate blog post.
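To make the idea of a composite travel time concrete, here is a minimal sketch in Python. It is not the algorithm used in the project: it assumes straight-line (great-circle) distance instead of routing over the road network, and the speeds per travel mode and the mode shares are illustrative assumptions, not survey results.

```python
import math

# Assumed average speeds in km/h per travel mode (illustrative values only,
# not taken from the field survey).
ASSUMED_SPEED_KMH = {"car": 40.0, "motorbike": 30.0, "bicycle": 12.0, "foot": 4.0}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_facility_km(village, facilities):
    """Distance from a (lat, lon) village to the closest (lat, lon) facility."""
    return min(haversine_km(village[0], village[1], f[0], f[1]) for f in facilities)


def composite_travel_minutes(distance_km, mode_shares, speeds=ASSUMED_SPEED_KMH):
    """Average travel time in minutes, weighted by the share of people per mode."""
    total_share = sum(mode_shares.values())
    return sum(
        (share / total_share) * (distance_km / speeds[mode]) * 60
        for mode, share in mode_shares.items()
    )
```

For example, `composite_travel_minutes(nearest_facility_km(village, hospitals), {"foot": 0.7, "bicycle": 0.3})` would give a share-weighted travel time for a village where most people walk. A real implementation would replace the straight-line distance with routing over the mapped roads and paths.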

Composite travel times, an indicator for remoteness

The remainder of this blog will focus on how we have trained enumerators for data collection, and how we used a combination of innovative tools such as Missing Maps, Mapillary and the Portable OSM server to collect the data needed to verify our remoteness indicator.

Remote data collection

The area of interest where the remoteness indicator was field tested was first remotely mapped through the Missing Maps project. We joined in on a global effort to map Malawi. During 10 mapathons in the Netherlands a total of 500 volunteers from the corporate sector, universities and the Netherlands Ready2Help network mapped over 100,000 houses and thousands of kilometres of roads and paths by tracing satellite imagery.

Mapathon for Malawi

The outcome of the mapathon is a base map with all buildings, roads and paths (see below). Key datasets, such as the locations of hospitals, schools, and water points and sanitary facilities were collected from the Malawi Spatial Data Platform (Masdap), from Openstreetmap and from

Outcome of a mapathon – remotely created base map of Thunga, Malawi

Local data collection

Remote data collection can only give so much detail. Therefore, local data collection with enumerators is needed. As part of our data preparedness mission in Malawi we provided training on the use of digital surveying tools and a general introduction to the upcoming field work. The enumerators (surveyors using digital tooling) were trained in using OpenDataKit (ODK) and OpenMapKit (OMK): tablet- or smartphone-based applications to conduct digital surveys in the field. ODK is specifically designed to collect survey data, while OMK is used to enrich OpenStreetMap data by using the geographic locations of the buildings. This way, the survey data is linked to the geographic locations of the buildings and forms an additional source of data for future Geographic Information Systems (GIS) analysis.


Training enumerators

After two weeks of data preparedness activities and training in Lilongwe and Blantyre, the team moved to very remote areas to put all the equipment and the analyses conducted so far to the test. We visited villages in the Thyolo district, to the south of the city of Blantyre. During the field week, we validated the travel times to hospitals and schools as two of the most important proxy indicators for vulnerability. Therefore, we gave these measures a prominent role in the field survey. Survey questions were carefully selected to fit two purposes: validation of the proxy indicators, but also creation of initial baseline data on the Thyolo district for future MRCS projects.

The OMK application provided the enumerators with a platform to add sanitation and water points to the OpenStreetMap dataset. As the buildings were mapped remotely, it was interesting to see that around seventy to eighty percent of the mapped buildings were correct. In those cases where a building was either mapped in the wrong place or missing completely, it had often been washed away by heavy rains and rebuilt next to its original location. In some cases, new buildings were completely missing from the mapped region. In these cases, the map is only as good as the remotely sensed imagery, which may contain clouds, lack resolution or be outdated. Outdated imagery is the most frequent cause of error; new satellite imagery is therefore necessary for making accurate maps for humanitarian purposes.

Dealing with low internet connectivity

In rural Malawi, it is often a challenge to upload base maps of the visitation sites and the correct survey of the day to the 20 tablets, and to download all the data collected in the field from the tablets in an efficient manner. To mitigate dependence on internet connectivity, we brought a Portable OpenStreetMap device (POSM, developed by the American Red Cross – GIS team – more information about the device here) to Malawi to temporarily store offline edits made during the survey. This device creates a Wi-Fi network for the tablets, enabling data to be processed on the go. To prevent errors, collected data was checked during lunch breaks, when everyone was in range of the POSM device, which was plugged into the car. This way, collection errors could be caught in time, and enumerators were guided in the right direction.


Field Kit, with Portable OSM server and Cameras

Collecting streetview data for remote analysis

During the field work the entire traveled path was recorded using Garmin VIRB cameras, which captured the GPX points of each photo. The data collected by the cameras was uploaded to Mapillary and OSM to add to the open ‘street view’ images of the world. These are the first street view images to be made for Malawi; take a virtual tour here. The images will be used in our Missing Maps mapathons to enrich OSM data, as the building material and even the function of buildings can be identified by studying the images.


Cameras mounted to the vehicle


Resulting streetview in Mapillary


In Malawi, the OMK application and the POSM have proven to be a powerful combination. Availability of building location data was necessary for validation of the research as well as for creation of more accurate data on the Thyolo communities. Meanwhile, the very limited internet connectivity could be overcome by the POSM, with direct verification of data collected as one of its key strengths.

The collection of ‘street view’ imagery was easier than expected, with absence of power supplies being the largest challenge, as twenty tablets and personal phones needed charging as well.

The survey was conducted with the help of twenty volunteers and the training and mapping activities proved very successful, with all volunteers performing above expectations. For surveys, mapping activities and even IT purposes, volunteers are invaluable for the Red Cross as they form an important pillar of the local capacity.

Working with remotely sensed data is often a cost-effective way of gathering information and has proven to be a good source for analysis of remoteness indicators.

Data-responsibility & ethics

Data collected in this project was the minimum needed to verify the remoteness algorithms, and to contribute to specific key datasets, such as the locations of schools and hospitals, as well as building materials of buildings. In each village where data was collected, the village chief was consulted and permission was asked. Personally identifiable data was not collected, nor did we collect any data about the household composition and vulnerabilities. Respondents were asked about the average travel time to different locations, for all family members, reducing the need to request personal data from the family members. Photos collected for Mapillary have gone through computer vision technology that detects faces and licence plates on the photos and applies blurs to them before they are visible on Mapillary. The vulnerability product that we derive from the data was discussed with government departments and with researchers working on vulnerability mapping for Malawi, as well as with the Malawi Red Cross. Respondents were neutral to our approach and did not express concerns. This way we try to make sure we involve local partners and take their concerns seriously. We believe that neither this data, nor the vulnerability proxy will do harm to the people in Malawi. On the contrary, it can help government and NGOs to better target the most vulnerable. If you do identify a risk, please reach out to us directly.

Data & code

Key location data, such as schools and hospitals, and building material data collected in this project have been added to OpenStreetMap and are therefore publicly available. Photos collected by us in the Mapillary streetview are available here. Travel distance data collected from families is not publicly available and will only be used to calibrate the remoteness indicator. The remoteness indicator will be finalized before July 2017 and the link to the GitHub repository will be shared here.



Our champions

Red Cross support

Grants for this research were provided by the Prinses Margriet Fund and the Netherlands Red Cross.

510: An initiative of the Netherlands Red Cross. We want to shape the future of humanitarian aid by converting data into understanding, and put it in the hands of humanitarian relief workers, decision makers and people affected, so that they can better prepare for and cope with disasters and crises. Among our data scientists are many volunteers and their input to our work is highly appreciated.

Want to join us and have an impact in humanitarian aid through the use of data? Contact us.

Netherlands Red Cross a humanitarian aid organization

Data preparedness in Malawi


The Netherlands Red Cross has a long-term partnership with Malawi Red Cross, working together on humanitarian response and Disaster Risk Reduction. In 2015 the shelter cluster was activated in Malawi during one of the worst floods in the history of Malawi that led to a severe food crisis.

The shelter cluster was there to co-lead the shelter response, part of which was setting up an inter-cluster rapid needs assessment to get more accurate data on who was affected where. Since such assessments take time, responders were already making decisions based on incomplete data and had to deal with influencing factors such as political pressures, media coverage and access. During the response, the need for more data-driven decision making became evident, and to this end Data Preparedness is key.

In February 2017 three team members joined the Malawi Red Cross in a first-of-its-kind Data Preparedness Mission. This blog post describes the results of this mission and explains how such a mission can be replicated in other countries.

What is data preparedness?

510 believes Data Preparedness should be an essential part of the preparedness activities that humanitarian organizations undertake together with communities at risk. It is about pre-staging data with sufficiently high data quality (that matches the prospective information needs of responders) and developing capacities to collect data with – and about – affected communities and areas once a disaster hits, to ensure a timely, efficient, and effective response. We developed a framework of five components to further describe which activities are part of Data Preparedness:

Data Sets: What data in relation to disaster management does your organization collect? Do you use a framework with indicators for this and what are your information needs? Which gap is there between your information needs and the data that is available to you? Do you have an overview of those data providers that will be important for you once a disaster strikes?

For example, during Typhoon Haiyan the international community did not have the automatic reflex to request data on cities from mayors.

Data Tooling and Services: Which tooling (software, hardware, but can also be paper-based) does your organization use to collect, analyse, and share data? Which tooling does your organization use for collaboration with other organizations and/or dissemination (like geospatial sharing platforms and collaborative digital tooling)? Which data services does your organization offer or rely on (like early warning information)?

Data Literacy: Do you have training programs for your employees in relation to data? Do you face obstacles in terms of lack of data literacy at several hierarchical levels within your organization? How do you assess the level of data literacy within your organization or possibly also of the partners you work with? Do you have an HR policy that attracts data literate staff?

Data Governance: What is the mandate of your organization in terms of data for disaster management and/or the business rationale? Do you have specific guidelines in place in relation to data collection, analysis and sharing? How do you safeguard privacy and ensure that sensitive data is handled responsibly? How are data harms prevented from occurring?

Networked Organizations for Data: With which organizations do you coordinate or collaborate in terms of data? With which organizations do you share data or get data from? Do you have an open data policy and are you actively sharing data online? Have you reached agreements with others for datasets that cannot be shared openly?

Remotely kick starting Data Preparedness

Already in 2016, we started several activities on the above five components, working remotely from The Hague with organizations in Malawi. We collected data on several risk indicators from a variety of data providers, as risk is an important predictor of where the impact will be highest after a disaster strikes. Some data was easy to find, such as the data available online on Malawi’s Spatial Data Platform (MASDAP) or on the Humanitarian Data Exchange; other data required finding specific contact persons and asking them directly for the data. The figure below shows the indicators for which we were able to find data remotely.

Data availability

In many cases, data was available but only at the national or district level. Data at the Traditional Authority level, i.e. closer to communities, was missing. To fill data gaps on vulnerable communities we are developing proxy indicators. One such indicator is the remoteness indicator as a proxy for vulnerability. The remoteness indicator is calculated from data in OpenStreetMap. To generate the data we held mapathons where about 1000 Dutch volunteers helped in mapping those parts of Malawi that were most relevant for the Malawi Red Cross. In a typical mapathon with 200 participants, one can map about 12,000 houses.

Part of the remote work was also developing priority index models for Floods in Malawi. In a separate blogpost, we will describe how we are predicting the areas that are most affected by floods using only data from before and up to 24 hours into the disaster.

In-country: A Data Preparedness review for and with the Malawi Red Cross

The remote activities go hand in hand with activities in the given country. Only through being in the country and through co-creating national and local capacities, can Data Preparedness truly become a part of humanitarian aid processes. Consequently, the objectives of our mission to Malawi were to get an in-depth understanding of where the Malawi Red Cross currently is in terms of Data Preparedness, to increase awareness of the importance of Data Preparedness and to ignite and catalyze corresponding activities. In addition, we worked with Malawi Red Cross staff to learn how the priority index model and the remoteness indicator could be used in development of programs and humanitarian operations.

In the first week, an interactive workshop with participation from nearly all departments within the Malawi Red Cross was organized. Participants rotated across tables and discussed each of the Data Preparedness components extensively. This helped us in identifying key barriers and ways to overcome them. Essential for embedding Data Preparedness in an organization are the Planning, Monitoring, Evaluation, Reporting (PMER), and IT officers. A first prerequisite is having an IT infrastructure that enables data sharing. Secondly, it is important that data can be easily shared among projects’ baselines and M&E.

Over the course of two weeks, we held over 25 semi-structured interviews with a wide variety of stakeholders, ranging from government (such as the Department of Disaster Management, Department of Surveys, National Statistics Office) and international organizations (World Bank) to universities and NGOs, both in the capital Lilongwe and in the southern part of Malawi (i.e. Blantyre, Zomba and small villages close to the border with Mozambique). A very tangible result was that many useful datasets were shared with us. Often data was on the individual laptop of the person we spoke to, and he or she was simply not aware of the possibility of uploading it to a geospatial sharing platform or refrained from doing so due to technical difficulties, of which a very slow internet connection was the main one.

Of course, it was not always possible to get relevant data remotely, but by meeting in person this was easier. Explaining the humanitarian purpose and showing the type of analysis we were doing is paving the way for future data sharing.

Implications at an organizational and country level

The Data Preparedness mission was a good learning experience for the Malawi Red Cross and ourselves. It increased our understanding of the strengths and weaknesses in terms of data in Malawi. During our interviews with other stakeholders, it became evident that some other organizations were ahead in terms of being data-driven, and the Malawi Red Cross realized the urgency of stepping up the pace and strengthening their data and IT infrastructure. The Malawi Red Cross also expressed that their 33,000 volunteers in 33 branches across the country are an enormous asset for data collection and sharing at grassroots level, especially when supported with training on tooling such as OpenMapKit and Mapillary.

At a country level, it was evident that data relevant for humanitarian response is scattered among many different organizations with sometimes overlapping mandates and roles. NGOs usually have very patchy data (limited to their project areas), whereas governments often have data only on paper or cannot open up their data that easily.

Local government agencies expressed the risk of data being misrepresented or misunderstood at national level, given the political implications this might have. Therefore, data sharing is also political and thus a time consuming and complex process. Even for simple data collection, aimed at starting a mapping activity in a community, one must get approval from the local government, and a key point is what one can offer in return to the government.

A further challenge is that there is often a time lag between the data collection and the actual application of the data for humanitarian programming or research, further complicated by the digital divide. Communicating the results back to local communities cannot be done through a nice interactive website if there is no IT infrastructure and a lack of digital literacy, but has to be done mainly offline.

The way forward

The Netherlands Red Cross and the Malawi Red Cross are well-positioned to continue work on Data Preparedness through several projects that will start in the coming months. Most recently, a project on Data for the Sustainable Development Goals (SDG) has been approved by the Global Partnership for SDG (link), where the objective is to create a national data collaborative in Malawi through which organizations can share data that can be used to monitor and report on the SDGs (especially WASH and health) in Malawi. The Netherlands Red Cross will also continue working on Data Preparedness with other national societies, whereby the elements from the approach used for the Malawi Data Preparedness serve as a blueprint that will be contextualized together with each individual national society. Last but not least, the Malawi Red Cross is assessing how they can build up within their organization a structure similar to the one of the 510 team: a unique and powerful mix of data savvy volunteers, professional staff and students. The discussions with the many stakeholders during the Data Preparedness mission have helped to identify potential strategic partners for doing so.


Our champions

Red Cross support

Grants for this research were provided by the Prinses Margriet Fund and the Netherlands Red Cross.

Outliers and missing data in datasets


In the previous blog post on data verification (see: [link]), we mentioned the need to identify outliers in the data. This blog post will look at some of the humanitarian aid datasets/formats, the types of techniques we apply to identify outliers, and how to deal with them effectively. The remainder of this blog post will focus on the occurrence of missing data and possible reasons for it.


Outliers are data points whose values are much lower, or much higher than the rest of the data points. We need to identify them, as they may impact the predictive accuracy and model fit when applying simple or multivariate regression analyses. If there are many outliers in the higher values, it is likely that the model will underestimate these values. Similarly, if there are many outliers in the lower values, the model can overestimate them.

Outliers can be classified as either “valid” or “invalid”, depending on the underlying cause(s). For example, in the case of a survey, an observation may have been wrongly entered, or numerical values may have been inserted where descriptive text was expected. Other causes may be the inconsistent use of zero, or “non-applicable”. If a CSV file (“comma separated values”), or a TSV file (“tab separated values”) does not have the proper encoding format (e.g. UTF-8) then rendering their contents on a UTF-8 encoded website may result in specific characters being replaced by black squares or question marks. As a side effect, due to the encoding errors, table entries may be shifted and so end up in the wrong columns. If the table is “scraped” from a website using scripting, these encoding errors will also appear in the downloaded data.
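The encoding and shifted-column problems described above can be checked for with a short script. The sketch below uses only the Python standard library; the function names and the list of fallback encodings are our own illustrative choices, not part of any existing tool.

```python
import csv
import io


def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Decode bytes by trying each encoding in turn; returns (text, encoding used)."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this last resort never fails,
    # although the resulting text may still contain wrong characters.
    return raw.decode("latin-1"), "latin-1"


def rows_with_wrong_width(text, delimiter=","):
    """Return (line number, row) for rows whose column count differs from the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []
    return [
        (line_no, row)
        for line_no, row in enumerate(reader, start=2)
        if len(row) != len(header)
    ]
```

Rows flagged by `rows_with_wrong_width` are candidates for shifted entries and are worth inspecting by hand before any outlier analysis; for a TSV file, pass `delimiter="\t"`. The line numbers assume that no field contains an embedded line break.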

Identifying outliers

In case of large structured datasets such as surveys, manually identifying outliers remains cumbersome.

In order to accelerate the identification we use a number of methods:

  • Visual inspection methods such as histogram (Figure 1) or box-and-whisker plots (Figure 2) help us to ‘see’ and interpret the distribution of the data
  • Non-visual inspection methods using spreadsheet functions or scripts help us to identify any empty entries, inconsistencies between numerical values and their textual entries, etc.

Figure 1: An example of a histogram, showing the number of completely damaged houses due to the Gorkha earthquake in Nepal.

Figure 2: An example of a box-and-whisker plot showing outliers below the “minimum value mark”.
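The whiskers of a box-and-whisker plot are commonly drawn at 1.5 times the interquartile range (IQR) beyond the first and third quartiles, and points outside them are shown as outliers. The sketch below applies that same convention without plotting. The factor `k=1.5` and the quartile method are assumptions; other tools may compute quartiles slightly differently.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # statistics.quantiles needs at least two data points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

A flagged value is only a candidate: whether it is a “valid” or “invalid” outlier still has to be decided from the context of the data.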

Handling outliers

Once the outliers have been identified, the next step is to determine what to do with them:

  • Retaining outliers that appear to be valid data.
  • Replacing outliers with a known (or derived) entry from related datasets.
  • Deleting outliers from the dataset.

In early stages after a natural disaster, when detailed data is still scarce, humanitarian aid organizations often publish high-level data which then gets updated over time. Also, as more and more data becomes available, we can use triangulation, where outliers are retained if different sources report similar/equal values and are removed if values differ in two or more of the datasets.
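One possible reading of this triangulation rule is sketched below. The relative tolerance of 10% and the minimum number of agreeing sources are hypothetical parameters that would need to be chosen per dataset; the function name is ours.

```python
def triangulate(flagged_value, other_sources, rel_tol=0.10, min_agree=1):
    """Decide what to do with a flagged outlier, given values for the same
    area from other sources (None means that source has no value)."""
    comparable = [v for v in other_sources if v is not None]
    if not comparable:
        return "unverified"  # nothing to compare against yet
    agreeing = [
        v for v in comparable
        if abs(v - flagged_value) <= rel_tol * max(abs(v), abs(flagged_value))
    ]
    return "retain" if len(agreeing) >= min_agree else "remove"
```

For instance, a reported 5000 damaged houses would be retained if another source reports 4900, and removed if the only other sources report 120 and 150.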

Missing data

In other instances values may be missing from the dataset, either intentionally (subjects refusing to provide answers to survey questions), or unintentionally (data got corrupted or subjects were no longer available to complete a survey). Assessing the extent to which the missing data impacts further analyses starts with determining the type of missing data:

  • Missing Completely At Random (MCAR): data are missing independently of both observed and unobserved data. An example of this would be: entire surveys that, at random, were not submitted, leading to missing values.
  • Missing At Random (MAR): given the observed data, data are missing independently of unobserved data. An example of this would be: collecting data about a subject’s profession where it is known that certain professions are more likely not to share their income. Within subgroups of the profession, missing incomes will be random.
  • Missing Not at Random (MNAR): missing observations related to values of unobserved data. An example of this would be: people with a low income are less likely to report their income on a data collection form.

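We can ignore missing data (= omit missing observations) without biasing the analysis if the data are MCAR. If the data are MAR, omitting observations is only safe when the analysis accounts for the observed variables that explain the missingness (in the example above, the profession).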
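A simple first check on the type of missingness is to compare how often a value is missing across groups of an observed variable. The sketch below is illustrative only; the field names in the usage example are hypothetical.

```python
from collections import defaultdict


def missing_rate_by_group(records, group_key, value_key):
    """Fraction of records per group in which value_key is missing (None or empty)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [missing, total]
    for record in records:
        group_counts = counts[record.get(group_key)]
        group_counts[1] += 1
        if record.get(value_key) in (None, ""):
            group_counts[0] += 1
    return {group: missing / total for group, (missing, total) in counts.items()}
```

Calling `missing_rate_by_group(survey, "profession", "income")` on a list of survey dictionaries gives one rate per profession. Clearly different rates between groups suggest the data are not MCAR. Similar rates do not prove MCAR, and the check cannot separate MAR from MNAR, because that depends on the unobserved values themselves.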

In a 2016 survey on the competitiveness index for municipalities in the Philippines, only 1245 municipalities out of a total of more than 1600 were ranked, and no reason was given for the roughly 400 missing municipalities. In such instances it is advisable to reach out to the researchers to understand why not all data was published.

After the Gorkha earthquake in Nepal, the number of damaged houses was reported at the lowest administrative level (level 4, Village Development Committee or VDC). Each VDC was associated with an identification label. The p-coding of the document was done automatically by means of an algorithm searching for matching letters in the label. Unfortunately, as the administrative borders in Nepal are rather dynamic, several VDCs did not have an associated p-code.

Figure 3: A visualisation of the number of completely damaged houses in Nepal due to the Gorkha earthquake.
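The exact matching algorithm used for the p-coding is not described here. As an illustration of label matching in general, the sketch below uses fuzzy string matching from Python's standard `difflib` module and keeps a list of names that could not be matched. The `cutoff` value is an assumption, and the place names and p-codes in the usage example are made up.

```python
import difflib


def match_pcodes(reported_names, pcode_lookup, cutoff=0.85):
    """Match reported place names to p-codes via fuzzy string similarity.

    pcode_lookup maps official names to p-codes. Returns (matched, unmatched),
    where matched maps each reported name to the p-code of its closest
    official name, and unmatched lists names with no match above the cutoff.
    """
    normalised = {name.strip().lower(): code for name, code in pcode_lookup.items()}
    matched, unmatched = {}, []
    for name in reported_names:
        hits = difflib.get_close_matches(
            name.strip().lower(), list(normalised), n=1, cutoff=cutoff
        )
        if hits:
            matched[name] = normalised[hits[0]]
        else:
            unmatched.append(name)
    return matched, unmatched
```

With a lookup of `{"Samplepur": "P-0001", "Testgaun": "P-0002"}`, the misspelled label `"Sampelpur"` is matched to `P-0001`, while `"Unknownville"` ends up in the unmatched list for manual follow-up. Keeping the unmatched names explicit makes this kind of missing data visible instead of silently dropping it.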

