The 510 team works with data from many different sources, and we often face challenges of completeness, validity and reliability. We therefore took a deep dive into best practices for data verification. This blog briefly describes why verification matters and how we address it.
One of the weaknesses of “disaster data” is the lack of standardised methodologies and definitions. One example is the category of people “affected” by a disaster. Much of the data is retrieved from a variety of public sources: aid agencies, newspapers, insurance reports, etc. Even if the organisation compiling the data uses specific definitions and a standardised methodology, the contributing suppliers of information may not.
Fortunately, due to increased pressure for accountability, many donors, development agencies and humanitarian relief organisations have started to prioritise data collection and its methodologies. But this has not yet resulted in a recognised, accepted and effective international system for disaster-data gathering, verification and storage. In this blog we describe our current process for data verification and data storage.
Data verification is the activity of checking and confirming information. We apply data verification prior to data analysis, data visualisation, model training, model building and model validation. We use the verification process as proposed in the Verification Handbook (2014):
The verification process focuses on the following four aspects:
While addressing these items we triangulate and challenge the datasets and data collection methods, for example by cross-checking similar or related datasets provided by other organisations shortly after a disaster took place, and again several weeks later. If a dataset lists values without mentioning how they were obtained or what definition was used for a particular parameter, if we need unprocessed/raw data rather than aggregates, or if we simply cannot access the data for security reasons, we reach out to the content publisher or researcher for further details. This ensures that we understand the dataset, its limitations, any ambiguity and its relevance before we use it in any of the subsequent steps.
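A minimal sketch of what such cross-checking can look like in practice: comparing “people affected” figures reported for the same areas by an early and a later source, and flagging large discrepancies or gaps. All area names, figures and the tolerance threshold are illustrative assumptions, not real data or our actual tooling.

```python
def cross_check(source_a, source_b, tolerance=0.25):
    """Flag areas where two sources disagree by more than `tolerance`
    (relative difference), or where one source has no value at all."""
    flagged = {}
    for area in set(source_a) | set(source_b):
        a, b = source_a.get(area), source_b.get(area)
        if a is None or b is None:
            flagged[area] = "missing in one source"
        elif abs(a - b) / max(a, b) > tolerance:
            flagged[area] = f"mismatch: {a} vs {b}"
    return flagged

# Hypothetical figures reported shortly after a disaster vs. weeks later:
early = {"district_1": 12000, "district_2": 3500}
late = {"district_1": 12500, "district_2": 9000, "district_3": 800}
print(cross_check(early, late))
```

Any area the check flags is a prompt to contact the publisher, not a verdict on which source is right.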
We also realise that in some cases thorough data verification is not possible, not practical, or not required. Census data may be outdated or unavailable, so we may decide to use historic census data instead. Satellite imagery of weather conditions, geographical data and hydrological data are examples of specialised datasets for which we rely on the data gathering methodologies of specialised agencies in the areas of engineering, space and meteorology.
In the Missing Maps Project we work with thousands of volunteers to digitise satellite imagery. To improve the quality and completeness of this effort we agree upon a methodology with all parties involved. The methodology includes tools and training/instructions for data contributors, and it is facilitated by people who know how to properly classify data. A two-step process, data collection by one person and validation by another, helps us improve the quality and completeness of the data. Because we know how accurate the data is (or isn’t), we also know what kind of analysis we can do on that data, and how reliable the outcome of that analysis is.
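The two-step process gives a simple way to estimate accuracy: compare the first-pass classifications against the second-pass validation. The sketch below is our own illustration with made-up building labels, not the Missing Maps tooling.

```python
def agreement_rate(collected, validated):
    """Share of features where the validator confirmed the
    contributor's classification."""
    matches = sum(1 for feature, label in collected.items()
                  if validated.get(feature) == label)
    return matches / len(collected)

# Hypothetical first-pass labels and second-pass validation:
collected = {"bldg_1": "residential", "bldg_2": "residential",
             "bldg_3": "school", "bldg_4": "residential"}
validated = {"bldg_1": "residential", "bldg_2": "clinic",
             "bldg_3": "school", "bldg_4": "residential"}
print(f"validator agreement: {agreement_rate(collected, validated):.0%}")
```

An agreement rate like this is what lets us state how accurate the data is (or isn’t) before deciding which analyses it can support.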
In July 2014, after extensive research, OCHA launched the Humanitarian Data Exchange (HDX), a data-sharing platform that embodies the best standards in data collection and offers access to useful and accurate data. We publish data on HDX in two ways:
We always provide the metadata to allow others to verify how the data was generated and to judge if and how they want to use the data. An example dataset can be seen here. A properly formatted dataset includes:
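Before publishing, a dataset’s metadata can be checked for completeness with a simple pre-flight test. The sketch below uses field names of our own choosing to illustrate the idea; they are assumptions, not the actual HDX metadata schema.

```python
# Illustrative required metadata fields (assumed, not the HDX schema):
REQUIRED_FIELDS = ["title", "source", "methodology",
                   "date_of_dataset", "license"]

def missing_metadata(metadata):
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]

example = {
    "title": "Example affected-population dataset",
    "source": "Field reports",
    "methodology": "",   # empty, so it should be flagged
    "license": "CC BY",
}
print(missing_metadata(example))
```

Flagging empty fields as well as absent ones matters: a blank methodology entry leaves others unable to verify how the data was generated.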
We encourage users dealing with humanitarian data to use the HDX platform for sharing with the wider community.
We would value a qualitative and quantitative review of the HDX platform, to better understand the limitations, incentives and opportunities for humanitarian organisations to share data there in a standardised way, and how that data is used by both information managers and decision makers.
In an upcoming update to this blog we will explain how we: