A centralized data portal about Africa

Hannah Diorio-Toth

Jan 25, 2024

Researchers presenting the portal

Source: The Upanzi Network

The team is made up of Patrick Iradukunda (MSIT '23), David Ntamakemwa (MSIT '24), and Jean Paul Nishimirwe (MSIT '23).

Researchers from the Upanzi Digital Public Infrastructure Lab at Carnegie Mellon University Africa have set out to make it easier to find and explore data about the African continent. The team, made up of Patrick Iradukunda (MSIT '23), David Ntamakemwa (MSIT '24), and Jean Paul Nishimirwe (MSIT '23), created a searchable portal that aggregates datasets ranging from educational outcomes to population density. The web-based platform is called Open Data Portal and was launched in August 2023.

"Currently, data about Africa is either unavailable or scattered in different places. Researchers, academics, and policymakers have a difficult time finding the quality information they need in order to do things like build new technologies or make informed decisions about new social programs," says Nishimirwe. "This can delay projects or reduce the quality of their outcomes." In the absence of data, researchers must either collect new information manually or run a simulation. And a simulation can sometimes produce biased results, explains Iradukunda.

For the initial launch of the portal, the team focused on functionality and user experience. The site includes a search function and tags such as "health," "geospatial," and "education" to guide users to different categories of content. The platform already has over 4,000 datasets, but the research team hopes to have many more as the project continues. Users can upload their own data to the portal through a simple process that includes filling out a form and creating a profile. Users who would rather host their data locally can instead share a link to be highlighted in the portal.

System design of the Open Data Portal diagram

Source: The Upanzi Network

System design of the Open Data Portal, illustrating the three main components of the open data ecosystem and how users interact with it.

"But the challenge that we face is, what do you mean by 'quality'? Quality data in health might have different criteria from data in education," explains Nishimirwe. "For example, somewhere, narrow data might be a positive thing, while in other areas, narrow data is nonsense."

To monitor this, a second team from the Upanzi Network will be creating an algorithm that will score datasets. The team includes Furaha Benedict (MSIT '23) and Paul Ewuzie (MSEAI '23). The project, still in its early stages, will assess data quality across six major dimensions: accuracy, consistency, completeness, uniqueness, timeliness, and validity.
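
To give a rough sense of what scoring a dataset along such dimensions might involve, here is a minimal Python sketch covering only two of the six: completeness and uniqueness. It is an illustration and not the Upanzi team's algorithm, which the article does not describe. The function names, the equal weighting, and the placeholder rows are all assumptions made for the example. The other four dimensions (accuracy, consistency, timeliness, and validity) generally depend on reference data or domain-specific rules and are left out.

```python
from typing import Any

Row = dict[str, Any]


def completeness(rows: list[Row], columns: list[str]) -> float:
    """Fraction of cells that are present and non-empty (hypothetical metric)."""
    total_cells = len(rows) * len(columns)
    if total_cells == 0:
        return 0.0
    filled = sum(
        1
        for row in rows
        for col in columns
        if row.get(col) not in (None, "")
    )
    return filled / total_cells


def uniqueness(rows: list[Row], columns: list[str]) -> float:
    """Fraction of rows that are distinct (assumes cell values are hashable)."""
    if not rows:
        return 0.0
    distinct = {tuple(row.get(col) for col in columns) for row in rows}
    return len(distinct) / len(rows)


def quality_scores(rows: list[Row], columns: list[str]) -> dict[str, float]:
    """Score two dimensions and combine them with equal weights (an assumption)."""
    scores = {
        "completeness": completeness(rows, columns),
        "uniqueness": uniqueness(rows, columns),
    }
    scores["overall"] = sum(scores.values()) / len(scores)
    return scores


# Placeholder rows, not real data: one exact duplicate and two missing values.
sample_columns = ["region", "year", "value"]
sample_rows = [
    {"region": "A", "year": 2020, "value": 10},
    {"region": "A", "year": 2020, "value": 10},
    {"region": "B", "year": 2020, "value": None},
    {"region": "C", "year": 2020, "value": ""},
]

if __name__ == "__main__":
    for name, score in quality_scores(sample_rows, sample_columns).items():
        print(f"{name}: {score:.2f}")
```

As Nishimirwe's comment suggests, a real scoring system would likely need different thresholds or weights for health data than for education data, so the equal weighting here is only a starting point.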

As the portal develops, the team hopes that it will help to increase visibility for less-understood parts of Africa and encourage more research about those countries. Ntamakemwa explains that for some small countries in West Africa, there is no data available on important aspects of daily life such as agriculture or energy. With Open Data Portal, Upanzi researchers hope to change that.