Creating a complete system that deduplicated the data of approximately 3.3 million non-profit organisations
Executive Summary
Challenge: GlobalGiving, Candid, GuideStar and TechSoup Global approached us with a need to get richer access to data for non-profit leaders so that they can learn about social change at scale.
Approach: We were responsible for technical decisions and the direction of development. We developed a complete solution, integrated with the AWS cloud, that can store and deduplicate millions of organisations from the clients’ systems.
Result: Enhanced transparency in the NGO sector, thanks to a system that processed 3.3 million organisations, identified 540,000 duplicates and assigned 2.7 million BRIDGE IDs.
About the project
The Basic Registry of Identified Global Entities (BRIDGE) is a collaborative project that aims to revolutionise information sharing.
BRIDGE is a system that assigns a unique global identifier, a “numerical fingerprint” for non-profits, to non-governmental organisations (NGOs), NGO programs and projects, and other entities in the social sector. The aim is to better understand the flows of philanthropic dollars and to enhance transparency and effectiveness in the global social sector.
The main stakeholders in the project are organisations from the non-profit sector: GlobalGiving, Candid, and TechSoup. They hold a huge amount of data, and many non-governmental organisations around the world could use it in countless ways.
An innovative system
The BRIDGE system could be a significant step toward connecting disparate sources of data and giving non-profit leaders richer access to this data so that they can learn about social change at scale. It will also help donors to understand the sector and the organisations working for change.
The challenge, of course, was that we had to bring together the data of all four organisations in one place and then assign unique BRIDGE IDs to all of the unique organisations in the four data sets. BRIDGE had to design a system that allowed us to figure out which records across the four organisations were the same. BRIDGE could only be successful if fewer than 3% of the records in that whole matching exercise ended up being what we can call ambiguous records.
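To make the matching problem concrete, here is a minimal sketch of how records from several data sets could be compared, grouped under shared identifiers, and flagged as ambiguous. It is an illustration only and not the BRIDGE deduplication engine: the `Record` fields, the name-similarity measure based on Python's `difflib.SequenceMatcher`, and the two thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
import re


@dataclass(frozen=True)
class Record:
    source: str   # which data set the record came from (hypothetical field)
    name: str
    country: str


def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def similarity(a: Record, b: Record) -> float:
    """Name similarity in [0, 1]; records from different countries score 0."""
    if a.country != b.country:
        return 0.0
    return SequenceMatcher(None, normalise(a.name), normalise(b.name)).ratio()


# Illustrative thresholds, not values taken from the project.
MATCH_THRESHOLD = 0.9
DISTINCT_THRESHOLD = 0.6


def classify(a: Record, b: Record) -> str:
    """Return 'same', 'different' or 'ambiguous' for a pair of records."""
    score = similarity(a, b)
    if score >= MATCH_THRESHOLD:
        return "same"
    if score <= DISTINCT_THRESHOLD:
        return "different"
    return "ambiguous"


def assign_ids(records):
    """Give matching records a shared ID; collect indices of ambiguous records."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    ambiguous = set()
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            verdict = classify(records[i], records[j])
            if verdict == "same":
                parent[find(j)] = find(i)
            elif verdict == "ambiguous":
                ambiguous.update((i, j))

    ids_by_root = {}
    ids = [ids_by_root.setdefault(find(i), len(ids_by_root) + 1)
           for i in range(len(records))]
    return ids, ambiguous


if __name__ == "__main__":
    sample = [
        Record("A", "Save the Rivers Foundation", "US"),
        Record("B", "Save The Rivers Foundation.", "US"),
        Record("C", "Save the Rivers Fund", "US"),
        Record("D", "Green Schools Trust", "KE"),
    ]
    ids, ambiguous = assign_ids(sample)
    print(ids, len(ambiguous) / len(sample))
```

The all-pairs comparison keeps the sketch short, but it grows quadratically with the number of records, so a data set of millions of organisations would first need to be split into smaller candidate groups before comparing.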
BRIDGE project in numbers
540,000
duplicates that we identified
2.7 million
BRIDGE IDs in the system
3.3 million
non-profit organisations provided by the clients
18 months
duration of the first phase of the project
Scope of work
To reach the final solution, we had to solve a specific problem: the deduplication of data about nonprofit organisations.
Our role was to develop a complete solution, integrated with the AWS cloud, that can store and deduplicate millions of organisations from the clients’ systems.
We had to take care of performance, security, data backups, appropriate server redundancy, and database clusters.
We were also responsible for technical decisions and the direction of the development. Thanks to that, the team had a strong sense of ownership for the deduplication solution and a sense of responsibility for the state of its development.
This was a generally well-run project, showing good ownership and good creativity on the product side. Technically producing good results. Good openness of activities and product progress guiding us where that was needed. Well engaging the clients in various meetings weekly. Liked the demos and making progress concrete. Liked the openness to investigate or do things differently as we went through the months.
The deduplication issue
Because of the exceptional nature of the project, in which we worked with four clients, our role in BRIDGE was, in some respects, different from the typical one.
The first phase of the project lasted about 18 months. Future Processing was responsible for creating a complete system that was able to deduplicate data on the approximately 3.3 million non-profit organisations provided by the clients.
As part of the project, 97.4 per cent of NGOs were assigned a unique identifier: a BRIDGE ID. In total, we identified approximately 540,000 duplicates, resulting in 2.7 million BRIDGE IDs in the system. The deduplication engine has been fine-tuned to perform as well as possible for NGOs and is very easy to scale within our infrastructure.
Currently, BRIDGE can help the clients exchange information efficiently and improve communication about NGOs.
You were extremely responsive and professional. Very organized and everyone seemed impressed by the quality of your work. We stayed pretty much on track the whole time, and your team brought very good intellectual and technical capacity to the project. It seemed like you and your team genuinely took an interest in the project.
Scrum and Technical Advisor
Within the company, the project was supported by a Technical Advisor, whose primary goal was to ensure the high quality of the development process and of the solutions provided.
The Technical Advisor contributed a lot to the structural quality of the product and frequently inspired team members to achieve continuous, iterative improvement of the code.
Since the very beginning of the project, we have followed the Scrum framework as closely as possible. It seemed to be an ideal solution for both us and our clients.
Watch the interview
Watch the video in which Paul van Haver explains more about the project and our partnership.
Visual identification
We initiated the effort to create a product logo from scratch. It was natural to design the logo in the form of a bridge connecting people, a stable construction that you can safely rely upon. It is the name of the project, after all.
The fact that the people are holding hands is very important because it shows that people involved in the project support one another. The logo is blue because this is a colour associated with trust.
It was our pleasure working with the Future Processing team. We found their engineers to be technically proficient, engaging, and open to new ideas. The Future Processing team did an outstanding job managing the project and the final product met all of our expectations. We would definitely work with Future Processing again and would recommend them to others as well.