With the Digital Justice Seed Grant, the team used machine learning models to identify relationships, recognize handwriting, and redact sensitive information from about 700 letters written by family members of imprisoned anti-apartheid activists. Now, the team will use their Digital Justice Development Grant to implement solutions based on the project’s first phase and hold training sessions at the Mayibuye Centre Archive, the collections’ home archive in Cape Town.
“We think we’ve come up with a really innovative way to open up this archive while ensuring privacy concerns are respected. Further, because machine learning methods are transferable, we also believe this workflow could be applied to other archives that contain similarly valuable but sensitive information. Making these archives even partially accessible through machine learning could revise our thinking about any number of histories,” said Stephen Davis, an associate professor at the University of Kentucky.
The ACLS Digital Justice Grant Program, made possible by the Mellon Foundation, provides resources for projects at various stages of development that diversify the digital domain, advance justice and equity in digital scholarly practice, engage in appropriately scaled capacity building efforts, and contribute to public understanding of racial and social justice issues. Digital Justice Seed Grants support projects at early stages of development, while Digital Justice Development Grants fund projects that have advanced beyond the start-up or early phases of development.
“We would not have been able to have the sustained conversations and onsite presence required to make a sensitive project like this succeed had it not been for ACLS funding,” Davis said.
Image: File boxes from the Mayibuye Centre Archive. The Programme II Collection contains 420 box files of letters and remains closed to the public.
The letters explored in the project were predominantly written by Black South African women whose spouses had been detained during apartheid. Unbeknownst to authorities, the letter exchanges were devised by the International Defence and Aid Fund (IDAF), an anti-apartheid legal defense organization, as a way to fund imprisoned activists and their families.
Under the apartheid government, it was illegal for international organizations to provide funding to activists. The organization therefore asked volunteers from Europe, North America, and the Commonwealth to write letters of support to activists’ families and to enclose money that secretly came from IDAF. The scheme resulted in letter exchanges that went on for decades, providing rich insights into the experiences of Black activist families in South Africa.
Davis believes the letters will revise what is known about the history of South Africa.
“Over the course of this letter writing, which essentially began as a ruse, or a kind of accounting mechanism to get around apartheid laws, real friendships developed and a lot of unique and rare details of Black family life were recorded in the voices of the people who lived those experiences, which is really rare,” he said.
To gather data from the letters, the researchers fine-tuned open-source handwritten text recognition (HTR) models from Microsoft and used open-source line detection models from Kraken to convert lines from the letters into machine-readable raw text. The researchers then used Python to reconstruct the letters from those lines.
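The reconstruction step can be pictured with a minimal Python sketch. It assumes each recognized line carries a page number and a position on the page; the `RecognizedLine` structure, its field names, and the sample text are hypothetical and are not taken from the project's code or from the letters.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RecognizedLine:
    """One line of recognized text (hypothetical structure for illustration)."""
    page: int    # page number within the letter
    top: float   # vertical position of the line on the page
    left: float  # horizontal position of the line on the page
    text: str    # raw text returned by a handwriting recognition model


def reconstruct_letter(lines):
    """Join recognized lines into one text, page by page, top to bottom."""
    pages = defaultdict(list)
    for line in lines:
        pages[line.page].append(line)

    page_texts = []
    for page_number in sorted(pages):
        # Order lines by vertical position, then horizontal position.
        ordered = sorted(pages[page_number], key=lambda l: (l.top, l.left))
        page_texts.append("\n".join(l.text.strip() for l in ordered))

    # Separate pages with a blank line.
    return "\n\n".join(page_texts)


# Invented sample lines, deliberately out of order.
sample_lines = [
    RecognizedLine(page=1, top=210.0, left=40.0, text="I hope you are well."),
    RecognizedLine(page=2, top=50.0, left=40.0, text="Yours sincerely,"),
    RecognizedLine(page=1, top=120.0, left=40.0, text="Dear friend,"),
]
letter = reconstruct_letter(sample_lines)
print(letter)
```

Sorting by position is only one possible approach; handwritten pages with margins notes or multiple columns would likely need a more careful reading-order step.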
The researchers also trained machine learning models to mask personally identifiable information in the letters without training on the writers’ actual data, in order to protect their privacy. The team noticed that the letters all followed a similar structure, so they used a generative machine learning tool to create synthetic data in the same format. This synthetic data was then used to train the model to mask the writers’ actual private data.
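To illustrate the idea of labeled synthetic training data, here is a small Python sketch. It swaps in a simple template filler for the generative machine learning tool the team used, and every name, place, and sentence in it is an invented placeholder. The output format, text plus character spans for each sensitive value, is an assumption about what a masking model might be trained on.

```python
import random

# Hypothetical placeholder values; none come from the real letters.
FAKE_NAMES = ["Person A", "Person B", "Person C"]
FAKE_PLACES = ["Town X", "Town Y"]


def make_example(rng):
    """Build one synthetic letter and record where the sensitive values sit."""
    # Plain strings are ordinary text; tuples are (label, sensitive value).
    segments = [
        "Dear friend, my name is ",
        ("NAME", rng.choice(FAKE_NAMES)),
        " and I am writing from ",
        ("PLACE", rng.choice(FAKE_PLACES)),
        ". Thank you for your kind letter.",
    ]
    text = ""
    spans = []
    for segment in segments:
        if isinstance(segment, tuple):
            label, value = segment
            # Record the character offsets of the value before appending it.
            spans.append((len(text), len(text) + len(value), label))
            text += value
        else:
            text += segment
    return text, spans


rng = random.Random(0)  # fixed seed so the sketch is repeatable
dataset = [make_example(rng) for _ in range(5)]
for text, spans in dataset:
    print(text, spans)
```

Because the spans are produced together with the text, the labels are exact by construction, and no real letter has to be read or annotated to create them.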
“The idea was to have a machine learning model that could mask personally identifiable information, and then have another machine learning model that could do important information extraction so you could actually understand when you view the data, collectively, large scale challenges that were shared across different people from different communities,” explained William Mattingly, a Postdoctoral Fellow at the Smithsonian Institution.
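The two-stage idea, masking first and extracting shared themes second, can be sketched in Python as follows. This is an illustration under stated assumptions: the spans that a trained masking model would predict are produced here by a hypothetical `span_of` helper, and a keyword lookup stands in for the extraction model. The theme names, keywords, and sample sentences are invented.

```python
import re
from collections import Counter


def span_of(text, value, label):
    """Stand-in for a model prediction: locate a known value in the text."""
    start = text.index(value)
    return (start, start + len(value), label)


def mask_text(text, spans):
    """Replace each (start, end, label) span with a [LABEL] placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


# Hypothetical keyword lists standing in for a trained extraction model.
THEMES = {
    "schooling": {"school", "fees"},
    "health": {"clinic", "ill"},
}


def extract_themes(masked_text):
    """Return the themes whose keywords appear as whole words in the text."""
    words = set(re.findall(r"[a-z]+", masked_text.lower()))
    return {theme for theme, keywords in THEMES.items() if words & keywords}


# Invented sample letters.
raw_letters = [
    "Person A says the school fees are due.",
    "Person B was ill and went to the clinic in Town X.",
]
predicted_spans = [
    [span_of(raw_letters[0], "Person A", "NAME")],
    [span_of(raw_letters[1], "Person B", "NAME"),
     span_of(raw_letters[1], "Town X", "PLACE")],
]

masked_letters = [mask_text(t, s) for t, s in zip(raw_letters, predicted_spans)]
theme_counts = Counter(
    theme for letter in masked_letters for theme in extract_themes(letter)
)
print(masked_letters)
print(theme_counts)
```

Running extraction on the masked text rather than the raw text means the aggregate view never needs to touch names or places.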
While crafting their project, the team made it a top priority to use open-source models that could be downloaded and run locally and non-commercially on servers in South Africa. Using open-source models ensures the letters remain in their archive and allows individuals in South Africa to train their own models and make corrections to improve the tool’s accuracy.
“The most important thing of all of this, I can’t emphasize this enough, is that it keeps everything local. They don’t have to let their data ever leave their museum or their archives,” Mattingly said. “This workflow, once established, can be replicated fairly easily. We designed everything so that it can be copied and pasted, loaded up for a different archive, and it will work.”
The team’s training sessions at the Mayibuye Centre Archive will give them the opportunity to explain how to use the tools on servers and how to train machine learning models.
“We’re still broadly working within the realm of machine learning and human rights, but this project has taken on a life of its own,” Davis said. “The tools we’ve developed are really promising, and we’re looking forward to sharing them with our South African partners and seeing them work in practice.”
ACLS Digital Justice Grants
The ACLS Digital Justice Grant Program supports digital projects across the humanities and interpretative social sciences that critically engage with the interests and histories of people of color and other historically marginalized communities through the ethical use of digital tools and methods.
The next round of applications will open on September 10, 2024, with applications due December 3, 2024, 9:00 PM EST.