• VoxCroft

VoxCroft Celebrates Africa’s Language Diversity

The international community celebrates World Mother Language Day on 21 February. This day is an important reminder that our mother tongue is the first conduit for interpreting the world around us and shaping our identity. VoxCroft Analytics believes that technological innovation should not solely serve the financial interests of large multinationals. We build machine-based solutions for low-resource languages with a different goal in mind: to close the gap between decision-makers and local communities, in an effort to make Africa a safer and more prosperous continent.

Access. The key that opens doors to education, employment, invention, and innovation. Yet, lack of access to education and the information highway is one of the largest global challenges of the 21st century. Unequal access to knowledge restricts entry into the vibrant world of industry and technology to individuals and communities that have access to the best education, the right connections, and the necessary financial capital.

Barriers to knowledge, however, begin long before the classroom. Language is the very first portal to the education and ideas that drive creativity. Unfortunately, in Africa, secondary education mostly happens in English, French, and Portuguese, the languages of the continent’s former colonizers.

This leads to a ranking system that classifies languages into two categories: the informal language of the home, and the formal language of economic life. The message this sends to mother tongue speakers is that their home language is inferior, to be used only in domestic contexts. In contrast, great effort is devoted to perfecting the official language used in educational and professional contexts. As a result, entire populations who are unable to develop fluency in the dominant language are excluded from public life, which curtails their participation in the formal economy and in active citizenship.

The Internet explosion has further exacerbated the language divide. A 2021 study shows that English is the language of 60.4 percent of the 10 million most visited websites, even though only 16.2 percent of the world’s population speaks it as a mother tongue. Conversely, languages with only a million speakers are rarely visible on the Internet.

There are 7,000 languages spoken in the world today. Of these, 2,000 languages are in Africa.

World Mother Language Day on 21 February seeks to restore, to some small degree, pride of place to the thousands of mother tongues spoken throughout the world. However, much more needs to be done to promote the use and public visibility of mother tongues, some of which are spoken by millions of people. The theme for this year’s commemoration is “Using technology for multilingual learning: Challenges and opportunities.”

There are already several information technology (IT) initiatives to make African language communities visible online and develop vocabularies for the fields of science and technology. Wolof speakers in Senegal have created a podcast that discusses IT developments, with the express objective of creating and using Wolof vocabulary to speak about new technologies. A South African collaborative project promotes the development of Natural Language Processing for African languages.

At VoxCroft Analytics, we are also deeply passionate about promoting African languages and building technologies to make them more accessible online. One such project is to build machine translation technology for low-resource languages.

Most language technologies require very large volumes of language data to build and train models. A low-resource language is typically one for which little digital language data is available. VoxCroft Analytics works with mother tongue speakers and uses the latest machine learning techniques to leverage relatively small language datasets. The resulting models can be used for a range of functions, including keyword detection in broadcast data, gisting of written text, automatic transcription, and machine translation.
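To make one of these functions concrete, the following is a minimal sketch of keyword detection on text that has already been transcribed. It is an illustration under stated assumptions, not a description of VoxCroft's own models: the function names `normalize` and `detect_keywords` are hypothetical, the sample words are illustrative tokens rather than real broadcast data, and a production system would work on audio and handle morphology, spelling variation, and noise that this toy example ignores.

```python
import re
import unicodedata
from collections import Counter


def normalize(text: str) -> str:
    """NFC-normalize and casefold text so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text).casefold()


def detect_keywords(transcript: str, keywords: list[str]) -> dict[str, int]:
    """Count whole-word occurrences of each keyword in a transcript.

    Hypothetical helper for illustration only: it assumes the audio has
    already been transcribed and that keywords are single words.
    """
    # \w+ matches runs of Unicode letters and digits, so non-Latin scripts work too.
    tokens = re.findall(r"\w+", normalize(transcript))
    counts = Counter(tokens)
    # Report only the keywords that actually occur in the transcript.
    return {kw: counts[normalize(kw)] for kw in keywords if counts[normalize(kw)] > 0}


if __name__ == "__main__":
    # Illustrative sample text, not real data.
    sample = "Bishaan fi nagaa. bishaan!"
    print(detect_keywords(sample, ["bishaan", "nagaa", "barumsa"]))
```

Normalizing both the transcript and the keywords before comparison is the main design choice here: it keeps matches from being missed because of capitalization or differing Unicode encodings of the same character, which matters for languages whose digital text is less standardized.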

The purpose of these low-resource language technologies is two-fold:

  • They assist our data collection activities, which in turn equip our analysts to identify the various language communities contributing to online conversations and the dominant themes and trends in these conversations. These tools are used to detect misinformation campaigns and hate speech. Our analysts then contextualize these dynamics to help decision-makers make better choices for their organizations and the communities they serve.

  • These technologies also have the potential to enrich communities in which low-resource languages are spoken and written. VoxCroft actively seeks opportunities to make these technologies readily available to language communities, so that they can increasingly see their languages represented online. Creating a more equitable online environment reduces real and imagined differences among population groups and promotes interaction, which can in turn foster greater solidarity and unity.

VoxCroft Analytics has already built technological solutions to enhance multilingual learning for Ethiopia’s most widely spoken languages. We have working machine translation models for Oromo and Tigrinya, and are currently working on identifying keywords in Amharic audio data.

We have also built a machine translation model for Uganda and are excited to expand these successes to other Central and West African languages. Our team of data scientists interacts directly with expert speakers of these languages to ensure high standards of quality in the linguistic data and to draw on the speakers’ insights about the syntax and semantics of their mother tongues, which are then encoded into the models.

With 2,000 mother tongues in Africa, there is much work to be done to preserve and promote the use of these languages through the creation of tools and technologies that can enhance speakers’ participation in the digital and online economy. If you are interested in joining us on this quest, get in touch with us to find out more.