Unlock the power of language with the TWB Learning Center

Discover our free online translator training courses: launching TWB’s new-look Learning Center! 

Introducing the new TWB Learning Center – a place for community members to gain experience in humanitarian translation and learn in-demand language industry skills. 

Part of CLEAR Global, TWB brings together over 100,000 language volunteers globally, helping people get vital information and be heard, whatever language they speak. Together, we’re dedicated to translating and localizing important information to support the world’s most marginalized people. Our community members work to help our nonprofit partners worldwide provide lifesaving multilingual messages, ensuring everyone can understand. Now we’re launching our new-look Learning Center and brand-new, self-paced online translation courses! The TWB Learning Center is a great way for newcomers and language professionals alike to continue to develop their skills and stay ahead in the industry. So, dive in and get ready to learn something new!

Photo: All community members who successfully complete a course will attain a downloadable certificate. Here are our TWB Nigeria team members with their certificates. 

Explore new translator training opportunities exclusive to TWB Community members

Available courses: 

Every contribution matters – in every language: making translator training accessible 

Our team of experts has revamped the existing e-learning platform for translators to provide our community members with free, high-quality courses. These courses are designed to be accessible to both experienced localization professionals and those of you who speak marginalized languages, even if you’re starting from scratch. Perhaps you speak a language that lacks useful translation training resources? Whether you are looking to refresh yourself on the basics, or learn about machine translation and translating for humanitarian contexts, TWB’s Learning Center courses allow you to develop and apply your language skills, so you can make a bigger impact professionally and personally.

Our community members help us make vital material accessible to more people around the world. The work you do matters. You’re helping some of the world’s most vulnerable people get answers to their questions in times of crisis, know their rights, and learn how to stay safe when forcibly displaced. You’re helping people get accurate and reliable health information. And you’re helping those who are most impacted by climate change protect themselves and our planet.

Grow your skills and translate for good

The TWB Learning Center offers community members a variety of interactive, self-paced online courses to learn and grow professionally and acquire new skills. Our courses empower linguists and non-professional bilinguals to participate in humanitarian and development translation tasks on the TWB Platform and initiatives for making their languages part of global conversations. These courses cater to everyone, from newcomers to the language industry with no previous experience, to professional translators who are looking to keep up to date with the latest innovations. In the TWB Learning Center, TWB Community members can choose to improve and build their capabilities in areas of their choice, such as translation, machine translation post-editing (MT PE), our computer-assisted translation tool (Phrase TMS), target terminology development and glossaries, desktop publishing, and more to come.

New to TWB, translation, or the humanitarian field?

Everyone is welcome. Our courses are designed to be accessible to speakers of low-resourced and well-resourced languages alike. If you’ve not yet joined the TWB Community, you can sign up today. Learn more about the community and join here. If you’re new to translation and the humanitarian field, complete the TWB Learning Center courses to learn about our translation tools and get practice using your new skills on impactful projects.

Work towards your professional goals with TWB:

  • Learn about key translation concepts and tools
  • Get familiar with the tools and skills you need to start working on translation tasks with TWB and in your career
  • Develop the experience and capability to take on more complex translator training and explore more specialized topics
Photo: a sneak peek of our Learning Center. Ibrahim, left, with a participant testing out a device powered by community members’ translations. It enables displaced people in Bakassi camp, Nigeria, to give feedback to camp staff in their own language. It lets people listen to vital information that matters to them, like how food distribution works.

Don’t miss out – course certification 

Once you successfully complete a course on the TWB Learning Center, you can download a certificate. Showcase your skills, share certificates with your network, and enhance your resume. We love to see our global community learning and growing – here are some posts people have shared after completing their courses – why not join them? 

Our language volunteers shared their experience

We spoke to Yuriy Kovalenko, an English, Ukrainian, and Russian translator who shares our love for learning on the TWB Platform:

“I have been working with TWB for almost two years, but more actively since the full-scale war in Ukraine started. Now, for almost a year, the flood of information, manuals, and reports was overwhelming and this required faster rendering of diverse texts into the target language. Faster, but maintaining a high quality of translation, meeting deadlines, sustaining attention to detail, localization, and consistency, to name a few. TWB has a user-friendly platform, comradely and supportive staff, detailed and easy-to-follow Translator’s Toolkit for newbies, a Guide for TWB Community members, and Language Quality Inspection/Assessment.”

Photo: Yuriy at work.

“When I was invited by TWB to attend their online course on MT PE (machine translation post-editing), without hesitation, I signed up and learned how to apply my skills in a more efficient way. Now, this experience allowed me to understand better how machine translation works, and how AI (artificial intelligence) can be helpful in many respects. I now find it easier to translate more accurately using other different platforms and CAT (computer-assisted translation). I definitely recommend these TWB courses to any aspiring professional. The knowledge, skills and experience you acquire and hone will be invaluable. In my case, working with and learning from the TWB made me feel more accomplished.” 

Yuriy Kovalenko, TWB Community member.

Mirriam Kitaka joined TWB as a young Swahili translator:

“I joined TWB in 2019 after a thorough Google search for a translation website that could give me an opportunity to grow as a young translator, and this was two years after my mentor introduced me to the field. When I found TWB, I joined as a Swahili Translator Volunteer (TWB Community member). I have since translated, reviewed, and proofread a lot of tasks on the TWB platform. Under the community recognition program, I have been awarded a Certificate of Volunteer Activity and a Reference Letter as a Translator, not to forget a phone top-up for attaining the minimum threshold designed by the organization.” 

“TWB has given me the opportunity to take courses which have scaled my translation, proofreading, editing, and reviewing skills. For me, they offered specific content and information, especially in the humanitarian field. They are very nice and rich courses that I would recommend current and upcoming translators to study through elearn.translatorswb.org. As I write this, I have donated 42,870 words already. I am also working on a very huge revision project. And I can also confirm that I am now a “TWB Traveller!” Thank you Translators without Borders and CLEAR Global for the opportunity to save lives through my native language.”

Mirriam Kitaka, TWB Community member.

Your invitation to join us

Go ahead and explore the Learning Center’s free translation courses today at elearn.translatorswb.org 

If you’re new to TWB – sign up here. 

Our goal is to make our training resources multilingual, with a special focus on low-resource languages. We are starting by translating our Basic Translator Training course with the support of our community! Our team hopes to make it available in at least ten languages this year. By March, we will upload module two of the Basic Translator Training course, plus a brand-new course on how to use CAT tools including Phrase TMS. Behind the scenes, we are also working on making new training courses available on language quality, developing glossaries, and more.

Watch this space as we learn and grow together!

We thank Microsoft for their kind Azure donation which hosts much of our language technology resources.

Conversations with chatbots: helping people in the DRC access multilingual COVID-19 information

“How is coronavirus different from Ebola?”

“What are the symptoms of Corona?”

“How many times a day should I wash my hands?”

“How else can I protect myself from Corona?”

These are questions that people are asking in the Democratic Republic of Congo in Lingala, French, and Congolese Swahili. And their questions are being answered by a bot, in their own language.

The bot’s name is “Uji,” which is short for ukingo and jibu, which mean “prevention” and “response” respectively. Uji is TWB’s first multilingual chatbot and a key part of making sure people have the health information they want, in their own language.

Uji supports collaborative and two-way communication

Everyone has the right to access the information they need and want, when they want it, and in a language they understand. Yet frequently information is only available in global commercially-viable languages, or in the national languages of a country. Furthermore, this information is often only available in a top-down manner, with humanitarians and health agencies deciding what information people can and should receive.

TWB has long advocated for humanitarians and development professionals to integrate multilingual technology in their programs. This allows people living through crises to proactively and independently get answers to their questions. And with pandemic-related restrictions denying crisis-affected people access to humanitarians during COVID-19, new communication tools are needed.

Uji unites language and technology to bring us closer to this vision of truly equitable information access.

The development of Uji

Access to credible, multilingual COVID-19 information is a challenge in the DRC. “Many Lingala and Congolese Swahili speakers in the DRC are accessing COVID-19 information from different radio shows, websites, and posters,” explains Rodrigue Bashizi, TWB’s DRC Community Engagement Officer. “But the main challenge for accessing COVID-19 information is the cost of internet bundles in the country. Sometimes people receive videos talking about COVID-19, but they can’t open them due to a lack of good internet and the cost of bundles.”

People needed a better solution for their COVID-19 questions. Enter Uji. Rodrigue says, “Uji is a very important tool for people in DRC because they lack trusted information. Since Uji is on Telegram and WhatsApp, it will not consume a lot of internet bundles. It is easy to use. Once it is on SMS it will even be available for people in remote areas with no internet access.”

Rodrigue is from Bukavu in the DRC and speaks Swahili, French, English, Lingala, Kinyarwanda and Luganda. Before joining TWB, he worked as a trainer with refugees in Uganda. At TWB, he is a core member of the team developing our multilingual chatbots for two-way communications. Rodrigue is passionate about technology and says he loves working on chatbots, as he is learning something new every day.

Rodrigue and other TWB team members developed the tool in partnership with Kinshasa Digital, a DRC communication agency that was already working with the DRC Ministry of Health to develop a COVID-19 chatbot. By collaborating with Kinshasa Digital and bringing multilingual technology to the existing bot, we will be able to reach more people, in more languages.

TWB developed Uji in French, Congolese Swahili, and Lingala. The bot responds to a wide range of questions about COVID-19, from debunking popular rumors, to tips on how to help children cope with stress due to COVID-19. We are working on expanding its scope to also respond to questions about Ebola. The chatbot is available on WhatsApp and Telegram. By using existing messaging platforms people can access COVID-19 information wherever they are, whenever they want. Whether they are at home, on the bus, or at work, they can find the information they need, right from their phone.


To engage with Uji, users message their COVID-19 questions to the chatbot on WhatsApp or Telegram. They can ask their questions in French, Congolese Swahili, or Lingala. The bot automatically responds in the language in which the question was asked.

The questions were ready and the bot was developed. But before launching the bot fully across these platforms, we needed to test and perfect it.

Linguist-tested and approved

Uji is a work in progress, and it requires human testing in multiple languages to make sure it’s effective and useful. Rodrigue led the testing efforts with volunteers from TWB’s community of translators, IFRC, and other partners. At the beginning of the process, Uji had to learn to understand questions and match responses accurately. But with time and testing, Uji has improved dramatically. And feedback from our community of testers is positive:

“The bot is making great progress in Swahili.”

“It’s getting harder to get an answer that doesn’t match the question. Seems the bot is improving continuously.”

Not only is this individual feedback important, but nearly 70% of users who participated in our satisfaction survey about the bot report that they find the information useful. The chatbot also allows TWB to gather insights about what questions are asked most frequently and what languages are used most often. Humanitarian and health organizations can use this data to tailor their communication strategies, to better provide the information that people want.
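The kind of insight described above, which questions come up most and in which languages, can be sketched with a simple tally. This is a minimal illustration only: the log format, the topic labels, and the `summarize` function are hypothetical and are not taken from Uji itself. The language codes are ISO 639-3 identifiers for Lingala (lin), Congo Swahili (swc), and French (fra).

```python
from collections import Counter

# Hypothetical chatbot log: (ISO 639-3 language code, matched question topic).
LOG = [
    ("lin", "symptoms"),
    ("swc", "handwashing"),
    ("fra", "symptoms"),
    ("lin", "prevention"),
    ("lin", "symptoms"),
]


def summarize(log):
    """Count how often each language and each topic appears in the log."""
    languages = Counter(lang for lang, _ in log)
    topics = Counter(topic for _, topic in log)
    return languages, topics


languages, topics = summarize(LOG)
print(languages.most_common(1))  # most frequently used language
print(topics.most_common(2))     # two most frequently asked topics
```

Counts like these are already aggregated, so they can be shared with partner organizations without exposing individual conversations.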

We will continue to improve Uji in the coming weeks and months, and welcome additional feedback from users.

The future of TWB chatbots

We hope that Uji is the start of a global restructuring of how multilingual conversations happen. Our aim is to demonstrate Uji’s value as a successful multilingual two-way communication channel in the DRC, and then expand the model into additional countries and for additional uses.

We encourage humanitarian and development professionals to consider incorporating chatbots and other language technology into their programming.

To learn more about incorporating chatbot and language technology into your programming, email [email protected].

Written by Krissy Welle, TWB’s Senior Communications Officer

Language data fills a critical gap for humanitarians

Until now, humanitarians have not had access to data about the languages people speak. But a series of open-source language datasets is about to improve how we communicate with communities in crisis. Eric DeLuca and William Low explain how a seemingly simple question drove an innovative solution.

“Do you know what languages these new migrants speak?”

Lucia, an aid worker based in Italy, asked this seemingly simple question to researchers from Translators without Borders in 2017. Her organization was providing rapid assistance to migrants as they arrived at the port in Sicily. Lucia and her colleagues were struggling to provide appropriate language support. They often lacked interpreters who spoke the right languages and they asked migrants to fill out forms in languages that the migrants didn’t understand.

Unfortunately, there wasn’t a simple answer to Lucia’s question. In the six months prior to our conversation with Lucia, Italy registered migrants from 21 different countries. Even when we knew that people came from a particular region in one of these countries, there was no simple way to know what language they were likely to speak.

The problem wasn’t exclusive to the European refugee response. Translators without Borders partners with organizations around the world which struggle with a similar lack of basic language data.

Where is the data?

As we searched various linguistic and humanitarian resources, we were convinced that we were missing something. Surely there was a global language map? Or at least language data for individual countries?

The more we looked, the more we discovered how much we didn’t know. The language data that does exist is often protected by restrictive copyrights or locked behind paywalls. Languages are often visualized as discrete polygons or specific points on a map, which seems at odds with the messy spatial dynamics that we experience in the real world. 

In short, language data isn’t accessible, or easily verifiable, or in a format that humanitarians can readily use.

We are releasing language datasets for nine countries

Today we launch the first openly available language datasets for humanitarian use. This includes a series of static and dynamic maps and 23 datasets covering nine countries: DRC, Guatemala, Malawi, Mozambique, Nigeria, Pakistan, Philippines, Ukraine, and Zambia.

This work is based on a partnership between TWB and University College London. The pilot project received support from Research England’s Higher Education Innovation Fund, managed by UCL Innovation & Enterprise. With support from the Centre for Translation Studies at UCL, this project was the first of its kind in the world to systematically gather and share language data for humanitarian use.

The majority of these datasets are based on existing sources — census and other government data. We curated, cleaned, and reformatted the data to be more accessible for humanitarian purposes. We are exploring ways of deriving new language data in countries without existing sources, and extracting language information from digital sources.

This project is built on four main principles:

1. Language data should be easily accessible

We started analyzing existing government data because we realized there was a lot of quality information that was simply hard to access and analyze. The language indicators from the 2010 Philippines census, for example, were spread over 87 different spreadsheets. Many census bureaus also publish in languages other than English, making it difficult for humanitarians who work primarily in English to access the data. We have gone through the process of curating, translating, and cleaning these datasets to make them more accessible.

2. Language data should work across different platforms

We believe that data interoperability is important. That is, it should be easy to share and use data across different humanitarian systems. This requires data to be formatted in a consistent way and spatial parameters to be well documented. As much as possible, we applied a consistent geographic standard to these datasets. We avoided polygons and GPS points, opting instead to use OCHA administrative units and P-codes. At times this will reduce data precision, but it should make it easier to integrate the datasets into existing humanitarian workflows.

We worked with the Centre for Humanitarian Data to develop and apply consistent standards for coding. We built an HXL hashtag scheme to help simplify integration and processing. Language standardization was one of the most difficult aspects of the project, as governments do not always refer to languages consistently. The Malawi dataset, for example, distinguishes between “Chewa” and “Nyanja,” which are two different names for the same language. In some cases, we merged duplicate language names. In others, we left the discrepancies as they exist in the original dataset and made a note in the metadata.

Even when language names are consistent, the spelling isn’t always. In the DRC dataset, “Kiswahili” is displayed with its Bantu prefix. We have opted instead to use the more common English reference of “Swahili.”

Every dataset uses ISO 639-3 language codes and provides alternative names and spellings to alleviate some of the typical frustrations associated with inconsistent language references.
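To illustrate how alternative names and spellings can be resolved to a single code, here is a minimal sketch. The alias table, the preferred display names, and the `normalize_language` function are assumptions made for this example and do not reproduce the published datasets. The codes themselves are real ISO 639-3 identifiers: "nya" covers Chewa/Nyanja and "swa" is the Swahili macrolanguage code (a dataset may use a more specific code instead).

```python
# Illustrative alias table. The ISO 639-3 codes are real, but which names map
# to which code in the published datasets is an assumption of this sketch.
ALIASES = {
    "chewa": "nya",
    "nyanja": "nya",
    "chichewa": "nya",
    "swahili": "swa",
    "kiswahili": "swa",
}

# Hypothetical choice of display name per code.
PREFERRED_NAME = {"nya": "Nyanja", "swa": "Swahili"}


def normalize_language(name):
    """Return (iso_code, preferred_name), or (None, cleaned_name) if unknown."""
    cleaned = " ".join(name.split())  # trim and collapse whitespace
    code = ALIASES.get(cleaned.casefold())
    if code is None:
        return None, cleaned
    return code, PREFERRED_NAME[code]


print(normalize_language("Chewa"))
print(normalize_language(" Kiswahili "))
print(normalize_language("Unlisted language"))
```

Returning the unknown name unchanged, rather than guessing, mirrors the approach of leaving a discrepancy in place and noting it in the metadata.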

3. Language data should be open and free to use

We have made all of these datasets available under a Creative Commons Attribution Noncommercial Share Alike license (CC BY-NC-SA-4.0). This means that you are free to use and adapt them as long as you cite the source and do not use them for commercial purposes. You can also share derivatives of the data as long as you comply with the same license when doing so.

The datasets are all available in .xlsx and .csv formats on HDX, and detailed metadata clearly states the source of each dataset along with known limitations. 
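As a rough sketch of how a .csv file with an HXL hashtag row could be read, the example below parses a small inline sample. In HXL-tagged spreadsheets the hashtag row sits between the human-readable headers and the data. The sample rows, the placeholder P-codes, and the "+language" and "+iso" attributes are hypothetical; the actual column names and hashtag scheme of the datasets may differ.

```python
import csv
import io

# Hypothetical sample: a header row, an HXL hashtag row, then data rows.
SAMPLE = """admin1_name,admin1_pcode,language,iso_code,speakers
#adm1+name,#adm1+code,#meta+language,#meta+iso,#population
Example Region A,XX001,Swahili,swa,1200
Example Region B,XX002,Swahili,swa,300
Example Region A,XX001,French,fra,450
"""


def read_hxl(text):
    """Parse CSV text and return one dict per data row, keyed by HXL hashtag."""
    rows = list(csv.reader(io.StringIO(text)))
    tags, data = rows[1], rows[2:]
    if not all(tag.startswith("#") for tag in tags if tag):
        raise ValueError("second row is not an HXL hashtag row")
    return [dict(zip(tags, row)) for row in data]


def speakers_by_pcode(records):
    """Sum speaker counts per (admin-1 P-code, language code) pair."""
    totals = {}
    for rec in records:
        key = (rec["#adm1+code"], rec["#meta+iso"])
        totals[key] = totals.get(key, 0) + int(rec["#population"])
    return totals


records = read_hxl(SAMPLE)
print(speakers_by_pcode(records))
```

Keying on hashtags and P-codes rather than on header text or place names means the same code keeps working when a source spells a column header or a region name differently.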

Importantly, everything is free to access and use.

4. Language data should not increase people’s vulnerability

Humanitarians often cite the potential sensitivities of language as the primary reason for not sharing language data. In many cases, language can be used as a proxy indicator for ethnicity. In some, the two factors are interchangeable.

As a result, we developed a thorough risk-review process for each dataset. This identifies specific risks associated with the data, which we can then mitigate. It also helps us to understand the potential benefits. Ultimately, we have to balance the benefits and risks of sharing the data. Sharing data helps humanitarian organizations and others to develop communication strategies that address the needs of minority language speakers.

In most cases, we aggregated the data to protect individuals or vulnerable groups. For each dataset, we describe the method we used to collect and clean the data, and specify potential limitations. In a few instances, we chose to not publish datasets at all.

How can you help?

This is just the beginning of our effort to provide more accessible language data for humanitarian purposes. Our goal is to make language data openly available for every humanitarian crisis, and we can’t do it alone. We need your help to:

  1. Integrate and share this data. We are not looking to create another data portal. Our strategy is to make these datasets as accessible and interoperable as possible using existing platforms. But we need your feedback so we can improve and expand them.
  2. Add language-related questions into your ongoing surveys. Existing language data is often outdated and does not necessarily represent large-scale population movements. Over the past year, we have worked with partners such as IOM DTM, REACH, WFP, and UNICEF to integrate standard language questions into ongoing surveys. This is essential if we are to develop language data for the countries that don’t have regular censuses. The recent multi-sectoral needs assessment in Nigeria is a good example of how a few strategic language questions can lead to data-driven humanitarian decisions.
  3. Use this language data to improve humanitarian communication strategies. As we develop more data, we hope to provide the tools for Lucia and other humanitarians to design more appropriate communication strategies. Decisions to hire interpreters and field workers, develop radio messaging, or create new posters and flyers should all be data-driven. That’s only possible if we know which languages people speak. An inclusive and participatory humanitarian system requires two-way communication strategies that use languages and formats that people understand.

Clearly, the answer to Lucia’s question turned out to be more complicated than any of us expected. This partnership between TWB and the Centre for Translation Studies at UCL has finally made it possible to incorporate language data into humanitarian workflows. We have established a consistent format, an HXL coding scheme, and processes for standardizing language references. But the work does not stop with these nine countries. Over the next few months we will continue to curate and share existing language datasets for new countries. In the longer term we will be working with various partners to collect and share language data where it does not currently exist. We believe in a world where knowledge knows no language barriers. Putting language on the map is the first step to achieving that.

Eric DeLuca is the Monitoring, Evaluation, and Learning Manager at Translators without Borders.

William Low is a Senior Data and GIS Researcher at University College London.

Funding for this project was provided by Research England’s Higher Education Innovation Fund, managed by UCL Innovation & Enterprise.

Transfer Learning Approaches for Machine Translation

This article was originally posted in the TWB Tech Blog on medium.com

TWB’s current research focuses on bringing language technology to marginalized communities

Translators without Borders (TWB) aims to empower people through access to critical information and two-way communication in their own language. We believe language technology such as machine translation systems is essential to achieving this. This is a challenging task given that many of the languages we work with have little to no language data available to build such systems.

In this post, I’ll explain some methods for dealing with low-resource languages. I’ll also report on our experiments in obtaining a Tigrinya-English neural machine translation (NMT) model.

The progress in machine translation (MT) has reached many remarkable milestones over the last few years, and it is likely that it will progress further. However, the development of MT technology has mainly benefited a small number of languages.

Building an MT system relies on the availability of parallel data. The more present a language is digitally, the higher the probability of collecting large parallel corpora which are needed to train these types of systems. However, most languages do not have the amount of written resources that English, German, French and a few other languages spoken in highly developed countries have. The lack of written resources in other languages drastically increases the difficulty of bringing MT services to speakers of these languages.

Low-resource MT scenario

Figure 2, modified from Koehn and Knowles (2017), shows the relationship between the BLEU score and the corpus size for the three MT approaches.

A classic phrase-based MT model outperforms NMT for smaller training set sizes. Only after a corpus size threshold of 15M words, roughly equivalent to 1 million sentence pairs, does classic NMT show its superiority.

Low-resource MT, on the other hand, deals with corpus sizes of around a couple of thousand sentences. Although this figure suggests at first glance that there is no way to obtain anything useful for low-resource languages, there are ways to leverage even small data sets. One of these is a deep learning technique called transfer learning, which applies the knowledge gained while solving one problem to a different but related problem.

Cross-lingual transfer learning

Figure 3 illustrates their idea of cross-lingual transfer learning.

The researchers first trained an NMT model on a large parallel corpus (French–English) to create what they call the parent model. In a second stage, they continued to train this model, but fed it with a considerably smaller parallel corpus of a low-resource language. The resulting child model inherits the knowledge from the parent model by reusing its parameters. Compared to a classic approach of training only on the low-resource language, they record an average improvement of 5.6% BLEU over the four languages they experiment with. They further show that the child model reuses knowledge not only of the structure of the high-resource target language but also of the process of translation itself.
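The parent-child idea can be illustrated with a deliberately tiny stand-in. The sketch below is not an NMT system: it fits a two-parameter linear model by gradient descent, first on a large "parent" task and then on a small, related "child" task, and compares continuing from the parent's parameters with starting from zero. All data, names, and hyperparameters are made up for illustration.

```python
def mse(params, data):
    """Mean squared error of the linear model y = w * x + b on the data."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(params, data, steps, lr=0.05):
    """Run plain gradient descent on the MSE, starting from the given params."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# "High-resource" parent task: many examples of y = 2.0 * x + 1.0.
parent_data = [(k / 10, 2.0 * (k / 10) + 1.0) for k in range(-20, 21)]
# "Low-resource" child task: five examples of the related y = 2.2 * x + 0.9.
child_data = [(x, 2.2 * x + 0.9) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

parent = train((0.0, 0.0), parent_data, steps=200)
child_transfer = train(parent, child_data, steps=5)     # reuse parent parameters
child_scratch = train((0.0, 0.0), child_data, steps=5)  # no transfer

print("transfer:", mse(child_transfer, child_data))
print("scratch: ", mse(child_scratch, child_data))
```

With the same small training budget, the child that starts from the parent's parameters ends with a much lower error, because the two tasks are closely related. This is the intuition behind choosing a parent language close to the low-resource one.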

The choice of high-resource parent language is a key parameter in this approach. This decision is usually made heuristically, judging closeness to the low-resource language by distance in the language family tree or by shared linguistic properties. A more rigorous exploration of which parent language best suits a given low-resource language is made in Lin et al. (2019).

Multilingual training

What results from the example is one single model that translates from the four languages (French, Spanish, Portuguese and Italian) to English.

Multilingual NMT offers three main advantages. Firstly, it reduces the number of individual training processes needed to one, yet the resulting model can translate many languages at once. Secondly, transfer learning makes it possible for all languages to benefit from each other through the transfer of knowledge. And finally, the model serves as a more solid starting point for a possible low-resource language.

For instance, if we were interested in training MT for Galician, a low-resource Romance language, the model illustrated in Figure 4 would be a perfect fit as it already knows how to translate well from four other high-resource Romance languages.

A solid report on the use of multilingual models is given by Neubig and Hu (2018). They use a “massively multilingual” corpus of 58 languages to leverage MT for four low-resource languages: Azeri, Belarusian, Galician, and Slovakian. With a parallel corpus size of only 4500 sentences for Galician, they achieved a BLEU score of up to 29.1% in contrast to 22.3% and 16.2% obtained with a classic single-language training with statistical machine translation (SMT) and NMT respectively.

Transfer learning also enables what is called zero-shot translation, where no training data is available for the language of interest. For Galician, the authors report a BLEU score of 15.5% on their test set without the model having seen any Galician sentences before.

Case of Tigrinya NMT

Tigrinya is no longer in the very low-resource category thanks to the recently released JW300 dataset by Agic and Vulic. Nevertheless, we wanted to see if a higher resource language could help build a Tigrinya-to-English machine translation model. We used Amharic as a parent language, which is written with the same Ge’ez script as Tigrinya and has larger public data available.

The datasets that were available to us at the time of writing this post are listed below. After the JW300 dataset, the largest resource to be found is Parallel Corpora for Ethiopian Languages.

Our transfer-learning-based training process consists of four phases. First, we train on a dataset that is a random mix of all sets, totaling up to 1.45 million sentences. Second, we fine-tune the model on Tigrinya using only the Tigrinya portion of the mix. In a third phase, we fine-tune on the training partition of our in-house data. Finally, 200 samples set aside earlier from this corpus are used for testing.
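The data preparation behind these phases could be sketched as follows. This is an assumption-laden outline, not the code used in the experiments: the `build_phases` function, the data layout, and the placeholder sentences are hypothetical, and the sketch assumes that the phase-one mix is drawn from the public sets only. The language tags "am" and "ti" are the ISO 639-1 codes for Amharic and Tigrinya.

```python
import random


def build_phases(public_sets, in_house, test_size=200, seed=13):
    """Split data into the training mixes and the held-out test set.

    public_sets maps a language tag to a list of (source, english) pairs;
    in_house is a list of (source, english) Tigrinya pairs.
    """
    rng = random.Random(seed)

    # Phase 1: random mix of all public sets, tagged with source language.
    mixed = [
        (lang, src, tgt)
        for lang, pairs in public_sets.items()
        for src, tgt in pairs
    ]
    rng.shuffle(mixed)

    # Phase 2: only the Tigrinya portion of that mix.
    tigrinya_only = [example for example in mixed if example[0] == "ti"]

    # Hold out the test samples first, so phase 3 never sees them.
    shuffled = list(in_house)
    rng.shuffle(shuffled)
    test, in_house_train = shuffled[:test_size], shuffled[test_size:]

    return {
        "phase1_mixed": mixed,
        "phase2_tigrinya": tigrinya_only,
        "phase3_in_house": in_house_train,
        "test": test,
    }


# Placeholder data standing in for real parallel sentences.
public = {
    "am": [(f"am-src-{i}", f"en-{i}") for i in range(50)],
    "ti": [(f"ti-src-{i}", f"en-{i}") for i in range(30)],
}
in_house = [(f"ti-inhouse-{i}", f"en-inhouse-{i}") for i in range(1000)]

phases = build_phases(public, in_house)
print({name: len(examples) for name, examples in phases.items()})
```

Setting the test samples aside before any fine-tuning keeps the evaluation honest: a model that has trained on its own test sentences would report inflated scores.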

As a baseline, we skip the first multilingual training step and use only Tigrinya data to train on.

We see a slight increase in the accuracy of the model on our in-house test set when we use the transfer learning approach. The results in various automatic evaluation metrics are as follows:

Conclusion

Written by Alp Öktem, Computational Linguist for Translators without Borders

When words fail: audio recording for verification in multilingual surveys

A TWB trainer conducts comprehension research. Monguno, Borno State, Nigeria. Photo by: Eric DeLuca, Translators

“Sir, I want to ask you some questions if you agree?”

With that one sentence, our enumerator summarized the 120-word script provided to secure the informed consent of our survey participants – a script designed, in particular, to emphasize that participation would not result in any direct assistance. Humanitarian organizations, research institutes and think tanks around the world are conducting thousands of surveys every year. How many suffer from similar ethical challenges? And how many substandard survey results fall under the radar due to lack of effective quality assurance?

We were conducting a survey on the relationship between internal displacement, cross-border movement, and durable solutions in Borno, a linguistically diverse state in northeast Nigeria. Before data collection began, Translators without Borders (TWB) translated the survey into Hausa and Kanuri to limit the risk of mistranslations due to poor understanding of terminology. Even with this effort, however, not all the enumerators could read Hausa or Kanuri. Although enumerators spent a full day in training going through the translations as a group, there was still a risk that language barriers undermined the quality of the research. Humanitarian terminology is often complex, nuanced, and difficult to translate precisely into other languages. A previous study by Translators without Borders in northeastern Nigeria, for example, found that only 57% of enumerators understood the word ‘insurgency’.

We only know the exact phrasing of this interview because we decided to record some of our surveys using an audio recorder. In total, 96 survey interviews were recorded. Fifteen percent of these files were later transcribed into Hausa or Kanuri and translated into English by TWB. Those English transcripts were compared to the enumerator-coded responses, allowing us to analyze the accuracy of our results. While the process was helpful, the findings raise some important concerns.
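Selecting which recordings to transcribe can be sketched in a few lines. This is only an illustration: the post does not say how the 15 percent were chosen, so the random draw, the file names, and the `pick_for_transcription` helper are assumptions.

```python
import random


def pick_for_transcription(recording_ids, fraction=0.15, seed=42):
    """Draw a reproducible random sample of recordings to transcribe."""
    ids = sorted(recording_ids)
    sample_size = max(1, round(len(ids) * fraction))
    return sorted(random.Random(seed).sample(ids, sample_size))


# 96 recorded interviews, as in the survey described above
# (the file naming scheme is hypothetical).
recordings = [f"survey_{i:03d}.wav" for i in range(1, 97)]
sample = pick_for_transcription(recordings)
print(len(sample))  # 14 recordings, roughly 15 percent of 96
```

Fixing the seed means the same recordings are selected each time, which makes the quality-assurance sample auditable.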

A digital voice recorder in Maiduguri, Nigeria serves as a simple and low-tech tool for capturing entire surveys. Photo by: Eric DeLuca / Translators without Borders

Consent was not always fully informed

Efforts to obtain informed consent were limited, despite the script provided. According to the consultant, enumerators felt rushed due to the large numbers of people waiting to participate in the survey – but people were interested in participating precisely because of the mistaken belief that participation could result in assistance, which underlines the need for informed consent. 

Alongside these ethical challenges, the failure to inform participants about the objectives of the research increases the risk of bias in the findings, prompting people to tailor responses to increase their chances of receiving assistance. Problems related to capacity, language, or questionnaire design can also negatively impact survey results, undermining the validity of the findings. 

The enumerator-coded answers did not always match the transcripts

During data quality assurance, we also identified important discrepancies between the interview transcripts and the survey data. In some cases, enumerators had guessed the most likely response rather than properly asking the question, jumping to conclusions based on their understanding of the context rather than respondents’ lived experiences. If the response was unclear, random response options were selected without seeking clarification. Some questions were skipped entirely, but responses were still entered into the surveys. The following example, comparing an extract of an interview transcript with the recorded survey data, illustrates these discrepancies. 

Interview transcript:

Interviewer: Do you want to go back to Khaddamari?

Respondent: Yes, I want to.

Interviewer: When do you want to go back?

Respondent: At any time when the peace reigns. You know we are displaced here.

Interviewer: If the place become peaceful, will you go back?

Respondent: If it becomes peaceful, I will go back. 

Survey data:

Do you want to return to Khaddamari in the future? Yes

When do you think you are likely to return? Within the next month

What is the main reason that motivates you to return? Improved safety

What is the second most important reason? Missing home

What is the main issue which currently prevents return to Khaddamari? Food insecurity

What is the second most important issue preventing return? Financial cost of return

At no point in the interview did the respondent mention that he or she was likely to return in the next month. Food insecurity or financial costs were also not cited as factors preventing return. Without audio recordings, we would never have become aware of these issues. Transcribing even just a sample of our audio recordings drew attention to significant problems with the data. Instead of blindly relying on poor quality data, we were able to triangulate information from other sources, and use the interview transcripts as qualitative data. We also included a strongly worded limitations section in the report, acknowledging the data quality issues.
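The comparison between transcript and coded data can be expressed as a simple check. The sketch below is illustrative only: the field names, the `unsupported_answers` helper, and the reviewer's reading of the transcript are hypothetical, and it covers only the answers discussed above.

```python
def unsupported_answers(coded, supported):
    """List questions whose coded answer the transcript does not back up."""
    return [
        question
        for question, answer in coded.items()
        if supported.get(question) != answer
    ]


# What the enumerator entered into the survey form.
coded = {
    "wants_to_return": "Yes",
    "likely_return_time": "Within the next month",
    "main_barrier": "Food insecurity",
    "second_barrier": "Financial cost of return",
}

# What a reviewer could confirm from the transcript extract
# (hypothetical encoding; barriers were never discussed).
supported = {
    "wants_to_return": "Yes",
    "likely_return_time": "When it is peaceful",
}

flagged = unsupported_answers(coded, supported)
print(flagged)
# ['likely_return_time', 'main_barrier', 'second_barrier']
```

A question that is missing from the reviewer's reading is flagged too, which matches the case where an answer was entered without the question being asked.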

We suspect such data quality issues are common. Surveys, quite simply, are perhaps not the most appropriate tool for data collection in the contexts within which we operate. Certainly, there is a need to be more aware of, and more transparent about, survey limitations.

Despite these limitations, there is no doubt that surveys will continue to be widely used in the humanitarian community and beyond. Surveys are ingrained in the structure and processes of the humanitarian industry. Despite the challenges we faced in Nigeria, we will continue to use surveys ourselves. We know now, however, that audio recordings are invaluable for quality assurance purposes. 

A manual audio recording strategy is difficult to replicate at scale

In an ideal world, all survey interviews would be recorded, transcribed, and translated. This would not only enhance quality assurance processes, but also complement survey data with rich qualitative narratives and quotes. Translating and transcribing recordings, however, requires a huge amount of technical and human resources. 

From a technical standpoint, recording audio files of surveys is not straightforward. Common cell phone data collection tools, such as Kobo, do not offer full-length audio recordings as standard features within surveys. There are also storage issues, as audio files take up significant space on cell phones and stretch the limits of offline survey tools or browser caching. Audio recorders are easy to find and fairly reliable, but they require setting up a parallel workflow and a careful process of coding to ensure that each audio file is appropriately connected to the corresponding survey.

From a time standpoint, this process is slow and involved. As a general rule, it takes roughly six hours to transcribe one hour of audio content. In Hausa and Kanuri – two low-resource languages that lack experienced translators – one hour of audio often took closer to eight hours to transcribe. The Hausa or Kanuri transcripts then had to be translated into English, a process that took an additional eight hours. Each 30-minute recorded survey therefore required about one day of additional work to process fully. To put that into perspective, one person would have to work full time every day for close to a year to transcribe and translate a survey involving 350 people.
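The arithmetic behind that estimate can be checked with a short calculation. It assumes an eight-hour working day and that the eight hours of translation apply per hour of audio, neither of which the post states explicitly; the constant and function names are illustrative.

```python
TRANSCRIBE_HOURS_PER_AUDIO_HOUR = 8  # Hausa and Kanuri, from the text
TRANSLATE_HOURS_PER_AUDIO_HOUR = 8   # assumed to be per hour of audio
WORKDAY_HOURS = 8                    # assumption: one working day


def processing_days(n_surveys, minutes_per_survey=30):
    """Working days needed to transcribe and translate the recordings."""
    audio_hours = n_surveys * minutes_per_survey / 60
    work_hours = audio_hours * (
        TRANSCRIBE_HOURS_PER_AUDIO_HOUR + TRANSLATE_HOURS_PER_AUDIO_HOUR
    )
    return work_hours / WORKDAY_HOURS


print(processing_days(1))    # 1.0 day for one 30-minute survey
print(processing_days(350))  # 350.0 days for a 350-person survey
```

At 350 working days with no days off, the figure lands close to a full calendar year for one person.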

Language technology can offer some support

In languages such as English or French, solutions already exist to drastically speed up this process. Speech-to-text technologies – the same technologies used to send SMS messages by voice – have improved dramatically in recent years with the adoption of machine learning approaches. This makes it possible to transcribe and translate audio recordings in a matter of seconds, not days. The error rates of these automated tools are low, and in some cases are even close to rivaling human output. For humanitarians working in contexts with well-resourced languages like Spanish, French, or even some dialects of Arabic, these language technologies are already able to offer significant support that makes an audio survey workflow more feasible.

For low-resource languages such as Hausa, Kanuri, Swahili, or Rohingya, these technologies do not exist or are too unreliable. That is because these languages lack the commercial viability to be priority languages for technology companies, and there is often insufficient data to train the machine translation technologies. In an attempt to close the digital language divide, Translators without Borders has recently rolled out an ambitious effort called Gamayun: the language equality initiative. This initiative is working to develop datasets and language technology in low-resource languages relevant to humanitarian and development contexts. The goal is to develop fit-for-purpose solutions that can help break down language barriers and make language solutions such as this more accessible and feasible. Still, this is a long-term vision and many of the tools will take months or even years to develop fully.

In the meantime, there are four things you can do now to incorporate audio workflows into your data collection efforts

  1. Record your surveys using tape recorders. It is a valuable process, even if you are limited in how you are able to use the recordings right now. In our experience, enumerators are less likely to intentionally skip entire questions or sections if they know they are being recorded. Work is underway to integrate audio workflows directly into Kobo and other surveying tools, but for now, a tape recorder is an accessible and affordable tool.
  2. Transcribe and translate a small sample of your recordings. Even a handful of transcripts can prove to be useful verification and training tools. We recommend you complete the translations in the pilot stage of your survey, to give you time to adjust trainings or survey design if necessary. This can help to at least provide spot checks of enumerators that you are concerned about, or simply verify one key question, such as the question about informed consent.
  3. Run your recordings through automated transcription and translation tools. This will only be possible if you are working in major languages such as Spanish or French. Technology is rapidly developing, and every month more languages become available and the quality of these technologies improves. Commercial services are available through Microsoft, Google, and Amazon amongst others, but these services often have a cost, especially at scale.
  4. Partner with TWB to improve technology for low-resource languages. TWB is actively looking for partners to pilot audio recording and transcription processes, to help gather voice and text data to build language technologies for low-resource languages. TWB is also seeking partners interested in actively integrating these automated or semi-automated solutions into existing workflows. Get in touch if you are interested in partnering: [email protected]
Written by:

Chloe Sydney, Research Associate at IDMC

Eric DeLuca, Monitoring, Evaluation, and Learning Manager at Translators without Borders

Marginalized mother languages – two ways to improve the lives of the people who speak them

21 February. This is the date chosen by UNESCO for International Mother Language Day, which has been observed worldwide since 2000. This year deserves special attention as 2019 is the International Year of Indigenous Languages. Both initiatives promote linguistic diversity and equal access to multilingual information and knowledge.

Languages can be a huge resource. At the same time, the mother language that people speak can be a barrier to accessing opportunities. People who speak marginalized mother languages often belong to remote or less prosperous communities and, as a result, they are more vulnerable when a crisis hits.

Yet, the humanitarian and development sector has been largely blind to the importance of language. International languages such as English, French, Arabic, and Spanish dominate, excluding the people who most need their voices heard. Marginalized language speakers are denied opportunities to communicate their needs and priorities, report abuse, or get the information they need to make decisions.

If aid organizations are to meet their high-level commitments to put people at the center of humanitarian action and leave no one behind, this needs to change. To understand better how to address language barriers facing marginalized communities, two actions can lead our sector in the right direction.

Aerial view of Monguno, Borno State, Nigeria. Photo by Eric DeLuca, Translators without Borders.

Putting languages on the map

The first is language mapping. No comprehensive and readily accessible dataset exists on which languages people speak where.

TWB has started to fill that gap by creating maps from existing data and from our own research. Our interactive map shows the language and communication needs of internally displaced people in northeast Nigeria. The map uses data collected by the International Organization for Migration’s Displacement Tracking Matrix team. This data shows, for instance, that access to information is a serious problem at over half of sites where Marghi is the dominant language. Aid organizations can use this map to develop the right communication strategy for reaching people in need.

Humanitarian and development organizations can add some simple standard questions to their household surveys and other assessments to gather valuable language data. Aid workers will then understand the communication needs and preferences of the 176 million people in need of humanitarian assistance globally.

But communication in a crisis situation – or in any situation – should not be one-way. That’s where the second action comes in.

Building machine translation capacity in marginalized languages

Language technology has dramatically shifted two-way communication between people who speak different languages. In order to truly help people in need, listen to and understand them, we need to apply technology to their languages as well.

TWB is leading the Gamayun Language Equality Initiative to make it happen. We have built a closed-environment, domain-specific Levantine Arabic machine translation engine for the UN World Food Programme. This initiative will improve accountability to Syrian refugees facing food insecurity. Initial testing indicates that Gamayun will provide an efficient method for accessing local information sources. It will enable aid organizations to better understand the needs of their target populations, especially in hard-to-reach areas.

TWB Fulfulde Team Lead conducting comprehension research. Waterboard camp in Monguno, Borno State, Nigeria. Photo by Eric DeLuca, Translators without Borders.

We need to continue building the parallel language datasets from humanitarian and development content that make machine translation a viable option. That will expand the evidence that machine translation can enable better communication, including by empowering affected people to hold aid organizations to account in their own language.

Taking action

These two actions can help the humanitarian and development sector improve lives by promoting two-way communication with speakers of marginalized languages.  These actions will need to be expanded to be truly effective, but International Mother Language Day in the Year of Indigenous Languages is a great time to start.

To read:

  • The IFRC 2018 World Disasters Report, which includes clear and compelling recommendations about the importance of language to ensure that the world’s most vulnerable people are not “left behind”
  • TWB’s white paper on the Gamayun Language Equality Initiative

To do:

  • Consult our dashboard and think about how you can start collecting this data to inform your programs
  • Follow our journey as we continue to move forward with Gamayun (and learn along the way!)
  • Email us if you have an idea to share or want to do more in this area: [email protected]
Written by Mia Marzotto, Senior Advocacy Officer for Translators without Borders. 

Translating mental health — finding language solutions in northeast Nigeria

If the sign at the mental health clinic read, “Services for mad people,” would you walk in for help?

Yet that is the reality for many people in northeast Nigeria because of the difficulty in translating concepts like ‘mental health’ into Nigerian languages. Translators without Borders (TWB) is working with humanitarian experts in mental health to better understand the nuances among languages so that words can encourage use of services rather than hinder access.

Northeast Nigeria is linguistically diverse, with more than 30 mother tongues spoken by 1.9 million people displaced by conflict. Often traumatized by the conflict, many internally displaced people (IDPs) could benefit from mental health services. Yet the translation of ‘mental health’ into the main two languages used in the response – Hausa and Kanuri – carries a heavy stigma, possibly keeping people away from clinics.


How can those working in the mental health and psychosocial support (MHPSS) sector communicate information about services, when the very name of the sector scares people away?


To address this question, TWB worked with affected people and sector specialists to identify terms that need to be communicated more effectively. The resulting terminology recommendations and the proposed language glossary, with terms translated into Hausa and Kanuri, promote the use of unambiguous and less stigmatizing language. Use of these terms may, by extension, increase the use of services by those who need them.  

TWB began by identifying 301 key mental health terms that are either difficult to translate, commonly misunderstood, or stigmatizing.

This list was then researched and discussed extensively. TWB facilitated a workshop with the International Organization for Migration (IOM) MHPSS specialists to identify particularly difficult words and discuss alternative translations, aiming to use plain language and to avoid words that stigmatize. A group of 53 internally displaced people then reviewed the translations. TWB tested comprehension among the group and explored alternative translations. Throughout the process, TWB discovered key areas where language posed a significant challenge in the delivery of mental health and psychosocial support services.

Points of confusion

One major finding was that many terms commonly used by English speakers when discussing mental health are heavily stigmatized or misunderstood in northeast Nigeria. “Mental health” in Hausa is literally “services for mad people” — a shocking example of stigma. An alternative way of discussing this sector may lie with the phrase “psychosocial support,” which TWB discovered did not carry the same stigma in Hausa.

Generic terms such as “abuse” and “stress” caused confusion as there is often not a comparable generic term in Hausa or Kanuri. In both languages, the translation of “abuse” was generally understood by respondents to refer only to ‘verbal abuse,’ similar to an insult. Similarly, “stress” meant only physical stress to respondents, such as the physical strain you feel after a day of hard labor. If an aid worker intends to communicate how to relieve “mental stress” or how to heal after experiencing “physical abuse,” it’s clear that miscommunication may occur. Therefore, it is best to always pair descriptive words like “physical,” “verbal,” or “emotional,” with “abuse” and “stress.”

A similar issue was found with the concept of a “safe space.” When used in an English-speaking mental health context, it refers to a physical space where one feels cared for and emotionally supported. However, those surveyed understood this concept as a place with armed guards. This is an example of how sector-specific jargon may not make sense to those who need services. In northeast Nigeria, the concept “accepted space” may translate better.

The TWB MHPSS Glossary


“This is a very laudable work that will hasten the delivery of services to the affected people of north east Nigeria.”  
– Dr. Muhammad A. Ghuluze. Director, Emergency Medical Response and Humanitarian Services


To provide a solution for these issues, TWB has updated its Glossary for Nigeria with the 301 MHPSS-related terms. This glossary app includes words, definitions, sample sentences, and audio recordings for the selected terms. It can be accessed on a computer, tablet, Android, or iOS device, and can be used both on- and offline, which is useful given the poor connectivity in northeast Nigeria.

The app is already being used in training sessions with positive results. Thomas Eliyahu Zanghellin, the Mental Health and Psychosocial Support / Gender-based Violence Focal Point for the NGO INTERSOS in Maiduguri, Nigeria, has used the glossary in four training sessions already, generating “really fun group work with stimulating discussions.”

Language and terminology play a key role in the delivery of aid. Many sectors, including mental health and psychosocial support, use jargon and generic terms that do not readily translate in some cultures. Discussions about language allow the humanitarian world to challenge this terminology. The TWB Glossary for Nigeria provides a potential solution, allowing affected communities to access services and claim their rights in a language they understand.

Learn more about the TWB Glossary for Nigeria, and other TWB glossary projects here.

Bringing words to life in northeast Nigeria

I recently returned from northeast Nigeria, where Translators without Borders (TWB) is providing language support in one of the most severe humanitarian crises and linguistically diverse areas in the world. Unsurprisingly, I had many conversations about language issues with humanitarian responders.

The good news is that many were already aware of the need to communicate information in languages people understand, despite humanitarian programming often disregarding local language communications. When hearing about TWB’s language support capacity, many felt relieved that someone might be able to help them tackle language barriers. The bad news is that, even with that acknowledgment, the most common refrain I heard throughout my four-week assignment was, “I have never thought about language so carefully before and neither has my organization.”

So I found myself asking, “How much is being lost in translation?” And, more importantly, “If two-way communication in the right languages in northeast Nigeria was truly integrated into programming, how would humanitarian action improve?”

The fact is that the importance of two-way communication between local communities and aid providers, in a language affected people can understand, is increasingly recognized by humanitarians.

Some of the best humanitarian programs are now consciously factoring language into their efforts to meet people’s information and communication needs. They do so recognizing that only when those needs are met can affected people reliably access assistance, provide input, and make the best decisions for themselves and their families. But despite the nod to language, mainstreaming solutions to language barriers within humanitarian work is still not the norm.

This was clear to me in northeast Nigeria.

After nine years, the humanitarian crisis remains one of the most severe in the world. In the three worst-affected states of Borno, Adamawa, and Yobe, 1.9 million Nigerians have been displaced from their homes; overall, 7.7 million people are in need of humanitarian assistance. Data shows that displaced people speak over 30 languages as their mother tongues. Overwhelmingly, they prefer receiving information in their own language. However, humanitarian responders are communicating with affected people mainly in two languages, Hausa and Kanuri. This is not enough to meet people’s needs, and serious problems persist due to the lack of two-way communication.

Humanitarian field staff shared many concerns about language needs in the response. They were unsure how to provide potentially life-saving information in camps where they do not know which languages people understand. There was concern that language diversity and low education levels prevent them from accurately gauging people’s needs and priorities. I also heard frustrations from some aid workers, particularly those who spoke local languages in addition to Hausa or Kanuri. These field workers are often asked to translate complex messages and concepts into those local languages with little or no support or experience in translation. In this situation, I wasn’t surprised that translation was seen as a considerable additional burden for multilingual staff, often an add-on to agreed job descriptions.

These conversations were both concerning and compelling. It’s no secret that for field workers in the humanitarian aid sector, day-to-day work can be more than a little complicated. Language should help, not hinder, the ability to provide effective and accountable aid to those who need it.

The problem is not a lack of awareness among field staff. What is missing is for those who direct organizational policies and program design to focus on language needs early in a response and appropriately resource language support.

To that end, it was exciting to be working with TWB’s team on the ground in northeast Nigeria. We are striving to provide that language support for humanitarian responders communicating with vulnerable people. We have already started to roll out the TWB Glossary for Northeast Nigeria – an in-the-hand tool for humanitarian field staff, interpreters, and translators to ensure use of consistent, accurate, and easily understood words in local languages.  

Yet so much more needs to be done.

The only way for this tool and other forms of language support to make a difference is by mainstreaming their use across the humanitarian response. This begins with ensuring field staff have the knowledge and resources to meet language needs in the response – and the support internally to prioritize the role of language in communication and community engagement programs. Otherwise, we risk seeing too few of these examples reach their potential for humanitarian accountability and effectiveness.

Having conversations about the importance of two-way communication in the languages of the most vulnerable is the necessary first step. Now we must move from words to action about language.

Like most things in life, it’s not what you do but how you do it.

Read more about TWB’s response in northeast Nigeria. 

Written by Mia Marzotto, Advocacy Officer for Translators without Borders.

Language Technology Could Help 157 Million People Get Access To Information

I was exhausted.  It had been a great week in Bangladesh, but the overload of language, smells, refugee camp, seeing old friends, meeting new friends, government, donors, and all the while pretending like I wasn’t jetlagged, was taking its toll.  I just wanted to go to sleep.

My last meeting was in Dhaka with someone in the Prime Minister’s office.  I had little hope of staying awake through the meeting.

And yet, I was captivated.

Bangladesh Help Desk Signage

The literacy rate in Bangladesh is considered low (72.8% according to UNESCO in 2016) but is just below the global average. Literacy among women is lower (69.9%); but, in general, the majority of the people have at least basic literacy skills.  There is 90 percent mobile phone penetration and 96 percent mobile internet access. The International Mother Language Institute, the body in Bangladesh that supports the promotion, spread, and preservation of Bangla languages, says that 41 languages are spoken in the country, only five of which have written scripts.  In the humanitarian response for Rohingya refugees in Cox’s Bazar, Translators without Borders (TWB) finds the situation particularly difficult. Rohingya has no agreed written script. Very few of the refugees can read and write, and few people speak both Rohingya and another language well. Add to this mix low radio coverage: the Rohingya do not have radios and, even if they did, parts of the camps have no radio coverage. With about one million people living in poor and difficult conditions and speaking many different dialects, you begin to understand why communicating effectively is difficult.

It’s vitally important that there is two-way communication between the people – refugees and local Bangladeshis – and the government and aid workers. Take the issue of the coming monsoon. The formal and makeshift refugee camps have sprouted up all over the Cox’s Bazar district, an area that includes a national park and lush forest. But now the trees have been torn down to make room for shelters and for firewood.  This makes the soil very unstable and dangerous, with monsoon rains promising huge mud pits and the possibility of landslides. It is also a hilly area; tents are built on the sides of hills that will become slippery and unstable with heavy rains and wind. Refugees, as well as local residents, need to know where to go, what to do if there’s an emergency, how to get help for those needing medical attention, and what to do if food gets swept away.  

The challenges abound. The digital world seems a world away.    

And yet, enter Dr. Jami.  In a buzzy, busy office with a high level of excitement and a relatively good gender balance, I was suddenly in the middle of a high tech environment.  Dr. Jami launched directly into what he wanted us to know and do.

Dr. Jami runs the Access to Information (A2I, inevitably) project in the Prime Minister’s office. The aim is to help the people of Bangladesh quickly and easily get information on public services. One of A2I’s projects is the digitization of government institutions; they have developed over 1,000 key government websites.  Dr. Jami is not a language guy (he’s a solutions architect), but he proceeds to tell me quickly that Bangla was only standardized in Unicode five years ago, so there is very little data available from which to build good translation engines.  While there’s 90 percent mobile phone penetration, in 2018 GSMA estimated that only 28-30 percent of those were smartphones. Yet, 96 percent of internet access is via phones. Whaaa? How does that work? It’s also startling how little desktops and laptops are used to access the internet.  

I asked a taxi driver, who was using a smartphone, if he used his phone for the internet.  He replied, “No, but I use it for Facebook.”

There are no data charges for Facebook in Bangladesh – unless you want to see videos or pictures.  Internet use is Facebook and Facebook is only text. Those who are illiterate, or only barely literate, won’t have smartphones.

To Dr. Jami, who needs more people to have smartphones to help ensure they can get access to information, the cost is not the barrier:  There are very inexpensive smartphones in Bangladesh. He believes it is fear of technology, which he believes is associated with illiteracy. To reach his goal of migrating 70 percent of the current mobile phone users to smartphones, he must address fear.

Language is an issue.  With a population of over 157 million people, and one of the most widely spoken languages in the world, you’d think that the language technology for Bangla would be outstanding.  It’s not. That’s surprising. And without that technology, equipping 1,000 websites with dynamic information in Bangla is nearly impossible, not to mention making them interactive and/or adding audio.

The work that A2I is doing is globally relevant, of course.  Other countries are already seeking A2I’s support to bring better access to information to their people.  Dr. Jami mentions that they are already working in South Sudan – which has the second-lowest literacy rate in the world.  Again, the language barrier is huge. And, again, there is little digital language data.  

Dr. Jami has heard of TWB’s Gamayun project – can we help?  Can we be a neutral broker to bring together the limited language data out there and leverage our knowledge of language and the language industry to help Bangladeshis get access to information about basic services?  

Dr. Jami and the TWB team will continue this conversation – there are still many questions to be asked and answered.  But I was impressed by the enthusiasm and the accomplishments of his team. And I am really excited to see where Dr. Jami and other countries take this exciting initiative.

Written by Translators without Borders' Executive Director Aimee Ansari. This article was also published on HuffPost UK.


Read a related post on The #LanguageMatters blog, ‘Language: Our Collective Blind Spot in the Participation Revolution’.  In TWB’s last blog post, Executive Director Aimee Ansari explains why we need to create and disseminate a global dataset on language and communication for crisis-affected countries.