Translators without Borders

Language Data Disparity

Machine translation requires vast amounts of data to be effective. At a minimum, 4-5 million strings of data are needed to build a successful machine translation engine, although some professionals recommend at least 100 million strings.

Yet despite the rapid development of language technology for languages like French, Spanish, and German, languages of people living in parts of the world with less commercial power are being left behind. Unless we do something, this digital language gap will only continue to grow.

That’s why we’re using Gamayun, the language equality initiative, to build language data sets and create scalable, replicable machine translation engines.

The table below demonstrates the current availability of language data that can be used to develop language technology. The volume of data does not always correspond to the number of speakers. For example, there are 23 million Dutch speakers worldwide, and 1.6 billion strings of Dutch language data. Meanwhile, there are 63 million speakers of Hausa globally, but only 3.1 million strings of data.
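To make the Dutch and Hausa comparison concrete, the figures quoted above can be turned into a rough "strings per speaker" ratio. The sketch below is illustrative only: the function and variable names are assumptions introduced here, and the inputs are just the four numbers from the paragraph above, not the full table.

```python
# Figures quoted in the paragraph above: speakers and strings of language data.
LANGUAGES = {
    "Dutch": {"speakers": 23_000_000, "strings": 1_600_000_000},
    "Hausa": {"speakers": 63_000_000, "strings": 3_100_000},
}


def strings_per_speaker(speakers: int, strings: int) -> float:
    """Return how many strings of language data exist per speaker."""
    return strings / speakers


def summarize(languages: dict) -> dict:
    """Map each language name to its strings-per-speaker ratio."""
    return {
        name: strings_per_speaker(figures["speakers"], figures["strings"])
        for name, figures in languages.items()
    }


if __name__ == "__main__":
    ratios = summarize(LANGUAGES)
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.2f} strings per speaker")
    gap = ratios["Dutch"] / ratios["Hausa"]
    print(f"Dutch has roughly {gap:.0f} times more data per speaker than Hausa")
```

On these numbers, Dutch has roughly 70 strings of data per speaker, while Hausa has about 0.05, a gap on the order of 1,400 times.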

This needs to change: everyone deserves the right to communicate in their own language.

Help us bridge the language data and technology gap: contact Grace Tang, Gamayun Program Manager, at [email protected].

DISCLAIMER: Translators Without Borders is not in any way affiliated with Doctors Without Borders, which is a registered trademark of Bureau International de Médecins Sans Frontières.

Copyright © 2025 Translators without Borders
