Translators without Borders

Language Data Disparity

Machine translation requires vast amounts of data to be effective. At a minimum, 4-5 million strings of data are needed to build a successful machine translation engine, although some professionals recommend at least 100 million strings.

Yet despite the rapid development of language technology for languages like French, Spanish, and German, languages of people living in parts of the world with less commercial power are being left behind. Unless we do something, this digital language gap will only continue to grow.

That’s why we’re using Gamayun, the language equality initiative, to build language data sets and create scalable, replicable machine translation engines.

The table below demonstrates the current availability of language data that can be used to develop language technology. The volume of data does not always correspond to the number of speakers. For example, there are 23 million Dutch speakers worldwide, and 1.6 billion strings of Dutch language data. Meanwhile, there are 63 million speakers of Hausa globally, but only 3.1 million strings of data.
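To make the disparity concrete, the short sketch below works through the arithmetic using only the figures quoted above. The variable and function names are illustrative assumptions, and the threshold uses the lower bound of the "4-5 million strings" minimum mentioned earlier.

```python
# Figures quoted in the text; names and structure are illustrative only.
MIN_STRINGS = 4_000_000  # lower bound of the stated 4-5 million minimum

languages = {
    "Dutch": {"speakers": 23_000_000, "strings": 1_600_000_000},
    "Hausa": {"speakers": 63_000_000, "strings": 3_100_000},
}


def strings_per_speaker(entry):
    """Return the number of data strings available per speaker."""
    return entry["strings"] / entry["speakers"]


for name, entry in languages.items():
    ratio = strings_per_speaker(entry)
    meets_minimum = entry["strings"] >= MIN_STRINGS
    print(f"{name}: {ratio:.2f} strings per speaker, "
          f"meets minimum: {meets_minimum}")
```

On these figures, Dutch has roughly 70 strings of data per speaker, while Hausa has about 0.05 and falls below even the lower minimum for a machine translation engine.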

This needs to change: everyone deserves the right to communicate in their own language.

Help us bridge the language data and technology gap: contact Grace Tang, Gamayun Program Manager, at [email protected].

Copyright © 2023 Translators without Borders
