Translators without Borders

Language Data Disparity

Machine translation requires vast amounts of data to be effective. At a minimum, 4-5 million strings of data are needed to build a successful machine translation engine, although some professionals recommend at least 100 million strings.

Yet despite the rapid development of language technology for languages like French, Spanish, and German, languages of people living in parts of the world with less commercial power are being left behind. Unless we do something, this digital language gap will only continue to grow.

That’s why we’re using Gamayun, the language equality initiative, to build language data sets and create scalable, replicable machine translation engines.

The table below demonstrates the current availability of language data that can be used to develop language technology. The volume of data does not always correspond to the number of speakers. For example, there are 23 million Dutch speakers worldwide, and 1.6 billion strings of Dutch language data. Meanwhile, there are 63 million speakers of Hausa globally, but only 3.1 million strings of data.
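To make the comparison concrete, the short sketch below works out strings of data per speaker from the Dutch and Hausa figures quoted above, and checks each language against the 4 million string lower bound mentioned earlier. The variable and function names are illustrative only, and the per-speaker ratio is just one possible way to express the gap, not a metric taken from the Gamayun initiative.

```python
# Figures quoted in the text above: speakers worldwide and strings of language data.
LANGUAGE_DATA = {
    "Dutch": {"speakers": 23_000_000, "strings": 1_600_000_000},
    "Hausa": {"speakers": 63_000_000, "strings": 3_100_000},
}

# Lower end of the "4-5 million strings" minimum cited in the text.
MIN_STRINGS = 4_000_000


def strings_per_speaker(speakers: int, strings: int) -> float:
    """Return the number of data strings available per speaker."""
    return strings / speakers


ratios = {
    name: strings_per_speaker(**figures)
    for name, figures in LANGUAGE_DATA.items()
}

# Which languages fall below the quoted minimum for a machine translation engine?
below_minimum = [
    name
    for name, figures in LANGUAGE_DATA.items()
    if figures["strings"] < MIN_STRINGS
]

# How many times more data per speaker Dutch has than Hausa.
disparity = ratios["Dutch"] / ratios["Hausa"]

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f} strings per speaker")
print(f"Below the {MIN_STRINGS:,} string minimum: {below_minimum}")
print(f"Dutch has roughly {disparity:,.0f} times more data per speaker than Hausa")
```

On these figures, Dutch has about 70 strings of data per speaker while Hausa has well under one, and Hausa is the only one of the two that falls below the quoted minimum.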

This needs to change—everyone deserves the right to communicate in their own language.

Help us bridge the language data and technology gap: contact Grace Tang, Gamayun Program Manager, at [email protected].

Copyright © 2021 Translators without Borders

image001  Privacy Policy

This site uses cookies. Consult our Cookie Policy.