Optimizing T5 for Lightweight Tibetan-English Translation

Abstract

We present the first lightweight Tibetan-English machine translation models optimized for low-resource settings and edge deployment. Our approach combines (1) a custom tokenizer trained on Tibetan script, (2) continued pretraining on Tibetan-English corpora, and (3) supervised fine-tuning on domain-specific translation pairs. Through ablation studies, we quantify each component’s contribution to translation quality. Results show that both the tokenizer and pretraining significantly improve performance, especially at small data scales. This work establishes the first strong baseline results for Tibetan-English translation with compact models and offers a practical framework for other underrepresented, non-Latin-script languages.
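To illustrate why a tokenizer trained on Tibetan script can matter for a compact model, the sketch below compares two sequence lengths for the same text: one where every character falls back to raw UTF-8 bytes (Tibetan code points occupy three bytes each), and one where each syllable delimited by the tsheg mark (U+0F0B) is a single vocabulary entry. Both are simplifying assumptions chosen for illustration, not the tokenizer described in the article; the function names are hypothetical, and a learned subword vocabulary would typically land somewhere between the two extremes.

```python
# Illustrative sketch only: contrasts a byte-fallback encoding with an
# idealized syllable-level vocabulary. Neither is the authors' tokenizer.

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark, used here as the syllable delimiter


def byte_fallback_length(text: str) -> int:
    """Sequence length if every character is emitted as raw UTF-8 bytes."""
    return len(text.encode("utf-8"))


def syllable_length(text: str) -> int:
    """Sequence length if each tsheg-delimited syllable is one token (assumption)."""
    return len([syllable for syllable in text.split(TSHEG) if syllable])


# "bod skad" (Tibetan language): two syllables, each followed by a tsheg.
sample = "\u0f56\u0f7c\u0f51\u0f0b\u0f66\u0f90\u0f51\u0f0b"

print(byte_fallback_length(sample))  # 8 code points x 3 bytes each
print(syllable_length(sample))       # 2 syllables
```

Under these assumptions the byte-level sequence is an order of magnitude longer than the syllable-level one, which suggests why script coverage in the vocabulary may be especially important when model capacity and context length are limited.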
