Optimizing T5 for Lightweight Tibetan-English Translation

Jacob Moore
Paula Lauren

Published on Nov 25, 2025

Abstract

We present the first lightweight Tibetan-English machine translation models optimized for low-resource settings and edge deployment. Our approach combines (1) a custom tokenizer trained on Tibetan script, (2) continued pretraining on Tibetan-English corpora, and (3) supervised fine-tuning on domain-specific translation pairs. Through ablation studies, we quantify each component’s contribution to translation quality. Results show that both the custom tokenizer and continued pretraining significantly improve translation quality, especially at small data scales. This work establishes the first strong baseline results for Tibetan-English translation with compact models and offers a practical framework for other underrepresented, non-Latin-script languages.
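
As a rough illustration of why a tokenizer trained on Tibetan script could matter, the sketch below splits Tibetan text into syllables on the tsheg (U+0F0B) and shad (U+0F0D) marks and measures how many UTF-8 bytes each syllable occupies. This is not the authors' tokenizer: the function names `split_syllables` and `utf8_bytes_per_syllable` and the sample phrase are hypothetical, chosen only for illustration, and the link to tokenizer efficiency is an assumption the abstract does not state. Every character in the Tibetan Unicode block takes three bytes in UTF-8, so a vocabulary that falls back to bytes would spend many tokens per syllable.

```python
# Minimal sketch (assumption: syllables are delimited by tsheg, shad, or whitespace).
TSHEG = "\u0f0b"  # Tibetan intersyllabic mark
SHAD = "\u0f0d"   # Tibetan clause/sentence-final mark


def split_syllables(text: str) -> list[str]:
    """Split Tibetan text into syllables on tsheg, shad, and whitespace."""
    for mark in (TSHEG, SHAD):
        text = text.replace(mark, " ")
    return text.split()


def utf8_bytes_per_syllable(text: str) -> float:
    """Average number of UTF-8 bytes per syllable (0.0 for empty input)."""
    syllables = split_syllables(text)
    if not syllables:
        return 0.0
    total_bytes = sum(len(s.encode("utf-8")) for s in syllables)
    return total_bytes / len(syllables)


# Hypothetical sample: a four-syllable greeting written with explicit code points.
sample = (
    "\u0f56\u0f40\u0fb2\u0f0b"      # first syllable + tsheg
    "\u0f64\u0f72\u0f66\u0f0b"      # second syllable + tsheg
    "\u0f56\u0f51\u0f7a\u0f0b"      # third syllable + tsheg
    "\u0f63\u0f7a\u0f42\u0f66\u0f0d"  # fourth syllable + shad
)

if __name__ == "__main__":
    print(split_syllables(sample))
    print(utf8_bytes_per_syllable(sample))
```

A statistic like this can be compared against the average number of tokens per syllable produced by any candidate tokenizer, which gives a simple check on whether a custom vocabulary shortens Tibetan sequences.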