Reasoning transfer for an extremely low-resource and endangered language: Bridging languages through sample-efficient language understanding

Research output: Contribution to conference › Paper

Abstract

Recent advances have enabled Large Language Models (LLMs) to tackle reasoning tasks by generating chain-of-thought (CoT) rationales, yet these gains have largely applied to high-resource languages, leaving low-resource languages behind. In this work, we first investigate CoT techniques in extremely low-resource scenarios using existing prompting, model-editing, and fine-tuning approaches. We then introduce English-Pivoted CoT Training, leveraging the insight that LLMs internally operate in a latent space aligned with their dominant language. Given input in a low-resource language, we perform supervised fine-tuning so that the model generates its CoT in English and produces the final response in the target language. Across mathematical reasoning benchmarks, our approach outperforms the baselines by up to 28.33% in low-resource scenarios. Our analysis and additional experiments, including Mixed-Language CoT and Two-Stage Training, show that explicitly separating language understanding from reasoning enhances cross-lingual reasoning abilities. To facilitate future work, we also release LC2024, the first benchmark for mathematical tasks in Irish, an extremely low-resource and endangered language. Despite data scarcity, our results and resources highlight a practical pathway to multilingual reasoning that avoids extensive retraining for every extremely low-resource language.
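To make the training setup concrete, the following minimal sketch shows how an English-Pivoted CoT supervised fine-tuning example might be assembled: the prompt holds the question in the target low-resource language, while the completion pairs an English chain-of-thought with a final answer back in the target language. The schema, delimiter tags, and the Irish sample text are illustrative assumptions, not the authors' actual data format.

# Minimal sketch of an English-Pivoted CoT fine-tuning example.
# Field names, tags, and sample text are assumptions for illustration;
# the paper's actual data format may differ.

def build_sft_example(question_lrl: str, cot_english: str, answer_lrl: str) -> dict:
    """Pair a low-resource-language (LRL) question with an English
    rationale and a final answer in the same low-resource language."""
    completion = (
        "<reasoning>\n"                    # chain-of-thought stays in English
        f"{cot_english}\n"
        "</reasoning>\n"
        f"<answer>{answer_lrl}</answer>"   # response returns to the target language
    )
    return {"prompt": question_lrl, "completion": completion}

# Hypothetical Irish (Gaeilge) arithmetic item, for illustration only:
# "Seán has 3 apples and gets 4 more. How many apples does he have now?"
example = build_sft_example(
    question_lrl="Tá 3 úll ag Seán agus faigheann sé 4 cinn eile. Cé mhéad úll atá aige anois?",
    cot_english="Seán starts with 3 apples and receives 4 more. 3 + 4 = 7.",
    answer_lrl="Tá 7 n-úll aige anois.",
)
print(example["prompt"])
print(example["completion"])

Each such (prompt, completion) pair can then be fed to any standard supervised fine-tuning pipeline; the key design choice is that only the final answer, never the rationale, appears in the low-resource language.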
Original language: English (Ireland)
Pages: 1-10
Number of pages: 10
Publication status: Submitted - 2025

Keywords

  • Reasoning transfer
  • Chain-of-thought (CoT)
  • Large Language Models (LLMs)
