Key Results
Overview: Current models reason in English even for Bengali questions (left). Our pipeline combines the GANIT dataset with SFT and Curriculum-GRPO (center) to achieve native Bengali reasoning with 88% Bengali tokens, 79% fewer words, and +8 accuracy points (right).
Abstract
We present a Bengali mathematical reasoning model called GanitLLM (named after the Bangla word for mathematics, Ganit), together with a new difficulty-aware Bengali math corpus and a curriculum-based GRPO pipeline. Bengali is one of the world's most widely spoken languages, yet existing LLMs either reason in English and then translate, or simply fail on multi-step Bengali math, in part because reinforcement learning recipes are tuned for high-resource languages and collapse under reward sparsity in low-resource settings.
To address this, we construct GANIT, a rigorously filtered and decontaminated Bengali math dataset with automatic difficulty tags derived from the pass@k of a strong evaluator model. Building on this dataset, we propose Curriculum-GRPO, which combines multi-stage training (SFT + GRPO) with difficulty-aware sampling and verifiable rewards for format, numerical correctness, and Bengali reasoning.
On Bn-MGSM and Bn-MSVAMP, GanitLLM-4B improves over its Qwen3-4B base by +8 and +7 accuracy points, respectively, while increasing the percentage of Bengali reasoning tokens from 14% to over 88% and reducing average solution length from 943 to 193 words.
The Problem: English Reasoning for Bengali Questions
Current LLMs reason in English even when asked Bengali math questions, reducing interpretability for native speakers.
Base model (reasoning in English):

> Then, x + y = 12 and xy = 32
> From these conditions, we form the quadratic equation:
> t² - 12t + 32 = 0
> Solving this equation gives: t = 4 and t = 8

Only 14% Bengali tokens.

GanitLLM (reasoning natively in Bengali):

> x + y = ১২
> xy = ৩২
> দ্বিঘাত সমীকরণ পাই: (we obtain the quadratic equation:)
> t² - ১২t + ৩২ = 0
> সুতরাং, t = ৪ এবং t = ৮ (therefore, t = 4 and t = 8)

88% Bengali tokens.
Our Approach
1. GANIT Dataset
A rigorously processed, difficulty-aware Bengali math dataset:
- Quality Screening: Manual evaluation, >95% accuracy threshold
- Rule-based Filtering: Numerical solutions, >99% Bengali text
- Deduplication: Fuzzy matching + MinHash (sketched after this list)
- Decontamination: Against MGSM & MSVAMP
- Difficulty Tagging: Using pass@32 from Qwen3-32B (bucketing rule sketched after the splits table)
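As referenced above, here is a minimal sketch of MinHash near-duplicate detection for the deduplication step. The character n-gram shingling, 128-permutation signature, and 0.8 threshold are illustrative assumptions, not the paper's exact configuration.

```python
import hashlib

def shingles(text: str, n: int = 5) -> set[str]:
    """Character n-grams; works on Bengali script without a tokenizer."""
    return {text[i:i + n] for i in range(max(1, len(text) - n + 1))}

def minhash_signature(text: str, num_perm: int = 128) -> list[int]:
    """For each of num_perm salted hash functions, keep the minimum
    hash value over the text's shingles."""
    grams = shingles(text)
    sig = []
    for seed in range(num_perm):
        salt = str(seed).encode()
        sig.append(min(
            int.from_bytes(hashlib.md5(salt + g.encode()).digest()[:8], "big")
            for g in grams
        ))
    return sig

def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature slots approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def is_near_duplicate(p1: str, p2: str, threshold: float = 0.8) -> bool:
    """Flag two problems as near-duplicates above the similarity threshold."""
    return estimated_jaccard(minhash_signature(p1), minhash_signature(p2)) >= threshold
```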
| Split | Examples | Purpose |
|---|---|---|
| GanitSFT | 11,023 | Instruction tuning |
| GanitRLVR | 7,328 | RL training (balanced) |
| GanitDEV | 776 | Evaluation |
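The difficulty tags referenced above reduce to a simple bucketing rule over the evaluator's pass@32 rate. The thresholds below come from the difficulty-distribution table at the end of this page; the function names are ours.

```python
def pass_at_32(correct_flags: list[bool]) -> float:
    """Fraction of 32 sampled Qwen3-32B solutions whose final answer verifies."""
    assert len(correct_flags) == 32
    return sum(correct_flags) / len(correct_flags)

def difficulty_tag(pass_rate: float) -> str:
    """Bucket a problem using the thresholds from the difficulty table."""
    if pass_rate > 0.75:
        return "easy"      # >75% of samples correct
    if pass_rate >= 0.50:
        return "medium"    # 50-75%
    if pass_rate >= 0.25:
        return "hard"      # 25-50%
    return "olympiad"      # <25%
```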
2. Curriculum-GRPO
A novel training recipe that tackles the cold-start problem by combining multi-stage training with difficulty-aware sampling and three verifiable rewards (sketched after this list):
- Format: validates that reasoning sits inside <think> tags and the result inside <answer> tags
- Correctness: +2.0 for an answer that matches in Bengali numerals, +1.0 for a match in English numerals
- Bengali: requires >80% of the reasoning tokens to be Bengali
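A minimal sketch of how these three rewards could be computed jointly, assuming the <think>/<answer> layout above. Only the +2.0/+1.0 correctness values come from the recipe; the 0.5 format and Bengali bonuses and the character-level Bengali ratio are our assumptions.

```python
import re

# Map Bengali digits to Western ("English") digits for answer normalization.
BN_TO_EN = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")

def ganit_reward(completion: str, gold_answer: str) -> float:
    """Sum of format, correctness, and Bengali-reasoning rewards."""
    reward = 0.0

    # Format: reasoning must sit in <think> tags, the result in <answer> tags.
    m = re.search(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>",
                  completion, re.DOTALL)
    if m is None:
        return reward          # malformed outputs earn nothing further
    reward += 0.5              # assumed format bonus
    thought, answer = m.group(1).strip(), m.group(2).strip()

    # Correctness: +2.0 for a match in Bengali numerals, +1.0 if the answer
    # only matches after normalizing both sides to English digits.
    if answer == gold_answer:
        reward += 2.0
    elif answer.translate(BN_TO_EN) == gold_answer.translate(BN_TO_EN):
        reward += 1.0

    # Bengali: >80% of the reasoning's letters must fall in the Bengali
    # Unicode block (U+0980-U+09FF); a character-level proxy for tokens.
    letters = [c for c in thought if c.isalpha()]
    if letters and sum("\u0980" <= c <= "\u09ff" for c in letters) / len(letters) > 0.80:
        reward += 0.5          # assumed Bengali-reasoning bonus

    return reward
```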
Figure: GANIT dataset construction pipeline. Starting from ~1.5M Bengali math problems, we apply multi-stage quality filtering, verification, deduplication, and decontamination.
Results
Main Results
Our recipe lets smaller GanitLLM models match or exceed much larger counterparts while reasoning natively in Bengali:
| Model | Bn-MGSM ↑ | Bn-MSVAMP ↑ | Avg. Words ↓ | Bengali % ↑ |
|---|---|---|---|---|
| GPT-4.1 | 89.20 | 82.30 | 200 | 88.16% |
| GPT-4.1-mini | 87.20 | 78.60 | 232 | 88.18% |
| Qwen3-32B | 85.60 | 76.10 | 712 | 21.08% |
| Qwen3-14B | 83.60 | 75.80 | 767 | 17.87% |
| Qwen3-8B | 69.20 | 52.60 | 977 | 19.26% |
| Qwen3-4B | 69.20 | 70.50 | 943 | 14.79% |
| GanitLLM-4B (Ours) | 76.80 | 76.40 | 193 | 88.71% |
| Qwen3-1.7B | 15.20 | 14.10 | 1124 | 19.64% |
| GanitLLM-1.7B (Ours) | 52.80 | 66.80 | 210 | 87.80% |
| Qwen3-0.6B | 8.40 | 12.20 | 1265 | 12.43% |
| GanitLLM-0.6B (Ours) | 28.40 | 52.40 | 248 | 88.70% |
Ablation: Why Multi-stage Training?
SFT grounds the language; GRPO improves accuracy. Both are necessary:
| Configuration | Bn-MGSM | Bn-MSVAMP | Avg. Words | Bengali % |
|---|---|---|---|---|
| Qwen3-4B (base) | 69.20 | 70.50 | 943 | 14.79% |
| + SFT only | 74.00 | 74.60 | 184 | 86.65% |
| + CGRPO only | 82.40 | 78.50 | 844 | 14.94% |
| SFT + CGRPO (Ours) | 76.80 | 76.40 | 193 | 88.71% |
CGRPO alone achieves the highest accuracy but yields only 14.94% Bengali reasoning tokens. The multi-stage approach trades a few accuracy points for interpretable, native-Bengali reasoning.
Models & Dataset
Pre-trained Models
All models are available on Hugging Face:
| Model | Params | Training |
|---|---|---|
| GanitLLM-4B_SFT_CGRPO | 4B | SFT + CGRPO |
| GanitLLM-4B_SFT_GRPO | 4B | SFT + GRPO |
| GanitLLM-1.7B_SFT_CGRPO | 1.7B | SFT + CGRPO |
| GanitLLM-1.7B_SFT_GRPO | 1.7B | SFT + GRPO |
| GanitLLM-0.6B_SFT_CGRPO | 0.6B | SFT + CGRPO |
| GanitLLM-0.6B_SFT_GRPO | 0.6B | SFT + GRPO |
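A quick-start sketch for loading one of these checkpoints with transformers. The repo id is a placeholder for the project's actual Hugging Face organization, and the prompt reuses the two-number example from the top of this page.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual organization from the
# project's Hugging Face page.
model_id = "<org>/GanitLLM-4B_SFT_CGRPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# "The sum of two numbers is 12 and their product is 32. Find the numbers."
messages = [{"role": "user",
             "content": "দুটি সংখ্যার যোগফল ১২ এবং গুণফল ৩২। সংখ্যা দুটি নির্ণয় করো।"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```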
GANIT Dataset
Difficulty-aware Bengali math dataset:
Difficulty Distribution
| Difficulty | Criteria (pass@32) | GanitDEV |
|---|---|---|
| Easy | >75% of samples correct | 28.7% |
| Medium | 50-75% | 26.0% |
| Hard | 25-50% | 24.3% |
| Olympiad | <25% | 21.3% |
BibTeX
To be added.