TabularGRPO: A Modern Mixture-of-Experts Transformer with Group Relative Policy Optimization (GRPO) for Tabular Data Learning
Abstract
Tabular data remains the cornerstone of decision-making in healthcare, finance, and industrial analytics. We propose TabularGRPO, a novel reinforcement learning framework that synergizes Mixture-of-Experts (MoE) architectures with variance-reduced policy gradients. TabularGRPO addresses three fundamental challenges in tabular learning: 1) feature-type heterogeneity, through dynamic expert routing; 2) class imbalance, via group-wise advantage normalization; and 3) sample inefficiency, with KL-regularized policy updates. Evaluations on challenging datasets demonstrate TabularGRPO's superiority over currently dominant models such as XGBoost and CatBoost, with 6.0% higher precision and 13.0% higher F1 score, establishing new state-of-the-art performance. Code and benchmarks are publicly released. The code we used to train and evaluate our models is available at https://github.com/enkhtogtokh/tabulargrpo
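To make the abstract's two GRPO ingredients concrete, below is a minimal NumPy sketch of (a) group-wise advantage normalization and (b) a clipped surrogate objective with a KL penalty toward a reference policy. This is an illustrative reconstruction of standard GRPO machinery, not the authors' released implementation; the function names, the clipping constant, and the KL coefficient are assumptions chosen for the example.

```python
import numpy as np

def group_relative_advantages(rewards):
    """Group-wise advantage normalization (assumed form):
    A_i = (r_i - mean(group)) / std(group), computed per sampled group."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def grpo_objective(logp_new, logp_old, advantages, logp_ref,
                   clip_eps=0.2, kl_coef=0.04):
    """Clipped surrogate objective with a KL regularizer (assumed form).
    logp_*: per-sample log-probabilities under the new, old (sampling),
    and frozen reference policies; advantages: normalized group advantages."""
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Take the pessimistic (minimum) surrogate, as in PPO-style clipping.
    surrogate = np.minimum(ratio * advantages, clipped * advantages).mean()
    # Low-variance KL estimator: exp(d) - d - 1 with d = logp_ref - logp_new.
    diff = logp_ref - logp_new
    kl = (np.exp(diff) - diff - 1.0).mean()
    return surrogate - kl_coef * kl
```

With identical new, old, and reference log-probabilities, the ratio is 1 and the KL term vanishes, so the objective reduces to the mean advantage; class imbalance is mitigated because each sample's reward is judged relative to its own group rather than on an absolute scale.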