JD.com Develops Efficient AI Training Paradigm for Reasoning
Researchers at JD.com and academic institutions have developed a new training technique called Reinforcement Learning with Verifiable Rewards and Self-Distillation (RLSD). The approach lets enterprise teams build custom AI reasoning models more efficiently by combining the strengths of reinforcement learning against automatically checkable rewards with self-distillation, in which a model learns from its own outputs rather than from a larger teacher model. Experimental results show that models trained with RLSD significantly outperform those trained with earlier algorithms, lowering both the technical and the financial barriers for enterprises looking to develop tailored AI solutions.
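The article does not describe RLSD's internals, but the two ingredients it names can be sketched in miniature. The snippet below is a rough illustration, not JD.com's method: `verifiable_reward` stands in for a programmatic answer checker (the "verifiable rewards" part), and the objective blends that reward with a KL penalty toward the model's own earlier output distribution (the "self-distillation" part, replacing a larger teacher). All function names and the `beta` weight are hypothetical.

```python
import math

def verifiable_reward(predicted: str, gold: str) -> float:
    """Hypothetical checker: reward is 1.0 only when the answer can be
    verified programmatically (e.g. exact match on a math answer)."""
    return 1.0 if predicted.strip() == gold.strip() else 0.0

def kl_divergence(p, q):
    """KL(p || q) for two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def rlsd_objective(reward, self_probs, student_probs, beta=0.1):
    """Illustrative objective: maximize the verifiable reward while a
    self-distillation term keeps the updated policy (student_probs)
    close to the model's own prior distribution (self_probs),
    removing the need for an external teacher model."""
    return reward - beta * kl_divergence(self_probs, student_probs)

# Toy usage: a correct, verifiable answer with an unchanged policy
# incurs no distillation penalty.
r = verifiable_reward("42", "42")
score = rlsd_objective(r, [0.5, 0.5], [0.5, 0.5])
```

The design point is that both signals are cheap: the reward needs only an automatic checker, and the distillation target is the model itself, so no large teacher model or extra labeling is required.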
The implications of this new training paradigm are considerable. By streamlining the training process and reducing reliance on larger teacher models, RLSD enables companies to train AI systems that are both effective and affordable. The technique not only makes it easier to develop AI systems tailored to specific business needs but also promotes national autonomy in AI technologies: by minimizing dependence on external resources and large-scale compute infrastructure, it marks a step toward greater data sovereignty in AI development.