Kuaishou Unveils SRPO to Enhance RL Training Efficiency

Kuaishou has announced a new reinforcement learning (RL) framework, two-Staged history-Resampling Policy Optimization (SRPO), aimed at addressing inefficiencies in existing training methodologies for large language models (LLMs). SRPO challenges conventional approaches, specifically the widely used Group Relative Policy Optimization (GRPO) algorithm, by mitigating its performance bottlenecks and improving sample utilization. The new framework reportedly reaches DeepSeek-R1-Zero-level performance using only a fraction of the training steps typically required, marking a significant advance in RL training efficiency.
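To make the sample-utilization point concrete, here is a minimal, illustrative sketch of the two ideas the announcement alludes to: GRPO's group-relative advantage (each rollout's reward is normalized against its group's mean and standard deviation) and a history-resampling-style filter that drops groups whose rollouts all received the same reward, since such groups yield zero advantage and thus no gradient signal. The function names and the exact filtering rule are assumptions for illustration, not Kuaishou's published implementation.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each rollout's reward by the
    group's mean and (population) standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All rollouts scored identically: no learning signal in this group.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

def resample_informative_groups(groups):
    """History-resampling idea (sketch, assumed rule): keep only groups
    whose rollouts disagree, because uniform-reward groups contribute
    zero advantage and waste training compute."""
    return [g for g in groups if len(set(g)) > 1]

# Example: three prompts, four rollouts each, binary correctness rewards.
groups = [
    [1.0, 0.0, 1.0, 0.0],  # mixed outcomes -> informative
    [1.0, 1.0, 1.0, 1.0],  # all correct    -> zero advantage
    [0.0, 0.0, 0.0, 0.0],  # all wrong      -> zero advantage
]
kept = resample_informative_groups(groups)
print(kept)                                # [[1.0, 0.0, 1.0, 0.0]]
print(group_relative_advantages(kept[0]))  # [1.0, -1.0, 1.0, -1.0]
```

The sketch shows why filtering matters: of the three prompt groups, only the one with mixed outcomes survives, so every retained rollout carries a nonzero advantage.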
The implications of this development are substantial, particularly within the context of China's national AI strategies. By enhancing the training efficiency of sophisticated reasoning capabilities in LLMs, this innovation supports domestic advancements in AI technology while potentially reducing reliance on foreign solutions. As more countries focus on autonomous AI development, Kuaishou's contributions could play a pivotal role in shaping the future landscape of AI infrastructure and capabilities in China.