Alibaba Introduces HDPO to Optimize AI Tool Efficiency

Global AI Watch · 3 min read · VentureBeat AI

Alibaba's researchers unveiled a new reinforcement learning framework called Hierarchical Decoupled Policy Optimization (HDPO), which significantly improves the efficiency of tool-using AI agents. The Metis model, trained with HDPO, cuts the rate of redundant tool invocations from 98% to 2% while achieving state-of-the-art accuracy on key benchmarks. The framework targets the latency and cost of excessive API calls, a common failure mode in existing large language models, which frequently invoke external tools even when their internal knowledge is sufficient to answer.

The implications are substantial. By decoupling the training signals for accuracy and efficiency, HDPO lets a model maximize correctness without drifting into excessive tool usage. The approach improves computational resource utilization, positions Alibaba as a leader in AI agent technology, and could reduce dependence on external API services while strengthening domestic AI capabilities. Taken together, the advance points toward more responsive and cost-effective AI systems.
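The announcement does not spell out how the decoupling works internally. As a rough, non-authoritative illustration of what separating an accuracy signal from a tool-efficiency signal could look like, the following minimal Python sketch uses hypothetical reward terms and weights (Episode, accuracy_reward, efficiency_reward, and the 0.1 penalty are all assumptions, not Alibaba's published method):

```python
# Hypothetical sketch: decoupled reward shaping for a tool-using agent.
# Not the HDPO implementation; names and coefficients are illustrative only.

from dataclasses import dataclass


@dataclass
class Episode:
    answer_correct: bool   # task-level outcome of the rollout
    tool_calls: int        # number of external tool invocations made
    tools_needed: int      # calls actually required (oracle/heuristic estimate)


def accuracy_reward(ep: Episode) -> float:
    # Outcome signal, computed independently of tool usage.
    return 1.0 if ep.answer_correct else 0.0


def efficiency_reward(ep: Episode) -> float:
    # Penalize only redundant calls, so necessary tool use is not discouraged.
    redundant = max(0, ep.tool_calls - ep.tools_needed)
    return -0.1 * redundant


def decoupled_return(ep: Episode) -> float:
    # Hierarchical combination: the efficiency term only applies once the
    # answer is correct, so the policy cannot trade correctness for fewer calls.
    r_acc = accuracy_reward(ep)
    r_eff = efficiency_reward(ep) if ep.answer_correct else 0.0
    return r_acc + r_eff


if __name__ == "__main__":
    wasteful = Episode(answer_correct=True, tool_calls=7, tools_needed=1)
    frugal = Episode(answer_correct=True, tool_calls=1, tools_needed=1)
    print(decoupled_return(wasteful))  # 0.4: correct, but penalized for 6 extra calls
    print(decoupled_return(frugal))    # 1.0: correct with no redundant calls
```

The point of the sketch is the ordering of objectives: correctness is rewarded first, and only then is redundant tool use penalized, which mirrors the article's claim that efficiency gains come without sacrificing benchmark accuracy.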
