The APEX family of benchmarks assesses whether frontier AI models can perform economically valuable tasks across professional services, medicine, software engineering, and consumer activities.
The AI Productivity Index for Agents (APEX-Agents) measures whether frontier AI agents can execute long-horizon, cross-application tasks across three jobs in professional services.
Gemini 3 Flash (High): 24.0% ± 3.3%
GPT 5.2 (High): 23.0% ± 3.2%
Opus 4.5 (High): 18.4% ± 2.9%
The AI Productivity Index (APEX) assesses whether frontier models are capable of performing economically valuable tasks across four jobs: investment banking associate, management consultant, big law associate, and primary care physician (MD).
GPT 5 (High): 67.0% ± 2.4%
GPT 5.2 Pro (High): 66.8% ± 2.6%
Gemini 3 Pro (High): 64.3% ± 2.3%
The AI Consumer Index (ACE) assesses whether frontier AI models can perform everyday consumer tasks in shopping, food, gaming, and DIY.
GPT 5 (High): 56.1% ± 3.3%
o3 Pro (High): 55.2% ± 3.2%
GPT 5.1 (High): 55.1% ± 3.2%