The AI Productivity Index (APEX) assesses whether frontier models can perform economically valuable tasks across four jobs: investment banking associate, management consultant, big law associate, and primary care physician (MD).
The APEX leaderboard ranks frontier models on the hidden held-out evaluation set for APEX. Each score is a model's average across the four jobs.
GPT 5 (High): 67% ± 2.45%
GPT 5.2 Pro (High): 66.8% ± 2.6%
Gemini 3 Pro (High): 64.3% ± 2.3%
The ACE leaderboard ranks frontier models on the hidden held-out evaluation set for ACE. Each score is a model's average across the four activities.
GPT 5 (High): 56.1% ± 3.3%
o3 Pro (High): 55.2% ± 3.2%
GPT 5.1 (High): 55.1% ± 3.2%