Frontier Data for Frontier AI

We develop benchmarks, evaluation environments, and large-scale human datasets to fuel AI breakthroughs at the frontier, all through our marketplace of top-tier experts.

Data, evals, and post-training at the frontier

Mercor is used by the top 5 AI labs and 6 of the Magnificent 7.

Frontier Data

When model capabilities reach their limits, progress depends on data quality. Mercor's talent platform mobilizes deep subject-matter experts across professional and consumer domains to produce specialized data at scale.

Model Improvements

Frontier-grade data unlocks advanced reasoning, long-horizon planning, tool use, and safe behavior under uncertainty. We power meaningful gains with novel datasets that are realistic, challenging, and diverse.

RL Environments

We build reinforcement learning (RL) environments in three steps: creating realistic, data-rich worlds that capture real behavior; implementing the tools and applications that agents need to interact with those worlds; and designing rigorous tasks and verifiers.
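The three steps can be illustrated with a toy example. The sketch below is a minimal illustration, not Mercor's implementation: the names `World`, `make_tools`, `Task`, and `run_episode`, the inbox-and-calendar scenario, and the scripted policy standing in for an agent are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical toy environment; every name here is illustrative.


@dataclass
class World:
    """Step 1: a data-rich world. Here, just an inbox and a calendar."""
    inbox: List[str] = field(
        default_factory=lambda: ["Please book a sync for Tuesday"]
    )
    calendar: List[str] = field(default_factory=list)


def make_tools(world: World) -> Dict[str, Callable[[str], str]]:
    """Step 2: tools the agent calls to read and change the world."""

    def read_inbox(_: str) -> str:
        return "\n".join(world.inbox)

    def add_event(title: str) -> str:
        world.calendar.append(title)
        return f"added: {title}"

    return {"read_inbox": read_inbox, "add_event": add_event}


@dataclass
class Task:
    """Step 3: a task prompt plus a verifier that checks the final state."""
    prompt: str
    verifier: Callable[[World], bool]


# A policy maps a prompt to a list of (tool name, argument) calls.
Policy = Callable[[str], List[Tuple[str, str]]]


def run_episode(task: Task, policy: Policy) -> float:
    """Run the policy's tool calls in a fresh world and return a 0/1 reward."""
    world = World()
    tools = make_tools(world)
    for tool_name, arg in policy(task.prompt):
        tools[tool_name](arg)
    return 1.0 if task.verifier(world) else 0.0


task = Task(
    prompt="Schedule the meeting requested in the inbox.",
    verifier=lambda w: any("Tuesday" in event for event in w.calendar),
)


def scripted_policy(prompt: str) -> List[Tuple[str, str]]:
    # Stand-in for a real agent: reads the inbox, then books the meeting.
    return [("read_inbox", ""), ("add_event", "Sync on Tuesday")]


def idle_policy(prompt: str) -> List[Tuple[str, str]]:
    # An agent that does nothing should fail the verifier.
    return []


print(run_episode(task, scripted_policy))  # 1.0
print(run_episode(task, idle_policy))      # 0.0
```

The verifier inspects the final world state rather than the agent's transcript, so any sequence of tool calls that books the meeting earns the reward.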

Benchmarks

Benchmarks for evaluating the strengths and weaknesses of frontier models on high-value tasks

APEX-Agents

The AI Productivity Index for Agents (APEX-Agents) measures whether frontier AI agents can execute long-horizon, cross-application tasks across three jobs in professional services.

GPT 5.2 (xHigh): 48.2% ± 3.5%
Gemini 3.1 Pro (High): 48.1% ± 3.4%
Opus 4.6 (Max): 47.7% ± 3.4%
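One way to read scores reported with a ± margin is to check whether the resulting ranges overlap. The sketch below assumes each ± value is a symmetric margin around the score; the page does not say whether these are standard errors or confidence intervals, so this is a rough reading rather than a significance test. The helper names are hypothetical.

```python
from typing import List, Tuple

# (model, score in percent, margin in percent), copied from the
# APEX-Agents results above.
RESULTS: List[Tuple[str, float, float]] = [
    ("GPT 5.2 (xHigh)", 48.2, 3.5),
    ("Gemini 3.1 Pro (High)", 48.1, 3.4),
    ("Opus 4.6 (Max)", 47.7, 3.4),
]


def interval(score: float, margin: float) -> Tuple[float, float]:
    """Return (low, high), treating the margin as symmetric (an assumption)."""
    return (score - margin, score + margin)


def overlaps(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def separated_pairs(
    results: List[Tuple[str, float, float]],
) -> List[Tuple[str, str]]:
    """List the pairs of models whose intervals do not overlap."""
    pairs = []
    for i in range(len(results)):
        for j in range(i + 1, len(results)):
            name_i, score_i, margin_i = results[i]
            name_j, score_j, margin_j = results[j]
            if not overlaps(interval(score_i, margin_i),
                            interval(score_j, margin_j)):
                pairs.append((name_i, name_j))
    return pairs


print(separated_pairs(RESULTS))  # []
```

Under this reading, no pair of the three listed models is separated: all three ranges overlap, so the ordering among them should be treated with caution.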

APEX

The AI Productivity Index (APEX) assesses whether frontier models are capable of performing economically valuable tasks across four jobs: investment banking associate, management consultant, big law associate, and primary care physician (MD).

GPT 5 (High): 67% ± 2.4%
GPT 5.2 Pro (High): 66.8% ± 2.6%
Gemini 3 Pro (High): 64.3% ± 2.3%

ACE

The AI Consumer Index (ACE) assesses whether frontier AI models can perform everyday consumer tasks in shopping, food, gaming, and DIY.

GPT 5 (High): 56.1% ± 3.3%
o3 Pro (High): 55.2% ± 3.2%
GPT 5.1 (High): 55.1% ± 3.2%

APEX-SWE

Coming Soon

Mercor Blog

Read our latest insights on frontier data and AI research.

Opportunities

We're looking for exceptional people to join our Research and Engineering team.
