Black Box vs. Open Box

Here’s how we think about collecting high-quality human data for GenAI use cases:

  • Data has become core product and IP for both foundation model and application-layer companies. It increasingly makes sense to build data collection and annotation processes in-house, both to improve core capabilities and to keep data secure.
  • The quality of your annotator team is the biggest lever on the quality of your data. QA processes and tooling used to be the biggest lever when the work was non-specialized, but now annotators must truly be experts to push the frontier of model capabilities.
  • Experimentation speed is critical for continuously improving your model. Data annotation should not be a blocker for your experimentation cycles.

Unfortunately, neither of the two options research teams have for creating human data is perfect:

  1. Outsource the entire data collection/annotation to external vendors, resulting in:
    1. Slower experimentation speed, as time is spent negotiating per-task prices and sharing feedback with the vendor.
    2. Data security concerns, especially if the vendor prefers to use their own platform. Many vendors have already leaked data.
    3. Lack of transparency around cost and quality of data annotators, since vendors can be incentivized to keep costs low by obfuscating time spent and annotator backgrounds.
  2. Build a human data team themselves, which comes with:
    1. Higher fixed costs, in the form of hiring a larger internal human data team and maintaining a team of annotators.
    2. Having to constantly source, vet, hire, performance-manage, and terminate contractors for your annotator team.
    3. Spending time and resources on operations that are not core competencies.

Our goal at Mercor is to help our clients achieve the best of both worlds with an “Open Box” approach, versus the “Black Box” approach commonly favored by data vendors. The Open Box is characterized by (1) simple hourly pricing that enables quick iteration, (2) optimizing for the highest-quality expert annotators, and (3) letting clients maintain control over their data, all without spending time sourcing and fully managing annotators themselves.

A more detailed comparison of the Black Box and Open Box approaches:
Experimentation speed
  • Black Box: Spend time negotiating with vendors over price per task for each iteration, and play a game of telephone between your research team, the vendor, and annotators.
  • Open Box: No new pricing negotiations every time you kick off a new project, just a flat percentage, and you work directly with annotators when you need to iterate quickly.
Data security
  • Black Box: Risk data being leaked by unknown annotators if it is stored or processed on the vendor’s platform.
  • Open Box: Retain control of your data and IP by keeping work on your own platform if preferred, and know exactly who handles your data.
Cost structure
  • Black Box: Unknown vendor margins.
  • Open Box: Full visibility into annotator pay.
Management overhead
  • Black Box: The vendor handles operations, but you will likely still need a small internal team to coordinate with research teams, manage the vendor, and assess quality.
  • Open Box: Build an internal human data team, but outsource annotator sourcing and management, as well as data pipeline and process setup.

Here’s how we help teams build the “Open Box” solution:

  1. Create a proposal on scope and methodology
    1. We come in with a custom proposal based on our initial understanding of the team’s needs, then collect clarifying input on topics like data distribution, volume, and what they’re optimizing for.
    2. There are no lengthy pricing and scoping discussions each time they kick off a new project; we just take a flat percentage of the annotators’ pay rates.
  2. Source and vet high-quality talent
    1. We have 300k+ experts in our talent pool and evaluate a combination of factors, including interviews, work experience, education, GitHub profiles, Google Scholar citations, and other signals, to find the best annotators for each project.
    2. We surface all selected annotator profiles before moving forward, for full transparency.
  3. Design and set up pipelines
    1. Teams can choose to use their own platform or ours. We can set up the workflow and pipeline with our tooling if they don’t have a platform to start with.
    2. We have a set of documents and processes to get started with as part of our Human Data Handbook — this includes guidelines, style guides, rubrics, and pipeline designs. We can help modify these for custom needs.
    3. We also help set up QA processes, from automated metrics to peer review pipelines, based on the specifics of the type of data and required quality.
  4. Measure key metrics on an ongoing basis
    1. We define a set of metrics to monitor the quality, volume, and cost efficiency of the data being created, both in aggregate and per annotator (see the sketch after this list).
    2. How custom these metrics are depends on the project. Quality in particular is context-specific, so we often create project-specific rubrics that define what good annotator work looks like.
  5. Swap out talent based on performance and changing needs
    1. When annotators are underperforming or are no longer needed for a project, we proactively notify and off-board them based on these metrics.
    2. If companies have new needs, we source, surface, and onboard new annotators within hours to days. Teams can also use our platform to easily search for specific top talent themselves.
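
To make steps 3 through 5 concrete, below is a minimal sketch of the kind of per-annotator reporting described above, assuming a simple task log that records a rubric score, an independent peer-review score, time spent, and hourly pay for each completed task. The schema and field names are illustrative assumptions rather than Mercor’s actual tooling.

```python
# Hypothetical sketch of per-annotator monitoring; field names are
# illustrative assumptions, not a real Mercor schema.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskRecord:
    annotator_id: str
    rubric_score: float    # 0-1 score the work earned against the project rubric
    reviewer_score: float  # 0-1 score an independent peer reviewer gave the same task
    minutes_spent: float
    hourly_rate: float     # annotator's hourly pay in USD


def annotator_report(tasks: list[TaskRecord]) -> dict[str, dict[str, float]]:
    """Aggregate quality, volume, and cost-efficiency metrics per annotator."""
    grouped: dict[str, list[TaskRecord]] = defaultdict(list)
    for t in tasks:
        grouped[t.annotator_id].append(t)

    report: dict[str, dict[str, float]] = {}
    for annotator_id, recs in grouped.items():
        n = len(recs)
        hours = sum(r.minutes_spent for r in recs) / 60
        cost = sum(r.minutes_spent / 60 * r.hourly_rate for r in recs)
        report[annotator_id] = {
            "tasks_completed": float(n),                                # volume
            "avg_rubric_score": sum(r.rubric_score for r in recs) / n,  # quality
            "avg_reviewer_gap": sum(abs(r.rubric_score - r.reviewer_score)
                                    for r in recs) / n,                 # agreement with peer review
            "tasks_per_hour": n / hours if hours else 0.0,              # throughput
            "cost_per_task": cost / n,                                  # cost efficiency
        }
    return report
```

In practice, thresholds on metrics like the average rubric score or the gap between self and peer-review scores would be what triggers the proactive off-boarding described in step 5.
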
It’s our hope that, with the Open Box approach, research teams can get the best of both worlds: the control and quality of building an in-house data team, with the convenience of outsourcing to data vendors.