SaaS / B2B Software · Containerized Dataset Registry At Scale

Ethara.ai

Human-in-the-loop AI data infrastructure platform.

Weeks operated

3k+

Hours of work

Engineers

The brief

Accelerate development of aligned, safe, and powerful AI by building the highest-quality human-in-the-loop data infrastructure at scale — encompassing annotator training and certification, multi-project annotation execution across multiple AI lab clients, containerized reproducible dataset environments, and the internal tooling needed to manage annotation workforce quality and throughput at industrial scale.

What we built

A full-stack AI data infrastructure platform serving as the operational backbone for human-in-the-loop model alignment work.

Phase 1 (Oct–Dec 2025): Annotator onboarding on the Outlier platform, RLHF/SFT training data generation for Claude Sonnet, TTS model annotation (Guitar Pinstripe/Riff), and UI annotation (Meadow UI).

Phase 2 (Mar–May 2026): Industrial-scale Dockerized dataset registry (5,000+ Docker images across multiple programming languages pushed to AWS ECR), large-scale agent trajectory generation for Claude and Gemini models (Jaeger/Jaeger 2.0 projects), ARC-AGI-style reasoning game suite (Arc Agents), VINDEX response quality evaluation, KAIJU repository validation, Leviathan PRD generation for web-building AI agents, and ETP (Ethara Task Platform), an internal workflow tool with rubric-based QC flows.

Throughout: production EKS infrastructure with GPU/CPU Karpenter autoscaling, Istio service mesh, multi-model AWS Bedrock hosting (Claude, Qwen, Kimi K2.5, MiniMax, GLM), and full GitOps CI/CD via ArgoCD and GitHub Actions.

Live in production

The production AWS EKS cluster with Bedrock-hosted multi-model inference (Claude, Kimi, MiniMax, GLM) is live. 5,000+ Dockerized dataset environments pushed to ECR. 10,000+ annotation tasks delivered to clients. Trajectory datasets for Claude and Gemini LHT benchmarks delivered. ARC-style reasoning game suite (158+ games) QC'd and delivered. ETP UI designs fully delivered for development implementation. Ethara's application stack (Arc Agents, ETP backend/frontend) deployed on production and staging EKS clusters.

Delivery timeline

How it was built, phase by phase.

8 workstreams across 30 weeks of operated delivery.

build · Week 1–7 (Oct–Nov 2025)
RLHF & SFT Model Training / Rubric Engineering
Deep hands-on work training LLMs via Supervised Fine-Tuning and Reinforcement Learning from Human Feedback.
Trained annotators and engineers on rubric-based evaluation, supervised fine-tuning with Claude Sonnet, and RLHF preference data generation.
Claude Sonnet · Python · RLHF frameworks · SFT pipelines · Rubric frameworks
discover · Week 1–4 (Nov 2025)
Outlier / Third-Party AI Data Platform Onboarding
Multiple engineers onboarded to Outlier (Scale AI contractor platform), completing structured training sessions on LLM post-training, prompt techniques, SFT, RLHF, and evals.
Engineers certified on Outlier platform, enabling Ethara to route AI data annotation and model training tasks through an established contractor.
Outlier platform · LLM evaluation tools · Python
build · Week 4–7 (Nov–Dec 2025)
AI Data Project Execution (Guitar Pinstripe, Guitar Riff, Meadow UI, Happy Robots)
Execution of multiple named AI data generation and UI annotation projects. Guitar Pinstripe/Riff focused on TTS (Text-to-Speech) model prompt/response correction and optimization.
Delivered curated prompt-response pairs for TTS model training and UI annotation data via Figma.
Figma · Outlier platform · TTS model evaluation tools · Python
build · Week 18–26 (Mar–May 2026)
Dockerized Dataset Registry & LHT Data Pipeline
Large-scale Phase 2 effort to build a multi-language Docker image registry for AI training datasets.
5,000+ Docker images built and pushed to ECR, covering multiple programming language environments for LHT dataset reproducibility at scale.
Docker · AWS ECR · Python · GitHub · Harness
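To make the registry idea concrete, here is a minimal sketch of how per-environment image references for ECR could be composed. The registry hostname pattern (account ID, `dkr.ecr`, region, `amazonaws.com`) is the standard ECR form; everything else is an assumption for illustration, including the function names, the repository name, the account ID, and the `language-task_id` tag scheme, none of which come from the project itself.

```python
import re

# Docker tag grammar: letters, digits, underscore, period, hyphen;
# must not start with a period or hyphen; at most 128 characters.
TAG_PATTERN = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$")


def environment_tag(language: str, task_id: str) -> str:
    """Build a tag such as 'python-task-0001' (hypothetical naming scheme)."""
    tag = f"{language.lower()}-{task_id}"
    if not TAG_PATTERN.match(tag):
        raise ValueError(f"not a valid Docker tag: {tag!r}")
    return tag


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Compose a full image reference for a private ECR repository."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder account, region, and repository values, for illustration only.
example_uri = ecr_image_uri(
    "123456789012", "us-east-1", "datasets/envs", environment_tag("Python", "task-0001")
)
```

Validating the tag up front matters for a multi-language registry: a language label such as `C++` is not a legal tag fragment and would need to be mapped to something like `cpp` before the image is built and pushed.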
build · Week 18–26 (Mar–Apr 2026)
ARC-AGI / Grid-Based Reasoning Game Development (Arc Agents)
A dedicated engineer (Arshia Parmar) spent 4+ weeks building a suite of grid-based reasoning games resembling ARC-AGI tasks, including symmetry detection, pathfinding, spatial reasoning, and object detection.
Full suite of ARC-style reasoning games with QC validation, delivered with metadata and scoring infrastructure for AI model evaluation benchmarking.
Python · BFS algorithms · Rule engines · Scoring frameworks · QC scripts
test · Week 18–27 (Mar–Apr 2026)
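As an illustration of the kind of check a QC script for a pathfinding game might run, the sketch below uses breadth-first search to confirm that a grid puzzle is solvable and to find its shortest solution length. The grid encoding (0 for open, 1 for wall), the 4-connected movement rule, and the function name are assumptions made for this example, not details of the delivered games.

```python
from collections import deque


def shortest_path_length(grid, start, goal):
    """Return the step count of a shortest 4-connected path, or None if unreachable.

    grid is a list of rows; 0 marks an open cell and 1 marks a wall.
    start and goal are (row, col) tuples.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None


# A 3x3 example: the wall in the middle row forces a detour around the right side.
EXAMPLE_GRID = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
```

Because BFS explores cells in order of distance, the first time the goal is dequeued its distance is minimal, so the same routine can both reject unsolvable levels and supply a reference score for an optimal solution.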
VINDEX / KAIJU Repository Validation & Response Evaluation
Deeksha Pathak executed two sustained annotation projects: VINDEX (response quality evaluation using rubrics on a 1-6 scale) and KAIJU (repository validation).
Thousands of response ratings and repository validations delivered.
Git · Python · Rubric frameworks · Evaluation spreadsheets
build · Week 22–30 (Apr–May 2026)
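A small sketch of how rubric ratings on a 1-6 scale could be validated and summarized before delivery. Only the 1-6 range comes from the project description; the criterion names, the function names, and the choice of a simple mean are assumptions for illustration.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 6  # rubric scale stated for VINDEX


def validate_rating(value):
    """Return the rating if it is an integer within the rubric scale, else raise."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"rating must be an integer, got {value!r}")
    if not SCALE_MIN <= value <= SCALE_MAX:
        raise ValueError(f"rating {value} is outside {SCALE_MIN}-{SCALE_MAX}")
    return value


def summarize(ratings):
    """Validate a {criterion: rating} mapping and attach the mean score."""
    checked = {name: validate_rating(score) for name, score in ratings.items()}
    return {"scores": checked, "mean": mean(checked.values())}


# Hypothetical criteria for a single response.
example_summary = summarize({"accuracy": 5, "clarity": 4})
```

Rejecting out-of-range or non-integer values at entry time keeps spreadsheet typos from silently skewing aggregate quality numbers.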
Trajectory Generation for Claude & Gemini Models (Jaeger Project)
Systematic generation of agent trajectories for Claude and Gemini models under the Jaeger and Jaeger 2.0 projects.
Large-scale trajectory datasets generated for Claude and Gemini across LHT benchmarks, delivered as structured JSONL for downstream model training.
Claude (Anthropic) · Gemini (Google) · Harness · Docker · Python · AWS ECR
build · Week 26–28 (Apr–May 2026)
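For readers unfamiliar with JSONL delivery, the sketch below shows the basic shape: one JSON object per line, written and read back with the standard library. The record fields (`task_id`, `model`, `steps`) are a hypothetical schema chosen for this example; the actual trajectory format is not described in this case study.

```python
import io
import json

# Hypothetical minimal schema; the real trajectory fields are not specified here.
REQUIRED_FIELDS = ("task_id", "model", "steps")


def dump_jsonl(records, stream):
    """Write each record as a single JSON line, rejecting incomplete records."""
    for record in records:
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"record missing fields: {missing}")
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_jsonl(stream):
    """Read records back, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


# Round-trip a single illustrative trajectory through an in-memory buffer.
example_records = [
    {
        "task_id": "demo-001",
        "model": "example-model",
        "steps": [{"action": "run_tests", "observation": "2 passed"}],
    }
]
buffer = io.StringIO()
dump_jsonl(example_records, buffer)
buffer.seek(0)
round_tripped = load_jsonl(buffer)
```

The line-per-record layout lets downstream training jobs stream large datasets without loading a whole file, and lets a single malformed record be located by line number.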
Leviathan Project — AI Website Training Data (PRD Generation)
Deeksha Pathak worked on generating high-quality training data for AI agents that build award-winning websites by creating detailed Product Requirements Documents (PRDs).
Structured PRD training corpus created for web-development AI agents, contributing training data for agentic coding capabilities.
Python · PRD templates · LLM evaluation tools

More case studies

Related work

Officebanao

SaaS / B2B Software · Collaborative 2d Space Planning Tool

Multi-module SaaS platform for commercial interior design and procurement.

53 wks · 8k hrs · 8 eng

Compport

SaaS / B2B Software · Comp SaaS Enterprise Scaling

Integrated SaaS Compensation Management Platform.

89 wks · 6k hrs · 27 eng

Fanclash

SaaS / B2B Software · Kafka Event Driven Video Pipeline

Multi-product sports-tech platform.

85 wks · 6k hrs · 12 eng
