Practical Guide

Stop guessing.
Start optimizing.

EigenPrompt finds better LLM prompts automatically — cheaper, more accurate, or both — so you can make data-driven decisions instead of hopeful ones.

The Problem

Manual prompt tuning doesn't scale

Every team building with LLMs hits the same wall. You write a prompt, it works okay, and then you spend days tweaking it — rephrasing instructions, adjusting examples, restructuring output formats — trying to squeeze out better results.

LLM Engineers

Building classification, extraction, or summarization into a product? Replace guesswork with data on which prompt actually performs best.

Engineering Managers

Your team is spending days on prompt tuning instead of shipping features. EigenPrompt condenses that work into a 5–10 minute automated run.

CTOs & Product Leaders

Concerned about LLM costs at scale? See the exact trade-off between quality and cost — quantified, not guessed.

How It Works

From prompt to Pareto frontier

You provide your current prompt and an evaluation dataset. EigenPrompt systematically generates and tests hundreds of variations, then shows you the best options on an interactive cost-vs-quality chart.

[Diagram: your inputs (prompt template, eval dataset CSV, model selection) feed an optimization engine that generates variations, evaluates them on a train set, validates them on a held-out set, retains non-dominated candidates, and iterates, producing a Pareto frontier of cost vs. quality.]
1

Define what "good" looks like

Provide an evaluation dataset — example inputs paired with expected outputs. This is the yardstick EigenPrompt uses to measure whether a variation is actually better. A good dataset has 50–200 examples. Upload a CSV, or generate a synthetic dataset directly from the platform.
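As a minimal sketch, such a dataset might look like the CSV below, assuming two columns named input and expected (the actual column names are up to you):

```python
import csv
import io

# A tiny hypothetical eval dataset: each row pairs an input with the output
# a good prompt should produce. Real datasets should have 50-200 rows.
SAMPLE_CSV = """input,expected
"My card was charged twice this month",billing
"How do I reset my password?",account
"What are your office hours?",general
"""

def load_eval_dataset(text):
    """Parse a CSV eval dataset into a list of {column: value} dicts."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_eval_dataset(SAMPLE_CSV)
print(len(rows))            # 3
print(rows[0]["expected"])  # billing
```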

2

Provide your prompt

Paste in the prompt you're currently using. If it has variable placeholders (like {{customer_name}} or {{document_text}}), EigenPrompt detects them automatically and maps them to your dataset columns.
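Placeholder detection of this kind can be sketched with a small regex, assuming the double-brace syntax shown above (find_placeholders and render are illustrative names, not EigenPrompt's API):

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def find_placeholders(prompt):
    """Return the variable names referenced as {{name}} in a prompt template."""
    return set(PLACEHOLDER.findall(prompt))

def render(prompt, row):
    """Fill each {{name}} placeholder from a dataset row (a dict of columns)."""
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), prompt)

template = "Classify the request from {{customer_name}}: {{document_text}}"
print(sorted(find_placeholders(template)))  # ['customer_name', 'document_text']
print(render(template, {"customer_name": "Ada", "document_text": "refund please"}))
# Classify the request from Ada: refund please
```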

3

Choose your model and go

Select your LLM provider and model, pick an optimization preset, and launch. Results stream to your browser in real time.

Preset     Iterations   Best for
Standard   8            Quick exploration, most use cases
Advanced   8            Deeper search with higher cost budget
Max        15           Thorough optimization for production-critical prompts
4

Pick the winner and deploy

Click any point on the Pareto frontier to inspect the full prompt text, its quality score, cost per call, and latency. Copy it in one click and drop it into your application.

Core Concept

The Pareto frontier, explained

You want the best quality at the lowest cost. No single prompt wins on both — the most accurate ones use more tokens and cost more. But there's a set of prompts where you can't improve one dimension without worsening the other. That set is the Pareto frontier, and it contains every worthwhile option.

Pareto frontier chart showing quality vs cost trade-offs — baseline, frontier, and dominated candidates with size representing latency

Every blue point represents a prompt that can't be beaten in both dimensions simultaneously; together these points form the Pareto frontier. The grey points to the bottom right are strictly worse, or dominated: for each one, some frontier prompt is cheaper at the same quality, better at the same cost, or both. Your original prompt appears as a diamond baseline marker, so you can see at a glance how much headroom exists.
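The dominance rule above translates directly into code. A minimal sketch (not EigenPrompt's implementation) that filters (cost, quality) candidates down to the non-dominated set:

```python
def pareto_frontier(candidates):
    """Return the non-dominated subset of (cost, quality) candidates.

    A candidate is dominated if some other candidate is no worse on both
    axes (cost lower-or-equal, quality higher-or-equal) and strictly
    better on at least one.
    """
    frontier = []
    for c_cost, c_quality in candidates:
        dominated = any(
            (o_cost <= c_cost and o_quality >= c_quality)
            and (o_cost < c_cost or o_quality > c_quality)
            for o_cost, o_quality in candidates
        )
        if not dominated:
            frontier.append((c_cost, c_quality))
    return sorted(frontier)

points = [(0.008, 0.72), (0.009, 0.91), (0.003, 0.73), (0.010, 0.70)]
print(pareto_frontier(points))  # [(0.003, 0.73), (0.009, 0.91)]
```

Here the (0.008, 0.72) baseline drops out because (0.003, 0.73) is both cheaper and better, which is exactly the "headroom" the chart makes visible.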

Case Study

Entity resolution in 8 minutes

A data engineering team used EigenPrompt to optimize their entity-matching prompt — a common task where small prompt differences measurably affect accuracy and cost. One Standard run (8 iterations, ~8 minutes) produced:

Variant        Accuracy        Cost per call   Notes
Baseline       0.72            $0.008          hand-tuned
Best Quality   0.91 (+26%)     $0.009          negligible cost increase
Best Value     0.73 (same)     $0.003          62% cheaper

Both improved prompts sat on the Pareto frontier. The team deployed the high-accuracy variant for their production pipeline and the cheap variant for bulk data cleaning jobs. Two prompts, two use cases, one optimization run.

Dataset QA

Your data has bugs too

During optimization, hundreds of prompt variations are evaluated against every example in your dataset. When nearly all of them disagree with the expected output, it usually means the label is wrong, not the prompts.

Diagnostic table showing majority answer disagreeing with expected labels — evidence of mislabeled evaluation data

Here, the expected labels say “account” or “general” but the majority of prompts consistently answer “billing.” This kind of labeling error silently caps optimization performance — EigenPrompt catches it automatically.
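The same majority-vote idea can be sketched in a few lines (a sketch under assumed data shapes, not EigenPrompt's actual diagnostic):

```python
from collections import Counter

def suspect_labels(answers_per_example, threshold=0.8):
    """Flag examples where most prompt variations agree with each other
    but disagree with the expected label: likely mislabeled data.

    answers_per_example: list of (expected_label, [answers from each variation]).
    """
    flagged = []
    for i, (expected, answers) in enumerate(answers_per_example):
        majority, count = Counter(answers).most_common(1)[0]
        if majority != expected and count / len(answers) >= threshold:
            flagged.append((i, expected, majority))
    return flagged

data = [
    ("account", ["billing"] * 9 + ["account"]),      # 90% answer "billing"
    ("general", ["billing"] * 8 + ["general"] * 2),  # 80% answer "billing"
    ("billing", ["billing"] * 10),                   # label agrees; fine
]
print(suspect_labels(data))  # [(0, 'account', 'billing'), (1, 'general', 'billing')]
```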

Under the Hood

What happens behind the scenes

When you launch a run, EigenPrompt doesn't randomly rewrite your prompt. It uses a multi-strategy optimization loop that learns from each iteration:

Baseline evaluation

Your current prompt is evaluated against the full dataset to establish a performance anchor.

Variation generation

Dozens of strategies are applied — improving accuracy, cutting token count, restructuring reasoning flow — to produce candidate prompts.

Training-set screening

Each variation is tested on a training subset first. A preflight check catches obviously broken variants before wasting evaluation budget.

Held-out validation

Surviving candidates are evaluated on a held-out test set for reliable scoring. Only non-dominated candidates are retained.

Iterate & refine

The system learns from what worked and generates smarter variations each round. Batching (multiple examples per LLM call) keeps optimization costs low.
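The loop described above can be condensed into a sketch, with a dummy evaluator and mutator standing in for real LLM calls (every name and heuristic here is illustrative, not EigenPrompt's implementation):

```python
import random

random.seed(0)  # deterministic for the demo

def optimize(baseline_prompt, evaluate, mutate, iterations=8):
    """Minimal sketch: score the baseline, generate variations from pool
    members, and keep only the non-dominated (cost, quality) candidates."""
    pool = [(baseline_prompt, evaluate(baseline_prompt))]
    for _ in range(iterations):
        parent = random.choice(pool)[0]
        for candidate in mutate(parent):
            pool.append((candidate, evaluate(candidate)))  # (cost, quality)
        # Retain only candidates no other candidate beats on both axes.
        pool = [
            (p, (c, q)) for p, (c, q) in pool
            if not any(
                (oc <= c and oq >= q) and (oc < c or oq > q)
                for _, (oc, oq) in pool
            )
        ]
    return pool

# Dummy evaluator: longer prompts cost more; quality also grows with length.
evaluate = lambda p: (len(p) * 0.001, min(1.0, len(p) / 100))
mutate = lambda p: [p + " Be concise.", p[: max(10, len(p) - 10)]]
frontier = optimize("Classify the support ticket by topic.", evaluate, mutate)
print(len(frontier))
```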

Best Suited For

What kind of tasks work best?

EigenPrompt is designed for single, well-defined LLM tasks within a larger workflow — tasks where success is clearly measurable.

Task type                  Evaluation approach
Entity extraction          Quantitative (exact/fuzzy)
Classification / routing   Quantitative (exact match)
Summarization              Qualitative (LLM judge)
Information extraction     Quantitative (substring)
Tool calling               Quantitative (exact match)
Content generation         Qualitative (judge + rubric)

Less suited: Open-ended creative writing with no evaluation criteria, or highly interactive multi-turn conversations where a single prompt doesn't capture the full picture.
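The quantitative approaches in the table above boil down to simple scoring functions. A sketch (illustrative, not EigenPrompt's actual scorers):

```python
import difflib

def exact_match(output, expected):
    """1.0 if normalized output equals the label (classification, routing)."""
    return float(output.strip().lower() == expected.strip().lower())

def substring_match(output, expected):
    """1.0 if the expected value appears anywhere (information extraction)."""
    return float(expected.strip().lower() in output.lower())

def fuzzy_match(output, expected):
    """Character-level similarity in [0, 1] (entity extraction with typos)."""
    return difflib.SequenceMatcher(None, output.lower(), expected.lower()).ratio()

print(exact_match("Billing", "billing"))                    # 1.0
print(substring_match("The topic is billing.", "billing"))  # 1.0
print(round(fuzzy_match("Jon Smith", "John Smith"), 2))
```

Qualitative tasks swap these out for an LLM judge scoring against a rubric, but the interface is the same: output in, score in [0, 1] out.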

Compatibility

100+ models, all major providers

Choose separate models for evaluation (the model you're optimizing for production) and meta operations (the model that generates prompt variations). These can be from different providers.

OpenAI
Anthropic
Google
Mistral
Groq
Cerebras
DeepSeek
Fireworks AI
Together AI
Cohere
AWS Bedrock
Azure OpenAI
Ollama
LM Studio
Llamafile
& 90+ more
Pricing

Simple, credit-based pricing

One credit = one optimization run. LLM inference costs go through your own provider accounts — the platform estimates total LLM cost before each run, and you set a maximum budget ($1–$1,000).
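As a rough sketch of how such a pre-run estimate can be computed (the formula, prices, and token counts below are hypothetical, not EigenPrompt's actual numbers):

```python
def estimate_run_cost(examples, variations, avg_input_tokens, avg_output_tokens,
                      price_in_per_mtok, price_out_per_mtok):
    """Rough upper bound on LLM spend for one optimization run:
    every variation evaluated on every example, priced per million tokens."""
    calls = examples * variations
    input_cost = calls * avg_input_tokens * price_in_per_mtok / 1_000_000
    output_cost = calls * avg_output_tokens * price_out_per_mtok / 1_000_000
    return round(input_cost + output_cost, 2)

# 100 examples, 200 variations, hypothetical $0.15 / $0.60 per million tokens.
print(estimate_run_cost(100, 200, 500, 50, 0.15, 0.60))  # 2.1
```

In practice training-set screening and batching cut this well below the worst case, which is why the platform treats the number as a budget ceiling rather than a bill.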

Plan       Credits / month   Dataset limit     Price
Starter    100               100 examples      $99 / month
Pro        500               1,000 examples    $299 / month
Business   2,000             10,000 examples   $999 / month

Improvement guarantee: If EigenPrompt doesn't find a prompt that improves on your baseline in at least one dimension, your credit is refunded automatically — no support ticket required.

Security

How your data is protected

A healthy dose of scepticism when a platform asks for your API keys and evaluation data is entirely reasonable. Here's exactly what happens.

[Diagram: key encryption architecture]
Your PIN: a 6-digit PIN, Argon2id-hashed; weak PINs are rejected.
Key derivation: PIN + server master secret → HKDF-SHA256 → key encryption key (KEK).
Encrypted storage: the KEK wraps a random data encryption key (DEK); the DEK encrypts your API keys with AES-256-GCM.
Additional protections: keys never reach the browser; 30-minute session after PIN entry; lockout after 5 failed attempts; timing-safe secret comparison; keys stripped from all logs; rate-limited login and signup.

You bring your own API keys. EigenPrompt doesn't resell API access or proxy through shared accounts. Usage appears on your own provider dashboard, your rate limits apply, and you can revoke access at any time by rotating your keys.

Evaluation data encryption: Your datasets and run data are encrypted at rest with per-account AES-256-GCM keys derived via HKDF. Deleted datasets and runs are soft-deleted immediately and hard-purged on a retention schedule. Account deletion purges all associated data and anonymizes your record.
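For the curious, the HKDF step this derivation relies on is specified in RFC 5869 and fits in a few lines of standard-library Python (the PIN and secret below are illustrative; the AES-256-GCM step needs a crypto library and is omitted):

```python
import hashlib
import hmac

def hkdf_sha256(ikm, salt=b"", info=b"", length=32):
    """RFC 5869 HKDF: extract-then-expand a key from input keying material."""
    # Extract: mix the input keying material with an optional salt.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand: stretch the pseudorandom key to the requested length.
    okm, block = b"", b""
    for counter in range(1, -(-length // 32) + 1):
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
    return okm[:length]

# Illustrative: derive a 256-bit key-encryption key from PIN + server secret.
kek = hkdf_sha256(b"123456" + b"server-master-secret", info=b"kek-v1")
print(len(kek))  # 32
```

Because the derivation needs both the PIN and the server-side master secret, neither a database dump nor the PIN alone is enough to recover the key.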

Quick Start

Up and running in five minutes

1

Sign up at eigenprompt.ai and start a trial.

2

Store your API keys in Settings — encrypted and PIN-protected.

3

Create a new optimization — a guided wizard walks you through every step.

4

Upload your eval dataset as a CSV, or generate a synthetic one to experiment.

5

Launch the run and watch the Pareto frontier build in real time.

6

Pick the best prompt for your needs and deploy it.

Your prompts, optimized.

No black boxes. No lock-in. Your keys, your models, your data, your choice.