Free Sandbox tier · 60M tokens / month · no card
Cut your LLM bill 40–60%. Four optimizations, measured live.
Drop-in proxy for OpenAI, Anthropic, and 12 more. Routes to cheaper-equivalent models, caches repeated prompts, compresses context, batches eligible calls. Every request shows the dollars saved, live. Quality stays at ≥ 0.95 against your golden set or routing auto-disables. Remove one line of code and you're back to direct API in under two hours.
Apache 2.0 · TypeScript + Python
curl https://api.tesseraai.io/v1/openai/chat/completions \
-H "X-Tessera-Key: tk_<your-free-key>" \
-H "Authorization: Bearer sk-<your-openai-key>" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role":"user","content":"Hello"}]
}'
# Response is plain OpenAI shape.
# Behind the scenes: route + cache + compress + batch.
# Open ledger.tesseraai.io/portal: the savings counter ticks live.
Free tier ceiling
60M
tokens / mo
Optimization mechanics
4
per request
Providers supported
13
one OpenAI-shaped config each
Fee on free tier
$0
forever, until you upgrade
Providers supported
- OpenAI
- Anthropic
- Google Gemini
- xAI
- Cohere
- Mistral
- DeepSeek
- Groq
- Together
- Fireworks
- OpenRouter
- Perplexity
- Cerebras
GDPR ready
EEA-only Confidential Information storage
DPA on request
Standard contractual clauses on file
SOC 2 Type I
In progress · Q3 2026 attestation target
Vendor-neutral
Zero affiliate revenue, contractually bound
How it works
Three minutes to first measured savings.
01
Sign up
Email + ToS. No card. Get your tk_ key + magic-link instantly at ledger.tesseraai.io/signup-dev.
02
Add two headers
Point your OpenAI / Anthropic client at api.tesseraai.io. Send your provider key in Authorization. Send your Tessera key in X-Tessera-Key. That's it.
03
Watch the counter
Every request goes through the proxy. Savings measured per-request, surfaced live in ledger.tesseraai.io/portal.
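As a rough illustration of step 02, the sketch below builds the same request as the curl example using only the Python standard library. The URL path and header names are taken from that example; the helper names build_request and send are hypothetical and are not part of any Tessera SDK.

```python
import json
import urllib.request

# Base path copied from the curl example on this page.
TESSERA_BASE = "https://api.tesseraai.io/v1/openai"


def build_request(provider_key: str, tessera_key: str, payload: dict):
    """Return (url, headers, body) for a chat completion sent via the proxy."""
    url = f"{TESSERA_BASE}/chat/completions"
    headers = {
        "Authorization": f"Bearer {provider_key}",  # your provider key, unchanged
        "X-Tessera-Key": tessera_key,               # your Tessera key
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return url, headers, body


def send(url: str, headers: dict, body: bytes) -> dict:
    """POST the request and decode the OpenAI-shaped JSON response."""
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling send(*build_request("sk-...", "tk_...", {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]})) should behave like the curl call, assuming valid keys.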
01 · LangChain
The dedicated tessera-langchain package wires ChatOpenAI / ChatAnthropic / ChatMistralAI through the Tessera proxy with one line of config — no header plumbing.
02 · Vercel AI SDK
The dedicated @tessera-llm/vercel-ai package wires createOpenAI / createAnthropic / createMistral through the Tessera proxy. Works with generateText, streamText, generateObject — unchanged.
03 · LlamaIndex
The dedicated tessera-llamaindex package wires the OpenAI / Anthropic / Mistral / Groq / Cohere LLM classes through the Tessera proxy. RAG, agents, query engines run unchanged.
Four mechanics, one proxy
Four optimizations every LLM request runs through.
These four are what you see. Internally Tessera runs nine (chained auto-route, per-role prompt split, output-length prediction, batch reconciliation, and more), plus reliability primitives: a single mechanic combination is disabled on canary regression, and requests fail over to another provider when the primary upstream returns a 5xx. Full list at /how-it-works.
Route
Auto-route to cheaper-equivalent models
For each of your endpoints, we pre-compute which cheaper model returns equivalent quality. GPT-4o → GPT-4o-mini. Claude Opus → Sonnet. A quality canary on 5% of traffic keeps checking that assumption.
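To make the routing idea concrete, here is a minimal sketch of a pre-computed route table plus a deterministic 5% canary sample. It is not Tessera's implementation: the table contents, the placeholder model names for the Claude pair, and the functions is_canary and pick_model are all assumptions for illustration.

```python
import hashlib

# Hypothetical route table. The pairs mirror the examples in the text;
# "claude-opus" / "claude-sonnet" are placeholders, not real model IDs.
ROUTES = {"gpt-4o": "gpt-4o-mini", "claude-opus": "claude-sonnet"}
CANARY_FRACTION = 0.05  # the 5% canary share from the text


def is_canary(request_id: str, fraction: float = CANARY_FRACTION) -> bool:
    """Deterministically place roughly `fraction` of requests in the canary."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < fraction


def pick_model(requested: str, disabled: set) -> str:
    """Return the cheaper model unless the route is missing or disabled."""
    if requested in disabled:
        return requested
    return ROUTES.get(requested, requested)
```

Hashing the request ID rather than calling a random generator keeps the canary decision reproducible for a given request, which makes later audits easier.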
Cache
Auto-cache repeated prompt hashes
Identical-prompt requests within a 7-day window return cached responses. Hash-locked. Per-key TTL. Cache miss falls through transparently.
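The caching behaviour described above can be sketched as a hash-keyed store with a 7-day TTL. This is an in-memory illustration under stated assumptions: the PromptCache class, its canonical-JSON hashing scheme, and the injectable clock are hypothetical and say nothing about how Tessera actually stores entries.

```python
import hashlib
import json
import time

TTL_SECONDS = 7 * 24 * 3600  # the 7-day window from the text


class PromptCache:
    """In-memory cache keyed by a hash of the full request body."""

    def __init__(self, ttl: float = TTL_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested
        self._store = {}     # hash -> (expires_at, response)

    @staticmethod
    def key(request: dict) -> str:
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, request: dict):
        """Return the cached response, or None on a miss or expired entry."""
        k = self.key(request)
        entry = self._store.get(k)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._store[k]
            return None
        return response

    def put(self, request: dict, response) -> None:
        self._store[self.key(request)] = (self._clock() + self._ttl, response)
```

On a miss, get returns None and the caller simply forwards the request upstream, which matches the "falls through transparently" behaviour.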
Compress
Auto-compress context with semantic preservation
Collapse redundant whitespace + structural noise from prompts before they hit the LLM. Per-role opt-in (system / user turns independent). Preserves code fences and JSON structure verbatim. LLMLingua-2 template substitution on roadmap.
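A minimal sketch of whitespace compression that leaves fenced code untouched might look like the following. It only protects triple-backtick fences; JSON preservation and the per-role opt-in mentioned above are not modelled here, and the compress function is a hypothetical name.

```python
import re

TICKS = "`" * 3  # triple-backtick fence marker
FENCE = re.compile("(" + TICKS + ".*?" + TICKS + ")", re.DOTALL)


def compress(text: str) -> str:
    """Collapse redundant whitespace outside fenced code blocks."""
    parts = FENCE.split(text)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Odd indices are the captured fenced blocks: keep verbatim.
            out.append(part)
            continue
        part = re.sub(r"[ \t]+", " ", part)     # collapse runs of spaces/tabs
        part = re.sub(r" ?\n ?", "\n", part)    # trim spaces around newlines
        part = re.sub(r"\n{3,}", "\n\n", part)  # allow at most one blank line
        out.append(part)
    return "".join(out)
```

Because the fenced segments are split out before any substitution runs, indentation and blank lines inside code survive unchanged.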
Batch
Auto-batch eligible requests
When latency tolerates, batch parallel calls into a single upstream request. Provider batch APIs (50% discount on OpenAI, etc.) used when available.
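The batching decision and its cost effect can be illustrated with a small planner. This is a sketch under assumptions: the Call record, its latency_tolerant flag, and the batch size are hypothetical, and the 50% figure is the OpenAI batch discount quoted above, applied uniformly for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Call:
    call_id: str
    est_cost_usd: float
    latency_tolerant: bool  # hypothetical flag: caller can wait for a batch


def plan(calls, max_batch=100):
    """Split calls into immediate ones and fixed-size batches."""
    immediate = [c for c in calls if not c.latency_tolerant]
    eligible = [c for c in calls if c.latency_tolerant]
    batches = [eligible[i:i + max_batch] for i in range(0, len(eligible), max_batch)]
    return immediate, batches


def estimated_cost(immediate, batches, batch_discount=0.5):
    """Full price for immediate calls, discounted price for batched ones."""
    full = sum(c.est_cost_usd for c in immediate)
    batched = sum(c.est_cost_usd for b in batches for c in b)
    return full + batched * (1 - batch_discount)
```

For example, one immediate call estimated at $1 plus two batchable calls estimated at $2 and $4 comes to 1 + 6 × 0.5 = $4 under these assumptions.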
Pricing
Free Sandbox for exploration. A paid tier when you scale.
Flat monthly subscription by token volume — you keep 100% of measured savings.
Free Sandbox
$0
forever, up to limit
- 60M tokens / month
- 30 requests / minute
- All 4 optimization mechanics
- Real-time savings counter
- Anomaly alerts (read-only)
- Apache 2.0 SDK · Python + TypeScript
- No card required
Paid tiers
- Starter · $199 / mo · ≤ 1B tokens / mo · Start with Starter →
- Growth · $999 / mo · ≤ 5B tokens / mo · Start with Growth →
- Scale · $3,999 / mo · ≤ 20B tokens / mo · Start with Scale →
- Enterprise · Custom · 20B+ tokens / mo · Talk to us →
- 60 requests / minute
- Flat monthly billing via Stripe
- Monthly savings statement + CSV export (audit-grade)
- Auto-throttle on cost spike, auto-halt on runaway
- Team seats (up to 5)
- Quality SLA floor 0.95 · auto-rollback on drift
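To see which tier a given monthly volume lands in, the price list above can be expressed as a small lookup. The tier names, ceilings, and prices come from this page; the tier_for function is a hypothetical helper, not part of any SDK.

```python
# (name, monthly token ceiling, price in USD per month), from the pricing list.
TIERS = [
    ("Free Sandbox", 60_000_000, 0),
    ("Starter", 1_000_000_000, 199),
    ("Growth", 5_000_000_000, 999),
    ("Scale", 20_000_000_000, 3999),
]


def tier_for(tokens_per_month: int):
    """Return (tier name, monthly price) for a token volume."""
    for name, ceiling, price in TIERS:
        if tokens_per_month <= ceiling:
            return name, price
    return "Enterprise", None  # custom pricing above 20B tokens


print(tier_for(750_000_000))  # ('Starter', 199)
```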
FAQ
Why a proxy and not just an observability layer?
Observability shows you what your LLM calls did. Tessera does the optimization inside the request path. Same dollar saved, zero engineer hours. Compatible with whatever telemetry you already run.
I already have observability set up.
Keep it. Tessera sits on the request path; your existing tracer still receives downstream telemetry. Different layer, not a replacement.
Will routing change my output quality?
For each workload, we run a quality canary on 5% of traffic. If the cheaper-equivalent model's score drifts by more than 10%, the route auto-disables for that workload and a Sentry alert fires. Quality is never traded for cost.
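One way to read the 10% drift rule is as a relative drop in mean score between the original model and the cheaper one on the canary sample. The sketch below encodes that reading; the relative-drop interpretation, the fail-safe branch, and the should_disable function are assumptions, not a description of Tessera's scoring.

```python
from statistics import mean

DRIFT_LIMIT = 0.10  # the 10% threshold from the answer above


def should_disable(baseline_scores, candidate_scores, limit=DRIFT_LIMIT) -> bool:
    """Return True when the cheaper model's mean score drops by more than `limit`."""
    base = mean(baseline_scores)
    cand = mean(candidate_scores)
    if base <= 0:
        return True  # no usable baseline: fail safe and stop routing
    drift = (base - cand) / base  # relative drop against the baseline
    return drift > limit
```

With a baseline mean of 1.0, a candidate mean of 0.95 stays enabled (5% drop), while 0.80 would disable the route (20% drop).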
What's the 60M ceiling for?
The ceiling keeps production traffic from squatting indefinitely on the free quota. Hobby and side projects rarely hit it. When you do, upgrade to a paid plan: a flat monthly subscription by token volume, and you keep 100% of measured savings.
What providers are supported today?
OpenAI, Anthropic, Google (Gemini AI Studio), xAI, Cohere, Mistral, DeepSeek, Groq, Together, Fireworks, OpenRouter, Perplexity, Cerebras. AWS Bedrock, Azure OpenAI, and Vertex AI are planned for September 2026.
Where does my data go?
Through Cloudflare Workers (request-path proxy) → upstream provider (your existing key, your existing billing relationship). We log token counts + cost deltas in Supabase (Tessera-managed). Per-request prompt content is not stored. Full audit at /security.
Free key. 30 seconds. No card.
Get free API key →
Email + ToS only. Magic link sign-in. Key shown once; copy it to your secret manager.