Free Sandbox tier · 60M tokens / month · no card

Cut your LLM bill 40–60%. Four optimizations, measured live.

Drop-in proxy for OpenAI, Anthropic, and 11 more providers. Routes to cheaper-equivalent models, caches repeated prompts, compresses context, and batches eligible calls. Every request shows the dollars saved, live. Quality stays at ≥ 0.95 against your golden set or routing auto-disables. Remove one line of code and you're back to direct API in under two hours.

Apache 2.0 · TypeScript + Python

api.tesseraai.io · 30s demo
curl https://api.tesseraai.io/v1/openai/chat/completions \
  -H "X-Tessera-Key: tk_<your-free-key>" \
  -H "Authorization: Bearer sk-<your-openai-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role":"user","content":"Hello"}]
  }'

# Response is plain OpenAI shape.
# Behind the scenes: route + cache + compress + batch.
# Open ledger.tesseraai.io/portal — savings counter ticks live.

Free tier ceiling

60M

tokens / mo

Optimization mechanics

4

per request

Providers supported

13

one OpenAI-shaped config each

Fee on free tier

$0

forever, until you upgrade

Providers supported

  • OpenAI
  • Anthropic
  • Google Gemini
  • xAI
  • Cohere
  • Mistral
  • DeepSeek
  • Groq
  • Together
  • Fireworks
  • OpenRouter
  • Perplexity
  • Cerebras

GDPR ready

EEA-only Confidential Information storage

DPA on request

Standard contractual clauses on file

SOC 2 Type I

In progress · Q3 2026 attestation target

Vendor-neutral

Zero affiliate revenue, contractually bound

How it works

Three minutes to first measured savings.

01

Sign up

Email + ToS. No card. Get your tk_ key + magic-link instantly at ledger.tesseraai.io/signup-dev.

02

Drop in two headers

Point your OpenAI / Anthropic client at api.tesseraai.io. Send your provider key in Authorization. Send your Tessera key in X-Tessera-Key. That's it.

03

Watch the counter

Every request goes through the proxy. Savings measured per-request, surfaced live in ledger.tesseraai.io/portal.
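
The two-header setup from step 02 can be sketched in Python with only the standard library. The URL and header names are taken from the curl demo; the helper name build_tessera_request and the placeholder keys are illustrative assumptions, not part of any Tessera SDK. The request is built but not sent.

```python
import json
import urllib.request

# URL copied from the curl demo; paths for other providers are not shown here.
TESSERA_OPENAI_URL = "https://api.tesseraai.io/v1/openai/chat/completions"


def build_tessera_request(tessera_key: str, provider_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-shaped chat request aimed at the proxy."""
    body = {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        TESSERA_OPENAI_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Tessera-Key": tessera_key,               # Tessera key
            "Authorization": f"Bearer {provider_key}",  # your existing provider key
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_tessera_request("tk_<your-free-key>", "sk-<your-openai-key>", "Hello")
# To actually send it: urllib.request.urlopen(req)
```

Going back to the direct API in this sketch means changing the URL and dropping the X-Tessera-Key header.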

01 · LangChain

The dedicated tessera-langchain package wires ChatOpenAI / ChatAnthropic / ChatMistralAI through the Tessera proxy with one line of config — no header plumbing.

tessera-langchain

02 · Vercel AI SDK

The dedicated @tessera-llm/vercel-ai package wires createOpenAI / createAnthropic / createMistral through the Tessera proxy. Works with generateText, streamText, generateObject — unchanged.

@tessera-llm/vercel-ai

03 · LlamaIndex

The dedicated tessera-llamaindex package wires the OpenAI / Anthropic / Mistral / Groq / Cohere LLM classes through the Tessera proxy. RAG, agents, query engines run unchanged.

tessera-llamaindex

Four mechanics, one proxy

Four optimizations every LLM request runs through.

These four are what you see. Internally Tessera runs nine (chained auto-route, per-role prompt split, output-length prediction, batch reconciliation, and more), plus reliability primitives: a single mechanic combination is disabled on canary regression, and traffic fails over to another provider when the primary upstream returns a 5xx. Full list at /how-it-works.

Route

Auto-route to cheaper-equivalent models

For each of your endpoints, we pre-compute which cheaper model returns equivalent quality. GPT-4o → GPT-4o-mini. Claude Opus → Sonnet. A 5% quality canary locks the assumption.
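
As a rough mental model, routing behaves like a lookup table with a per-route off switch. The sketch below is an assumption about the shape of that logic, not Tessera's implementation: the table contents, the pick_model name, and the Claude labels (placeholders, not real model IDs) are all hypothetical.

```python
# Hypothetical route table. "claude-opus" / "claude-sonnet" are placeholder
# labels for the Opus -> Sonnet pairing, not real model identifiers.
CHEAPER_EQUIVALENT = {
    "gpt-4o": "gpt-4o-mini",
    "claude-opus": "claude-sonnet",
}


def pick_model(requested: str, disabled_routes: set) -> str:
    """Return the cheaper model, unless the route is disabled or unknown."""
    if requested in disabled_routes:
        return requested  # canary tripped: serve the model the caller asked for
    return CHEAPER_EQUIVALENT.get(requested, requested)
```

A model with no pre-computed equivalent passes through unchanged, which keeps the proxy safe by default.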

Cache

Auto-cache repeated prompt hashes

Identical-prompt requests within a 7-day window return cached responses. Hash-locked. Per-key TTL. Cache miss falls through transparently.
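
A minimal sketch of a hash-keyed cache with a 7-day window, to make the mechanic concrete. It assumes the hash covers the model name plus the messages and uses one TTL for the whole cache; the PromptCache class and its methods are illustrative, and the real per-key TTL handling is not shown.

```python
import hashlib
import json
import time

SEVEN_DAYS = 7 * 24 * 60 * 60  # the 7-day window, in seconds


class PromptCache:
    """In-memory cache keyed by a SHA-256 hash of the request (a sketch)."""

    def __init__(self, ttl_seconds=SEVEN_DAYS, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so expiry can be tested
        self._store = {}      # hash -> (stored_at, response)

    @staticmethod
    def key(model, messages):
        # sort_keys makes the hash independent of dict key order.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model, messages):
        k = self.key(model, messages)
        entry = self._store.get(k)
        if entry is None:
            return None  # miss: caller falls through to the upstream provider
        stored_at, response = entry
        if self._clock() - stored_at > self._ttl:
            del self._store[k]  # expired entries count as misses
            return None
        return response

    def put(self, model, messages, response):
        self._store[self.key(model, messages)] = (self._clock(), response)
```

Any change to the prompt, even a single character, produces a different hash and therefore a miss.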

Compress

Auto-compress context with semantic preservation

Strip redundant whitespace and structural noise from prompts before they hit the LLM. Per-role opt-in (system / user turns independent). Preserves code fences and JSON structure verbatim. LLMLingua-2 template substitution on roadmap.
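
To illustrate the whitespace part of this mechanic, here is a small sketch that squeezes spaces and blank lines outside fenced code blocks and leaves fenced content untouched. It is an assumption about what such a pass could look like: compress_prompt is a hypothetical name, and JSON-aware handling and per-role opt-in are not covered.

```python
import re

FENCE = "`" * 3  # markdown code-fence marker


def compress_prompt(text: str) -> str:
    """Collapse whitespace outside fenced code blocks; keep fenced blocks verbatim."""
    out = []
    in_fence = False
    blank_pending = False
    for line in text.split("\n"):
        is_fence = line.lstrip().startswith(FENCE)
        if in_fence and not is_fence:
            out.append(line)  # inside a fence: copy the line unchanged
            continue
        if not is_fence:
            line = re.sub(r"[ \t]+", " ", line).strip()
            if not line:
                blank_pending = True  # remember the break, emit at most one
                continue
        if blank_pending and out:
            out.append("")
        blank_pending = False
        out.append(line)
        if is_fence:
            in_fence = not in_fence  # opening or closing fence line
    return "\n".join(out)
```

Runs of blank lines shrink to one and leading or trailing blank lines are dropped, while indentation inside a fenced block survives exactly as written.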

Batch

Auto-batch eligible requests

When latency tolerates it, batch parallel calls into a single upstream request. Provider batch APIs (50% discount on OpenAI, etc.) are used when available.
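
The arithmetic behind batching can be sketched as follows. The 50% figure is the OpenAI batch discount quoted above; the Call type, its latency_tolerant flag, and the example costs are hypothetical, and how Tessera actually decides eligibility is not shown.

```python
from dataclasses import dataclass

BATCH_DISCOUNT = 0.5  # the 50% batch discount quoted for OpenAI


@dataclass
class Call:
    cost_usd: float          # hypothetical direct-API cost of this call
    latency_tolerant: bool   # hypothetical flag: can this call wait for a batch?


def split_and_price(calls):
    """Return (batched, realtime, total_cost_usd) for a list of calls."""
    batched = [c for c in calls if c.latency_tolerant]
    realtime = [c for c in calls if not c.latency_tolerant]
    total = sum(c.cost_usd for c in realtime)
    total += BATCH_DISCOUNT * sum(c.cost_usd for c in batched)
    return batched, realtime, total


calls = [Call(0.10, True), Call(0.20, False), Call(0.30, True)]
batched, realtime, total = split_and_price(calls)
# Direct cost would be $0.60; here the two tolerant calls are billed at half price.
```

Only calls that can wait benefit, so the saving depends on how much of a workload is latency-tolerant.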

Pricing

Free Sandbox for exploration. A paid tier when you scale.

Flat monthly subscription by token volume — you keep 100% of measured savings.

Free Sandbox

$0

forever, up to limit

  • 60M tokens / month
  • 30 requests / minute
  • All 4 optimization mechanics
  • Real-time savings counter
  • Anomaly alerts (read-only)
  • Apache 2.0 SDK · Python + TypeScript
  • No card required
Get free API key →

Paid tiers

Flat monthly subscription by token volume — you keep 100% of measured savings.

  • 60 requests / minute
  • Flat monthly billing via Stripe
  • Monthly savings statement + CSV export (audit-grade)
  • Auto-throttle on cost spike, auto-halt on runaway
  • Team seats (up to 5)
  • Quality SLA floor 0.95 · auto-rollback on drift

FAQ

Why a proxy and not just an observability layer?

Observability shows you what your LLM calls did. Tessera does the optimization inside the request path. Same dollars saved, zero engineering hours. Compatible with whatever telemetry you already run.

I already have observability set up.

Keep it. Tessera sits on the request path; your existing tracer still receives downstream telemetry. Different layer, not a replacement.

Will routing change my output quality?

For each workload, we run a quality canary on 5% of traffic. If the cheaper-equivalent model drifts >10% on score, the route auto-disables for that workload and Sentry raises an alert. Quality is never traded for cost.
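
The canary rule can be sketched in a few lines. This assumes "drifts >10%" means a relative drop against the original model's score, which the page does not spell out; the function names and the scoring inputs are hypothetical.

```python
import random

CANARY_RATE = 0.05  # 5% of traffic is scored
MAX_DRIFT = 0.10    # route is disabled when the score drops by more than 10%


def in_canary(rng: random.Random) -> bool:
    """Decide whether this request is part of the 5% canary sample."""
    return rng.random() < CANARY_RATE


def route_stays_enabled(baseline_score: float, cheaper_score: float) -> bool:
    """Keep the route only while relative drift is within MAX_DRIFT (an assumption)."""
    if baseline_score <= 0:
        return False  # no usable baseline: fail safe and disable the route
    drift = (baseline_score - cheaper_score) / baseline_score
    return drift <= MAX_DRIFT
```

A cheaper model that scores higher than the baseline yields negative drift and keeps the route enabled.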

What's the 60M ceiling for?

The ceiling keeps production traffic from squatting indefinitely on free quota. Hobby and side projects rarely hit it. When you do, upgrade to a paid plan: a flat monthly subscription by token volume, and you keep 100% of measured savings.

What providers are supported today?

OpenAI, Anthropic, Google (Gemini AI Studio), xAI, Cohere, Mistral, DeepSeek, Groq, Together, Fireworks, OpenRouter, Perplexity, Cerebras. AWS Bedrock, Azure OpenAI, Vertex AI — September 2026.

Where does my data go?

Through Cloudflare Workers (request-path proxy) → upstream provider (your existing key, your existing billing relationship). We log token counts + cost deltas in Supabase (Tessera-managed). Per-request prompt content is not stored. Full audit at /security.

Free key. 30 seconds. No card.

Get free API key →

Email + ToS only. Magic link sign-in. Key shown once — copy it to your secret manager.

Questions? Join the GitHub Discussions →