Snake API: Economics of Deterministic Matching

Charles Dana · Monce SAS · May 2026

snake.aws.monce.ai · /paper · /architecture · /math

1. The headline number

Cost per matched line: ~$3.7 × 10⁻⁹

That is, $0.0000037 per query. Or: $3.70 per million matched lines. Flat. Linear. No bulk discount needed because the baseline is already negligible.

This is the cost of a fully-loaded production query: nginx ingress, gunicorn dispatch, FastAPI request parsing, Snake SAT vote across the relevant field model, fuzzy fallback if needed, JSON response with audit trail, full TLS round-trip. End to end. Same number whether the query hits Tier 1 (SAT 100%), Tier 2 (fuzzy), or returns no match.

2. Cost per matched line vs the alternatives

The natural alternative is an LLM call to interpret each line. Same task, different substrate. The cost gap spans four to five orders of magnitude, so the only honest way to draw it is on a log scale:

Cost per matched line, log scale. Snake sits ~27,000× left of Claude Haiku — a full deterministic match for the price of four-millionths of an LLM call.

Approach	Latency	Cost / line	Determinism	Audit trail
Snake API	3.7 ms	$3.7×10⁻⁹	Yes	SAT clauses
Claude Haiku 4.5	~400 ms	~$0.0001	No	Free-text only
GPT-4o-mini	~600 ms	~$0.00015	No	Free-text only
Embedding similarity	~80 ms	~$0.00002	~Yes	Cosine score
SageMaker endpoint	~50 ms	~$0.0001	Yes	Model-dependent

Snake API is ~27,000× cheaper than Claude Haiku per matched line.

3. The amortization model

The whole stack lives on a single always-on instance — no queue, no Lambda, no serverless cold start. So the per-query cost is simply the instance's hourly rate divided by sustained throughput:

$/query = (instance $/sec) / (sustained q/s)

At ~260 q/s sustained (single-instance benchmark) the amortized compute is below $10⁻⁶ per query. Folding in a conservative ~50% utilization assumption — the instance is provisioned 24/7 but real traffic ebbs and flows — lands the headline at $3.7·10⁻⁹.

The instance is deliberately over-provisioned: RAM headroom absorbs OCR batch storms and rebuild_all overlap without thrashing. The fixed monthly cost is dominated by that always-on instance and is a low-three-figure line item — negligible against the value of every order line it routes correctly and deterministically.

3.1 At what volume does the math flip?

Let F be the fixed monthly cost, c_S = $3.7×10⁻⁹ the Snake per-line cost, and c_L = $10⁻⁴ the LLM per-line cost. Snake wins above the crossover volume k* where total costs cross:

F + k·c_S < k·c_L ⇒ k* = F / (c_L − c_S) ≈ F / c_L

Below k* an LLM is cheaper (no instance to amortize). Above it, Snake's flat curve dominates — and stays dominant forever, because its slope is four orders of magnitude shallower.

With a low-three-figure monthly fixed cost, the crossover lands around a few million queries per month. Production load on this instance runs comfortably above that — firmly in Snake territory, where every additional million lines costs $3.70 instead of $100.

4. Memory and per-tenant footprint

Each tenant carries one global model, a set of field models, and a client model. Model size scales with catalog size — tens to a few hundred MB per tenant, deserialized population included. The full tenant set resides in every worker process; Python object overhead roughly doubles the on-disk JSON size at runtime. The instance is sized so the entire fleet fits resident with large headroom for additional workers as traffic grows.

5. Operational cost: rebuild_all

A full rebuild_all retrains every tenant's article + client models from scratch. Triggered nightly via cron, on-demand by ops, or after a bulk synonym import.

Factor	Value
Wall-clock duration	~9 min 20 s (article models)
Disk I/O	~GB-scale write to /tmp + upload to S3
S3 PUT requests	~20 model files per tenant
Worker reload (SIGHUP)	~30 s, fresh worker pool, old workers drain in-flight

A full rebuild costs cents of compute. Nightly cron: a coffee a month.

6. Adding a new tenant

Onboarding is code-free. The auto-onboarding cron polls the upstream database, discovers new factory IDs, and trains them with zero human intervention — it even skips IDs that exist but carry no articles yet:

Step	Cost	Time
Auto-onboarding cron picks up new factory_id from the upstream DB	$0	polling cycle
populate_models.py training run	cents	5–10 min
S3 upload	~$0.001	30 s
SIGHUP reload	$0	30 s
Total	~cents marginal	< 15 min from new tenant to live matching

No code changes needed. Factory IDs are discovered dynamically via S3 listing (model_manager._load_all_models()) and the upstream factory registry.

7. Why we don't go serverless

Lambda would be tempting: scale-to-zero, pay-per-invoke. But:

Model loading. A cold v5.4.6 Snake model takes 100–1300 ms to deserialize. Per-invocation cold starts wreck latency.
Memory. The full resident tenant set is multi-GB. Lambda cost scales with memory-time; the math gets ugly fast at steady load.
State. Fuzzy matchers cache per-tenant dictionaries. A long-lived instance keeps them warm forever. Lambda would rebuild on every cold start.
Observability. A persistent instance + journalctl is one tail away. Lambda needs CloudWatch plumbing.

SnakeBatch (the distributed trainer at snakebatch.aws.monce.ai) does use Lambda — because training is bursty and parallel. Inference is steady and stateful. Different workload, different infra.

8. Summary

$3.7 × 10−9 per matched line at production scale
~27,000× cheaper than Claude Haiku for the same task
Cents marginal cost per new tenant onboarding
Cents per full rebuild_all (nightly cron negligible)
Same per-line cost at a thousand queries/month or a hundred million
Latency is not just lower — it's ~100× lower than any LLM alternative

Charles Dana · Monce SAS · snake.aws.monce.ai · deployed 2026-05-20
Co-Authored-By: Claude (Anthropic)