That is, $0.0000037 per query. Or: $3.70 per million matched lines. Flat. Linear. No bulk discount needed because the baseline is already negligible.
This is the cost of a fully-loaded production query: nginx ingress, gunicorn dispatch, FastAPI request parsing, Snake SAT vote across the relevant field model, fuzzy fallback if needed, JSON response with audit trail, full TLS round-trip. End to end. Same number whether the query hits Tier 1 (SAT 100%), Tier 2 (fuzzy), or returns no match.
The natural alternative is an LLM call to interpret each line. Same task, different substrate. The cost gap spans four to five orders of magnitude, so the only honest way to draw it is on a log scale:
| Approach | Latency | Cost / line | Determinism | Audit trail |
|---|---|---|---|---|
| Snake API | 3.7 ms | $3.7×10−9 | Yes | SAT clauses |
| Claude Haiku 4.5 | ~400 ms | ~$0.0001 | No | Free-text only |
| GPT-4o-mini | ~600 ms | ~$0.00015 | No | Free-text only |
| Embedding similarity | ~80 ms | ~$0.00002 | ~Yes | Cosine score |
| SageMaker endpoint | ~50 ms | ~$0.0001 | Yes | Model-dependent |
The whole stack lives on a single always-on instance — no queue, no Lambda, no serverless cold start. So the per-query cost is simply the instance's hourly rate divided by sustained throughput:
At ~260 q/s sustained (single-instance benchmark) the amortized compute is below $10−6 per query. Folding in a conservative ~50% utilization assumption — the instance is provisioned 24/7 but real traffic ebbs and flows — lands the headline at $3.7·10−9.
The instance is deliberately over-provisioned: RAM headroom absorbs OCR
batch storms and rebuild_all overlap without thrashing. The fixed monthly cost
is dominated by that always-on instance and is a low-three-figure line item —
negligible against the value of every order line it routes correctly and deterministically.
Let F be the fixed monthly cost, cS = $3.7×10−9 the Snake per-line cost, and cL = $10−4 the LLM per-line cost. Snake wins above the crossover volume k* where total costs cross:
With a low-three-figure monthly fixed cost, the crossover lands around a few million queries per month. Production load on this instance runs comfortably above that — firmly in Snake territory, where every additional million lines costs $3.70 instead of $100.
Each tenant carries one global model, a set of field models, and a client model. Model size scales with catalog size — tens to a few hundred MB per tenant, deserialized population included. The full tenant set resides in every worker process; Python object overhead roughly doubles the on-disk JSON size at runtime. The instance is sized so the entire fleet fits resident with large headroom for additional workers as traffic grows.
A full rebuild_all retrains every tenant's article + client models from
scratch. Triggered nightly via cron, on-demand by ops, or after a bulk synonym import.
| Factor | Value |
|---|---|
| Wall-clock duration | ~9 min 20 s (article models) |
| Disk I/O | ~GB-scale write to /tmp + upload to S3 |
| S3 PUT requests | ~20 model files per tenant |
| Worker reload (SIGHUP) | ~30 s, fresh worker pool, old workers drain in-flight |
Onboarding is code-free. The auto-onboarding cron polls the upstream database, discovers new factory IDs, and trains them with zero human intervention — it even skips IDs that exist but carry no articles yet:
| Step | Cost | Time |
|---|---|---|
| Auto-onboarding cron picks up new factory_id from the upstream DB | $0 | polling cycle |
| populate_models.py training run | cents | 5–10 min |
| S3 upload | ~$0.001 | 30 s |
| SIGHUP reload | $0 | 30 s |
| Total | ~cents marginal | < 15 min from new tenant to live matching |
No code changes needed. Factory IDs are discovered dynamically via S3 listing
(model_manager._load_all_models()) and the upstream factory registry.
Lambda would be tempting: scale-to-zero, pay-per-invoke. But:
SnakeBatch (the distributed trainer at snakebatch.aws.monce.ai) does
use Lambda — because training is bursty and parallel. Inference is
steady and stateful. Different workload, different infra.
Charles Dana · Monce SAS · snake.aws.monce.ai · deployed 2026-05-20
Co-Authored-By: Claude (Anthropic)