Inside the engine.

This is how Altadore processes a message. Local classification, privacy scrubbing, structured memory. Type something and watch the real pipeline work.

Send a message through the pipeline and watch each stage fire:

Message In ── raw text
The Gate ── deterministic, local. 14 questions in <5ms. No API call, no cost. Real names replaced before anything reaches the cloud.
Classify ── 14 local + 8 deferred; routing decision
Sort ── fast model: classify + plan
Recall ── memory retrieval
Fetch ── external data, pulled before the model thinks, not after
Think ── reasoning model: generates the response. Skipped on DEEP_LITE; simple messages skip the expensive reasoning call.
Shape ── fast model: voice + format
Restore ── pseudonyms rehydrated
Response Out

It doesn't guess. It scores.

Every message runs through a deterministic local gate before a single cloud API is called. The gate answers 22 yes/no questions — 14 resolved instantly by regex and keyword matching, 8 deferred to the cloud. It classifies what you're actually asking, scores complexity, detects PII, and routes the message to the correct pipeline.

The split is structural, not dynamic. 14 questions are answered deterministically in under 5ms — things like "is this a greeting?" or "does this contain a date?" The remaining 8 require language understanding and go to the first cloud call. The gate handles the cheap stuff. The cloud handles the hard stuff.

Simple stuff — greetings, confirmations — gets pattern-matched at zero cost. No API call. No latency. Real questions get real compute.

Privacy is structural, not promised. Every decision traces to a flag you can read.

How the gate works

The 22 questions split into two tiers. Tier 1 (14 questions) runs locally — regex, keyword matching, pattern detection. Done in under 5ms. Tier 2 (8 questions) requires language understanding and goes to the first cloud call alongside classification.

TIER 1 — 14 questions, local
Deterministic. Regex and keyword matching. Is this a greeting? Does it contain a date? Is it a single word? Answered in <5ms, zero cost.
TIER 2 — 8 questions, cloud
Requires inference. What domain is this? How complex? What emotional tone? Resolved by the first cloud call alongside response planning.

The split is fixed — always 14 local, always 8 deferred. A greeting gets pattern-matched by Tier 1 and never reaches the cloud at all. A complex question still gets its 14 local answers instantly, then sends the remaining 8 to the cloud. The system scales cloud usage to match complexity, not message length.
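The Tier 1 pass described above can be sketched in a few lines of Python. The question names and patterns here are illustrative, not Altadore's actual list of 14:

```python
import re

# Tier 1: deterministic yes/no questions answered locally by regex and
# keyword matching. Illustrative subset, not the engine's real list.
TIER1_QUESTIONS = {
    "is_greeting":     re.compile(r"^\s*(hi|hey|hello|good (morning|evening))\b", re.I),
    "is_confirmation": re.compile(r"^\s*(yes|no|ok(ay)?|sure|thanks?)\s*[.!]?\s*$", re.I),
    "contains_date":   re.compile(r"\b(\d{4}-\d{2}-\d{2}|today|tomorrow)\b", re.I),
    "is_single_word":  re.compile(r"^\s*\S+\s*$"),
    "contains_email":  re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
}

def run_gate(message: str) -> dict:
    """Answer the local questions. No network, sub-millisecond."""
    flags = {name: bool(rx.search(message)) for name, rx in TIER1_QUESTIONS.items()}
    # A greeting or bare confirmation never needs the cloud at all;
    # everything else carries its local flags forward to Tier 2.
    flags["route"] = "SNAP" if (flags["is_greeting"] or flags["is_confirmation"]) else "CLOUD"
    return flags
```

Because every answer is a regex match, the same message always produces the same flags — which is what makes the routing auditable rather than probabilistic.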

Nothing leaves the building unless it has to.

Before any message reaches the cloud, a 3-layer PII scanner (word list, regex, NER) finds names, phone numbers, emails, addresses, and sensitive identifiers. Names become realistic pseudonyms — not bracket tokens. The cloud models see natural language they were trained on, not synthetic [PERSON_1] syntax. Real names never reach an API.

LOCAL (Pi / Desktop) ─────────────────── CLOUD (Cloud Models)

Real data stays here: ─── sanitized text ───▸ Cloud sees only:
Phil Henderson ──────────────────── Michael Chen
403-555-0192 ───────────────────── 403-555-0147
phil@altadore.ai ──────────────────── [EMAIL_1]

◄──────────────── RESTORE ────────────────
rehydrate pseudonyms back to real values

The token map lives in local process memory. Never serialized. Never sent to any API. The cloud generates a response using pseudonyms, then the restore pass swaps them back before the user sees it.
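The scrub-and-restore round trip can be sketched as follows. The class name, pseudonym pool, and token format are assumptions for illustration; only the shape of the flow (map kept in process memory, names to realistic pseudonyms, emails to bracket tokens, restore before display) comes from the text above:

```python
import re

# Illustrative pseudonym pool -- the real scrubber presumably draws from
# a much larger list so pseudonyms look natural to the cloud model.
PSEUDONYMS = ["Michael Chen", "Sarah Okafor", "Dana Ruiz"]

class Scrubber:
    def __init__(self, known_names):
        self.known_names = known_names  # from the local word-list layer
        self.token_map = {}             # real -> replacement; never serialized

    def sanitize(self, text: str) -> str:
        # Names become realistic pseudonyms, not bracket tokens.
        for i, name in enumerate(self.known_names):
            pseudo = PSEUDONYMS[i % len(PSEUDONYMS)]
            self.token_map[name] = pseudo
            text = text.replace(name, pseudo)
        # Emails become bracket tokens, matching the diagram above.
        for j, email in enumerate(re.findall(r"[\w.+-]+@[\w-]+\.\w+", text), 1):
            token = f"[EMAIL_{j}]"
            self.token_map[email] = token
            text = text.replace(email, token)
        return text

    def restore(self, text: str) -> str:
        # Rehydrate: swap replacements back before the user sees the reply.
        for real, fake in self.token_map.items():
            text = text.replace(fake, real)
        return text
```

The map lives only on the `Scrubber` instance, so it dies with the process and nothing sensitive is ever written out or sent over the wire.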


Every fact is scored, not stored.

Each piece of information in Altadore carries ten numerical scores — weight, depth, domain, expiry, sensitivity, confidence, urgency, valence, feedback, scope. The system doesn't search a text file. It runs vector math against a SQLite table and pulls exactly what matters.
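A minimal sketch of score-based recall against SQLite, assuming each fact is a row whose ten scores form a vector and retrieval is a weighted dot product. The schema and ranking function are hypothetical; the ten score names come from the paragraph above:

```python
import sqlite3

# The ten scores every fact carries, per the description above.
SCORES = ["weight", "depth", "domain", "expiry", "sensitivity",
          "confidence", "urgency", "valence", "feedback", "scope"]

db = sqlite3.connect(":memory:")
db.execute(f"CREATE TABLE facts (text TEXT, {', '.join(s + ' REAL' for s in SCORES)})")

def add_fact(text, **scores):
    vals = [scores.get(s, 0.0) for s in SCORES]
    db.execute(f"INSERT INTO facts VALUES (?, {','.join('?' * 10)})", [text] + vals)

def recall(query_weights, k=3):
    """Rank facts by a weighted dot product over their score vectors --
    vector math against the table, not a text search."""
    rows = db.execute(f"SELECT text, {', '.join(SCORES)} FROM facts").fetchall()
    ranked = sorted(
        ((sum(w * v for w, v in zip(query_weights, row[1:])), row[0]) for row in rows),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]
```

A query is just another ten-element vector: boosting `urgency` surfaces time-critical facts, boosting `weight` surfaces the durable ones.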

Four pipelines, one gate

DEEP — Full pipeline. 3 cloud calls: classify and plan (fast model), generate the response (reasoning model), enforce voice and format (fast model). External data pulled before the model thinks. For complex, multi-domain questions.

DEEP_LITE — Same pipeline minus the expensive reasoning call: 2 cloud calls instead of 3. The first call both classifies and drafts the response in one pass.

QUICK — Fast cloud model. One or two calls. Quick answers, casual questions, low-stakes lookups.

SNAP — Zero API calls. Zero LLM calls. Pattern-matched responses for greetings, confirmations, one-word replies. Instant. Free.
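The four-way routing can be sketched as a single function over the gate's flags. The threshold values and flag names are placeholders, not Altadore's tuned numbers:

```python
# Hypothetical router: map the gate's flags to one of the four pipelines.
def route(flags: dict) -> str:
    if flags.get("is_greeting") or flags.get("is_confirmation"):
        return "SNAP"       # zero cloud calls: pattern-matched reply
    complexity = flags.get("complexity", 0.0)  # scored by the gate
    if complexity < 0.3:
        return "QUICK"      # fast cloud model, one or two calls
    if complexity < 0.7:
        return "DEEP_LITE"  # classify + draft in one pass, no reasoning call
    return "DEEP"           # classify/plan, reason, shape: 3 calls
```

The point of the design is visible in the shape of the code: cost only enters at the bottom branches, and the cheapest exit is checked first.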

Most engine stages run at zero cost. The expensive part is the thinking — and the system only thinks when it has to.

What's inside

The engine is modular. Each piece does one thing. Green border means zero API cost — pure logic, math, and local ops. Amber means cloud model calls.