Inside the engine.

This is how Altadore processes a message. Local classification, privacy scrubbing, structured memory. Type something and watch the real pipeline work.

Send a message through the pipeline and watch each stage fire:

Message In ── raw text
The Gate ── deterministic, local. 14 questions in <5ms. No API call, no cost. Real names replaced before anything reaches the cloud.
Classify ── 14 local + 8 deferred; routing decision
Sort ── fast model: classify + plan
Recall ── memory retrieval
Fetch ── external data, pulled before the model thinks, not after
Think ── reasoning model: generates the response. Skipped on DEEP_LITE; simple messages skip the expensive reasoning call.
Shape ── fast model: voice + format
Restore ── pseudonyms rehydrated
Response Out

It doesn't guess. It scores.

Every message runs through a deterministic local gate before a single cloud API is called. The gate answers 22 yes/no questions — 14 resolved instantly by regex and keyword matching, 8 deferred to the cloud. It classifies what you're actually asking, scores complexity, detects PII, and routes the message to the correct pipeline.

The split is structural, not dynamic. 14 questions are answered deterministically in under 5ms — things like "is this a greeting?" or "does this contain a date?" The remaining 8 require language understanding and go to the first cloud call. The gate handles the cheap stuff. The cloud handles the hard stuff.

Simple stuff — greetings, confirmations — gets pattern-matched at zero cost. No API call. No latency. Real questions get real compute.

Privacy is structural, not promised. Every decision traces to a flag you can read.

How the gate works

The 22 questions split into two tiers. Tier 1 (14 questions) runs locally — regex, keyword matching, pattern detection. Done in under 5ms. Tier 2 (8 questions) requires language understanding and goes to the first cloud call alongside classification.

TIER 1 — 14 questions, local
Deterministic. Regex and keyword matching. Is this a greeting? Does it contain a date? Is it a single word? Answered in <5ms, zero cost.
TIER 2 — 8 questions, cloud
Requires inference. What domain is this? How complex? What emotional tone? Resolved by the first cloud call alongside response planning.

The split is fixed — always 14 local, always 8 deferred. A greeting gets pattern-matched by Tier 1 and never reaches the cloud at all. A complex question still gets its 14 local answers instantly, then sends the remaining 8 to the cloud. The system scales cloud usage to match complexity, not message length.
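The Tier 1 pass described above can be sketched in a few lines of Python. The question names and patterns here are illustrative, not Altadore's actual list of 14:

```python
import re

# Tier 1: deterministic yes/no questions answered locally by regex and
# keyword matching. Illustrative subset, not the engine's real list.
TIER1_QUESTIONS = {
    "is_greeting":     re.compile(r"^\s*(hi|hey|hello|good (morning|evening))\b", re.I),
    "is_confirmation": re.compile(r"^\s*(yes|no|ok(ay)?|sure|thanks?)\s*[.!]?\s*$", re.I),
    "contains_date":   re.compile(r"\b(\d{4}-\d{2}-\d{2}|today|tomorrow)\b", re.I),
    "is_single_word":  re.compile(r"^\s*\S+\s*$"),
    "contains_email":  re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
}

def run_gate(message: str) -> dict:
    """Answer the local questions. No network, sub-millisecond."""
    flags = {name: bool(rx.search(message)) for name, rx in TIER1_QUESTIONS.items()}
    # A greeting or bare confirmation never needs the cloud at all;
    # everything else carries its local flags forward to Tier 2.
    flags["route"] = "SNAP" if (flags["is_greeting"] or flags["is_confirmation"]) else "CLOUD"
    return flags
```

Because every answer is a regex match, the same message always produces the same flags — which is what makes the routing auditable rather than probabilistic.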

Nothing leaves the building unless it has to.

Before any message reaches the cloud, a 3-layer PII scanner (word list, regex, NER) finds names, phone numbers, emails, addresses, and sensitive identifiers. Names become realistic pseudonyms — not bracket tokens. The cloud models see natural language they were trained on, not synthetic [PERSON_1] syntax. Real names never reach an API.

LOCAL (Pi / Desktop) ─────────────────── CLOUD (Cloud Models)

Real data stays here: ─── sanitized text ───▸ Cloud sees only:
Phil Henderson ──────────────────── Michael Chen
403-555-0192 ───────────────────── 403-555-0147
phil@altadore.ai ──────────────────── [EMAIL_1]

◄──────────────── RESTORE ────────────────
rehydrate pseudonyms back to real values

The token map lives in local process memory. Never serialized. Never sent to any API. The cloud generates a response using pseudonyms, then the restore pass swaps them back before the user sees it.
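The scrub-and-restore round trip can be sketched as follows. The class name, pseudonym pool, and token format are assumptions for illustration; only the shape of the flow (map kept in process memory, names to realistic pseudonyms, emails to bracket tokens, restore before display) comes from the text above:

```python
import re

# Illustrative pseudonym pool -- the real scrubber presumably draws from
# a much larger list so pseudonyms look natural to the cloud model.
PSEUDONYMS = ["Michael Chen", "Sarah Okafor", "Dana Ruiz"]

class Scrubber:
    def __init__(self, known_names):
        self.known_names = known_names  # from the local word-list layer
        self.token_map = {}             # real -> replacement; never serialized

    def sanitize(self, text: str) -> str:
        # Names become realistic pseudonyms, not bracket tokens.
        for i, name in enumerate(self.known_names):
            pseudo = PSEUDONYMS[i % len(PSEUDONYMS)]
            self.token_map[name] = pseudo
            text = text.replace(name, pseudo)
        # Emails become bracket tokens, matching the diagram above.
        for j, email in enumerate(re.findall(r"[\w.+-]+@[\w-]+\.\w+", text), 1):
            token = f"[EMAIL_{j}]"
            self.token_map[email] = token
            text = text.replace(email, token)
        return text

    def restore(self, text: str) -> str:
        # Rehydrate: swap replacements back before the user sees the reply.
        for real, fake in self.token_map.items():
            text = text.replace(fake, real)
        return text
```

The map lives only on the `Scrubber` instance, so it dies with the process and nothing sensitive is ever written out or sent over the wire.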


Every fact is scored, not stored.

Each piece of information in Altadore carries ten numerical scores — weight, depth, domain, expiry, sensitivity, confidence, urgency, valence, feedback, scope. The system doesn't search a text file. It runs vector math against a SQLite table and pulls exactly what matters.
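A minimal sketch of score-based recall against SQLite, assuming each fact is a row whose ten scores form a vector and retrieval is a weighted dot product. The schema and ranking function are hypothetical; the ten score names come from the paragraph above:

```python
import sqlite3

# The ten scores every fact carries, per the description above.
SCORES = ["weight", "depth", "domain", "expiry", "sensitivity",
          "confidence", "urgency", "valence", "feedback", "scope"]

db = sqlite3.connect(":memory:")
db.execute(f"CREATE TABLE facts (text TEXT, {', '.join(s + ' REAL' for s in SCORES)})")

def add_fact(text, **scores):
    vals = [scores.get(s, 0.0) for s in SCORES]
    db.execute(f"INSERT INTO facts VALUES (?, {','.join('?' * 10)})", [text] + vals)

def recall(query_weights, k=3):
    """Rank facts by a weighted dot product over their score vectors --
    vector math against the table, not a text search."""
    rows = db.execute(f"SELECT text, {', '.join(SCORES)} FROM facts").fetchall()
    ranked = sorted(
        ((sum(w * v for w, v in zip(query_weights, row[1:])), row[0]) for row in rows),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]
```

A query is just another ten-element vector: boosting `urgency` surfaces time-critical facts, boosting `weight` surfaces the durable ones.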

Four pipelines, one gate

DEEP — Full pipeline. 3 cloud calls: classify and plan (fast model), generate the response (reasoning model), enforce voice and format (fast model). External data pulled before the model thinks. For complex, multi-domain questions.

DEEP_LITE — Same pipeline minus the expensive reasoning call: 2 cloud calls instead of 3. The first call both classifies and drafts the response in one pass.

QUICK — Fast cloud model. One or two calls. Quick answers, casual questions, low-stakes lookups.

SNAP — Zero API calls. Zero LLM calls. Pattern-matched responses for greetings, confirmations, one-word replies. Instant. Free.
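The four-way routing can be sketched as a single function over the gate's flags. The threshold values and flag names are placeholders, not Altadore's tuned numbers:

```python
# Hypothetical router: map the gate's flags to one of the four pipelines.
def route(flags: dict) -> str:
    if flags.get("is_greeting") or flags.get("is_confirmation"):
        return "SNAP"       # zero cloud calls: pattern-matched reply
    complexity = flags.get("complexity", 0.0)  # scored by the gate
    if complexity < 0.3:
        return "QUICK"      # fast cloud model, one or two calls
    if complexity < 0.7:
        return "DEEP_LITE"  # classify + draft in one pass, no reasoning call
    return "DEEP"           # classify/plan, reason, shape: 3 calls
```

The point of the design is visible in the shape of the code: cost only enters at the bottom branches, and the cheapest exit is checked first.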

Most engine stages run at zero cost. The expensive part is the thinking — and the system only thinks when it has to.

What's inside

The engine is modular. Each piece does one thing. Green border means zero API cost — pure logic, math, and local ops. Amber means cloud model calls.