Qorx is built around a plain measurement idea: keep large local context on the machine, send a small visible carrier to the model, and resolve exact evidence locally when the workflow needs it.
This page gives the math in public. There is no secret science here. Some parts are shipped in Community Edition today. Some parts, such as account entitlements and managed provider routing, belong to Qorx Ayie or Qorx Cloud.
Qorx uses a deterministic local token estimate unless a command says otherwise:
tokens_est(text) = ceil(utf8_character_count(text) / 4)
This is simple and repeatable. It is not a provider tokenizer and it is not an invoice number. It is good enough for local planning, benchmark comparison, and budgeted evidence selection.
Baseline-to-Compact accounting compares the local context mass with the model-visible carrier:
baseline_tokens = sum(tokens_est(indexed_local_quarks))
visible_tokens = tokens_est(carrier_or_pack_text)
b2c = baseline_tokens / max(1, visible_tokens)
omitted_tokens = max(0, baseline_tokens - visible_tokens)
In the current live proof, Qorx indexes the repo locally and sends a much smaller carrier or evidence pack. That proves context omission and resolver availability. It does not prove that every downstream answer is correct.
Qorx indexes files into bounded quarks. A quark carries path, line range, hash, token estimate, symbols, and sparse lexical features.
At query time, the packer chooses quarks under a token budget:
score = lexical_match
+ path_match
+ symbol_match
+ structure_match
+ cache_reuse_value
- token_cost
- duplication_penalty
- omission_risk
The exact weights live in the Rust source. The important part is the shape: Qorx picks useful local evidence before spending model-visible tokens.
qorx squeeze is extractive. It keeps matching lines from indexed quarks and
drops unrelated lines:
squeeze_reduction = source_quark_tokens / max(1, squeezed_excerpt_tokens)
It does not invent missing context. If the local index does not support an answer, strict evidence commands return an unsupported result instead of filling the gap with guesswork.
The live benchmark reports three useful ratios:
session_reduction = indexed_tokens / max(1, session_carrier_tokens)
pack_reduction = indexed_tokens / max(1, evidence_pack_tokens)
squeeze_reduction = indexed_tokens / max(1, squeeze_extract_tokens)
These are local Qorx ratios. They are strong proof that the visible context is smaller than the local evidence store. They are not a universal quality score.
Qorx can estimate avoided input cost from omitted local tokens:
estimated_input_usd = omitted_tokens * input_price_per_1m / 1_000_000
estimated_cached_usd = cached_tokens * cached_price_per_1m / 1_000_000
Provider invoices are the source of truth for billed savings. Qorx local numbers are planning and proof numbers.
Qorx Ayie Starter uses a service-side request allowance:
included_requests = 5000
used_requests = count(successful_metered_edge_cloud_requests)
remaining_requests = max(0, included_requests - used_requests)
usage_ratio = used_requests / included_requests
Only successful metered Ayie/Cloud service actions should count. Local CE commands stay unmetered.
The service should record an idempotent request id so retries do not burn extra allowance:
usage_key = hash(account_id, install_id, request_id, feature, day)
The client can display remaining requests, but the server owns the ledger. That is how Qorx can stay open source locally while protecting the official hosted service from unlimited free usage.
Qorx does not magically make models truthful. It reduces one common failure: asking a model to answer from context it never received.
The strict path is:
index local evidence
select bounded evidence
cite exact paths and line ranges
answer only from selected evidence
return unsupported when evidence is missing
That gives the downstream agent a smaller, cited working set. It is still the agent’s job to use the evidence correctly, and it is still the maintainer’s job to test the final code.
The local source map is REFERENCE_PAPERS.md. The current paper set and source list includes:
| Area | References |
|---|---|
| Prompt compression | LLMLingua, LongLLMLingua, Active Context Compression, Gist Tokens, Experience Compression Spectrum |
| Repository and code context | ReACC, Codebase-Memory, BM25 and lexical retrieval, Aider repository map |
| Agent memory | AgeMem, AtomMem, Titans, TokMem, Memory Survey, Structural Memory |
| Hierarchical memory | H-MEM, HiMem, TierMem, GAM |
| Cache and reuse | Preble, Don’t Break the Cache, Similarity Caching, GPT Semantic Cache, RAGCache, Cache-Craft, Approximate Caching for RAG, Domain-Specific Semantic Cache, vCache, ContextPilot, QVCache |
| Runtime and KV scope | TurboQuant, vCache, QVCache |
| Provider and tooling docs | OpenAI prompt caching, Anthropic prompt caching, Gemini context caching, Claude Code memory, Gemini CLI GEMINI.md, Cursor codebase indexing |
| Provenance and storage | Protocol Buffers, NIST FIPS 204, C2PA Specification, Microsoft kernel-mode signing requirements |
These references support the architecture class. Qorx-specific claims still need Qorx-specific benchmarks, tests, routed usage logs, and task success evaluation.