Substrate Reference Architecture
A technical map of the MergeOn substrate: semantic normalization, dependency intelligence, evidence lineage, governed execution, and operational replay. This is the reference for architects and engineering buyers.
How the layers compose
MIL is the context boundary. THEMIS is the deterministic reasoning compute. Together they sit between any frontier model and the systems of record an enterprise relies on.
Six-stage governed context pipeline
Every request passes through this pipeline before context reaches a model. Stages are composable but cannot be reordered.
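The "composable but not reorderable" property can be sketched as a pipeline that accepts stages only in a declared canonical order. This is a minimal illustration, not MergeOn's implementation: the class `OrderedPipeline`, the helper `tag`, and the placeholder stage names `first`, `second`, and `third` are all assumptions, since the six actual stages are not named here. The sketch also assumes a stage may be omitted; if every stage is mandatory, `run` would need an extra completeness check.

```python
from typing import Callable

Stage = Callable[[dict], dict]


class OrderedPipeline:
    """Hypothetical pipeline that rejects stages added out of canonical order."""

    def __init__(self, canonical_order: list):
        # Map each stage name to its fixed position in the canonical order.
        self._rank = {name: i for i, name in enumerate(canonical_order)}
        self._stages = []  # list of (name, stage) pairs, in execution order

    def add(self, name: str, stage: Stage) -> "OrderedPipeline":
        if name not in self._rank:
            raise ValueError(f"unknown stage: {name!r}")
        if self._stages and self._rank[name] <= self._rank[self._stages[-1][0]]:
            raise ValueError(f"stage {name!r} is out of order")
        self._stages.append((name, stage))
        return self

    def run(self, request: dict) -> dict:
        # Each stage receives the output of the previous one.
        for _, stage in self._stages:
            request = stage(request)
        return request


def tag(name: str) -> Stage:
    """Build a placeholder stage that records its name in the request trace."""
    return lambda request: {**request, "trace": request.get("trace", []) + [name]}


pipeline = (
    OrderedPipeline(["first", "second", "third"])
    .add("first", tag("first"))
    .add("third", tag("third"))
)
result = pipeline.run({"query": "example"})
```

Rejecting a misordered stage at composition time, rather than at run time, means an invalid pipeline never receives a request.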
The substrate types the world before reasoning over it
Canonical identity resolution
Deduplicates parties, assets, and obligations across schemas and acquisitions.
Coordinate-anchored extraction
Every value carries source page, region, and exact pixel coordinates.
Version-aware semantics
Amendments resolve against the correct effective version at the queried point in time.
Rule-pack-driven typing
Domain knowledge ships as configuration, not as engine code.
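Version-aware resolution, as described above, amounts to a point-in-time lookup over a term's effective versions. The sketch below is illustrative only: `Version`, `resolve_as_of`, and the sample payment terms are hypothetical names and data, not part of any MergeOn interface.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    effective: date  # date on which this version takes effect
    value: str       # the term's value under this version


def resolve_as_of(versions: list, as_of: date) -> Optional[Version]:
    """Return the version in force on `as_of`, or None if none was effective yet."""
    ordered = sorted(versions, key=lambda v: v.effective)
    # Index of the first version whose effective date is strictly after `as_of`.
    idx = bisect_right([v.effective for v in ordered], as_of)
    return ordered[idx - 1] if idx else None


# Hypothetical clause history: an original term and one amendment.
payment_terms = [
    Version(date(2022, 1, 1), "net 30"),
    Version(date(2023, 6, 1), "net 45"),
]

before_amendment = resolve_as_of(payment_terms, date(2023, 5, 31))
on_amendment_day = resolve_as_of(payment_terms, date(2023, 6, 1))
```

A query dated before the amendment resolves to the original term, and a query on or after the amendment's effective date resolves to the amended one.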
Relationships are a graph, not a vibe
Typed dependency graph
Edges carry causation, derivation, and reference semantics — not just adjacency.
Cascade propagation
A change at any vertex is propagated to dependents under explicit policy.
Constraint satisfaction
The substrate refuses to ingest a state that violates declared invariants.
Contradiction surfacing
Surfaces conflicts between documents before they reach downstream systems.
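A typed dependency graph with policy-driven cascade propagation might look like the following sketch. The edge types mirror the causation, derivation, and reference semantics named above; everything else (`DependencyGraph`, `cascade`, the vertex names) is an assumption made for illustration, and the "policy" is reduced to the set of edge types a change is allowed to follow.

```python
from collections import defaultdict, deque
from enum import Enum


class EdgeType(Enum):
    CAUSATION = "causation"
    DERIVATION = "derivation"
    REFERENCE = "reference"


class DependencyGraph:
    """Hypothetical typed graph: each edge records why a dependent depends on its source."""

    def __init__(self):
        self._out = defaultdict(list)  # source -> [(dependent, edge_type)]

    def add_edge(self, source: str, dependent: str, edge_type: EdgeType) -> None:
        self._out[source].append((dependent, edge_type))

    def cascade(self, changed: str, follow: set) -> list:
        """Breadth-first walk from `changed`, following only edge types in `follow`."""
        seen, order, queue = {changed}, [], deque([changed])
        while queue:
            node = queue.popleft()
            for dependent, edge_type in self._out.get(node, ()):
                if edge_type in follow and dependent not in seen:
                    seen.add(dependent)
                    order.append(dependent)
                    queue.append(dependent)
        return order


graph = DependencyGraph()
graph.add_edge("base_rate", "interest_clause", EdgeType.DERIVATION)
graph.add_edge("interest_clause", "payment_schedule", EdgeType.CAUSATION)
graph.add_edge("base_rate", "glossary_entry", EdgeType.REFERENCE)

# Policy in this sketch: propagate along derivation and causation, not plain references.
affected = graph.cascade("base_rate", {EdgeType.DERIVATION, EdgeType.CAUSATION})
```

Because edges are typed, the same change can cascade differently under different policies: here the glossary entry merely references the base rate, so it is left untouched.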
Nineteen orthogonal engines, one substrate
THEMIS exposes nineteen reasoning engines that compose into the six capability clusters summarised on the THEMIS page. The full enumeration follows. Every engine is rule-pack-driven and vertical-agnostic.
- Document Intake: Multi-format ingestion with layout-aware parsing.
- Structure Parser: Hierarchical section detection and cross-reference linking.
- AST Generator: Abstract syntax tree for the contractual or operational language.
- Semantic Analyzer: Concept extraction and defined-term resolution.
- Entity Extractor: Party identification and canonical role assignment.
- Obligation Engine: Duty extraction, rights identification, condition mapping.
- Temporal Analyzer: Effective dates, supersession chains, deadline derivation.
- Dependency Graph: Cross-clause and cross-document references as a typed graph.
- Constraint Solver: Logical consistency verification and contradiction detection.
- State Machine: Lifecycle tracking, milestone resolution, transition validation.
- Risk Scorer: Clause-level and corpus-level risk quantification.
- Comparator: Version differencing against canonical baselines.
- Anomaly Detector: Unusual clauses and outlier obligations.
- Transaction Simulator: Outcome and closing-probability simulation.
- Self-Healing Engine: Deterministic fix proposals for surfaced contradictions.
- Amendment Generator: Constrained drafting that respects every dependency.
- Compliance Checker: Regulatory and policy alignment.
- Knowledge Integrator: External-data enrichment against the canonical record.
- Output Formatter: Report generation, structured exports, downstream API responses.
The substrate constrains action, not just reading
Policy enforcement at retrieval
Access, scope, and need-to-know boundaries apply before any context leaves the substrate.
Mediated tool invocation
Model-initiated tool calls are evaluated against policy before any side effect occurs.
Explainable redaction
Every redaction decision is traceable to the rule that produced it.
Replay-safe operations
Every decision can be re-derived from logged context, policy, and inputs.
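Mediated tool invocation and replay-safe operations can be sketched together as a gate that evaluates policy before calling a tool and logs enough to re-derive each decision later. All names here (`ToolGate`, `Decision`, `refund_policy`, the `issue_refund` tool, and the `refund-limit` rule) are hypothetical, and the sketch assumes the policy is a pure function of the tool name and arguments.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Decision:
    allowed: bool
    rule: str  # identifier of the rule that produced this decision


@dataclass
class ToolGate:
    policy: Callable[[str, dict], Decision]
    tools: dict
    log: list = field(default_factory=list)

    def invoke(self, tool: str, args: dict) -> Any:
        # Policy is evaluated, and the decision logged, before any side effect.
        decision = self.policy(tool, args)
        self.log.append(
            {"tool": tool, "args": dict(args), "allowed": decision.allowed, "rule": decision.rule}
        )
        if not decision.allowed:
            raise PermissionError(f"{tool} denied by rule {decision.rule}")
        return self.tools[tool](**args)

    def replay(self) -> bool:
        """Re-derive every logged decision from the logged inputs and compare."""
        return all(
            self.policy(e["tool"], e["args"]) == Decision(e["allowed"], e["rule"])
            for e in self.log
        )


def refund_policy(tool: str, args: dict) -> Decision:
    # Hypothetical rule: refunds above 1000 are not allowed.
    if tool == "issue_refund" and args["amount"] > 1000:
        return Decision(False, "refund-limit")
    return Decision(True, "default-allow")


issued = []  # stands in for the external side effect


def issue_refund(amount: int) -> int:
    issued.append(amount)
    return amount


gate = ToolGate(policy=refund_policy, tools={"issue_refund": issue_refund})
small = gate.invoke("issue_refund", {"amount": 50})
```

Denied calls are logged with the rule that blocked them, so both the refusal and its explanation survive for later replay.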
Models are commodity. The substrate is not.
Reasoning engines are interchangeable
MIL exposes a uniform context contract to any frontier model.
No fine-tuning lock-in
Knowledge never enters model weights, so swapping providers is a configuration change.
Multi-engine coordination
Retrieval, reasoning, and policy engines compose without binding to a single vendor.
Deployment-shape neutral
Cloud, customer-cloud, hybrid, and air-gapped deployments share the same substrate interfaces.
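One way to picture a uniform context contract is a single interface that every model adapter satisfies, with the provider chosen by configuration. The sketch below is an assumption-laden illustration: `ContextPacket`, `ReasoningModel`, `ProviderA`, `ProviderB`, and the `provider` config key are invented for the example and do not describe MIL's actual contract.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ContextPacket:
    """Hypothetical governed context handed to whichever model is configured."""
    question: str
    evidence: tuple


class ReasoningModel(Protocol):
    def complete(self, packet: ContextPacket) -> str: ...


class ProviderA:
    def complete(self, packet: ContextPacket) -> str:
        return f"A: {packet.question} ({len(packet.evidence)} sources)"


class ProviderB:
    def complete(self, packet: ContextPacket) -> str:
        return f"B: {packet.question} ({len(packet.evidence)} sources)"


# Adapters are registered by name; callers never import a provider directly.
REGISTRY = {"provider_a": ProviderA, "provider_b": ProviderB}


def build_model(config: dict) -> ReasoningModel:
    return REGISTRY[config["provider"]]()


packet = ContextPacket("Which clause governs renewal?", ("doc-1", "doc-2"))
answer_a = build_model({"provider": "provider_a"}).complete(packet)
answer_b = build_model({"provider": "provider_b"}).complete(packet)
```

The calling code and the context packet are identical in both cases; only the configuration value differs.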
The substrate is operational, not exploratory
Customer-held storage
Source corpora remain in customer infrastructure under their key material.
Regional residency
Policy and storage zoning are first-class concepts, not afterthoughts.
Versioned canonical record
The operational truth is exportable, diff-able, and rollback-safe.
Auditor-grade lineage
Every operation links back to source, policy, and the context the model received.
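Lineage of this kind can be sketched as a record that ties each operation to its source, the policy in force, and a digest of the exact context the model received. The field names, the reference format, and the use of SHA-256 over canonical JSON are all assumptions made for the example, not a description of the product's storage format.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEntry:
    operation: str
    source_ref: str      # hypothetical pointer to the source document and location
    policy_id: str       # hypothetical identifier of the policy version applied
    context_digest: str  # SHA-256 of the context the model received


def digest(context: dict) -> str:
    """Hash a context deterministically: sorted keys, fixed separators."""
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record(operation: str, source_ref: str, policy_id: str, context: dict) -> LineageEntry:
    return LineageEntry(operation, source_ref, policy_id, digest(context))


def verify(entry: LineageEntry, context: dict) -> bool:
    """Check that `context` is exactly what the logged operation saw."""
    return entry.context_digest == digest(context)


context = {"clause": "12.3", "text": "Renewal requires 90 days notice."}
entry = record("summarise_clause", "contract.pdf#page=14", "policy-v7", context)
```

Canonical serialization makes the digest independent of key order, so an auditor can recompute it from an exported context and compare it with the logged entry.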
Bring engineering into the substrate
If you are architecting how AI will operate inside your enterprise, this is the conversation to have early — not after a procurement cycle.