MergeOn
Engineering reference

Substrate Reference Architecture

A technical map of the MergeOn substrate: semantic normalization, dependency intelligence, evidence lineage, governed execution, and operational replay. This is the reference for architects and engineering buyers.

Substrate topology

How the layers compose

MIL is the context boundary. THEMIS is the deterministic reasoning compute. Together they sit between any frontier model and the systems of record an enterprise relies on.

Operator: or workflow trigger
LLM: any frontier model
MIL: Governed Context Firewall
THEMIS: Deterministic Reasoning
Systems of Record: customer infrastructure

The model never touches systems of record directly. Context is shaped by MIL before reaching the model. Reasoning is delegated to THEMIS, which executes against the canonical record.
MIL — context pipeline

Six-stage governed context pipeline

Every request passes through this pipeline before context reaches a model. Stages are composable but cannot be reordered.

Intent Detection
Resolve what the requester actually needs.
Entity Linking
Bind references to canonical operational identity.
Context Selection
Retrieve only evidence required by the request.
Retrieval Policy
Enforce access, scope, and need-to-know boundaries.
Redaction
Remove sensitive fields before any external model receives context.
Context Shaping
Structure the bounded context the model is permitted to see.
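
As an illustration only, the fixed ordering of the six stages can be sketched as a runner that accepts stage implementations but owns the order itself. Every name here (`STAGE_ORDER`, `run_pipeline`, `demo_stages`, the field names) is a hypothetical stand-in, not a MergeOn interface, and the stage bodies are toy lambdas.

```python
from typing import Callable

# Hypothetical sketch: none of these names are MergeOn APIs.
Context = dict
Stage = Callable[[Context], Context]

# The order is fixed here; callers supply implementations, not ordering.
STAGE_ORDER = (
    "intent_detection",
    "entity_linking",
    "context_selection",
    "retrieval_policy",
    "redaction",
    "context_shaping",
)


def run_pipeline(stages: dict[str, Stage], request: Context) -> tuple[Context, list[str]]:
    """Run every stage in STAGE_ORDER and return (shaped context, trace)."""
    if set(stages) != set(STAGE_ORDER):
        raise ValueError("exactly the six named stages must be supplied")
    ctx, trace = dict(request), []
    for name in STAGE_ORDER:
        ctx = stages[name](ctx)
        trace.append(name)
    return ctx, trace


def demo_stages() -> dict[str, Stage]:
    """Toy stage implementations, deliberately registered out of order."""
    return {
        "context_shaping": lambda c: {k: c[k] for k in ("intent", "entity", "evidence")},
        "redaction": lambda c: {**c, "evidence": {
            k: "[REDACTED]" if k == "tax_id" else v for k, v in c["evidence"].items()}},
        "retrieval_policy": lambda c: {**c, "evidence": {
            k: v for k, v in c["evidence"].items() if k != "internal_note"}},
        "context_selection": lambda c: {**c, "evidence": {
            "name": "Acme", "tax_id": "00-0000000", "internal_note": "draft"}},
        "entity_linking": lambda c: {**c, "entity": "party:acme"},
        "intent_detection": lambda c: {**c, "intent": "lookup"},
    }
```

Calling `run_pipeline(demo_stages(), {"text": "What does Acme owe?"})` runs the stages in the declared order regardless of registration order. In this toy, the shaped context no longer contains the raw request text, the `internal_note` field, or the unredacted `tax_id`.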
Semantic normalization

The substrate types the world before reasoning over it

Canonical identity resolution

Deduplicates parties, assets, and obligations across schemas and acquisitions.

Coordinate-anchored extraction

Every value carries source page, region, and exact pixel coordinates.

Version-aware semantics

Amendments resolve against the correct effective version at the queried point in time.
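
One way to picture point-in-time resolution is a sorted list of effective dates searched with a binary search. This is a minimal sketch under assumptions: the `Version` type and `value_as_of` function are hypothetical, and a real system would also need to handle retroactive amendments and supersession chains.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    """Hypothetical record: a term value and the date it takes effect."""
    effective: date
    value: str


def value_as_of(versions: list[Version], when: date) -> Optional[str]:
    """Return the value of the latest version effective on or before `when`."""
    ordered = sorted(versions, key=lambda v: v.effective)
    # bisect_right counts how many versions have effective <= when.
    idx = bisect_right([v.effective for v in ordered], when)
    return ordered[idx - 1].value if idx else None


# Illustrative data only.
PAYMENT_TERMS = [
    Version(date(2024, 6, 1), "net 45"),   # amendment
    Version(date(2023, 1, 1), "net 30"),   # original agreement
]
```

With this data, a query dated before the amendment resolves to the original term, a query on the amendment's effective date resolves to the amended term, and a query before any version exists returns `None`.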

Rule-pack-driven typing

Domain knowledge ships as configuration, not as engine code.

Dependency intelligence

Relationships are a graph, not a vibe

Typed dependency graph

Edges carry causation, derivation, and reference semantics — not just adjacency.

Cascade propagation

A change at any vertex is propagated to dependents under explicit policy.
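
A typed graph with policy-controlled cascade can be sketched in a few lines. The edge kinds below come from the description above (causation, derivation, reference). The class, method names, and vertex labels are hypothetical, and the "policy" is reduced to the set of edge kinds a change is allowed to travel along.

```python
from collections import defaultdict, deque

EDGE_KINDS = {"causation", "derivation", "reference"}


class DependencyGraph:
    """Hypothetical typed graph: edges are (kind, dependent) pairs per source."""

    def __init__(self) -> None:
        self._out = defaultdict(list)

    def add_edge(self, source: str, kind: str, dependent: str) -> None:
        if kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {kind}")
        self._out[source].append((kind, dependent))

    def cascade(self, changed: str, propagate_kinds: set) -> list:
        """Breadth-first list of dependents reached via the permitted edge kinds."""
        seen, queue, order = {changed}, deque([changed]), []
        while queue:
            node = queue.popleft()
            for kind, dependent in self._out.get(node, ()):
                if kind in propagate_kinds and dependent not in seen:
                    seen.add(dependent)
                    order.append(dependent)
                    queue.append(dependent)
        return order


def demo_graph() -> DependencyGraph:
    g = DependencyGraph()
    g.add_edge("clause_4", "derivation", "payment_schedule")
    g.add_edge("clause_4", "reference", "appendix_b")
    g.add_edge("payment_schedule", "causation", "late_fee")
    return g
```

In this toy graph, a change to `clause_4` under a policy that permits only derivation and causation edges reaches `payment_schedule` and then `late_fee`, while `appendix_b`, which merely references the clause, is left alone.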

Constraint satisfaction

The substrate refuses to ingest a state that violates declared invariants.
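
Refusing a state that violates declared invariants can be illustrated as a gate in front of the write. This is a sketch under assumptions: `ingest`, `InvariantViolation`, and the two example invariants are hypothetical, and the source does not say how invariants are actually declared.

```python
from typing import Callable


class InvariantViolation(Exception):
    """Raised when a candidate state fails one or more declared invariants."""


# Hypothetical invariants: name -> predicate over the candidate state.
INVARIANTS: dict[str, Callable[[dict], bool]] = {
    "end_after_start": lambda s: s["end"] > s["start"],
    "positive_amount": lambda s: s["amount"] > 0,
}


def ingest(store: list, state: dict, invariants: dict = INVARIANTS) -> None:
    """Append `state` to `store` only if every invariant holds."""
    failed = [name for name, check in invariants.items() if not check(state)]
    if failed:
        # Nothing is written when any invariant fails.
        raise InvariantViolation(", ".join(failed))
    store.append(state)
```

The check runs before the append, so a rejected state leaves the store exactly as it was, and the exception names every invariant that failed.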

Contradiction surfacing

Surfaces conflicts between documents before they reach downstream systems.

THEMIS — reasoning layers

Nineteen orthogonal engines, one substrate

THEMIS exposes nineteen reasoning engines that compose into the six capability clusters summarised on the THEMIS page. The full enumeration follows. Every engine is rule-pack-driven and vertical-agnostic.

  1. Document Intake
    Multi-format ingestion with layout-aware parsing.
  2. Structure Parser
    Hierarchical section detection and cross-reference linking.
  3. AST Generator
    Abstract syntax tree for the contractual or operational language.
  4. Semantic Analyzer
    Concept extraction and defined-term resolution.
  5. Entity Extractor
    Party identification and canonical role assignment.
  6. Obligation Engine
    Duty extraction, rights identification, condition mapping.
  7. Temporal Analyzer
    Effective dates, supersession chains, deadline derivation.
  8. Dependency Graph
    Cross-clause and cross-document references as a typed graph.
  9. Constraint Solver
    Logical consistency verification and contradiction detection.
  10. State Machine
    Lifecycle tracking, milestone resolution, transition validation.
  11. Risk Scorer
    Clause-level and corpus-level risk quantification.
  12. Comparator
    Version differencing against canonical baselines.
  13. Anomaly Detector
    Unusual clauses and outlier obligations.
  14. Transaction Simulator
    Outcome and closing-probability simulation.
  15. Self-Healing Engine
    Deterministic fix proposals for surfaced contradictions.
  16. Amendment Generator
    Constrained drafting that respects every dependency.
  17. Compliance Checker
    Regulatory and policy alignment.
  18. Knowledge Integrator
    External-data enrichment against the canonical record.
  19. Output Formatter
    Report generation, structured exports, downstream API responses.
Governed execution

The substrate constrains action, not just reading

Policy enforcement at retrieval

Access, scope, and need-to-know boundaries apply before any context leaves the substrate.

Mediated tool invocation

Model-initiated tool calls are evaluated against policy before any side effect occurs.
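
The ordering that matters here is check first, act second. A minimal sketch, assuming a policy is a function that returns an allow/deny decision with a reason: `mediate`, `PolicyDenied`, and the demo tools are hypothetical names, not part of any MergeOn interface.

```python
from typing import Any, Callable

# Hypothetical shapes: a policy returns (allowed, reason) for a proposed call.
Policy = Callable[[str, dict], tuple]


class PolicyDenied(Exception):
    """Raised when a model-proposed tool call is rejected before execution."""


def mediate(tool_name: str, args: dict, tools: dict, policy: Policy) -> Any:
    """Evaluate the call against policy, then invoke the tool only if allowed."""
    allowed, reason = policy(tool_name, args)
    if not allowed:
        raise PolicyDenied(reason)
    return tools[tool_name](**args)


# Demo wiring: a list stands in for an external system with side effects.
AUDIT: list = []
TOOLS = {
    "read_record": lambda record_id: f"record {record_id}",
    "delete_record": lambda record_id: AUDIT.append(("deleted", record_id)),
}


def read_only_policy(tool_name: str, args: dict) -> tuple:
    if tool_name.startswith("read_"):
        return True, "read access permitted"
    return False, f"{tool_name} is not permitted for model-initiated calls"
```

Under `read_only_policy`, a proposed `delete_record` call raises `PolicyDenied` and the tool function is never entered, so `AUDIT` stays empty.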

Explainable redaction

Every redaction decision is traceable to the rule that produced it.
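
Traceability of this kind usually means the redaction step emits a decision record alongside the redacted payload. The sketch below assumes rules match on field names via regular expressions. `RedactionRule`, `redact`, and the rule identifiers are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RedactionRule:
    """Hypothetical rule: an identifier plus a regex over field names."""
    rule_id: str
    field_pattern: str


def redact(record: dict, rules: list) -> tuple:
    """Return (redacted record, decisions); each decision names the rule applied."""
    redacted, decisions = {}, []
    for key, value in record.items():
        # First matching rule wins; unmatched fields pass through unchanged.
        rule = next((r for r in rules if re.fullmatch(r.field_pattern, key)), None)
        if rule is None:
            redacted[key] = value
        else:
            redacted[key] = "[REDACTED]"
            decisions.append({"field": key, "rule_id": rule.rule_id})
    return redacted, decisions


# Illustrative rules only.
RULES = [
    RedactionRule("pii-001", r"ssn|tax_id"),
    RedactionRule("fin-002", r"account_.*"),
]
```

Each entry in `decisions` pairs a redacted field with the `rule_id` that removed it, which is the link an auditor would follow back to the rule.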

Replay-safe operations

Every decision can be re-derived from logged context, policy, and inputs.

Replay
Every decision is reconstructable. Logged context plus logged policy plus logged inputs yields the same output, every time, against the version of the canonical record that was active at the moment of the decision.
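
The replay property can be illustrated with a deterministic decision function and a log entry that stores the inputs plus a hash of the output. Everything below is a hypothetical sketch: `decide` is a toy rule, and the log layout and `record_version` field are assumptions, not MergeOn's actual log format.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Stable SHA-256 over a canonical JSON encoding (sorted keys)."""
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


def decide(context: dict, policy: dict, inputs: dict) -> dict:
    """Toy deterministic decision: approve amounts within the policy limit."""
    return {
        "approved": inputs["amount"] <= policy["approval_limit"],
        "record_version": context["record_version"],
    }


def log_decision(context: dict, policy: dict, inputs: dict) -> dict:
    """Store everything needed to re-derive the decision, plus its output hash."""
    output = decide(context, policy, inputs)
    return {"context": context, "policy": policy, "inputs": inputs,
            "output_hash": fingerprint(output)}


def replay(entry: dict) -> bool:
    """Re-derive the decision from the log entry and compare against the hash."""
    output = decide(entry["context"], entry["policy"], entry["inputs"])
    return fingerprint(output) == entry["output_hash"]
```

Because `decide` depends only on its three logged arguments, `replay` succeeds for an untouched entry and fails if the logged policy is altered in a way that changes the outcome.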
Provider abstraction

Models are commodity. The substrate is not.

Reasoning engines are interchangeable

MIL exposes a uniform context contract to any frontier model.

No fine-tuning lock-in

Knowledge does not enter weights. Provider swap is a configuration change.
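
A uniform context contract with a configuration-driven provider swap might look like the following. This is a sketch under assumptions: the `ReasoningProvider` protocol, the `complete` method, the registry keys, and the stub providers are all hypothetical, and no vendor SDK is called.

```python
from typing import Protocol


class ReasoningProvider(Protocol):
    """Hypothetical contract: every provider accepts the same shaped context."""

    def complete(self, shaped_context: dict) -> str: ...


class StubProviderA:
    def complete(self, shaped_context: dict) -> str:
        return f"A:{shaped_context['intent']}"


class StubProviderB:
    def complete(self, shaped_context: dict) -> str:
        return f"B:{shaped_context['intent']}"


# Swapping providers changes only the configured key, not the calling code.
REGISTRY = {"provider_a": StubProviderA, "provider_b": StubProviderB}


def provider_from_config(config: dict) -> ReasoningProvider:
    try:
        return REGISTRY[config["provider"]]()
    except KeyError as exc:
        raise ValueError(f"unknown or missing provider: {exc}") from exc
```

The caller builds the shaped context once and passes it to whichever provider the configuration names. Changing `{"provider": "provider_a"}` to `{"provider": "provider_b"}` is the only edit needed to switch.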

Multi-engine coordination

Retrieval, reasoning, and policy engines compose without binding to a single vendor.

Deployment-shape neutral

Cloud, customer-cloud, hybrid, and air-gapped deployments share the same substrate interfaces.

Reasoning providers
OpenAI, Anthropic, Google, Azure OpenAI, AWS Bedrock, Cohere, Self-Hosted
Deployment modes
Cloud-hosted, Customer cloud, Hybrid, Air-gapped
Data source families
Document stores, Relational databases, Knowledge graphs, File systems, ERPs, CRMs, Data lakes, REST + GraphQL APIs
Survivability

The substrate is operational, not exploratory

Customer-held storage

Source corpora remain in customer infrastructure under their key material.

Regional residency

Policy and storage zoning are first-class concepts, not afterthoughts.

Versioned canonical record

The operational truth is exportable, diff-able, and rollback-safe.

Auditor-grade lineage

Every operation links back to source, policy, and the context the model received.

Bring engineering into the substrate

If you are architecting how AI will operate inside your enterprise, this is the conversation to have early — not after a procurement cycle.
