Welcome to Lexis
The first programming language built entirely for AI. Code is DAGs, not text. Content-addressed via BLAKE3. Security-first with capability manifests enforced at parse time and runtime.
What is Lexis?
Lexis is an AI-native programming language where programs are expressed as directed acyclic graphs (DAGs) in JSON format. Unlike traditional text-based languages that require parsing ambiguous syntax, Lexis programs are structured data — JSON objects with explicit nodes, edges, and operation codes.
Every node in a Lexis program is content-addressed using BLAKE3 cryptographic hashing. This means that two independently written programs that perform the same computation produce the same hash — enabling automatic deduplication, tamper detection, and caching without any coordination.
Lexis enforces security at two layers: a static verifier checks capability declarations before execution, and a runtime sandbox enforces per-node permissions during execution. Programs must explicitly declare what they intend to do (print output, read files, access the network), and any undeclared operation is blocked.
Who is Lexis For?
- AI Models — LLMs generate structured JSON programs that pass through a validation pipeline with classified error diagnostics and self-correction suggestions.
- Multi-Agent Systems — Multiple AI agents can independently generate code fragments, compose them via content-hash deduplication, and execute with per-agent trust enforcement.
- Regulated Computation — Content-addressing provides provenance and audit trails. Capability manifests ensure programs only access declared resources.
- Tool Orchestration — MCP server integration lets any MCP-compatible AI agent generate, validate, and execute Lexis programs through standard tool calls.
Quick Example
Here is a minimal Lexis program that computes (5 + 3) = 8 and prints the result:
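A sketch of the shape such a program takes, built as a Python dict for illustration. The node fields (id, op, inputs, value) follow the Core Data Types section below, but the exact top-level schema shown here is an assumption, not the authoritative format:

```python
import json

# Hypothetical Lexis program: (5 + 3) printed to stdout.
# Field names are assumed from the Core Data Types section; this is a
# sketch of the JSON shape, not the official schema.
program = {
    "capabilities": ["IO_STDOUT"],  # PRINT needs stdout permission
    "nodes": [
        {"id": "a",   "op": "const", "value": 5},
        {"id": "b",   "op": "const", "value": 3},
        {"id": "sum", "op": "add",   "inputs": ["a", "b"]},
        {"id": "out", "op": "print", "inputs": ["sum"]},
    ],
}
print(json.dumps(program, indent=2))
```

There is nothing to parse in the traditional sense: the program is already structured data, so validation is purely structural (references resolve, arities match, no cycles).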
Every program flows through the pipeline: Parse → Validate → Verify Security → Schedule → Execute → Content-Address.
Design Philosophy
The core ideas that make Lexis fundamentally different from every other programming language.
Code is Data (DAGs, not text)
Programs are JSON-encoded directed acyclic graphs. There is no parser ambiguity, no syntax errors in the traditional sense — only structural validation. AI models work with data structures, not text manipulation.
Content-Addressed Identity
Every node gets a BLAKE3 hash based on its operation, value, and inputs. Same computation = same hash, regardless of who wrote it or when. This enables automatic deduplication, caching, and tamper detection.
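The per-node scheme can be sketched as follows. The stdlib's blake2b stands in for BLAKE3 (which Python's standard library lacks), and the canonical encoding is an assumption; the property being demonstrated is the one described above:

```python
import hashlib
import json

def node_hash(op, value, input_hashes):
    # Canonical encoding of (op, value, input hashes). blake2b stands in
    # for BLAKE3 here; the encoding details are illustrative.
    payload = json.dumps([op, value, input_hashes], sort_keys=True)
    return hashlib.blake2b(payload.encode(), digest_size=16).hexdigest()

# Two agents independently writing CONST(5) get the same identity...
h5a = node_hash("const", 5, [])
h5b = node_hash("const", 5, [])

# ...and a node's hash covers its whole upstream computation, because
# inputs are referenced by hash, not by node ID.
h_add = node_hash("add", None, [h5a, node_hash("const", 3, [])])
```

Because the hash is derived from semantics (operation, value, input hashes) rather than from node IDs or authorship, dedup and caching need no coordination between writers.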
Errors Are Values
Division by zero doesn't crash — it produces an ErrorValue that flows through the graph. Downstream nodes automatically propagate it. TRY_OR catches errors. IS_ERROR inspects them. Error chains carry causation for debugging.
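A minimal sketch of the idea, assuming a propagation wrapper similar in spirit to the `_propagating` decorator mentioned in the Phase 1a notes below (all names here are illustrative):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ErrorValue:
    message: str
    source_node: str
    cause: Optional["ErrorValue"] = None  # causal chain for debugging

def propagating(fn):
    # If any input is already an ErrorValue, pass it through unchanged
    # instead of computing — a sketch of automatic propagation.
    def wrapper(node_id, *args):
        for a in args:
            if isinstance(a, ErrorValue):
                return a
        return fn(node_id, *args)
    return wrapper

@propagating
def div(node_id, a, b):
    if b == 0:
        return ErrorValue("division by zero", node_id)
    return a / b

err = div("n1", 1, 0)           # an ErrorValue, not an exception
downstream = div("n2", err, 2)  # propagates unchanged through the graph
# TRY_OR-style fallback:
fallback = downstream if not isinstance(downstream, ErrorValue) else -1
```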
Functions Are Subgraphs
A function is a self-contained DAG with numbered input/output ports. No parameter names to agree on. Two AI models can independently generate the same function and get the same content hash — powerful for multi-AI ecosystems.
Security by Default
Programs declare capabilities they need. Two-layer enforcement: static verifier at load time, runtime sandbox at execution. Three-way intersection for agents: trust ceiling ∩ agent declared ∩ program manifest. Closed by default — nothing is allowed unless explicitly declared.
Composition Over Coordination
No Operational Transform needed. Same hash = same thing. Merge is set union of nodes and subgraphs. AI-native: agents produce fragments, the system composes them. No conflict resolution required.
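The merge rule fits in a few lines; the fragment representation (a dict keyed by content hash) is an assumption, but the behavior is the one described above:

```python
# Merging two agents' fragments is a set union keyed by content hash.
# Identical nodes have identical keys, so duplicates collapse and no
# conflict resolution is ever needed.
def merge(frag_a: dict, frag_b: dict) -> dict:
    merged = dict(frag_a)   # {content_hash: node}
    merged.update(frag_b)   # same hash = same thing, so this is safe
    return merged

a = {"h1": {"op": "const", "value": 5}, "h2": {"op": "add"}}
b = {"h1": {"op": "const", "value": 5}, "h3": {"op": "print"}}
assert set(merge(a, b)) == {"h1", "h2", "h3"}
```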
Provenance Separate from Content
Two agents writing CONST(5) get the same content hash. Authorship is tracked in Provenance metadata, not in the hash. This enables deduplication while maintaining audit trails.
Caching is Trivial
Content-addressing makes cache invalidation a non-problem. Same inputs + same function = same result. The hash IS the cache key. No staleness possible. No TTL needed. Three-layer caching: runtime memo, persistent disk, LLM-aware catalog.
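A minimal memo-layer sketch; the exact cache-key encoding is an assumption, but the property is the one stated above — the key derives purely from the function hash and its inputs:

```python
import hashlib
import json

_memo = {}

def cached_call(fn_hash, args, compute):
    # The content hash of (function, inputs) IS the cache key.
    key = hashlib.blake2b(
        json.dumps([fn_hash, args], sort_keys=True).encode()).hexdigest()
    if key not in _memo:          # never stale: same key = same result
        _memo[key] = compute(*args)
    return _memo[key]

assert cached_call("std:double", [21], lambda x: x * 2) == 42
# Second call is served from cache — compute is never invoked again.
assert cached_call("std:double", [21], lambda x: 1 / 0) == 42
```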
Key Differentiators
| Feature | Traditional Languages | Lexis |
|---|---|---|
| Code Format | Text files with syntax rules | JSON DAGs — structured data, no parsing ambiguity |
| Error Handling | try/catch, exceptions, Result types | Errors are values that flow through edges |
| Identity | File paths, module names | BLAKE3 content hash — semantics IS identity |
| Security | OS-level permissions | Capability manifests enforced at parse + runtime |
| Multi-Author | Git merge conflicts | Content-addressed composition — same hash = same thing |
| AI Generation | Generate text, hope it parses | Generate JSON, structurally validated with classified error diagnostics |
| Caching | Manual invalidation, TTL | Content hash IS cache key — automatic, correct, zero-config |
| Conditionals | if/else control flow | SELECT data-flow node — no control edges needed |
Why DAGs?
Every Lexis program is a directed acyclic graph where nodes are operations and edges are data dependencies. This representation has several deep advantages:
- No ordering ambiguity — The scheduler determines execution order from the graph structure via topological sort. The programmer declares what depends on what, not what runs when.
- Automatic parallelism — Independent branches of the DAG can be executed in parallel with zero programmer effort. The `parallel_evaluate()` function uses parallelism levels to maximize throughput.
- No variable mutation — Data flows along edges. There are no mutable variables, no side effects (except I/O opcodes), and no race conditions.
- Visual debugging — DAGs have natural visual representations. The built-in visualization system renders programs as interactive node graphs with execution tracing.
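The scheduling idea can be sketched with the standard-library topological sorter; grouping each `get_ready()` batch yields the parallelism levels:

```python
from graphlib import TopologicalSorter

# Dependency graph for the quick example: sum needs a and b,
# out needs sum. Each value set holds a node's predecessors.
deps = {"a": set(), "b": set(), "sum": {"a", "b"}, "out": {"sum"}}

ts = TopologicalSorter(deps)
ts.prepare()
levels = []
while ts.is_active():
    ready = list(ts.get_ready())   # everything here can run in parallel
    levels.append(sorted(ready))
    ts.done(*ready)

assert levels == [["a", "b"], ["sum"], ["out"]]
```

Cycle detection comes for free: `prepare()` raises on a cyclic graph, which is exactly the "confirm DAG" step of the validation stage.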
System Architecture
How the pieces fit together — from JSON input to executed output.
Core Pipeline
Every Lexis program passes through this security-enforced pipeline:
Package Structure
| Package | Purpose | Key Files |
|---|---|---|
| lexis/graph/ | Schema, serialization, validation | schema.py, serialization.py, validation.py, binary.py |
| lexis/hash/ | BLAKE3 hashing, content store | hasher.py, store.py |
| lexis/security/ | Capabilities, verifier, sandbox | capabilities.py, verifier.py, sandbox.py |
| lexis/interpreter/ | Evaluator, builtins, scheduler | evaluator.py, builtins.py, scheduler.py |
| lexis/networking/ | HTTP client, URL validation, SSRF prevention | security.py, client.py |
| lexis/cache/ | 3-layer caching system | purity.py, memo.py, disk.py, catalog.py |
| lexis/agent/ | Identity, trust, scoping, composition, audit | identity.py, trust.py, scope.py, composer.py, audit.py |
| lexis/protocol/ | Wire protocol messages, codec, sessions | messages.py, codec.py, session.py |
| lexis/transport/ | Agent communication transport layer | base.py, local.py, router.py, discovery.py |
| lexis/stdlib/ | 32 pre-built subgraphs | registry.py, guards.py, transforms.py, reducers.py |
| lexis/mcp/ | MCP server (7 tools, 3 resources, 1 prompt) | server.py, helpers.py |
| lexis/gui/ | Native GUI (tkinter backend) | types.py, backend.py, tk_backend.py, runtime.py |
| lexis/viz/ | DAG visualization & execution tracing | hooks.py, tracer.py, html_generator.py, live_server.py |
| lexis/cli/ | Command-line interface | main.py |
Core Data Types
- GraphNode — A single node in the DAG: id, op (OpCode), inputs (list of node ID refs), value (literal or config)
- LexisProgram — Complete program: spec version, capabilities (frozenset), nodes (tuple), subgraphs (dict), allowed_domains (tuple)
- LexisSubgraph — Reusable function: nodes with INPUT_PORT/OUTPUT_PORT for numbered parameters
- ErrorValue — Error-as-value: message, source_node, optional cause (for error chaining)
- OpCode — Enum of all 93 operations
- Capability — Enum of permissions: PURE_COMPUTE, IO_STDOUT, IO_STDIN, FS_READ, FS_WRITE, NETWORK_OUT, GUI_RENDER, META_EVAL
Opcode Reference
All 93 opcodes organized by tier, from core language to meta-programming.
Tier 1: Core Language (43 opcodes)
Everything needed for calculators, string processors, conditional logic, and basic data manipulation.
| Category | Opcodes |
|---|---|
| Literals | const, print, param |
| Arithmetic | add, sub, mul, div, mod |
| Math | floor, ceil, round, power, sqrt, random |
| Comparison | gt, lt, eq, gte, lte |
| Logic | not, and, or |
| Strings | concat, length, to_str, to_num, format, upper, lower, trim, replace, starts_with, ends_with, contains |
| Collections | sequence, index, range |
| Control Flow | select, try_or, is_error, try_catch, comment, assert |
Tier 2: Functions & Collections (33 opcodes)
Subgraphs (functions), higher-order operations, and advanced data manipulation.
| Category | Opcodes |
|---|---|
| Strings | split, slice |
| Dict | dict, get, keys, values, set, delete_key, items, has_key |
| Sequences | sort, reverse, merge, zip, flatten, unique, take, drop, enumerate |
| Type System | type_of |
| Utility | delay |
| Subgraphs | subgraph_def, subgraph_call, input_port, output_port |
| Higher-Order | map, filter, reduce, iterate, retry, compose, apply, match |
Tier 3: I/O & Networking (6 opcodes)
| Opcode | Inputs | Capability | Description |
|---|---|---|---|
| read_file | 1 (path) | FS_READ | Read file contents as string |
| write_file | 2 (path, content) | FS_WRITE | Write string to file, return filename |
| read_stdin | 0 | IO_STDIN | Read one line from standard input |
| http_get | 1 (URL) | NETWORK_OUT | GET request, return body (auto-JSON parse) |
| http_post | 2 (URL, body) | NETWORK_OUT | POST request, return body |
| http_request | 3 (URL, method, body) | NETWORK_OUT | Full HTTP request, return {status, headers, body} |
Tier 4: GUI (6 opcodes)
| Opcode | Inputs | Pure? | Description |
|---|---|---|---|
| gui_window | variadic | Yes | Create window descriptor with children |
| gui_widget | variadic | Yes | Declare widget (type discriminator in value) |
| gui_canvas | 1 | Yes | Declare canvas with draw commands |
| gui_draw | variadic | Yes | Create draw command (shape in value) |
| gui_state | 0-1 | Yes | Declare reactive state variable |
| gui_render | 1 | No | Render window + run event loop (GUI_RENDER capability) |
Tier 5: Meta-Programming (5 opcodes)
| Opcode | Inputs | Capability | Description |
|---|---|---|---|
| emit_node | 3 | PURE_COMPUTE | Construct a node descriptor dict at runtime |
| build_subgraph | 3 | PURE_COMPUTE | Assemble nodes into a validated, hashed subgraph |
| quote | 1 | PURE_COMPUTE | Serialize an existing subgraph to dict |
| reflect | 0 | PURE_COMPUTE | Introspect program structure |
| eval | 1 | META_EVAL | Parse, validate, and execute a dict as a Lexis program |
All 93 Opcodes
Standard Library
32 pre-built, content-addressed subgraphs referenced via std:name. Resolved at load time — the evaluator never sees stdlib references.
Guards (8 subgraphs)
Type-checking predicates for use with MATCH and FILTER.
| Name | Signature | Description |
|---|---|---|
| std:always_true | ANY → BOOL | Always returns True (catch-all guard) |
| std:always_false | ANY → BOOL | Always returns False |
| std:is_num | ANY → BOOL | True if input is a number |
| std:is_str | ANY → BOOL | True if input is a string |
| std:is_bool | ANY → BOOL | True if input is a boolean |
| std:is_dict | ANY → BOOL | True if input is a dictionary |
| std:is_seq | ANY → BOOL | True if input is a sequence |
| std:is_error | ANY → BOOL | True if input is an ErrorValue |
Transforms (17 subgraphs)
Single-input transformation functions for MAP operations.
| Name | Signature | Description |
|---|---|---|
| std:double | NUM → NUM | Multiply by 2 |
| std:negate | NUM → NUM | Multiply by -1 |
| std:identity | ANY → ANY | Return input unchanged |
| std:to_str | ANY → STR | Convert to string |
| std:to_num | STR → NUM | Convert to number |
| std:get_length | ANY → NUM | Get length |
| std:get_first | SEQ → ANY | First element |
| std:get_last | SEQ → ANY | Last element |
| std:is_positive | NUM → BOOL | True if x > 0 |
| std:is_negative | NUM → BOOL | True if x < 0 |
| std:is_zero | NUM → BOOL | True if x == 0 |
| std:is_empty | ANY → BOOL | True if length == 0 |
| std:abs | NUM → NUM | Absolute value |
| std:square | NUM → NUM | x * x |
| std:increment | NUM → NUM | x + 1 |
| std:decrement | NUM → NUM | x - 1 |
| std:not_op | BOOL → BOOL | Logical NOT |
Reducers (7 subgraphs)
Two-input fold functions for REDUCE operations.
| Name | Signature | Description |
|---|---|---|
| std:add | (NUM, NUM) → NUM | Addition |
| std:mul | (NUM, NUM) → NUM | Multiplication |
| std:sub | (NUM, NUM) → NUM | Subtraction |
| std:min | (NUM, NUM) → NUM | Smaller of two |
| std:max | (NUM, NUM) → NUM | Larger of two |
| std:concat | (STR, STR) → STR | String concatenation |
| std:and_op | (BOOL, BOOL) → BOOL | Logical AND |
Usage
Reference stdlib subgraphs via "std:name" in the value field of MAP, FILTER, REDUCE, or SUBGRAPH_DEF nodes:
At load time, load_program() resolves "std:double" to the real BLAKE3 content hash and injects the subgraph. The evaluator never sees "std:" — it only sees hashes.
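A load-time resolution sketch, with an illustrative (not real) registry; the behavior matches the description above — only values starting with "std:" are touched, and the evaluator receives hashes:

```python
# Hypothetical registry: stdlib name -> content hash (values invented).
STDLIB_HASHES = {"std:double": "b3_ab12", "std:add": "b3_cd34"}

def resolve_stdlib(node: dict) -> dict:
    value = node.get("value")
    if isinstance(value, str) and value.startswith("std:"):
        # Replace the symbolic name with the real content hash.
        return {**node, "value": STDLIB_HASHES[value]}
    return node

node = {"id": "m", "op": "map", "inputs": ["xs"], "value": "std:double"}
resolved = resolve_stdlib(node)
```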
Security Model
Closed by default. Two-layer enforcement. Three-way intersection for multi-agent.
Capabilities
| Capability | Trust Level | Operations |
|---|---|---|
| PURE_COMPUTE | 0 | All arithmetic, logic, string, collection ops |
| IO_STDOUT | 1 | print |
| IO_STDIN | 2 | read_stdin |
| FS_READ | 2 | read_file |
| FS_WRITE | 3 | write_file |
| NETWORK_OUT | 3 | http_get, http_post, http_request |
| GUI_RENDER | 3 | gui_render |
| META_EVAL | 3 | eval |
Enforcement Layers
- Layer 1: Static Verifier — At load time, checks every node's required capabilities against the program's declared manifest. Catches violations before any code runs.
- Layer 2: Runtime Sandbox — During execution, enforces capabilities per-node. Even if the verifier is bypassed, the sandbox blocks unauthorized operations.
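A sketch of the static check, borrowing the OP_REQUIRED_CAPABILITIES name mentioned in the Phase 8a notes; the table here is abbreviated and illustrative:

```python
# Abbreviated opcode -> required-capability table (illustrative subset
# of the tier listings above).
OP_REQUIRED_CAPABILITIES = {
    "add": "PURE_COMPUTE", "print": "IO_STDOUT",
    "read_file": "FS_READ", "http_get": "NETWORK_OUT",
}

def verify(nodes, manifest):
    # Every node's required capability must appear in the declared
    # manifest; an empty result means the program may run.
    return [n["id"] for n in nodes
            if OP_REQUIRED_CAPABILITIES[n["op"]] not in manifest]

nodes = [{"id": "n1", "op": "print"}, {"id": "n2", "op": "http_get"}]
assert verify(nodes, {"PURE_COMPUTE", "IO_STDOUT"}) == ["n2"]
```

Sharing one table between the verifier and the sandbox is what makes the two layers agree: there is a single source of truth for what each opcode needs.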
Multi-Agent Security
When multiple agents collaborate, the effective capability set uses a three-way intersection:
- Trust ceiling — Maximum capabilities based on agent trust level (0-3)
- Agent declared — Capabilities the agent claims to need
- Program manifest — Capabilities declared in the program JSON
All three must agree for an operation to be allowed. This prevents trust escalation, undeclared use, and manifest bypass.
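The intersection can be sketched directly with Python sets, using the trust-level-to-capability mapping from the Capabilities table above:

```python
# Capability ceiling per trust level, from the Capabilities table.
TRUST_CEILING = {
    0: {"PURE_COMPUTE"},
    1: {"PURE_COMPUTE", "IO_STDOUT"},
    2: {"PURE_COMPUTE", "IO_STDOUT", "IO_STDIN", "FS_READ"},
    3: {"PURE_COMPUTE", "IO_STDOUT", "IO_STDIN", "FS_READ",
        "FS_WRITE", "NETWORK_OUT", "GUI_RENDER", "META_EVAL"},
}

def effective_caps(trust_level, agent_declared, program_manifest):
    # trust ceiling ∩ agent declared ∩ program manifest
    return TRUST_CEILING[trust_level] & set(agent_declared) & set(program_manifest)

# A level-2 agent cannot escalate to FS_WRITE even if both the agent
# and the program declare it:
caps = effective_caps(2, {"PURE_COMPUTE", "FS_WRITE"},
                         {"PURE_COMPUTE", "FS_WRITE"})
assert caps == {"PURE_COMPUTE"}
```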
Network Security
- Domain allowlist — Programs must declare `allowed_domains`. Empty list = all requests blocked.
- SSRF prevention — DNS pre-resolution + private IP blocking (127.x, 10.x, 172.16.x, 192.168.x, ::1)
- HTTPS only by default — HTTP requires explicit `allow_http: true` in node config
- No redirects by default — When enabled, every hop is re-validated against allowlist + SSRF checks
- Size limits — 10MB response max, 30s timeout default
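A sketch of the pre-connection checks. The real checker performs DNS resolution itself; this offline version takes the resolved IPs as an argument so the logic can be shown in isolation:

```python
import ipaddress
from urllib.parse import urlparse

def check_request(url, allowed_domains, resolved_ips):
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False                 # HTTPS only by default
    if parsed.hostname not in allowed_domains:
        return False                 # closed-by-default allowlist
    for raw in resolved_ips:
        ip = ipaddress.ip_address(raw)
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False             # SSRF: block 127.x, 10.x, 192.168.x, ::1, ...
    return True

assert check_request("https://api.example.com/x", ["api.example.com"], ["93.184.216.34"])
assert not check_request("https://api.example.com/x", ["api.example.com"], ["10.0.0.5"])
assert not check_request("http://api.example.com/x", ["api.example.com"], ["93.184.216.34"])
```

Checking the resolved address rather than the hostname is the key move: a public-looking hostname that resolves to 10.0.0.5 is still rejected.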
CLI Usage
Command-line interface for running, validating, visualizing, and benchmarking Lexis programs.
Commands
| Command | Description |
|---|---|
| lexis run <file> | Execute a Lexis program |
| lexis run <file> --param name=value | Execute with runtime parameters |
| lexis run <file> --debug | Show all node values during execution |
| lexis run <file> --cache | Enable persistent disk caching |
| lexis run <file> --viz | Execute with trace visualization in browser |
| lexis validate <file> | Validate without executing |
| lexis hash <file> | Compute BLAKE3 content hashes |
| lexis viz <file> | Static DAG visualization in browser |
| lexis stdlib | List all stdlib subgraphs |
| lexis cache list|stats|clear|catalog | Manage persistent cache |
| lexis mcp | Start MCP server (stdio) |
| lexis mcp --http | Start MCP server (HTTP) |
| lexis bench --model <name> | Run LLM benchmarks |
| lexis bench session start <name> | Create a benchmark session |
| lexis bench session report <id> | Generate comparison report |
Examples
MCP Server
Model Context Protocol integration — any MCP-compatible AI agent can generate, validate, and execute Lexis programs through standard tool calls.
Tools (7)
| Tool | Input | Description |
|---|---|---|
| lexis_run | program_json, params?, stdin_lines? | Full pipeline execution → output + hashes |
| lexis_validate | program_json | Parse + validate + verify → structured result |
| lexis_generate | program_json, expected_output? | LLM-tolerant validation with auto-fix + suggestions |
| lexis_check | program_json, params?, stdin_lines? | Recommended: validate AND execute in one call |
| lexis_hash | program_json | BLAKE3 content hash for all nodes |
| lexis_stdlib_list | (none) | List all 32 stdlib subgraphs |
| lexis_stdlib_get | name | Get full subgraph JSON by name |
Resources
- `lexis://spec` — Full spec document
- `lexis://examples/{name}` — Example programs
- `lexis://opcodes` — All opcodes with arities and descriptions
The lexis_check Tool
The recommended tool for AI models. Validates AND executes in one call, with LLM-tolerance fixes applied automatically. Returns per-stage results (parse_ok, structure_ok, security_ok, execution_ok) plus a fixes_applied list that tells the model exactly what was auto-corrected. Prevents models from claiming success without actual execution.
Configuration
Phase 0: Foundation
Project Lexis Is Alive
Summary
The very first milestone. Established the core pipeline that every Lexis program passes through: Parse → Validate → Verify Security → Schedule → Execute with Sandbox → Content-Address. Proved the concept works with basic arithmetic.
Core Pipeline
Every program goes through this security-enforced pipeline:
- Parse — JSON graph format → internal data structures
- Validate — Confirm DAG (no cycles), all references resolve, correct arity
- Verify Security — Every node's capability requirements checked against the manifest
- Schedule — Topological sort determines execution order
- Execute with Sandbox — Runtime capability checks before every side-effecting operation
- Content-Address — BLAKE3 hashes computed for every node (identity = semantics)
What's Proven
- `5 + 3 = 8` executes through the full pipeline
- `(10 + 20) * (5 - 2) = 90` — chained operations work
- Security enforcement catches violations before execution
- Content-addressing works: same computation = same hash regardless of node ID
- Cycle detection, dangling reference detection, arity checking all work
- 47 tests, all green, 0.05 seconds
Phase 1a: Core Language
47 → 96 tests • 28 opcodes
Summary
Phase 1a is where Lexis becomes a real language. Added strings, conditionals, error-as-value system, functions (subgraphs), iteration (REDUCE), and sequences. Grew from 47 tests (Phase 0) to 96 tests.
New Capabilities
- Strings: Concatenation, length, type conversion — `"Hello" + ", " + "Lexis!"` works
- Conditionals: `SELECT(10 > 5, "yes", "no")` → `"yes"`. Pure data flow, no control-flow edges. Composable with NOT, AND, OR.
- Errors as Values: Division by zero doesn't crash — it produces an ErrorValue that flows through the graph. Downstream nodes automatically propagate it. TRY_OR catches errors and returns fallbacks. IS_ERROR lets you inspect. Error chains carry causation for debugging.
- Functions (Subgraphs): A function is a self-contained DAG with numbered input/output ports. No parameter names. Call it by its content hash. `double(21) = 42` works. Chaining works: `double(double(5)) = 20`. Error propagation flows through subgraph boundaries.
- Iteration (REDUCE): Fold a subgraph over a sequence. `REDUCE([1,2,3,4,5], 0, add) = 15`. Works with any 2-input-1-output subgraph.
- Sequences: Variadic SEQUENCE node collects values into ordered lists. Foundation for REDUCE and future collection operations.
Key Design Decisions
- Errors are values — ErrorValue flows through the graph, propagated automatically by the `_propagating` decorator on builtins. No exceptions escape to the user.
- Functions are subgraphs — LexisSubgraph with numbered ports. Two different AI models can independently generate the same function and get the same content hash.
- SELECT for conditionals — 3-input data-flow node. No control-flow edges in the graph — everything is data flow.
- REDUCE for iteration — Folds a subgraph over a sequence. No loops, no mutation — just reduction.
Design Philosophy
The error-as-value system is genuinely elegant. In every human language, error handling is bolted on (try/catch, Result types, Option types). In Lexis, errors are just values that flow through edges — the same way data does. An AI debugging a Lexis program doesn't need to parse stack traces; it follows the error value through the graph to find where it originated. The causal chain is built into the ErrorValue itself.
The subgraph system is clean. Functions are just graphs with ports. No naming conventions to agree on, no parameter ordering debates beyond port indices. Two different AI models can independently generate the same function and get the same content hash. That's powerful for a multi-AI ecosystem.
Phase 1b: Benchmarks & Validation
AI Generation Quality: 16/16 (100%)
Summary
Hypothesis validation phase. Tested two claims: (1) Can AI generate valid Lexis programs from a spec alone? (2) How does Lexis token efficiency compare to Python? Results: 100% generation success, but raw JSON costs ~7x more tokens than Python (the gap shrinks to ~1.65x when only semantic tokens are counted).
AI Generation Quality: 16/16 (100%)
| Suite | Parse | Validate | Security | Execute | Correct |
|---|---|---|---|---|---|
| Hand-written baseline | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 |
| AI-generated (spec only) | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 |
The AI-generated programs were written using only the 54-line spec document — no access to existing examples, no trial-and-error. Every program passed all 5 pipeline stages on the first attempt.
Token Efficiency
| Metric | Lexis/Python Ratio | Meaning |
|---|---|---|
| M1 (Raw JSON) | 6.74x | Lexis is ~7x MORE tokens as raw JSON |
| M4 (Semantic-only) | 1.65x | With a binary format, gap shrinks to ~1.6x |
| T4 Error handling | 0.92x (M4) | Lexis wins — TRY_OR is more compact than try/except |
| T3 Conditional | 1.00x (M4) | Tie — SELECT is as compact as if/else |
Phase 2a: Agent Collaboration
Identity, Trust, Scoping, Composition, Audit
Summary
Introduced multi-agent identity, trust levels, capability scoping, graph fragment composition, and audit trails. This phase laid the security foundation for agents collaborating on shared programs.
What Was Built
- Agent Identity — BLAKE3 cryptographic identities, node signing/verification, provenance tracking (separate from content hashing — dedup preserved)
- Trust Levels & Scoped Security — 4-tier trust system (0-3), three-way capability intersection (trust ceiling ∩ agent declared ∩ program manifest), per-agent sandbox enforcement
- Graph Composition — Safe multi-agent fragment merging with content-hash dedup, open input resolution, provenance preservation (no capability laundering)
- Audit Trail — Append-only, BLAKE3-hashed log tracking every node execution by agent
Key Design Decisions
- Provenance separate from content hash — Two agents writing CONST(5) get the same content hash. Authorship is tracked in Provenance metadata, not in the hash.
- Trust levels 0-3 — Simple integer mapping to capability ceiling. Deliberately coarse-grained. Finer permissions come from the 3-way intersection.
- Three-way capability intersection — All three must agree for an operation to be allowed. Prevents: trust escalation, undeclared use, manifest bypass.
- No capability laundering — When fragments are composed, nodes keep their original agent's provenance.
Phase 2b: Binary Format & Wire Protocol
MessagePack ~5x smaller than JSON
Summary
Added MessagePack binary serialization for compact program encoding (~5x smaller than JSON), a wire protocol for agent communication with signed messages and tamper detection, and content-addressed fragment storage.
What Was Built
- MessagePack Serialization — Opcodes as integers (~5x smaller than JSON), full roundtrip fidelity for programs, subgraphs, agents, provenance
- Wire Protocol — 11 message types for fragment exchange, composition proposals, hash verification. Signed messages with tamper detection
- Fragment Store — Thread-safe content-addressed storage for subgraphs and fragments with dedup reporting
Phase 2c: MAP, FILTER, Parallel
184 tests • 31 opcodes
Summary
Added MAP, FILTER, COMPOSE opcodes and parallel execution. The language now supports multi-agent collaboration end-to-end.
What Was Built
- MAP — Apply subgraph to each sequence element (short-circuits on error)
- FILTER — Keep elements where predicate subgraph returns truthy
- COMPOSE — Create new subgraph B(A(x)) via node prefixing and port rewiring — content-addressable
- Parallel Execution — `parallel_evaluate()` using ThreadPoolExecutor with `parallelism_levels()` grouping, deterministic PRINT ordering, thread-safe stores
Key Design Decisions
- MAP/FILTER as opcodes — Not library functions. Making them opcodes communicates parallelism intent to the scheduler.
- Parallel execution — parallelism_levels() + ThreadPoolExecutor. Deterministic PRINT ordering preserved via output buffering per level.
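The level-by-level strategy can be sketched with ThreadPoolExecutor; the node-function and results-passing conventions here are assumptions, not the real evaluator's API:

```python
from concurrent.futures import ThreadPoolExecutor

def run_levels(levels, node_fns):
    # Nodes within a level are independent, so each level fans out to a
    # thread pool; waiting on the futures forms a barrier between levels.
    results = {}
    with ThreadPoolExecutor() as pool:
        for level in levels:
            futures = {n: pool.submit(node_fns[n], results) for n in level}
            for n, fut in futures.items():
                results[n] = fut.result()
    return results

fns = {
    "a": lambda r: 5,
    "b": lambda r: 3,
    "sum": lambda r: r["a"] + r["b"],
}
assert run_levels([["a", "b"], ["sum"]], fns)["sum"] == 8
```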
Phase 3a: Hardening
282 tests total • Pure quality & robustness
Summary
First hardening pass. Added CLI, scheduler edge case tests, adversarial security tests, protocol edge case tests, and error path coverage. No new features — pure quality and robustness.
What Was Built
- CLI entry point for `python -m lexis`. Commands: run, validate, hash
- Adversarial security tests (12 tests) — Trust escalation, provenance forgery, capability laundering, sandbox bypass attempts
- Protocol edge cases (8 tests) — Boundary conditions in message encoding/decoding
- Error path hardening (8 tests) — REDUCE/MAP/FILTER edge cases, subgraph errors
Phase 3b: Data Access Opcodes
38 opcodes • 10 examples
Summary
Added 7 opcodes for working with sequences and dictionaries. The language can now parse structured strings, index/slice into sequences and strings, and work with key-value dictionaries — unlocking real data processing programs.
Opcodes Added
- INDEX — Access element by position. Polymorphic: works on both sequences and strings.
- SLICE — Extract sub-range. Polymorphic: works on both sequences and strings.
- SPLIT — Split string by delimiter. Empty delimiter = character-level split.
- DICT — Create dictionary from alternating key-value pairs.
- GET — Retrieve value from dictionary by key.
- KEYS — Get all keys from a dictionary as a sequence.
- VALUES — Get all values from a dictionary as a sequence.
Phase 4: Gap-Closing
307 tests • 40 opcodes
Summary
Closed three functional gaps identified by audit. The foundation is genuinely complete after this phase — no known inconsistencies, no known blockers for the existing feature set.
What Was Fixed
- TO_NUM — String-to-number conversion. Parses strings to int/float, bools to 0/1, numbers pass through.
- APPLY — Dynamic subgraph dispatch. The hash comes from an input edge, making the target dynamic. This completes COMPOSE: COMPOSE returns a runtime hash → APPLY invokes it.
- All ops work inside subgraphs — MAP/FILTER/REDUCE/COMPOSE/APPLY inside a subgraph now work correctly.
Phase 5: Type Guards & Pattern Matching
346 tests • 43 opcodes
Summary
Added type introspection (TYPE_OF, HAS_KEY) and the MATCH opcode for pattern matching with guards and handlers. Completed the control-flow story: SELECT for binary conditions, MATCH for multi-way dispatch.
Key Design Decisions
- TYPE_OF does NOT propagate errors — Returns "ERROR" string for ErrorValue inputs. This is introspection, not computation.
- HAS_KEY does NOT propagate errors — Returns False for non-dict/ErrorValue inputs. Safe guard — never errors itself.
- MATCH is lazy — Only evaluates the matching handler. Guards checked in order, first truthy wins.
- Guards and handlers are subgraphs — Referenced via SUBGRAPH_DEF hash. Consistent with APPLY pattern.
Why This Matters for Weaker Models
MATCH shifts cognitive load to runtime. Weaker models can declare patterns without implementing dispatch logic. A heterogeneous list [10, "20", 30, "40"] can be type-dispatched through MATCH with just two guard-handler pairs — previously requiring nested SELECT chains.
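The two-pair dispatch described above can be sketched with plain Python callables standing in for guard and handler subgraphs:

```python
def match(value, cases):
    # Guards checked in order; first truthy wins; only the winning
    # handler ever runs (laziness).
    for guard, handler in cases:
        if guard(value):
            return handler(value)
    return None

cases = [
    (lambda v: isinstance(v, str), lambda v: int(v)),  # std:is_str-like guard
    (lambda v: True,               lambda v: v),       # std:always_true catch-all
]
assert [match(v, cases) for v in [10, "20", 30, "40"]] == [10, 20, 30, 40]
```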
Phase 6: Standard Library
423 tests • 18 subgraphs • ~75% token savings
Summary
Created a standard library of 18 pre-built, content-addressed subgraphs that programs can reference by name instead of defining inline. The stdlib is resolved at load time — the evaluator is completely untouched.
Token Impact
| Program | Before (inline) | After (stdlib) | Savings |
|---|---|---|---|
| stdlib_showcase.json | ~80 lines | 10 lines | ~75% |
| stdlib_reduce.json | ~25 lines | 7 lines | ~72% |
| Type-dispatch MATCH | ~2,300 tokens | ~1,200 tokens | ~48% |
An AI model that used to emit 5-node subgraph definitions (~400 tokens each) now emits a 15-character string like "std:always_true".
Key Design Decisions
- Stdlib is compile-time expansion — "std:name" resolved to actual content hash at load time. Evaluator unchanged.
- Content-addressed dedup — Stdlib subgraphs get real BLAKE3 hashes. Two programs using
std:doubleshare the same hash. - "std:" prefix convention — Only values starting with "std:" trigger resolution. No collision with user subgraph names.
Phase 7: LLM Generation Benchmarks
478 tests • 15 tasks across 3 tiers
Summary
Built a benchmark harness to test whether AI models can generate valid Lexis programs from the spec + natural language task descriptions. 15 tasks across 3 tiers. CLI command lexis bench. Works with any OpenAI-compatible API.
Key Design Decisions
- OpenAI-compatible API — Works with LM Studio, Ollama, or any compatible endpoint.
- Robust JSON extraction — Handles code fences, prose wrapping, raw JSON, multiple objects.
- Full pipeline validation — Parse → validate → verify → execute → check output.
- No new dependencies — Uses urllib.request (stdlib) for API calls.
Phase 7b: Spec Hardening
492 tests • 3 rounds of refinement
Summary
Ran the benchmark suite against 5 local models on an NVIDIA 4090 GPU. Performed targeted spec hardening based on error analysis across 3 rounds.
Results
| Model | Size | Score | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|---|---|
| Qwen 2.5 Coder 14B | 14B | 11/15 (73%) | 5/5 | 3/5 | 3/5 |
| GLM 4.6v Flash | ~9B | 9/15 (60%) | 5/5 | 2/5 | 2/5 |
| Qwen 2.5 Coder 7B | 7B | 9/15 (60%) | 3/5 | 3/5 | 3/5 |
| GLM 4.7 Flash | 30B | 7/15 (47%) | 2/5 | 3/5 | 2/5 |
| Qwen 2.5 Coder 32B | 32B | 5/15 (33%)* | 1/5 | 2/5 | 2/5 |
*32B model hampered by EXTRACT_FAIL (quantization/VRAM pressure issues, not a capability problem).
Key Insight
Models learn flat graphs and stdlib quickly. The capability wall is multi-level nesting: main graph → subgraph definition → port wiring → higher-order call. This is a working memory limitation in small models, not a spec clarity issue. The spec refinement hit diminishing returns after 3 rounds.
Phase 8: Params, File I/O & Stdin
582 tests • 47 opcodes • 19 examples
Phase 8b: PARAM Opcode
Added runtime parameter injection. Parameters are always strings, injected via CLI --param name=value. Pure operation — no capability required. Missing param → ErrorValue, so use TRY_OR for defaults.
Phase 8a: File I/O (READ_FILE, WRITE_FILE)
Added file system operations with a verifier bug fix: both verifier and sandbox now read from the same OP_REQUIRED_CAPABILITIES dict — single source of truth. FS_READ at trust level 2, FS_WRITE at trust level 3. Write-then-read chaining works naturally: WRITE_FILE returns the filename.
Phase 8c: Standard Input (READ_STDIN)
Added READ_STDIN with the stdin_reader callable pattern for testability. EOF → ErrorValue, TRY_OR provides graceful fallback. No CLI changes needed — just pipe: echo "hi" | lexis run prog.json.
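The injection pattern can be sketched as follows; the ErrorValue stand-in here is a plain dict, and the names are illustrative:

```python
def make_read_stdin(stdin_reader):
    # stdin_reader is any zero-argument callable returning one line;
    # in production it would wrap sys.stdin.readline, in tests a stub.
    def read_stdin():
        line = stdin_reader()
        if line == "":                            # EOF
            return {"error": "EOF on stdin"}      # ErrorValue-like result
        return line.rstrip("\n")
    return read_stdin

lines = iter(["hi\n", ""])
op = make_read_stdin(lambda: next(lines))
assert op() == "hi"
assert op() == {"error": "EOF on stdin"}
```

Because the reader is injected, the EOF-to-ErrorValue path is testable without piping anything into the process.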
Phase 8 Complete: Lexis can now read/write files, accept parameters, and read from stdin. It has graduated from a calculator to a real data processor.
Phase 9: AI-Native Networking
HTTP_GET, HTTP_POST, HTTP_REQUEST • Security-first design
Summary
Added 3 HTTP opcodes with a security-first design: closed-by-default domain allowlist, SSRF prevention, HTTPS-only by default, no redirects by default, response size limits, and timeout enforcement.
Key Design Decisions
- 3 AI-native HTTP opcodes — Three tiers of complexity. GET = 1 input (URL). POST = 2 inputs (URL, body). REQUEST = 3 inputs (URL, method, body). LLMs pick the simplest tier.
- Auto-JSON parse/serialize — Responses with JSON Content-Type auto-parsed. POST auto-serializes dict bodies.
- Closed-by-default domain allowlist — Must explicitly declare every domain. Empty = all blocked.
- SSRF prevention — DNS pre-resolution + private IP blocking before any connection.
- Zero new dependencies — Uses Python stdlib: urllib, ipaddress, socket, json, ssl.
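The SSRF check described above can be sketched with exactly the stdlib modules listed — resolve the hostname first, then reject private/loopback/link-local addresses before any connection is opened. This is a minimal sketch (function name and allowlist shape are illustrative), not the actual verifier code:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_url_safe(url: str, allowlist: set) -> bool:
    parsed = urlparse(url)
    if parsed.scheme != "https":               # HTTPS-only by default
        return False
    host = parsed.hostname
    if host is None or host not in allowlist:  # closed-by-default allowlist
        return False
    # DNS pre-resolution: vet every address the name resolves to
    # BEFORE any socket is connected.
    try:
        infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False                       # SSRF attempt blocked
    return True

assert not is_url_safe("http://example.com/", {"example.com"})  # not HTTPS
assert not is_url_safe("https://evil.com/", {"example.com"})    # not allowlisted
```

Checking resolved addresses rather than the hostname string is the key: `http://169.254.169.254` rebinds and DNS tricks are caught at the IP layer.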
Phase 10: Persistence & Caching
3-layer caching system • Cache invalidation is a non-problem
Summary
Added a 3-layer caching system: L1 runtime memo (always on, within-run), L2 persistent disk cache (opt-in, cross-run), L3 LLM-aware catalog (for token savings in prompts). Content-addressing makes cache invalidation a non-problem.
Key Design Decisions
- Purity as foundation — Only pure subgraphs (no I/O ops transitively) are cached.
- Cache invalidation is a non-problem — Content-addressing guarantees: same inputs + same function hash = same result. The hash IS the cache key. No staleness possible.
- Bool vs int distinction — `True` and `1` hash differently. Semantically different values must have different hashes.
- Layer 1 always on — Zero cost when no subgraph calls occur. Thread-safe via Lock.
- Layer 2 opt-in — DiskCache enabled via the `--cache` CLI flag. Sharded storage with LRU eviction.
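The cache-key idea can be sketched by hashing a canonical, type-tagged encoding of (function hash, inputs). BLAKE3 has no stdlib binding, so `hashlib.blake2b` stands in here; the type-tagging trick, not the hash function, is the point — it is what makes `True` and `1` hash differently:

```python
import hashlib
import json

def canonical(value):
    """Tag each value with its type so bool/int/str never collide."""
    if isinstance(value, bool):        # must test bool BEFORE int:
        return ["bool", value]         # bool is a subclass of int in Python
    if isinstance(value, int):
        return ["int", value]
    return [type(value).__name__, value]

def cache_key(function_hash: str, inputs: list) -> str:
    payload = json.dumps([function_hash, [canonical(v) for v in inputs]],
                         sort_keys=True, separators=(",", ":"))
    return hashlib.blake2b(payload.encode(), digest_size=32).hexdigest()

assert cache_key("f1", [True]) != cache_key("f1", [1])      # bool vs int
assert cache_key("f1", [2, 3]) == cache_key("f1", [2, 3])   # deterministic
```

Same inputs plus same function hash always produce the same key, which is why staleness cannot occur: a changed function is by definition a different key.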
Phase 11: Multi-Agent Runtime
156 + 51 tests • Transport, event loop, discovery, negotiation
Summary
Transformed Lexis from a single-interpreter language into a multi-agent collaborative system. Added transport layer, event loop, enhanced provenance with Hybrid Logical Clocks, content-aware routing, agent discovery, and capability negotiation.
Architecture
- Transport Layer — Abstract interface for agent communication. LocalTransport with thread-safe queues. ContentRouter with IPFS-style Want/Have protocol.
- Agent Event Loop — State machine with 6 states (IDLE, OFFERING, REQUESTING, COMPOSING, EXECUTING, STOPPED). 19 message handlers.
- Enhanced Provenance — Hybrid Logical Clocks combining physical time + logical counter + agent_id. ProvenanceChain for lineage tracking.
- Discovery & Negotiation — TTL-based agent announcements. Capability negotiation with state machine (PROPOSED→ACCEPTED|REJECTED|COUNTERED|EXPIRED).
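The Hybrid Logical Clock combination of physical time, logical counter, and agent_id can be sketched as follows. The update rules follow the standard HLC algorithm; field and method names here are illustrative, not the Phase 11 API:

```python
import time
from dataclasses import dataclass

@dataclass
class HLC:
    physical: int = 0    # last observed wall-clock time (ms)
    logical: int = 0     # counter for events within the same millisecond
    agent_id: str = "a"  # final tiebreaker for a total order

    def _now_ms(self) -> int:
        return int(time.time() * 1000)

    def tick(self):
        """Local event: advance past both the wall clock and the last stamp."""
        wall = self._now_ms()
        if wall > self.physical:
            self.physical, self.logical = wall, 0
        else:
            self.logical += 1
        return (self.physical, self.logical, self.agent_id)

    def recv(self, remote_physical: int, remote_logical: int):
        """Merge a remote timestamp so causality is preserved."""
        wall = self._now_ms()
        m = max(wall, self.physical, remote_physical)
        if m == self.physical == remote_physical:
            self.logical = max(self.logical, remote_logical) + 1
        elif m == self.physical:
            self.logical += 1
        elif m == remote_physical:
            self.logical = remote_logical + 1
        else:
            self.logical = 0
        self.physical = m
        return (self.physical, self.logical, self.agent_id)

clock = HLC(agent_id="agent-1")
t1 = clock.tick()
t2 = clock.tick()
assert t2 > t1   # stamps compare as tuples, so ordering is total per agent
```

Because stamps are plain tuples, provenance entries from different agents sort into a single causally consistent timeline.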
Hardening (Phase 11b)
51 new tests: 18 integration, 21 adversarial, 12 stress. Tested trust escalation, provenance forgery, transport attacks, capability laundering, concurrent messaging, HLC causality. No production bugs found — all Phase 11 code held up.
Phase 12: Benchmark Expansion
22 tasks • 4 tiers • 996 tests
Summary
Updated the LLM benchmark suite to cover all features added in Phases 8-11. Updated the spec from 43 to 50 opcodes. Added 7 new tasks in Tier 4 (I/O & Capabilities).
Key Finding: 7B Beats 14B
The Qwen 2.5 Coder 7B model scored 91% vs 82% for the 14B — consistently across two runs each. For DAG-structured JSON output, the smaller model's more constrained generation appears to be an advantage. Less "creativity" means fewer wrong answers.
Phase 13: MCP Server
1056 tests • 6 tools, 3 resources, 1 prompt
Summary
Built a Model Context Protocol (MCP) server so any MCP-compatible AI agent — Claude Code, Cursor, Claude Desktop, Windsurf — can generate, validate, execute, and compose Lexis programs through standard tool calls. This is the fastest path to adoption: turns Lexis from "a language you learn" into "a tool you call."
Design Decisions
- lexis_generate reuses validate_generated_program — The benchmark validation function already handles everything: auto-infer capabilities, auto-flatten inline objects, staged error classification.
- Structured JSON returns — LLMs need machine-parseable responses to self-correct in agentic loops.
- Suggestions field — Maps each error class to specific fix instructions. The self-correction loop that makes agentic workflows work.
Phase 14: ITERATE Opcode
Bounded iteration with guaranteed termination
Summary
Added bounded iteration. MAP/FILTER/REDUCE handle collection processing, but there was no way to express "repeat until done" — retry patterns, numeric convergence, iterative refinement. ITERATE fills this gap with guaranteed termination via max_steps.
Design
- Inputs: 2 — (initial_value, max_steps)
- Step subgraph: 1 input → 1 output (`[new_value, should_stop]`)
- Follows REDUCE pattern — LLMs that can generate REDUCE programs can immediately generate ITERATE programs.
- max_steps as computed value — Allows dynamic iteration limits based on input data.
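The semantics above can be modeled in plain Python (the real step is a subgraph, not a Python function — this is a sketch of the contract only):

```python
def iterate(step, initial_value, max_steps):
    """ITERATE semantics sketch: step returns (new_value, should_stop);
    max_steps guarantees termination even if should_stop never fires."""
    value = initial_value
    for _ in range(max_steps):
        value, should_stop = step(value)
        if should_stop:
            break
    return value

# Numeric convergence: double until >= 100, bounded at 10 steps.
result = iterate(lambda v: (v * 2, v * 2 >= 100), 1, 10)
assert result == 128
```

If the stop flag never becomes truthy, the loop still returns the last value after `max_steps` iterations — termination is structural, not a property the generating model has to get right.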
Phase 15: Enhanced Error Reporting
1111 tests • Actionable error diagnostics
Summary
Made Lexis errors actionable enough that local AI models (7B-14B) can self-correct instead of getting stuck in retry loops.
What Changed
- classify_error() — Pattern-matches exceptions to error classes (PARSE_FAIL, OPCODE_FAIL, ARITY_FAIL, REF_FAIL, CYCLE_FAIL, SECURITY_FAIL, RUNTIME_FAIL)
- "Did you mean?" — Uses
difflib.get_close_matches()for op typos."conts"→"Did you mean: const, concat?" - --debug flag — Prints all node values in topological order to stderr
- CLI error suggestions — Every error handler now attaches actionable fix suggestions
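The typo diagnostic is a thin wrapper over `difflib.get_close_matches` against the known-opcode list. A minimal sketch (the opcode list here is a small illustrative subset, and the message format is approximate):

```python
import difflib

KNOWN_OPS = ["const", "concat", "add", "sub", "mul", "select", "map"]

def did_you_mean(bad_op: str) -> str:
    """Suggest close opcode names for an unknown op."""
    matches = difflib.get_close_matches(bad_op, KNOWN_OPS, n=2, cutoff=0.6)
    if matches:
        return f'Unknown op "{bad_op}". Did you mean: {", ".join(matches)}?'
    return f'Unknown op "{bad_op}".'

msg = did_you_mean("conts")
assert "const" in msg   # "conts" is close to both "const" and "concat"
```

The `cutoff` keeps unrelated opcodes out of the suggestion, which matters for self-correcting models: a wrong suggestion sends a 7B model down a retry spiral.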
Phase 16: Tiered Spec
Reorganized spec document into 3 tiers
Motivation
Local LLMs (7B-14B) kept reaching for advanced features (subgraphs, match) when simple tasks only needed basics. The spec presented all 50 opcodes flat — the model couldn't distinguish simple from complex.
Solution
- Tier 1: Core Language (25 opcodes) — Self-contained: a model reading only Tier 1 can build calculators, string processors, conditional logic
- Tier 2: Functions & Collections (19 opcodes) — Subgraphs, higher-order ops, collections
- Tier 3: I/O & Networking (6 opcodes) — File, stdin, HTTP
- Tier directive at top: "Start with Tier 1. Only use Tier 2 if Tier 1 cannot solve the task."
Phase 17: lexis_check Tool
1122 tests • Validate + execute in one call
Motivation
During local model testing, we discovered that models call lexis_validate and lexis_run separately — and some skip lexis_run entirely, declaring "Task Completed" without verifying output. A model did this with a broken calculator that would have failed at runtime.
Solution
lexis_check is a single MCP tool that validates, runs, and verifies a program in one call. Models can't claim success without actual execution. Returns per-stage results (parse_ok, structure_ok, security_ok, execution_ok) plus fixes_applied that tells models what was auto-corrected.
Phase 18: Native GUI
6 opcodes • 10 widget types • Tkinter backend
Summary
Added 6 GUI opcodes enabling Lexis programs to create native windowed applications with interactive widgets, event handling, and canvas drawing. Backend is tkinter (zero extra dependencies).
Design Philosophy
- Declarative scene-graph — The DAG describes UI as data structures, not imperative API calls
- 6 opcodes, not 20+ — GUI_WIDGET uses a type discriminator; GUI_DRAW uses a shape discriminator. Adding new widget/shape types requires zero opcode changes
- 5 pure + 1 impure — Only GUI_RENDER is side-effecting. The other 5 build descriptor dicts — pure, cacheable, content-addressable
- Callback subgraphs — Event handlers are subgraphs invoked with state dict → return new state dict
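The "5 pure + 1 impure" split can be sketched as plain dict builders — five opcodes only construct data; only GUI_RENDER touches tkinter. Field names below are illustrative, not the actual Lexis descriptor schema:

```python
def gui_widget(widget_type, **props):
    """Pure: builds a descriptor dict, performs no I/O."""
    return {"kind": "widget", "type": widget_type, "props": props}

def gui_layout(container, children):
    """Pure: containers are just descriptors holding child descriptors."""
    return {"kind": "widget", "type": container, "children": children}

scene = gui_layout("vbox", [
    gui_widget("label", text="Count: 0"),
    # Event handler is named, resolving to a subgraph at render time.
    gui_widget("button", text="+1", on_click="increment"),
])

# Being plain data, the scene is hashable/cacheable like any other node;
# only a final GUI_RENDER(scene) call would be side-effecting.
assert scene["children"][1]["props"]["on_click"] == "increment"
```

This is why adding a new widget type needs zero opcode changes: it is a new value for the type discriminator, not a new node in the language.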
Widget Types
label, button, text_input, checkbox, dropdown, slider, vbox, hbox, grid, frame
Example Programs
gui_hello.json (static window), gui_counter.json (buttons + state), gui_canvas_drawing.json (shapes), gui_calculator.json (digit buttons, +, =, C)
Phase 19: DAG Visualization
3 modes: static, trace, live • Cytoscape.js
Summary
Added browser-based DAG visualization system with 3 modes: static (structure view), trace (step-through playback), and live (real-time GUI program tracing). Uses Cytoscape.js + dagre layout (CDN, zero build step).
Features
- 10 opcode color categories (Literal, Arithmetic, Comparison, Logic, String, Collection, Control Flow, Subgraph, I/O, GUI)
- Collapsible subgraph compound nodes
- Click-to-inspect sidebar (node ID, op, value, result, hash)
- Trace playback controls (step, play/pause, speed slider, reset)
- Live mode with pulsing indicator and real-time node highlighting
- Dark theme inspired by VS Code
Phase 20: Meta-Programming
5 opcodes • Self-bootstrapping foundation
Summary
Added 5 meta-programming opcodes that enable Lexis programs to construct, inspect, and execute graph fragments at runtime. This is the foundation for self-hosting: AI agents using Lexis programs to generate, validate, and compose other Lexis programs.
EVAL Security Design
- Capability ceiling: inner caps = (declared ∩ parent caps) − {META_EVAL}
- No privilege escalation: inner code cannot use capabilities the parent lacks
- No recursive eval: META_EVAL stripped from ceiling prevents eval-of-eval chains
- Recursion depth limit: MAX_EVAL_DEPTH = 3
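The ceiling rule above reduces to one line of set arithmetic. A sketch over capability-name sets (set contents are illustrative):

```python
def eval_ceiling(declared: set, parent: set) -> set:
    """Inner caps = (declared ∩ parent) − {META_EVAL}."""
    return (declared & parent) - {"META_EVAL"}

parent_caps = {"STDOUT", "FS_READ", "META_EVAL"}
inner_declared = {"STDOUT", "NET", "META_EVAL"}

assert eval_ceiling(inner_declared, parent_caps) == {"STDOUT"}
# NET dropped: parent lacks it (no privilege escalation).
# META_EVAL stripped: no eval-of-eval chains.
```

The intersection enforces "no escalation" and the subtraction enforces "no recursion"; the MAX_EVAL_DEPTH limit is a separate belt-and-braces check for anything that slips past.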
Phase 21: Production Patterns
FORMAT, TRY_CATCH, RETRY • 64 opcodes
Summary
Added 3 production-pattern opcodes that close the gap between "Lexis can do it in theory" and "Lexis handles it cleanly in practice."
Opcodes
- FORMAT — String interpolation: `FORMAT("Hello {}, count: {}", name, num)`. Eliminates 60-70% of nodes in message-building patterns.
- TRY_CATCH — Unwraps value/error into an inspectable dict. Always returns a dict — never propagates. Enables error inspection that was previously impossible.
- RETRY — Bounded retry of subgraph up to N times until success. Makes API orchestration practical (HTTP 429/503 recovery).
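TRY_CATCH and RETRY can be modeled in plain Python (the real RETRY takes a subgraph, not a function, and the dict shape here is illustrative):

```python
class ErrorValue:
    """Minimal stand-in for the Lexis error value."""
    def __init__(self, message):
        self.message = message

def try_catch(value):
    """Always returns an inspectable dict; never propagates the error."""
    if isinstance(value, ErrorValue):
        return {"ok": False, "value": None, "error": value.message}
    return {"ok": True, "value": value, "error": None}

def retry(subgraph, input_value, max_attempts):
    """Re-run the subgraph until it succeeds or attempts run out."""
    result = ErrorValue("retry: zero attempts")
    for _ in range(max_attempts):
        result = subgraph(input_value)
        if not isinstance(result, ErrorValue):
            return result
    return result   # last error if every attempt failed

attempts = []
def flaky(x):
    """Fails twice (simulated HTTP 503), then succeeds."""
    attempts.append(x)
    return ErrorValue("HTTP 503") if len(attempts) < 3 else x * 2

assert retry(flaky, 21, 5) == 42                    # 3rd attempt succeeds
assert try_catch(ErrorValue("boom"))["ok"] is False  # error is inspectable
```

RETRY is bounded for the same reason ITERATE is: termination is guaranteed by construction, not by the generated program being correct.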
Impact
Together, these three opcodes make Lexis production-ready for AI agent tool chains, regulated computation pipelines, and API orchestration.
Phase 22: Utility Opcodes
DELAY, RANGE, SORT, REVERSE, MERGE • 69 opcodes
Summary
Added 5 utility opcodes that fill the most common practical gaps: generating number sequences, sorting/reversing data, combining dicts, and pausing for retry backoff.
Design Highlights
- DELAY — Pass-through semantics. Returns input value, enables chaining. Max 60 seconds.
- RANGE — Variable arity (2-3). Auto-detects direction. 10,000 element limit.
- SORT — Type-homogeneous only. Mixed types return ErrorValue.
- MERGE — Dict union. Second dict wins on conflicts.
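These semantics can be sketched as plain-Python models (names and limits mirror the text; this is not the actual implementation):

```python
import time

def op_merge(a: dict, b: dict) -> dict:
    return {**a, **b}                      # second dict wins on conflicts

def op_range(start, stop, step=None):
    if step is None:
        step = 1 if stop >= start else -1  # auto-detect direction
    seq = list(range(start, stop, step))
    if len(seq) > 10_000:
        raise ValueError("RANGE: 10,000 element limit")
    return seq

def op_delay(value, seconds):
    time.sleep(min(seconds, 60))           # hard 60-second cap
    return value                           # pass-through enables chaining

assert op_merge({"a": 1, "b": 2}, {"b": 9}) == {"a": 1, "b": 9}
assert op_range(5, 1) == [5, 4, 3, 2]      # descending auto-detected
assert op_delay("x", 0) == "x"
```

DELAY returning its input is what lets it sit inline in a chain (fetch → DELAY → retry) without extra plumbing nodes.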
Implementation Pattern
All 5 opcodes are simple builtins — no evaluator.py changes needed. They're dispatched via BUILTIN_OPS[node.op](*input_values) automatically. The cleanest possible pattern for new opcodes.
Phase 23: Expanded Stdlib
18 → 32 subgraphs • +14 new
Summary
Expanded the stdlib from 18 to 32 subgraphs. All 14 new subgraphs are composed from existing opcodes — validating the composability of the core opcode set.
New Transforms (9)
abs, square, increment, decrement, get_last, not_op, is_negative, is_zero, is_empty
New Reducers (5)
sub, min, max, concat, and_op
Key Decision
No new opcodes needed. All 14 subgraphs are built from existing opcodes (ADD, SUB, MUL, GT, LT, EQ, SELECT, NOT, LENGTH, INDEX, CONCAT, AND). No registry changes — auto-discovery from the subgraph dictionaries.
Phase 24: String Operations
UPPER, LOWER, TRIM, REPLACE, STARTS_WITH, ENDS_WITH, CONTAINS • 76 opcodes
Summary
Added 7 native string manipulation opcodes. These fill the most critical gap AI models face when generating text-processing programs.
Design Decisions
- All pure, no capabilities — String operations have no side effects.
- Auto-coercion via str() — Matches existing CONCAT/SPLIT pattern. Pragmatic for AI models.
- REPLACE replaces ALL occurrences — What users expect.
- Placed in Tier 1 — Fundamental string operations, unlike SPLIT which produces a collection.
Phase 25: Math Operations
FLOOR, CEIL, ROUND, POWER, SQRT, RANDOM • 82 opcodes
Summary
Added 6 math opcodes filling the gap between basic arithmetic and what models need for calculators, converters, scientific computation, and games.
Design Decisions
- Skipped native ABS — stdlib already has `std:abs` (Phase 23).
- RANDOM is impure — In IO_OPS (not cached) but requires no capability. Generating random numbers isn't dangerous.
- ROUND variable arity (1-2) — `round(3.5)` → 4, `round(3.456, 2)` → 3.46.
- Strict type checking — Math ops reject bools and non-numbers with ErrorValue.
Phase 26: Sequence Operations
ZIP, FLATTEN, UNIQUE, TAKE, DROP, ENUMERATE • 88 opcodes
Summary
Added 6 sequence manipulation opcodes filling the gap between basic sequence creation and higher-order ops.
Design Decisions
- ZIP truncates to shorter — Follows Python semantics. No padding, no error.
- FLATTEN is one level only — Safe, predictable, covers 95% of use cases.
- UNIQUE handles unhashable types — Uses repr() fallback. Preserves first-occurrence order.
- TAKE/DROP clamp to bounds — No error on oversized count.
- ENUMERATE starts at 0 — Always. Keeping it simple.
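The decisions above can be modeled directly (illustrative plain-Python sketches, not the actual implementation):

```python
def op_flatten(seq):
    """One level only — nested lists below the first level survive."""
    out = []
    for item in seq:
        if isinstance(item, list):
            out.extend(item)
        else:
            out.append(item)
    return out

def op_unique(seq):
    """First-occurrence order; repr() keys cover unhashable items."""
    seen, out = set(), []
    for item in seq:
        key = repr(item)
        if key not in seen:
            seen.add(key)
            out.append(item)
    return out

def op_take(seq, n):
    """Clamps to bounds — an oversized count is not an error."""
    return seq[:max(n, 0)]

assert op_flatten([[1, 2], [3, [4]]]) == [1, 2, 3, [4]]   # one level
assert op_unique([{"a": 1}, {"a": 1}, 2]) == [{"a": 1}, 2]
assert op_take([1, 2, 3], 99) == [1, 2, 3]                # clamped
```

The repr() fallback in UNIQUE trades a little precision (two values with identical reprs collapse) for never raising on dicts or lists, which is the right default for model-generated programs.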
Phase 27: Benchmark Refresh
22 → 30 tasks • 1652 tests
Summary
Added 8 new benchmark tasks (t23-t30) covering opcodes from Phases 21-26: string ops, math ops, format, and sequence ops. All 30 baselines pass the full validation pipeline.
New Tasks
| ID | Name | Opcodes Tested |
|---|---|---|
| t23 | string_normalize | trim, lower, eq, select |
| t24 | text_transform | upper, replace, contains |
| t25 | pythagorean | power, sqrt, add |
| t26 | rounding | floor, ceil, round |
| t27 | format_string | format |
| t28 | sequence_pipeline | unique, sort, take |
| t29 | zip_enumerate | zip, enumerate, map |
| t30 | flatten_reduce | flatten, reduce |
Phase 28: Dict Operations
SET, DELETE_KEY, ITEMS • 91 opcodes
Summary
Added 3 dict mutation opcodes. All operations are pure (immutable) — they return NEW dicts, consistent with Lexis's functional design. This completes the dict API: create (dict), read (get), update (set), delete (delete_key).
Design Decisions
- All pure (PURE_OPS) — Immutable operations that return new dicts.
- DELETE_KEY is a no-op on missing keys — Returns dict unchanged rather than erroring.
- ITEMS returns [[key, value], ...] — Uses 2-element lists consistent with Lexis collections. Enables dict↔sequence pipelines with MAP/ZIP.
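The immutable dict API can be sketched as follows — every op returns a new dict and leaves its input untouched (plain-Python models; names are illustrative):

```python
def op_set(d: dict, key, value) -> dict:
    """Pure update: the original dict is never mutated."""
    return {**d, key: value}

def op_delete_key(d: dict, key) -> dict:
    """Pure delete; a no-op when the key is missing."""
    return {k: v for k, v in d.items() if k != key}

def op_items(d: dict) -> list:
    """2-element lists, ready for MAP/ZIP pipelines."""
    return [[k, v] for k, v in d.items()]

base = {"a": 1}
updated = op_set(base, "b", 2)
assert base == {"a": 1}                         # input untouched
assert updated == {"a": 1, "b": 2}
assert op_delete_key(base, "missing") == {"a": 1}  # no-op, no error
assert op_items(updated) == [["a", 1], ["b", 2]]
```

Because the outputs are fresh values, content-addressing and caching work on dict pipelines exactly as they do on numbers and strings.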
Phase 29: Developer Tooling
VS Code extension, enhanced debugging, session logger
Phase 29a: VS Code Syntax Highlighting
Created a VS Code extension with TextMate grammar for Lexis JSON programs. Highlights all 93 opcodes, string values, numbers, booleans, node IDs, capabilities, stdlib references, and structural JSON keys. Packaged as a VSIX file for installation.
Phase 29b: Enhanced Debugging
Added structural warnings, program summary, execution snapshot, opcode hints, call stack tracking, output diff, and contextual suggestions. Designed to help both AI models and human developers understand what went wrong and how to fix it.
Phase 29c: Session Logger
Built benchmark session infrastructure for cross-model comparison:
- sessions.py — Session CRUD, import existing results
- analysis.py — Pipeline inference, opcode extraction, failure patterns, cross-model comparison
- reports.py — Markdown report generation (scoreboard, strengths, hardest tasks, failure patterns)
- session_cli.py — CLI: start, list, show, report, import
Workflow: bench session start "Name" → bench -m model --session ID → bench session report ID
Phase 30: COMMENT & ASSERT
Developer tools • 93 opcodes • 1823 tests
New Opcodes
- COMMENT (1 input) — No-op pass-through node. Returns input unchanged. The `value` field holds a label string for documentation. Acts as inline documentation in the data flow graph. Does NOT propagate errors — passes them through silently.
- ASSERT (2 inputs) — Runtime assertion. Input 1: condition (truthy/falsy). Input 2: value to pass through. If the condition is truthy, returns the value unchanged. If falsy, returns ErrorValue with an assertion failure message. Uses @_propagating: error inputs propagate before the assertion check. Recoverable with TRY_OR.
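The two opcodes can be modeled in plain Python (illustrative sketches of the contracts above, not the actual implementation):

```python
class ErrorValue:
    """Minimal stand-in for the Lexis error value."""
    def __init__(self, message):
        self.message = message

def op_comment(value, label=""):
    """Pure pass-through; the label is documentation only.
    Deliberately skips error propagation: errors flow through unchanged."""
    return value

def op_assert(condition, value):
    """@_propagating behavior: existing errors win over the assertion."""
    if isinstance(condition, ErrorValue):
        return condition
    if isinstance(value, ErrorValue):
        return value
    if condition:
        return value
    return ErrorValue("assertion failed")

assert op_comment(42, "the answer") == 42
assert op_assert(True, "ok") == "ok"
assert isinstance(op_assert(False, "ok"), ErrorValue)   # recoverable via TRY_OR
```

The asymmetry is deliberate: an annotation must never change program behavior, while an assertion that masked an upstream error as "passed" would hide the real failure from a self-correcting model.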
Design Decisions
- COMMENT skips @_propagating — Same as TYPE_OF, HAS_KEY. Annotation should never alter semantics.
- ASSERT uses @_propagating — If inputs are already errors, they should propagate rather than masking as "assertion passed."
- Both are PURE_OPS — No I/O capabilities needed.
Session Logger Fixes
- Pipeline Breakdown: `compare_models()` now calls `analyze_run()` per run and includes pipeline_rates. Reports show real Parse/Validate/Security/Execute/Correct percentages instead of dashes.
- Model Deduplication: When multiple runs exist for the same model, keeps the best-scoring one.
Test Results
18 new tests for COMMENT + ASSERT. 2 new tests for pipeline_rates + deduplication. Updated 3 opcode count assertions (91 → 93). Total: 1823 tests passing.