Context Engineering

Purpose

Rules for curating LLM context windows: instructions, tools, state, and retrieval.

Rules

CE-001: Organize system prompts into distinct sections with clear boundaries. Use YAML structure or XML tags to separate background, instructions, tools, constraints, and output format.

Anthropic Engineering — "Organize prompts into distinct sections using XML tags or Markdown headers." Distinct sections reduce ambiguity and improve instruction following.

Verification: Code review — system prompts must have named sections
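
The review check above could be automated roughly as follows. This is a minimal sketch, assuming sections are delimited with XML-style tags; the section names in REQUIRED_SECTIONS and the helpers build_system_prompt and missing_sections are illustrative names, not part of any cited guideline.

```python
import re

# Hypothetical section names, taken from the list in CE-001.
REQUIRED_SECTIONS = ("background", "instructions", "tools", "constraints", "output_format")


def build_system_prompt(sections: dict[str, str]) -> str:
    """Wrap each section body in a matching pair of XML-style tags."""
    parts = [f"<{name}>\n{body.strip()}\n</{name}>" for name, body in sections.items()]
    return "\n\n".join(parts)


def missing_sections(prompt: str, required: tuple[str, ...] = REQUIRED_SECTIONS) -> list[str]:
    """Return required section names that lack an open/close tag pair."""
    missing = []
    for name in required:
        pattern = rf"<{re.escape(name)}>.*?</{re.escape(name)}>"
        if not re.search(pattern, prompt, flags=re.DOTALL):
            missing.append(name)
    return missing


# Made-up prompt content, for illustration only.
prompt = build_system_prompt({
    "background": "You maintain a payments service.",
    "instructions": "Fix failing tests before adding features.",
    "tools": "read_file, run_tests",
    "constraints": "Never edit generated code.",
    "output_format": "Reply with a unified diff.",
})
```

An empty result from missing_sections(prompt) means every named section is present; a non-empty result lists what a reviewer should ask for.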

CE-002: User prompts must provide symptom, location, and definition of done. Vague instructions like "fix this" or "make it better" waste context tokens on clarification cycles.

Osmani — "Good specs have: problem statement, non-goals, data contracts, acceptance tests." Anthropic — specificity in user prompts reduces back-and-forth by 60%.

Verification: ETVX tasks must have entry/exit criteria (not vague descriptions)
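
One way to enforce the three required fields before a request reaches the model is a small structured type. This is a sketch under assumptions: TaskRequest, VAGUE_PHRASES, and problems are hypothetical names, and the vagueness check only matches the two phrases quoted in CE-002.

```python
from dataclasses import dataclass

# The two vague instructions quoted in CE-002; extend as needed.
VAGUE_PHRASES = {"fix this", "make it better"}


@dataclass
class TaskRequest:
    symptom: str             # what is going wrong
    location: str            # where to look (file, function, endpoint)
    definition_of_done: str  # how to tell the task is finished

    def problems(self) -> list[str]:
        """Return field names that are empty, plus 'symptom' if it is a known vague phrase."""
        fields = ("symptom", "location", "definition_of_done")
        issues = [name for name in fields if not getattr(self, name).strip()]
        if self.symptom.strip().lower() in VAGUE_PHRASES:
            issues.append("symptom")
        return issues


# Made-up requests, for illustration only.
good = TaskRequest(
    symptom="Login returns HTTP 500 when the email contains a plus sign",
    location="auth/validators.py, function normalize_email",
    definition_of_done="test_login_plus_address passes and no other auth test regresses",
)
vague = TaskRequest(symptom="fix this", location="", definition_of_done="")
```

A request is ready to send when problems() returns an empty list.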

CE-003: Compact conversation state before context limits. Preserve architectural decisions, file paths, and task progress during compaction. Discard verbose tool outputs and intermediate reasoning.

Anthropic — "Summarize and reinitialize before context limits, preserving architectural decisions." Inkeep — compaction retains signal, summarization loses structure.

Verification: Session memory files retain key identifiers after compaction
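
The keep/discard split in CE-003 can be sketched as a filter over tagged history entries. The Entry type, the kind labels, and both helper functions are assumptions made for illustration; a real harness would tag entries however its own transcript format allows.

```python
from dataclasses import dataclass

# Kinds that CE-003 says to preserve; everything else is discarded.
DURABLE_KINDS = frozenset({"decision", "file_path", "progress"})


@dataclass(frozen=True)
class Entry:
    kind: str  # e.g. "decision", "file_path", "progress", "tool_output", "reasoning"
    text: str


def compact(history: list[Entry]) -> list[Entry]:
    """Keep durable entries in their original order and drop exact duplicates."""
    seen: set[Entry] = set()
    kept: list[Entry] = []
    for entry in history:
        if entry.kind in DURABLE_KINDS and entry not in seen:
            seen.add(entry)
            kept.append(entry)
    return kept


def lost_identifiers(compacted: list[Entry], identifiers: list[str]) -> list[str]:
    """Verification helper: identifiers that no longer appear after compaction."""
    text = "\n".join(entry.text for entry in compacted)
    return [ident for ident in identifiers if ident not in text]


# Made-up session history, for illustration only.
history = [
    Entry("decision", "Use PostgreSQL advisory locks for job dedup"),
    Entry("tool_output", "long test runner output"),
    Entry("file_path", "services/jobs/locks.py"),
    Entry("reasoning", "Considered Redis locks but rejected them"),
    Entry("progress", "Step 2 of 5 done: lock helper implemented"),
    Entry("file_path", "services/jobs/locks.py"),
]
compacted = compact(history)
```

Calling lost_identifiers with the file paths and decision keywords that matter gives a direct version of the verification line: an empty list means the key identifiers survived.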

CE-004: Use external persistence for knowledge that survives session resets. CLAUDE.md for project-wide constants, memory/ files for evolving state, ETVX task files for cross-session progress.

Anthropic — "Structured note-taking: external memory systems (markdown files) enable persistence across resets." Session context is ephemeral; external files are durable.

Verification: Critical state survives session reset (check memory files after /compact)

CE-005: Use just-in-time retrieval: load context via tools at the moment it is needed rather than pre-loading it into the system prompt. Maintain lightweight references (file paths, IDs) and load full content dynamically.

Anthropic — "Just-in-time retrieval: maintain lightweight references, dynamically load context via tools." Pre-loading wastes tokens on information that may never be needed.

Verification: System prompts contain references (paths), not full file contents
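
The reference-then-load pattern might look like the sketch below. ContextRefs and its methods are hypothetical names; the demo file and its contents are invented, and a real agent would expose load as a tool rather than call it directly.

```python
import tempfile
from pathlib import Path


class ContextRefs:
    """Keeps lightweight references and reads a file only when asked."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.refs: dict[str, str] = {}  # reference ID -> path relative to root

    def register(self, ref_id: str, rel_path: str) -> None:
        self.refs[ref_id] = rel_path

    def prompt_listing(self) -> str:
        """Text for the system prompt: IDs and paths only, never file contents."""
        return "\n".join(f"{ref_id}: {path}" for ref_id, path in sorted(self.refs.items()))

    def load(self, ref_id: str) -> str:
        """Tool handler: read the full content at the moment it is needed."""
        return (self.root / self.refs[ref_id]).read_text(encoding="utf-8")


# Demo in a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "schema.sql").write_text("CREATE TABLE users (id INTEGER);", encoding="utf-8")

    refs = ContextRefs(root)
    refs.register("db-schema", "schema.sql")

    listing = refs.prompt_listing()  # goes into the system prompt
    loaded = refs.load("db-schema")  # fetched only when the task needs it
```

The listing costs a handful of tokens per reference regardless of file size, which is the point of the rule.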

CE-006: Provide fewer focused tools rather than many overlapping ones. Namespace related tools under common prefixes. Return semantic identifiers (entity names), not opaque UUIDs.

Anthropic Writing Tools — "Right tools: fewer, focused tools outperform many overlapping ones. Namespacing groups related tools. Meaningful context returns semantic identifiers."

Verification: Tool descriptions explain purpose like onboarding a team member
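
Two parts of CE-006 lend themselves to a quick check: whether tools share namespaces, and whether responses carry names rather than opaque IDs. The sketch below assumes an underscore separates namespace from action; the function names, tool names, and records are all invented for illustration.

```python
from collections import defaultdict


def group_by_namespace(tool_names: list[str], sep: str = "_") -> dict[str, list[str]]:
    """Group tool names by the prefix before the first separator.

    Names without a separator land under the empty-string key so they are easy to spot.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name in tool_names:
        prefix, found_sep, _rest = name.partition(sep)
        groups[prefix if found_sep else ""].append(name)
    return dict(groups)


def semantic_result(task: dict, users: dict[str, dict]) -> dict[str, str]:
    """Shape a tool response around names the model can reason about, not raw IDs."""
    return {
        "task": task["title"],
        "assignee": users[task["assignee_id"]]["name"],
    }


# Hypothetical tool names and records.
groups = group_by_namespace(["tickets_search", "tickets_create", "docs_search", "deploy"])
result = semantic_result(
    {"title": "Rotate API keys", "assignee_id": "u-7f3c9a"},
    {"u-7f3c9a": {"name": "Dana Reyes"}},
)
```

Tools that fall under the empty-string key are candidates for renaming or merging into an existing namespace.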

CE-007: Use YAML for knowledge artifacts and LLM-to-LLM communication. Use JSON for API responses and machine-to-machine contracts. All outputs must conform to their declared schema.

arxiv 2602.05447 — YAML 75.4% accuracy, optimal token efficiency, best grep-ability. Anthropic — schema conformance ensures outputs are machine-processable.

Verification: Knowledge artifacts validate against their protocol schema

CE-008: Minimize token consumption by using high-signal content only. Prune verbose tool outputs, remove redundant examples, and compress repeated patterns. Every token must earn its place.

Adaline Labs — "Context rot: model accuracy degrades as context fills — n-squared token relationships exhaust finite attention budget." Anthropic — minimal high-signal tokens.

Verification: CLAUDE.md and task files stay concise; MEMORY.md stays under 200 lines

CE-009: Use specialized sub-agents with clean contexts for independent research tasks. Return condensed summaries (1000-2000 tokens) to the orchestrating agent. Never pollute the main context with raw research output.

Anthropic — "Sub-agent architectures: specialized agents in clean contexts, return condensed summaries." Inkeep — multi-agent isolation prevents cross-contamination.

Verification: Task agents return structured summaries, not raw tool outputs
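
The isolation described in CE-009 can be sketched as a function boundary: raw findings stay local, and only a bounded summary crosses back. Everything here is an assumption for illustration, including the word-count stand-in for a tokenizer and the fake research and summarize callables.

```python
from typing import Callable

MAX_SUMMARY_TOKENS = 2000  # upper end of the range given in CE-009


def approx_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fits_budget(summary: str, limit: int = MAX_SUMMARY_TOKENS) -> bool:
    return approx_tokens(summary) <= limit


def run_subagent(
    task: str,
    research: Callable[[str], list[str]],
    summarize: Callable[[list[str]], str],
) -> str:
    """Run research in a local scope and hand back only a bounded summary."""
    raw_findings = research(task)  # never leaves this function
    summary = summarize(raw_findings)
    if not fits_budget(summary):
        raise ValueError("summary exceeds the token budget; condense it further")
    return summary


# Hypothetical stand-ins for a real sub-agent's tool calls and model call.
def fake_research(task: str) -> list[str]:
    return [f"raw page {i} about {task}: " + "filler " * 500 for i in range(20)]


def fake_summarize(findings: list[str]) -> str:
    return f"Reviewed {len(findings)} sources. Key point: prefer advisory locks."


main_context: list[str] = []
main_context.append(run_subagent("job deduplication", fake_research, fake_summarize))
```

Raising on an oversized summary, rather than truncating it, is a deliberate choice in this sketch: truncation would silently drop whatever came last.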

CE-010: At session start, check working directory, read progress from external files, and review specifications before creating new work. Never start cold — always initialize from persisted state.

Anthropic Harnesses — "Initialization ritual: check directory, read progress, review feature list before new work." Cold starts without context lead to duplicated work or conflicting changes.

Verification: First actions in any session read CLAUDE.md, memory files, and active task
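
A session-start routine along these lines would cover the three reads named in the verification. The file layout is an assumption: CLAUDE.md at the root and a memory/ directory follow CE-004, while the tasks/active.md path and the names SessionState and initialize_session are hypothetical.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SessionState:
    root: Path
    loaded: dict[str, str] = field(default_factory=dict)  # relative path -> content
    missing: list[str] = field(default_factory=list)      # expected but absent


def initialize_session(root: Path, active_task: str = "tasks/active.md") -> SessionState:
    """Read persisted state before any new work: constants, active task, memory files."""
    state = SessionState(root=root)
    wanted = ["CLAUDE.md", active_task]
    memory_dir = root / "memory"
    if memory_dir.is_dir():
        wanted += sorted(p.relative_to(root).as_posix() for p in memory_dir.glob("*.md"))
    for rel in wanted:
        path = root / rel
        if path.is_file():
            state.loaded[rel] = path.read_text(encoding="utf-8")
        else:
            state.missing.append(rel)
    return state


# Demo in a throwaway directory with made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "CLAUDE.md").write_text("Build with make; never edit vendor/.", encoding="utf-8")
    (root / "memory").mkdir()
    (root / "memory" / "progress.md").write_text("Step 2 of 5 done.", encoding="utf-8")
    state = initialize_session(root)
```

A non-empty missing list is a signal to stop and ask before creating new work, since the session would otherwise start cold on that piece of state.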

CE-011: Design YAML artifacts with grep-able structure, domain-partitioned schemas, and flat nesting (max 3 levels). Use consistent key naming and ID prefixes for cross-referencing.

arxiv 2602.05447 — "Grep tax phenomenon: compact formats consume 138-740% MORE tokens at scale due to format-unfamiliar search patterns." Grep-able structure eliminates the grep tax.

Verification: YAML files are grep-able — unique ID prefixes resolve to single artifacts
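
Both checks in CE-011 can be approximated with the standard library. This sketch makes two assumptions: nesting depth counts mapping levels only (lists do not add a level), and an artifact is defined by a line of the form "id: <ID>". The function names and sample files are invented.

```python
import re


def nesting_depth(node: object) -> int:
    """Depth of nested mappings in already-parsed YAML; lists do not add a level."""
    if isinstance(node, dict):
        return 1 + max((nesting_depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return max((nesting_depth(v) for v in node), default=0)
    return 0


def find_definitions(files: dict[str, str], rule_id: str) -> list[tuple[str, int]]:
    """Grep-style search: (file name, line number) of every line defining rule_id."""
    pattern = re.compile(rf"^\s*(?:-\s*)?id:\s*{re.escape(rule_id)}\s*$")
    hits = []
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.match(line):
                hits.append((name, lineno))
    return hits


# Made-up artifacts, for illustration only.
artifact = {"rules": [{"id": "CE-001", "verification": {"method": "code review"}}]}
files = {
    "context.yaml": "rules:\n  - id: CE-001\n    title: Sectioned prompts\n  - id: CE-002\n    title: Specific user prompts\n",
    "testing.yaml": "rules:\n  - id: TE-001\n    title: Test after each change\n    related: [CE-001]\n",
}
hits = find_definitions(files, "CE-001")
```

An ID resolves to a single artifact when find_definitions returns exactly one hit; cross-references such as the related line do not count as definitions.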

CE-012: Include diverse, canonical examples in every knowledge artifact. For an LLM, examples are pictures worth a thousand words. Show both APPLIED (correct) and VIOLATED (incorrect) patterns.

Anthropic — "Diverse, canonical examples — for an LLM, examples are the pictures worth a thousand words." Examples convey intent more precisely than prose descriptions.

Verification: Every rule in RICH-tier artifacts has at least one example pair

CE-013: Divide specifications into focused components with hierarchical summarization. Use a table of contents with summaries at the top level and keep detailed content in separate files. Never put everything in one file.

Osmani — "As requirements multiply, model adherence to individual directives drops significantly. Solutions: divide specs into focused components, use extended table-of-contents with summaries." Anthropic — modular prompt design.

Verification: No single YAML artifact exceeds 1000 lines — split into components
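
The line limit and the top-level table of contents can both be sketched in a few lines. The file names, summaries, and function names below are hypothetical; only the 1000-line limit comes from the verification line.

```python
MAX_LINES = 1000  # limit from the CE-013 verification line


def oversized(files: dict[str, str], limit: int = MAX_LINES) -> list[str]:
    """Return names of files whose line count exceeds the limit."""
    return sorted(name for name, text in files.items() if len(text.splitlines()) > limit)


def build_index(summaries: dict[str, str]) -> str:
    """Top-level table of contents: one line per component file with its summary."""
    return "\n".join(f"{path}: {summary}" for path, summary in summaries.items())


# Hypothetical component files and summaries.
index = build_index({
    "spec/auth.yaml": "Login, sessions, and token rotation",
    "spec/billing.yaml": "Invoices, refunds, and proration rules",
})
too_long = oversized({
    "spec/auth.yaml": "line\n" * 400,
    "spec/everything.yaml": "line\n" * 1001,
})
```

Any name returned by oversized is a candidate for splitting into components that the index then lists individually.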

CE-014: Build incremental verification into every workflow. Test after each meaningful change, not at the end. Use the LLM-as-Judge pattern for quality assessment of generated artifacts.

Anthropic Harnesses — "Incremental testing: validate as human user would, not just unit tests." Osmani — "LLM-as-a-Judge pattern: secondary agent reviews outputs against style guidelines."

Verification: ETVX tasks have exit criteria checked after each step, not only at task end
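
Checking exit criteria after each step, rather than once at the end, can be sketched as a loop that pairs every action with its own check. The Step shape, run_with_checks, and the toy workflow are assumptions for illustration; the LLM-as-Judge part of the rule is not modeled here.

```python
from typing import Callable, Optional

# (step name, action to perform, check that must pass right after the action)
Step = tuple[str, Callable[[], None], Callable[[], bool]]


def run_with_checks(steps: list[Step]) -> tuple[list[str], Optional[str]]:
    """Run steps in order, verifying each one before moving on.

    Returns the names of verified steps and the name of the first step whose
    check failed (or None if every check passed). Later steps are not run.
    """
    verified: list[str] = []
    for name, action, check in steps:
        action()
        if not check():
            return verified, name
        verified.append(name)
    return verified, None


# Toy workflow: the second step has a deliberate bug that its check catches.
state = {"rows": [], "index_built": False, "published": False}

steps: list[Step] = [
    ("load rows", lambda: state["rows"].extend([1, 2, 3]), lambda: len(state["rows"]) == 3),
    ("build index", lambda: None, lambda: state["index_built"]),  # action forgets to set the flag
    ("publish", lambda: state.update(published=True), lambda: state["published"]),
]

verified, failed = run_with_checks(steps)
```

Because the loop stops at the first failed check, the publish step never runs on top of a broken index, which is the failure mode end-of-task testing tends to miss.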

Tags

Overview