Context Engineering
Rules for curating LLM context windows — instructions, tools, state, and retrieval
Overview
Purpose
Rules for curating LLM context windows — instructions, tools, state, and retrieval
Rules
CE-001: Organize system prompts into distinct sections with clear boundaries. Use YAML structure or XML tags to separate background, instructions, tools, constraints, and output format.
Anthropic Engineering — "Organize prompts into distinct sections using XML tags or Markdown headers." Distinct sections reduce ambiguity and improve instruction following.
Verification: Code review — system prompts must have named sections
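One way to make CE-001 checkable in code is to assemble the system prompt from named sections instead of a single string. The sketch below is illustrative only: the section names follow the rule text, while `REQUIRED_SECTIONS` and `build_system_prompt` are hypothetical names.

```python
# Section names taken from CE-001; the fixed order is an assumption.
REQUIRED_SECTIONS = ("background", "instructions", "tools", "constraints", "output_format")


def build_system_prompt(sections: dict[str, str]) -> str:
    """Wrap each section body in a named XML tag, in a fixed order."""
    missing = [name for name in REQUIRED_SECTIONS if not sections.get(name, "").strip()]
    if missing:
        # Fail early so a prompt without named sections never ships.
        raise ValueError(f"missing sections: {', '.join(missing)}")
    parts = [f"<{name}>\n{sections[name].strip()}\n</{name}>" for name in REQUIRED_SECTIONS]
    return "\n\n".join(parts)
```

Raising on a missing section turns the code-review check into something a test suite can enforce.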
CE-002: User prompts must provide symptom, location, and definition of done. Vague instructions like "fix this" or "make it better" waste context tokens on clarification cycles.
Osmani — "Good specs have: problem statement, non-goals, data contracts, acceptance tests." Anthropic — specificity in user prompts reduces back-and-forth by 60%.
Verification: ETVX (Entry, Task, Verification, eXit) tasks must state concrete entry and exit criteria, not vague descriptions
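The three required parts of a user prompt in CE-002 can be modelled as a small record that refuses to render when a part is empty. This is a minimal sketch; `TaskRequest` and its method names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TaskRequest:
    symptom: str             # what is going wrong
    location: str            # where it happens (file, module, endpoint)
    definition_of_done: str  # how completion will be judged

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are empty or whitespace only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Format the request as prompt text, rejecting incomplete requests."""
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"incomplete request, missing: {', '.join(missing)}")
        return (
            f"Symptom: {self.symptom.strip()}\n"
            f"Location: {self.location.strip()}\n"
            f"Definition of done: {self.definition_of_done.strip()}"
        )
```

A request such as `TaskRequest("fix this", "", "")` reports `location` and `definition_of_done` as missing before any tokens are spent on clarification.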
CE-003: Compact conversation state before context limits. Preserve architectural decisions, file paths, and task progress during compaction. Discard verbose tool outputs and intermediate reasoning.
Anthropic — "Summarize and reinitialize before context limits, preserving architectural decisions." Inkeep — compaction retains signal, summarization loses structure.
Verification: Session memory files retain key identifiers after compaction
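The keep/discard split in CE-003 can be sketched as a filter over tagged history entries. The entry kinds and the `Entry` and `compact` names are assumptions made for illustration; a real harness would define its own taxonomy.

```python
from dataclasses import dataclass

# Kinds that CE-003 says to preserve; the labels themselves are hypothetical.
KEEP_KINDS = {"decision", "file_path", "progress"}


@dataclass(frozen=True)
class Entry:
    kind: str  # e.g. "decision", "file_path", "progress", "tool_output", "reasoning"
    text: str


def compact(history: list[Entry]) -> list[Entry]:
    """Keep durable entries in their original order and drop exact duplicates."""
    seen: set[tuple[str, str]] = set()
    kept: list[Entry] = []
    for entry in history:
        key = (entry.kind, entry.text)
        if entry.kind in KEEP_KINDS and key not in seen:
            seen.add(key)
            kept.append(entry)
    return kept
```

Because kept entries are copied verbatim rather than paraphrased, identifiers such as file paths survive compaction unchanged.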
CE-004: Use external persistence for knowledge that survives session resets. CLAUDE.md for project-wide constants, memory/ files for evolving state, ETVX task files for cross-session progress.
Anthropic — "Structured note-taking: external memory systems (markdown files) enable persistence across resets." Session context is ephemeral; external files are durable.
Verification: Critical state survives session reset (check memory files after /compact)
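A minimal sketch of the external note-taking described in CE-004, assuming one Markdown file per topic inside a memory directory. The helper names and the one-bullet-per-note layout are hypothetical.

```python
from pathlib import Path


def append_note(memory_dir: str, topic: str, note: str) -> Path:
    """Append one bullet to <memory_dir>/<topic>.md, creating the file if needed."""
    path = Path(memory_dir) / f"{topic}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"- {note.strip()}\n")
    return path


def read_notes(memory_dir: str, topic: str) -> list[str]:
    """Return the saved notes for a topic, or an empty list if none exist."""
    path = Path(memory_dir) / f"{topic}.md"
    if not path.is_file():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line[2:] for line in lines if line.startswith("- ")]
```

Anything written through `append_note` lives on disk, so a new session can recover it with `read_notes` after the conversation context is gone.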
CE-005: Use just-in-time retrieval — load context via tools at the moment it is needed, not pre-loaded into the system prompt. Maintain lightweight references (file paths, IDs), dynamically load full content.
Anthropic — "Just-in-time retrieval: maintain lightweight references, dynamically load context via tools." Pre-loading wastes tokens on information that may never be needed.
Verification: System prompts contain references (paths), not full file contents
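Just-in-time retrieval as described in CE-005 can be illustrated with an index that puts only names and paths into the prompt and reads file contents when a tool call asks for them. `ReferenceIndex` and the `max_chars` cap are assumptions for this sketch.

```python
from pathlib import Path


class ReferenceIndex:
    """Holds lightweight references; full content is read only on demand."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.refs: dict[str, str] = {}

    def register(self, name: str, relative_path: str) -> None:
        self.refs[name] = relative_path

    def listing(self) -> str:
        """Text suitable for a system prompt: names and paths, no file bodies."""
        return "\n".join(f"{name}: {path}" for name, path in sorted(self.refs.items()))

    def load(self, name: str, max_chars: int = 8000) -> str:
        """Read the referenced file at the moment it is needed, capped in size."""
        text = (self.root / self.refs[name]).read_text(encoding="utf-8")
        return text[:max_chars]
```

The prompt cost of a reference is one short line regardless of how large the referenced file is.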
CE-006: Provide fewer focused tools rather than many overlapping ones. Namespace related tools under common prefixes. Return semantic identifiers (entity names), not opaque UUIDs.
Anthropic Writing Tools — "Right tools: fewer, focused tools outperform many overlapping ones. Namespacing groups related tools. Meaningful context returns semantic identifiers."
Verification: Tool descriptions explain each tool's purpose the way you would when onboarding a new team member
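The namespacing part of CE-006 can be checked mechanically. The sketch below assumes a `prefix_action` naming convention with an underscore separator; that convention and the function names are illustrative, not prescribed by the sources quoted above.

```python
from collections import defaultdict


def group_by_namespace(tool_names: list[str], sep: str = "_") -> dict[str, list[str]]:
    """Group tool names by the prefix before the first separator."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in tool_names:
        prefix, _, _ = name.partition(sep)
        groups[prefix].append(name)
    return dict(groups)


def unnamespaced(tool_names: list[str], sep: str = "_") -> list[str]:
    """Return tools that carry no namespace prefix at all."""
    return [name for name in tool_names if sep not in name]
```

A namespace with many near-identical entries is a hint that tools overlap and could be merged.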
CE-007: Use YAML for knowledge artifacts and LLM-to-LLM communication. Use JSON for API responses and machine-to-machine contracts. All outputs must conform to their declared schema.
arxiv 2602.05447 — YAML 75.4% accuracy, optimal token efficiency, best grep-ability. Anthropic — schema conformance ensures outputs are machine-processable.
Verification: Knowledge artifacts validate against their protocol schema
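Schema conformance for a JSON contract can be sketched with the standard library alone. The field list in `RULE_SCHEMA` is a hypothetical schema invented for this example, and the checker is deliberately minimal; a real project would more likely use a dedicated schema library, and YAML artifacts would need a YAML parser that the standard library does not provide.

```python
import json

# Hypothetical schema: required field name -> expected Python type.
RULE_SCHEMA = {"id": str, "statement": str, "verification": str}


def conformance_errors(payload: str, schema: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the payload conforms."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = []
    for key, expected in schema.items():
        if key not in data:
            errors.append(f"missing field: {key}")
        elif not isinstance(data[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    # Undeclared fields are also treated as violations of the contract.
    for key in sorted(data.keys() - schema.keys()):
        errors.append(f"unexpected field: {key}")
    return errors
```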
CE-008: Minimize token consumption by using high-signal content only. Prune verbose tool outputs, remove redundant examples, and compress repeated patterns. Every token must earn its place.
Adaline Labs — "Context rot: model accuracy degrades as context fills — n-squared token relationships exhaust finite attention budget." Anthropic — minimal high-signal tokens.
Verification: CLAUDE.md and task files stay concise; MEMORY.md stays under 200 lines
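Pruning verbose tool output, as CE-008 asks, might look like the following: collapse consecutive repeated lines, then keep only the head and tail of what remains. The `prune_output` name, the default limits, and the marker text are assumptions.

```python
def prune_output(text: str, head: int = 5, tail: int = 5) -> str:
    """Collapse consecutive duplicate lines, then keep the first and last few."""
    deduped: list[str] = []
    for line in text.splitlines():
        if not deduped or deduped[-1] != line:
            deduped.append(line)
    if len(deduped) <= head + tail:
        return "\n".join(deduped)
    omitted = len(deduped) - head - tail
    marker = f"[... {omitted} lines omitted ...]"
    # Slice from an explicit index so that tail=0 yields no trailing lines.
    return "\n".join(deduped[:head] + [marker] + deduped[len(deduped) - tail:])
```

The marker line tells the model that content was dropped, so it can request the full output if the omitted part turns out to matter.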
CE-009: Use specialized sub-agents with clean contexts for independent research tasks. Return condensed summaries (1000-2000 tokens) to the orchestrating agent. Never pollute the main context with raw research output.
Anthropic — "Sub-agent architectures: specialized agents in clean contexts, return condensed summaries." Inkeep — multi-agent isolation prevents cross-contamination.
Verification: Task agents return structured summaries, not raw tool outputs
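The summary budget in CE-009 can be enforced before a sub-agent hands its result back. This sketch assumes findings arrive ordered most important first and uses a crude four-characters-per-token estimate; that estimate is a stand-in heuristic, not a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def build_summary(findings: list[str], max_tokens: int = 2000) -> str:
    """Keep findings in order until the estimated token budget is used up."""
    kept: list[str] = []
    used = 0
    for finding in findings:
        cost = estimate_tokens(finding)
        if used + cost > max_tokens:
            break  # everything after this point stays in the sub-agent's context
        kept.append(f"- {finding}")
        used += cost
    return "\n".join(kept)
```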
CE-010: At session start, check working directory, read progress from external files, and review specifications before creating new work. Never start cold — always initialize from persisted state.
Anthropic Harnesses — "Initialization ritual: check directory, read progress, review feature list before new work." Cold starts without context lead to duplicated work or conflicting changes.
Verification: First actions in any session read CLAUDE.md, memory files, and active task
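The initialization step in CE-010 can be sketched as a function that reads whatever persisted state exists before any new work begins. The file layout in `STARTUP_FILES` is an assumption: `CLAUDE.md` and `MEMORY.md` are named in the rules above, but their exact paths and the `tasks/active.md` file are hypothetical.

```python
from pathlib import Path

# Assumed layout; adjust to the project's real file locations.
STARTUP_FILES = ("CLAUDE.md", "memory/MEMORY.md", "tasks/active.md")


def load_startup_context(workdir: str) -> dict[str, str]:
    """Read each startup file that exists, keyed by its relative path."""
    root = Path(workdir)
    context: dict[str, str] = {}
    for rel in STARTUP_FILES:
        path = root / rel
        if path.is_file():
            context[rel] = path.read_text(encoding="utf-8")
    return context


def is_cold_start(context: dict[str, str]) -> bool:
    """True when no persisted state was found at all."""
    return not context
```

A harness could refuse to create new work when `is_cold_start` is true in a directory that is expected to have history.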
CE-011: Design YAML artifacts with grep-able structure, domain-partitioned schemas, and flat nesting (max 3 levels). Use consistent key naming and ID prefixes for cross-referencing.
arxiv 2602.05447 — "Grep tax phenomenon: compact formats consume 138-740% MORE tokens at scale due to format-unfamiliar search patterns." Grep-able structure eliminates the grep tax.
Verification: YAML files are grep-able — unique ID prefixes resolve to single artifacts
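Two of the CE-011 properties, bounded nesting and unique IDs, can be checked on an artifact after it has been parsed into Python dicts and lists. The sketch counts every mapping or sequence as one level, which is one possible reading of "max 3 levels"; the function names are hypothetical.

```python
from collections import Counter


def nesting_depth(node: object) -> int:
    """Depth of nested containers; a scalar has depth 0."""
    if isinstance(node, dict):
        return 1 + max((nesting_depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((nesting_depth(v) for v in node), default=0)
    return 0


def duplicate_ids(ids: list[str]) -> list[str]:
    """IDs that occur more than once and so would not resolve to a single artifact."""
    return sorted(i for i, count in Counter(ids).items() if count > 1)
```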
CE-012: Include diverse, canonical examples in every knowledge artifact. For an LLM, examples are pictures worth a thousand words. Show both APPLIED (correct) and VIOLATED (incorrect) patterns.
Anthropic — "Diverse, canonical examples — for an LLM, examples are the pictures worth a thousand words." Examples convey intent more precisely than prose descriptions.
Verification: Every rule in RICH-tier artifacts has at least one example pair
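The example-pair check for CE-012 can be automated once rules are loaded as dictionaries. The `examples` list and its `kind` key are an assumed data shape; only the APPLIED and VIOLATED labels come from the rule text.

```python
REQUIRED_KINDS = {"APPLIED", "VIOLATED"}


def rules_missing_example_pair(rules: list[dict]) -> list[str]:
    """Return IDs of rules lacking an APPLIED or a VIOLATED example."""
    missing = []
    for rule in rules:
        kinds = {example.get("kind") for example in rule.get("examples", [])}
        if not REQUIRED_KINDS <= kinds:  # subset test: both kinds must be present
            missing.append(rule["id"])
    return missing
```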
CE-013: Divide specifications into focused components with hierarchical summarization. Use table-of-contents with summaries at the top level, detailed content in separate files. Never put everything in one file.
Osmani — "As requirements multiply, model adherence to individual directives drops significantly. Solutions: divide specs into focused components, use extended table-of-contents with summaries." Anthropic — modular prompt design.
Verification: No single YAML artifact exceeds 1000 lines — split into components
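The 1000-line limit in the CE-013 verification is straightforward to enforce with a small scan. The `*.yaml` glob and the function name are assumptions about how artifacts are stored.

```python
from pathlib import Path


def oversized_artifacts(
    root: str, max_lines: int = 1000, pattern: str = "*.yaml"
) -> list[tuple[str, int]]:
    """Return (file name, line count) for each artifact over the limit."""
    results = []
    for path in sorted(Path(root).rglob(pattern)):
        with path.open(encoding="utf-8") as fh:
            count = sum(1 for _ in fh)
        if count > max_lines:
            results.append((path.name, count))
    return results
```

Any file this reports is a candidate for splitting into focused components with a summary entry in the top-level table of contents.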
CE-014: Build incremental verification into every workflow. Test after each meaningful change, not at the end. Use the LLM-as-Judge pattern for quality assessment of generated artifacts.
Anthropic Harnesses — "Incremental testing: validate as human user would, not just unit tests." Osmani — "LLM-as-a-Judge pattern: secondary agent reviews outputs against style guidelines."
Verification: ETVX tasks have exit criteria checked after each step, not only at task end
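Checking an exit criterion after every step, not only at the end, can be sketched as a small runner. Each step pairs an action with a check; the `run_incrementally` name and the tuple layout are hypothetical, and the LLM-as-Judge part of CE-014 is not covered here.

```python
from typing import Callable

# (step name, action that mutates state, check that must pass afterwards)
Step = tuple[str, Callable[[dict], None], Callable[[dict], bool]]


def run_incrementally(steps: list[Step], state: dict) -> list[str]:
    """Run steps in order, stopping at the first failed check."""
    completed: list[str] = []
    for name, action, check in steps:
        action(state)
        if not check(state):
            # Later steps never run on top of a broken intermediate state.
            raise RuntimeError(f"exit criterion failed after step: {name}")
        completed.append(name)
    return completed
```

Stopping at the first failure keeps the faulty change small and easy to locate.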