The AI Coding Dictionary

The vocabulary of AI coding, in plain English. Source: aihero.dev.

The Model

AI: A moving label, not a technology. Points at whatever computers can newly, impressively do — right now, large language models.
Cache tokens: Input tokens the provider has cached from a previous request via its prefix cache, billed at a much lower rate.
Harness: Everything around the model that turns it into an agent: tools, system prompt, context-window management, permissions, hooks.
Inference: Running a trained model to generate output — what happens on every model provider request. Parameters stay fixed.
Input tokens: Tokens the harness sends on each model provider request. Billed at a lower rate than output tokens.
Model provider request: One round-trip from the harness to the model provider. The harness sends context; the provider returns one response.
Model provider: Whatever serves a model for inference. Usually remote (Anthropic, OpenAI, Google), but can also be local (Ollama, llama.cpp).
Model: The parameters. Stateless — does next-token prediction and nothing else. Cannot do anything agentic on its own.
Next-token prediction: What the model actually does. Samples one next token from the context, appends it, and runs again. Its only mode of operation.
Non-determinism: The same input can produce different output. A property of how models generate text and how providers serve requests.
Output tokens: Tokens the model generates back. Billed at a higher rate than input tokens, since they cost more compute to produce.
Parameters: The numbers inside a model — often billions — tuned during training. Everything the model knows lives in them. Also called weights.
Prefix cache: The provider-side store that lets consecutive requests skip re-processing a shared prefix, billing those tokens at a lower rate.
Token: The atomic unit a model reads and writes. Roughly word-sized but not exactly. Context window size, cost, and latency all count tokens.
Training: The process that sets a model's parameters by exposing it to vast amounts of text and adjusting to improve next-token prediction.

Sessions, Context Windows & Turns

Agent: A model harnessed with tools, a system prompt, and a context window, that takes turns with a user. The model in motion.
Context window: Everything the model sees on each model provider request. Finite, model-specific, the only surface through which the model perceives.
Context: The relevant information the agent has access to right now — what the agent knows that's pertinent to the task.
Session: One bounded run of interaction with an agent. Starts empty, accumulates, ends when cleared, closed, or compacted into a fresh session.
Stateful: Carries information forward. Sessions are stateful across turns; agents can be made stateful across sessions via a memory system.
Stateless: Carries no information forward. The model is stateless across requests; an agent is stateless across sessions by default.
System prompt: The instructions the harness prepends to every model provider request — the agent's standing brief. Usually stable across a session.
Turn: One user message plus everything the agent does in response, up until it yields back to the user. Contains one or more provider requests.

Tools & Environment

Agent mode: A preset bundling a permission mode with behavioral instructions injected into the system prompt. Can flip mid-session.
Environment: The world the agent acts on — anything outside the harness that the agent perceives via tool results and changes via tool calls.
Filesystem: A tree of files and directories the agent reads from, writes to, and executes within — the default environment for a coding agent.
MCP: A protocol for plugging external tool servers into a harness — how an agent gets tools beyond what the harness ships with.
Permission mode: The permission-gating slice of an agent mode — which tool calls trigger a permission request and which run automatically.
Permission request: What the harness shows the user before executing a tool call that isn't pre-approved. The mechanism for putting a human in the loop.
Sandbox: An isolated environment the agent runs inside — container, VM, or restricted shell. Limits the blast radius of agent actions.
Tool call: The model's output naming a tool and its arguments — just structured text. The harness has to read it and execute.
Tool result: What the harness sends back after executing a tool call — file contents, output, or error. The agent's only view of the environment.
Tool: A function the harness exposes for the agent to call — Read, Write, Bash, Search. How an agent perceives and acts on the environment.

Failure Modes

Attention budget: Each token has a finite amount of influence to distribute across the rest of the context. Per-token, doesn't grow when context does.
Attention degradation: As a session grows, each token's attention budget spreads across more competitors; signal on meaningful relationships shrinks.
Attention relationship: The pairing between two tokens — meaningful pairs influence each other more than unrelated ones. A context of N tokens has ~N² of these.
Contextual knowledge: Facts the agent can read directly from the context right now. Counterpart to parametric knowledge.
Hallucination: Confidently-wrong model output. Two flavors: factuality (invented facts) and faithfulness (drift from loaded context).
Knowledge cutoff: The date past which a model has no parametric knowledge. Post-cutoff libraries and APIs are fabrication traps unless docs are loaded.
Parametric knowledge: What the model knows from training, stored in its parameters. Frozen at training time. Counterpart to contextual knowledge.
Smart zone: Early in a session the agent is sharp and focused. As the session grows it drifts into a dumb zone: sloppier, forgetful, more mistakes.
Sycophancy: Confidently agreeable model output. Caused by training that shaped the model to favor answers humans liked — including agreement.

Handoffs

Autocompact: Compaction triggered automatically by the harness when the context window approaches full.
Clearing: Ending the current session and starting a fresh one. The next message begins with an empty session and an empty context window.
Compaction: A handoff done in-memory: the previous session's history is summarised and seeds a fresh session. Lossy — detail traded for headroom.
Handoff artifact: A document used as the carry mechanism for a handoff — written by one session to be read by another.
Handoff: Transferring agent context from one session to another, with no return path. Carry mechanism varies — artifact, compaction, others.
Primary source: The thing itself — code, transcripts, raw data. Complete and authoritative, but expensive to load into context.
Secondary source: An account of a primary source, one step removed — summaries, docs, compaction summaries. Cheap to load, lossy by construction.
Spec: A handoff artifact describing a multi-session piece of work — what's being built, not how each session does its share. Made of tickets.
Ticket: A handoff artifact scoping one session of work. Stands alone or hangs off a spec. Can block or be blocked by sibling tickets.

Memory and Steering

AGENTS.md: A file in the environment that the harness loads into the context window at session start — the project's standing brief to the agent.
Context pointer: A mention in one document that points to another, so the agent can pull it into context only when the task calls for it.
Memory system: A system that attempts to make an agent stateful across sessions by persisting to the environment and reloading at session start.
Progressive disclosure: Loading only the context an agent needs right now, with context pointers to the rest. Borrowed from UI design.
Skill: A teachable capability bundled as a unit — kept out of the context window until a context pointer pulls it in for the task at hand.
Subagent: An agent spawned by another agent via a tool call. Runs in its own session, reports a single tool result. Cannot spawn further subagents.

Patterns of Work

AFK: A working pattern where the user kicks off a session and leaves the agent to run unattended (away from keyboard).
AX: Agent experience: how well the environment is set up for an agent to do good work — checks, architecture, and free context.
Automated check: A deterministic verification that runs in the environment — tests, type checks, lints, build, pre-commit hooks. Pass/fail, no judgement.
Automated review: An agent reviewing another agent's work, often with a different model or system prompt. Non-deterministic: it forms a judgement.
DX: Developer experience: how easy a codebase and its toolchain make it for humans to do good work — docs, feedback speed, errors.
Design concept: The shared understanding of what's being built, held in common between user and agent but separate from any asset.
Grilling: A technique for developing a design concept: the agent interviews the user Socratically, one decision at a time.
Human review: The user reading the code the agent produced and forming a judgement on it. Reading the diff counts; reading the summary doesn't.
Human-in-the-loop: A working pattern where one or more humans pair with the agent during a session — reviewing, redirecting, or collaborating in real time.
Prototyping: Having the agent build a quick, rough version when conversation is too low-fidelity and you need a real artifact to talk about.
Vibe coding: A working pattern where the user accepts the agent's code without human review. The diff is treated as opaque.

7 sections describing how agentic coding works.

TheAICodingDictionary

The AI Coding Dictionary

The Model

Sessions, Context Windows & Turns

Tools & Environment

Failure Modes

Handoffs

Memory and Steering

Patterns of Work