LLM Context Management Principles

A comprehensive index of techniques to prevent context pollution and optimize LLM agent performance.

Overview


The Problem: Context Degradation

LLMs suffer from several context-related issues:

| Problem | Description | Impact |
| --- | --- | --- |
| Context Rot | Performance degrades as context grows | Quality drops at ~25% of max window |
| Lost in Middle | Information in the middle of the context is poorly recalled | Beginning/end bias |
| Context Pollution | Irrelevant data crowds out useful information | Degraded reasoning |
| Token Limits | Hard cap on context window size | Truncation, errors |

Strategy Categories

1. Isolation Strategies

Prevent pollution by separating concerns:

Subagents

Each subagent gets its own clean context window:

Lead Agent (high-level plan)
├── Research Agent (isolated 100k context)
├── Code Agent (isolated 100k context)
└── Review Agent (isolated 100k context)
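As a minimal sketch of this isolation (in Python, with a hypothetical `call_llm` placeholder standing in for a real model API), each subagent starts from a fresh message list, so none of the lead agent's accumulated history leaks in:

```python
def call_llm(messages):
    # Placeholder for a real model API call (hypothetical).
    return f"done: {messages[-1]['content']}"

def run_subagent(task, plan):
    # A fresh message list per subagent: none of the lead agent's
    # accumulated history enters this context window.
    messages = [
        {"role": "system", "content": plan},
        {"role": "user", "content": task},
    ]
    return call_llm(messages)

def lead_agent(plan, tasks):
    # The lead agent keeps only each subagent's short result,
    # not its full working context.
    return {task: run_subagent(task, plan) for task in tasks}
```

The key design point is in `lead_agent`: only the condensed results flow back up, so pollution in one subagent's window cannot reach the others.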

Benefits:

- Pollution in one subagent's window cannot spread to the others
- Each subagent works in a fresh, focused context at full quality
- The lead agent keeps only condensed results, not raw working history

Tool Sandboxing

Tool outputs don’t pollute the main conversation: the full result stays outside the context, and only what the model actually needs is passed back.
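One simple sandboxing sketch (names are illustrative): run the tool, but return only a truncated digest to the conversation, keeping the raw output out of the model context entirely.

```python
def run_tool_sandboxed(tool_fn, *args, max_chars=200):
    """Run a tool, but hand only a short digest back to the main
    conversation; the full output never enters the model context."""
    full_output = tool_fn(*args)
    if len(full_output) <= max_chars:
        return full_output
    truncated = len(full_output) - max_chars
    return full_output[:max_chars] + f" ...[{truncated} chars truncated]"
```

A real implementation might store the full output in a scratch file so the model can request more of it on demand.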

Scoped Context

Each model call sees only the minimum context required for its task.
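A minimal scoping filter might look like this (the keyword-matching heuristic is an assumption for illustration; real systems often use embedding similarity): keep the most recent turns plus any earlier turn that is relevant to the current task.

```python
def scope_context(history, keywords, keep_last=2):
    """Build the minimum context for one call: the last few turns,
    plus any earlier turn mentioning a relevant keyword."""
    recent = history[-keep_last:]
    earlier = [
        msg for msg in history[:-keep_last]
        if any(kw in msg["content"] for kw in keywords)
    ]
    return earlier + recent
```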

2. Reduction Strategies

Shrink context while preserving information:

Compaction (Preferred)

Reversible - strips redundant data that exists elsewhere:

Before: [full file contents in context]
After:  [reference to file path - can re-read if needed]
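The before/after above can be sketched as a reversible pair of transforms (message shapes and the `blob_store` dict are illustrative assumptions): compaction swaps bulky contents for a reference, and expansion re-reads them when actually needed.

```python
def compact(msg, blob_store):
    """Reversible compaction: if a message carries full file contents,
    persist them and keep only the path reference in context."""
    if msg.get("type") == "file_contents":
        blob_store[msg["path"]] = msg["content"]
        return {"type": "file_ref", "path": msg["path"]}
    return msg

def expand(msg, blob_store):
    """The inverse: restore the contents when they are actually needed."""
    if msg.get("type") == "file_ref":
        return {"type": "file_contents", "path": msg["path"],
                "content": blob_store[msg["path"]]}
    return msg
```

Because `expand(compact(m))` round-trips exactly, no information is lost; this is what makes compaction preferable to summarization.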

Summarization (Lossy)

An LLM condenses older history when compaction alone isn’t enough. Unlike compaction, this is lossy: detail dropped from the summary cannot be recovered.

Priority Order:

  1. Raw (keep original)
  2. Compaction (reversible)
  3. Summarization (only when necessary)

Tool Result Clearing

Remove raw tool outputs deep in history, replacing them with short placeholders; keep the most recent results verbatim, since they are most likely still needed.
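A minimal clearing pass might look like this (the message shape and placeholder text are assumptions for illustration):

```python
def clear_old_tool_results(history, keep_recent=3):
    """Blank out tool outputs deep in history; the newest ones stay
    verbatim because they are most likely still needed."""
    tool_idxs = [i for i, m in enumerate(history) if m["role"] == "tool"]
    stale = set(tool_idxs[:-keep_recent] if keep_recent else tool_idxs)
    return [
        {**m, "content": "[tool output cleared]"} if i in stale else m
        for i, m in enumerate(history)
    ]
```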

Observation Masking

A narrower variant of tool result clearing: mask only environment observations (file listings, page dumps, command output) while keeping the agent’s actions and reasoning intact.

3. External Memory (RAG)

Move knowledge outside the context window:

| Component | Purpose |
| --- | --- |
| Vector Store | Semantic similarity search |
| Knowledge Base | Structured fact storage |
| Long-term Store | Persistent memory across sessions |

When to use RAG vs context:

- RAG: large, relatively stable knowledge (docs, codebases, FAQs) where each query needs only a small slice
- Context: small, task-specific, or fast-changing information the model must see verbatim
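To make the retrieval side concrete, here is a deliberately toy retriever scored by word overlap (a real vector store would rank by embedding similarity; the scoring function here is an assumption for illustration only):

```python
def retrieve(query, documents, k=2):
    """Toy retrieval scored by word overlap; a real system would use
    embedding similarity against a vector store."""
    query_words = set(query.lower().split())

    def overlap(doc):
        return len(query_words & set(doc.lower().split()))

    return sorted(documents, key=overlap, reverse=True)[:k]
```

The structural point holds regardless of the scoring function: the corpus lives outside the context window, and only the top-k slice enters the prompt.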

4. Windowing Strategies

Process long content in manageable chunks:

Sliding Window

[Window 1: tokens 0-4000    ]
      [Window 2: tokens 3000-7000    ]
            [Window 3: tokens 6000-10000   ]
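The diagram above corresponds to a simple generator (window size and overlap are the tunable parameters):

```python
def sliding_windows(tokens, size=4000, overlap=1000):
    """Yield fixed-size windows that overlap, so content near a
    boundary appears in two consecutive windows."""
    step = size - overlap
    start = 0
    while True:
        yield tokens[start:start + size]
        if start + size >= len(tokens):
            break
        start += step
```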

Overlap Regions

Overlap between adjacent windows carries shared tokens across the boundary, so entities and references introduced near the end of one window are still visible at the start of the next.

Chunking

Break content at semantic boundaries (paragraphs, sections, function definitions) rather than at arbitrary token offsets, so each chunk stays self-contained.
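A minimal sketch, assuming paragraphs separated by blank lines: pack whole paragraphs into chunks and break only at paragraph boundaries, never mid-sentence.

```python
def chunk_by_paragraph(text, max_chars=500):
    """Pack whole paragraphs into chunks, breaking only at paragraph
    boundaries instead of mid-sentence."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)   # current chunk is full; start a new one
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```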

5. Hierarchical Memory

Multi-tier storage with different characteristics:

| Tier | Content | Retention |
| --- | --- | --- |
| Short-term | Verbatim recent turns | 8-10 exchanges |
| Medium-term | Compressed summaries | Session duration |
| Long-term | RAG/database | Persistent |

Implementation:

Query → Check short-term → Check medium-term → RAG long-term
                ↓                   ↓                ↓
           Full detail      Summary context    Retrieved facts
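The lookup flow above can be sketched as a tiered check (substring matching stands in for real relevance scoring, which is an assumption for illustration):

```python
def answer(query, short_term, medium_term, rag_lookup):
    """Check tiers in order of detail: verbatim recent turns first,
    compressed summaries next, long-term retrieval last."""
    for turn in reversed(short_term):        # full detail
        if query in turn:
            return ("short-term", turn)
    for summary in medium_term:              # summary context
        if query in summary:
            return ("medium-term", summary)
    return ("long-term", rag_lookup(query))  # retrieved facts
```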

6. Monitoring & Auto-Management

Proactive context management:

Token Counting

Track usage before hitting limits:

if token_count > (max_context * 0.75):
    trigger_compaction()

Rot Threshold

Don’t wait for API errors: trigger reduction at a fixed fraction of the window, well before the hard limit, since quality degrades long before truncation does.

Auto-Compaction

Automatic context management:

  1. Monitor token count
  2. At threshold, analyze context
  3. Apply compaction first
  4. Fall back to summarization if needed
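The four steps above fit in one function (the `count_tokens`, `compact`, and `summarize` callables are assumed to be supplied by the application):

```python
def manage_context(history, count_tokens, compact, summarize,
                   max_context=128_000, threshold=0.75):
    """Auto-management: monitor token count, compact at the threshold,
    and fall back to summarization only when compaction isn't enough."""
    budget = int(max_context * threshold)
    if count_tokens(history) <= budget:
        return history                 # under threshold: keep raw
    compacted = [compact(m) for m in history]
    if count_tokens(compacted) <= budget:
        return compacted               # reversible reduction sufficed
    return summarize(compacted)        # lossy last resort
```

Note that this mirrors the reduction priority order: raw, then compaction, then summarization.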

Implementation Patterns

Pattern 1: Customer Service Bot

Sliding Window (recent 8-10 messages)
    + RAG (knowledge base)
    + Periodic summarization (>20 exchanges)

Pattern 2: Code Agent

Subagents (isolated contexts)
    + Tool result clearing
    + File reference compaction
    + External memory for docs

Pattern 3: Research Agent

Hierarchical memory
    + RAG for sources
    + Summarization for synthesis
    + Scoped context per subtask

Best Practices

  1. Compress tool outputs - Don’t add 100 rows when 5 suffice
  2. Use subagents for depth - Isolated windows for specialized tasks
  3. Prefer compaction over summarization - Reversibility matters
  4. Monitor proactively - Don’t wait for errors
  5. Scope by default - Minimum context per call
  6. Keep recent full-detail - Summarize older history

Anti-Patterns

| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| Stuffing entire codebase | Exceeds limits, rot | Use RAG + file references |
| No summarization strategy | Quality degrades | Implement thresholds |
| Single monolithic context | Pollution spreads | Use subagents |
| Ignoring token count | Sudden failures | Monitor proactively |

Related Patterns

| Pattern | Relationship |
| --- | --- |
| LLM Tool Call | Tool outputs need clearing |
| Agent Orchestration | Subagent isolation |
| Agentic RAG | External memory retrieval |
| Skills Pattern | Scoped context per skill |
