Context Management
Strategies for efficiently using an LLM's limited context window, including what information to include, how to compress it, and when to summarize or truncate.
Why It Matters
Effective context management can greatly extend an LLM's useful capacity. Poor management wastes tokens on irrelevant information while crowding out critical details.
Example
Summarizing older conversation turns, keeping only the most recent messages in full, and including a compressed summary of key facts from the entire conversation.
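The strategy above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the `summarize` helper is a placeholder (a real system would call an LLM or a summarization model), and the token count is a crude whitespace approximation.

```python
def estimate_tokens(text: str) -> int:
    # Crude approximation; real systems use the model's own tokenizer.
    return len(text.split())

def summarize(messages: list[str]) -> str:
    # Placeholder: a real system would generate this with an LLM.
    return "Summary of earlier turns: " + " | ".join(m[:30] for m in messages)

def pack_context(messages: list[str], budget: int, keep_recent: int = 3) -> list[str]:
    """Keep the most recent messages in full; compress older turns into a summary."""
    recent = messages[-keep_recent:]
    older = messages[:-keep_recent]
    packed = ([summarize(older)] if older else []) + recent
    # If the packed context still exceeds the budget, drop the oldest
    # full messages (keeping the summary at index 0).
    while len(packed) > 1 and sum(estimate_tokens(m) for m in packed) > budget:
        packed.pop(1)
    return packed
```

The key design choice is that recency is used as a proxy for relevance: the newest messages survive verbatim, while older ones degrade gracefully into a summary rather than disappearing entirely.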
Think of it like...
Like packing a suitcase with a weight limit — you need to be strategic about what goes in, keeping essentials and leaving out things you can live without.
Related Terms
Context Window
The maximum amount of text (measured in tokens) that a language model can process in a single interaction. It includes both the input prompt and the generated output. Larger context windows allow models to handle longer documents.
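Because the window must hold both input and output, a common check is whether the prompt plus the requested output length fits. A small sketch, assuming an illustrative 8,192-token window and a whitespace token estimate:

```python
CONTEXT_WINDOW = 8192  # illustrative size; the real limit varies by model

def fits(prompt: str, max_output_tokens: int) -> bool:
    # The window must accommodate the prompt AND the generated output.
    prompt_tokens = len(prompt.split())  # crude estimate; use the model's tokenizer in practice
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW
```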
Token
The basic unit of text that language models process. A token can be a word, part of a word, or a punctuation mark. Text is broken into tokens before being fed into an LLM, and the model generates output one token at a time.
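A toy illustration of the idea, assuming a simple regex splitter. Real LLM tokenizers (e.g. byte-pair encoding) behave differently and often break words into subword pieces, but the separation of words and punctuation into units is the same in spirit:

```python
import re

def toy_tokenize(text: str) -> list[str]:
    # Each word becomes one token; each punctuation mark becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = toy_tokenize("LLMs process text, token by token.")
# tokens -> ['LLMs', 'process', 'text', ',', 'token', 'by', 'token', '.']
```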
Summarization
The NLP task of condensing a longer text into a shorter version while preserving the key information and main points. Summarization can be extractive (selecting key sentences) or abstractive (generating new text).
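The extractive variant can be sketched with frequency scoring: rate each sentence by how common its words are in the document, then keep the top-scoring sentences in their original order. (Abstractive summarization would instead generate new text, typically with a neural model.)

```python
import re
from collections import Counter

def extractive_summary(text: str, num_sentences: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    # Score each sentence by the total document-wide frequency of its words.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sum(freq[w] for w in re.findall(r"\w+", sentences[i].lower())),
                    reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

This is the simplest workable scoring scheme; real extractive systems add refinements such as TF-IDF weighting or position bias, but the select-and-keep structure is the same.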
Retrieval-Augmented Generation
A technique that enhances LLM outputs by first retrieving relevant information from external knowledge sources and then using that information as context for generation. RAG combines the power of search with the fluency of language models.
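The retrieve-then-generate flow can be sketched as below. This minimal version uses keyword overlap as the relevance score; real RAG systems typically use embedding-based vector search instead, and the corpus and prompt template here are illustrative assumptions.

```python
import re

def _words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def score(query: str, doc: str) -> int:
    # Crude relevance: how many query words appear in the document.
    return len(_words(query) & _words(doc))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    # Retrieved passages become context for the generation step.
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

The resulting prompt is what gets sent to the LLM, which is how retrieval grounds the model's fluent generation in external knowledge.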