This page exists so the docs and product routes use terms consistently. Most entries cluster around three layers — prompting, evaluation, and infrastructure — and glossary terms tend to land better when you can see them used in a real workflow. After the glossary, the natural next route is /docs/skills for product context.
Read definitions here, then validate them against a real route.
Terms like rubric, gold item, tool use, RAG, or context window stay abstract until you see them inside a challenge, prompt, or workspace workflow. Use the glossary as a reset, not as the only place you learn the concept.
Read challenge docs.
Core LLM concepts
These are the terms you will see most often when reading the public docs and using tool, prompt, or challenge surfaces across the product.
Agentic workflow
A multi-step system where a model reasons, uses tools, and decides what to do next instead of producing a single isolated answer.
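As a rough illustration of the loop described above, here is a minimal sketch in Python. Everything in it is hypothetical: `scripted_model` is a stand-in for a real model call, and the `TOOLS` table and the `finish` action name are assumptions made for the example, not part of any product API.

```python
from typing import Callable

# Hypothetical tools: plain functions keyed by name.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
    "upper": lambda arg: arg.upper(),
}

def scripted_model(history: list[str]) -> tuple[str, str]:
    """Stand-in for a model call: returns (action, argument).

    A real system would send `history` to an LLM. This stub follows a fixed
    script so the example runs offline.
    """
    if len(history) == 1:
        return ("add", "2+3")                        # first step: call a tool
    return ("finish", f"The sum is {history[-1]}")   # then give the answer

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [task]
    for _ in range(max_steps):                # cap steps so the loop always ends
        action, arg = scripted_model(history)
        if action == "finish":
            return arg
        history.append(TOOLS[action](arg))    # feed the tool result back in
    return "stopped: step limit reached"

print(run_agent("What is 2 + 3?"))
```

The step cap is the design choice worth noting: because the model decides what happens next, the surrounding code needs its own stopping condition.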
Context window
The maximum number of tokens the model can process in one request, counting your prompt, supporting context, and generated output together.
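Because input and output share the same window, it can help to work out the output budget before sending a request. The sketch below shows that arithmetic. The function name and the token counts are illustrative assumptions, not limits of any particular model.

```python
def remaining_output_budget(context_window: int, prompt_tokens: int,
                            context_tokens: int) -> int:
    """Return the tokens left for generated output after inputs are counted."""
    used = prompt_tokens + context_tokens
    if used > context_window:
        raise ValueError(f"inputs use {used} tokens, window is {context_window}")
    return context_window - used

# Illustrative numbers only: an 8192-token window, a 500-token prompt,
# and 6000 tokens of supporting context.
print(remaining_output_budget(8192, prompt_tokens=500, context_tokens=6000))
```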
MCP
Model Context Protocol. A standard way for editors or agent hosts to talk to external tools and data systems through a consistent interface.
Tool use
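A model capability that lets the system call external tools or APIs while solving a broader task. The model typically emits a structured request, and the host system runs the tool and returns the result.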
Prompting and retrieval terms
These concepts matter most when you are using the prompt library, public skill tracks, or challenge workflows that depend on prompt quality and grounded output.
System prompt
The higher-order instruction that sets the behavior, role, and boundaries for the model across a conversation or workflow.
Few-shot prompting
Providing a small number of examples in the prompt so the model can imitate the expected style or output shape.
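One way to picture this is a small helper that assembles the prompt from example pairs. This is a sketch only: the `Input:` / `Output:` layout and the function name are assumptions chosen for the example, and other formats work equally well.

```python
def build_few_shot_prompt(instruction: str,
                          examples: list[tuple[str, str]],
                          query: str) -> str:
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    # End with an open "Output:" so the model continues in the same shape.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("Great product", "positive"), ("Broke in a day", "negative")],
    "Works as expected",
)
print(prompt)
```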
Chain of thought
A prompting technique that asks for intermediate reasoning steps when the task benefits from more explicit decomposition.
RAG
Retrieval-augmented generation. Relevant documents are fetched first, then inserted into context so answers are better grounded.
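The fetch-then-insert flow can be sketched in a few lines. To stay self-contained, this example ranks documents by shared words, which is a deliberate simplification; retrieval in practice usually relies on embeddings. The `DOCS` list, the function names, and the prompt wording are all hypothetical.

```python
import re

# Hypothetical document store for the example.
DOCS = [
    "The context window is measured in tokens.",
    "Temperature controls sampling randomness.",
    "Embeddings are vectors used for semantic search.",
]

def _words(text: str) -> set[str]:
    """Lowercase word set with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank docs by how many words they share with the question."""
    q = _words(question)
    ranked = sorted(docs, key=lambda d: len(q & _words(d)), reverse=True)
    return ranked[:k]

def build_grounded_prompt(question: str, docs: list[str], k: int = 1) -> str:
    """Insert the retrieved passages into the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_grounded_prompt("What controls sampling randomness?", DOCS))
```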
Model and runtime parameters
These terms show up when you are thinking about cost, performance, determinism, or how a provider should be used in a challenge or workspace flow.
Token
The unit models use to process text, usually a word, part of a word, or a punctuation mark. Cost, context-window size, and output length are usually expressed in tokens.
Temperature
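A sampling parameter that controls how deterministic or varied the output is. Lower values concentrate on the most likely tokens and give more repeatable output; higher values spread probability across more tokens and give more varied output.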
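The effect is easiest to see on a toy distribution. The sketch below uses the common softmax-with-temperature formulation, in which scores are divided by the temperature before normalizing. The logits are made-up numbers, and a given provider may apply temperature differently or combine it with other sampling controls.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                           # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                         # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lowest temperature nearly all probability sits on the top-scoring token, and as the temperature rises the three probabilities move closer together.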
Embedding
A vector representation of text or other content used for semantic search, retrieval, or similarity-based grouping.
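Similarity between embeddings is commonly measured with cosine similarity. The example below computes it by hand on three-dimensional toy vectors. Real embeddings come from a model and have hundreds or thousands of dimensions, so the values here are purely illustrative.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of three pieces of text.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```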
Hallucination
When a model produces confident-sounding output that is unsupported, incorrect, or not grounded in the provided context.
Infrastructure and operations
These terms matter when the work moves beyond a single prompt and becomes something you want to run, monitor, or maintain over time.
Inference
Running a trained model against live inputs to generate output. In practice, this is the “actual usage” phase of the model.
Fine-tuning
Further training a base model on task-specific data so it performs better in a narrower domain or format.
LLMOps
The operational layer around deploying, evaluating, monitoring, and governing LLM-based systems.
Vector database
A storage system optimized for embeddings and similarity search, commonly used in retrieval-heavy applications.
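To make the idea concrete, here is a tiny in-memory stand-in. It normalizes vectors when they are added so that search reduces to a dot product, then scans every item. That brute-force scan is an assumption made for brevity: production vector databases generally use approximate nearest-neighbor indexes to avoid it. The class name, item ids, and vectors are hypothetical.

```python
import math

def _normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]

class TinyVectorStore:
    """Brute-force similarity search over unit-length vectors."""

    def __init__(self):
        self._items = []                       # list of (item_id, unit_vector)

    def add(self, item_id: str, vector: list[float]) -> None:
        self._items.append((item_id, _normalize(vector)))

    def search(self, query: list[float], k: int = 2) -> list[tuple[str, float]]:
        q = _normalize(query)
        scored = [(item_id, sum(a * b for a, b in zip(q, vec)))
                  for item_id, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)   # best match first
        return scored[:k]

store = TinyVectorStore()
store.add("doc-cat", [0.9, 0.1, 0.0])
store.add("doc-kitten", [0.8, 0.2, 0.1])
store.add("doc-invoice", [0.0, 0.1, 0.9])
print([item_id for item_id, _ in store.search([1.0, 0.0, 0.0], k=2)])
```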
Use the glossary as a map, then move into a route where the term becomes concrete.
If you just learned a term like RAG, MCP, rubric, or gold item, open the related docs, skill, or challenge page next. The concept becomes much more useful once you can see it inside a real product surface.
Open skills docs.