AI Vocabulary
4 exercises — master the essential terms every developer needs to discuss AI: LLM, RAG, tokens, context window, fine-tuning, and more.
Core AI vocabulary quick reference
- LLM — Large Language Model (e.g. GPT-4, Claude, Gemini)
- Token — sub-word unit of text; ~0.75 English words on average; cost and context are measured in tokens (see the counting sketch after this list)
- Context window — max tokens the model can process at once
- Temperature — sampling randomness; 0 = near-deterministic, higher values (e.g. 1) = more varied/creative output
- Embedding — vector representation of text for similarity search
- RAG — Retrieval-Augmented Generation (search + generate)
- Fine-tuning — additional training of a pre-trained model on domain data (modifies its weights)
- Inference — using a trained model to generate output
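To make tokens concrete, here is a minimal counting sketch. It assumes the tiktoken package (pip install tiktoken), which provides the tokenizers used by OpenAI models; other model families use different tokenizers, so exact counts vary.

```python
# Minimal token-counting sketch, assuming `tiktoken` is installed.
import tiktoken

# Load the tokenizer used by GPT-4-family models.
enc = tiktoken.encoding_for_model("gpt-4")

text = "Large Language Models process text as tokens, not words."
tokens = enc.encode(text)  # returns a list of integer token IDs

print(f"Characters: {len(text)}")
print(f"Tokens:     {len(tokens)}")  # roughly 1 token per ~0.75 words
print(tokens[:5])                    # token IDs are just integers
```

Because billing and the context window are both denominated in tokens, counting them before a request is a common way to estimate cost and check that a prompt fits.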
Exercise 1 of 4
What is a Large Language Model (LLM)?
An LLM (Large Language Model) is a type of deep learning model trained on very large text datasets. It learns statistical patterns in language and can:
• Generate text — write code, emails, articles, summaries
• Answer questions — based on patterns learned during training
• Translate languages
• Reason over context — analyse documents, compare options, explain concepts
Examples: GPT-4 (OpenAI), Claude (Anthropic), Gemini (Google), Llama (Meta), Mistral.
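All of these capabilities are exercised the same way in practice: send the model a prompt, receive generated text back. Here is a minimal sketch, assuming the official openai Python SDK (pip install openai) and an OPENAI_API_KEY environment variable; the model name and prompt are placeholders.

```python
# Minimal inference sketch using the official `openai` SDK (v1.x).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",   # any chat-capable model name works here
    temperature=0,   # 0 = near-deterministic output
    messages=[
        {"role": "user", "content": "Summarise RAG in one sentence."},
    ],
)
print(response.choices[0].message.content)
```

Claude, Gemini, Llama, and Mistral expose the same prompt-in, text-out pattern through their own SDKs and APIs.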
Key vocabulary:
• Parameters — the learned numerical weights inside the model. GPT-4 is estimated to have on the order of 1 trillion parameters (see the counting sketch after this list).
• Pre-training — the initial training phase on massive text data
• Fine-tuning — additional training on a smaller dataset to specialise behaviour
• Inference — using the trained model to generate a response (the "running" phase, as opposed to training)
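To ground "parameters" and "inference", here is a minimal PyTorch sketch (a framework chosen for illustration; the lesson does not prescribe one). It counts the trainable numbers in a toy two-layer model, then runs a single forward pass, which is all inference is. Real LLMs follow the same idea at vastly larger scale.

```python
# Toy illustration of "parameters" and "inference", assuming
# PyTorch is installed (pip install torch).
import torch
import torch.nn as nn

# A toy model: two linear layers with a nonlinearity between them.
model = nn.Sequential(
    nn.Linear(128, 256),  # 128*256 weights + 256 biases
    nn.ReLU(),
    nn.Linear(256, 10),   # 256*10 weights + 10 biases
)

# Parameters: the numbers adjusted during (pre-)training/fine-tuning.
n_params = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {n_params:,}")  # 35,594

# Inference: run the model forward without updating any weights.
with torch.no_grad():
    out = model(torch.randn(1, 128))
print(out.shape)  # torch.Size([1, 10])
```

Pre-training and fine-tuning both adjust these numbers via gradient descent; inference leaves them frozen and simply computes an output.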