Artificial Intelligence (AI)
The field of building machines that perform tasks normally requiring human intelligence.
196 terms · 15 categories
All terms are given in English (the field's standard usage), each with a simple definition.
A subfield of AI where a system learns from data rather than being explicitly programmed.
ML based on neural networks with many layers.
AI specialized in a single task (e.g. playing chess).
A hypothetical AI capable of performing any intellectual task a human can.
Used in: ch.16
An AI that would vastly surpass human intelligence across all domains.
A classical approach based on explicit rules and formal logic.
A mathematical representation learned from data.
The phase where a trained model makes predictions.
The phase where the model learns from data.
Learning from labeled data (input → known output).
Learning without labels; the model finds hidden structure in data.
The model generates its own labels from the data itself.
Used in: ch.06
Learning by trial and error with rewards.
Used in: ch.08
Reusing a model trained on one task for a different but related task.
Learning from very few examples.
Succeeding at a task without having seen any examples during training.
'Learning to learn' — training a model to adapt quickly to new tasks.
Distributed training without centralizing data.
The model learns new tasks without forgetting previous ones.
A basic computation unit in a network, inspired by biological neurons.
A set of neurons processed in parallel.
Parameters that scale the connections between neurons.
A constant term added to the weighted sum.
A non-linear function applied to a neuron's output.
A very common activation function: max(0, x).
Converts a vector into a probability distribution.
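The two activations above (ReLU and the softmax) can be sketched in a few lines of plain Python; this is a minimal illustration, not a library implementation:

```python
import math

def relu(x):
    """ReLU: pass positive values through, zero out negatives."""
    return max(0.0, x)

def softmax(logits):
    """Convert a vector of raw scores into a probability distribution."""
    # Subtracting the max is a standard trick for numerical stability;
    # it does not change the result.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Note that softmax outputs always sum to 1 and preserve the ordering of the inputs: the largest logit gets the largest probability.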
Computing the output from a given input.
Computing gradients to update the weights.
Used in: ch.06
An algorithm that adjusts weights to minimize error.
Used in: ch.01, ch.03, ch.04, ch.05, ch.06, ch.08, ch.14, ch.21
Measures the error between a prediction and the ground truth.
The standard loss function for classification tasks.
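For a single classification example, cross-entropy reduces to the negative log of the probability the model assigned to the correct class. A minimal sketch:

```python
import math

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class.

    `probs` is a probability distribution (e.g. a softmax output);
    a confident correct prediction gives a loss near 0.
    """
    return -math.log(probs[target_index])
```

A confident, correct prediction is penalized less than an uncertain one, which is exactly the gradient signal that drives training.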
An algorithm that drives weight updates (Adam, SGD, etc.).
The step size during gradient descent.
Used in: ch.06
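Gradient descent and the learning rate can be shown together on a toy problem; this sketch minimizes a hand-picked one-dimensional function rather than a real network loss:

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a function.

    `lr` is the learning rate: the step size taken at each update.
    """
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# The minimum is at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0, lr=0.1, steps=100)
```

Too large a learning rate makes the updates overshoot and diverge; too small a rate makes convergence needlessly slow.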
One full pass through the training dataset.
A subset of data processed together.
A network where information flows in one direction only.
A classic feedforward network with dense layers.
Specialized for images using convolutional filters.
Processes sequences by maintaining an internal state.
Used in: ch.04
An improved RNN designed to remember information over long sequences.
Used in: ch.04
An attention-based architecture dominant in AI since 2017.
Used in: ch.01, ch.02, ch.04, ch.05, ch.07, ch.08, ch.10, ch.15, ch.17, ch.18, ch.20, ch.21
Each position in a sequence attends to all other positions.
Used in: ch.01, ch.03, ch.04, ch.05, ch.07, ch.08, ch.09, ch.15, ch.17, ch.18, ch.21
Multiple attention mechanisms running in parallel.
A classic architecture for translation and generation tasks.
Used in: ch.15
Injects word order information into a Transformer.
Used in: ch.05
A modern rotary positional encoding scheme, common in LLMs.
Used in: ch.05
A shortcut connection that skips over layers to stabilize training.
Normalization applied across a layer to stabilize training.
Multiple specialized sub-models activated selectively.
Used in: ch.05
A generative model that learns to denoise an image step by step.
Two competing networks: a generator and a discriminator.
A Transformer architecture applied to images.
An alternative architecture to the Transformer for long sequences.
The field of processing and understanding human language with machines.
Used in: ch.15
Splitting text into units called tokens.
The basic unit processed by a model (word, subword, or character).
Used in: ch.01, ch.02, ch.03, ch.04, ch.05, ch.06, ch.07, ch.08, ch.09, ch.10, ch.11, ch.12, ch.13, ch.14, ch.15, ch.17, ch.18, ch.19, ch.20, ch.21
A widely used tokenization algorithm.
Used in: ch.02
Other common tokenization algorithms.
Used in: ch.02
The set of all tokens known to a model.
A dense vector representation of a word or token.
Used in: ch.01, ch.03, ch.04, ch.05, ch.10, ch.11, ch.15, ch.21
An older technique for producing word embeddings.
An embedding that depends on the surrounding context.
Identifying named entities in text (people, places, etc.).
Detecting the emotional tone of a text.
Converting spoken audio into text.
Converting text into spoken audio.
Used in: ch.15
A model that predicts the probability of a sequence of words.
A massive language model trained on billions of words.
Used in: ch.01, ch.02, ch.03, ch.04, ch.05, ch.06, ch.07, ch.08, ch.09, ch.10, ch.11, ch.12, ch.13, ch.15, ch.16, ch.17, ch.18, ch.20
A general-purpose model reusable across many tasks.
AI capable of producing content (text, images, audio, etc.).
Initial training on large amounts of raw data.
Adapting a pre-trained model for a specific task.
Used in: ch.01, ch.08, ch.09, ch.11, ch.13, ch.14, ch.17, ch.19, ch.20
Fine-tuning to follow human instructions.
Refining a model using human feedback.
A simplified alternative to RLHF.
An alignment method based on a set of guiding principles.
Used in: ch.08
The input text given to a model.
Used in: ch.01, ch.07, ch.08, ch.09, ch.10, ch.11, ch.12, ch.17, ch.18, ch.21
General instructions given to the model before the conversation begins.
The art of crafting effective prompts.
Used in: ch.12
Prompting the model to reason step by step.
Providing a few examples within the prompt.
The model's ability to learn from examples provided in the prompt.
The amount of text a model can process at once.
Controls the randomness of outputs (low = deterministic, high = creative).
Sampling from the k most probable tokens.
Sampling from the smallest set of tokens whose cumulative probability reaches p.
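Temperature, top-k, and top-p can all be expressed as filters over the same softmax distribution. A minimal, pure-Python sketch (real inference engines apply these on logit tensors, not lists):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Sample a token index with temperature, top-k, and top-p filtering."""
    # Temperature scales the logits before softmax:
    # low temperature sharpens the distribution, high flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    # Sort (probability, index) pairs from most to least probable.
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        probs = probs[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches p.
    if top_p is not None:
        kept, cum = [], 0.0
        for p, i in probs:
            kept.append((p, i))
            cum += p
            if cum >= top_p:
                break
        probs = kept
    # Renormalize over the kept tokens and sample.
    z = sum(p for p, _ in probs)
    r = random.random() * z
    for p, i in probs:
        r -= p
        if r <= 0:
            return i
    return probs[-1][1]
```

With `top_k=1` this degenerates to greedy decoding: the single most probable token is always chosen.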
Always choosing the most probable next token.
Used in: ch.07
Exploring several candidate sequences in parallel.
Used in: ch.07
The model generates false but plausible-sounding information.
The model retrieves external documents before generating a response.
A database that stores and searches embeddings by similarity.
Searching by meaning rather than exact keyword match.
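Semantic search typically ranks documents by cosine similarity between embedding vectors. A minimal sketch with toy two-dimensional "embeddings" (real systems use vectors with hundreds of dimensions and approximate-nearest-neighbor indexes):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def semantic_search(query_vec, doc_vecs, k=1):
    """Return indices of the k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

Because similarity is computed on meaning-bearing vectors rather than keywords, a query can match a document that shares no words with it.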
An LLM optimized to reason at length before answering.
A model that handles multiple input types (text, image, audio, etc.).
A model that understands both text and images.
Generating an image from a text description.
A lightweight fine-tuning technique based on low-rank matrix decomposition.
Used in: ch.14
LoRA combined with quantization to reduce memory usage.
A family of parameter-efficient fine-tuning methods.
Used in: ch.14
An AI system that pursues a goal through multiple steps and tools.
The model's ability to call external functions.
Used in: ch.11
A pattern that alternates reasoning and action.
Used in: ch.11
An agent's ability to decompose a goal into subtasks.
Multiple agents that collaborate or coordinate.
A standard protocol for connecting tools to an LLM.
Used in: ch.11
An agent's ability to operate a computer.
Used in: ch.11
AI that acts autonomously within an environment.
A collection of data used to train or evaluate a model.
Data splits for training, tuning, and evaluation.
The correct answer associated with a training example.
The act of assigning labels to data.
Artificially generating more training data.
Data generated artificially, for example by an AI.
An input variable used by the model.
Used in: ch.20
Manually creating relevant input variables.
Encoding a category as a binary vector.
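The encoding above is a one-liner in practice; a minimal sketch:

```python
def one_hot(category, categories):
    """Encode a category as a binary vector with a single 1."""
    return [1 if c == category else 0 for c in categories]
```

Each category gets its own position, so no spurious ordering is implied between categories (unlike encoding them as integers 0, 1, 2, ...).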
A gradual shift in data distribution compared to training time.
When test information leaks into training, causing inflated results.
A parameter set before training (learning rate, batch size, etc.).
Evaluating a model with multiple data splits.
The model memorizes training data and generalizes poorly.
Used in: ch.06
The model is too simple to capture the signal in the data.
Techniques that prevent overfitting.
Used in: ch.06
Randomly disabling neurons during training.
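A sketch of "inverted" dropout, the variant used in most frameworks: survivors are scaled up by 1/(1-p) during training so that expected activations match inference, where dropout is disabled:

```python
import random

def dropout(activations, p=0.5, training=True):
    """Zero each activation with probability p during training,
    scaling survivors by 1/(1-p) so the expected value is unchanged."""
    if not training:
        # At inference time, dropout is a no-op.
        return list(activations)
    return [0.0 if random.random() < p else a / (1 - p)
            for a in activations]
```

Forcing the network to tolerate randomly missing neurons discourages co-adaptation and acts as a regularizer.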
Stopping training when validation error starts increasing.
Gradients become too small to train early layers effectively.
The model forgets old tasks when learning new ones.
Transferring knowledge from a large model into a smaller one.
Used in: ch.21
Removing redundant weights to make the model smaller.
Reducing the numerical precision of weights.
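A sketch of the simplest scheme, symmetric int8 quantization: floats are mapped to integers in [-127, 127] with a single scale factor (real libraries use per-channel scales, calibration, and more elaborate rounding):

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: floats -> integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the integers back to approximate floats."""
    return [v * scale for v in q]
```

The stored model shrinks roughly 4x versus float32, at the cost of a small rounding error bounded by the scale.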
Relationships between model size, data, compute, and performance.
The percentage of correct predictions.
Of all positive predictions, how many are actually correct.
Of all actual positives, how many are correctly identified.
The harmonic mean of precision and recall.
A table comparing predictions against ground truth.
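Precision, recall, and F1 all derive from the confusion-matrix counts (true positives, false positives, false negatives). A minimal sketch:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of predicted positives, fraction correct
    recall = tp / (tp + fn)      # of actual positives, fraction found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```

The harmonic mean punishes imbalance: a model with high precision but near-zero recall still gets a near-zero F1.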
A curve plotting true-positive rate against false-positive rate; the area under it summarizes classifier performance across decision thresholds.
A measure of a language model's uncertainty (lower is better).
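Perplexity is the exponential of the average negative log-probability the model assigns to each token; a uniform guess over N choices gives perplexity exactly N. A minimal sketch:

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to each token.

    Lower is better; 1.0 means the model was certain of every token.
    """
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

Intuitively, a perplexity of 20 means the model is, on average, as uncertain as if it were choosing uniformly among 20 tokens.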
Metrics for evaluating machine translation and summarization.
A measure of the quality of generated images.
A standardized test for comparing models (MMLU, HumanEval, etc.).
Used in: ch.16
Adversarial testing to find weaknesses in a model.
An entity that makes decisions within an environment.
The world in which an agent operates.
The agent's strategy for choosing actions given a state.
A numerical signal indicating how good an action was.
An RL algorithm based on estimating action-value functions (Q).
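The core of tabular Q-learning is a single update rule; this sketch applies one update to a hypothetical two-state table:

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    where alpha is the learning rate and gamma the discount factor."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
```

Repeated over many interactions with the environment, the table converges toward the expected discounted return of each action.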
A widely used RL algorithm, especially in RLHF.
The trade-off between trying new actions and using known good ones.
The agent finds unintended ways to maximize the reward signal.
Used in: ch.11
The AI field concerned with processing images and video.
Assigning a category label to an image.
Locating and classifying objects within an image.
Labeling every pixel in an image with a class.
Recognizing text in an image.
Classic object detection model architectures.
A universal segmentation model.
A learned 3D representation of a scene from 2D images.
Ensuring AI systems pursue goals that match human values.
The field of making AI reliable, controllable, and harmless.
An input crafted to fool a model.
Bypassing the safety guardrails of an LLM.
Used in: ch.08
Maliciously injecting instructions into a prompt.
Used in: ch.12
Systematic errors in data or model predictions.
Equity in algorithmic decisions.
The ability to understand why a model made a given decision.
Studying the internal workings of neural networks.
A formal method for protecting individual privacy in data.
AI-generated synthetic media designed to appear real.
A gap between a model's actual objectives and its intended ones.
A model that predicts how humans would rate a response.
The frameworks and regulations for overseeing AI development.
A graphics processing unit, widely used for AI workloads.
A specialized AI accelerator chip designed by Google.
Nvidia's parallel computing platform for GPU programming.
Floating-point operations per second — a measure of compute.
Training spread across multiple machines.
Same weights, different data on each GPU.
The model is split across multiple GPUs.
The main deep learning frameworks.
A platform for models, datasets, and ML tools.
A standard format for exchanging models between frameworks.
AI running on embedded or on-device hardware.
The response time of a model.
Used in: ch.18
The number of inferences processed per second.
DevOps practices applied to machine learning.
Capabilities that appear suddenly beyond a certain model scale.
Used in: ch.19
Increasing model size, data, or compute to improve performance.
A model whose weights are publicly available.
A model at the cutting edge of current capabilities.
A compact language model optimized for efficiency.
Including humans in the learning or decision-making process.
An agent's internal model for predicting how the environment evolves.
AI situated in a physical body, such as a robot.
A document describing a model's capabilities, limitations, and biases.