Artificial Intelligence (AI)
The field of building machines that perform tasks normally requiring human intelligence.
196 terms · 15 categories
All terms are given in English (the field's standard usage), each with a simple definition.
A subfield of AI where a system learns from data rather than being explicitly programmed.
ML based on neural networks with many layers.
AI specialized in a single task (e.g. playing chess).
A hypothetical AI capable of performing any intellectual task a human can.
Used in: ch.16
An AI that would vastly surpass human intelligence across all domains.
A classical approach based on explicit rules and formal logic.
A mathematical representation learned from data.
The phase where a trained model makes predictions.
The phase where the model learns from data.
Learning from labeled data (input → known output).
Learning without labels; the model finds hidden structure in data.
The model generates its own labels from the data itself.
Used in: ch.06
Learning by trial and error with rewards.
Used in: ch.08
Reusing a model trained on one task for a different but related task.
Learning from very few examples.
Succeeding at a task without having seen any examples during training.
'Learning to learn' — training a model to adapt quickly to new tasks.
Distributed training without centralizing data.
The model learns new tasks without forgetting previous ones.
A basic computation unit in a network, inspired by biological neurons.
A set of neurons processed in parallel.
Parameters that scale the connections between neurons.
A constant term added to the weighted sum.
A non-linear function applied to a neuron's output.
A very common activation function: max(0, x).
Converts a vector into a probability distribution.
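The two activations above (ReLU and the softmax) can be sketched in a few lines of plain Python; this is a minimal illustration, not a library implementation:

```python
import math

def relu(x):
    """ReLU: pass positive values through, zero out negatives."""
    return max(0.0, x)

def softmax(logits):
    """Convert a vector of raw scores into a probability distribution."""
    # Subtracting the max is a standard trick for numerical stability;
    # it does not change the result.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Note that softmax outputs always sum to 1 and preserve the ordering of the inputs: the largest logit gets the largest probability.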
Computing the output from a given input.
Computing gradients to update the weights.
Used in: ch.06
An algorithm that adjusts weights to minimize error.
Used in: ch.01, ch.03, ch.04, ch.05, ch.06, ch.08, ch.14, ch.21
Measures the error between a prediction and the ground truth.
The standard loss function for classification tasks.
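For a single classification example, cross-entropy reduces to the negative log of the probability the model assigned to the correct class. A minimal sketch:

```python
import math

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class.

    `probs` is a probability distribution (e.g. a softmax output);
    a confident correct prediction gives a loss near 0.
    """
    return -math.log(probs[target_index])
```

A confident, correct prediction is penalized less than an uncertain one, which is exactly the gradient signal that drives training.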
An algorithm that drives weight updates (Adam, SGD, etc.).
The step size during gradient descent.
Used in: ch.06
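Gradient descent and the learning rate can be shown together on a toy problem; this sketch minimizes a hand-picked one-dimensional function rather than a real network loss:

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a function.

    `lr` is the learning rate: the step size taken at each update.
    """
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# The minimum is at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0, lr=0.1, steps=100)
```

Too large a learning rate makes the updates overshoot and diverge; too small a rate makes convergence needlessly slow.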
One full pass through the training dataset.
A subset of data processed together.
A network where information flows in one direction only.
A classic feedforward network with dense layers.
Specialized for images using convolutional filters.
Processes sequences by maintaining an internal state.
Used in: ch.04
An improved RNN designed to remember information over long sequences.
Used in: ch.04
An attention-based architecture dominant in AI since 2017.
Used in: ch.01, ch.02, ch.04, ch.05, ch.07, ch.08, ch.10, ch.15, ch.17, ch.18, ch.20, ch.21
Each position in a sequence attends to all other positions.
Used in: ch.01, ch.03, ch.04, ch.05, ch.07, ch.08, ch.09, ch.15, ch.17, ch.18, ch.21
Multiple attention mechanisms running in parallel.
A classic architecture for translation and generation tasks.
Used in: ch.15
Injects word order information into a Transformer.
Used in: ch.05
A modern rotary positional encoding scheme, common in LLMs.
Used in: ch.05
A shortcut connection that skips over layers to stabilize training.
Normalization applied across a layer to stabilize training.
Multiple specialized sub-models activated selectively.
Used in: ch.05
A generative model that learns to denoise an image step by step.
Two competing networks: a generator and a discriminator.
A Transformer architecture applied to images.
An alternative architecture to the Transformer for long sequences.
The field of processing and understanding human language with machines.
Used in: ch.15
Splitting text into units called tokens.
The basic unit processed by a model (word, subword, or character).
Used in: ch.01, ch.02, ch.03, ch.04, ch.05, ch.06, ch.07, ch.08, ch.09, ch.10, ch.11, ch.12, ch.13, ch.14, ch.15, ch.17, ch.18, ch.19, ch.20, ch.21
A widely used tokenization algorithm.
Used in: ch.02
Other common tokenization algorithms.
Used in: ch.02
The set of all tokens known to a model.
A dense vector representation of a word or token.
Used in: ch.01, ch.03, ch.04, ch.05, ch.10, ch.11, ch.15, ch.21
An older technique for producing word embeddings.
An embedding that depends on the surrounding context.
Identifying named entities in text (people, places, etc.).
Detecting the emotional tone of a text.
Converting spoken audio into text.
Converting text into spoken audio.
Used in: ch.15
A model that predicts the probability of a sequence of words.
A massive language model trained on billions of words.
Used in: ch.01, ch.02, ch.03, ch.04, ch.05, ch.06, ch.07, ch.08, ch.09, ch.10, ch.11, ch.12, ch.13, ch.15, ch.16, ch.17, ch.18, ch.20
A general-purpose model reusable across many tasks.
AI capable of producing content (text, images, audio, etc.).
Initial training on large amounts of raw data.
Adapting a pre-trained model for a specific task.
Used in: ch.01, ch.08, ch.09, ch.11, ch.13, ch.14, ch.17, ch.19, ch.20
Fine-tuning to follow human instructions.
Refining a model using human feedback.
A simplified alternative to RLHF.
An alignment method based on a set of guiding principles.
Used in: ch.08
The input text given to a model.
Used in: ch.01, ch.07, ch.08, ch.09, ch.10, ch.11, ch.12, ch.17, ch.18, ch.21
General instructions given to the model before the conversation begins.
The art of crafting effective prompts.
Used in: ch.12
Prompting the model to reason step by step.
Providing a few examples within the prompt.
The model's ability to learn from examples provided in the prompt.
The amount of text a model can process at once.
Controls the randomness of outputs (low = deterministic, high = creative).
Sampling from the k most probable tokens.
Sampling from the smallest set of tokens whose cumulative probability reaches p.
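Temperature, top-k, and top-p can all be expressed as filters over the same softmax distribution. A minimal, pure-Python sketch (real inference engines apply these on logit tensors, not lists):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Sample a token index with temperature, top-k, and top-p filtering."""
    # Temperature scales the logits before softmax:
    # low temperature sharpens the distribution, high flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    # Sort (probability, index) pairs from most to least probable.
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        probs = probs[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches p.
    if top_p is not None:
        kept, cum = [], 0.0
        for p, i in probs:
            kept.append((p, i))
            cum += p
            if cum >= top_p:
                break
        probs = kept
    # Renormalize over the kept tokens and sample.
    z = sum(p for p, _ in probs)
    r = random.random() * z
    for p, i in probs:
        r -= p
        if r <= 0:
            return i
    return probs[-1][1]
```

With `top_k=1` this degenerates to greedy decoding: the single most probable token is always chosen.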
Always choosing the most probable next token.
Used in: ch.07
Exploring several candidate sequences in parallel.
Used in: ch.07
The model generates false but plausible-sounding information.
The model retrieves external documents before generating a response.
A database that stores and searches embeddings by similarity.
Searching by meaning rather than exact keyword match.
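Semantic search typically ranks documents by cosine similarity between embedding vectors. A minimal sketch with toy two-dimensional "embeddings" (real systems use vectors with hundreds of dimensions and approximate-nearest-neighbor indexes):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def semantic_search(query_vec, doc_vecs, k=1):
    """Return indices of the k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

Because similarity is computed on meaning-bearing vectors rather than keywords, a query can match a document that shares no words with it.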
An LLM optimized to reason at length before answering.
A model that handles multiple input types (text, image, audio, etc.).
A model that understands both text and images.
Generating an image from a text description.
A lightweight fine-tuning technique based on low-rank matrix decomposition.
Used in: ch.14
LoRA combined with quantization to reduce memory usage.
A family of parameter-efficient fine-tuning methods.
Used in: ch.14
An AI system that pursues a goal through multiple steps and tools.
The model's ability to call external functions.
Used in: ch.11
A pattern that alternates reasoning and action.
Used in: ch.11
An agent's ability to decompose a goal into subtasks.
Multiple agents that collaborate or coordinate.
A standard protocol for connecting tools to an LLM.
Used in: ch.11
An agent's ability to operate a computer.
Used in: ch.11
AI that acts autonomously within an environment.
A collection of data used to train or evaluate a model.
Data splits for training, tuning, and evaluation.
The correct answer associated with a training example.
The act of assigning labels to data.
Artificially generating more training data.
Data generated artificially, for example by an AI.
An input variable used by the model.
Used in: ch.20
Manually creating relevant input variables.
Encoding a category as a binary vector.
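The encoding above is a one-liner in practice; a minimal sketch:

```python
def one_hot(category, categories):
    """Encode a category as a binary vector with a single 1."""
    return [1 if c == category else 0 for c in categories]
```

Each category gets its own position, so no spurious ordering is implied between categories (unlike encoding them as integers 0, 1, 2, ...).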
A gradual shift in data distribution compared to training time.
When test information leaks into training, causing inflated results.
A parameter set before training (learning rate, batch size, etc.).
Evaluating a model with multiple data splits.
The model memorizes training data and generalizes poorly.
Used in: ch.06
The model is too simple to capture the signal in the data.
Techniques that prevent overfitting.
Used in: ch.06
Randomly disabling neurons during training.
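A sketch of "inverted" dropout, the variant used in most frameworks: survivors are scaled up by 1/(1-p) during training so that expected activations match inference, where dropout is disabled:

```python
import random

def dropout(activations, p=0.5, training=True):
    """Zero each activation with probability p during training,
    scaling survivors by 1/(1-p) so the expected value is unchanged."""
    if not training:
        # At inference time, dropout is a no-op.
        return list(activations)
    return [0.0 if random.random() < p else a / (1 - p)
            for a in activations]
```

Forcing the network to tolerate randomly missing neurons discourages co-adaptation and acts as a regularizer.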
Stopping training when validation error starts increasing.
Gradients become too small to train early layers effectively.
The model forgets old tasks when learning new ones.
Transferring knowledge from a large model into a smaller one.
Used in: ch.21
Removing redundant weights to make the model smaller.
Reducing the numerical precision of weights.
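A sketch of the simplest scheme, symmetric int8 quantization: floats are mapped to integers in [-127, 127] with a single scale factor (real libraries use per-channel scales, calibration, and more elaborate rounding):

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: floats -> integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the integers back to approximate floats."""
    return [v * scale for v in q]
```

The stored model shrinks roughly 4x versus float32, at the cost of a small rounding error bounded by the scale.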
Relationships between model size, data, compute, and performance.
The percentage of correct predictions.
Of all positive predictions, how many are actually correct.
Of all actual positives, how many are correctly identified.
The harmonic mean of precision and recall.
A table comparing predictions against ground truth.
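Precision, recall, and F1 all derive from the confusion-matrix counts (true positives, false positives, false negatives). A minimal sketch:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of predicted positives, fraction correct
    recall = tp / (tp + fn)      # of actual positives, fraction found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```

The harmonic mean punishes imbalance: a model with high precision but near-zero recall still gets a near-zero F1.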
A curve plotting true-positive rate against false-positive rate; the area under it summarizes classifier performance across decision thresholds.
A measure of a language model's uncertainty (lower is better).
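Perplexity is the exponential of the average negative log-probability the model assigns to each token; a uniform guess over N choices gives perplexity exactly N. A minimal sketch:

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to each token.

    Lower is better; 1.0 means the model was certain of every token.
    """
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

Intuitively, a perplexity of 20 means the model is, on average, as uncertain as if it were choosing uniformly among 20 tokens.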
Metrics for evaluating machine translation and summarization.
A measure of the quality of generated images.
A standardized test for comparing models (MMLU, HumanEval, etc.).
Used in: ch.16
Adversarial testing to find weaknesses in a model.
An entity that makes decisions within an environment.
The world in which an agent operates.
The agent's strategy for choosing actions given a state.
A numerical signal indicating how good an action was.
An RL algorithm based on estimating action-value functions (Q).
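The core of tabular Q-learning is a single update rule; this sketch applies one update to a hypothetical two-state table:

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    where alpha is the learning rate and gamma the discount factor."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
```

Repeated over many interactions with the environment, the table converges toward the expected discounted return of each action.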
A widely used RL algorithm, especially in RLHF.
The trade-off between trying new actions and using known good ones.
The agent finds unintended ways to maximize the reward signal.
Used in: ch.11
The AI field concerned with processing images and video.
Assigning a category label to an image.
Locating and classifying objects within an image.
Labeling every pixel in an image with a class.
Recognizing text in an image.
Classic object detection model architectures.
A universal segmentation model.
A learned 3D representation of a scene from 2D images.
Ensuring AI systems pursue goals that match human values.
The field of making AI reliable, controllable, and harmless.
An input crafted to fool a model.
Bypassing the safety guardrails of an LLM.
Used in: ch.08
Maliciously injecting instructions into a prompt.
Used in: ch.12
Systematic errors in data or model predictions.
Equity in algorithmic decisions.
The ability to understand why a model made a given decision.
Studying the internal workings of neural networks.
A formal method for protecting individual privacy in data.
AI-generated synthetic media designed to appear real.
A gap between a model's actual objectives and its intended ones.
A model that predicts how humans would rate a response.
The frameworks and regulations for overseeing AI development.
A graphics processing unit, widely used for AI workloads.
A specialized AI accelerator chip designed by Google.
Nvidia's parallel computing platform for GPU programming.
Floating-point operations per second — a measure of compute.
Training spread across multiple machines.
Same weights, different data on each GPU.
The model is split across multiple GPUs.
The main deep learning frameworks.
A platform for models, datasets, and ML tools.
A standard format for exchanging models between frameworks.
AI running on embedded or on-device hardware.
The response time of a model.
Used in: ch.18
The number of inferences processed per second.
DevOps practices applied to machine learning.
Capabilities that appear suddenly beyond a certain model scale.
Used in: ch.19
Increasing model size, data, or compute to improve performance.
A model whose weights are publicly available.
A model at the cutting edge of current capabilities.
A compact language model optimized for efficiency.
Including humans in the learning or decision-making process.
An agent's internal model for predicting how the environment evolves.
AI situated in a physical body, such as a robot.
A document describing a model's capabilities, limitations, and biases.