Intermediate 15 terms

Databases & SQL

Core vocabulary for backend engineers and data engineers: indexes, transactions, ACID properties, query optimisation, JOIN types, and ORM patterns.

All vocabulary sets

Part of: Backend Developer Full-Stack Developer QA Engineer Data Scientist Solution Architect Technical Writer

Index /ˈɪndeks/

A data structure (commonly B-tree or hash) that the database engine uses to locate rows quickly without a full table scan. Improves read performance at the cost of extra storage and slower writes.

"Adding a composite index on (user_id, created_at) reduced the dashboard query from 8 seconds to 40ms — the planner can now satisfy the WHERE and ORDER BY without a table scan."
Transaction /trænˈzækʃən/

A unit of work that groups one or more SQL statements into an atomic operation. Either all statements commit or all are rolled back, leaving the database in a consistent state.

"We wrap the order creation and inventory decrement in a single transaction — if the payment fails, both changes are rolled back and no stock is deducted."
ACID /ˈæsɪd/

Atomicity, Consistency, Isolation, Durability — the four properties that guarantee reliable transaction processing. Atomicity: all-or-nothing. Consistency: valid state before and after. Isolation: concurrent transactions don't interfere. Durability: committed data survives crashes.

"Our financial ledger requires full ACID guarantees — we chose PostgreSQL over a NoSQL store precisely because every debit must be matched by a credit in the same atomic transaction."
Normalisation /ˌnɔːrməlaɪˈzeɪʃən/

The process of structuring a relational database to reduce data redundancy and improve data integrity. Achieved through a series of normal forms (1NF, 2NF, 3NF, BCNF). Opposite: denormalisation (used for read performance).

"During the schema review we found customer address duplicated in three tables — normalising it into a separate addresses table with a foreign key eliminated the inconsistency risk."
Foreign Key /ˈfɒrən kiː/

A column (or set of columns) in one table that references the primary key of another table, enforcing referential integrity — you cannot insert a row with a foreign key value that doesn't exist in the parent table.

"The orders.customer_id column is a foreign key referencing customers.id — the database prevents inserting an order for a customer that doesn't exist and blocks deleting a customer who has active orders."
Primary Key /ˈpraɪmeri kiː/

A column or combination of columns that uniquely identifies each row in a table. Must be NOT NULL and unique. The database automatically creates a unique index on the primary key.

"We use UUIDs as primary keys instead of auto-increment integers — this avoids ID collision when merging data from multiple database shards into a single system."
JOIN /dʒɔɪn/

An SQL operation that combines rows from two or more tables based on a related column. Types: INNER JOIN (only matching rows), LEFT JOIN (all from left + matching from right), RIGHT JOIN, FULL OUTER JOIN, CROSS JOIN.

"The report uses a LEFT JOIN between projects and time_entries — projects with no logged time still appear in the output with NULL hours, which INNER JOIN would have excluded."
Aggregate Function /ˈæɡrɪɡət ˈfʌŋkʃən/

An SQL function that computes a single result from a set of rows: COUNT(), SUM(), AVG(), MIN(), MAX(). Used with GROUP BY to compute metrics per group. Commonly paired with HAVING to filter aggregated results.

"SELECT department_id, AVG(salary) FROM employees GROUP BY department_id HAVING AVG(salary) > 80000 — this aggregate query finds departments where the average salary exceeds 80,000."
Query Plan / EXPLAIN /ˈkwɪəri plæn / ɪkˈspleɪn/

The execution strategy the query planner chooses to retrieve data: which indexes to use, join order, scan type (sequential vs index). EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) shows the plan, estimated vs actual row counts, and cost.

"Running EXPLAIN ANALYZE on the slow query revealed a Seq Scan on a 10M-row table — the planner was ignoring the index because the column had outdated statistics. After ANALYZE, it switched to an Index Scan and the query dropped from 12s to 0.2s."
N+1 Problem /ˈen plʌs wʌn ˈprɒbləm/

A common ORM performance anti-pattern: one query fetches N parent records, then N additional queries are made to fetch related child records — one per parent. Results in N+1 total queries instead of 1 or 2 with proper JOINs or eager loading.

"The API was making 101 queries per request — 1 to fetch 100 orders, then 1 per order to fetch the customer. Switching the ORM to eager-load customers fixed the N+1 problem and reduced it to 2 queries."
Deadlock /ˈdedlɒk/

A situation where two or more transactions are each waiting for the other to release a lock, resulting in a circular dependency that neither can resolve. The database detects this and kills one transaction (the deadlock victim) to break the cycle.

"After the batch job started running concurrently, we saw deadlock errors in the logs — transaction A locked table payments then tried to lock orders, while transaction B held orders and waited for payments."
Migration /maɪˈɡreɪʃən/

A versioned script (usually SQL or DSL) that describes a change to the database schema: creating tables, adding columns, modifying indexes, seeding data. Migrations are applied in sequence and tracked by a migration runner (Flyway, Liquibase, Rails Active Record).

"We never modify the database schema by hand — every change is written as a migration file, reviewed in a PR, and applied automatically during deployment by the CI pipeline."
ORM /ˌoʊ ɑːr ˈem/

Object-Relational Mapping — a library that maps database tables to objects in application code, allowing developers to query and manipulate data using the programming language instead of raw SQL. Examples: SQLAlchemy (Python), Hibernate (Java), Prisma (JS/TS), ActiveRecord (Ruby).

"We use Prisma as our ORM — it auto-generates a type-safe client from the schema, so TypeScript catches mismatched query arguments at compile time before they reach the database."
Connection Pool /kəˈnekʃən puːl/

A cache of pre-established database connections that are reused by application threads instead of creating a new connection for each query. Connection creation is expensive; a pool limits overhead and controls concurrency.

"Under load testing, the app ran out of connections — we had 200 Node.js workers each creating their own connections. Adding PgBouncer as a connection pooler reduced actual DB connections from 200 to 20 while handling the same throughput."
Sharding /ˈʃɑːrdɪŋ/

Horizontal partitioning of a database — splitting data across multiple database instances (shards) based on a shard key (e.g., user_id % N). Each shard holds a subset of the data. Used to scale beyond what a single database server can handle.

"We shard messages by conversation_id — all messages in a conversation live on the same shard, which keeps conversation queries local and avoids cross-shard JOINs that would be expensive."

Databases & SQL

Quick Quiz — Databases & SQL

Quiz complete!