Database Vocabulary: 80 Terms Every DBA Must Know

Database Administrators keep data available, consistent, and performant at scale. Their vocabulary spans SQL, performance tuning, high availability, backup strategies, security, and monitoring. This guide covers the 80 terms you need to discuss, document, and troubleshoot database systems in professional English.

Core ACID Properties

ACID

ACID defines the four properties that guarantee reliable transactions in a relational database:

Atomicity — all operations in a transaction succeed, or none do
Consistency — the database moves from one valid state to another
Isolation — concurrent transactions do not interfere with each other
Durability — once committed, changes persist even after a system failure

Transaction

A transaction is a unit of work that includes one or more database operations. Transactions begin with BEGIN, end with COMMIT (success) or ROLLBACK (failure).

Commit

A commit finalises a transaction — writing all changes permanently to storage.

Rollback

A rollback undoes all changes made within a transaction, restoring the database to its state before the transaction began.

Savepoint

A savepoint is a named point within a transaction. You can rollback to a savepoint without abandoning the entire transaction.

Isolation Levels

Isolation Level

The isolation level controls how much a running transaction is visible to other concurrent transactions. Standard SQL isolation levels (from weakest to strongest):

Level	Dirty Read	Non-Repeatable Read	Phantom Read
Read Uncommitted	Possible	Possible	Possible
Read Committed	Prevented	Possible	Possible
Repeatable Read	Prevented	Prevented	Possible (in theory)
Serializable	Prevented	Prevented	Prevented

Dirty Read

A dirty read occurs when a transaction reads data written by another transaction that has not yet committed — can read data that will be rolled back.

Non-Repeatable Read

A non-repeatable read occurs when a transaction re-reads a row and finds it has changed because another transaction committed an update.

Phantom Read

A phantom read occurs when a transaction re-executes a query and finds new rows that weren’t there before — because another transaction committed an insert.

MVCC (Multiversion Concurrency Control)

MVCC allows multiple transactions to read consistent snapshots of the data without blocking writers. PostgreSQL uses MVCC extensively — readers never block writers and writers never block readers.

Indexing

Index

An index is a data structure that speeds up data retrieval at the cost of additional storage and write overhead. Without an index, a query requires a sequential full-table scan.

B-Tree Index

The B-tree (balanced tree) index is the default index type in most relational databases. Supports equality (=) and range queries (<, >, BETWEEN).

Hash Index

A hash index is optimised for exact equality lookups only — not ranges. Faster than B-tree for equality but rarely used in practice (PostgreSQL hash indexes don’t support < or >).

Composite Index

A composite index covers multiple columns. The order matters: (last_name, first_name) can serve queries on last_name alone or (last_name, first_name), but not first_name alone (left-prefix rule).

Covering Index

A covering index includes all columns needed by a query — the database can satisfy the query entirely from the index without accessing the main table (heap) at all.

Index Bloat

Index bloat is accumulated dead tuples (from updates and deletes) that inflate index size and reduce performance. Resolved by VACUUM or index rebuild.

Partial Index

A partial index is built on a subset of rows matching a WHERE condition — smaller, faster, and more targeted for common query patterns.

“A partial index on (status) where status = 'pending' is tiny compared to a full index — most orders are in a terminal state.”

Query Performance

Execution Plan / Query Plan

The execution plan (or query plan) describes how the database engine will execute a query — which indexes to use, join methods, and scan types. Use EXPLAIN ANALYZE in PostgreSQL.

Sequential Scan

A sequential scan (Seq Scan) reads every row in a table. Appropriate for small tables or when a large fraction of rows must be returned. Slow for targeted lookups on large tables.

Index Scan

An index scan uses an index to jump directly to matching rows. Much faster than a sequential scan for selective queries; slower for bulk reads.

Bitmap Heap Scan

A bitmap heap scan uses an index to collect matching row locations in bulk, then reads them in physical order — efficient when many rows match but not all.

Nested Loop / Hash Join / Merge Join

These are PostgreSQL join strategies:

Nested Loop — for each row in the outer table, scan the inner table. Fast for small inputs.
Hash Join — build a hash table from the smaller input; probe it with rows from the larger input. Good for equijoins on large tables.
Merge Join — merge two pre-sorted inputs. Efficient when both sides have compatible indexes.

Statistics / pg_statistics

The query planner uses statistics about table data (row counts, value distributions, null fractions) to choose execution plans. Out-of-date statistics cause bad plans. Updated by ANALYZE.

Vacuuming

VACUUM reclaims storage from dead tuples (rows marked as deleted). AUTOVACUUM runs automatically. VACUUM ANALYZE also updates statistics.

High Availability

Replication

Replication copies data from a primary server to one or more replicas. Types:

Synchronous — primary waits for at least one replica to confirm write before acknowledging client. Zero data loss; higher latency.
Asynchronous — primary acknowledges immediately; replica catches up. Low latency; potential data loss on failover.

Primary / Standby

In a replicated setup, the primary (or master) handles read/write traffic. The standby (or replica, secondary) receives replicated changes and is promoted in a failover.

Failover

Failover is the process of switching from a failed primary to a standby. Automatic failover (using tools like Patroni, Pacemaker, or AWS RDS Multi-AZ) happens without manual intervention.

Switchover

A switchover is a planned, controlled role change — the primary gracefully hands off to a standby. Used for maintenance.

RPO (Recovery Point Objective)

RPO is the maximum acceptable amount of data loss measured in time — how far back the data can be recovered to. RPO = 0 means zero data loss; RPO = 1 hour means up to 1 hour of transactions could be lost.

RTO (Recovery Time Objective)

RTO is the maximum acceptable time to restore service after a failure. Short RTO requires automated failover; long RTO allows manual recovery.

“Our SLA requires RPO ≤ 5 minutes and RTO ≤ 15 minutes — that means synchronous streaming replication and automated failover.”

WAL (Write-Ahead Log)

The WAL (Write-Ahead Log) is the PostgreSQL transaction log — every change is written to the WAL before it is applied to data files. Used for durability, replication, and point-in-time recovery.

Backup & Recovery

Logical Backup

A logical backup exports data as SQL statements or CSV — human-readable, portable, but slow for large databases. Tools: pg_dump, mysqldump.

Physical Backup

A physical backup copies the raw data files — faster for large databases, but database-version-specific. Tools: pg_basebackup, Barman, pgBackRest.

PITR (Point-in-Time Recovery)

PITR restores a database to its state at any specific moment — using a base backup plus WAL replay. Critical for recovering from logical errors (accidental deletes).

“A developer accidentally dropped the users table at 14:23. We used PITR to restore to 14:22.”

Backup Window

The backup window is the scheduled maintenance time for running backups. Full backups are typically weekly; differential or incremental backups daily.

Partitioning & Sharding

Partitioning

Partitioning divides a single large table into smaller sub-tables (partitions) managed transparently by the database. Types:

Range partitioning — by date range (e.g., monthly partitions)
List partitioning — by discrete values (e.g., region)
Hash partitioning — by hash of a key (even distribution)

Benefits: faster queries on partitioned columns, efficient partition pruning, easier archiving.

Sharding

Sharding distributes data across multiple independent database instances (shards). Unlike partitioning (single database), sharding is a distributed architecture requiring application-level coordination. Used for massive write throughput.

Horizontal Scaling / Vertical Scaling

Vertical scaling — adding more CPU/RAM to the existing server. Simpler but limited.
Horizontal scaling — adding more database nodes (read replicas for reads; sharding for writes).

Connection & Resource Management

Connection Pool

A connection pool manages a set of reusable database connections — applications borrow and return connections rather than opening a new connection for each request. Examples: PgBouncer (PostgreSQL), HikariCP (Java).

Max Connections

Max connections is the limit on simultaneous client connections. PostgreSQL default is 100. Exceeding it causes FATAL: too many connections — connection pooling is the solution.

Lock / Deadlock

A lock prevents concurrent modification of the same data. A deadlock occurs when two transactions each hold a lock the other needs — both wait indefinitely. Databases detect and resolve deadlocks by aborting one transaction.

“We have a deadlock pattern: process A locks orders then customers; process B locks customers then orders. Consistent lock ordering resolves it.”

Statement Timeout / Lock Timeout

Statement timeout aborts a query that has run too long. Lock timeout aborts a statement that has waited too long to acquire a lock. Both prevent long-running operations from blocking others indefinitely.

Security

TDE (Transparent Data Encryption)

TDE encrypts the database storage files at rest — protecting data if the physical media is stolen. Transparent to applications.

Row-Level Security (RLS)

RLS enforces access policies at the row level — a user can only see rows matching a policy (e.g., only their own data).

Least Privilege

The principle of least privilege means each database user should have only the permissions needed for their function — no blanket SUPERUSER access in application accounts.

Audit Logging

Audit logging records who accessed or modified what data and when — required for compliance (GDPR, SOC 2, HIPAA). Extensions: pgaudit for PostgreSQL.

Useful Phrases

In performance reviews:

“The slow query log shows this join is doing a sequential scan on the orders table — we’re missing an index on customer_id.”
“After updating statistics and adding the composite index, query time dropped from 4.2 seconds to 12 milliseconds.”

In HA design discussions:

“With asynchronous replication, our RPO is approximately 30 seconds — is that acceptable for the business?”
“Patroni handles automatic failover — when the primary fails, the replica with the most recent WAL is promoted within 15 seconds.”

In incident response:

“We’re investigating a deadlock storm on the inventory table — I’m pulling the lock wait graph from pg_stat_activity.”

Practice

Test your DBA vocabulary with the Database Administration exercise set — 5 exercises covering ACID, performance, replication, and backup terminology.

Explore the DBA learning path for exercises, interview prep, and incident report writing practice.