Database Vocabulary: SQL, NoSQL, Indexing, and Transactions Explained

Essential database vocabulary for developers: SQL vs NoSQL, ACID properties, indexing, transactions, normalization, sharding, replication, and 25 more terms.

Databases are at the core of nearly every application — but database vocabulary is scattered between relational theory, SQL syntax, distributed systems concepts, and vendor-specific terminology. This guide covers the essential terms you need for database discussions, design reviews, and technical interviews.


Relational Databases & SQL

Relational Database

A relational database organises data into tables (relations) with rows and columns. Relationships between tables are defined by foreign keys. Examples: PostgreSQL, MySQL, SQLite, Oracle, SQL Server.

SQL (Structured Query Language)

SQL is the standard language for querying and managing relational databases. Pronounced “sequel” or “S-Q-L” (both are correct). Core SQL operations:

  • SELECT — read data
  • INSERT — add data
  • UPDATE — modify data
  • DELETE — remove data
  • JOIN — combine data from multiple tables

Primary Key

A primary key is a column (or combination of columns) that uniquely identifies each row in a table. Every table should have one. Often an auto-incrementing integer (id) or a UUID.

Foreign Key

A foreign key is a column in one table that references the primary key in another table. It establishes a relationship between the two tables.

“The orders table has a user_id foreign key that references users.id.”

Index

An index is a data structure that speeds up queries on a column. Instead of scanning every row, the database uses the index to jump directly to matching rows. Indexes speed up reads but slow down writes (because they must be updated on insert/update/delete).

“Without an index on email, every login query does a full table scan — running on millions of rows.”

Query Optimisation / Query Plan

The query plan (or execution plan) is the strategy the database engine chooses to execute a query. The EXPLAIN command in PostgreSQL and MySQL shows the plan and helps identify slow queries.

Normalisation

Normalisation is the process of structuring a database to reduce redundancy and improve data integrity. It involves splitting data into separate tables and using foreign keys. Normal forms: 1NF, 2NF, 3NF, BCNF.

Denormalisation

Denormalisation is intentionally adding redundancy for performance — for example, storing a pre-computed total in the orders table instead of calculating it from individual items on every read.

JOIN

A JOIN combines rows from two tables based on a related column. Types:

  • INNER JOIN — only rows that match in both tables
  • LEFT JOIN — all rows from the left table, and matched rows from the right
  • RIGHT JOIN — all rows from the right table, and matched rows from the left
  • FULL OUTER JOIN — all rows from both tables

ACID Properties

ACID defines the properties that guarantee reliable database transactions:

Atomicity

A transaction is atomic: either all operations complete successfully, or none of them do. If a payment fails halfway through, all changes are rolled back.

Consistency

The database moves from one consistent state to another — all data integrity rules (constraints, foreign keys, cascades) are upheld.

Isolation

Isolation means concurrent transactions do not interfere with each other. The result of parallel transactions should be the same as if they ran sequentially.

Durability

Once a transaction is committed, it will survive system failures. The data is written to persistent storage.


Transactions

Transaction

A transaction is a group of operations executed as a single unit. Either all succeed (commit) or all fail (rollback).

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

Commit / Rollback

  • Commit — save the transaction permanently
  • Rollback — undo all changes since the transaction began

Deadlock

A deadlock occurs when two transactions each hold a resource the other needs, and both are waiting. Neither can proceed. Databases detect deadlocks and abort one transaction.


NoSQL Databases

NoSQL

NoSQL (“Not only SQL”) refers to databases that do not use the relational table model. They are designed for scalability, flexibility, or specific data access patterns.

Types:

  • Document — stores JSON-like documents. Examples: MongoDB, Firestore
  • Key-value — stores arbitrary values by key. Examples: Redis, DynamoDB
  • Column-family — stores data in column groups. Examples: Cassandra, HBase
  • Graph — stores nodes and edges. Examples: Neo4j, ArangoDB

Document Database

A document database stores data as documents (typically JSON or BSON). Each document can have a different structure — no fixed schema required.

“In MongoDB, each user document can have different optional fields — no schema migration needed.”

Key-Value Store

A key-value store associates a key with a value — similar to a hash map, but at database scale. Best for caching, session storage, leaderboards, and simple look-up scenarios.


Scaling & Distribution

Sharding

Sharding is horizontal partitioning — splitting data across multiple database instances (shards) based on a shard key. Allows scaling beyond a single machine’s capacity.

“We shard by user_id — users 0–1M go to shard 1, 1M–2M go to shard 2.”

Replication

Replication copies data from one database (primary) to one or more replicas (secondaries). Provides high availability and can distribute read load.

  • Primary (leader) — accepts writes
  • Replica (follower) — syncs from primary, serves reads

CAP Theorem

The CAP theorem states that a distributed database can guarantee only two of three properties simultaneously:

  • Consistency — every read reflects the latest write
  • Availability — every request gets a response
  • Partition Tolerance — the system operates despite network partitions

In practice, network partitions happen — so the real choice is between CP and AP.

Eventual Consistency

In eventually consistent systems, all replicas will converge to the same state eventually — but there may be a window where different replicas return different data. Used in many distributed NoSQL systems.

Connection Pool

A connection pool is a cache of database connections that can be reused by incoming requests. Opening a connection is expensive; a pool keeps a set of connections ready.


Performance & Monitoring

Query Performance / Slow Query Log

Most databases can log slow queries — queries that exceed a configurable time threshold. Identifying and optimising slow queries is a common database tuning task.

N+1 Query Problem

The N+1 problem occurs when code fetches a list of N items, then makes an additional query for each item — resulting in N+1 total queries. Solution: use a JOIN or eager loading.

Caching

Database results can be cached in memory (using Redis or Memcached) to avoid repeated expensive queries. Cache invalidation — knowing when to expire the cache — is one of the hardest problems in software engineering.


Quick Reference

TermOne-liner
Primary keyUnique identifier for each row
Foreign keyReferences a primary key in another table
IndexData structure that speeds up queries
NormalisationReducing redundancy via related tables
AtomicityAll-or-nothing transaction execution
TransactionGroup of operations executed as one unit
DeadlockTwo transactions blocking each other indefinitely
ShardingSplitting data across multiple database instances
ReplicationCopying data from primary to replica(s)
CAP theoremConsistency, Availability, Partition tolerance — choose two
N+1 problemFetching list then querying per item — too many queries