SQL DDL, DML, and Constraints: CREATE, ALTER, DROP, INSERT, UPDATE, DELETE, MERGE, and Database Design Fundamentals
Every SQL tutorial teaches you to SELECT data. But someone had to CREATE the table first. Someone had to INSERT the data. Someone had to define the constraints that prevent bad data from entering. That someone is the data engineer.
DDL (Data Definition Language) builds the structure — tables, schemas, columns. DML (Data Manipulation Language) fills and changes the data — inserts, updates, deletes. Constraints are the guardrails that protect data integrity — primary keys, foreign keys, checks, defaults.
Think of DDL like architecture — designing the building (tables), rooms (columns), and doors (relationships). DML is the moving company — bringing in furniture (INSERT), rearranging it (UPDATE), and removing it (DELETE). Constraints are the building codes — “this room must have a fire exit” (NOT NULL), “no two apartments can have the same number” (UNIQUE), “the rent must be positive” (CHECK).
Table of Contents
- DDL vs DML vs DCL vs TCL — The Four SQL Categories
- CREATE TABLE — Building the Structure
- Data Types Reference
- Constraints: PK, FK, UNIQUE, CHECK, DEFAULT, NOT NULL
- Primary Key — The Unique Identifier
- Foreign Key — The Relationship Enforcer
- UNIQUE — No Duplicates Allowed
- CHECK — Value Validation
- DEFAULT — Automatic Values
- NOT NULL — Required Fields
- Composite Keys
- ALTER TABLE — Changing the Structure
- DROP and TRUNCATE — Removing Structure and Data
- DROP vs TRUNCATE vs DELETE
- INSERT — Adding Data
- UPDATE — Modifying Data
- DELETE — Removing Data
- MERGE — Upsert (Insert + Update in One Statement)
- IDENTITY / AUTO_INCREMENT — Auto-Generated IDs
- Creating Tables from Queries (SELECT INTO / CREATE AS)
- Schema Design Best Practices
- Common Mistakes
- Interview Questions
- Wrapping Up
DDL vs DML vs DCL vs TCL — The Four SQL Categories
| Category | Stands For | Commands | What It Does |
|---|---|---|---|
| DDL | Data Definition Language | CREATE, ALTER, DROP, TRUNCATE | Defines database structure |
| DML | Data Manipulation Language | INSERT, UPDATE, DELETE, MERGE | Manipulates data inside tables |
| DCL | Data Control Language | GRANT, REVOKE | Controls access permissions |
| TCL | Transaction Control Language | BEGIN, COMMIT, ROLLBACK | Manages transactions |
Real-life analogy: DDL is the architect (designs the building). DML is the residents (live in and modify the interior). DCL is the security guard (controls who enters). TCL is the contract (commit to the lease or cancel).
CREATE TABLE — Building the Structure
Basic Syntax
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
department VARCHAR(50) DEFAULT 'Unassigned',
salary DECIMAL(10,2) CHECK (salary > 0),
hire_date DATE NOT NULL,
is_active BIT DEFAULT 1,
manager_id INT NULL,
created_at DATETIME DEFAULT GETDATE()
);
Every column has three parts: name, data type, and optional constraints.
CREATE TABLE with Foreign Key
-- Create parent table first
CREATE TABLE departments (
dept_id INT PRIMARY KEY IDENTITY(1,1),
dept_name VARCHAR(100) NOT NULL UNIQUE,
budget DECIMAL(15,2) DEFAULT 0,
created_at DATETIME DEFAULT GETDATE()
);
-- Create child table with foreign key
CREATE TABLE employees (
emp_id INT PRIMARY KEY IDENTITY(1,1),
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
dept_id INT NOT NULL,
salary DECIMAL(10,2) CHECK (salary >= 0),
hire_date DATE NOT NULL DEFAULT CAST(GETDATE() AS DATE),
manager_id INT NULL,
-- Foreign key constraint
CONSTRAINT FK_emp_department FOREIGN KEY (dept_id) REFERENCES departments(dept_id),
CONSTRAINT FK_emp_manager FOREIGN KEY (manager_id) REFERENCES employees(emp_id)
);
CREATE SCHEMA
-- Create a schema (namespace for tables)
CREATE SCHEMA sales;
CREATE SCHEMA hr;
-- Create table in a schema
CREATE TABLE sales.orders (
order_id INT PRIMARY KEY,
customer_id INT NOT NULL,
order_date DATE NOT NULL
);
CREATE TABLE hr.employees (
emp_id INT PRIMARY KEY,
first_name VARCHAR(50)
);
-- Access: schema.table
SELECT * FROM sales.orders;
SELECT * FROM hr.employees;
Data Types Reference
String Types
| Type | Description | Max Size | Example |
|---|---|---|---|
CHAR(n) |
Fixed-length string (padded with spaces) | 8,000 chars | CHAR(10) — always 10 chars |
VARCHAR(n) |
Variable-length string | 8,000 chars | VARCHAR(100) — up to 100 |
VARCHAR(MAX) |
Variable-length, large | 2 GB | Long text, JSON |
NVARCHAR(n) |
Unicode variable-length | 4,000 chars | Multi-language text |
TEXT |
Legacy large text (deprecated) | 2 GB | Use VARCHAR(MAX) instead |
When to use what: VARCHAR for most strings. NVARCHAR for international characters (Hindi, Chinese, Arabic). CHAR only for fixed-length codes (country codes, state codes).
Numeric Types
| Type | Description | Range | Example |
|---|---|---|---|
INT |
Integer | -2.1 billion to 2.1 billion | emp_id, counts |
BIGINT |
Large integer | -9.2 quintillion to 9.2 quintillion | IDs in large systems |
SMALLINT |
Small integer | -32,768 to 32,767 | Status codes |
TINYINT |
Tiny integer | 0 to 255 | Flags, ratings |
DECIMAL(p,s) |
Exact numeric | Precision p, scale s | DECIMAL(10,2) = 12345678.99 |
FLOAT |
Approximate numeric | 15 digits precision | Scientific calculations |
BIT |
Boolean (0 or 1) | 0 or 1 | is_active, is_deleted |
When to use what: INT for IDs and counts. DECIMAL for money (never FLOAT for money — rounding errors). BIT for true/false flags.
Date Types
| Type | Description | Format | Example |
|---|---|---|---|
DATE |
Date only | YYYY-MM-DD | ‘2026-05-25’ |
TIME |
Time only | HH:MM:SS | ’14:30:00′ |
DATETIME |
Date + time | YYYY-MM-DD HH:MM:SS | ‘2026-05-25 14:30:00’ |
DATETIME2 |
Higher precision | Nanoseconds | ‘2026-05-25 14:30:00.1234567’ |
DATETIMEOFFSET |
With timezone | +timezone | ‘2026-05-25 14:30:00 -04:00’ |
When to use what: DATE for dates without time (hire_date, birth_date). DATETIME2 for timestamps (created_at, updated_at). DATETIMEOFFSET for global applications.
Constraints: PK, FK, UNIQUE, CHECK, DEFAULT, NOT NULL
Constraints are rules enforced by the database. If data violates a constraint, the database REJECTS the INSERT or UPDATE.
Primary Key — The Unique Identifier
-- Every row must have a unique, non-null primary key
CREATE TABLE customers (
customer_id INT PRIMARY KEY, -- Unique + NOT NULL (automatic)
name VARCHAR(100)
);
-- Named constraint (better for error messages)
CREATE TABLE customers (
customer_id INT,
name VARCHAR(100),
CONSTRAINT PK_customers PRIMARY KEY (customer_id)
);
-- Composite primary key (two columns together)
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
CONSTRAINT PK_order_items PRIMARY KEY (order_id, product_id)
);
Rules: One primary key per table. Automatically UNIQUE and NOT NULL. Can be single or composite (multiple columns).
Real-life analogy: The primary key is like a passport number — unique to each person, cannot be blank, and identifies you anywhere in the system.
Foreign Key — The Relationship Enforcer
-- Foreign key: dept_id in employees must exist in departments
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
name VARCHAR(50),
dept_id INT,
CONSTRAINT FK_emp_dept FOREIGN KEY (dept_id) REFERENCES departments(dept_id)
);
-- What foreign key prevents:
INSERT INTO employees VALUES (1, 'Naveen', 999);
-- ERROR: dept_id 999 does not exist in departments table!
DELETE FROM departments WHERE dept_id = 1;
-- ERROR: employees reference dept_id 1! Cannot delete parent while children exist.
CASCADE options:
-- ON DELETE CASCADE: if department is deleted, delete all its employees
CONSTRAINT FK_emp_dept FOREIGN KEY (dept_id)
REFERENCES departments(dept_id)
ON DELETE CASCADE;
-- ON DELETE SET NULL: if department is deleted, set employee dept_id to NULL
CONSTRAINT FK_emp_dept FOREIGN KEY (dept_id)
REFERENCES departments(dept_id)
ON DELETE SET NULL;
-- ON UPDATE CASCADE: if department ID changes, update all employee references
CONSTRAINT FK_emp_dept FOREIGN KEY (dept_id)
REFERENCES departments(dept_id)
ON UPDATE CASCADE;
Real-life analogy: A foreign key is like a hotel reservation referencing a valid room. You cannot book Room 999 if it does not exist. You cannot demolish Room 101 while a guest is staying there (unless CASCADE — all guests automatically checked out).
UNIQUE — No Duplicates Allowed
-- Email must be unique across all employees
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
email VARCHAR(100) UNIQUE, -- No two employees can have same email
ssn VARCHAR(11) UNIQUE -- No two employees can have same SSN
);
-- Named unique constraint
CONSTRAINT UQ_employee_email UNIQUE (email)
-- Composite unique (combination must be unique)
CONSTRAINT UQ_dept_title UNIQUE (department, job_title)
-- Engineering + Senior is unique, Engineering + Junior is unique
-- But two Engineering + Senior would fail
Key difference from PK: A table can have multiple UNIQUE constraints. UNIQUE allows ONE NULL (PK does not allow NULL).
CHECK — Value Validation
-- Salary must be positive
CREATE TABLE employees (
salary DECIMAL(10,2) CHECK (salary > 0),
age INT CHECK (age BETWEEN 18 AND 100),
status VARCHAR(20) CHECK (status IN ('Active', 'Inactive', 'On Leave')),
email VARCHAR(100) CHECK (email LIKE '%@%.%'),
start_date DATE,
end_date DATE,
CONSTRAINT CHK_dates CHECK (end_date > start_date OR end_date IS NULL)
);
INSERT INTO employees (..., salary) VALUES (..., -5000);
-- ERROR: CHECK constraint violated! salary must be > 0
Real-life analogy: CHECK is like a form validation. “Age must be between 18 and 100.” “Status must be Active, Inactive, or On Leave.” The form rejects invalid entries before saving.
DEFAULT — Automatic Values
CREATE TABLE employees (
emp_id INT PRIMARY KEY IDENTITY(1,1),
status VARCHAR(20) DEFAULT 'Active',
is_active BIT DEFAULT 1,
created_at DATETIME DEFAULT GETDATE(),
country VARCHAR(50) DEFAULT 'Canada',
bonus DECIMAL(10,2) DEFAULT 0.00
);
-- INSERT without specifying defaults — they are filled automatically
INSERT INTO employees (emp_id) VALUES (1);
-- status='Active', is_active=1, created_at=2026-05-25 14:30:00, country='Canada', bonus=0.00
NOT NULL — Required Fields
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
first_name VARCHAR(50) NOT NULL, -- Required
last_name VARCHAR(50) NOT NULL, -- Required
email VARCHAR(100) NULL, -- Optional (NULL allowed)
phone VARCHAR(20) -- NULL by default (optional)
);
INSERT INTO employees (emp_id, email) VALUES (1, 'test@email.com');
-- ERROR: first_name and last_name cannot be NULL!
ALTER TABLE — Changing the Structure
-- Add a column
ALTER TABLE employees ADD middle_name VARCHAR(50) NULL;
ALTER TABLE employees ADD updated_at DATETIME DEFAULT GETDATE();
-- Drop a column
ALTER TABLE employees DROP COLUMN middle_name;
-- Rename a column (SQL Server)
EXEC sp_rename 'employees.email', 'email_address', 'COLUMN';
-- Change data type
ALTER TABLE employees ALTER COLUMN phone VARCHAR(30);
-- Add a constraint
ALTER TABLE employees ADD CONSTRAINT CHK_salary CHECK (salary >= 0);
ALTER TABLE employees ADD CONSTRAINT DF_status DEFAULT 'Active' FOR status;
-- Drop a constraint
ALTER TABLE employees DROP CONSTRAINT CHK_salary;
-- Add a foreign key to existing table
ALTER TABLE employees ADD CONSTRAINT FK_emp_dept
FOREIGN KEY (dept_id) REFERENCES departments(dept_id);
DROP and TRUNCATE — Removing Structure and Data
DROP vs TRUNCATE vs DELETE
| Command | What It Removes | Table Still Exists? | Can Rollback? | Resets Identity? | Speed |
|---|---|---|---|---|---|
DROP TABLE |
Table structure + ALL data + constraints | No (table gone) | No | N/A | Instant |
TRUNCATE TABLE |
ALL data only | Yes (empty table remains) | No (usually) | Yes | Very fast |
DELETE FROM |
Specific rows (with WHERE) or all rows | Yes (table remains) | Yes (within transaction) | No | Slow on large tables |
-- DROP: remove table entirely
DROP TABLE IF EXISTS temp_results;
-- TRUNCATE: remove all rows, keep table structure
TRUNCATE TABLE staging_data;
-- Cannot TRUNCATE if foreign keys reference the table
-- DELETE: remove specific rows (or all)
DELETE FROM employees WHERE is_active = 0;
DELETE FROM staging_data; -- All rows, but slower than TRUNCATE
Real-life analogy: DROP is demolishing the building (everything gone). TRUNCATE is emptying every room but keeping the building (structure stays, furniture gone). DELETE is removing specific pieces of furniture from specific rooms (selective, slower, reversible).
INSERT — Adding Data
-- Insert single row (all columns)
INSERT INTO employees (emp_id, first_name, last_name, email, dept_id, salary, hire_date)
VALUES (1001, 'Naveen', 'Vuppula', 'naveen@email.com', 1, 105000, '2022-01-15');
-- Insert single row (relying on defaults for unspecified columns)
INSERT INTO employees (first_name, last_name, dept_id, salary, hire_date)
VALUES ('Shrey', 'Patil', 1, 95000, '2023-03-20');
-- emp_id auto-generated (IDENTITY), status defaults to 'Active', created_at defaults to GETDATE()
-- Insert multiple rows
INSERT INTO departments (dept_name, budget) VALUES
('Data Engineering', 500000),
('Analytics', 400000),
('Sales', 300000),
('Marketing', 250000);
-- Insert from a SELECT (copy data from another table)
INSERT INTO employees_archive (emp_id, first_name, last_name, salary)
SELECT emp_id, first_name, last_name, salary
FROM employees
WHERE is_active = 0;
-- Insert from another table with transformation
INSERT INTO monthly_summary (department, month, total_salary, headcount)
SELECT department, FORMAT(GETDATE(), 'yyyy-MM'), SUM(salary), COUNT(*)
FROM employees
WHERE is_active = 1
GROUP BY department;
UPDATE — Modifying Data
-- Update specific rows
UPDATE employees SET salary = 110000 WHERE emp_id = 1001;
-- Update multiple columns
UPDATE employees
SET salary = 110000, department = 'Data Engineering', updated_at = GETDATE()
WHERE emp_id = 1001;
-- Update with calculation
UPDATE employees SET salary = salary * 1.10 WHERE department = 'Engineering';
-- 10% raise for all engineers
-- Update using another table (correlated update)
UPDATE e
SET e.dept_name = d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id;
-- Update with CASE
UPDATE employees
SET salary_band = CASE
WHEN salary >= 100000 THEN 'Executive'
WHEN salary >= 80000 THEN 'Senior'
ELSE 'Standard'
END;
ALWAYS use WHERE with UPDATE. UPDATE employees SET salary = 0 without WHERE sets EVERYONE’s salary to zero.
DELETE — Removing Data
-- Delete specific rows
DELETE FROM employees WHERE is_active = 0;
DELETE FROM employees WHERE hire_date < '2020-01-01';
-- Delete using subquery
DELETE FROM employees
WHERE dept_id IN (SELECT dept_id FROM departments WHERE status = 'Closed');
-- Delete using JOIN (SQL Server)
DELETE e
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
WHERE d.status = 'Closed';
-- Delete all rows (use TRUNCATE instead for performance)
DELETE FROM staging_table;
ALWAYS use WHERE with DELETE. DELETE FROM employees without WHERE deletes ALL employees.
MERGE — Upsert (Insert + Update in One Statement)
MERGE is the most powerful DML statement — it combines INSERT, UPDATE, and DELETE based on a match condition:
-- MERGE: Update existing employees, insert new ones
MERGE INTO employees AS target
USING staging_employees AS source
ON target.emp_id = source.emp_id
WHEN MATCHED AND target.salary != source.salary THEN
UPDATE SET
target.salary = source.salary,
target.department = source.department,
target.updated_at = GETDATE()
WHEN NOT MATCHED BY TARGET THEN
INSERT (emp_id, first_name, last_name, salary, department, hire_date)
VALUES (source.emp_id, source.first_name, source.last_name,
source.salary, source.department, source.hire_date)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
-- Semicolon is REQUIRED at the end of MERGE
The three MERGE clauses:
| Clause | When It Fires | Action |
|---|---|---|
WHEN MATCHED |
Row exists in BOTH target and source | UPDATE the target row |
WHEN NOT MATCHED BY TARGET |
Row in source but NOT in target | INSERT into target |
WHEN NOT MATCHED BY SOURCE |
Row in target but NOT in source | DELETE from target (optional) |
This is exactly what our SCD Type 1 pipelines do — update changed records and insert new ones. MERGE does it in a single atomic statement.
Real-life analogy: MERGE is like an employee census update. Employees on both the old and new list (MATCHED) get their info updated. New employees (NOT MATCHED BY TARGET) get added. Employees on the old list but not the new one (NOT MATCHED BY SOURCE) are removed. One operation handles all three scenarios.
IDENTITY / AUTO_INCREMENT — Auto-Generated IDs
-- SQL Server: IDENTITY(seed, increment)
CREATE TABLE employees (
emp_id INT IDENTITY(1,1) PRIMARY KEY, -- Starts at 1, increments by 1
first_name VARCHAR(50)
);
-- Insert without specifying ID (auto-generated)
INSERT INTO employees (first_name) VALUES ('Naveen'); -- emp_id = 1
INSERT INTO employees (first_name) VALUES ('Shrey'); -- emp_id = 2
-- Get the last generated ID
SELECT SCOPE_IDENTITY() AS last_id; -- 2
SELECT @@IDENTITY AS last_id; -- 2 (use SCOPE_IDENTITY instead — safer)
-- MySQL: AUTO_INCREMENT
CREATE TABLE employees (
emp_id INT AUTO_INCREMENT PRIMARY KEY,
first_name VARCHAR(50)
);
-- PostgreSQL: SERIAL or GENERATED ALWAYS
CREATE TABLE employees (
emp_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
first_name VARCHAR(50)
);
Creating Tables from Queries (SELECT INTO / CREATE AS)
-- SQL Server: SELECT INTO (creates new table from query)
SELECT emp_id, first_name, last_name, salary
INTO employees_backup
FROM employees
WHERE is_active = 1;
-- Creates employees_backup with the same schema and data
-- Create empty table with same structure
SELECT * INTO employees_empty FROM employees WHERE 1 = 0;
-- WHERE 1=0 returns no rows — creates structure only
-- PostgreSQL / Fabric: CREATE TABLE AS
CREATE TABLE employees_backup AS
SELECT emp_id, first_name, last_name, salary
FROM employees WHERE is_active = 1;
Schema Design Best Practices
- Always define a PRIMARY KEY — every table needs a unique identifier
- Use IDENTITY/SERIAL for surrogate keys — auto-generated, no business meaning
- Define NOT NULL on required columns — prevent missing critical data
- Use CHECK constraints for validation — salary > 0, status IN list
- Use DEFAULT for standard values — created_at = GETDATE(), is_active = 1
- Name your constraints —
FK_emp_deptis easier to debug thanFK__employees__dept___2A4B4B5E - Use appropriate data types — DECIMAL for money (not FLOAT), DATE for dates (not VARCHAR)
- Add created_at and updated_at columns — every table should track when rows were created and modified
Common Mistakes
-
UPDATE/DELETE without WHERE — affects ALL rows. Always double-check your WHERE clause. Run a SELECT first to verify which rows match.
-
Wrong data type for money — FLOAT introduces rounding errors ($100.10 might store as $100.0999999). Always use DECIMAL for financial data.
-
Not naming constraints — auto-generated names like
CK__emp__sal__2A4B4B5Eare impossible to debug. Name everything. -
Forgetting the semicolon after MERGE — MERGE requires a terminating semicolon. Omitting it causes cryptic errors.
-
CASCADE DELETE without understanding impact —
ON DELETE CASCADEon a parent table deletes ALL child records. One accidental parent delete can wipe thousands of rows. -
Using TRUNCATE when DELETE is needed — TRUNCATE cannot be rolled back (in most databases) and does not fire triggers. Use DELETE inside a transaction for safety.
-
Not using IDENTITY for surrogate keys — manually managing sequential IDs leads to race conditions and gaps. Let the database handle it.
Interview Questions
Q: What is the difference between DDL and DML? A: DDL (Data Definition Language) defines database structure — CREATE, ALTER, DROP, TRUNCATE. DML (Data Manipulation Language) manipulates data — INSERT, UPDATE, DELETE, MERGE. DDL changes the container. DML changes the contents.
Q: What is the difference between DROP, TRUNCATE, and DELETE? A: DROP removes the table entirely (structure + data). TRUNCATE removes all data but keeps the table structure and resets the identity counter. DELETE removes specific rows (with WHERE) and can be rolled back within a transaction. TRUNCATE is faster than DELETE for clearing all rows.
Q: What is the difference between PRIMARY KEY and UNIQUE? A: Both enforce uniqueness. PRIMARY KEY also enforces NOT NULL and allows only ONE per table. UNIQUE allows NULL values (one NULL) and a table can have multiple UNIQUE constraints. Every table should have exactly one PRIMARY KEY.
Q: What does the MERGE statement do? A: MERGE combines INSERT, UPDATE, and DELETE in a single atomic statement. It matches source rows against target rows using a condition. Matched rows are updated. Unmatched source rows are inserted. Optionally, unmatched target rows are deleted. It is the SQL equivalent of an upsert operation used in SCD patterns.
Q: What is a foreign key and why is it important? A: A foreign key enforces referential integrity — the value in the child table must exist in the parent table. It prevents orphan records (an employee referencing a department that does not exist) and maintains data consistency across related tables. CASCADE options control what happens when parent records are updated or deleted.
Wrapping Up
DDL builds the house. DML fills it with life. Constraints keep it standing. Together, they form the foundation of every database — from a single prototype table to a production data warehouse with hundreds of tables and millions of rows.
The key takeaways: always define a primary key, always use NOT NULL on required columns, always name your constraints, and never run UPDATE or DELETE without a WHERE clause. These habits prevent 90% of database disasters.
Related posts: – SQL Execution Order & WHERE Clauses – SQL Functions – SQL Joins – SCD Type 1 & 2 with PySpark MERGE
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.