SQL Functions Every Data Engineer Must Know: String, Date, Numeric, Null Handling, Conversion, and Conditional Functions

You know how to filter with WHERE, aggregate with GROUP BY, and rank with window functions. But between retrieving data and presenting results, there is a layer of work that nobody talks about in interviews until they ask: “How would you extract the year from a date? How would you handle NULLs in a calculation? How would you clean a phone number that has dashes and spaces?”

The answer is always a function. SQL has hundreds of built-in functions for manipulating strings, dates, numbers, and NULLs. But you do not need hundreds — you need about 50 that cover 95% of real-world data engineering work.

This post is your function reference — organized by category, with real examples on every function, and the traps that catch people in production and interviews.

Think of SQL functions like a toolbox. You do not use every tool every day, but when you need a specific one — extracting a domain from an email (string functions), calculating days between dates (date functions), or replacing NULLs with defaults (null functions) — you need to know exactly which tool to grab. This post is that toolbox, organized so you can find any tool in seconds.

The Sample Data
String Functions
Date and Time Functions
Numeric and Math Functions
Null Handling Functions
Conversion and Casting Functions
Conditional Functions
Aggregate Functions (Quick Reference)
Functions for Data Quality and Cleaning
SQL Server vs PostgreSQL vs MySQL Differences
Common Mistakes with Functions
Interview Questions
Wrapping Up

The Sample Data

CREATE TABLE employees (
    emp_id INT,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    phone VARCHAR(20),
    department VARCHAR(50),
    salary DECIMAL(10,2),
    bonus DECIMAL(10,2),
    hire_date DATE,
    last_login DATETIME,
    notes VARCHAR(500)
);

INSERT INTO employees VALUES
(1001, '  Naveen ', 'Vuppula', 'Naveen.Vuppula@Email.COM', '(416) 555-1234', 'Data Engineering', 105000.00, 10500.00, '2022-01-15', '2026-05-24 14:30:00', 'Senior consultant'),
(1002, 'shrey', 'PATIL', 'shrey@email.com', '416-555-5678', 'Data Engineering', 95000.00, NULL, '2023-03-20', '2026-05-23 09:15:00', NULL),
(1003, 'Vrushab', 'Sharma', 'vrushab@email.com', '604.555.9012', 'Engineering', 92000.50, 4600.00, '2023-06-10', NULL, ''),
(1004, 'Vishnu', 'Kumar', NULL, '613 555 3456', 'Analytics', 91000.00, 9100.00, '2023-09-01', '2026-05-22 16:45:00', 'Transferred from Delhi'),
(1005, 'Ravi', 'Desai', 'ravi@email.com', NULL, 'Analytics', 85000.00, 0.00, '2024-01-05', '2026-05-20 08:00:00', 'New hire - probation');

Notice the intentional issues: leading/trailing spaces, mixed case, NULL values, empty strings, different phone formats, NULL bonus, zero bonus, NULL email.

String Functions

LEN / LENGTH — Count Characters

-- LEN (SQL Server) / LENGTH (PostgreSQL, MySQL)
SELECT first_name, LEN(first_name) AS name_length FROM employees;
-- '  Naveen ' → 9 (includes leading spaces, LEN trims trailing in SQL Server)
-- 'shrey'     → 5

-- DATALENGTH — count bytes (includes trailing spaces)
SELECT first_name, DATALENGTH(first_name) AS byte_length FROM employees;

TRIM / LTRIM / RTRIM — Remove Whitespace

-- TRIM both sides (SQL Server 2017+, PostgreSQL, MySQL)
SELECT TRIM(first_name) AS cleaned FROM employees;
-- '  Naveen ' → 'Naveen'

-- LTRIM — left side only
SELECT LTRIM(first_name) AS left_trimmed FROM employees;
-- '  Naveen ' → 'Naveen '

-- RTRIM — right side only
SELECT RTRIM(first_name) AS right_trimmed FROM employees;
-- '  Naveen ' → '  Naveen'

-- TRIM specific characters (SQL Server 2017+)
SELECT TRIM('.' FROM '...hello...') AS cleaned;
-- 'hello'

Production rule: ALWAYS trim string columns from external sources. Vendors send leading/trailing spaces constantly. WHERE name = 'Naveen' will NOT match ' Naveen '.

UPPER / LOWER — Change Case

-- UPPER — all uppercase
SELECT UPPER(first_name) AS upper_name FROM employees;
-- 'shrey' → 'SHREY', 'Vrushab' → 'VRUSHAB'

-- LOWER — all lowercase
SELECT LOWER(email) AS lower_email FROM employees;
-- 'Naveen.Vuppula@Email.COM' → 'naveen.vuppula@email.com'

Production rule: Always LOWER or UPPER before comparing strings. WHERE email = 'naveen@email.com' will NOT match 'Naveen@Email.COM' in case-sensitive databases.

CONCAT / CONCAT_WS — Combine Strings

-- CONCAT — join strings
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM employees;
-- 'Naveen Vuppula', 'shrey PATIL'

-- CONCAT with NULL — returns NULL in some databases!
SELECT CONCAT(first_name, ' ', email) FROM employees WHERE emp_id = 1004;
-- SQL Server: 'Vishnu ' (SQL Server CONCAT ignores NULLs)
-- PostgreSQL: NULL (standard SQL — NULL + anything = NULL)

-- CONCAT_WS — concat with separator (handles NULLs gracefully)
SELECT CONCAT_WS(' ', first_name, last_name) AS full_name FROM employees;
-- Skips NULL values automatically
-- CONCAT_WS(', ', city, state, country) → 'Toronto, ON, Canada' (skips NULLs)

Real-life analogy: CONCAT is like taping pieces of paper together. If one piece is blank (NULL), different tape brands (databases) handle it differently — some leave a gap, others refuse to tape. CONCAT_WS is the smart tape — it skips blank pieces and puts the separator only between real pieces.

SUBSTRING / LEFT / RIGHT — Extract Parts

-- SUBSTRING(string, start, length)
SELECT SUBSTRING(phone, 2, 3) AS area_code FROM employees WHERE emp_id = 1001;
-- '(416) 555-1234' → '416' (start at position 2, take 3 characters)

-- LEFT — first N characters
SELECT LEFT(email, 5) AS email_prefix FROM employees;
-- 'Navee', 'shrey', 'vrush'

-- RIGHT — last N characters
SELECT RIGHT(email, 3) AS email_suffix FROM employees;
-- 'COM', 'com', 'com'

-- Extract domain from email
SELECT SUBSTRING(email, CHARINDEX('@', email) + 1, LEN(email)) AS domain
FROM employees WHERE email IS NOT NULL;
-- 'Email.COM', 'email.com', 'email.com'

CHARINDEX / POSITION — Find Character Position

-- CHARINDEX (SQL Server) — find position of substring
SELECT email, CHARINDEX('@', email) AS at_position FROM employees;
-- 'Naveen.Vuppula@Email.COM' → 15

-- POSITION (PostgreSQL)
SELECT email, POSITION('@' IN email) AS at_position FROM employees;

-- Find if substring exists (returns 0 if not found)
SELECT first_name, CHARINDEX('av', first_name) AS found FROM employees;
-- 'Naveen' → 2 (found at position 2), 'shrey' → 0 (not found)

REPLACE — Swap Substrings

-- Clean phone numbers — remove all non-numeric characters
SELECT phone,
    REPLACE(REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), '-', ''), ' ', '') AS clean_phone
FROM employees;
-- '(416) 555-1234' → '4165551234'
-- '416-555-5678'   → '4165555678'
-- '604.555.9012'   → '604.555.9012' (dots not removed — need another REPLACE)

-- Replace dots too
SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), '-', ''), ' ', ''), '.', '') AS clean_phone
FROM employees;

STUFF — Insert/Replace at Position

-- STUFF(string, start, length_to_delete, replacement)
-- Format phone: insert dashes
SELECT STUFF(STUFF('4165551234', 4, 0, '-'), 8, 0, '-') AS formatted;
-- '416-555-1234'

-- Mask credit card: show last 4 only
SELECT STUFF('4111222233334444', 1, 12, '****-****-****-') AS masked;
-- '****-****-****-4444'

REVERSE — Reverse a String

SELECT REVERSE('Hello') AS reversed;
-- 'olleH'

-- Check palindrome
SELECT name, CASE WHEN name = REVERSE(name) THEN 'Yes' ELSE 'No' END AS is_palindrome
FROM (VALUES ('racecar'), ('hello'), ('madam')) AS t(name);

REPLICATE / REPEAT — Repeat String

-- REPLICATE (SQL Server) / REPEAT (PostgreSQL, MySQL)
SELECT REPLICATE('*', 5) AS stars;
-- '*****'

-- Pad employee ID to 6 digits
SELECT REPLICATE('0', 6 - LEN(CAST(emp_id AS VARCHAR))) + CAST(emp_id AS VARCHAR) AS padded_id
FROM employees;
-- '001001', '001002'

FORMAT / STRING_AGG

-- FORMAT (SQL Server) — format numbers and dates as strings
SELECT FORMAT(salary, 'N2') AS formatted_salary FROM employees;
-- '105,000.00'

SELECT FORMAT(hire_date, 'MMMM dd, yyyy') AS formatted_date FROM employees;
-- 'January 15, 2022'

-- STRING_AGG (SQL Server 2017+, PostgreSQL) — combine values from multiple rows
SELECT department, STRING_AGG(first_name, ', ') AS team_members
FROM employees GROUP BY department;
-- 'Data Engineering' → 'Naveen, shrey'
-- 'Analytics'        → 'Vishnu, Ravi'

Date and Time Functions

GETDATE / CURRENT_TIMESTAMP / SYSDATETIME

-- Current date and time
SELECT GETDATE() AS now;                    -- SQL Server: 2026-05-25 14:30:45.123
SELECT CURRENT_TIMESTAMP AS now;            -- Standard SQL (all databases)
SELECT SYSDATETIME() AS precise_now;        -- SQL Server: higher precision (nanoseconds)

-- Current date only
SELECT CAST(GETDATE() AS DATE) AS today;    -- 2026-05-25

DATEPART / YEAR / MONTH / DAY — Extract Components

-- Extract specific parts
SELECT hire_date,
    YEAR(hire_date) AS hire_year,
    MONTH(hire_date) AS hire_month,
    DAY(hire_date) AS hire_day
FROM employees;

-- DATEPART — more flexible
SELECT hire_date,
    DATEPART(YEAR, hire_date) AS yr,
    DATEPART(QUARTER, hire_date) AS qtr,
    DATEPART(WEEK, hire_date) AS wk,
    DATEPART(WEEKDAY, hire_date) AS day_of_week,    -- 1=Sunday, 7=Saturday
    DATEPART(DAYOFYEAR, hire_date) AS day_of_year,  -- 1-366
    DATEPART(HOUR, last_login) AS login_hour
FROM employees;

-- DATENAME — returns name instead of number
SELECT DATENAME(MONTH, hire_date) AS month_name,    -- 'January', 'March'
       DATENAME(WEEKDAY, hire_date) AS day_name      -- 'Saturday', 'Monday'
FROM employees;

DATEDIFF — Calculate Difference Between Dates

-- Days between hire date and today
SELECT first_name, hire_date,
    DATEDIFF(DAY, hire_date, GETDATE()) AS days_employed,
    DATEDIFF(MONTH, hire_date, GETDATE()) AS months_employed,
    DATEDIFF(YEAR, hire_date, GETDATE()) AS years_employed
FROM employees;

-- Days since last login
SELECT first_name,
    DATEDIFF(DAY, last_login, GETDATE()) AS days_since_login
FROM employees
WHERE last_login IS NOT NULL;

The trap: DATEDIFF counts BOUNDARIES, not complete units. DATEDIFF(YEAR, '2025-12-31', '2026-01-01') returns 1 even though only 1 day passed. It crossed a year boundary.

DATEADD — Add/Subtract from Dates

-- Add days, months, years
SELECT hire_date,
    DATEADD(DAY, 90, hire_date) AS probation_end,      -- 90 days after hire
    DATEADD(MONTH, 6, hire_date) AS six_month_review,   -- 6 months after
    DATEADD(YEAR, 1, hire_date) AS one_year_anniversary, -- 1 year after
    DATEADD(DAY, -30, GETDATE()) AS thirty_days_ago     -- 30 days before today
FROM employees;

-- Subtract: use negative number
SELECT DATEADD(MONTH, -3, GETDATE()) AS three_months_ago;

FORMAT / CONVERT — Format Dates as Strings

-- FORMAT (SQL Server 2012+) — flexible date formatting
SELECT hire_date,
    FORMAT(hire_date, 'yyyy-MM-dd') AS iso_format,       -- '2022-01-15'
    FORMAT(hire_date, 'dd/MM/yyyy') AS uk_format,         -- '15/01/2022'
    FORMAT(hire_date, 'MMMM dd, yyyy') AS long_format,   -- 'January 15, 2022'
    FORMAT(hire_date, 'MMM-yy') AS short_format           -- 'Jan-22'
FROM employees;

-- CONVERT with style codes (older, faster)
SELECT hire_date,
    CONVERT(VARCHAR, hire_date, 23) AS iso,        -- '2022-01-15'
    CONVERT(VARCHAR, hire_date, 103) AS uk,         -- '15/01/2022'
    CONVERT(VARCHAR, hire_date, 107) AS us_long     -- 'Jan 15, 2022'
FROM employees;

EOMONTH — End of Month

-- Last day of the hire month
SELECT hire_date,
    EOMONTH(hire_date) AS end_of_month,          -- '2022-01-31'
    EOMONTH(hire_date, 1) AS end_of_next_month,  -- '2022-02-28'
    EOMONTH(hire_date, -1) AS end_of_prev_month  -- '2021-12-31'
FROM employees;

-- First day of the month
SELECT DATEADD(DAY, 1, EOMONTH(hire_date, -1)) AS first_of_month FROM employees;
-- '2022-01-01'

ISDATE — Validate Date Strings

-- Check if a string is a valid date
SELECT ISDATE('2026-05-25') AS valid;      -- 1 (true)
SELECT ISDATE('2026-02-30') AS invalid;    -- 0 (false — Feb 30 does not exist)
SELECT ISDATE('not-a-date') AS invalid;    -- 0

Numeric and Math Functions

ROUND / CEILING / FLOOR — Rounding

-- ROUND(number, decimal_places)
SELECT ROUND(92000.567, 2) AS rounded;    -- 92000.570 (round to 2 decimals)
SELECT ROUND(92000.567, 0) AS rounded;    -- 92001.000 (round to integer)
SELECT ROUND(92000.567, -3) AS rounded;   -- 92000.000 (round to thousands)

-- CEILING — round UP to nearest integer
SELECT CEILING(92000.1) AS ceil;          -- 92001
SELECT CEILING(-92000.1) AS ceil;         -- -92000 (rounds toward positive infinity)

-- FLOOR — round DOWN to nearest integer
SELECT FLOOR(92000.9) AS floored;         -- 92000
SELECT FLOOR(-92000.9) AS floored;        -- -92001 (rounds toward negative infinity)

Real-life analogy: ROUND is a standard rounding (4.5 → 5, 4.4 → 4). CEILING is like a parking garage that charges by the hour — 2.1 hours = charged for 3 hours (always rounds up). FLOOR is like a cookie jar — 2.9 cookies = you have 2 whole cookies (always rounds down).

ABS — Absolute Value

SELECT ABS(-105000) AS absolute;          -- 105000
SELECT ABS(105000) AS absolute;           -- 105000

-- Useful for: difference regardless of direction
SELECT first_name, salary,
    ABS(salary - (SELECT AVG(salary) FROM employees)) AS diff_from_avg
FROM employees;

POWER / SQRT / LOG

SELECT POWER(2, 10) AS two_to_ten;        -- 1024
SELECT SQRT(144) AS square_root;           -- 12
SELECT LOG(100) AS natural_log;            -- 4.605 (ln)
SELECT LOG10(100) AS log_base10;           -- 2

SIGN — Positive, Negative, or Zero

SELECT bonus, SIGN(bonus) AS sign_val FROM employees;
-- 10500.00 → 1 (positive)
-- NULL     → NULL
-- 4600.00  → 1 (positive)
-- 0.00     → 0 (zero)

MOD / % — Remainder

-- Modulo (remainder after division)
SELECT emp_id, emp_id % 2 AS is_odd FROM employees;
-- 1001 → 1 (odd), 1002 → 0 (even)

-- Useful for: alternating row colors, distributing into groups
SELECT emp_id, emp_id % 3 AS group_number FROM employees;
-- Distributes into groups 0, 1, 2

Null Handling Functions

COALESCE — First Non-NULL Value

-- Return first non-NULL value
SELECT first_name,
    COALESCE(email, phone, 'No Contact') AS best_contact
FROM employees;
-- Vishnu: email is NULL → tries phone → '613 555 3456'
-- If both were NULL → 'No Contact'

-- Default for NULL bonus
SELECT first_name,
    salary + COALESCE(bonus, 0) AS total_compensation
FROM employees;
-- Shrey: salary + COALESCE(NULL, 0) = 95000 + 0 = 95000
-- Without COALESCE: 95000 + NULL = NULL (entire calculation becomes NULL!)

This is the most important null function. Any arithmetic with NULL produces NULL: 100 + NULL = NULL, 100 * NULL = NULL. COALESCE prevents this by providing a default.

NULLIF — Return NULL If Values Match

-- NULLIF(a, b) → returns NULL if a = b, otherwise returns a
SELECT first_name, bonus,
    NULLIF(bonus, 0) AS real_bonus
FROM employees;
-- Ravi: bonus = 0.00 → NULLIF(0, 0) → NULL
-- Others: bonus stays as-is

-- THE #1 USE CASE: Division by zero protection
SELECT first_name,
    salary / NULLIF(bonus, 0) AS salary_to_bonus_ratio
FROM employees;
-- Ravi: salary / NULLIF(0, 0) → salary / NULL → NULL (no error!)
-- Without NULLIF: salary / 0 → DIVIDE BY ZERO ERROR

Memorize this pattern: column / NULLIF(divisor, 0) prevents division by zero in every database.

ISNULL (SQL Server) / IFNULL (MySQL) — Replace NULL

-- ISNULL (SQL Server only) — replace NULL with a value
SELECT first_name,
    ISNULL(email, 'noemail@placeholder.com') AS email,
    ISNULL(bonus, 0) AS bonus,
    ISNULL(notes, 'No notes') AS notes
FROM employees;

-- IFNULL (MySQL)
SELECT IFNULL(email, 'noemail@placeholder.com') AS email FROM employees;

-- COALESCE works everywhere (preferred for portability)
SELECT COALESCE(email, 'noemail@placeholder.com') AS email FROM employees;

COALESCE vs ISNULL: – COALESCE takes multiple arguments: COALESCE(a, b, c, 'default') – ISNULL takes only two: ISNULL(a, 'default') – COALESCE is ANSI standard (works everywhere). ISNULL is SQL Server only. – Always prefer COALESCE for portability.

IS DISTINCT FROM — NULL-Safe Comparison

-- Standard comparison: NULL = NULL → UNKNOWN (not TRUE)
SELECT * FROM employees WHERE bonus = NULL;          -- Returns NOTHING
SELECT * FROM employees WHERE bonus IS NULL;          -- Returns Shrey

-- IS DISTINCT FROM (PostgreSQL, Spark SQL, Fabric SQL)
-- Treats NULL as a comparable value
SELECT * FROM employees WHERE bonus IS DISTINCT FROM 0;
-- Returns rows where bonus is NOT 0 (includes NULLs!)
-- Standard !=: WHERE bonus != 0 would EXCLUDE NULLs

SELECT * FROM employees WHERE bonus IS NOT DISTINCT FROM NULL;
-- Same as IS NULL but more explicit in joins and comparisons

Why IS DISTINCT FROM matters:

-- The NULL comparison trap in data quality checks
-- "Find rows where source and target values differ"

-- ❌ WRONG: Misses cases where one is NULL
SELECT * FROM source s JOIN target t ON s.id = t.id
WHERE s.value != t.value;
-- If s.value = 100 and t.value = NULL → NOT returned (NULL != 100 = UNKNOWN)

-- ✅ CORRECT: Catches NULL differences too
SELECT * FROM source s JOIN target t ON s.id = t.id
WHERE s.value IS DISTINCT FROM t.value;
-- If s.value = 100 and t.value = NULL → RETURNED (they are distinct!)

Real-life analogy: Standard = is like comparing two boxes where one might be empty (NULL). If you compare an empty box to a full box, the answer is “I don’t know” (UNKNOWN), not “they’re different.” IS DISTINCT FROM says: “An empty box IS different from a full box.” It treats emptiness as a real, comparable state.

Conditional Functions

Conditional functions return different values based on conditions — the decision-making layer of SQL. CASE WHEN (covered in our GROUP BY post) is the most powerful. Below are the shorthand alternatives:

IIF — Inline If/Else (SQL Server)

-- IIF(condition, true_value, false_value)
SELECT first_name, salary,
    IIF(salary >= 95000, 'Senior', 'Standard') AS level
FROM employees;

-- Equivalent CASE:
SELECT first_name, salary,
    CASE WHEN salary >= 95000 THEN 'Senior' ELSE 'Standard' END AS level
FROM employees;

-- IIF with NULL check
SELECT first_name,
    IIF(bonus IS NULL, 'No Bonus', FORMAT(bonus, 'C')) AS bonus_status
FROM employees;

IIF is shorthand CASE — simpler for two-outcome conditions. SQL Server only.

Conversion and Casting Functions

CAST — Convert Data Types

-- String to Integer
SELECT CAST('12345' AS INT) AS num;                    -- 12345

-- Integer to String
SELECT CAST(emp_id AS VARCHAR(10)) AS emp_str;         -- '1001'

-- String to Date
SELECT CAST('2026-05-25' AS DATE) AS dt;               -- 2026-05-25

-- String to Decimal
SELECT CAST('99.95' AS DECIMAL(10,2)) AS price;        -- 99.95

-- Date to String
SELECT CAST(hire_date AS VARCHAR(10)) AS date_str;     -- '2022-01-15'

-- Decimal to Int (truncates, does not round)
SELECT CAST(92000.99 AS INT) AS truncated;             -- 92000 (not 92001)

CONVERT — SQL Server Extended Casting

-- CONVERT(target_type, value, style_code)
SELECT CONVERT(VARCHAR, hire_date, 103) AS uk_date;     -- '15/01/2022'
SELECT CONVERT(VARCHAR, hire_date, 23) AS iso_date;     -- '2022-01-15'
SELECT CONVERT(VARCHAR, salary, 1) AS formatted;        -- '105,000.00'

TRY_CAST / TRY_CONVERT — Safe Conversion

-- CAST fails on invalid data
SELECT CAST('not-a-number' AS INT);                     -- ERROR!

-- TRY_CAST returns NULL instead of failing
SELECT TRY_CAST('not-a-number' AS INT) AS result;      -- NULL (no error)
SELECT TRY_CAST('12345' AS INT) AS result;              -- 12345

-- TRY_CONVERT (SQL Server)
SELECT TRY_CONVERT(DATE, '2026-13-45') AS bad_date;    -- NULL (invalid date)
SELECT TRY_CONVERT(DATE, '2026-05-25') AS good_date;   -- 2026-05-25

Production rule: ALWAYS use TRY_CAST or TRY_CONVERT when converting data from external sources. Source data is messy — a column that should be numeric might contain ‘N/A’, ‘#VALUE!’, or ‘null’. TRY_CAST handles these gracefully.

Real-life analogy: CAST is like a strict doorman — “Show me valid ID or you’re not getting in” (error). TRY_CAST is a polite doorman — “If you don’t have valid ID, I’ll note it and let you through with a ‘guest’ badge” (NULL instead of error).

Aggregate Functions (Quick Reference)

Aggregate functions are covered in depth in our GROUP BY post. Here is the quick reference for how each one handles NULLs — the #1 source of bugs:

Function	What It Does	NULL Behavior	Example
*COUNT()**	Count all rows	Includes NULLs	`COUNT(*)` → 5
COUNT(col)	Count non-NULL values	Skips NULLs	`COUNT(bonus)` → 4 (1 NULL skipped)
COUNT(DISTINCT col)	Count unique non-NULL values	Skips NULLs	`COUNT(DISTINCT department)` → 3
SUM(col)	Total of values	Skips NULLs	`SUM(bonus)` → 24200 (NULL skipped, not 0)
AVG(col)	Average of values	Skips NULLs (divides by non-NULL count)	`AVG(bonus)` → 6050 (24200 / 4, not / 5)
MIN(col)	Smallest value	Skips NULLs	`MIN(bonus)` → 0.00
MAX(col)	Largest value	Skips NULLs	`MAX(bonus)` → 10500.00
STRING_AGG(col, sep)	Concatenate values	Skips NULLs	`STRING_AGG(email, ', ')`

-- The NULL trap in AVG
-- Data: bonus values are 10500, NULL, 4600, 9100, 0
SELECT
    AVG(bonus) AS naive_avg,                    -- 6050 (skips NULL, divides by 4)
    AVG(COALESCE(bonus, 0)) AS zero_fill_avg,   -- 4840 (treats NULL as 0, divides by 5)
    SUM(bonus) / COUNT(*) AS manual_avg          -- 4840 (SUM skips NULL, COUNT(*) counts all)
FROM employees;
-- naive_avg and zero_fill_avg give DIFFERENT results
-- Choose based on business logic: is NULL "no bonus" (use COALESCE) or "unknown" (skip it)?

The golden rule: Every aggregate function except COUNT(*) ignores NULLs. This is usually correct — but if NULL means “zero” in your business context (missing bonus = no bonus), wrap the column in COALESCE before aggregating.

Functions for Data Quality and Cleaning

The Complete Cleaning Pipeline

-- Real-world data cleaning query using multiple functions
SELECT
    emp_id,

    -- Clean name: trim + proper case
    UPPER(LEFT(TRIM(first_name), 1)) + LOWER(SUBSTRING(TRIM(first_name), 2, LEN(TRIM(first_name))))
        AS clean_first_name,

    -- Clean email: trim + lowercase + validate
    LOWER(TRIM(email)) AS clean_email,
    CASE WHEN email LIKE '%@%.%' THEN 'Valid' ELSE 'Invalid' END AS email_status,

    -- Clean phone: remove all non-numeric
    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
        COALESCE(phone, ''), '(', ''), ')', ''), '-', ''), ' ', ''), '.', '')
        AS clean_phone,

    -- Handle NULL/zero bonus
    COALESCE(NULLIF(bonus, 0), 0) AS effective_bonus,

    -- Calculate total comp safely
    salary + COALESCE(bonus, 0) AS total_compensation,

    -- Safe ratio calculation
    ROUND(salary / NULLIF(COALESCE(bonus, 0), 0), 2) AS salary_bonus_ratio,

    -- Days employed
    DATEDIFF(DAY, hire_date, GETDATE()) AS days_employed,

    -- Null tracking
    CASE 
        WHEN email IS NULL THEN 'Missing email'
        WHEN phone IS NULL THEN 'Missing phone'
        WHEN email IS NULL AND phone IS NULL THEN 'Missing both'
        ELSE 'Complete'
    END AS contact_status

FROM employees;

SQL Server vs PostgreSQL vs MySQL Differences

Function	SQL Server	PostgreSQL	MySQL
String length	`LEN()`	`LENGTH()`	`LENGTH()`
Current datetime	`GETDATE()`	`NOW()`	`NOW()`
Current date	`CAST(GETDATE() AS DATE)`	`CURRENT_DATE`	`CURDATE()`
Substring	`SUBSTRING(s, start, len)`	`SUBSTRING(s FROM start FOR len)`	`SUBSTRING(s, start, len)`
Find in string	`CHARINDEX(find, str)`	`POSITION(find IN str)`	`LOCATE(find, str)`
Replace NULL	`ISNULL(a, b)`	`COALESCE(a, b)`	`IFNULL(a, b)`
Inline if	`IIF(cond, t, f)`	No IIF (use CASE)	`IF(cond, t, f)`
Date diff	`DATEDIFF(unit, start, end)`	`end - start` (returns interval)	`DATEDIFF(end, start)`
Add to date	`DATEADD(unit, n, date)`	`date + INTERVAL 'n units'`	`DATE_ADD(date, INTERVAL n UNIT)`
Format date	`FORMAT(date, pattern)`	`TO_CHAR(date, pattern)`	`DATE_FORMAT(date, pattern)`
String concat	`CONCAT()` or `+`	`CONCAT()` or `\\|\\|`	`CONCAT()`
Combine rows	`STRING_AGG()`	`STRING_AGG()`	`GROUP_CONCAT()`
Try cast	`TRY_CAST()`	No built-in (use exception)	No built-in
IS DISTINCT FROM	Not in SQL Server (use workaround)	✅ Native	✅ Native (8.0+)

Common Mistakes with Functions

Arithmetic with NULL without COALESCE — salary + NULL = NULL. Always wrap nullable columns: salary + COALESCE(bonus, 0).
Division by zero without NULLIF — revenue / cost crashes if cost = 0. Always use revenue / NULLIF(cost, 0).
String comparison without trimming — 'Naveen' != ' Naveen '. Always TRIM external data before comparing.
Case-sensitive comparison without UPPER/LOWER — 'naveen' != 'Naveen' in case-sensitive databases. Always normalize case.
DATEDIFF counting boundaries not periods — DATEDIFF(YEAR, '2025-12-31', '2026-01-01') = 1 year (crossed boundary), not the elapsed time. Be careful with edge cases.
Using CAST instead of TRY_CAST on external data — CAST('N/A' AS INT) crashes. Use TRY_CAST for external data.
Confusing empty string with NULL — '' is NOT NULL. WHERE notes IS NULL will NOT match ''. Check both: WHERE notes IS NULL OR notes = ''.
Using ISNULL instead of COALESCE — ISNULL is SQL Server only and takes only 2 arguments. COALESCE is standard SQL, works everywhere, and takes unlimited arguments.

Interview Questions

Q: What is the difference between COALESCE, ISNULL, and NULLIF? A: COALESCE returns the first non-NULL value from a list of arguments — used to provide defaults for NULLs. ISNULL replaces a single NULL with a value (SQL Server only, 2 arguments). NULLIF returns NULL if two values are equal — primarily used for division by zero protection: revenue / NULLIF(cost, 0). COALESCE is the preferred choice because it is ANSI standard and accepts multiple arguments.

Q: How do you prevent division by zero in SQL? A: Use NULLIF: numerator / NULLIF(denominator, 0). If the denominator is 0, NULLIF converts it to NULL, and dividing by NULL returns NULL instead of throwing an error. Wrap in COALESCE if you want 0 instead: COALESCE(num / NULLIF(denom, 0), 0).

Q: What is IS DISTINCT FROM and when do you use it? A: IS DISTINCT FROM is a NULL-safe comparison operator. Standard != returns UNKNOWN when comparing with NULL (so the row is excluded). IS DISTINCT FROM treats NULL as a comparable value — NULL IS DISTINCT FROM 5 returns TRUE. Use it in data quality comparisons where one side might be NULL.

Q: What is the difference between CAST and TRY_CAST? A: CAST throws an error if the conversion fails. TRY_CAST returns NULL instead of an error. Use TRY_CAST when converting data from external sources where values might be invalid (strings in numeric columns, malformed dates).

Q: How do you handle the difference between NULL and empty string? A: They are different values. IS NULL catches NULL but not empty string. = '' catches empty string but not NULL. For comprehensive null/empty handling, use: WHERE column IS NULL OR column = '' or WHERE COALESCE(NULLIF(column, ''), 'default') = 'default'.

Wrapping Up

SQL functions are the transformation layer between raw data and clean results. String functions fix formatting. Date functions calculate time. Numeric functions handle rounding. Null functions prevent silent calculation errors. Conversion functions bridge data types safely.

The five functions you will use every single day: COALESCE (null defaults), TRIM (clean strings), LOWER/UPPER (normalize case), DATEDIFF (time calculations), and NULLIF (division safety). Master these five, reference the rest, and you can handle any data transformation SQL throws at you.

Previous SQL posts: – Execution Order, SELECT & WHERE Clauses – GROUP BY, HAVING, CASE WHEN – Subqueries & Performance – SQL Joins – Window Functions – CTEs & Subqueries

← Previous: Subqueries & Performance

SQL (4/15)

Next: SQL Joins →

Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

← Subqueries & Performance SQL Joins →