8 SQL Interview Questions Every Developer Should Master

Posted on February 15 2024 by Interview Zen Team

Introduction

SQL (Structured Query Language) remains the cornerstone of data management and backend development. Whether you’re interviewing for database administrator, backend developer, data analyst, or full-stack positions, SQL proficiency is often a key requirement.

According to the Stack Overflow Developer Survey 2024, SQL ranks as one of the most widely used programming languages, with over 50% of developers using it regularly. The ability to efficiently query, manipulate, and optimize database operations is crucial for building scalable applications and managing enterprise data systems.

This comprehensive guide presents essential SQL interview questions that test both fundamental concepts and advanced database management skills, helping hiring managers evaluate candidates’ ability to work with complex data scenarios.

Core SQL Concepts

SQL proficiency encompasses multiple areas:

Data Querying: Complex SELECT statements with multiple JOINs and conditions
Database Design: Normalization, relationships, and schema optimization
Performance Optimization: Indexing strategies and query optimization
Data Integrity: Constraints, transactions, and consistency management
Advanced Features: Window functions, CTEs, stored procedures, and triggers

Top 8 Essential SQL Interview Questions

1. Explain the different types of JOINs with practical examples.

JOIN operations are fundamental to relational database querying.

Example Answer: “SQL JOINs combine data from multiple tables based on relationships:

Database Schema:

-- Employees table
CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT,
    salary DECIMAL(10,2)
);

-- Departments table  
CREATE TABLE departments (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    budget DECIMAL(12,2)
);

-- Projects table
CREATE TABLE projects (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    employee_id INT
);

INNER JOIN: Returns records that have matching values in both tables

SELECT e.name, d.name as department
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;
-- Result: Only employees who have a valid department

LEFT JOIN (LEFT OUTER JOIN): Returns all records from left table, matched records from right

SELECT e.name, d.name as department
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;
-- Result: All employees, including those without departments (NULL values)

RIGHT JOIN: Returns all records from right table, matched records from left

SELECT e.name, d.name as department
FROM employees e
RIGHT JOIN departments d ON e.department_id = d.id;
-- Result: All departments, including those with no employees

FULL OUTER JOIN: Returns all records when there’s a match in either table

SELECT e.name, d.name as department
FROM employees e
FULL OUTER JOIN departments d ON e.department_id = d.id;
-- Result: All employees and all departments, with NULLs where no match

CROSS JOIN: Cartesian product of both tables

SELECT e.name, d.name
FROM employees e
CROSS JOIN departments d;
-- Result: Every employee paired with every department
```"

### 2. Write a query to find the second highest salary in each department.

This tests understanding of window functions and ranking.

**Example Answer**: "Multiple approaches depending on SQL dialect:

**Using Window Functions (Modern SQL)**:
```sql
WITH ranked_salaries AS (
    SELECT 
        name,
        department_id,
        salary,
        ROW_NUMBER() OVER (
            PARTITION BY department_id 
            ORDER BY salary DESC
        ) as salary_rank
    FROM employees
)
SELECT name, department_id, salary
FROM ranked_salaries
WHERE salary_rank = 2;

Alternative with DENSE_RANK (handles ties differently):

WITH ranked_salaries AS (
    SELECT 
        name,
        department_id,
        salary,
        DENSE_RANK() OVER (
            PARTITION BY department_id 
            ORDER BY salary DESC
        ) as salary_rank
    FROM employees
)
SELECT name, department_id, salary
FROM ranked_salaries
WHERE salary_rank = 2;

Using Correlated Subquery (works in older SQL versions):

SELECT e1.name, e1.department_id, e1.salary
FROM employees e1
WHERE 1 = (
    SELECT COUNT(DISTINCT e2.salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
    AND e2.salary > e1.salary
);

Using LIMIT/OFFSET approach (for single department):

SELECT name, salary
FROM employees
WHERE department_id = 1
ORDER BY salary DESC
LIMIT 1 OFFSET 1;  -- PostgreSQL/MySQL syntax

Key Considerations:

ROW_NUMBER() vs RANK() vs DENSE_RANK() handle ties differently
Window functions are more readable and performant
Consider what happens when there are salary ties”

3. How do you optimize a slow-running query?

Query optimization demonstrates practical database performance skills.

Example Answer: “Query optimization follows a systematic approach:

Step 1: Analyze Execution Plan

-- PostgreSQL
EXPLAIN ANALYZE 
SELECT e.name, d.name, COUNT(p.id)
FROM employees e
JOIN departments d ON e.department_id = d.id
LEFT JOIN projects p ON e.id = p.employee_id
WHERE e.salary > 50000
GROUP BY e.name, d.name
ORDER BY COUNT(p.id) DESC;

-- SQL Server
SET STATISTICS IO ON;
-- Query here

Step 2: Index Optimization

-- Create indexes on frequently queried columns
CREATE INDEX idx_employee_salary ON employees(salary);
CREATE INDEX idx_employee_dept ON employees(department_id);
CREATE INDEX idx_project_employee ON projects(employee_id);

-- Composite index for multiple WHERE conditions
CREATE INDEX idx_employee_dept_salary ON employees(department_id, salary);

Step 3: Query Restructuring

-- Before: Inefficient subquery in WHERE clause
SELECT name FROM employees 
WHERE department_id IN (
    SELECT id FROM departments WHERE budget > 100000
);

-- After: JOIN instead of subquery
SELECT e.name 
FROM employees e
INNER JOIN departments d ON e.department_id = d.id
WHERE d.budget > 100000;

Step 4: Eliminate Unnecessary Operations

-- Before: SELECT * and unnecessary columns
SELECT * FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE e.salary > 50000
ORDER BY e.name;

-- After: Select only needed columns
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > 50000
  AND e.department_id IS NOT NULL
ORDER BY e.name;

Step 5: Use LIMIT for Large Result Sets

-- Add pagination for large datasets
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > 50000
ORDER BY e.salary DESC
LIMIT 20 OFFSET 0;

Common Optimization Techniques:

Use indexes strategically
Avoid SELECT *
Use WHERE before JOINs when possible
Consider partitioning for very large tables
Use UNION ALL instead of UNION when duplicates are acceptable
Replace correlated subqueries with JOINs”

4. Explain ACID properties with practical examples.

ACID properties ensure database transaction reliability.

Example Answer: “ACID guarantees data integrity in database transactions:

Atomicity: All operations in a transaction succeed or all fail

BEGIN TRANSACTION;

-- Transfer $1000 from account A to account B
UPDATE accounts SET balance = balance - 1000 WHERE id = 'A';
UPDATE accounts SET balance = balance + 1000 WHERE id = 'B';

-- If either update fails, both are rolled back
IF @@ERROR != 0
    ROLLBACK TRANSACTION;
ELSE
    COMMIT TRANSACTION;

Consistency: Database remains in a valid state before and after transaction

-- Constraint ensures account balance never goes negative
ALTER TABLE accounts ADD CONSTRAINT check_positive_balance 
CHECK (balance >= 0);

BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 1500 WHERE id = 'A';
-- This will fail if account A only has $1000, maintaining consistency
COMMIT TRANSACTION;

Isolation: Concurrent transactions don’t interfere with each other

-- Transaction 1 (Transfer)
BEGIN TRANSACTION;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE id = 'A'; -- $1000
UPDATE accounts SET balance = balance - 500 WHERE id = 'A';
-- ... other operations
COMMIT TRANSACTION;

-- Transaction 2 (Concurrent read)
-- Won't see partial results from Transaction 1
SELECT balance FROM accounts WHERE id = 'A';

Durability: Committed changes persist even after system failure

-- Once committed, this change survives power failures, crashes
BEGIN TRANSACTION;
INSERT INTO audit_log (action, timestamp, user_id) 
VALUES ('LOGIN', NOW(), 12345);
COMMIT TRANSACTION;
-- Data is permanently stored and recoverable

Isolation Levels:

-- READ UNCOMMITTED: Allows dirty reads
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- READ COMMITTED: Prevents dirty reads
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- REPEATABLE READ: Prevents dirty and non-repeatable reads
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- SERIALIZABLE: Prevents all phenomena, highest isolation
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

Practical Example - Banking System:

-- Atomic money transfer with validation
CREATE PROCEDURE transfer_money(
    @from_account INT,
    @to_account INT,
    @amount DECIMAL(10,2)
)
AS
BEGIN
    BEGIN TRANSACTION;
    
    -- Check sufficient funds (Consistency)
    IF (SELECT balance FROM accounts WHERE id = @from_account) < @amount
    BEGIN
        ROLLBACK TRANSACTION;
        RAISERROR('Insufficient funds', 16, 1);
        RETURN;
    END
    
    -- Perform transfer (Atomicity)
    UPDATE accounts SET balance = balance - @amount WHERE id = @from_account;
    UPDATE accounts SET balance = balance + @amount WHERE id = @to_account;
    
    -- Log transaction (Durability)
    INSERT INTO transaction_log (from_id, to_id, amount, timestamp)
    VALUES (@from_account, @to_account, @amount, GETDATE());
    
    COMMIT TRANSACTION;
END
```"

### 5. What's the difference between clustered and non-clustered indexes?

Index types significantly impact query performance and storage.

**Example Answer**: "Indexes optimize data retrieval but differ in storage and behavior:

**Clustered Index**: Physically reorders table data
```sql
-- Primary key automatically creates clustered index
CREATE TABLE employees (
    id INT PRIMARY KEY,  -- Clustered index
    name VARCHAR(100),
    salary DECIMAL(10,2)
);

-- Table data physically stored in id order: 1, 2, 3, 4...
-- Each table can have only ONE clustered index

Non-Clustered Index: Separate structure pointing to table rows

-- Create non-clustered index on salary
CREATE INDEX idx_salary ON employees(salary);

-- Index structure:
-- Salary | Row Pointer
-- 30000  | Row 4
-- 45000  | Row 2  
-- 60000  | Row 1
-- 75000  | Row 3

Performance Implications:

-- Clustered index: Direct data access (fast)
SELECT * FROM employees WHERE id = 123;
-- Goes directly to physical location of row 123

-- Non-clustered index: Two-step lookup
SELECT * FROM employees WHERE salary = 50000;
-- Step 1: Find salary in index → get row pointer
-- Step 2: Go to row pointer location → get data

Multiple Non-Clustered Indexes:

-- Table can have multiple non-clustered indexes
CREATE INDEX idx_name ON employees(name);
CREATE INDEX idx_department ON employees(department_id);
CREATE INDEX idx_salary_dept ON employees(salary, department_id);  -- Composite

Choosing Clustered Index Column:

-- Good clustered index: Sequential, unique, static
-- Primary key (auto-incrementing ID) - default choice
CREATE TABLE orders (
    id INT IDENTITY PRIMARY KEY,  -- Good clustered index
    customer_id INT,
    order_date DATE
);

-- Alternative: Date-based clustering for time-series data
CREATE CLUSTERED INDEX idx_orders_date ON orders(order_date);

Storage and Maintenance:

-- Clustered index determines physical storage order
-- Insertions may cause page splits if not sequential
INSERT INTO employees (id, name) VALUES (999999, 'New Employee');
-- If id=999999 needs to be inserted between existing records,
-- causes expensive page reorganization

-- Non-clustered indexes require maintenance overhead
-- Each INSERT/UPDATE/DELETE must update all relevant indexes

Key Differences Summary:

Clustered: One per table, reorders data, faster for range queries
Non-clustered: Multiple allowed, separate structure, good for point lookups
Performance: Clustered faster for primary key, non-clustered for secondary searches
Maintenance: Clustered affects insert performance, non-clustered increases overhead”

6. Write a query to find employees who earn more than their manager.

This tests self-joins and hierarchical data queries.

Example Answer: “Self-joins handle hierarchical relationships within a single table:

Table Structure:

CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    salary DECIMAL(10,2),
    manager_id INT,  -- References employees.id
    FOREIGN KEY (manager_id) REFERENCES employees(id)
);

-- Sample data
INSERT INTO employees VALUES
(1, 'CEO', 200000, NULL),
(2, 'VP Engineering', 150000, 1),
(3, 'Senior Dev', 120000, 2),
(4, 'Junior Dev', 160000, 2),  -- Earns more than manager!
(5, 'Intern', 140000, 3);      -- Earns more than manager!

Basic Self-Join Solution:

SELECT 
    emp.name as employee_name,
    emp.salary as employee_salary,
    mgr.name as manager_name,
    mgr.salary as manager_salary
FROM employees emp
INNER JOIN employees mgr ON emp.manager_id = mgr.id
WHERE emp.salary > mgr.salary;

Enhanced Version with Salary Difference:

SELECT 
    emp.name as employee_name,
    emp.salary as employee_salary,
    mgr.name as manager_name,
    mgr.salary as manager_salary,
    (emp.salary - mgr.salary) as salary_difference,
    ROUND(((emp.salary - mgr.salary) / mgr.salary) * 100, 2) as percentage_higher
FROM employees emp
INNER JOIN employees mgr ON emp.manager_id = mgr.id
WHERE emp.salary > mgr.salary
ORDER BY salary_difference DESC;

Including Department Information:

SELECT 
    emp.name as employee_name,
    dept.name as department,
    emp.salary as employee_salary,
    mgr.name as manager_name,
    mgr.salary as manager_salary
FROM employees emp
INNER JOIN employees mgr ON emp.manager_id = mgr.id
INNER JOIN departments dept ON emp.department_id = dept.id
WHERE emp.salary > mgr.salary
ORDER BY dept.name, (emp.salary - mgr.salary) DESC;

Advanced: Multi-Level Hierarchy Analysis:

-- Find employees who earn more than any manager in their chain
WITH management_chain AS (
    -- Base case: direct reports
    SELECT 
        id as employee_id,
        manager_id,
        1 as level
    FROM employees
    WHERE manager_id IS NOT NULL
    
    UNION ALL
    
    -- Recursive case: indirect reports
    SELECT 
        mc.employee_id,
        e.manager_id,
        mc.level + 1
    FROM management_chain mc
    INNER JOIN employees e ON mc.manager_id = e.id
    WHERE e.manager_id IS NOT NULL
)
SELECT DISTINCT
    emp.name as employee_name,
    emp.salary as employee_salary,
    STRING_AGG(mgr.name, ', ') as managers_earning_less
FROM employees emp
INNER JOIN management_chain mc ON emp.id = mc.employee_id
INNER JOIN employees mgr ON mc.manager_id = mgr.id
WHERE emp.salary > mgr.salary
GROUP BY emp.id, emp.name, emp.salary;

Key Concepts Demonstrated:

Self-joins for hierarchical data
Handling NULL values (CEO has no manager)
Calculating derived values (salary differences)
Common Table Expressions (CTEs) for complex hierarchies”

7. Explain the differences between UNION, UNION ALL, INTERSECT, and EXCEPT.

Set operations combine results from multiple queries.

Example Answer: “SQL set operations work with multiple result sets:

Sample Tables:

-- Current employees
CREATE TABLE current_employees (
    id INT, name VARCHAR(100), department VARCHAR(50)
);
INSERT VALUES 
(1, 'John', 'Engineering'),
(2, 'Jane', 'Marketing'),
(3, 'Bob', 'Engineering');

-- Former employees
CREATE TABLE former_employees (
    id INT, name VARCHAR(100), department VARCHAR(50)
);
INSERT VALUES 
(2, 'Jane', 'Marketing'),  -- Duplicate with current
(4, 'Alice', 'Sales'),
(5, 'Carol', 'Engineering');

UNION: Combines results, removes duplicates

SELECT name, department FROM current_employees
UNION
SELECT name, department FROM former_employees;

-- Result: John, Jane, Bob, Alice, Carol (Jane appears once)
-- Automatically sorts results and removes duplicates
-- Performance impact: Must sort and deduplicate

UNION ALL: Combines results, keeps duplicates

SELECT name, department FROM current_employees
UNION ALL
SELECT name, department FROM former_employees;

-- Result: John, Jane, Bob, Jane, Alice, Carol (Jane appears twice)
-- Faster than UNION (no sorting or deduplication)
-- Use when you know there are no duplicates or duplicates are acceptable

INTERSECT: Returns only common records

SELECT name, department FROM current_employees
INTERSECT
SELECT name, department FROM former_employees;

-- Result: Jane, Marketing (only record present in both tables)
-- Useful for finding overlaps between datasets

EXCEPT (or MINUS in Oracle): Returns records from first query not in second

SELECT name, department FROM current_employees
EXCEPT
SELECT name, department FROM former_employees;

-- Result: John, Bob (current employees who were never former employees)
-- Useful for finding differences between datasets

Practical Use Cases:

Data Auditing:

-- Find records that exist in production but not in backup
SELECT customer_id, email FROM production.customers
EXCEPT
SELECT customer_id, email FROM backup.customers;

Report Generation:

-- Combine data from multiple time periods
SELECT 'Q1' as quarter, product_id, SUM(sales) as total_sales
FROM q1_sales GROUP BY product_id
UNION ALL
SELECT 'Q2' as quarter, product_id, SUM(sales)
FROM q2_sales GROUP BY product_id
UNION ALL
SELECT 'Q3' as quarter, product_id, SUM(sales)
FROM q3_sales GROUP BY product_id
ORDER BY product_id, quarter;

Data Validation:

-- Find customers who appear in orders but not in customer table
SELECT customer_id FROM orders
EXCEPT
SELECT customer_id FROM customers;

Key Requirements:

All queries must have same number of columns
Corresponding columns must have compatible data types
Column names taken from first query
ORDER BY can only be used in final query (applies to entire result)”

8. How do you handle NULL values in SQL queries?

NULL handling is crucial for data integrity and accurate results.

Example Answer: “NULL represents unknown or missing data and requires special handling:

NULL Behavior in Comparisons:

-- These all return NULL (unknown), not true or false
SELECT * FROM employees WHERE salary = NULL;     -- Wrong! Returns no rows
SELECT * FROM employees WHERE salary != NULL;    -- Wrong! Returns no rows
SELECT * FROM employees WHERE salary > NULL;     -- Wrong! Returns no rows

-- Correct NULL comparisons
SELECT * FROM employees WHERE salary IS NULL;
SELECT * FROM employees WHERE salary IS NOT NULL;

NULL in Arithmetic Operations:

-- Any arithmetic with NULL returns NULL
SELECT 
    name,
    salary,
    bonus,
    salary + bonus as total_compensation  -- NULL if bonus is NULL
FROM employees;

-- Handle with COALESCE or ISNULL
SELECT 
    name,
    salary,
    COALESCE(bonus, 0) as bonus,  -- Replace NULL with 0
    salary + COALESCE(bonus, 0) as total_compensation
FROM employees;

NULL in Aggregate Functions:

-- Most aggregates ignore NULL values
SELECT 
    COUNT(*) as total_employees,        -- Counts all rows including NULLs
    COUNT(bonus) as employees_with_bonus, -- Counts only non-NULL bonus values
    AVG(bonus) as average_bonus,        -- Calculates average ignoring NULLs
    SUM(bonus) as total_bonus           -- Sums only non-NULL values
FROM employees;

-- Be careful with averages when NULLs present
SELECT 
    AVG(COALESCE(bonus, 0)) as avg_including_zero_bonus,
    AVG(bonus) as avg_excluding_null_bonus
FROM employees;

NULL in JOIN Operations:

-- NULLs don't match in JOINs
SELECT e.name, d.name as department
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;
-- Employees with NULL department_id will show NULL for department name

-- Finding unmatched records
SELECT e.name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
WHERE d.id IS NULL;  -- Employees without valid department

NULL in WHERE Clauses with AND/OR:

-- Complex logic with NULLs
SELECT name FROM employees 
WHERE salary > 50000 AND bonus > 1000;
-- Excludes employees with NULL bonus

-- Include employees with NULL bonus
SELECT name FROM employees 
WHERE salary > 50000 AND (bonus > 1000 OR bonus IS NULL);

-- Three-valued logic: TRUE, FALSE, NULL

Practical NULL Handling Functions:

-- COALESCE: Return first non-NULL value
SELECT 
    name,
    COALESCE(work_phone, home_phone, mobile_phone, 'No phone') as contact_phone
FROM employees;

-- NULLIF: Return NULL if values are equal
SELECT 
    name,
    NULLIF(middle_name, '') as middle_name  -- Convert empty string to NULL
FROM employees;

-- CASE statements for complex NULL logic
SELECT 
    name,
    CASE 
        WHEN bonus IS NULL THEN 'No bonus'
        WHEN bonus = 0 THEN 'Zero bonus'
        ELSE CAST(bonus as VARCHAR) + ' bonus'
    END as bonus_description
FROM employees;

NULL in Sorting:

-- NULLs typically sort first (ASC) or last (DESC), but varies by database
SELECT name, bonus
FROM employees
ORDER BY bonus ASC;  -- NULLs first in most databases

-- Control NULL ordering explicitly
SELECT name, bonus
FROM employees
ORDER BY bonus ASC NULLS LAST;  -- PostgreSQL, Oracle

-- Alternative approach for consistent behavior
SELECT name, bonus
FROM employees
ORDER BY ISNULL(bonus, 1), bonus ASC;  -- SQL Server: NULLs last

Data Quality Checks:

-- Find records with unexpected NULLs
SELECT 
    'employees' as table_name,
    COUNT(*) as total_rows,
    COUNT(name) as non_null_names,
    COUNT(*) - COUNT(name) as null_names,
    COUNT(salary) as non_null_salaries,
    COUNT(*) - COUNT(salary) as null_salaries
FROM employees;

Best Practices:

Always use IS NULL and IS NOT NULL for NULL comparisons
Consider NULL behavior in aggregate functions and JOINs
Use COALESCE for default values
Document whether NULLs are expected in each column
Consider using NOT NULL constraints where appropriate”

Advanced SQL Topics for Senior Roles

Window Functions and Analytics

ROW_NUMBER(), RANK(), DENSE_RANK(): Ranking and numbering
LAG(), LEAD(): Access previous/next row values
FIRST_VALUE(), LAST_VALUE(): Window frame analysis

Performance and Optimization

Execution Plans: Understanding query optimization
Indexing Strategies: B-tree, hash, partial indexes
Partitioning: Table and index partitioning strategies
Query Hints: Forcing specific execution paths

Advanced Data Types and Functions

JSON/XML: Querying semi-structured data
Common Table Expressions: Recursive queries
Stored Procedures: Complex business logic
Triggers: Automated data processing

Conclusion

SQL proficiency remains essential for any role involving data management, from backend development to data analysis. These interview questions test both fundamental concepts and advanced database skills necessary for building scalable, efficient data systems.

The best SQL developers combine query writing skills with understanding of database design principles, performance optimization techniques, and data integrity concepts essential for enterprise applications.

Consider using Interview Zen’s technical interview platform to create comprehensive SQL assessments and observe candidates’ problem-solving approaches during database design and optimization challenges.