8 SQL Interview Questions Every Developer Should Master
Posted on February 15 2024 by Interview Zen TeamIntroduction
SQL (Structured Query Language) remains the cornerstone of data management and backend development. Whether you’re interviewing for database administrator, backend developer, data analyst, or full-stack positions, SQL proficiency is often a key requirement.
According to the Stack Overflow Developer Survey 2024, SQL ranks as one of the most widely used programming languages, with over 50% of developers using it regularly. The ability to efficiently query, manipulate, and optimize database operations is crucial for building scalable applications and managing enterprise data systems.
This comprehensive guide presents essential SQL interview questions that test both fundamental concepts and advanced database management skills, helping hiring managers evaluate candidates’ ability to work with complex data scenarios.
Core SQL Concepts
SQL proficiency encompasses multiple areas:
- Data Querying: Complex SELECT statements with multiple JOINs and conditions
- Database Design: Normalization, relationships, and schema optimization
- Performance Optimization: Indexing strategies and query optimization
- Data Integrity: Constraints, transactions, and consistency management
- Advanced Features: Window functions, CTEs, stored procedures, and triggers
Top 8 Essential SQL Interview Questions
1. Explain the different types of JOINs with practical examples.
JOIN operations are fundamental to relational database querying.
Example Answer: “SQL JOINs combine data from multiple tables based on relationships:
Database Schema:
-- Employees table
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
department_id INT,
salary DECIMAL(10,2)
);
-- Departments table
CREATE TABLE departments (
id INT PRIMARY KEY,
name VARCHAR(100),
budget DECIMAL(12,2)
);
-- Projects table
CREATE TABLE projects (
id INT PRIMARY KEY,
name VARCHAR(100),
employee_id INT
);
INNER JOIN: Returns records that have matching values in both tables
SELECT e.name, d.name as department
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;
-- Result: Only employees who have a valid department
LEFT JOIN (LEFT OUTER JOIN): Returns all records from left table, matched records from right
SELECT e.name, d.name as department
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;
-- Result: All employees, including those without departments (NULL values)
RIGHT JOIN: Returns all records from right table, matched records from left
SELECT e.name, d.name as department
FROM employees e
RIGHT JOIN departments d ON e.department_id = d.id;
-- Result: All departments, including those with no employees
FULL OUTER JOIN: Returns all records when there’s a match in either table
SELECT e.name, d.name as department
FROM employees e
FULL OUTER JOIN departments d ON e.department_id = d.id;
-- Result: All employees and all departments, with NULLs where no match
CROSS JOIN: Cartesian product of both tables
SELECT e.name, d.name
FROM employees e
CROSS JOIN departments d;
-- Result: Every employee paired with every department
```"
### 2. Write a query to find the second highest salary in each department.
This tests understanding of window functions and ranking.
**Example Answer**: "Multiple approaches depending on SQL dialect:
**Using Window Functions (Modern SQL)**:
```sql
WITH ranked_salaries AS (
SELECT
name,
department_id,
salary,
ROW_NUMBER() OVER (
PARTITION BY department_id
ORDER BY salary DESC
) as salary_rank
FROM employees
)
SELECT name, department_id, salary
FROM ranked_salaries
WHERE salary_rank = 2;
Alternative with DENSE_RANK (handles ties differently):
WITH ranked_salaries AS (
SELECT
name,
department_id,
salary,
DENSE_RANK() OVER (
PARTITION BY department_id
ORDER BY salary DESC
) as salary_rank
FROM employees
)
SELECT name, department_id, salary
FROM ranked_salaries
WHERE salary_rank = 2;
Using Correlated Subquery (works in older SQL versions):
SELECT e1.name, e1.department_id, e1.salary
FROM employees e1
WHERE 1 = (
SELECT COUNT(DISTINCT e2.salary)
FROM employees e2
WHERE e2.department_id = e1.department_id
AND e2.salary > e1.salary
);
Using LIMIT/OFFSET approach (for single department):
SELECT name, salary
FROM employees
WHERE department_id = 1
ORDER BY salary DESC
LIMIT 1 OFFSET 1; -- PostgreSQL/MySQL syntax
Key Considerations:
- ROW_NUMBER() vs RANK() vs DENSE_RANK() handle ties differently
- Window functions are more readable and performant
- Consider what happens when there are salary ties”
3. How do you optimize a slow-running query?
Query optimization demonstrates practical database performance skills.
Example Answer: “Query optimization follows a systematic approach:
Step 1: Analyze Execution Plan
-- PostgreSQL
EXPLAIN ANALYZE
SELECT e.name, d.name, COUNT(p.id)
FROM employees e
JOIN departments d ON e.department_id = d.id
LEFT JOIN projects p ON e.id = p.employee_id
WHERE e.salary > 50000
GROUP BY e.name, d.name
ORDER BY COUNT(p.id) DESC;
-- SQL Server
SET STATISTICS IO ON;
-- Query here
Step 2: Index Optimization
-- Create indexes on frequently queried columns
CREATE INDEX idx_employee_salary ON employees(salary);
CREATE INDEX idx_employee_dept ON employees(department_id);
CREATE INDEX idx_project_employee ON projects(employee_id);
-- Composite index for multiple WHERE conditions
CREATE INDEX idx_employee_dept_salary ON employees(department_id, salary);
Step 3: Query Restructuring
-- Before: Inefficient subquery in WHERE clause
SELECT name FROM employees
WHERE department_id IN (
SELECT id FROM departments WHERE budget > 100000
);
-- After: JOIN instead of subquery
SELECT e.name
FROM employees e
INNER JOIN departments d ON e.department_id = d.id
WHERE d.budget > 100000;
Step 4: Eliminate Unnecessary Operations
-- Before: SELECT * and unnecessary columns
SELECT * FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE e.salary > 50000
ORDER BY e.name;
-- After: Select only needed columns
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > 50000
AND e.department_id IS NOT NULL
ORDER BY e.name;
Step 5: Use LIMIT for Large Result Sets
-- Add pagination for large datasets
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > 50000
ORDER BY e.salary DESC
LIMIT 20 OFFSET 0;
Common Optimization Techniques:
- Use indexes strategically
- Avoid SELECT *
- Use WHERE before JOINs when possible
- Consider partitioning for very large tables
- Use UNION ALL instead of UNION when duplicates are acceptable
- Replace correlated subqueries with JOINs”
4. Explain ACID properties with practical examples.
ACID properties ensure database transaction reliability.
Example Answer: “ACID guarantees data integrity in database transactions:
Atomicity: All operations in a transaction succeed or all fail
BEGIN TRANSACTION;
-- Transfer $1000 from account A to account B
UPDATE accounts SET balance = balance - 1000 WHERE id = 'A';
UPDATE accounts SET balance = balance + 1000 WHERE id = 'B';
-- If either update fails, both are rolled back
IF @@ERROR != 0
ROLLBACK TRANSACTION;
ELSE
COMMIT TRANSACTION;
Consistency: Database remains in a valid state before and after transaction
-- Constraint ensures account balance never goes negative
ALTER TABLE accounts ADD CONSTRAINT check_positive_balance
CHECK (balance >= 0);
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 1500 WHERE id = 'A';
-- This will fail if account A only has $1000, maintaining consistency
COMMIT TRANSACTION;
Isolation: Concurrent transactions don’t interfere with each other
-- Transaction 1 (Transfer)
BEGIN TRANSACTION;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE id = 'A'; -- $1000
UPDATE accounts SET balance = balance - 500 WHERE id = 'A';
-- ... other operations
COMMIT TRANSACTION;
-- Transaction 2 (Concurrent read)
-- Won't see partial results from Transaction 1
SELECT balance FROM accounts WHERE id = 'A';
Durability: Committed changes persist even after system failure
-- Once committed, this change survives power failures, crashes
BEGIN TRANSACTION;
INSERT INTO audit_log (action, timestamp, user_id)
VALUES ('LOGIN', NOW(), 12345);
COMMIT TRANSACTION;
-- Data is permanently stored and recoverable
Isolation Levels:
-- READ UNCOMMITTED: Allows dirty reads
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- READ COMMITTED: Prevents dirty reads
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- REPEATABLE READ: Prevents dirty and non-repeatable reads
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- SERIALIZABLE: Prevents all phenomena, highest isolation
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
Practical Example - Banking System:
-- Atomic money transfer with validation
CREATE PROCEDURE transfer_money(
@from_account INT,
@to_account INT,
@amount DECIMAL(10,2)
)
AS
BEGIN
BEGIN TRANSACTION;
-- Check sufficient funds (Consistency)
IF (SELECT balance FROM accounts WHERE id = @from_account) < @amount
BEGIN
ROLLBACK TRANSACTION;
RAISERROR('Insufficient funds', 16, 1);
RETURN;
END
-- Perform transfer (Atomicity)
UPDATE accounts SET balance = balance - @amount WHERE id = @from_account;
UPDATE accounts SET balance = balance + @amount WHERE id = @to_account;
-- Log transaction (Durability)
INSERT INTO transaction_log (from_id, to_id, amount, timestamp)
VALUES (@from_account, @to_account, @amount, GETDATE());
COMMIT TRANSACTION;
END
```"
### 5. What's the difference between clustered and non-clustered indexes?
Index types significantly impact query performance and storage.
**Example Answer**: "Indexes optimize data retrieval but differ in storage and behavior:
**Clustered Index**: Physically reorders table data
```sql
-- Primary key automatically creates clustered index
CREATE TABLE employees (
id INT PRIMARY KEY, -- Clustered index
name VARCHAR(100),
salary DECIMAL(10,2)
);
-- Table data physically stored in id order: 1, 2, 3, 4...
-- Each table can have only ONE clustered index
Non-Clustered Index: Separate structure pointing to table rows
-- Create non-clustered index on salary
CREATE INDEX idx_salary ON employees(salary);
-- Index structure:
-- Salary | Row Pointer
-- 30000 | Row 4
-- 45000 | Row 2
-- 60000 | Row 1
-- 75000 | Row 3
Performance Implications:
-- Clustered index: Direct data access (fast)
SELECT * FROM employees WHERE id = 123;
-- Goes directly to physical location of row 123
-- Non-clustered index: Two-step lookup
SELECT * FROM employees WHERE salary = 50000;
-- Step 1: Find salary in index → get row pointer
-- Step 2: Go to row pointer location → get data
Multiple Non-Clustered Indexes:
-- Table can have multiple non-clustered indexes
CREATE INDEX idx_name ON employees(name);
CREATE INDEX idx_department ON employees(department_id);
CREATE INDEX idx_salary_dept ON employees(salary, department_id); -- Composite
Choosing Clustered Index Column:
-- Good clustered index: Sequential, unique, static
-- Primary key (auto-incrementing ID) - default choice
CREATE TABLE orders (
id INT IDENTITY PRIMARY KEY, -- Good clustered index
customer_id INT,
order_date DATE
);
-- Alternative: Date-based clustering for time-series data
CREATE CLUSTERED INDEX idx_orders_date ON orders(order_date);
Storage and Maintenance:
-- Clustered index determines physical storage order
-- Insertions may cause page splits if not sequential
INSERT INTO employees (id, name) VALUES (999999, 'New Employee');
-- If id=999999 needs to be inserted between existing records,
-- causes expensive page reorganization
-- Non-clustered indexes require maintenance overhead
-- Each INSERT/UPDATE/DELETE must update all relevant indexes
Key Differences Summary:
- Clustered: One per table, reorders data, faster for range queries
- Non-clustered: Multiple allowed, separate structure, good for point lookups
- Performance: Clustered faster for primary key, non-clustered for secondary searches
- Maintenance: Clustered affects insert performance, non-clustered increases overhead”
6. Write a query to find employees who earn more than their manager.
This tests self-joins and hierarchical data queries.
Example Answer: “Self-joins handle hierarchical relationships within a single table:
Table Structure:
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
salary DECIMAL(10,2),
manager_id INT, -- References employees.id
FOREIGN KEY (manager_id) REFERENCES employees(id)
);
-- Sample data
INSERT INTO employees VALUES
(1, 'CEO', 200000, NULL),
(2, 'VP Engineering', 150000, 1),
(3, 'Senior Dev', 120000, 2),
(4, 'Junior Dev', 160000, 2), -- Earns more than manager!
(5, 'Intern', 140000, 3); -- Earns more than manager!
Basic Self-Join Solution:
SELECT
emp.name as employee_name,
emp.salary as employee_salary,
mgr.name as manager_name,
mgr.salary as manager_salary
FROM employees emp
INNER JOIN employees mgr ON emp.manager_id = mgr.id
WHERE emp.salary > mgr.salary;
Enhanced Version with Salary Difference:
SELECT
emp.name as employee_name,
emp.salary as employee_salary,
mgr.name as manager_name,
mgr.salary as manager_salary,
(emp.salary - mgr.salary) as salary_difference,
ROUND(((emp.salary - mgr.salary) / mgr.salary) * 100, 2) as percentage_higher
FROM employees emp
INNER JOIN employees mgr ON emp.manager_id = mgr.id
WHERE emp.salary > mgr.salary
ORDER BY salary_difference DESC;
Including Department Information:
SELECT
emp.name as employee_name,
dept.name as department,
emp.salary as employee_salary,
mgr.name as manager_name,
mgr.salary as manager_salary
FROM employees emp
INNER JOIN employees mgr ON emp.manager_id = mgr.id
INNER JOIN departments dept ON emp.department_id = dept.id
WHERE emp.salary > mgr.salary
ORDER BY dept.name, (emp.salary - mgr.salary) DESC;
Advanced: Multi-Level Hierarchy Analysis:
-- Find employees who earn more than any manager in their chain
WITH management_chain AS (
-- Base case: direct reports
SELECT
id as employee_id,
manager_id,
1 as level
FROM employees
WHERE manager_id IS NOT NULL
UNION ALL
-- Recursive case: indirect reports
SELECT
mc.employee_id,
e.manager_id,
mc.level + 1
FROM management_chain mc
INNER JOIN employees e ON mc.manager_id = e.id
WHERE e.manager_id IS NOT NULL
)
SELECT DISTINCT
emp.name as employee_name,
emp.salary as employee_salary,
STRING_AGG(mgr.name, ', ') as managers_earning_less
FROM employees emp
INNER JOIN management_chain mc ON emp.id = mc.employee_id
INNER JOIN employees mgr ON mc.manager_id = mgr.id
WHERE emp.salary > mgr.salary
GROUP BY emp.id, emp.name, emp.salary;
Key Concepts Demonstrated:
- Self-joins for hierarchical data
- Handling NULL values (CEO has no manager)
- Calculating derived values (salary differences)
- Common Table Expressions (CTEs) for complex hierarchies”
7. Explain the differences between UNION, UNION ALL, INTERSECT, and EXCEPT.
Set operations combine results from multiple queries.
Example Answer: “SQL set operations work with multiple result sets:
Sample Tables:
-- Current employees
CREATE TABLE current_employees (
id INT, name VARCHAR(100), department VARCHAR(50)
);
INSERT VALUES
(1, 'John', 'Engineering'),
(2, 'Jane', 'Marketing'),
(3, 'Bob', 'Engineering');
-- Former employees
CREATE TABLE former_employees (
id INT, name VARCHAR(100), department VARCHAR(50)
);
INSERT VALUES
(2, 'Jane', 'Marketing'), -- Duplicate with current
(4, 'Alice', 'Sales'),
(5, 'Carol', 'Engineering');
UNION: Combines results, removes duplicates
SELECT name, department FROM current_employees
UNION
SELECT name, department FROM former_employees;
-- Result: John, Jane, Bob, Alice, Carol (Jane appears once)
-- Automatically sorts results and removes duplicates
-- Performance impact: Must sort and deduplicate
UNION ALL: Combines results, keeps duplicates
SELECT name, department FROM current_employees
UNION ALL
SELECT name, department FROM former_employees;
-- Result: John, Jane, Bob, Jane, Alice, Carol (Jane appears twice)
-- Faster than UNION (no sorting or deduplication)
-- Use when you know there are no duplicates or duplicates are acceptable
INTERSECT: Returns only common records
SELECT name, department FROM current_employees
INTERSECT
SELECT name, department FROM former_employees;
-- Result: Jane, Marketing (only record present in both tables)
-- Useful for finding overlaps between datasets
EXCEPT (or MINUS in Oracle): Returns records from first query not in second
SELECT name, department FROM current_employees
EXCEPT
SELECT name, department FROM former_employees;
-- Result: John, Bob (current employees who were never former employees)
-- Useful for finding differences between datasets
Practical Use Cases:
Data Auditing:
-- Find records that exist in production but not in backup
SELECT customer_id, email FROM production.customers
EXCEPT
SELECT customer_id, email FROM backup.customers;
Report Generation:
-- Combine data from multiple time periods
SELECT 'Q1' as quarter, product_id, SUM(sales) as total_sales
FROM q1_sales GROUP BY product_id
UNION ALL
SELECT 'Q2' as quarter, product_id, SUM(sales)
FROM q2_sales GROUP BY product_id
UNION ALL
SELECT 'Q3' as quarter, product_id, SUM(sales)
FROM q3_sales GROUP BY product_id
ORDER BY product_id, quarter;
Data Validation:
-- Find customers who appear in orders but not in customer table
SELECT customer_id FROM orders
EXCEPT
SELECT customer_id FROM customers;
Key Requirements:
- All queries must have same number of columns
- Corresponding columns must have compatible data types
- Column names taken from first query
- ORDER BY can only be used in final query (applies to entire result)”
8. How do you handle NULL values in SQL queries?
NULL handling is crucial for data integrity and accurate results.
Example Answer: “NULL represents unknown or missing data and requires special handling:
NULL Behavior in Comparisons:
-- These all return NULL (unknown), not true or false
SELECT * FROM employees WHERE salary = NULL; -- Wrong! Returns no rows
SELECT * FROM employees WHERE salary != NULL; -- Wrong! Returns no rows
SELECT * FROM employees WHERE salary > NULL; -- Wrong! Returns no rows
-- Correct NULL comparisons
SELECT * FROM employees WHERE salary IS NULL;
SELECT * FROM employees WHERE salary IS NOT NULL;
NULL in Arithmetic Operations:
-- Any arithmetic with NULL returns NULL
SELECT
name,
salary,
bonus,
salary + bonus as total_compensation -- NULL if bonus is NULL
FROM employees;
-- Handle with COALESCE or ISNULL
SELECT
name,
salary,
COALESCE(bonus, 0) as bonus, -- Replace NULL with 0
salary + COALESCE(bonus, 0) as total_compensation
FROM employees;
NULL in Aggregate Functions:
-- Most aggregates ignore NULL values
SELECT
COUNT(*) as total_employees, -- Counts all rows including NULLs
COUNT(bonus) as employees_with_bonus, -- Counts only non-NULL bonus values
AVG(bonus) as average_bonus, -- Calculates average ignoring NULLs
SUM(bonus) as total_bonus -- Sums only non-NULL values
FROM employees;
-- Be careful with averages when NULLs present
SELECT
AVG(COALESCE(bonus, 0)) as avg_including_zero_bonus,
AVG(bonus) as avg_excluding_null_bonus
FROM employees;
NULL in JOIN Operations:
-- NULLs don't match in JOINs
SELECT e.name, d.name as department
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;
-- Employees with NULL department_id will show NULL for department name
-- Finding unmatched records
SELECT e.name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
WHERE d.id IS NULL; -- Employees without valid department
NULL in WHERE Clauses with AND/OR:
-- Complex logic with NULLs
SELECT name FROM employees
WHERE salary > 50000 AND bonus > 1000;
-- Excludes employees with NULL bonus
-- Include employees with NULL bonus
SELECT name FROM employees
WHERE salary > 50000 AND (bonus > 1000 OR bonus IS NULL);
-- Three-valued logic: TRUE, FALSE, NULL
Practical NULL Handling Functions:
-- COALESCE: Return first non-NULL value
SELECT
name,
COALESCE(work_phone, home_phone, mobile_phone, 'No phone') as contact_phone
FROM employees;
-- NULLIF: Return NULL if values are equal
SELECT
name,
NULLIF(middle_name, '') as middle_name -- Convert empty string to NULL
FROM employees;
-- CASE statements for complex NULL logic
SELECT
name,
CASE
WHEN bonus IS NULL THEN 'No bonus'
WHEN bonus = 0 THEN 'Zero bonus'
ELSE CAST(bonus as VARCHAR) + ' bonus'
END as bonus_description
FROM employees;
NULL in Sorting:
-- NULLs typically sort first (ASC) or last (DESC), but varies by database
SELECT name, bonus
FROM employees
ORDER BY bonus ASC; -- NULLs first in most databases
-- Control NULL ordering explicitly
SELECT name, bonus
FROM employees
ORDER BY bonus ASC NULLS LAST; -- PostgreSQL, Oracle
-- Alternative approach for consistent behavior
SELECT name, bonus
FROM employees
ORDER BY ISNULL(bonus, 1), bonus ASC; -- SQL Server: NULLs last
Data Quality Checks:
-- Find records with unexpected NULLs
SELECT
'employees' as table_name,
COUNT(*) as total_rows,
COUNT(name) as non_null_names,
COUNT(*) - COUNT(name) as null_names,
COUNT(salary) as non_null_salaries,
COUNT(*) - COUNT(salary) as null_salaries
FROM employees;
Best Practices:
- Always use IS NULL and IS NOT NULL for NULL comparisons
- Consider NULL behavior in aggregate functions and JOINs
- Use COALESCE for default values
- Document whether NULLs are expected in each column
- Consider using NOT NULL constraints where appropriate”
Advanced SQL Topics for Senior Roles
Window Functions and Analytics
- ROW_NUMBER(), RANK(), DENSE_RANK(): Ranking and numbering
- LAG(), LEAD(): Access previous/next row values
- FIRST_VALUE(), LAST_VALUE(): Window frame analysis
Performance and Optimization
- Execution Plans: Understanding query optimization
- Indexing Strategies: B-tree, hash, partial indexes
- Partitioning: Table and index partitioning strategies
- Query Hints: Forcing specific execution paths
Advanced Data Types and Functions
- JSON/XML: Querying semi-structured data
- Common Table Expressions: Recursive queries
- Stored Procedures: Complex business logic
- Triggers: Automated data processing
Conclusion
SQL proficiency remains essential for any role involving data management, from backend development to data analysis. These interview questions test both fundamental concepts and advanced database skills necessary for building scalable, efficient data systems.
The best SQL developers combine query writing skills with understanding of database design principles, performance optimization techniques, and data integrity concepts essential for enterprise applications.
Consider using Interview Zen’s technical interview platform to create comprehensive SQL assessments and observe candidates’ problem-solving approaches during database design and optimization challenges.