The `GROUP BY` clause in SQL is used to arrange identical data into groups. This is particularly useful when you need to perform aggregate calculations like finding totals, averages, or counts. Additionally, the `GROUP BY` clause is often coupled with aggregate functions to provide meaning and insights into the summarised data.
### Aggregate Functions:
1. **COUNT**: Returns the number of rows in a set.
2. **SUM**: Adds up all the values in a set.
3. **AVG**: Calculates the average of a set of values.
4. **MIN**: Finds the minimum value in a set.
5. **MAX**: Finds the maximum value in a set.
### Basic Example with Data:
Consider the following simple dataset of sales:
| SaleID | Product | Category | Amount | Date |
|——–|———-|———-|——–|————|
| 1 | A | Electronics | 100 | 2023-01-01 |
| 2 | B | Electronics | 250 | 2023-01-05 |
| 3 | C | Furniture | 450 | 2023-01-10 |
| 4 | D | Furniture | 150 | 2023-01-15 |
| 5 | A | Electronics | 200 | 2023-01-20 |
#### Using `GROUP BY`:
To find the total sales amount for each category:
“`sql
SELECT Category, SUM(Amount) as TotalSales
FROM Sales
GROUP BY Category;
“`
**Result:**
| Category | TotalSales |
|————-|————|
| Electronics | 550 |
| Furniture | 600 |
#### Using Other Aggregate Functions:
Find the average, minimum, and maximum sales amount per category:
“`sql
SELECT Category, AVG(Amount) as AverageSales, MIN(Amount) as MinSale, MAX(Amount) as MaxSale
FROM Sales
GROUP BY Category;
“`
**Result:**
| Category | AverageSales | MinSale | MaxSale |
|————-|————–|———|———|
| Electronics | 183.33 | 100 | 250 |
| Furniture | 300.00 | 150 | 450 |
### Grouping by Multiple Columns:
You can group by multiple columns to get more granular insights. For example, if you want sales totals for each product per category:
“`sql
SELECT Category, Product, SUM(Amount) as TotalSales
FROM Sales
GROUP BY Category, Product;
“`
**Result:**
| Category | Product | TotalSales |
|————-|———|————|
| Electronics | A | 300 |
| Electronics | B | 250 |
| Furniture | C | 450 |
| Furniture | D | 150 |
### WHERE vs. HAVING:
– **WHERE**: Used to filter records before any grouping is done.
– **HAVING**: Used to filter records after an aggregation/grouping has occurred.
**Example with WHERE**:
To filter the data before grouping, suppose you are only interested in sales after 2023-01-10:
“`sql
SELECT Category, SUM(Amount) as TotalSales
FROM Sales
WHERE Date > ‘2023-01-10’
GROUP BY Category;
“`
**Example with HAVING**:
To filter after grouping, to only see categories with total sales greater than 500:
“`sql
SELECT Category, SUM(Amount) as TotalSales
FROM Sales
GROUP BY Category
HAVING SUM(Amount) > 500;
“`
**Result:**
| Category | TotalSales |
|————-|————|
| Furniture | 600 |
In this example, `HAVING` allows filtering based on the aggregated result of `SUM(Amount)`, which isn’t possible with `WHERE` since `WHERE` works before aggregation. This distinction is crucial for effectively using these clauses in SQL queries.