Aggregation and grouping are powerful concepts in SQL that allow you to summarize and categorize your data. They are particularly useful for generating reports and understanding trends in your data. In this post, we will explore these concepts in the context of an eCommerce application using Flask-SQLAlchemy.
Let’s assume we have the following Order
and Product
models defined for our eCommerce application:
class Product(db.Model):
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(100))
price = db.Column(db.Float)
class Order(db.Model):
id = db.Column(db.Integer, primary_key=True)
product_id = db.Column(db.Integer, db.ForeignKey('product.id'))
quantity = db.Column(db.Integer)
product = db.relationship('Product')
Each Order
has a quantity and is associated with a Product
, which has a price.
Counting Records
Counting records in SQLAlchemy is straightforward. You can use the db.session.query().count()
method to count the number of records in a table.
For example, to count the total number of orders in the database, you would do the following:
order_count = db.session.query(Order).count()
print(f'Total orders: {order_count}')
Sum, Average, Min, and Max Functions
SQLAlchemy also provides functions to calculate the sum, average, minimum, and maximum of a column.
For instance, to calculate the total quantity of all orders, you would do:
from sqlalchemy import func
total_quantity = db.session.query(func.sum(Order.quantity)).scalar()
print(f'Total quantity ordered: {total_quantity}')
Ensure you import func from sqlalchemy
An aside: What is scalar?
The scalar()
function in SQLAlchemy is used to fetch only one row and column from the selected data.
When we execute a query that returns rows of data, sometimes, we’re only interested in a single value from a single row. This is where scalar()
comes into play. scalar()
fetches the first row and then returns the first column of that row.
For example, suppose you have a User
model and you want to get the email address of a user with a specific ID. Instead of fetching the entire User object and then accessing its email
attribute, you can do:
email = db.session.query(User.email).filter(User.id == 123).scalar()
In this case, email
will be a string (or None if no such user exists), not a User object.
Remember that scalar()
will return None if there are no results, and only the value of the first column of the first result if there are results. If the query returns multiple rows or columns, scalar()
will still only return the first column of the first row. If you need all results, you should use all()
, first()
or another method to fetch data.
Another use case of scalar()
is when using functions like func.count()
, func.sum()
, func.min()
, func.max()
, etc. to perform aggregate queries. Since these functions return a single value, you can use scalar()
to get the value directly:
# count the number of users
user_count = db.session.query(func.count(User.id)).scalar()
As aside: What is func?
In SQLAlchemy, func
is a special object that allows you to access SQL functions in a database-agnostic way. It is most commonly used for calling built-in database functions.
func
is typically used with methods that correspond to SQL functions. When you call a method on func
, it generates a SQL function call with that name.
For example, the following will generate a COUNT(*)
in SQL:
from sqlalchemy.sql import func
count = db.session.query(func.count('*')).select_from(User).scalar()
This will return the number of rows in the User
table.
You can also use func
to call other SQL functions. For example
from sqlalchemy.sql import func
# Use the SQL NOW() function to get the current date and time
current_time = db.session.query(func.now()).scalar()
# Use the SQL MAX() function to get the maximum user id
max_id = db.session.query(func.max(User.id)).scalar()
In these examples, func.now()
and func.max(User.id)
generate the SQL NOW()
and MAX()
functions, respectively.
Additionally, func
can be used to call any function that the database supports, even if SQLAlchemy doesn’t provide a specific method for it. For example, if your database supports a MY_CUSTOM_FUNCTION()
function, you could call it like so:
result = db.session.query
(func.MY_CUSTOM_FUNCTION(arg1, arg2, arg3))
.scalar()
This will generate the SQL MY_CUSTOM_FUNCTION(arg1, arg2, arg3).
Keep in mind that while func allows you to call functions in a database-agnostic way, the functions themselves are not always database-agnostic. Not all databases support the same functions, and even when they do, the functions might not behave in the same way. Be sure to check the documentation for your specific database when using func.
Getting back to our Ecommerce application, to find the average, minimum, and maximum product price, you would do:
average_price = db.session.query(func.avg(Product.price)).scalar()
min_price = db.session.query(func.min(Product.price)).scalar()
max_price = db.session.query(func.max(Product.price)).scalar()
print(f'Average product price: {average_price}')
print(f'Minimum product price: {min_price}')
print(f'Maximum product price: {max_price}')
Group By and Having Clauses
The group_by
clause allows you to divide your records into groups for aggregation.
For example, you might want to know the total quantity ordered for each product. To do this, you can group by product_id
and sum the quantity
:
from sqlalchemy import desc, func
product_quantities = db.session.query(
Order.product_id,
func.sum(Order.quantity)
).group_by(
Order.product_id
).order_by(
desc(func.sum(Order.quantity))
).all()
for product_id, total_quantity in product_quantities:
print(f'Product ID: {product_id}, Total Quantity: {total_quantity}')
This will give you a list of product IDs along with the total quantity ordered for each product, ordered from highest to lowest.
The having
clause is like a where
clause for groups. For instance, you could modify the above query to only include products that have a total ordered quantity greater than 100:
product_quantities = db.session.query(
Order.product_id,
func.sum(Order.quantity)
).group_by(
Order.product_id
).having(
func.sum(Order.quantity) > 100
).all()
These are just the basics of aggregation and grouping in SQLAlchemy. There’s a lot more you can do with these tools, but hopefully, this gives you a good foundation to start exploring on your own!