In this blog post, we’re going to delve into more advanced querying techniques using Flask-SQLAlchemy, building upon our existing e-commerce application. You’ll learn about subqueries, various types of joins, set operations (like UNION, INTERSECT, and EXCEPT), and common table expressions (CTEs).
Subqueries
A subquery is a query that is nested inside another query, allowing complex manipulations and filtering based on the results of the subquery. Subqueries can be used in various places such as the WHERE and SELECT clauses, and from the FROM clause. Let’s use our e-commerce app to illustrate. Let’s assume we have the following Supplier, Product and Order models defined for our eCommerce application:
class Supplier(db.Model):
__tablename__ = 'suppliers'
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(50), nullable=False)
location = db.Column(db.String(100))
products = db.relationship('Product', backref='supplier')
def __repr__(self):
return f'<Supplier {self.name}>'
class Product(db.Model):
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(100))
price = db.Column(db.Float)
supplier_id = db.Column(db.Integer, db.ForeignKey('suppliers.id'), nullable=False)
def __repr__(self):
return f'<Product {self.name}>'
class Order(db.Model):
id = db.Column(db.Integer, primary_key=True)
product_id = db.Column(db.Integer, db.ForeignKey('product.id'))
quantity = db.Column(db.Integer)
product = db.relationship('Product')
Let’s look at some use cases for subqueries using the SQLAlchemy ORM.
Use Case 1: Retrieving data based on aggregate values
Suppose we want to get the details of the products which have a price greater than the average price of all products. In this case, we need to first calculate the average price (a subquery), and then use that value in our main query.
from sqlalchemy import func
avg_price = db.session.query(func.avg(Product.price)).subquery()
products_above_avg = db.session.query(Product)
.filter(Product.price > avg_price)
.all()
In this example, avg_price
is a subquery that calculates the average product price. We use this subquery in the main query to filter products.
In the example above, subquery()
is used to designate the subquery, which SQLAlchemy will appropriately place within the main query.
Use Case 2: Returning data from a hierarchical or recursive relationship
Suppose our Product model has a “parent_id” field, allowing for a hierarchy of products (a “parent” product can have multiple “child” products). If we want to select all products that are children of a specific parent product, we could use a subquery.
# Get the ids of all child products of a specific parent product
subquery = db.session.query(Product.id)
.filter(Product.parent_id == some_parent_id).subquery()
# Main query to get child products
child_products = db.session.query(Product)
.filter(Product.id.in_(subquery))
.all()
Use Case 3: Applying aggregate functions to a related model
Suppose we want to get all suppliers who supply more than 5 products. We would first need to group suppliers by the number of products they supply (a subquery), and then use that in our main query.
from sqlalchemy import func
# Subquery to get suppliers who supply more than 5 products
subquery = db.session.query(Product.supplier_id, func.count(Product.id).label("product_count"))
.group_by(Product.supplier_id)
.having(func.count(Product.id) > 5)
.subquery()
# Main query to get suppliers
suppliers = db.session.query(Supplier)
.join(subquery, Supplier.id == subquery.c.supplier_id).all()
In this example, subquery.c.supplier_id
is used to refer to the supplier_id
column in the subquery. The c
attribute in SQLAlchemy provides access to the columns of a SELECT statement.
Joins (inner, outer, self)
Joins are used to combine rows from two or more tables based on a related column. Let’s discuss different types of joins.
Inner Join: This is the most common type of join. It returns records that have matching values in both tables. For instance, consider the two tables: Suppliers
and Products
. The Suppliers
table contains information about suppliers (id, name, location), and the Products
table contains information about products (id, name, supplier_id).
Suppliers Table
+----+---------+----------+
| ID | Name | Location |
+----+---------+----------+
| 1 | Apple | USA |
| 2 | Samsung | Korea |
| 3 | Sony | Japan |
+----+---------+----------+
Products Table
+----+--------+------------+
| ID | Name | SupplierID |
+----+--------+------------+
| 1 | iPhone | 1 |
| 2 | Galaxy | 2 |
| 3 | Xperia | 3 |
| 4 | Pixel | NULL |
+----+--------+------------+
If you wanted to list all the suppliers and the names of the products they supply, you could use an INNER JOIN like this:
SELECT Suppliers.Name, Products.Name
FROM Suppliers
INNER JOIN Products
ON Suppliers.ID = Products.SupplierID;
This would yield the following result:
+---------+--------+
| Name | Name |
+---------+--------+
| Apple | iPhone |
| Samsung | Galaxy |
| Sony | Xperia |
+---------+--------+
results = db.session.query(Product, Supplier).join(Supplier).all()
Outer Join: An outer join returns all records from one table, and the matched records from another table. If no match is found, the result is NULL
on the side of the table that doesn’t have a match. SQLAlchemy supports left outer joins using the outerjoin
function:
Notice that the Pixel
product is not in the result. This is because an INNER JOIN only returns rows where there is a match in both tables, and there is no supplier associated with the Pixel
product in the given data.
In terms of SQLAlchemy, you can accomplish the same thing using the join
method:
results = session.query(Suppliers.name, Products.name)
.join(Products).all()
In this above query, since we’ve established a relationship between the Suppliers
and Products
models in the SQLAlchemy model definitions, SQLAlchemy’s join()
method will automatically use that relationship to determine how to join the tables.
Even if a relationship is not defined explicitly in your SQLAlchemy models, you can still use the join()
function with an explicit join condition to relate tables in a query, as shown below,
results = session.query(Suppliers.name, Products.name)\
.join(Products, Suppliers.id == Products.supplier_id)\
.all()
In this case, Suppliers.id == Products.supplier_id
is an explicit join condition that matches up the Suppliers
and Products
tables based on their id
and supplier_id
fields, respectively.
In this query, SQLAlchemy will automatically use an INNER JOIN to combine the two tables based on the condition you specified. The result will be a list of tuples, where each tuple contains a supplier name and a product name.
Self Join: A self join is a regular join where a table is joined to itself. Imagine we had a ‘related_products’ table that lists pairs of related products. We could use a self join to get the names of both products in each pair:
class Product(db.Model):
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(100))
price = db.Column(db.Float)
supplier_id = db.Column(db.Integer, db.ForeignKey('suppliers.id'), nullable=False)
def __repr__(self):
return f'<Product {self.name}>'
class RelatedProducts(db.Model):
id = db.Column(db.Integer, primary_key=True)
product_a_id = db.Column(db.Integer, db.ForeignKey('products.id'), nullable=False)
product_b_id = db.Column(db.Integer, db.ForeignKey('products.id'), nullable=False)
product_a = db.relationship('Product', foreign_keys=[product_a_id])
product_b = db.relationship('Product', foreign_keys=[product_b_id])
The Product
class represents a product in your store, identified by a unique id
and associated with a name
.
The RelatedProducts
class represents pairs of related products. It includes two foreign keys (product_a_id
and product_b_id
) that link to the id
column in the Product
table. These foreign keys establish a relationship between each pair of related products and the details of those products.
results = db.session.query(ProductA.name, ProductB.name)\
.join(RelatedProducts, RelatedProducts.product_a_id == ProductA.id)\
.join(Product, RelatedProducts.product_b_id == ProductB.id)\
.all()
Here’s what’s happening:
- You’re querying for pairs of related product names.
- The
join
function is used twice to connect theProductA
andProductB
tables through theRelatedProducts
table. - The first
join
connectsProductA
toRelatedProducts
whereRelatedProducts.product_a_id
equalsProductA.id
. - The second
join
connectsProductB
toRelatedProducts
whereRelatedProducts.product_b_id
equalsProductB.id
. - The
all()
function at the end retrieves all matching pairs of related product names from the database.
The result is a list of tuples, each containing a pair of related product names. If no matching pairs are found, you’ll get an empty list.
Remember that here, ProductA
and ProductB
are just aliases for the Product
model. We use these aliases because we’re referencing the Product
table twice in the same query – once for each product in the pair.
The use case for a self join as described in the previous example could be a scenario where you need to find related products in an e-commerce setting.
Imagine you run an online store, and for every product on your site, you want to suggest related products to your customers. This could be based on a variety of factors – products that are often bought together, products from the same category, or products that share certain characteristics.
For this, you might maintain a RelatedProducts
table in your database, where each row represents a pair of related products. This table could be populated either manually (if you want to hand-pick related products) or using an algorithm (if you have enough data to determine which products are related based on customer behavior).
When a customer is looking at a product on your site, you can use a self join to find all products that are related to the one they’re viewing. This way, you can display these related products on the same page, encouraging the customer to consider them as well.
Here’s a practical example: let’s say a customer is looking at a T-shirt on your site. Using a self join on the RelatedProducts
table, you can find other products that are related to this T-shirt – perhaps other T-shirts from the same brand, or pants that match the T-shirt, or even accessories like hats and scarves that go well with the T-shirt. You can then display these related products on the same page as the original T-shirt, providing a richer shopping experience for your customer.
Union, Intersect, and Except
SQLAlchemy provides methods to perform set operations, such as UNION, INTERSECT, and EXCEPT, which combine the results of two or more SELECT statements.
UNION: The UNION operator combines the result set of two or more SELECT statements (only distinct values).
INTERSECT: The INTERSECT operator returns the intersection of the result sets of two or more SELECT statements.
EXCEPT: The EXCEPT operator returns the difference of two SELECT statements.
Example: Union
The featured product table might look like this
Let’s say we want to get all distinct product names from both the ‘products’ and ‘featured_products’ tables, we can use UNION:
class FeaturedProduct(db.Model):
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(80), unique=True, nullable=False)
price = db.Column(db.Float, nullable=False)
description = db.Column(db.String(200))
def __repr__(self):
return f"<FeaturedProduct {self.name}>"
And to get all the products names, you can use a query like:
from sqlalchemy import union
all_products = db.session.query(Product.name)
featured_products = db.session.query(FeaturedProduct.name)
all_product_names = all_products.union(featured_products).all()
The union
method combines the results of the two queries. It creates a new query that will select the product names that appear in either all_products
or featured_products
. The union
operation removes duplicates, so each product name will appear only once in the result, even if it is present in both all_products
and featured_products
.
Example: Intersect
Suppose you want to find out the products that are both in the Product
table and the FeaturedProduct
table. This is a perfect use case for intersect
.
Here’s how you would do it using SQLAlchemy’s intersect
:
from sqlalchemy import intersect
# Query to get names of all products
all_products = db.session.query(Product.name)
# Query to get names of all featured products
featured_products = db.session.query(FeaturedProduct.name)
# Query to get names of products that are in both all_products and featured_products
common_products = all_products.intersect(featured_products).all()
In this example, all_products
is a query that gets the names of all products, and featured_products
is a query that gets the names of all featured products. common_products
is a query that gets the names of products that are both regular products and featured products. The intersect
function is used to find the common elements between the two queries.
Finally, calling all()
on common_products
executes the query and returns the result as a list of tuples, where each tuple represents a row from the result set.
Example: Extract
The SQL EXCEPT
operator is used to return all the distinct rows selected by the first query but not by the second. In other words, it returns the difference between the two sets. In SQLAlchemy, you can use the except_
function to accomplish the same thing.
Here’s a sample use case. Let’s assume we have an e-commerce application with a Product
model and a FeaturedProduct
model, both of which have a name
field. We want to find all products that are not featured products.
Here’s how you can do this:
from sqlalchemy import except_
and FeaturedProduct are models with a name field
all_products = db.session.query(Product.name)
featured_products = db.session.query(FeaturedProduct.name)
non_featured_products = all_products.except_(featured_products).all()
for product in non_featured_products:
print(product.name)
In this code:
- We first create two query objects,
all_products
andfeatured_products
, which select the names of all products and featured products, respectively. - We then call the
except_
method on theall_products
query, passing in thefeatured_products
query. This creates a new query that will select the names of all products that are not featured products. - We call
all()
on the result to execute the query and retrieve the results as a list of tuples, where each tuple represents a row in the result set. - Finally, we iterate over the result and print out the name of each non-featured product.
Common Table Expressions (CTEs)
A Common Table Expression (CTE) is a named temporary result set that you can reference within another SELECT, INSERT, UPDATE, or DELETE statement. CTEs can be thought of as alternatives to derived tables (subqueries in the FROM clause), views, and inline user-defined functions.
let’s use the Product
and Order
tables. Let’s assume that we have a many-to-many relationship between Product
and Order
tables, handled by an association table order_product
, and we want to find out the total quantity ordered for each product.
First, let’s look at our tables:
class Product(db.Model):
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(100))
price = db.Column(db.Float)
supplier_id = db.Column(db.Integer, db.ForeignKey('suppliers.id'), nullable=False)
def __repr__(self):
return f'<Product {self.name}>'
class Order(db.Model):
id = db.Column(db.Integer, primary_key=True)
product_id = db.Column(db.Integer, db.ForeignKey('product.id'))
quantity = db.Column(db.Integer)
product = db.relationship('Product')
product_orders = db.relationship('ProductOrder', backref='order')
class ProductOrder(db.Model):
__tablename__ = 'product_order'
order_id = db.Column(db.Integer, db.ForeignKey('order.id'), primary_key=True)
product_id = db.Column(db.Integer, db.ForeignKey('product.id'), primary_key=True)
quantity = db.Column(db.Integer)
product = db.relationship('Product', backref='product_orders')
Here we have an Order
model and a Product
model, and ProductOrder
model that acts as the association table, containing the quantity
of each product in each order.
Now, let’s create a CTE that sums up the quantity for each product across all orders:
from sqlalchemy import func
# Define the CTE
product_sales_cte = db.session.query(
ProductOrder.product_id,
func.sum(ProductOrder.quantity).label('total_quantity')
).group_by(
ProductOrder.product_id
).cte(name='product_sales_cte')
# Use the CTE in a query
stmt = db.session.query(
Product.name,
product_sales_cte.c.total_quantity
).join(
product_sales_cte, Product.id == product_sales_cte.c.product_id
).all()
for result in stmt:
print(f"Product: {result.name}, Total Quantity Ordered: {result.total_quantity}")
In this example, product_sales_cte
is a CTE that calculates the total quantity ordered for each product. Then we join this CTE with the Product
table to get the name of each product along with its total ordered quantity.
Yes. that was a lot. But we got here somehow. These advanced query techniques are powerful tools for managing and extracting data. Mastery of these concepts will enable you to write more efficient and effective queries. Happy querying!