G. Advanced Query Techniques with Flask-SQLAlchemy

In this blog post, we’re going to delve into more advanced querying techniques using Flask-SQLAlchemy, building upon our existing e-commerce application. You’ll learn about subqueries, various types of joins, set operations (like UNION, INTERSECT, and EXCEPT), and common table expressions (CTEs).

Subqueries

A subquery is a query that is nested inside another query, allowing complex manipulations and filtering based on the results of the subquery. Subqueries can be used in various places such as the WHERE and SELECT clauses, and from the FROM clause. Let’s use our e-commerce app to illustrate. Let’s assume we have the following Supplier, Product and Order models defined for our eCommerce application:

class Supplier(db.Model):
    __tablename__ = 'suppliers'

    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(50), nullable=False)
    location = db.Column(db.String(100))

    products = db.relationship('Product', backref='supplier')

    def __repr__(self):
        return f'<Supplier {self.name}>'


class Product(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(100))
    price = db.Column(db.Float)
    supplier_id = db.Column(db.Integer, db.ForeignKey('suppliers.id'), nullable=False)

    def __repr__(self):
        return f'<Product {self.name}>'


class Order(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    product_id = db.Column(db.Integer, db.ForeignKey('product.id'))
    quantity = db.Column(db.Integer)
    product = db.relationship('Product')

Let’s look at some use cases for subqueries using the SQLAlchemy ORM.

Use Case 1: Retrieving data based on aggregate values

Suppose we want to get the details of the products which have a price greater than the average price of all products. In this case, we need to first calculate the average price (a subquery), and then use that value in our main query.

from sqlalchemy import func

avg_price = db.session.query(func.avg(Product.price)).subquery()

products_above_avg = db.session.query(Product)
                      .filter(Product.price > avg_price)
                      .all()

In this example, avg_price is a subquery that calculates the average product price. We use this subquery in the main query to filter products.

In the example above, subquery() is used to designate the subquery, which SQLAlchemy will appropriately place within the main query.

Use Case 2: Returning data from a hierarchical or recursive relationship

Suppose our Product model has a “parent_id” field, allowing for a hierarchy of products (a “parent” product can have multiple “child” products). If we want to select all products that are children of a specific parent product, we could use a subquery.

# Get the ids of all child products of a specific parent product
subquery = db.session.query(Product.id)
           .filter(Product.parent_id == some_parent_id).subquery()

# Main query to get child products
child_products = db.session.query(Product)
                 .filter(Product.id.in_(subquery))
                 .all()

Use Case 3: Applying aggregate functions to a related model

Suppose we want to get all suppliers who supply more than 5 products. We would first need to group suppliers by the number of products they supply (a subquery), and then use that in our main query.

from sqlalchemy import func

# Subquery to get suppliers who supply more than 5 products
subquery = db.session.query(Product.supplier_id, func.count(Product.id).label("product_count"))
                         .group_by(Product.supplier_id)
                         .having(func.count(Product.id) > 5)
                         .subquery()

# Main query to get suppliers
suppliers = db.session.query(Supplier)
            .join(subquery, Supplier.id == subquery.c.supplier_id).all()

In this example, subquery.c.supplier_id is used to refer to the supplier_id column in the subquery. The c attribute in SQLAlchemy provides access to the columns of a SELECT statement.

Joins (inner, outer, self)

Joins are used to combine rows from two or more tables based on a related column. Let’s discuss different types of joins.

Inner Join: This is the most common type of join. It returns records that have matching values in both tables. For instance, consider the two tables: Suppliers and Products. The Suppliers table contains information about suppliers (id, name, location), and the Products table contains information about products (id, name, supplier_id).

Suppliers Table
+----+---------+----------+
| ID | Name    | Location |
+----+---------+----------+
| 1  | Apple   | USA      |
| 2  | Samsung | Korea    |
| 3  | Sony    | Japan    |
+----+---------+----------+

Products Table
+----+--------+------------+
| ID | Name   | SupplierID |
+----+--------+------------+
| 1  | iPhone | 1          |
| 2  | Galaxy | 2          |
| 3  | Xperia | 3          |
| 4  | Pixel  | NULL       |
+----+--------+------------+

If you wanted to list all the suppliers and the names of the products they supply, you could use an INNER JOIN like this:

SELECT Suppliers.Name, Products.Name
FROM Suppliers
INNER JOIN Products
ON Suppliers.ID = Products.SupplierID;

This would yield the following result:

+---------+--------+
| Name    | Name   |
+---------+--------+
| Apple   | iPhone |
| Samsung | Galaxy |
| Sony    | Xperia |
+---------+--------+
results = db.session.query(Product, Supplier).join(Supplier).all()

Outer Join: An outer join returns all records from one table, and the matched records from another table. If no match is found, the result is NULL on the side of the table that doesn’t have a match. SQLAlchemy supports left outer joins using the outerjoin function:

Notice that the Pixel product is not in the result. This is because an INNER JOIN only returns rows where there is a match in both tables, and there is no supplier associated with the Pixel product in the given data.

In terms of SQLAlchemy, you can accomplish the same thing using the join method:

results = session.query(Suppliers.name, Products.name)
                      .join(Products).all()

In this above query, since we’ve established a relationship between the Suppliers and Products models in the SQLAlchemy model definitions, SQLAlchemy’s join() method will automatically use that relationship to determine how to join the tables.

Even if a relationship is not defined explicitly in your SQLAlchemy models, you can still use the join() function with an explicit join condition to relate tables in a query, as shown below,

results = session.query(Suppliers.name, Products.name)\
    .join(Products, Suppliers.id == Products.supplier_id)\
    .all()

In this case, Suppliers.id == Products.supplier_id is an explicit join condition that matches up the Suppliers and Products tables based on their id and supplier_id fields, respectively.

In this query, SQLAlchemy will automatically use an INNER JOIN to combine the two tables based on the condition you specified. The result will be a list of tuples, where each tuple contains a supplier name and a product name.

Self Join: A self join is a regular join where a table is joined to itself. Imagine we had a ‘related_products’ table that lists pairs of related products. We could use a self join to get the names of both products in each pair:

class Product(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(100))
    price = db.Column(db.Float)
    supplier_id = db.Column(db.Integer, db.ForeignKey('suppliers.id'), nullable=False)

    def __repr__(self):
        return f'<Product {self.name}>'


class RelatedProducts(db.Model):

    id = db.Column(db.Integer, primary_key=True)
    product_a_id = db.Column(db.Integer, db.ForeignKey('products.id'), nullable=False)
    product_b_id = db.Column(db.Integer, db.ForeignKey('products.id'), nullable=False)

    product_a = db.relationship('Product', foreign_keys=[product_a_id])
    product_b = db.relationship('Product', foreign_keys=[product_b_id])

The Product class represents a product in your store, identified by a unique id and associated with a name.

The RelatedProducts class represents pairs of related products. It includes two foreign keys (product_a_id and product_b_id) that link to the id column in the Product table. These foreign keys establish a relationship between each pair of related products and the details of those products.

results = db.session.query(ProductA.name, ProductB.name)\
                    .join(RelatedProducts, RelatedProducts.product_a_id == ProductA.id)\
                    .join(Product, RelatedProducts.product_b_id == ProductB.id)\
                    .all()

Here’s what’s happening:

  1. You’re querying for pairs of related product names.
  2. The join function is used twice to connect the ProductA and ProductB tables through the RelatedProducts table.
  3. The first join connects ProductA to RelatedProducts where RelatedProducts.product_a_id equals ProductA.id.
  4. The second join connects ProductB to RelatedProducts where RelatedProducts.product_b_id equals ProductB.id.
  5. The all() function at the end retrieves all matching pairs of related product names from the database.

The result is a list of tuples, each containing a pair of related product names. If no matching pairs are found, you’ll get an empty list.

Remember that here, ProductA and ProductB are just aliases for the Product model. We use these aliases because we’re referencing the Product table twice in the same query – once for each product in the pair.

The use case for a self join as described in the previous example could be a scenario where you need to find related products in an e-commerce setting.

Imagine you run an online store, and for every product on your site, you want to suggest related products to your customers. This could be based on a variety of factors – products that are often bought together, products from the same category, or products that share certain characteristics.

For this, you might maintain a RelatedProducts table in your database, where each row represents a pair of related products. This table could be populated either manually (if you want to hand-pick related products) or using an algorithm (if you have enough data to determine which products are related based on customer behavior).

When a customer is looking at a product on your site, you can use a self join to find all products that are related to the one they’re viewing. This way, you can display these related products on the same page, encouraging the customer to consider them as well.

Here’s a practical example: let’s say a customer is looking at a T-shirt on your site. Using a self join on the RelatedProducts table, you can find other products that are related to this T-shirt – perhaps other T-shirts from the same brand, or pants that match the T-shirt, or even accessories like hats and scarves that go well with the T-shirt. You can then display these related products on the same page as the original T-shirt, providing a richer shopping experience for your customer.

Union, Intersect, and Except

SQLAlchemy provides methods to perform set operations, such as UNION, INTERSECT, and EXCEPT, which combine the results of two or more SELECT statements.

UNION: The UNION operator combines the result set of two or more SELECT statements (only distinct values).

INTERSECT: The INTERSECT operator returns the intersection of the result sets of two or more SELECT statements.

EXCEPT: The EXCEPT operator returns the difference of two SELECT statements.

Example: Union

The featured product table might look like this

Let’s say we want to get all distinct product names from both the ‘products’ and ‘featured_products’ tables, we can use UNION:

class FeaturedProduct(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(80), unique=True, nullable=False)
    price = db.Column(db.Float, nullable=False)
    description = db.Column(db.String(200))

    def __repr__(self):
        return f"<FeaturedProduct {self.name}>"

And to get all the products names, you can use a query like:

from sqlalchemy import union


all_products = db.session.query(Product.name)
featured_products = db.session.query(FeaturedProduct.name)

all_product_names = all_products.union(featured_products).all()

The union method combines the results of the two queries. It creates a new query that will select the product names that appear in either all_products or featured_products. The union operation removes duplicates, so each product name will appear only once in the result, even if it is present in both all_products and featured_products.

Example: Intersect

Suppose you want to find out the products that are both in the Product table and the FeaturedProduct table. This is a perfect use case for intersect.

Here’s how you would do it using SQLAlchemy’s intersect:

from sqlalchemy import intersect

# Query to get names of all products
all_products = db.session.query(Product.name)

# Query to get names of all featured products
featured_products = db.session.query(FeaturedProduct.name)

# Query to get names of products that are in both all_products and featured_products
common_products = all_products.intersect(featured_products).all()

In this example, all_products is a query that gets the names of all products, and featured_products is a query that gets the names of all featured products. common_products is a query that gets the names of products that are both regular products and featured products. The intersect function is used to find the common elements between the two queries.

Finally, calling all() on common_products executes the query and returns the result as a list of tuples, where each tuple represents a row from the result set.

Example: Extract

The SQL EXCEPT operator is used to return all the distinct rows selected by the first query but not by the second. In other words, it returns the difference between the two sets. In SQLAlchemy, you can use the except_ function to accomplish the same thing.

Here’s a sample use case. Let’s assume we have an e-commerce application with a Product model and a FeaturedProduct model, both of which have a name field. We want to find all products that are not featured products.

Here’s how you can do this:

from sqlalchemy import except_

and FeaturedProduct are models with a name field

all_products = db.session.query(Product.name)
featured_products = db.session.query(FeaturedProduct.name)

non_featured_products = all_products.except_(featured_products).all()

for product in non_featured_products:
    print(product.name)

In this code:

  1. We first create two query objects, all_products and featured_products, which select the names of all products and featured products, respectively.
  2. We then call the except_ method on the all_products query, passing in the featured_products query. This creates a new query that will select the names of all products that are not featured products.
  3. We call all() on the result to execute the query and retrieve the results as a list of tuples, where each tuple represents a row in the result set.
  4. Finally, we iterate over the result and print out the name of each non-featured product.

Common Table Expressions (CTEs)

A Common Table Expression (CTE) is a named temporary result set that you can reference within another SELECT, INSERT, UPDATE, or DELETE statement. CTEs can be thought of as alternatives to derived tables (subqueries in the FROM clause), views, and inline user-defined functions.

let’s use the Product and Order tables. Let’s assume that we have a many-to-many relationship between Product and Order tables, handled by an association table order_product, and we want to find out the total quantity ordered for each product.

First, let’s look at our tables:


class Product(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(100))
    price = db.Column(db.Float)
    supplier_id = db.Column(db.Integer, db.ForeignKey('suppliers.id'), nullable=False)

    def __repr__(self):
        return f'<Product {self.name}>'


class Order(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    product_id = db.Column(db.Integer, db.ForeignKey('product.id'))
    quantity = db.Column(db.Integer)
    product = db.relationship('Product')
    product_orders = db.relationship('ProductOrder', backref='order')



class ProductOrder(db.Model):
    __tablename__ = 'product_order'

    order_id = db.Column(db.Integer, db.ForeignKey('order.id'), primary_key=True)
    product_id = db.Column(db.Integer, db.ForeignKey('product.id'), primary_key=True)
    quantity = db.Column(db.Integer)
    product = db.relationship('Product', backref='product_orders')

Here we have an Order model and a Product model, and ProductOrder model that acts as the association table, containing the quantity of each product in each order.

Now, let’s create a CTE that sums up the quantity for each product across all orders:

from sqlalchemy import func

# Define the CTE
product_sales_cte = db.session.query(
    ProductOrder.product_id,
    func.sum(ProductOrder.quantity).label('total_quantity')
).group_by(
    ProductOrder.product_id
).cte(name='product_sales_cte')

# Use the CTE in a query
stmt = db.session.query(
    Product.name,
    product_sales_cte.c.total_quantity
).join(
    product_sales_cte, Product.id == product_sales_cte.c.product_id
).all()

for result in stmt:
    print(f"Product: {result.name}, Total Quantity Ordered: {result.total_quantity}")

In this example, product_sales_cte is a CTE that calculates the total quantity ordered for each product. Then we join this CTE with the Product table to get the name of each product along with its total ordered quantity.

Yes. that was a lot. But we got here somehow. These advanced query techniques are powerful tools for managing and extracting data. Mastery of these concepts will enable you to write more efficient and effective queries. Happy querying!

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top