The Differences between joinedload(), subqueryload() and contains_eager

contains_eager()

is used in SQLAlchemy ORM when you’ve already joined the data you want to load, and you want to inform the ORM about this. This way, the ORM won’t need to send extra queries to the database to retrieve the related data.

Let’s look at an example in Flask-SQLAlchemy:

from flask_sqlalchemy import SQLAlchemy
from sqlalchemy.orm import contains_eager

db = SQLAlchemy()

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)

class Order(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
    user = db.relationship('User', backref='orders')

# when querying for orders, also join users
orders = db.session.query(Order).join(User)
                   .options(contains_eager(Order.user)).all()

In this example, when we query for Order, we also join the User and inform the ORM using contains_eager(). Now, when you access order.user for any Order object in orders, SQLAlchemy ORM will not issue a new query to fetch the user because it knows the user has already been eagerly loaded.

When you define a relationship in SQLAlchemy, the ORM automatically knows how to load related objects. However, by default, it does this using a loading technique known as “lazy loading”. Under lazy loading, the related objects are not loaded until they are actually accessed. This means that in your example, even though you are joining User in the query, SQLAlchemy won’t actually load the User objects when the Order objects are fetched. Instead, it will issue a new query to fetch a User object every time you access the user attribute of an Order object for example Order.user.name

The contains_eager() option changes this behavior. It tells SQLAlchemy that the join should be used to load the User objects eagerly, i.e., at the same time as the Order objects. This way, when you access the user attribute of an Order object, SQLAlchemy won’t need to issue a new query, because the User object has already been loaded.

This can be a significant performance optimization if you’re going to access the user attribute of every Order, because it can drastically reduce the number of queries that need to be issued. But if you’re only going to access the user attribute of a few Order objects, lazy loading might be more efficient, because it avoids loading User objects that you don’t need.

Joinedload()

Both joinedload and subqueryload are options for eager loading, which is a concept in SQLAlchemy where you load all the data you need from the database at once, as opposed to lazy loading, where additional data is queried as it’s needed. Here’s how they differ:

joinedload is used when you want SQLAlchemy to automatically join the related table and load the related objects in the same query when you load the main object. It is a form of eager loading. For example

users = User.query.options(joinedload(User.orders)).all()

This will fetch all users and their related orders in a single query, using a SQL JOIN.

contains_eager, on the other hand, is used when you are manually joining the related table in the query, but you still want SQLAlchemy to treat the related objects as being eagerly loaded.

For example:

users = User.query.join(User.orders).options(contains_eager(User.orders)).all()

In this case, you are manually joining the Order table, but you still want the orders for each user to be treated as if they were eagerly loaded.

In general, use joinedload when you want SQLAlchemy to take care of the join and loading the related objects, and use contains_eager when you are handling the join yourself but still want the related objects to be treated as eagerly loaded.

Joinedlod is efficient in terms of the number of queries (since everything is loaded in a single query), but it can create very large result sets because each row of the main table is repeated for each related object.

subqueryload()

This results in a separate SQL query for each relation, but these queries are constructed using a subquery so that only one round trip to the database is required. The main advantage of subqueryload over joinedload is that it does not result in duplication of the main object data. However, it can be slower for small numbers of objects because of the overhead of the additional SQL queries.

For example:

users = User.query.options(subqueryload(User.orders)).all()

This would fetch all users in one query, and all related orders for those users in a second query, using a subquery to match the orders to the users.

In general, joinedload is most useful when loading a single object and its related objects, and subqueryload is often better when loading multiple objects and their related objects. However, the best choice can depend on the specific structure and data distribution of your database.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top