SQLAlchemy Expression Language, Advanced Usage

Expression Language

One of the core components of SQLAlchemy is the Expression Language. It is allows the programmer to specify SQL statements in Python constructs and use the constructs directly in more complex queries. Since the expression language is backend-neutral and comprehensively covers every aspect of raw SQL, it is closer to raw SQL than any other component in SQLAlchemy. In this article, we are going to illustrate the power of the expression language using a three table database.

Database Models

Suppose we want to model multiple shopping carts, each of which is created by a user and stores multiple products. From the specification, we can deduce that a user owns multiple shopping carts, a shopping cart includes multiple products, and a product can be included in multiple shopping carts. Therefore, we want to establish a many-to-many relationship between ShoppingCart and Product, and another one-to-many between User and ShoppingCart. Let’s create the database models:

from sqlalchemy import Column, DateTime, String, Integer, ForeignKey, Float
from sqlalchemy.orm import relationship, backref
from sqlalchemy.ext.declarative import declarative_base
 
Base = declarative_base()
 
 
class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String)
 
 
class ShoppingCart(Base):
    __tablename__ = 'shopping_cart'
    id = Column(Integer, primary_key=True)
    owner_id = Column(Integer, ForeignKey(User.id))
    owner = relationship(
        User, backref=backref('shopping_carts', uselist=True)
    )
    products = relationship(
        'Product',
        secondary='shopping_cart_product_link'
    )
    def __repr__(self):
        return '( {0}:{1.owner.name}:{1.products!r} )'.format(ShoppingCart, self)
 
 
class Product(Base):
    __tablename__ = 'product'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # Using a Float is not the right way of modeling a currency value.
    # We will investigate that topic in a different article.
    price = Column(Float)
    shopping_carts = relationship(
        'ShoppingCart',
        secondary='shopping_cart_product_link'
    )
    def __repr__(self):
        return '( {0}:{1.name!r}:{1.price!r} )'.format(Product, self)
 
 
class ShoppingCartProductLink(Base):
    __tablename__ = 'shopping_cart_product_link'
    shopping_cart_id = Column(Integer, ForeignKey('shopping_cart.id'), primary_key=True)
    product_id = Column(Integer, ForeignKey('product.id'), primary_key=True)
 
 
from sqlalchemy import create_engine
engine = create_engine('sqlite:///')
 
 
from sqlalchemy.orm import sessionmaker
DBSession = sessionmaker()
DBSession.configure(bind=engine)
Base.metadata.create_all(engine)

Create Users, Products and Shopping Carts

Now let’s create one user and several products.

>>> session = DBSession()
>>> cpu = Product(name='CPU', price=300.00)
>>> motherboard = Product(name='Motherboard', price=150.00)
>>> coffee_machine = Product(name='Coffee Machine', price=30.00)
>>> john = User(name='John')
>>> session.add(cpu)
>>> session.add(motherboard)
>>> session.add(coffee_machine)
>>> session.add(john)
>>> session.commit()
>>> session.close()

Before progressing further, let’s verify that we have one user and three products in the database now.

>>> session = DBSession()
>>> cpu = session.query(Product).filter(Product.name == 'CPU').one()
>>> motherboard = session.query(Product).filter(Product.name == 'Motherboard').one()
>>> coffee_machine = session.query(Product).filter(Product.name == 'Coffee Machine').one()
>>> john = session.query(User).filter(User.name == 'John').one()
>>> session.close()

Now we can create two shopping carts for user John.

>>> session = DBSession()
>>> cpu = session.query(Product).filter(Product.name == 'CPU').one()
>>> motherboard = session.query(Product).filter(Product.name == 'Motherboard').one()
>>> coffee_machine = session.query(Product).filter(Product.name == 'Coffee Machine').one()
>>> john = session.query(User).filter(User.name == 'John').one()
>>> john_shopping_cart_computer = ShoppingCart(owner=john)
>>> john_shopping_cart_kitchen = ShoppingCart(owner=john)
>>> john_shopping_cart_computer.products.append(cpu)
>>> john_shopping_cart_computer.products.append(motherboard)
>>> john_shopping_cart_kitchen.products.append(coffee_machine)
>>> session.add(john_shopping_cart_computer)
>>> session.add(john_shopping_cart_kitchen)
>>> session.commit()
>>> session.close()

Use the Expression Language to Query the Database

Now we have one user, three products and two shopping carts in the database, we can start playing with the expression language. First, let’s write a query to answer the question: which products’ prices are higher than $100.00?

>>> product_higher_than_one_hundred = select([Product.id]).where(Product.price > 100.00)
>>>
>>> session = DBSession()
>>> session.query(Product).filter(Product.id.in_(product_higher_than_one_hundred)).all()
[( :u'CPU':300.0 ), ( :u'Motherboard':150.0 )]
>>> session.close()

Then, let’s write a query to answer a more complicated question: which shopping carts contain at least one product whose price is higher than $100.00?

>>> shopping_carts_with_products_higher_than_one_hundred = select([ShoppingCart.id]).where(
...     ShoppingCart.products.any(Product.id.in_(product_higher_than_one_hundred))
... )
>>> session = DBSession()
>>> session.query(ShoppingCart).filter(ShoppingCart.id.in_(shopping_carts_with_products_higher_than_one_hundred)).one()
( :John:[( :u'CPU':300.0 ), ( :u'Motherboard':150.0 )] )
>>> session.close()

Then, let’s write a query to answer a slightly different question: which shopping carts contain no product whose price is lower than $100.00?

>>> products_lower_than_one_hundred = select([Product.id]).where(Product.price < 100.00)
>>> from sqlalchemy import not_
>>> shopping_carts_with_no_products_lower_than_one_hundred = select([ShoppingCart.id]).where(
...     not_(ShoppingCart.products.any(Product.id.in_(products_lower_than_one_hundred)))
... )
>>> session = DBSession()
>>> session.query(ShoppingCart).filter(ShoppingCart.id.in_(
...     shopping_carts_with_no_products_lower_than_one_hundred)
... ).all()
[( :John:[( :u'CPU':300.0 ), ( :u'Motherboard':150.0 )] )]
>>> session.close()

Or the previous question can be formed in a different way: how can we find the shopping carts all of whose products have a price higher than $100.00?

>>> from sqlalchemy import and_
>>> shopping_carts_with_all_products_higher_than_one_hundred = select([ShoppingCart.id]).where(
...     and_(
...         ShoppingCartProductLink.product_id.in_(product_higher_than_one_hundred),
...         ShoppingCartProductLink.shopping_cart_id == ShoppingCart.id
...     )
... )
>>> session = DBSession()
>>> session.query(ShoppingCart).filter(ShoppingCart.id.in_(
...     shopping_carts_with_all_products_higher_than_one_hundred)
... ).all()
[( :John:[( :u'CPU':300.0 ), ( :u'Motherboard':150.0 )] )]
>>> session.close()

Now, we can ask a different kind of questions regarding aggregation over Product.price. For example, we can ask: which shopping carts’ total price of the products is higher than $200.00?

>>> from sqlalchemy import func
>>> total_price_of_shopping_carts = select([
...     ShoppingCart.id.label('shopping_cart_id'),
...     func.sum(Product.price).label('product_price_sum')
... ]).where(
...     and_(
...         ShoppingCartProductLink.product_id == Product.id,
...         ShoppingCartProductLink.shopping_cart_id == ShoppingCart.id,
...     )
... ).group_by(ShoppingCart.id)
>>> session = DBSession()
>>> session.query(total_price_of_shopping_carts).all()
[(1, 450.0), (2, 30.0)]
>>> session.query(ShoppingCart).filter(
...     ShoppingCart.id == total_price_of_shopping_carts.c.shopping_cart_id,
...     total_price_of_shopping_carts.c.product_price_sum > 200.00
... ).all()
[( :John:[( :u'CPU':300.0 ), ( :u'Motherboard':150.0 )] )]
>>> session.query(ShoppingCart).filter(
...     ShoppingCart.id == total_price_of_shopping_carts.c.shopping_cart_id,
...     total_price_of_shopping_carts.c.product_price_sum < 200.00
... ).all()
[( :John:[( :u'Coffee Machine':30.0 )] )]
>>> session.close()

In the previous example, we begin with constructing an SQLAlchemy selectable total_price_of_shopping_carts whose ‘columns’ are ShoppingCart.id of each shopping cart and the sum of all the products’ prices in each corresponding shopping cart. Once we have such a selectable, it’s easy to write the query that finds all the shopping carts whose sum of products’ prices are higher than $200.00.

Potential Gotchas

So far, our example program seems to be working pretty well. But if we accidentally break the program by writing and using the constructs in unintended ways? Will SQLAlchemy inform us about what’s wrong with our program so we can debug it?

For example, the column Product.price is defined as a Float. What if we create a Product object with a price that’s a string? Will SQLAlchemy break because the data type of the input for price is different from the definition? Let’s give it a try.

>>> session = DBSession()
>>> cpu = Product(name='CPU', price='0.15')
>>> session.add(cpu)
>>> session.commit()
>>> cpu = session.query(Product).filter(Product.name == 'CPU').one()
>>> cpu.price
0.15

So, the product CPU with a string price was inserted successfully into the database. How about using a string for the price that is not a number at all?

>>> cpu_two = Product(name='CPU Two', price='asdf')
>>> session.add(cpu_two)
>>> session.commit()
...
sqlalchemy.exc.StatementError: could not convert string to float: asdf (original cause: ValueError: could not convert string to float: asdf) u'INSERT INTO product (name, price) VALUES (?, ?)' [{'price': 'asdf', 'name': 'CPU Two'}]

Oops. Now SQLAlchemy raises a StatementError because “asdf” cannot be converted into a Float. This is a good feature since it eliminates potential programming errors caused by carelessness.

You might also notice that the filter() method in our example use expressions such as Product.name == 'CPU' and Product.price > 100.0. Aren’t these expressions being evaluated first and then the resulting Boolean values are passed into the filter() function to get actual filter results? Let’s use several examples to verify the behaviour of filter().

>>> session.query(Product).filter(True).all()
[( :u'CPU':300.0 ), ( :u'Motherboard':150.0 ), ( :u'Coffee Machine':30.0 )]
>>> session.query(Product).filter(Product.name='CPU').all()
  File "", line 1
SyntaxError: keyword can't be an expression
>>> session.query(Product).filter(Product.price > '100.0').all()
[( :u'CPU':300.0 ), ( :u'Motherboard':150.0 )]

From the examples above, we see that filter() does accept straightforward Boolean values like True, which returns all products in the database. However, it does not accept expression like Product.name = 'CPU' whose meaning is unclear in the context of a filter. Like the Product constructor, it also converts a String value '100.0' into a float and filter the product table based on the final criteria.

Now let’s examine several cases where the SQLAlchemy API might look a bit less than intuitive. First, the select() statement seems to only accept a list of columns as the first argument like select([Product.id]). Wouldn’t it be nice if we can write something like select(Product.id)?

>>> products_lower_than_one_hundred = select(Product.id).where(Product.price < 100.00)
...
NotImplementedError: Operator 'getitem' is not supported on this expression

Oops. SQLAlchemy does not like a single element as the first argument of select(). Remember, always pass in a list.

Second, the syntax in some of the where clauses do not look Pythonic: ShoppingCart.products.any(Product.id.in_(product_higher_than_one_hundred)). Wouldn’t it be nice if we can write something like ShoppingCart.products.any(Product.id in product_higher_than_one_hundred))?

>>> shopping_carts_with_products_higher_than_one_hundred = select([ShoppingCart.id]).where(
...     ShoppingCart.products.any(Product.id in product_higher_than_one_hundred)
... )
...
TypeError: argument of type 'Select' is not iterable

Since SQLAlchemy’s 'Select' objects are not iterable, using it in a in context does not work. It may look like a downside, but it makes sense since a 'Select' object is very flexible in SQLAlchemy. As what’s shown in the examples, a 'Select' object can be passed around into any filter() or where() to become part of a more complex query or event another 'Select' object. Supporting iterable on such an object would require lots of changes to the underlying implementation.

Third, the results of query() seem to be returning objects that are formatted nicely, such as ( :u'CPU':300.0 ) as the display for one Product object. It looks different from a typical object like:

>>> class C:
...   pass
...
>>> c = C()
>>> c

Why? It’s because we overrode the __repr__() method of Product and the print() command from the Python interpreter is calling repr() on the resulting array of Product and ShoppingCart objects which call __repr__() of each corresponding class’s implementation.

Finally, why does SQLAlchemy implement their own Float column type? Why can’t they re-use Python’s internal float type?

Well, the short answer is that SQLAlchemy is an ORM and an ORM maps Python constructs into SQL constructs using a defined type system, and the type system has to be database-agnostic, which means it has to handle different database backends with the same column type. And the long answer is that every column defined in a model has to be defined by SQLAlchemy and the definition / column type implements custom methods that are called by the SQLAlchemy’s lower level API to convert a Python construct into a corresponding SQL construct.

Tips and Summary

In this article, we used a three-table database to illustrate how to use SQLAlchemy’s Expression Language. One thing to keep in mind is that we use mathematical sets to guide us on writing SQL queries. Whenever we encounter a question that’s not trivial, especially a question involving multiple tables, we should divide and conquer the question by answering parts of the question first. For example, the question ‘how can we find all the shopping carts whose products’ sum of prices are higher than $200.00′ can be divided into the following ones: 1. how can we calculate the sum of products’ prices? (func.sum()) 2. how can we list all the tuples (ShoppingCart.idfunc.sum(Product.price)) in a selectable? 3. how can we use the selectable to write the actual query?

Leave a Reply

Your email address will not be published. Required fields are marked *