Association Tables
In our previous articles, we used an association table to model many-to-many
relationships between tables, such as the relationship between Department
and Employee
. In this article, we are going to dive deeper into the association table concept and see how we can use it to further solve more complicated problems.
DepartmentEmployeeLink and Extra Data
In our previous article, we created the following SQLAlchemy models:
import os from sqlalchemy import Column, DateTime, String, Integer, ForeignKey, func from sqlalchemy.orm import relationship, backref from sqlalchemy.ext.declarative import declarative_base Base = declarative_base() class Department(Base): __tablename__ = 'department' id = Column(Integer, primary_key=True) name = Column(String) employees = relationship( 'Employee', secondary='department_employee_link' ) class Employee(Base): __tablename__ = 'employee' id = Column(Integer, primary_key=True) name = Column(String) hired_on = Column(DateTime, default=func.now()) departments = relationship( Department, secondary='department_employee_link' ) class DepartmentEmployeeLink(Base): __tablename__ = 'department_employee_link' department_id = Column(Integer, ForeignKey('department.id'), primary_key=True) employee_id = Column(Integer, ForeignKey('employee.id'), primary_key=True)
- Migrate SQLAlchemy Databases with Alembic
- SQLAlchemy Expression Language, More Advanced Usage
- SQLAlchemy Expression Language, Advanced Usage
Notice that the DepartmentEmployeeLink
class contains two foreign key columns which are enough to model the many-to-many relationship between Department
and Employee
. Now let’s add one more column extra_data
and two more relationships department
and employee
.
import os from sqlalchemy import Column, DateTime, String, Integer, ForeignKey, func from sqlalchemy.orm import relationship, backref from sqlalchemy.ext.declarative import declarative_base Base = declarative_base() class Department(Base): __tablename__ = 'department' id = Column(Integer, primary_key=True) name = Column(String) employees = relationship( 'Employee', secondary='department_employee_link' ) class Employee(Base): __tablename__ = 'employee' id = Column(Integer, primary_key=True) name = Column(String) hired_on = Column(DateTime, default=func.now()) departments = relationship( Department, secondary='department_employee_link' ) class DepartmentEmployeeLink(Base): __tablename__ = 'department_employee_link' department_id = Column(Integer, ForeignKey('department.id'), primary_key=True) employee_id = Column(Integer, ForeignKey('employee.id'), primary_key=True) extra_data = Column(String(256)) department = relationship(Department, backref=backref("employee_assoc")) employee = relationship(Employee, backref=backref("department_assoc"))
With one more extra column and two more relationships on the DepartmentEmployeeLink
association model, we can store more information and be more liberal with how we want to use it. For example, suppose we have an employee John who works part-time in the IT department, we can insert the string ‘part-time’ into the column extra_data
and create an DepartmentEmployeeLink
object to represent this relationship.
>>> fp = 'orm_in_detail.sqlite' >>> if os.path.exists(fp): ... os.remove(fp) ... >>> from sqlalchemy import create_engine >>> engine = create_engine('sqlite:///association_tables.sqlite') >>> >>> from sqlalchemy.orm import sessionmaker >>> session = sessionmaker() >>> session.configure(bind=engine) >>> Base.metadata.create_all(engine) >>> >>> >>> IT = Department(name="IT") >>> John = Employee(name="John") >>> John_working_part_time_at_IT = DepartmentEmployeeLink(department=IT, employee=John, extra_data='part-time') >>> s = session() >>> s.add(John_working_part_time_at_IT) >>> s.commit()
Then, we can find John by querying the IT department or the DepartmentEmployeeLink
model.
>>> IT.employees[0].name u'John' >>> de_link = s.query(DepartmentEmployeeLink).join(Department).filter(Department.name == 'IT').one() >>> de_link.employee.name u'John' >>> de_link = s.query(DepartmentEmployeeLink).filter(DepartmentEmployeeLink.extra_data == 'part-time').one() >>> de_link.employee.name u'John'
Finally, adding an IT employee using the relationship Department.employees
still works, as shown in the previous article:
>>> Bill = Employee(name="Bill") >>> IT.employees.append(Bill) >>> s.add(Bill) >>> s.commit()
Linking Relationships with Backref
One common keyword argument we have used so far in relationship
definitions is backref
. A backref
is a common shortcut to place a second relationship()
onto the destination table. For example, the following code puts a second relationship()
“posts” onto the user
table by specifying a backref
on Post.owner
:
class User(Base): __tablename__ = 'user' id = Column(Integer, primary_key=True) name = Column(String(256)) class Post(Base): __tablename__ = 'post' id = Column(Integer, primary_key=True) owner_id = Column(Integer, ForeignKey('user.id')) owner = relationship(User, backref=backref('posts', uselist=True))
This is equivalent to the following definition:
class User(Base): __tablename__ = 'user' id = Column(Integer, primary_key=True) name = Column(String(256)) posts = relationship("Post", back_populates="owner") class Post(Base): __tablename__ = 'post' id = Column(Integer, primary_key=True) owner_id = Column(Integer, ForeignKey('user.id')) owner = relationship(User, back_populates="posts")
Now we have a one-to-many
relationship between User
and Post
. We can interact with these two models in the following way:
>>> s = session() >>> john = User(name="John") >>> post1 = Post(owner=john) >>> post2 = Post(owner=john) >>> s.add(post1) >>> s.add(post2) >>> s.commit() >>> s.refresh(john) >>> john.posts [, ] >>> john.posts[0].owner >>> john.posts[0].owner.name u'John'
One-to-One
Creating a one-to-one
relationship between models is very similar to creating many-to-one
relationships. By modifying the uselist
argument’s value to False
in a backref()
, we force the database models to be mapped to each other in a one-to-one
relationship.
class User(Base): __tablename__ = 'user' id = Column(Integer, primary_key=True) name = Column(String(256)) class Address(Base): __tablename__ = 'address' id = Column(Integer, primary_key=True) address = Column(String(256)) user_id = Column(Integer, ForeignKey('user.id')) user = relationship('User', backref=backref('address', uselist=False))
Then, we can use the models in the following way:
>>> s = session() >>> john = User(name="John") >>> home_of_john = Address(address="1234 Park Ave", user=john) >>> s.add(home_of_john) >>> s.commit() >>> s.refresh(john) >>> john.address.address u'1234 Park Ave' >>> john.address.user.name u'John' >>> s.close()
Relationship Update Cascades
In a relational database, referential integrity guarantees that when the primary key of a referenced object in a one-to-many
or many-to-many
relationship changes, the refering objects’ foreign keys that reference the primary key will change as well. However, for databases that do not support referential integrity, such as SQLite or MySQL with their referential integrity option turned off, changing the primary key values of a referenced object does not trigger updates of the refering objects. In this case, we can use the passive_updates
flag in relationship
or backref
to inform the database to execute extra SELECT and UPDATE statements that will update the values of the refering objects’ foreign keys.
In the following example, we construct a one-to-many
relationship between User
and Address
and not specifying the passive_updates
flag in the relationship. The database backend is SQLite.
class User(Base): __tablename__ = 'user' id = Column(Integer, primary_key=True) name = Column(String(256)) class Address(Base): __tablename__ = 'address' id = Column(Integer, primary_key=True) address = Column(String(256)) user_id = Column(Integer, ForeignKey('user.id')) user = relationship( 'User', backref=backref('addresses', uselist=True) )
Then, when we change the primary key value of a User
object, its Address
objects’ user_id
foreign key values will not change. Therefore, when you want to access an address
‘s user
object again, you will get an AttributeError
.
>>> s = session() >>> john = User(name='john') >>> home_of_john = Address(address='home', user=john) >>> office_of_john = Address(address='office', user=john) >>> s.add(home_of_john) >>> s.add(office_of_john) >>> s.commit() >>> s.refresh(john) >>> john.id 1 >>> john.id = john.id + 1 >>> s.commit() >>> s.refresh(home_of_john) >>> s.refresh(office_of_john) >>> home_of_john.user.name Traceback (most recent call last): File "", line 1, in AttributeError: 'NoneType' object has no attribute 'name' >>> s.close()
If we specify the passive_updates
flag in the Address
model, then we can change the primary key of john
and expect SQLAlchemy to issue extra SELECT and UPDATE statements to keep home_of_john.user
and office_of_john.user
up-to-date.
class User(Base): __tablename__ = 'user' id = Column(Integer, primary_key=True) name = Column(String(256)) class Address(Base): __tablename__ = 'address' id = Column(Integer, primary_key=True) address = Column(String(256)) user_id = Column(Integer, ForeignKey('user.id')) user = relationship( 'User', backref=backref('addresses', uselist=True, passive_updates=False) )
>>> s = session() >>> john = User(name='john') >>> home_of_john = Address(address='home', user=john) >>> office_of_john = Address(address='office', user=john) >>> s.add(home_of_john) >>> s.add(office_of_john) >>> s.commit() >>> s.refresh(john) >>> john.id 1 >>> john.id = john.id + 1 >>> s.commit() >>> s.refresh(home_of_john) >>> s.refresh(office_of_john) >>> home_of_john.user.name u'john' >>> s.close()
Summary
In this article, we dig a little deeper into SQLAlchemy’s association tables and the backref
keyword argument. It’s often crucial to understand the mechanism behind these two concepts to fully master complex join queries, as what will be shown in future articles.