Alembic Offline Mode with Data Migration

Lukas J. Wensby, 2019-11-24

In this article, I will go over one way of starting to use Alembic with a database already populated with data, to which we have limited access. This is sometimes the case, that we can't rely on connecting to the database for doing doing our alembic revisions, and so on. This means that we will be using Alembic's "Offline Mode", but how do we setup the revisions, if we have no access to the database? How do we ensure our data are safe? This is something I will be exploring in this article.

Disclaimer: This is my first foray into Alembic, meaning I'm probably making several mistakes along the way. For consistency, and easy replicability, I'm running this tutorial as root in a Docker Ubuntu container (Image ID: 7698f282e524).

Setup

We will first have to setup a small app with SQLAlchemy to work with as our starting point. Let's install our first set of dependencies. First we update the packages list.

/# apt-get update

Vim

I'll be using vim to easily edit my files.

/# apt-get install vim

Python and pip

In order to apt-get python, we need to add another apt repository... In order to add another apt repository, we need to install the software-properties-common package.

/# apt install software-properties-common

This provides us with useful scripts, such as add-apt-repository which we're gonna use to add the apt repository from where we can get Python.

/# add-apt-repository ppa:deadsnakes/ppa

Now, we can go ahead and install Python, pip and Python venv.

/# apt-get install python3.7 python3-pip python3.7-venv

PostgreSQL

Specific "snapshot" versions of PostgreSQL packages are available from the default Ubuntu repository, but we want to use the latest version. Therefore, we have to add the apt repository suggested by PostgreSQL:

/# apt-get install wget ca-certificates

This will give us wget, which is a tool for getting files from the internet, as well as ca-certificates, containing some common certificate authorities certificates.

/# wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -

Now we can add the repository.

/# sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" >> /etc/apt/sources.list.d/pgdg.list'

Retrieve a new list of packages from our added repositories and finally install PostgreSQL:

/# apt-get update
...
/# apt-get install postgresql postgresql-contrib

Now, let's start the postgresql service:

/# service postgresql start

And make sure that we can connect to it. In order to do this, we need to make a small adjustment to the pg_hba.conf file. Make the following adjustment:

/etc/postgresql/12/main/pg_hba.conf

- local   all             postgres                                peer
+ local   all             postgres                                trust

Note, that this is only for making things as easy for us as possible for this tutorial. In any other case, I strongly discourage trusting anyone connecting to the postgres database. After this, restart the postgresql service.

/# service postgresql restart

Now it should work to connect to the default PostgreSQL database like so:

/# psql -U postgres -d postgres
psql (12.1 (Ubuntu 12.1-1.pgdg18.04+1))
Type "help" for help.

postgres=#

Great, it works. Now let's disconnect from the database, either by typing \q or with CTRL+D.

Python App

Let's create our project directory we will be using for this tutorial.

/# mkdir project
/# cd project

Create a Python virtual environment, activate it, and install the dependencies we need.

/project# python3.7 -m venv venv
/project# source venv/bin/activate
(venv) /project# pip install SQLAlchemy==1.3.11 pyscopg2-binary==2.8.4

Now, let's create our simple app.py file with "schema" declaration:

/project/app.py

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

engine = create_engine('postgresql://postgres@/postgres', echo=True)
Base = declarative_base()

class Game(Base):
  __tablename__ = 'game'
  id = Column(Integer, primary_key=True)
  name = Column(String, nullable=False)

Let's populate the database with some data. Open up the python interpreter and do the following:

(venv) /project# python
Python 3.7.5 (default, Oct 15 2019, 22:40:07)
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from app import Base, engine, Game
>>> from sqlalchemy.orm import sessionmaker
>>> Base.metadata.create_all(engine)
2019-11-24 12:30:22,437 INFO sqlalchemy.engine.base.Engine select version()
...
2019-11-24 12:30:22,525 INFO sqlalchemy.engine.base.Engine COMMIT
>>> session = sessionmaker(bind=engine)()
>>> session.add(Game(name='Grim Fandango'))
>>> session.add(Game(name='Age of Mythology'))
>>> session.commit()
2019-11-24 12:32:47,881 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
...
2019-11-24 12:32:47,893 INFO sqlalchemy.engine.base.Engine COMMIT

Great, exit the interpreter and verify that the data is now in the database by again connect to the database and select all rows from the game table.

(venv) /project# psql -U postgres -d postgres
sql (12.1 (Ubuntu 12.1-1.pgdg18.04+1))
Type "help" for help.

postgres=# select * from game;
 id |       name
----+------------------
  1 | Grim Fandango
  2 | Age of Mythology
(2 rows)

We have now setup everything we need to start experimenting with adding Alembic and start using the migration tools.

Alembic

Our goal now is to setup Alembic so that it's starting state takes into account that we've already constructed the database and filled it with data, so that when we upgrade the database with our revisions, it's not going to do anything nasty, and will keep our data safe. We will also work as if we don't have direct access to the database, which is sometimes the case. Alembic will have to work in the dark. Obviously, we will not be able to autogenerate our migrations, since that requires a connection to the database for Alembic to compare against the SQLAlchemy table metadata in the application. Let's begin by installing Alembic:

(venv) /project# pip install alembic==1.3.1

Create the alembic environment:

(venv) /project# alembic init alembic

Adding game release dates

Let's add a new column to our game table containing the date of the game's release. Alongside adding the column in the declarative code in the Flask application, we also need to create a Alembic revision with an upgrade and downgrade function that mimics these changes.

/project/app.py

from sqlalchemy import create_engine, Column, Integer, String, Date
from sqlalchemy.ext.declarative import declarative_base

engine = create_engine('postgresql://postgres@/postgres', echo=True)
Base = declarative_base()

class Game(Base):
  __tablename__ = 'game'
  id = Column(Integer, primary_key=True)
  name = Column(String, nullable=False)
  release_date = Column(Date)

Now, create a new Alembic revision:

(venv) /project# alembic revision -m "add release date"
  Generating /project/alembic/versions/bd97e753c53e_add_release_date.py ...  done

Modify this revision:

/project/alembic/versions/bd97e753c53e_add_release_date.py

...
def upgrade():
    op.add_column('game', 
        sa.Column('release_date', sa.Date))


def downgrade():
    op.drop_column('game', 'release_date')
...

We want to check that this would generate the desired SQL. In order for Alembic to work it needs to know to generate PostgreSQL, so we would need to adjust the sqlalchemy.url in alembic.ini file like so:

/project/alembic.ini

...
sqlalchemy.url = postgresql:///irrelevant
...

Then, if we would run alembic upgrade bd97e753c53e --sql, we see that it generates what we expect:

INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Generating static SQL
INFO  [alembic.runtime.migration] Will assume transactional DDL.
BEGIN;

CREATE TABLE alembic_version (
    version_num VARCHAR(32) NOT NULL,
    CONSTRAINT alembic_version_pkc PRIMARY KEY (version_num)
);

INFO  [alembic.runtime.migration] Running upgrade  -> bd97e753c53e, add release date
-- Running upgrade  -> bd97e753c53e

ALTER TABLE game ADD COLUMN release_date DATE;

INSERT INTO alembic_version (version_num) VALUES ('bd97e753c53e');

COMMIT;

Let's run this SQL in our database to upgrade it.

(venv) /project# alembic upgrade bd97e753c53e --sql | psql -U postgres -d postgres
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Generating static SQL
INFO  [alembic.runtime.migration] Will assume transactional DDL.
BEGIN
INFO  [alembic.runtime.migration] Running upgrade  -> bd97e753c53e, add release date
CREATE TABLE
ALTER TABLE
INSERT 0 1
COMMIT

And now, let's add some release dates to our games. Open up the Python interpreter and do the following:

>>> from app import engine, Game
>>> from sqlalchemy.orm import sessionmaker
>>> from datetime import date
>>> session = sessionmaker(bind=engine)()
>>> grim_fandango = session.query(Game).filter_by(name='Grim Fandango').first()
...
>>> grim_fandango.release_date = date(1998, 10, 30)
>>> age_of_mythology = session.query(Game).filter_by(name='Age of Mythology').first()
...
>>> age_of_mythology.release_date = date(2002, 10, 30)
>>> session.commit()
...

Cool, if we now inspect our database, we see that:

postgres=# select * from game;
 id |       name       | release_date
----+------------------+--------------
  1 | Grim Fandango    | 1998-10-30
  2 | Age of Mythology | 2002-10-30
(2 rows)