Migrating Databases with Ecto
25 Nov 2022
On a new project I'm working to migrate data between Azure SQL (aka MSSQL) and PostgreSQL, and to migrate a business's software from Visual Basic and C# to Elixir. In the first few weeks of development I've discovered some features of Ecto that I was previously unaware of. Combining these features will allow us to ship a resilient, well-tested application while constraining the possibility of accidentally altering data in production.
defmodule Test.Integration.CrossRepoTransactions do
  use Test.DataCase, async: true

  test "sharing database connections" do
    {:ok, alice} =
      Test.Fixtures.azure_person(:alice)
      |> Azure.Person.changeset()
      |> Azure.WriteRepo.insert()

    assert %Azure.Person{} = Azure.Repo.get(Azure.Person, alice.id)
  end
end
Caveats
I in no way wish to imply that Visual Basic or C# codebases cannot be resilient or well tested, nor that PostgreSQL and Elixir are inherently better than Azure SQL and the dotnet ecosystem. The existing codebases for this company, however, have no tests, and the team looking to extend the product's functionality does not have the technical expertise to be comfortable developing (or operating) the existing stack.
In this case, with this team, a re-write and a data migration made the most sense.
Ecto and Azure SQL
Upon first being introduced to the project, I did a quick search of hex.pm for an Ecto adapter for MSSQL, finding several libraries that had not been updated in years, with download counts implying minimal usage. Uh oh, I thought. This is going to be difficult.
Then my coworker mentioned that an adapter ships with Ecto itself. Lo and behold, the ecto_sql package includes Ecto.Adapters.Tds, which can be used to connect to Microsoft SQL Server or Azure SQL. This requires the addition of one more library, tds.
defmodule MyApp.MixProject do
  use Mix.Project

  # ...

  defp deps do
    [
      # ... other dependencies
      {:ecto, "~> 3.9"},
      {:ecto_sql, "~> 3.9"},
      {:tds, "~> 2.3"}
    ]
  end
end
Leaving our primary PostgreSQL repo alone, we'll add a second repo at lib/azure/repo.ex:
defmodule Azure.Repo do
  @moduledoc "An `Ecto.Repo` connecting to Azure SQL"

  use Ecto.Repo,
    adapter: Ecto.Adapters.Tds,
    otp_app: :my_app
end
This requires adding some configuration to a few files. Note that we get the password from the environment; this will compile the password into the application, but since that only happens in development and test I don't worry too much about it. For production, we do not set the password in config/prod.exs, but instead pull in a database URL in config/runtime.exs, to ensure that no secrets are compiled into our production application.
# config/config.exs
config :my_app, ecto_repos: [Core.Repo, Azure.Repo]
config :my_app, Azure.Repo, priv: "priv/azure_repo"
# config/dev.exs, config/test.exs
config :my_app, Azure.Repo,
database: "my_app_#{config_env()}",
hostname: "localhost",
password: System.get_env("MSSQL_SA_PASSWORD"),
pool_size: 10,
port: 1433,
show_sensitive_data_on_connection_error: true,
stacktrace: true,
username: "sa"
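The production counterpart in config/runtime.exs might look something like this sketch (the AZURE_DATABASE_URL variable name is my assumption, not from the original project):

```elixir
# config/runtime.exs
import Config

if config_env() == :prod do
  # The full connection URL, including credentials, is read from the
  # environment at boot time, so no secrets are compiled into the release.
  config :my_app, Azure.Repo,
    url: System.fetch_env!("AZURE_DATABASE_URL"),
    pool_size: 10
end
```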
The :ecto_repos configuration determines how mix tasks will behave. mix ecto.gen.migration will create migrations in two directories, priv/repo/migrations and priv/azure_repo/migrations. mix ecto.migrate and mix ecto.rollback will run against both directories unless you specify -r Core.Repo. Depending on how you run production migrations, you might want to think about which repos are configured. We do not use (or even have) mix in production environments, as we ship Elixir releases; we hard-code specific repos in our release module.
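As a sketch of what such a release module might look like (the Core.Release module name and its contents are my assumption, following the standard pattern from the Phoenix deployment guides, not code from this project):

```elixir
defmodule Core.Release do
  @moduledoc "Migration tasks for use from releases, where mix is unavailable."

  @app :my_app

  # Hard-code the repos to migrate: Azure.Repo is read-only in
  # production, so only the PostgreSQL repo appears here.
  defp repos, do: [Core.Repo]

  def migrate do
    Application.load(@app)

    for repo <- repos() do
      {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :up, all: true))
    end
  end
end
```

From a deployed release, this would be invoked with something like `bin/my_app eval "Core.Release.migrate()"`.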
We also have to start our new repo in lib/core/application.ex:
defmodule Core.Application do
  @moduledoc false

  use Application

  @impl true
  def start(_type, _args) do
    children = [
      Web.Telemetry,
      Core.Repo,
      Azure.Repo,
      {Phoenix.PubSub, name: Core.PubSub},
      {Finch, name: Core.Finch},
      Web.Endpoint
    ]

    opts = [strategy: :one_for_one, name: Core.Supervisor]
    Supervisor.start_link(children, opts)
  end

  # ...
end
But what about running the database itself during our development workflow? Happily, Microsoft provides a docker image that runs on both amd64 and aarch64, allowing us to run it on our M1 Macs as well as in GitHub Actions. The README for the docker image suggests that it includes a command line tool, sqlcmd, but that tool only compiles on amd64, and thus is not available in the image version that we use on our Macs.
docker run --rm \
--cap-add SYS_PTRACE \
-e 'ACCEPT_EULA=1' \
-e "MSSQL_SA_PASSWORD=${MSSQL_SA_PASSWORD}" \
-p 1433:1433 \
-v $PWD/priv/mssql/data:/var/opt/mssql/data \
-v $PWD/priv/mssql/log:/var/opt/mssql/log \
--name azuresqledge \
--detach \
"mcr.microsoft.com/azure-sql-edge"
name: Test & Audit

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

env:
  ELIXIR_VERSION: "1.14.2"
  OTP_VERSION: "25.1.2"
  MSSQL_SA_PASSWORD: "Super-secure-password"

jobs:
  test:
    name: Test
    needs: build_test
    runs-on: ubuntu-20.04
    env:
      MIX_ENV: test
    services:
      postgres:
        image: postgres:15.1-alpine
        env:
          POSTGRES_PASSWORD: postgres
          POSTGRES_USER: postgres
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432
      mssql:
        image: mcr.microsoft.com/azure-sql-edge
        env:
          ACCEPT_EULA: 1
          MSSQL_SA_PASSWORD: ${{ env.MSSQL_SA_PASSWORD }}
        options: >-
          --cap-add SYS_PTRACE
        ports:
          - 1433:1433
    steps:
      - uses: actions/checkout@v3
      - name: Set up Elixir
        uses: erlef/setup-beam@v1
        with:
          elixir-version: ${{ env.ELIXIR_VERSION }}
          otp-version: ${{ env.OTP_VERSION }}
          version-type: "strict"
      - name: Cache deps
        uses: actions/cache@v3
        with:
          path: deps
          key: ${{ runner.os }}-test-deps-v1-${{ hashFiles('**/mix.lock', '.tool-versions') }}
      - name: Cache _build
        uses: actions/cache@v3
        with:
          path: _build
          key: ${{ runner.os }}-test-build-v1-${{ hashFiles('**/mix.lock', '.tool-versions') }}
      - name: Run tests
        run: mix test --color --include external:true
        working-directory: .
Outside of docker, we can install a new-ish (at the time of writing) Golang port of sqlcmd. It behaves slightly differently from the original version of the tool, but allows us to specify the password with $SQLCMDPASSWORD.
brew install sqlcmd
SQLCMDPASSWORD="${MSSQL_SA_PASSWORD}" sqlcmd -U sa
1> SELECT 1;
2> GO
-----------
1
(1 row affected)
Because we only start this container for this project, and we use the same environment variable everywhere we connect to the database, we commit the password into .envrc, which is loaded by the direnv tool that we use in development.
Read-only Repo
<aside>
I prefer iterative development plans. Where possible, I especially like development plans where deadlines can be fuzzed and put into the hands of the key project stakeholders. Why say that we will go live on a specific date, when experience tells us that unforeseen complications will invariably arise as a part of development? Data will turn out to be different than expected. External integrations will be more complicated and error prone than described in documentation (if documentation exists at all).
Instead of go-live deadlines, I prefer to commit to making things available to stakeholders as soon as possible. Starting a new project? Set up a staging and production deployment on day one. Can't set it up on day one? Try to deploy on day two. There's nothing in the application to see? Who cares. Deploy it. Set up continual deployment to the staging environment from CI, even if a manual step is used to deploy production.
Allow the stakeholders direct access to the new app or the new features, possibly through feature flags that can be enabled only for them. In the case of a new application, this can be as simple as deploying it on a subdomain that is not linked anywhere, with authentication only available to specific users. This might make authentication flows a higher priority to implement earlier.
When migrating to a new tech stack, this can allow you to run both applications in parallel: the people doing the work can use both workflows, and tell you when the new application is ready to replace the old one. Until then, they can use the old application for their primary work.
</aside>
Our new application will start by taking over only the very end of the business's workflows, consuming data from Azure SQL but leaving alone the processes that update data. After the key stakeholders decide that the new application is complete enough to use instead of their old software, we can decommission one of their old applications, and begin working up the chain of business processes.
This means that, at least in the near term, we would prefer that our connection to Azure SQL in production be read-only. In development and tests, we'll need to write data into our local database, but if we ever accidentally ship code into production that alters data in the old database, I would like an exception to be raised.
Ecto gives this to us with the read_only flag when defining the repo.
defmodule Azure.Repo do
  @moduledoc "A read-only `Ecto.Repo` connecting to Azure SQL"

  use Ecto.Repo,
    adapter: Ecto.Adapters.Tds,
    otp_app: :my_app,
    read_only: true
end
By default, when a module uses Ecto.Repo, it defines not only read callbacks such as all, get, and one, but also write callbacks such as delete, insert, and update. When a repo specifies read_only: true, only the callbacks that do not alter data are added. One cannot accidentally alter data via changesets, because the functions to do so are not present.
iex(1)> Azure.Person.changeset(%{}) |> Azure.Repo.insert()
** (UndefinedFunctionError) function Azure.Repo.insert/1 is undefined or private
(my_app 0.1.0) Azure.Repo.insert(#Ecto.Changeset<action: nil, changes: %{}, errors: [], data: #Azure.Person<>, valid?: true>)
iex:1: (file)
If someone really wanted to cause a problem, they could do so with Azure.Repo.query/1, so we take the additional precaution of only providing read-only credentials to the staging and production environments. These days, I might ship something using query to quickly experiment with data when I can't figure out how to shape the query using Ecto.Query macros, but I prefer to eventually refactor towards using the macros, and I do not alter data using raw queries.
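As a hypothetical illustration of that refactor (the table and column names here are invented for the example, not from the real schema):

```elixir
import Ecto.Query

# Quick experiment: a raw, read-only query via Azure.Repo.query/1.
{:ok, %{rows: rows}} = Azure.Repo.query("SELECT id, name FROM people WHERE active = 1")

# The same read, reshaped with Ecto.Query macros, which is where we
# prefer to end up.
people =
  from(p in "people",
    where: p.active == 1,
    select: %{id: p.id, name: p.name}
  )
  |> Azure.Repo.all()
```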
But because migrations are recorded in the schema_migrations table using the repo's insert function, we can no longer run migrations for our Azure repo.
# config/config.exs
config :my_app, ecto_repos: [Core.Repo]
Seeds and Development
Now that our application code is only able to read from the database, we run into a new problem: how can I write seed data into the database for development?
Here is where I really appreciate that Ecto allows you to define multiple repo modules.
defmodule Azure.WriteRepo do
  @moduledoc "A read-write `Ecto.Repo` connecting to Azure SQL. Only started in dev/test."

  use Ecto.Repo,
    adapter: Ecto.Adapters.Tds,
    otp_app: :my_app
end
We can bring back the ability to run migrations, optionally specifying the directory in which our migrations reside. As we develop more parts of our data migration code, we'll dump the schemas of more and more tables into migrations, to ensure that our development and test databases run against a matching database structure.
# config/config.exs
config :my_app, ecto_repos: [Core.Repo, Azure.WriteRepo]
config :my_app, Azure.WriteRepo, priv: "priv/azure_repo"
# config/dev.exs, config/test.exs
for repo <- [Azure.Repo, Azure.WriteRepo] do
config :my_app, repo,
database: "my_app_#{config_env()}",
hostname: "localhost",
password: System.get_env("MSSQL_SA_PASSWORD"),
pool_size: 10,
port: 1433,
show_sensitive_data_on_connection_error: true,
stacktrace: true,
username: "sa"
end
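With the write repo available in dev, a seed script can reuse the same fixture helpers as the tests. The priv/azure_repo/seeds.exs path is my assumption; the rest reuses names from the examples above:

```elixir
# priv/azure_repo/seeds.exs
# Run with: mix run priv/azure_repo/seeds.exs
# Hypothetical seed script; it writes through Azure.WriteRepo, which
# production code never starts.
for name <- [:alice, :bob] do
  name
  |> Test.Fixtures.azure_person()
  |> Azure.Person.changeset()
  |> Azure.WriteRepo.insert!()
end
```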
We can ensure that only the read-only repo is available in production by conditionally starting it in our application.
defmodule Core.Application do
  @moduledoc false

  use Application

  @impl true
  def start(_type, _args) do
    children =
      [
        Web.Telemetry,
        ecto_repos(),
        {Phoenix.PubSub, name: Core.PubSub},
        {Finch, name: Core.Finch},
        Web.Endpoint
      ]
      |> List.flatten()

    opts = [strategy: :one_for_one, name: Core.Supervisor]
    Supervisor.start_link(children, opts)
  end

  defp ecto_repos, do: Application.fetch_env!(:my_app, :started_ecto_repos)

  # ...
end
# config/dev.exs, config/test.exs
config :my_app, started_ecto_repos: [Core.Repo, Azure.Repo, Azure.WriteRepo]
# config/prod.exs
config :my_app, started_ecto_repos: [Core.Repo, Azure.Repo]
We still have the read-only database credentials as a final protection, but if someone were to ship code that tries to use the read/write repo in production, Ecto will raise with a clear error signaling that the repo has not been started.
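For illustration, attempting to use the unstarted write repo fails loudly. In recent Ecto versions the error reads something like the following (the exact wording may differ between versions):

```elixir
iex(1)> Azure.WriteRepo.insert(changeset)
** (RuntimeError) could not lookup Ecto repo Azure.WriteRepo because it was not started or it does not exist
```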
Async Tests and the SQL Sandbox
Now that we have two separate repos, one for read-only production access, and one for seeding data in development and tests, we have a new problem: how do we write async tests?
Ecto provides us with the Ecto.Adapters.SQL.Sandbox, which wraps each ExUnit test in a database transaction, providing data isolation for each test. At the beginning of each test, a transaction is opened. When queries are made from any process that has been allowed to access a specific connection checkout, the sandbox ensures that the process receives the same connection, and thus is in the same transaction.
# test/support/data_case.ex
def setup_sandbox(tags) do
  pid = Ecto.Adapters.SQL.Sandbox.start_owner!(Core.Repo, shared: not tags[:async])

  on_exit(fn ->
    Ecto.Adapters.SQL.Sandbox.stop_owner(pid)
  end)
end
Now that we have two repos, each with its own connection pool, and production code that uses only the read-only repo, we have a new problem: if we write into the test database via the writeable repo, transaction isolation means the read-only repo will not be able to see any of the records.
Let's take a look at the internals of how Ecto.Adapters.SQL.Sandbox checks out a database connection.
def checkout(repo, opts \\ []) when is_atom(repo) or is_pid(repo) do
  %{pid: pool, opts: pool_opts} = lookup_meta!(repo)

  pool_opts =
    if Keyword.get(opts, :sandbox, true) do
      [
        post_checkout: &post_checkout(&1, &2, opts),
        pre_checkin: &pre_checkin(&1, &2, &3, opts)
      ] ++ pool_opts
    else
      pool_opts
    end

  pool_opts_overrides = Keyword.take(opts, [:ownership_timeout, :isolation_level])
  pool_opts = Keyword.merge(pool_opts, pool_opts_overrides)

  case DBConnection.Ownership.ownership_checkout(pool, pool_opts) do
    :ok ->
      if isolation = opts[:isolation] do
        set_transaction_isolation_level(repo, isolation)
      end

      :ok

    other ->
      other
  end
end
The post_checkout and pre_checkin callbacks call through to the connection modules, which in turn call through to the database to open or cancel a transaction. For PostgreSQL, this uses Postgrex to issue a BEGIN statement and a ROLLBACK statement. For Azure SQL, this uses Tds to issue TM_BEGIN_XACT and TM_ROLLBACK_XACT requests. Nice to know, but not helpful for our purposes.
DBConnection.Ownership.ownership_checkout/2 is interesting, and offers some intriguing possibilities when one follows the code trail deeper into the library. This is where I went next, but I gave up after about a half hour of fiddling with different settings.
At this point, in the interest of shipping something, I switched some tests to async: false, with some manual resetting of tables.
# test/support/data_case.ex
def setup_sandbox(tags) do
  pid = Ecto.Adapters.SQL.Sandbox.start_owner!(Core.Repo, shared: not tags[:async])

  on_exit(fn ->
    Ecto.Adapters.SQL.Sandbox.stop_owner(pid)
  end)

  if tags[:azure_repo] do
    ro_pid = Ecto.Adapters.SQL.Sandbox.start_owner!(Azure.Repo, shared: true)
    rw_pid = Ecto.Adapters.SQL.Sandbox.start_owner!(Azure.WriteRepo, shared: true)

    on_exit(fn ->
      ~w[my azure tables]
      |> Enum.each(fn table_name -> Azure.WriteRepo.query!("TRUNCATE TABLE #{table_name}") end)

      Ecto.Adapters.SQL.Sandbox.stop_owner(ro_pid)
      Ecto.Adapters.SQL.Sandbox.stop_owner(rw_pid)
    end)
  end
end
Not the happiest of outcomes, but acceptable in the short term to ship some code.
Enter Dynamic Repos
Taking another look at the sandbox, I noticed that the pool itself is found via a private function, lookup_meta!/1. What is that?
defp lookup_meta!(repo) do
  %{opts: opts} =
    meta =
    repo
    |> find_repo()
    |> Ecto.Adapter.lookup_meta()

  if opts[:pool] != DBConnection.Ownership do
    raise """
    cannot invoke sandbox operation with pool #{inspect(opts[:pool])}.
    To use the SQL Sandbox, configure your repository pool as:

        pool: #{inspect(__MODULE__)}
    """
  end

  meta
end

defp find_repo(repo) when is_atom(repo), do: repo.get_dynamic_repo()
defp find_repo(repo), do: repo
Finding the repo uses get_dynamic_repo/0, the documentation of which reads:
@callback get_dynamic_repo() :: atom() | pid()
Returns the atom name or pid of the current repository.
See put_dynamic_repo/1 for more information.
The documentation for put_dynamic_repo/1 includes the following:
Sometimes you may want a single Ecto repository to talk to many different database instances.
The included examples imply that this can be used for multi-tenant applications, where each tenant resides in a different database. I can see how this could also be used to shard data across multiple databases, or across N PostgreSQL schemas sharded across M database servers.
But if it can be used to share a single repo module across multiple connection pools, could it be used to share a single connection pool across multiple Ecto repos? If so, and if the SQL sandbox uses only the pool and the current pid to determine how to reuse connections in tests, then maybe we can use both repositories across a single connection.
# test/support/data_case.ex
def setup_sandbox(tags) do
  Azure.WriteRepo.put_dynamic_repo(Azure.Repo)

  pg_pid = Ecto.Adapters.SQL.Sandbox.start_owner!(Core.Repo, shared: not tags[:async])
  ro_pid = Ecto.Adapters.SQL.Sandbox.start_owner!(Azure.Repo, shared: not tags[:async])
  rw_pid = Ecto.Adapters.SQL.Sandbox.start_owner!(Azure.WriteRepo, shared: not tags[:async])

  on_exit(fn ->
    Ecto.Adapters.SQL.Sandbox.stop_owner(pg_pid)
    Ecto.Adapters.SQL.Sandbox.stop_owner(ro_pid)
    Ecto.Adapters.SQL.Sandbox.stop_owner(rw_pid)
  end)
end
defmodule Test.Integration.CrossRepoTransactions do
  use Test.DataCase, async: true

  test "sharing database connections" do
    {:ok, alice} =
      Test.Fixtures.azure_person(:alice)
      |> Azure.Person.changeset()
      |> Azure.WriteRepo.insert()

    assert %Azure.Person{} = Azure.Repo.get(Azure.Person, alice.id)
  end
end
Voilà! The tests pass.
Deadlocked tests
Note that the documentation for the Ecto.Adapters.SQL.Sandbox includes a disclaimer that, between MySQL and PostgreSQL, only the latter supports concurrent tests; the MySQL transaction implementation is more prone to deadlocks.
So far, we've found a similar situation with Azure SQL, at least in our tests. While the above code allowed us to make most of our tests async, we had to keep some of them async: false to avoid deadlocks. In the future, hopefully we can discover the reason for the deadlocks and make all of our tests async. Even if we had to run all tests interacting with Azure SQL synchronously, however, this test setup is simpler and easier to understand.
Postscript
In real code we don't name things Azure, but rather name entities based on their function or purpose. Azure.Repo is used in these examples to make the source of the data clear.