Test Your Seeds
17 May 2024

When building products with Phoenix and Ecto, a common practice is to include development seeds in a file named priv/repo/seeds.exs. By default, it's expected that this file will include a list of actions to run procedurally upon loading the file. With a few changes, we can instead make a testable module, so that we will know when our refactors will break the seeds for ourselves and for others.

Łukasz Rawa @ Unsplash : https://unsplash.com/photos/brown-and-black-dried-leaves-NDro8tjU4e0
𐡷

Let me provide a simple example. Let's assume that we have an application with profiles and with orgs in its database. This application might have a Core.People context for creating, getting, and updating profiles and orgs. The create functions might look a little like this:

defmodule Core.People do
  def create_org(attrs \\ []),
    do: attrs |> new_org() |> Core.Repo.insert()

  def create_profile(attrs \\ []),
    do: attrs |> new_profile() |> Core.Repo.insert()

  # ... etc
end

For development, it's helpful to provide ourselves with seeded data, so that when we open up the application locally on our workstations we are able to do things such as logging in and testing new features without having to go through the registration flow each time we reset our local database.

These development seeds might drop into priv/repo/seeds.exs as follows.

_alice = Core.People.create_profile(name: "Alice", email: "alice@example.com")
_billy = Core.People.create_profile(name: "Billy", email: "billy@example.com")

_org = Core.People.create_org(name: "My Organization")

These seeds are inserted when running mix tasks in a terminal. These tasks are defined as aliases in mix.exs:

defmodule MyApp.MixProject do
  use Mix.Project

  # ... application, project, deps, etc.

  defp aliases,
    do: [
      # ...
      "ecto.reset": ["ecto.drop", "ecto.setup"],
      "ecto.setup": ["ecto.create", "ecto.migrate", "run priv/repo/seeds.exs"]
    ]
end

Other team members will presumably discover when the seeds have changed, and run mix ecto.reset to drop and recreate their development database.

𐡷

Refactoring code can break seeds

Now let's presume that after several months of development, the team decides that when new orgs are created, the profile of the person doing the action should be recorded, either in logs or in the database.

defmodule Core.People do
  def create_org(%Schema.Profile{} = creator, attrs \\ []),
    do: creator |> new_org(attrs) |> Core.Repo.insert()

  def create_profile(attrs \\ []),
    do: attrs |> new_profile() |> Core.Repo.insert()

  # ... etc
end

Presumably there will be unit tests for the create_org function that will now fail. Tests for controllers and live views will also fail, highlighting any callers of create_org/1 that require updating to create_org/2. After some updates throughout the codebase, all tests and linters will pass, and one may feel free to ship the changes…

Days or weeks later, someone will try to reset their development database, and discover to their chagrin that the process crashes. Nobody remembered to update the seeds file when the function declarations changed! More importantly, no test failed to highlight the need for updating the file.
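To make the failure mode concrete, here is a minimal sketch that needs no database. Demo.Profile and Demo.People are hypothetical stand-ins (not the real Core.People context) that mimic the refactored function head. The old seeds-style call still compiles, because a one-argument version of create_org still exists thanks to the default argument, and it only blows up when it is actually run:

```elixir
# Hypothetical stand-in for Schema.Profile.
defmodule Demo.Profile do
  defstruct [:name]
end

# Hypothetical stand-in for the refactored context; it returns a plain tuple
# instead of inserting into a repo.
defmodule Demo.People do
  def create_org(%Demo.Profile{} = creator, attrs \\ []),
    do: {:ok, %{creator: creator.name, name: Keyword.get(attrs, :name)}}
end

outcome =
  try do
    # The call as the seeds file still makes it: a keyword list is passed
    # where a profile struct is now expected.
    Demo.People.create_org(name: "My Organization")
  rescue
    error in FunctionClauseError -> {:raised, error.function, error.arity}
  end

IO.inspect(outcome, label: "old seeds call")
```

Nothing about this mistake shows up until the call is executed, which is why a seeds file that no test ever runs can stay broken unnoticed.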

𐡷

Elixir scripts are not loaded by default

Why did nothing fail or crash when the context changed? Elixir scripts are not automatically loaded by the compiler or the VM. priv/repo/seeds.exs is an Elixir script file, and must be manually required or loaded in order for the Erlang VM to compile and run its contents. Code.require_file/2 can be used to do this… but with procedural code in a seeds file, the act of requiring the file will execute its contents.
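As a small illustration of that last point, the sketch below writes a throwaway script to the system temp directory and requires it. The file name and the :seeds_demo application key are made up for the demo; the top-level expression in the script stands in for procedural seed code.

```elixir
# A throwaway script with top-level code, like a default seeds.exs.
path = Path.join(System.tmp_dir!(), "procedural_seeds_demo.exs")

File.write!(path, """
Application.put_env(:seeds_demo, :ran, true)
""")

# Nothing has run yet: the file merely exists on disk.
before_require = Application.get_env(:seeds_demo, :ran, false)

# Requiring the file compiles and evaluates it, side effects included.
Code.require_file(path)
after_require = Application.get_env(:seeds_demo, :ran, false)

IO.inspect({before_require, after_require}, label: "ran before/after require")
File.rm!(path)
```

With real seeds, the side effect would be a batch of database inserts, which is not something a test should trigger just by loading a file.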

Seeds as a module

Let's start by rewriting the contents of the seeds file as a module:

defmodule Seeds do
  require Logger

  def run do
    Logger.configure(level: :info)
    run_seeds()
  end

  def run_seeds do
    _alice = Core.People.create_profile(name: "Alice", email: "alice@example.com")
    _billy = Core.People.create_profile(name: "Billy", email: "billy@example.com")

    _org = Core.People.create_org(name: "My Organization")
  end
end

Now nothing is executed by just requiring the file. The Seeds module can be used by ecto.setup with a minimal change:

defmodule MyApp.MixProject do
  use Mix.Project

  # ... application, project, deps, etc.

  defp aliases,
    do: [
      # ...
      "ecto.reset": ["ecto.drop", "ecto.setup"],
      "ecto.setup": ["ecto.create", "ecto.migrate", "run --require priv/repo/seeds.exs --eval 'Seeds.run()'"]
    ]
end

Testing the seeds module

Now we can write a simple test that executes the interior bits of the seeds:

defmodule SeedsTest do
  use Test.DataCase, async: true

  test "successfully creates seed data" do
    [{seeds_module, _}] = Code.require_file("priv/repo/seeds.exs")
    seeds_module.run_seeds()

    # assert expected profiles are created
    # assert expected orgs are created
  end
end

Note that we can't refer to Seeds directly in the test, because at the time the VM loads and compiles the test file itself, our module does not exist. Code.require_file/2 returns a list: a tuple for each interior module, where the first member of the tuple is the loaded module and the second is its compiled bytecode.
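The shape of that return value can be checked in isolation. This is a minimal sketch that uses a hypothetical DemoSeeds module written to a temp file, not the real seeds file:

```elixir
# A throwaway script that only defines a module, like the rewritten seeds.exs.
path = Path.join(System.tmp_dir!(), "module_seeds_demo.exs")

File.write!(path, """
defmodule DemoSeeds do
  def run_seeds, do: :seeded
end
""")

# First require: one {module, bytecode} tuple per module defined in the file.
[{seeds_module, bytecode}] = Code.require_file(path)

# The module is only reached through the returned value, as in the test above.
result = seeds_module.run_seeds()

# Requiring the same path again does nothing and returns nil.
second = Code.require_file(path)

IO.inspect({seeds_module, is_binary(bytecode), result, second})
File.rm!(path)
```

One consequence worth keeping in mind: if more than one test requires the seeds file during the same run, the later calls return nil and the list pattern match would fail. Requiring the file from a single test, as above, avoids this. Code.unrequire_files/1 may be another option, though I have not tried it in this setup.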

𐡷

Epilogue

After initially publishing this post, a few people reached out to me with questions. Specifically, where do I put my test file, and why don't I include the Seeds module in lib (in an Elixir file, vs an Elixir script file)?

Q: Where is the test file? In the codebase from which I pulled this example, we put the test file in test/seeds_test.exs. I could also see putting this into test/priv/seeds_test.exs or test/priv/repo/seeds_test.exs. The main point is making it discoverable, but for that we use annotations in our files: Nova plugin, VSCode plugin, Neovim plugin.

Q: Why not put Seed modules in lib? In many cases I don't mind changing production code or including test or development code in packaged releases. In this case, I feel like I don't want this module to ever be accidentally run in production… it should be clear when seeing …@example.com email addresses that no one should run these functions from a release, but I can't predict what will seem obvious to myself or others in future months or years. Things that seem obvious now might not be obvious in the middle of a production incident. So rather than include the Seeds module in a file that's automatically loaded and available in any environment, I would prefer instead to ship clear, concise, and testable context functions (the create_profile and create_org functions, for example), even if those function heads are never used by workflows available from the application.

Q: What about seeds that do need to run in production? Production seeds have very different requirements from development seeds. When production seeds change, data needs to be migrated. When development seeds change, one can drop and recreate the development database. Even with an application that needs production seeds, I would prefer those code paths to be different… and consider running production seeds functions from my development seeds module.

𐡷

Attribution

  • image: Łukasz Rawa @ Unsplash: https://unsplash.com/photos/brown-and-black-dried-leaves-NDro8tjU4e0
𐡷