One day you wake up, you grab your laptop, you open your email, you see a strange alert, and you open up your exception tracking service. There you see the following text.
ActiveRecord::StatementInvalid: PG::NumericValueOutOfRange:
ERROR: integer out of range
You then close your laptop and climb back into bed. If anyone asks later, you never saw that message. What even is computer?
By default when Rails migrations create id columns in Postgres, it uses the serial primary field
native type, an integer
. When reference columns are specified, Rails uses an integer
. The range of numbers covered by the integer numeric type in Postgres is:
-2147483648 to +2147483647
The range of numbers for bigints is:
-9223372036854775808 to 9223372036854775807
For most datasets, integers are more than enough for the id space of most tables. In the lifetime of a company, the upper bound of integers might never be reached in any database. Unfortunately (or fortunately), we have run into this problem several times now. The first was caused by a bug, where data synchronized between two of our applications unintentionally deleted and recreated the data on a daily basis. A moderate-sized dataset of a few hundred million rows had overflowed the integer space of its ids. Whoops!
This turned out to be a very insidious and unexpected bug, which was particularly interesting to me because of where we have not run into integer overflows in the past. We have services with more than three billion records in a single table space. In those applications we have sharded the data heavily, using thousands of Postgres schemas as logical shards across a set of database zones. Because of the nature of that data, however, we were able to generate unique identifiers for each row based on the data in the row, in the form of base62 encoded strings. Doing so allowed us to shard data without having to worry about unique identifier generation—as a side effect, it completely obviated the possibility of integer overflow errors.
The second case where we have had to migrate from integers to bigints was another scalability project. In order to scale writes to one of our internal Rails applications, we decided to split the entire database into multiple shards. The upside was that this was quite easy to do using the Multidb gem; Multidb does not natively do everything we need, but the code is extremely simple, readable, and the missing bits were easy for us to fill in ourselves. The downside was that in order to do this sharding, we needed to generate unique ids across shards.