UUIDs: for Security OR obscurity?
UUIDs (Universally Unique Identifiers) are a common alternative to autoincrementing integers when designing database schemas for web applications.
Often I’ve seen the use of UUIDs in web applications disparaged as “security through obscurity”.
The argument is that using UUIDs to make URL endpoints unguessable is a bad security practice: it leads the developer to believe they’ve introduced a layer of security when, in reality, UUID-based URLs offer no real security value, since they can be brute forced or leaked simply by copying and pasting a link.
While I’m in total agreement that UUIDs are not a viable security mechanism, there are plenty of other use cases where UUIDs make sense and should not be disregarded.
Why use UUIDs?
Prevent the leakage of business intelligence
Simply using auto increment fields exposes a significant vector for the leakage of sensitive business information. Consider an e-commerce application which uses a URL structure such as the following for the order details page:
ecommerce-store.com/orders/{{order}}
If we expose the auto increment field to the end user, on completing an order they will be redirected to a page like:
ecommerce-store.com/orders/12345
Suppose a competitor notices this technical detail and decides to place one order at the start of business and another at the end of business. All they need to do is subtract the start-of-day auto increment value from the end-of-day one to estimate how many orders have been placed in a typical business day.
For example:
# Start of day:
ecommerce-store.com/orders/12345
...
# End of day:
ecommerce-store.com/orders/12445
... would imply 100 orders placed on this particular day.
Using a random (version 4) UUID here prevents the leakage, as there’s no way to determine how many UUIDs were issued between two given values:
# Start of day:
ecommerce-store.com/orders/dcdb78cc-e33e-4f4f-aad2-27160acd97c5
...
# End of day:
ecommerce-store.com/orders/3e027584-9e3d-4a6e-ae58-ed11ac74b716
...🤔
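The difference between the two schemes can be sketched in a few lines of Python. This is only an illustration: the `order_url` helper is a made-up name, and it assumes version 4 (random) UUIDs from the standard library `uuid` module, which is what the example values above look like.

```python
import uuid


def order_url(order_id: object) -> str:
    """Build the order details URL used in the examples above (hypothetical helper)."""
    return f"ecommerce-store.com/orders/{order_id}"


# Sequential IDs: subtracting two observed values reveals the order volume.
start_of_day, end_of_day = 12345, 12445
orders_leaked = end_of_day - start_of_day

# Version 4 UUIDs are built from random bits, so two observed values carry
# no information about how many identifiers were issued in between.
first_order = uuid.uuid4()
last_order = uuid.uuid4()

print(order_url(start_of_day), order_url(end_of_day), orders_leaked)
print(order_url(first_order))
print(order_url(last_order))
```

Note that this only holds for random UUIDs. Version 1 UUIDs embed a timestamp, so they are not a good fit for this purpose.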
Allow deferred creation of entities
When using auto increment IDs we cannot reference an entity until it has been inserted into the database. We need the ID to identify the entity, and that ID can only be determined by inserting a record. With a UUID, we can generate the identifier up front and use it to refer to the new entity before the record exists in the database. This is particularly relevant in high-load scenarios (e.g. analytics events, audit logs) where we might want to defer inserting database records to a background task.
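A minimal sketch of deferred creation, assuming an in-process `queue.Queue` as a stand-in for a real job queue and an in-memory SQLite database. The `events` table and the `record_event` and `flush` functions are hypothetical names chosen for illustration.

```python
import queue
import sqlite3
import uuid

# Stand-in for a real background job queue (assumption for this sketch).
pending: "queue.Queue[tuple[str, str]]" = queue.Queue()


def record_event(name: str) -> str:
    """Assign an ID immediately and defer the database insert."""
    event_id = str(uuid.uuid4())
    pending.put((event_id, name))
    return event_id  # usable right away, before any row exists


def flush(conn: sqlite3.Connection) -> int:
    """Background task: drain the queue and insert the rows in one batch."""
    rows = []
    while not pending.empty():
        rows.append(pending.get())
    conn.executemany("INSERT INTO events (id, name) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, name TEXT NOT NULL)")

event_id = record_event("page_view")
# event_id can already be logged or returned to the caller at this point.
inserted = flush(conn)
stored = conn.execute(
    "SELECT name FROM events WHERE id = ?", (event_id,)
).fetchone()
```

The caller gets a stable identifier back without waiting on the database, and the insert can happen whenever the background task next runs.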
Distributed systems (e.g. microservices, API/RPC invocations)
UUIDs allow us to uniquely refer to entities inside distributed systems without coordination. Consider an e-commerce system made up of a set of microservices communicating over a network. If more than one of the microservices is capable of instantiating a particular entity (for example: an "order", which can be created by the "mobile app API" microservice or the "mail order" microservice) it becomes very difficult to coordinate object creation using regular auto increment IDs, because every service would have to agree on the next value.
By using UUIDs, multiple systems can create entities independently with a negligible risk of ID collisions.
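To make this concrete, here is a small sketch in which two services create orders independently and the results are merged afterwards. The `create_order` function and the service names are hypothetical, and both "services" run in one process purely for illustration. Version 4 UUIDs carry 122 random bits, which is why a collision is so unlikely in practice.

```python
import uuid


def create_order(source: str) -> dict:
    """Create an order locally; no shared counter or central ID service is consulted."""
    return {"id": str(uuid.uuid4()), "source": source}


# Two hypothetical services creating orders independently of each other.
mobile_orders = [create_order("mobile-app-api") for _ in range(1000)]
mail_orders = [create_order("mail-order") for _ in range(1000)]

# Merge into a shared store keyed by ID; no entry overwrites another.
merged = {order["id"]: order for order in mobile_orders + mail_orders}
print(len(merged))
```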
Uniquely identify entities across the entire schema
This is particularly relevant for developers.
Using UUIDs allows us to determine what kind of entity a key refers to (user, order, customer, etc.) by looking up the UUID across multiple tables. This is not possible with auto increment IDs, as the same integer values are reused in every table.
This is very useful in log files and in caught exceptions, as (depending on language and tooling) it can sometimes be difficult to identify the specific entity type which has triggered a log or exception message.
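A lookup like this could be sketched as follows, assuming an in-memory SQLite database with one table per entity type. The table names and the `find_entity_type` helper are made up for the example.

```python
import sqlite3
import uuid
from typing import Optional

# Fixed allow-list of tables to search (hypothetical schema).
TABLES = ("users", "orders", "customers")


def find_entity_type(conn: sqlite3.Connection, entity_id: str) -> Optional[str]:
    """Return the name of the table containing entity_id, or None if not found."""
    for table in TABLES:
        # The table name comes from the fixed tuple above, never from user input.
        row = conn.execute(
            f"SELECT 1 FROM {table} WHERE id = ?", (entity_id,)
        ).fetchone()
        if row is not None:
            return table
    return None


conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} (id TEXT PRIMARY KEY)")

order_id = str(uuid.uuid4())
conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))

print(find_entity_type(conn, order_id))
```

Given only an identifier copied out of a log line, the helper reports which table it belongs to. With integer keys the same query would match a row in every table.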
Database performance concerns
With that said, there are costs to using UUIDs everywhere. UUIDs are larger than integer keys, and often result in larger, slower indexes. My usual approach here is to use both UUIDs and auto increment IDs. The UUID is used in all user-facing routes and API endpoints. Alongside it, each table keeps a normal auto increment key, which is never exposed to the user but is used for indexes and foreign keys.
That is: auto increment IDs for performance, UUIDs for all the other reasons detailed above.
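One way this hybrid layout might look, sketched with SQLite from Python. The table and column names (`orders`, `order_items`, `public_id`) and the helper functions are assumptions made for the example, not a prescribed schema.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal only: joins and foreign keys
    public_id TEXT NOT NULL UNIQUE         -- UUID exposed in URLs and API responses
);
CREATE TABLE order_items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    sku TEXT NOT NULL
);
""")


def create_order(skus: list) -> str:
    """Insert an order and its items; return only the public UUID."""
    public_id = str(uuid.uuid4())
    cur = conn.execute("INSERT INTO orders (public_id) VALUES (?)", (public_id,))
    conn.executemany(
        "INSERT INTO order_items (order_id, sku) VALUES (?, ?)",
        [(cur.lastrowid, sku) for sku in skus],
    )
    conn.commit()
    return public_id


def items_for(public_id: str) -> list:
    """Resolve the public UUID once, then join on the integer key."""
    rows = conn.execute(
        "SELECT i.sku FROM order_items i "
        "JOIN orders o ON o.id = i.order_id "
        "WHERE o.public_id = ? ORDER BY i.id",
        (public_id,),
    ).fetchall()
    return [sku for (sku,) in rows]


public_id = create_order(["A-1", "B-2"])
print(items_for(public_id))
```

In this sketch the UUID column still carries one unique index so that incoming requests can be resolved, but every foreign key and join uses the compact integer key.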