4 reasons you should not have logic in your database (cascade delete, foreign key constraints, triggers, etc)
This post is going to anger database admins everywhere, however I'm going to make a case for avoiding the following in databases:
- ON DELETE CASCADE
- Foreign key constraints
- Stored procedures
The reason for this is that they describe business logic. Consider the following uses for these database tools:
- When a record is updated create an audit trail (using triggers)
- Before adding a record to the comments table, check there is a corresponding user (foreign keys)
- When deleting a user, also delete all their addresses (ON DELETE CASCADE)
These are all business logic and do not belong in the database. From the point of view of a software developer, having these rules enforced in the database rather than the application has some fairly serious practical implications.
1. Business logic changes
When you built the system, you may not have allowed people to post without having an account. Your client may decide to allow guests to post, in which case your database now needs to accept
null in the
userId column. Making these kind of changes to the database can be slow and time consuming. For example when MySQL sees an ALTER TABLE command it essentially copies the table, makes the change then replaces the original table with the amended one. On large tables this is very slow and depending on the database server/table storage engine in use can lock the table so that it's unavailable until the indexing is complete. Blocking access to the table on a live website, potentially breaking some of the website's functionality.
These kind of changes where foreign keys must be removed happen more frequently than people would like to admit:
- Data protection law changes now means users should be able to delete their personal information (account details/shipping addresses). However, for our stock taking records we still need to keep the details of what was ordered and how much was paid.
- We are not longer just offering physical products but digital products as well, users no longer need to provide a shipping address
The same is true of
ON DELETE CASCADE, stored procedures and triggers. The underlying logic will change and you'll have to make changes to the database to accommodate new business requirements.
2. It makes deployment and version control much more difficult
When you do make a change to the database logic (e.g. adding or removing a foreign key), you have to make the change on your development server and live server. There are methods for managing deployment of database changes but doing so is significantly harder than using git to push application code changes to the production server. The more logic is in your database, the more difficult deployment of changes becomes. You can't just overwrite the source code or executable, you also have to deploy the database updates.
And if there is a mistake in the logic, if that logic is in your application code you can easily use git to revert to an earlier commit. For your database, the process is considerably more difficult and less convenient.
3. Unit testing becomes near impossible
For testing purposes it's good practice to substitute the database with mock objects. That way you can test the application logic without requiring a real database.
However, if there is any application logic in the database (including check there is a corresponding record in another table) then unless you replicate that logic in the tests (which itself is a bad thing-the logic then needs to be expressed in two different places and changed twice if the business rules change) you cannot be sure that a passing test means the application will work when a real database is present.
By placing logic in the database, that logic can only be tested if the database is running and used in the tests, this makes it impossible to perform isolated unit tests. When a test fails, is it due to a problem with the logic in the code or a problem with the logic in the database?
4. You cannot easily re-use your application with a different data source
If you have no logic in the database and your database is a dumb data store, with an ORM implementation, the database can be replaced entirely with a simple
json_decode, a web service or a completely different database implementation (moving from MySQL to MongoDB for example). Swap out the ORM implementation and you can use any data source you like, as long as there's no logic in the database.
With business logic in the database, moving the application from one data source to another means re-writing the logic for each data source. And not all databases support the same features, so that might be tricky. Or impossible if, for example, you want to replace a MySQL database with a JSON file.
By removing logic from the database you are able to run the same code in different locations with different data sources. Got a small blogging website? You can run it all from simple JSON files. Or use the exact same code for a national newspaper's website with a MySQL backend.
If all the business logic exists in the application, rather than the database, you can easily swap out the database implementation without having to rewrite the logic.
A note on indexes
To be clear, I am not telling you to avoid indexes, which are incredibly important for performance reasons. However, an index is different to a foreign key constraint as it does not affect the result of a query in any way but speed. Whether the index is there or not, the application will not see any difference, the same is not true with foreign key constraints, sometimes they will cause
INSERT queries to fail.
You should still add indexes to any column that's used in a
ORDER BY or
GROUP BY clause.
Any business logic that you might consider placing in the database can be expressed just as easily in the application code. By placing the business logic where it belongs in the application code, you can thoroughly test your application without needing a real database, deployment becomes significantly easier and by decoupling the logic from the database, it's easy to swap out the data storage mechanism while keeping the same business logic
Although database admins will cringe at my suggestions, as a software developers we need to look into more than just the database and consider the practical implications of all our choices. I want a database to store data, nothing else. If I can replace
$db->query('INSERT INTO user (...)') with
$users = $user; file_put_contents('users.json', json_encode($users)); then I consider your database well designed.
If you're interested in an ORM that supports interchangeable data sources in this way, take a look at Maphper.