Why I prefer PostgreSQL to MySQL
_This post is more a rant against MySQL (InnoDB) rather than a technical tutorial (as I was used to write in the previous months).
Everything started with this tweet.
I'm not going to talk that much about exotic features in a multi server configuration in my comparison,
I will try to focus on inconsistencies that might affect relatively simple every-d
When I approached web development was the 2001 (I was 18). In reality I even did not know really well what "software development" means or where I will end up.
I knew MySQL just becase my very first personal website was built using it as database.
A year later I started to work in a web agency, the database they were using was PostgreSQL (at that time they were running some PostgreSQL 7.x version).
For me there were no differences between PostgreSQL and MySQL, the only difference I was noticing was that I have to use
pg_connect instead of
mysql_connect and the admin panel was "phppgadmin" instead of "phpmyadmin".
Fast forward, to nowadays. I've learned a lot and there is still a lot to learn.
I fall in love with PostgreSQL and for its consistency, but really often I have to work on projects that use MySQL and
the experience is not that nice.
I've tried to analyze the "why people start projects by using mysql instead of postgres?"
My conclusion is just one. Back in the time when I started development (~2001) PostgreSQL was not available on Windows
(Mac and Linux based SO were not yet that good). MySQL was easy to install and run.
So generations of developers learned to use MySQL, not as "a" database, but as "the" database.
In 2005 postgres 8.0 was running natively on windows, but only 2-3 years later become good enough.
In the last two decades I've built and/or maintained a variety of applications, working intensively with both databases.
Each time I had to deal with Postgres was always "ordinary work", I was able to focus on the application logic I was
developing. On the other hand, most of the time I had to work with MySQL
I had always to pay attention to quirks and edge-cases.
MySQL is a great database that contributed to the success of many other projects (included PHP). With a promise of being
SQL-compliant and ACID-compliant.
But as many of the software out there, what is MySQL today is the product of evolution.
Unfortunately MySQL developers did not manage to keep a good balance between stability, consistency and new features.
Many of the parameter configurations available in MySQL 5.7 (the most popular installed version) are about "disabling"
some of the weird features that MySQL offers.
The even more weird thing that most of the time, the default setting prefers the less-standard value,
creating more damage than benefit when they were introduced in first.
sql-mode is a mixed mess of
options aimed to disable/enable some of the weird features were added in previous versions.
ALLOW_INVALID_DATES: (enabled by default) MySQL allows you to save a date as "1900-00-00".
Why? Which days is "1900-00-00"
NO_ZERO_DATE: similarly as in the previous case, in the past (enabled by default)
MySQL allows you to save an invalid date as "0000-00-00".
explicit_defaults_for_timestamp: to make things more confusing, in MySQL 8.0 there is this new parameter that is
somewhere in between of the previous two.
option is still used nowadays to be able to retrieve some extra columns when doing a "group by" query;
The rows retrieved are not predictable and in more of the case depend on how data are internally stored in MySQL.
ENUMis one of the basic data types in MySQL, but it has many trade offs:
- A column with
ENUM('Mercury', 'Venus', 'Earth')allows you to insert records as
""(empty string)... Why
- A column with
(this is one of my favorites) Adding a column in a table might (and will in highly concurrent environments)
create duplicate rows and the failure of the column addition.
This weird behaviour is explained here,
the MySQL documentation explains the technical reason of this here.<br>
When running an online DDL operation, the thread that runs the ALTER TABLE statement applies an “online log” of
DML operations that were run concurrently on the same table from other connection threads.
When the DML operations are applied, it is possible to encounter a duplicate key entry error
ERROR 1062 (23000): Duplicate entry), even if the duplicate entry is only temporary and would be reverted by a
later entry in the “online log”. <br>
Documenting the bug allows MySQL to not fix it! Nice!
MySQL will lock the tables for most of the schema operation as adding columns, removing columns and so on,
leading to the inability to write data to the table and in some cases also to read data from it.
This is a real pain point if you have tables with a lot of data.
When making schema changes (adding/removing tables, columns, indexes and so on) MySQL will automatically commit all
the changes... making transactions useless...
MySQL allows only one trigger per type/table... making that trigger generally more complex and hard to maintain.
Another fun detail, this year MySQL started to roll-out a new major release, MySQL 8.0. Wait... where is the 6.0 and 7.0?
Out there, there are some examples of great ways of using MySQL.
Designing Schemaless, Uber Engineering’s Scalable Datastore Using MySQL:
In this article Uber explains how they use MySQL to store their trip data.<br>
They are using MySQL basically as key-value storage (with some nice extras) to store a huge amount of data
in an eventually-consistent system. It is not SQL (many operations are just not possible),
it is just an application-level managed storage. Scales well and works well for the Uber use case. <br>
To be honest, I'm not that smart to understand why want they did could not be implemented
with any other database (postgres included), since all the storage logic is implemented by the application.
MyRocks: A space- and write-optimized MySQL database
In this article, the facebook team explains how they have optimized MySQL to make a write-efficient database.<br>
MyRocks looks to be just a different database built on top of the MySQL storage-engine API.
It has weak points, limitations and advantages but is far from being MySQL with the InnoDB storage.
This are just two example of some great software build on top of MySQL. But this cases (and many other) have one point in
common. They are not really using MySQL, they are using a subset of it and building a brand new application top of it.
MySQL master-master replication
MySQL can be configured to perform master-to-master replication. This is one of the main selling point of MySQL.
Most people I've met, do not know what they are buying.
Master-Master replication is tricky, has many limitations and you lose many of the traditional ACID-compliant
relational databases features.
Auto-increments are tricky to manage because of possible write conflicts;
Unique-keys can not be easily supported and can lead to conflict situations having on different masters duplicate rows;
This is an answer to the mysql
master-master question i really like.
On the other hand... PostgreSQL!
PostgreSQL, initially developed by the Berkeley University (California) started with a much more formal approach
to databases than MySQL.
The development process of PostgreSQL is relatively strict, but on the other side PostgreSQL has a very low bug-rate
and the features added are always well thought and consistent.
Personally do not remember cases of weird features introduced just for "marketing".
The list of features that PostgreSQL offers is endless. Will try just to point out some of the most common
(and that I miss when I have to work with MySQL).
- SQL-standards: Reading the documentation postgres claims to
follow at least 160 out of 179 SQL:2011 mandatory features required for full Core conformance. On the other hand
MySQL explicitly says that they are ready to sacrifice
standards for speed and reliability (instead of pursuing a way to achieve both goals as postgres does most of the time).
- Functional Indexes: In Postgres you can use functions (build in or written by you via stored procedures) and build
indexes over them. This is just a killer feature most of the time! You can putt all the complex logic in a custom function,
index it and your searches will be super fast!
- Window-Functions: If you have to deal with reporting, this will save your life! Otherwise just do not bother about it.
Window functions perform a calculations across a set of table rows that are somehow related to the current row.
More info about it here.
- GIN and GIST indexes: This are special kind of indexes that store on the leafs more than one row-reference.
They are a killer feature if you have to store (and search) arrays, geo-data, full-text and similar complex data-sets.
Moore info about GIN/GIST indexes is available here.
- EXPLAIN: The postgres EXPLAIN query feature is just great! You should try it to see the difference.
The postgres EXPLAIN query contains detailed index usage, scan and join strategies, timing, buffers and memory usage
and much much more.
- Row level constraints: PostgreSQL in addition to foreign keys and unique indexes offers
Essentially is a check executed on each update/insert of a row and ensures the consistency of the data. As example
we can create a check as
CHECK (price > 0 AND discount <=100 OR special IS NOT NULL)
to check that the "price" column is always greater than zero and the discount is less than 100 only if "special"
is not null. The expression inside CHECK can be any valid postgres expression, including checking geo coordinates checks,
special indexes and so on.
- The merge-join strategy: is a special join strategy that is super fast when both tables have the same sorting.
- Locking: Postgres has many features to avoid locking as example the
CREATE INDEX CONCURRENTLY
that allows you to add indexes without locking.
- Adding/removing columns is atomic: adding columns (with default NULL) and removing them is super-fast as it alters
only the table metadata (no need to copy the table onto a new one).
MySQL in most of the cases has to re-write the table (that might be very expensive if you have millions of rows).
When using postgres at-scale in a highly concurrent environment this makes the difference.
- Full text: MySQL has full text support, but compared to postgres-full-text looks like a toy. PostgreSQL allows you to
configure the language of the text, fine-tune the stemming rules for each language, weighted ranking and so on.
- Spatial data: if you have to work with spatial data (geo data or geometric types/shapes) postgres is just excellent.
By default postgres has specific data types to store and query information about "shapes" as squares, triangles, circles
spheres cubes and so on. For more advanced use-cases,
PostGIS is an extension-set that makes geo-data handling a pleasure (used by OpenStreetMap as example).
MySQL introduced a similar feature-set in 5.7 but is much simpler and limited, with postgres you have more operation
types built in specialized indexes for read-optimized or write-optimized operations on shapes and much more.
The list of PostgreSQL features I like (and miss when working with MySQL) is endless.
What makes postgres so attractive
to me is the extension mechanism it has that allows to add extra features to postgres really really easily.
Postgres has already dozens of extensions ready to be used
and that will make your life easier. Some extensions I use often are
bloom (super efficient index types for probabilistic index matching),
cube (multi dimensional indexable data type for 3D or more dimensions searches),
earthdistance (calculate easily geo distances), hstore (simple, efficient and ACID key-value storage).
PostgreSQL offers master-slave replication. Works similarly to MySQL. Obviously has some differences but the big picture is similar.
Offers classical master-slave replication mode with also hot-standby master mode (essential a "waiting" master ready to
take over in case of the main master failure).
Out of the box PostgreSQL does not offer master-master replication. This could sound a big disadvantage. <br>
In my experience, with PostgreSQL takes much more time (and load) to reach a state where more than one master is necessary
for performance reasons.
(For high availability, the hot-standby master-mode is already part of the postgres core.)
In reality there are third-party solutions for master-master replications as
Postgres-BDR that do an excellent job with multi-master solutions!
If there is a rule I've learned in life is that things are never black or white, most of the time are gray.
So probably there are use cases where MySQL does a much better job than PostgreSQL, I just have not met them yet.
Would be glad to hear experiences from both sides as in my opinion there is always something to learn.
Would be glad to hear experiences where in your opinion MySQL did really a great job.
This article was originally published on https://www.goetas.com/blog/how-i-approached-software-development-and-why-i-prefer-postgresql-to-mysql/.