SQL queries being slow is one of the main complaints directed at MySQL. Why are queries slow? What’s making SQL queries slow? How to make them faster? We will explore some suggestions which could (hopefully) point developers in the right direction – we will look at optimizing all four CRUD operations – Create, Read, Update and Delete.
Understand what You’re Dealing With
As far as MySQL is concerned, developers have a few choices when dealing with the RDBMS. They can choose the table storage engines. Correctly choosing a table storage engine is crucial – wrong choice in this scenario can cost both time and resources in the long run. There are seven frequently used table storage engines:
- InnoDB – the default storage engine for MySQL 5.5 and higher. Supports transactions, row-level locking and foreign keys;
- MyISAM – this storage engine manages non-transactional tables and supports full-text searching;
- MEMORY – this storage engine stores all data in memory, hence the name;
- BLACKHOLE – a /dev/null storage engine. Accepts data but does not store it;
- CSV – an engine that stores data in text-files using a comma-separated values format;
- ARCHIVE – a storage engine that produces special-purpose tables that store large amounts of unindexed data in a very small footprint;
- FEDERATED – a storage engine that enables data access from a remote MySQL database without using replication or clustering.
We won’t be taking a deep look at table engines in this post, however, it is widely accepted that when dealing with data that is important we should use the InnoDB engine – the only advantages of MyISAM engine are that the MyISAM tables occupy less space on disk compared to uncompressed InnoDB tables and that the COUNT(*) query on MyISAM is quite a bit faster than on InnoDB, that’s a topic for another day though.
Understand the Query Lifecycle
Before you ask why are your queries slow, remember that queries are tasks that are composed of subtasks. In order to optimize a query, in a nutshell, you optimize the performance of the subtasks by making them quickier, making them happen fewer times or not happen at all.
While accomplishing all of these tasks, the query goes through multiple “stages” – network, CPU, operations (planning etc.), locking and fetching rows. The speed of those operations depend on memory, CPU and the disk speed – I/O operations are also important.
Query Optimization 101.1 – Only ask the Database for What you Need
A lot of times when a query is slow, that is because it is fetching much more rows or using more columns than necessary. That might be the case if a SELECT statement that uses all columns is used – we’re talking about
SELECT *. What you should try to do instead is only ask for data you actually need – for example, if your table has id, name and purchase date columns and you only need to fetch the name of the customer, use
SELECT name instead of
SELECT *. Avoiding fetching things you don’t need can prevent needless operations, and in turn, make your query run faster.
On the other hand, if the database you’re dealing with doesn’t have many rows and you need to simplify the development of your project by, for example, using the same block of code in multiple places, using the
SELECT * approach can be reasonable. Just make sure you do it after fully understanding its impact on the database.
Query Optimization 101.2 – Does your Database Contain Needless Data?
Another thing you might need to consider is your database actually containing needless data. If you only need distinct values in the database, but it is full of duplicates, the queries will obviously run slower. This might not be as noticeable on small databases, but as the database gets larger, the problem might become more and more apparent.
In order to avoid storing needless data, examine your needs upfront, and don’t store data you know you won’t need to access. This can be accomplished by removing duplicate values by utilizing an
UNIQUE INDEX (we will cover indexes later) or including a
DISTINCT statement in your
Query Optimization 101.3 – Do You Profile Your Queries?
MySQL query profiling can enable you to figure out in what stage of the query it gets stuck. In order to properly profile a query, perform these steps:
- Insert a new row into the setup_actors table of performance schema in order to enable query profiling for the user that will run the query:
INSERT INTO performance_schema.setup_actors (HOST,USER,ROLE,ENABLED,HISTORY) VALUES ('localhost', 'root', '%', 'YES', 'YES');
- Ensure that stage instrumentation is enabled (most likely it will be enabled by default, but it never hurts to check):
UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME LIKE '%statement/%';
UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME LIKE '%stage/%';
- Ensure that certain consumers are enabled – specifically, the events_statements_* and events_stages_*:
UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE '%events_statements_%';
UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE '%events_stages_%';
- Run the query you want to profile under the account you set in the setup_actors table. For example:
SELECT * FROM test_table WHERE id = 1;
- Identify the event ID of the statement you ran by running:
SELECT EVENT_ID, TRUNCATE(TIMER_WAIT / 1000000000000, 6) AS Duration, SQL_TEXT FROM performance_schema.events_statements_history_long WHERE SQL_TEXT LIKE '%Event ID%';
- Finally, query the events_stages_history_long table to get all of the events the query went through:
SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT / 1000000000000, 6) AS Duration FROM performance_schema.events_stages_history_long WHERE NESTING_EVENT_ID = Event ID;
When you accomplish all of those things, you should be see a result set that should look something like this:
Another (though almost a deprecated as of MySQL 5.6.7) way to perform profiling is by:
SET profiling = 1;in the database;
- Running the queries that you need to profile;
- Get the query IDs by running
- See the query specific profiling information by running
SHOW PROFILE FOR QUERY xwhere x is the ID of the query.
Query Optimization 101.4 – Do You EXPLAIN Your Queries?
EXPLAIN statement is intended to be used together with a query that needs explaining.
EXPLAIN works on
DELETE statements. The statement provides information about how does an MySQL database execute a query and can be a great way to quickly understand the bottlenecks of query performance.
EXPLAIN statement can be a great way to quickly identify the causes of queries being slow, but when that doesn’t help, it’s time to dive deeper.
What are we optimizing?
Before you optimize your queries for speed, you must know what statement are you optimizing and what result is good enough. How fast is fast enough? Is a query being executed in a second fast? What about five seconds? Ten seconds? A minute?
Then, decide what kind of a query you need to be optimizing. Is it a
SELECT statement? An
INSERT statement? An
UPDATE statement? A
DELETE statement? Maybe you need to optimize an
ALTER TABLE statement?
Each of those statements accomplish different tasks and so, every one of them needs to be approached differently. We’re going to be starting with
SELECT query optimization.
Optimizing SELECT Queries – Indexing
Indexing is one of the first things you will hear when talking about
SELECT query optimization. An index, simply put, is a data structure that improves the speed of operations in a table. There are multiple kinds on indexes each suited for different things. Indexing works slightly differently in each engine too – we won’t be covering everything in this chapter, only the basics.
There are five types of indexes:
- INDEX – a type of index where values don’t need to be unique. Accepts NULL values;
- UNIQUE INDEX – all values on a field with this index must be unique. Frequently used to eliminate duplicate values in a table;
- FULLTEXT INDEX – an index applied on fields that utilize full text search capabilities. Queries using this index use
AGAINST()statements. A type of index that finds keywords in the text instead of comparing values directly to the values in the index. This type of index has its own nuances – stopwords, search modes etc.;
- DESCENDING INDEX – introduced in MySQL 8. A descending index is an index sorted in a reverse order;
- PRIMARY KEY – an unique index. Accepts no NULL values and once set, values in the primary key cannot be changed. Frequently used together with AUTO_INCREMENT on automatically incrementing ID values.
An index optimizes query performance by avoiding a full table scan. However, indexes are only one player in a book. There many others, such as partitioning.
Optimizing SELECT Queries – Partitioning
Partitioning in MySQL allows a table to be split into multiple subtables but still be treated as one table in the SQL layer. There are multiple types of partitioning:
- Partitioning by RANGE – a table partitioned by range contains rows that fall within a range. Values are defined by the
VALUES LESS THANoperator;
- Partitioning by LIST – a table partitioned by list enables to spread data based on a pre-defined set of values;
- Partitioning by COLUMNS – a table partitioned by columns enables developers to use multiple columns in partitioning keys;
- Partitioning by HASH – a table partitioned by hash enables developers to ensure data is distributed evenly across multiple partitions;
- Partitioning by KEY – columns partitioned by KEY partitioning must be a part of the primary key.
Partitioning can make queries faster if there’s a need of dealing with big data sets. It is worth mentioning that since from the SQL perspective partitions are treated as tables, developers can restore individual partitions.
Optimizing INSERT Queries
In order to optimize
INSERT queries, keep away from indexes (try to avoid inserting data into a column containing an index – the more indexes your table has, the slower
INSERTs will become), utilize extended
INSERTs or consider loading data into a table in chunks.
One thing to note is that you should avoid INSERT queries when dealing with big data sets – consider bulk importing instead.
Optimizing UPDATE Queries
UPDATE queries in MySQL, bear in mind that the number of records being updated has a rather huge impact on the query performance.
In order to improve
UPDATE query perfomance, consider adding a
WHERE clause to the query – this way only rows that match the clause will be updated. Bear in mind that
UPDATE queries with a
WHERE clause will be faster if you use an index.
Also consider delaying updates, then after some time locking your table and performing multiple
UPDATEs at once – doing so is much quicker than performing one
UPDATE at a time.
Optimizing DELETE Queries
In order to optimize
DELETE query performance, use the MySQL-specific
LIMIT clause to tell MySQL how many rows should be deleted at once and repeat the statement by slightly altering it afterwards – there’s not that much you can do in this scenario.
In short, in order to optimize query performance, make sure you’re only asking the database to return rows you actually need, don’t make the database examine unnecessary things, where appropriate, use indexes (bear in mind that indexes are only effective when the
WHERE clause is used – indexes slow down
INSERT performance), take a look at partitioning,
EXPLAIN and profile your queries if needed, also consider utilizing the