When talking about query optimization, indexing is one of the first topics that gets covered. In this post we will cover the types of indexes, why they might be important, their benefits and cons.
Although it’s almost impossible to cover everything related to indexes in one post (the entire subject deserves a book), we will try to dig deeper and figure out the differences of multiple types of indexes and where they might be used.
Indexes (also called keys) are data structures that improve the speed of operations in a table. There are several types of indexes:
NULL
values;MATCH()
and AGAINST()
statements. A type of index that finds keywords in the text instead of comparing values directly to the values in the index. This type of index has its own nuances – stopwords, search modes etc.;NULL
values and once set, values in the primary key cannot be changed. Frequently used together with AUTO_INCREMENT
on automatically incrementing ID values.The purpose of an index is to optimize query performance by avoiding a full table scan.
B-tree indexes, which are the default indexes for most MySQL storage engines, function by storing the data in a sorted order. They are most frequently used to gain speed advantages when a query has a WHERE
clause.
MySQL uses indexes for these operations:
WHERE
clause;Indexes can also be made to use the first n characters of a column. Using indexes in this way makes the index much smaller. Prefixes can be up to 767 bytes long for InnoDB tables that use the REDUNDANT
or COMPACT
row format (row formats are out of scope for this post so we aren’t going to go into details).
A UNIQUE
index is a type of index that allows to enforce the uniqueness of values in a column. Duplicate values being inserted into the column with a UNIQUE
index will error out with an “Error 1062 – duplicate key” unless the IGNORE
statement is used – existing duplicates in a column cannot be removed by adding a UNIQUE
index without adding an IGNORE
statement either.
Here’s an interesting one though – the values in a column with a UNIQUE
index are distinct, but the column can have multiple NULL
values in it. Even though this was reported as a bug in 2007, it’s not.
A search that is utilizing a FULLTEXT
index is completely different from a search that’s utilizing an ordinary index. This is the case because as already outlined above, this type of index has many unique features – we will look at stopwords first.
There are two types of stopwords – MyISAM stopwords and InnoDB stopwords each exclusive to the engine in use.
The stopword list on MyISAM tables is derived from the file storage/myisam/ft_static.c
– MyISAM has 143 stopwords. The variable called ft_stopword_file
enables the default list to be changed with words from another file, or for stopwords to be ignored altogether. The defined file is not parsed for comments – MyISAM treats all words as stopwords.
Stopwords can also be disabled by setting the ft_stopword_file
variable to an empty string. However, if the variable or the contents of the stopword file are changed, the FULLTEXT
index must be rebuilt by using the REPAIR TABLE table_name QUICK
statement where table_name is the name of the table that needs the indexes to be rebuilt.
By default, the InnoDB stopword list is much shorter than the MyISAM list having only 36 words. Stopwords on FULLTEXT
indexes are enabled only if the innodb_ft_enable_stopword
variable is set at the time the index was created.
InnoDB stopwords are derived from the table set in the variable innodb_ft_user_stopword_table
if set, otherwise they are derived from the innodb_ft_server_stopword_table
variable. If neither of those two variables are set, InnoDB uses the built-in list.
Search modes are another unique feature of FULLTEXT
indexes. There are three types of search modes:
FULLTEXT
search query with no modifiers is run – it can also be enabled utilizing the IN NATURAL LANGUAGE MODE
modifier.FULLTEXT
search query utilizes the WITH QUERY EXPANSION
modifier. Using this search mode is generally useful when the user has some implied knowledge (for example, a user searching for “database” might expect to see “MySQL”, “InnoDB” and “MyISAM” in the search results).FULLTEXT
search query utilizes the IN BOOLEAN MODE
modifier. Using this search mode is useful if you need to find rows with complex or fuzzy matching, for example, you could find rows containing a Database keyword but not the MySQL keyword.A boolean full text search supports the following operators:
FULLTEXT
searches have minimum length of words. The length is three characters for InnoDB engines and four characters for MyISAM engines.
FULLTEXT
indexes are useful if you’re doing “floating” searches where a LIKE
query does not utilize an index (we discuss this in the index usage examples).
FULLTEXT
indexes are suitable for MATCH() AGAINST()
operations – not for WHERE
operations meaning that if a WHERE
clause will be used, the usefulness of a B-Tree index will not be eliminated. It is also worth mentioning that the FULLTEXT
index can only be created on CHAR
, VARCHAR
or TEXT
columns.
A descending index type was introduced into MySQL in MySQL 8.0 – a query will use this type of an index when a descending order is requested by the query. In order to add a descending index to a column, append DESC
to the end of the index creation statement. Descending indexes are supported only by the InnoDB storage engine.
A primary key for a table represents columns that are most frequently used in queries. One table can have only one primary key – when a primary key constraint is defined, a unique index is created. A primary key has an associated index in order to improve query performance – that index will be a clustered index by default.
Every InnoDB table has a clustered index – usually it’s the PRIMARY KEY
, but what the clustered index might be depends on certain things:
PRIMARY KEY
? If it does, InnoDB will use it as the clustered index;UNIQUE INDEX
? If the key columns are NOT NULL
, InnoDB will use that column as the clustered index.Clustered indexes store a B-tree index and the rows together in the same structure – lookups are usually faster than those on secondary indexes, however, you can only have one clustered index per table.
Secondary indexes are all indexes that are not primary indexes. They contain the columns of the primary key along with their own columns – a secondary index is a way to access rows without using the primary key.
Aside from b-tree indexes, there are also hash indexes. The main difference in between b-tree indexes and hash indexes is that hash indexes allow the usage of equality comparisons that use the = and <=> operators while b-tree indexes allow the usage of column comparisons in expressions that use the =, >, >=, <, <= or BETWEEN operators.
A covering index is an index that covers all of the data needed for the query to run successfully. When a covering index is used, all of the required fields are included in the covering index. That means that the query that uses the covering index will not fetch the rows from a table – instead, it will fetch the data fields from the index.
MySQL can also create composite indexes – composite indexes are indexes that consist of multiple columns. The topic of utilizing multicolumn indexes is often misunderstood – one of the topics that causes the most confusion in this space is the order of columns. When using multicolumn indexes choosing their order is vital: before creating indexes on columns, it might be a good idea to run some queries to figure out which column has a higher selectivity first, then index the columns based on that.
It’s worth keeping in mind that if there’s a choice between multiple indexes, MySQL normally uses the index that finds the smallest amount of rows, so it can be a smart decision to have a look at data cardinality before beginning indexing your columns.
Now, we will look at a few examples of when certain types of indexes are used incorrectly and change them. Let’s look at the first sample of index usage:
SELECT * FROM table WHERE column LIKE '%text%';
Using an index on a query like so would not speed up the query because it is not clear what the row begins with – in order to fix this issue, change the query to the following:
SELECT * FROM table WHERE column LIKE 'text%';
In this case, the query would actually use the index because a search term is at the start of the string.
Another frequently occurring mistake is that the indexed column is not isolated – MySQL can’t use indexes if the query performs some sorts of operations (multiplying, division etc.) after the column is mentioned in the query:
SELECT * FROM table WHERE column * 10 = 100
Such a query will not be very effective – it cannot use the index because the column is not isolated. In order to fix such an issue, isolate your columns – make sure that the indexed column is the only thing before the equation. The column should not be inside of a function or a part of some sort of expression.
MySQL allows developers to create multiple indexes of the same type on a column. For example, look at this query:
CREATE TABLE table (
`id` INT(5) NOT NULL,
`field` VARCHAR(10) NOT NULL,
`field_2` VARCHAR(10) NOT NULL,
UNIQUE(id),
INDEX(id)
) ENGINE = InnoDB;
The above code block would create two indexes on the same column. Add “AUTO_INCREMENT PRIMARY KEY
” to the “id” column and you will have three of them. The only feasible reason to do this would be when you would want different types of indexes to satisfy different types of queries. Keep in mind that you could need hash or full text indexes together with a b-tree index because the index types are different.
Duplicate indexes on multiple columns are quite a bit different, but in a nutshell:
By using the INFORMATION_SCHEMA.INDEX_STATISTICS
table and enabling the userstats
variable in MySQL you can see how much each index is used and then drop unnecessary indexes – this way you can gain some disk space, particularly if the table contains many rows. Indexes that are not used are just a waste of disk space.
Indexes in MySQL can be a very powerful tool – however, sometimes they might do more harm than good. If you opt to use indexes, know the differences, pros and cons between different index types carefully and adapt accordingly.
There have been rumors about a data breach targeting Schneider Electric. Did a data breach…
There have been rumors about the Fiskars Group – the company behind Fiskars scissors and…
Russia has fined Google more than two undecillion roubles because Google has refused to pay…
Why does RockYou 2024.txt look like a binary file when you open it up? Find…
Duolicious is a dating app that connects people who are “chronically online.” Did the Duolicious…
This blog will tell you what RockYou 2024 is, how RockYou 2024.txt came to be,…