Understanding MySQL: The Power of Indexing

Post Views: 965

When talking about query optimization, indexing is one of the first topics that gets covered. In this post we will cover the types of indexes, why they might be important, their benefits and cons.

Although it’s almost impossible to cover everything related to indexes in one post (the entire subject deserves a book), we will try to dig deeper and figure out the differences of multiple types of indexes and where they might be used.

What are Indexes?

Indexes (also called keys) are data structures that improve the speed of operations in a table. There are several types of indexes:

INDEX – a type of index where values don’t need to be unique. Accepts NULL values;
UNIQUE INDEX – all values on a field with this index must be unique. Frequently used to eliminate duplicate values in a table;
FULLTEXT INDEX – an index applied on fields that utilize full text search capabilities. Queries using this index use MATCH() and AGAINST() statements. A type of index that finds keywords in the text instead of comparing values directly to the values in the index. This type of index has its own nuances – stopwords, search modes etc.;
DESCENDING INDEX – introduced in MySQL 8.0. A descending index is an index sorted in a reverse order;
PRIMARY KEY – an unique index. Accepts no NULL values and once set, values in the primary key cannot be changed. Frequently used together with AUTO_INCREMENT on automatically incrementing ID values.

The purpose of an index is to optimize query performance by avoiding a full table scan.

Ordinary Indexes

B-tree indexes, which are the default indexes for most MySQL storage engines, function by storing the data in a sorted order. They are most frequently used to gain speed advantages when a query has a WHERE clause.

MySQL uses indexes for these operations:

To speed up queries with a WHERE clause;
To query through less rows;
To acquire rows from other tables in join operations;
To help the optimizer to look up rows if a multiple-column index is used – in that case, a leftmost side index is used.

Indexes can also be made to use the first n characters of a column. Using indexes in this way makes the index much smaller. Prefixes can be up to 767 bytes long for InnoDB tables that use the REDUNDANT or COMPACT row format (row formats are out of scope for this post so we aren’t going to go into details).

Unique Indexes

A UNIQUE index is a type of index that allows to enforce the uniqueness of values in a column. Duplicate values being inserted into the column with a UNIQUE index will error out with an “Error 1062 – duplicate key” unless the IGNORE statement is used – existing duplicates in a column cannot be removed by adding a UNIQUE index without adding an IGNORE statement either.

Here’s an interesting one though – the values in a column with a UNIQUE index are distinct, but the column can have multiple NULL values in it. Even though this was reported as a bug in 2007, it’s not.

Full text Indexes

A search that is utilizing a FULLTEXT index is completely different from a search that’s utilizing an ordinary index. This is the case because as already outlined above, this type of index has many unique features – we will look at stopwords first.

There are two types of stopwords – MyISAM stopwords and InnoDB stopwords each exclusive to the engine in use.

MyISAM Stopwords

The stopword list on MyISAM tables is derived from the file storage/myisam/ft_static.c – MyISAM has 143 stopwords. The variable called ft_stopword_file enables the default list to be changed with words from another file, or for stopwords to be ignored altogether. The defined file is not parsed for comments – MyISAM treats all words as stopwords.

Stopwords can also be disabled by setting the ft_stopword_file variable to an empty string. However, if the variable or the contents of the stopword file are changed, the FULLTEXT index must be rebuilt by using the REPAIR TABLE table_name QUICK statement where table_name is the name of the table that needs the indexes to be rebuilt.

InnoDB Stopwords

By default, the InnoDB stopword list is much shorter than the MyISAM list having only 36 words. Stopwords on FULLTEXT indexes are enabled only if the innodb_ft_enable_stopword variable is set at the time the index was created.

InnoDB stopwords are derived from the table set in the variable innodb_ft_user_stopword_table if set, otherwise they are derived from the innodb_ft_server_stopword_table variable. If neither of those two variables are set, InnoDB uses the built-in list.

Full text Search Modes

Search modes are another unique feature of FULLTEXT indexes. There are three types of search modes:

A natural language search mode – such search mode interprets the search string as a literal phrase. The value is provided in double quote characters. Such search mode will be enabled if a FULLTEXT search query with no modifiers is run – it can also be enabled utilizing the IN NATURAL LANGUAGE MODE modifier.
A query expansion search mode – such a search mode works by performing the search twice. The search phrase for the second search is the same as the original search phrase, but a few most relevant documents from the first search are also included in the result set. Such search mode will be enabled if a FULLTEXT search query utilizes the WITH QUERY EXPANSION modifier. Using this search mode is generally useful when the user has some implied knowledge (for example, a user searching for “database” might expect to see “MySQL”, “InnoDB” and “MyISAM” in the search results).
A boolean search mode – such a search mode allows searching based on very complex queries that can include Boolean operators. The search mode will be enabled if the FULLTEXT search query utilizes the IN BOOLEAN MODE modifier. Using this search mode is useful if you need to find rows with complex or fuzzy matching, for example, you could find rows containing a Database keyword but not the MySQL keyword.

A boolean full text search supports the following operators:

+ defines that a word must be present in a row;
– defines that a word must not be present in a row;
< and > changes the rank of the search value accordingly. “<” ranks the value lower, “>” ranks the value higher;
( and ) allows to create subexpressions;
~ lowers the value’s contribution to the results. For example, if a query “+green ~blue” would be executed, it would find rows that contain both the words green and blue, but if the row contains the word “blue”, it would be rated lower than if it did not;
* is a wildcard operator. A query “some*” would find “something”;
“” matches only rows that match the literal value typed in the double quotes.

FULLTEXT searches have minimum length of words. The length is three characters for InnoDB engines and four characters for MyISAM engines.

FULLTEXT indexes are useful if you’re doing “floating” searches where a LIKE query does not utilize an index (we discuss this in the index usage examples).

FULLTEXT indexes are suitable for MATCH() AGAINST() operations – not for WHERE operations meaning that if a WHERE clause will be used, the usefulness of a B-Tree index will not be eliminated. It is also worth mentioning that the FULLTEXT index can only be created on CHAR, VARCHAR or TEXT columns.

Descending Indexes

A descending index type was introduced into MySQL in MySQL 8.0 – a query will use this type of an index when a descending order is requested by the query. In order to add a descending index to a column, append DESC to the end of the index creation statement. Descending indexes are supported only by the InnoDB storage engine.

Primary Keys

A primary key for a table represents columns that are most frequently used in queries. One table can have only one primary key – when a primary key constraint is defined, a unique index is created. A primary key has an associated index in order to improve query performance – that index will be a clustered index by default.

Clustered Indexes

Every InnoDB table has a clustered index – usually it’s the PRIMARY KEY, but what the clustered index might be depends on certain things:

Does your table have a PRIMARY KEY? If it does, InnoDB will use it as the clustered index;
Does your table have a UNIQUE INDEX? If the key columns are NOT NULL, InnoDB will use that column as the clustered index.

Clustered indexes store a B-tree index and the rows together in the same structure – lookups are usually faster than those on secondary indexes, however, you can only have one clustered index per table.

Secondary Indexes

Secondary indexes are all indexes that are not primary indexes. They contain the columns of the primary key along with their own columns – a secondary index is a way to access rows without using the primary key.

Hash Indexes

Aside from b-tree indexes, there are also hash indexes. The main difference in between b-tree indexes and hash indexes is that hash indexes allow the usage of equality comparisons that use the = and <=> operators while b-tree indexes allow the usage of column comparisons in expressions that use the =, >, >=, <, <= or BETWEEN operators.

Covering Indexes

A covering index is an index that covers all of the data needed for the query to run successfully. When a covering index is used, all of the required fields are included in the covering index. That means that the query that uses the covering index will not fetch the rows from a table – instead, it will fetch the data fields from the index.

Multicolumn Indexes

MySQL can also create composite indexes – composite indexes are indexes that consist of multiple columns. The topic of utilizing multicolumn indexes is often misunderstood – one of the topics that causes the most confusion in this space is the order of columns. When using multicolumn indexes choosing their order is vital: before creating indexes on columns, it might be a good idea to run some queries to figure out which column has a higher selectivity first, then index the columns based on that.

It’s worth keeping in mind that if there’s a choice between multiple indexes, MySQL normally uses the index that finds the smallest amount of rows, so it can be a smart decision to have a look at data cardinality before beginning indexing your columns.

Index Usage Examples

Now, we will look at a few examples of when certain types of indexes are used incorrectly and change them. Let’s look at the first sample of index usage:

SELECT * FROM table WHERE column LIKE '%text%';

Using an index on a query like so would not speed up the query because it is not clear what the row begins with – in order to fix this issue, change the query to the following:

SELECT * FROM table WHERE column LIKE 'text%';

In this case, the query would actually use the index because a search term is at the start of the string.

Another frequently occurring mistake is that the indexed column is not isolated – MySQL can’t use indexes if the query performs some sorts of operations (multiplying, division etc.) after the column is mentioned in the query:

SELECT * FROM table WHERE column * 10 = 100

Such a query will not be very effective – it cannot use the index because the column is not isolated. In order to fix such an issue, isolate your columns – make sure that the indexed column is the only thing before the equation. The column should not be inside of a function or a part of some sort of expression.

Avoid Duplicate Indexes

MySQL allows developers to create multiple indexes of the same type on a column. For example, look at this query:

CREATE TABLE table ( `id` INT(5) NOT NULL, `field` VARCHAR(10) NOT NULL, `field_2` VARCHAR(10) NOT NULL, UNIQUE(id), INDEX(id) ) ENGINE = InnoDB;

The above code block would create two indexes on the same column. Add “AUTO_INCREMENT PRIMARY KEY” to the “id” column and you will have three of them. The only feasible reason to do this would be when you would want different types of indexes to satisfy different types of queries. Keep in mind that you could need hash or full text indexes together with a b-tree index because the index types are different.

What about Redundant Indexes on Multiple Columns?

Duplicate indexes on multiple columns are quite a bit different, but in a nutshell:

If you have an index on (A, B) you don’t need another index on A because it is the most leftmost prefix of the index. However, you might need an index on B because the column is not the leftmost and thus, the column might not be redundant;
Indexes of different types do not “clash” with b-tree indexes. That is why you can have a, for example, b-tree index and a full text index – they would not “clash” because they are suited for different purposes.

Unused Indexes

By using the INFORMATION_SCHEMA.INDEX_STATISTICS table and enabling the userstats variable in MySQL you can see how much each index is used and then drop unnecessary indexes – this way you can gain some disk space, particularly if the table contains many rows. Indexes that are not used are just a waste of disk space.

Summary

Indexes in MySQL can be a very powerful tool – however, sometimes they might do more harm than good. If you opt to use indexes, know the differences, pros and cons between different index types carefully and adapt accordingly.