Heavily indexed table in MySQL, but the wrong index is being used resulting in a long execution time

This may have been asked before, but I haven't found a thread with my specific problem. So I have a heavily indexed table with literally 25+ million rows in it, and speed is very important. I have been working to optimize results as much as possible, but something weird keeps happening.

I have this table:

CREATE TABLE IF NOT EXISTS `externallinks_global` (
                              `url_id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
                              `paywall_id` INT UNSIGNED NOT NULL,
                              `url` VARCHAR(767) NOT NULL,
                              `archive_url` BLOB NULL,
                              `has_archive` TINYINT UNSIGNED NOT NULL DEFAULT '0',
                              `live_state` TINYINT UNSIGNED NOT NULL DEFAULT '4',
                              `last_deadCheck` TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',
                              `archivable` TINYINT UNSIGNED NOT NULL DEFAULT '1',
                              `archived` TINYINT UNSIGNED NOT NULL DEFAULT '2',
                              `archive_failure` BLOB NULL DEFAULT NULL,
                              `access_time` TIMESTAMP NOT NULL,
                              `archive_time` TIMESTAMP NULL DEFAULT NULL,
                              `reviewed` TINYINT UNSIGNED NOT NULL DEFAULT '0',
                              PRIMARY KEY (`url_id` ASC),
                              UNIQUE INDEX `url_UNIQUE` (`url` ASC),
                              INDEX `LIVE_STATE` (`live_state` ASC),
                              INDEX `LAST_DEADCHECK` (`last_deadCheck` ASC),
                              INDEX `PAYWALLID` (`paywall_id` ASC),
                              INDEX `REVIEWED` (`reviewed` ASC),
                              INDEX `HASARCHIVE` (`has_archive` ASC),
                              INDEX `ISARCHIVED` (`archived` ASC),
                              INDEX `APIINDEX1` (`has_archive` ASC, `live_state` ASC, `paywall_id` ASC),
                              INDEX `APIINDEX2` (`live_state` ASC, `paywall_id` ASC),
                              INDEX `APIINDEX3` (`live_state` ASC, `paywall_id` ASC, `archived` ASC),
                              INDEX `APIINDEX4` (`live_state` ASC, `archived` ASC),
                              INDEX `APIINDEX5` (`live_state` ASC, `paywall_id` ASC, `reviewed` ASC),
                              INDEX `APIINDEX6` (`live_state` ASC, `reviewed` ASC),
                              INDEX `APIINDEX7` (`has_archive` ASC, `live_state` ASC, `paywall_id` ASC, `archived` ASC, `reviewed` ASC),
                              INDEX `APIINDEX8` (`has_archive` ASC, `live_state` ASC, `archived` ASC, `reviewed` ASC));

When doing queries with only one column in the WHERE clause, it works as it should, however, when I do EXPLAIN EXTENDED SELECT * FROM externallinks_global FORCE INDEX (APIINDEX1,APIINDEX2,APIINDEX3,APIINDEX4,APIINDEX5,APIINDEX6,APIINDEX7,APIINDEX8) LEFT JOIN externallinks_paywall ON externallinks_global.paywall_id=externallinks_paywall.paywall_id WHERE (live_stateIN (0, 7) OR externallinks_global.paywall_id IN (SELECT paywall_id FROM externallinks_paywall WHEREpaywall_statusIN (3)) ) ANDreviewed= 0 LIMIT 100000,1001;

For some reason it tries to use the REVIEWED index, instead of APIINDEX5. I tried to force it to use it, but then it decides to use no index. I'm a little lost. I'm clearly doing something wrong but I don't know what.

Here is the EXPLAIN from the above query:

+------+--------------+-----------------------+--------+---------------------------------------------------+---------------+---------+--------------------------------------------------+----------+----------+-------------+
| id   | select_type  | table                 | type   | possible_keys                                     | key           | key_len | ref                                              | rows     | filtered | Extra       |
+------+--------------+-----------------------+--------+---------------------------------------------------+---------------+---------+--------------------------------------------------+----------+----------+-------------+
|    1 | PRIMARY      | externallinks_global  | ALL    | APIINDEX3,APIINDEX4,APIINDEX6,APIINDEX2,APIINDEX5 | NULL          | NULL    | NULL                                             | 27193330 |   100.00 | Using where |
|    1 | PRIMARY      | externallinks_paywall | eq_ref | PRIMARY                                           | PRIMARY       | 4       | s51059__cyberbot.externallinks_global.paywall_id |        1 |   100.00 |             |
|    2 | MATERIALIZED | externallinks_paywall | ref    | PRIMARY,PAYWALLSTATUS                             | PAYWALLSTATUS | 1       | const                                            |      768 |   100.00 | Using index |
+------+--------------+-----------------------+--------+---------------------------------------------------+---------------+---------+--------------------------------------------------+----------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)

1 answer

  • answered 2017-06-17 18:30 zeppelin

    The problem is in your WHERE clause:

    WHERE (live_state IN (0, 7) OR externallinks_global.paywall_id IN (SELECT paywall_id FROM externallinks_paywall WHEREpaywall_statusIN (3)) ) AND reviewed=0
    


    The composite index your have:

    INDEX `APIINDEX5` (`live_state` ASC, `paywall_id` ASC, `reviewed` ASC)
    

    can be used to do queries by any leftmost prefix (combination of columns):

    If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows.

    But what you are trying to do is query by live_state OR paywall_id, making it independent of live_state and leaving Mysql unable to use the index in question.

    Here is how Mysql documentation illustrates it:

    Suppose that a table has the following specification:

    CREATE TABLE test (
        id         INT NOT NULL,
        last_name  CHAR(30) NOT NULL,
        first_name CHAR(30) NOT NULL,
        PRIMARY KEY (id),
        INDEX name (last_name,first_name)
    );
    

    ...

    However, the name index is not used for lookups in the following queries:

    SELECT * FROM test WHERE first_name='Michael';
    
    SELECT * FROM test
       WHERE last_name='Widenius' OR first_name='Michael';
    

    What you may try to do to overcome that is to split your query in two parts, merged with UNION.

    The first part will query by:

    live_state IN (0, 7) AND reviewed=0
    

    (should use INDEX APIINDEX6)

    and the second part will query by:

    paywall_id IN (...) AND reviewed=0
    

    (using INDEX PAYWALLID unless you add a new paywall_id+reviewed index).