Multiple LEFT JOINs to the same table very slow

I have two tables, let's say a user table and a dates table. They look something like this:

User

ID_User | Title | Firstname | Surname | JobNumber
1       | Mr    | Bob       | Smith   | JOB001
2       | Mrs   | Bobbi     | Smythe  | JOB001
...
13000

Dates

ID_Date | ID_User | DateType | DateAssigned | JobNumber
1       | 1       | Intent   | 21-Jun-2016  | JOB001
2       | 1       | Reg      | 21-Apr-2017  | JOB001
3       | 1       | Flight   | 21-May-2017  | JOB001
4       | 2       | Intent   | 09-Dec-2016  | JOB001
5       | 2       | Flight   | 01-Jan-2017  | JOB001
...
5000

The unique index is ID_User+DateType+JobNumber.

There may be any amount of DateTypes.

When I run a query like the following, it takes ages:

select
  ID_User,
  Title,
  Firstname,
  Surname,
  JobNumber,
  DI.DateAssigned as Date_Intent,
  DR.DateAssigned as Date_Reg,
  DF.DateAssigned as Date_Flight
from
  User as U
  left join Dates as DI on U.ID_User = DI.ID_User
    and DI.JobNumber = "JOB001"
    and DI.DateType = "Intent"
  left join Dates as DR on U.ID_User = DR.ID_User
    and DR.JobNumber = "JOB001"
    and DR.DateType = "Reg"
  left join Dates as DF on U.ID_User = DF.ID_User
    and DF.JobNumber = "JOB001"
    and DF.DateType = "Flight"
where
  U.JobNumber = "JOB001"
order by
  U.Surname,
  U.Firstname;

Each JobNumber will have only 300 people in it, with a max of say 5 different date types.

Why does it take so long? We're talking about 2 minutes.

Is there another way of writing this?

Dates Table:

CREATE TABLE `ATL_V2_Assigned_Dates` (
  `ID_Date` bigint(7) unsigned NOT NULL AUTO_INCREMENT,
  `JobNumber` varchar(10) NOT NULL DEFAULT '',
  `ID_User` bigint(7) unsigned NOT NULL DEFAULT '0',
  `DateAssigned` datetime NOT NULL,
  `DateType` varchar(100) NOT NULL,
  `Comment` text NOT NULL,
  `Updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `Inserted` datetime NOT NULL,
  PRIMARY KEY (`ID_Date`),
  UNIQUE KEY `ID_Date` (`ID_Date`) USING BTREE,
  UNIQUE KEY `unq_idx` (`JobNumber`,`ID_User`,`DateType`) USING BTREE,
  KEY `JobNumber` (`JobNumber`) USING BTREE,
  KEY `ID_User` (`ID_User`) USING BTREE,
  KEY `DateType` (`DateType`) USING BTREE
) ENGINE=MyISAM AUTO_INCREMENT=3975 DEFAULT CHARSET=utf8;

UPDATE 12 Jan 2017

Very strange: the query is now running in 0.06 s. Here's the output from:

explain select
  U.ID_User,
  U.Title,
  U.Firstname,
  U.Surname,
  U.JobNumber,
  DI.DateAssigned as Date_Intent,
  DR.DateAssigned as Date_Reg,
  DF.DateAssigned as Date_Flight
from
  ATL_Users as U
  left join ATL_V2_Assigned_Dates as DI on U.ID_User = DI.ID_User
    and DI.JobNumber = "ACI001"
    and DI.DateType = "Deadline - Intention"
  left join ATL_V2_Assigned_Dates as DR on U.ID_User = DR.ID_User
    and DR.JobNumber = "ACI001"
    and DR.DateType = "Event - Registration"
  left join ATL_V2_Assigned_Dates as DF on U.ID_User = DF.ID_User
    and DF.JobNumber = "ACI001"
    and DF.DateType = "Deadline - Flight"
where
  U.JobNumber = "ACI001"
order by
  U.Surname,
  U.Firstname;

+----+-------------+-------+--------+------------------------------------+-----------+---------+------------------------------------+------+----------------------------------------------------+
| id | select_type | table | type   | possible_keys                      | key       | key_len | ref                                | rows | Extra                                              |
+----+-------------+-------+--------+------------------------------------+-----------+---------+------------------------------------+------+----------------------------------------------------+
|  1 | SIMPLE      | U     | ref    | JobNumber                          | JobNumber | 32      | const                              |  506 | Using index condition; Using where; Using filesort |
|  1 | SIMPLE      | DI    | eq_ref | unq_idx,JobNumber,ID_User,DateType | unq_idx   | 342     | const,cclliveo_atl.U.ID_User,const |    1 | Using where                                        |
|  1 | SIMPLE      | DR    | eq_ref | unq_idx,JobNumber,ID_User,DateType | unq_idx   | 342     | const,cclliveo_atl.U.ID_User,const |    1 | Using where                                        |
|  1 | SIMPLE      | DF    | eq_ref | unq_idx,JobNumber,ID_User,DateType | unq_idx   | 342     | const,cclliveo_atl.U.ID_User,const |    1 | Using where                                        |
+----+-------------+-------+--------+------------------------------------+-----------+---------+------------------------------------+------+----------------------------------------------------+

I don't know what I/we've done. Can someone point me to whoever you think provided the answer, and I'll tick it. Thanks, guys.

4 answers

  • answered 2017-01-11 14:25 Thorsten Kettner

    You are probably lacking appropriate indexes. Try:

    create index idx_user on User (JobNumber, ID_User);
    create index idx_dates on Dates (JobNumber, DateType, ID_User, DateAssigned);
    

  • answered 2017-01-11 14:25 Nandakumar

    This is the right way to join the same table multiple times; I'm not sure about the time taken. Even if you queried 30,000 records, it wouldn't take 2 minutes. This must be due to some other problem, such as multiple connections to the database.
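
    One way to check for such external causes (a sketch, using the table/column names from the question and a trimmed-down version of its query) is to look at what else the server is doing and what plan the optimizer picks:

    ```sql
    -- List running connections/queries that might be competing for the table
    SHOW FULL PROCESSLIST;

    -- Ask the optimizer how it executes the query; a missing index shows up
    -- as type=ALL with a large rows estimate on the Dates table
    EXPLAIN
    SELECT U.ID_User, DI.DateAssigned
    FROM User AS U
    LEFT JOIN Dates AS DI
        ON U.ID_User = DI.ID_User
       AND DI.JobNumber = 'JOB001'
       AND DI.DateType = 'Intent'
    WHERE U.JobNumber = 'JOB001';
    ```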

  • answered 2017-01-11 14:25 P.Salmon

    You could try conditional aggregation to avoid all those joins. Given:

    drop table if exists Userjobs;
    create table userjobs (ID_User int, Title varchar(10), Firstname varchar(10), Surname varchar(10), JobNumber varchar(10));
    insert into userjobs values
    (1       , 'Mr'   ,  'Bob'  ,      'Smith'  ,  'JOB001'),
    (2       , 'Mrs'  ,  'Bobbi',      'Smythe' ,  'JOB001');
    
    
    drop table if exists jobDates;
    create table jobdates(ID_Date int, ID_User int, DateType varchar(10), DateAssigned date, JobNumber varchar(10));
    insert into jobdates values
    (1       , 1       , 'Intent'   , '2016-06-21'  , 'JOB001'),
    (2       , 1       , 'Reg'      , '2017-04-21'  , 'JOB001'),
    (3       , 1       , 'Flight'   , '2017-05-21'  , 'JOB001'),
    (4       , 2       , 'Intent'   , '2016-12-09'  , 'JOB001'),
    (5       , 2       , 'Flight'   , '2017-01-01'  , 'JOB001');
    
    MariaDB [sandbox]> select
        ->   u.ID_User,
        ->   Title,
        ->   Firstname,
        ->   Surname,
        ->   u.JobNumber,
        ->   max(case when datetype = 'intent' then dateassigned else null end) as intent,
        ->   max(case when datetype = 'reg' then dateassigned else null end) reg,
        ->   max(case when datetype = 'flight' then dateassigned else null end) as flight
        -> from
        ->   Userjobs as U
        -> left join jobDates as jd on U.ID_User = jd.ID_User
        ->     and jd.JobNumber = u.jobnumber
        -> where u.jobnumber = 'JOB001'
        -> group by   u.ID_User,
        ->   Title,
        ->   Firstname,
        ->   Surname,
        ->   u.JobNumber;
    +---------+-------+-----------+---------+-----------+------------+------------+------------+
    | ID_User | Title | Firstname | Surname | JobNumber | intent     | reg        | flight     |
    +---------+-------+-----------+---------+-----------+------------+------------+------------+
    |       1 | Mr    | Bob       | Smith   | JOB001    | 2016-06-21 | 2017-04-21 | 2017-05-21 |
    |       2 | Mrs   | Bobbi     | Smythe  | JOB001    | 2016-12-09 | NULL       | 2017-01-01 |
    +---------+-------+-----------+---------+-----------+------------+------------+------------+
    2 rows in set (0.00 sec)
    

  • answered 2017-01-11 14:25 Rick James

    U needs INDEX(JobNumber, Surname, Firstname). That should cover the WHERE and ORDER BY, thereby avoiding a 'filesort'.
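
    A sketch of that index (assuming the real table name `ATL_Users` from the question's EXPLAIN output):

    ```sql
    -- Covers the WHERE (JobNumber = ...) and the ORDER BY (Surname, Firstname),
    -- so rows can be read in sorted order and the filesort is skipped
    ALTER TABLE ATL_Users
      ADD INDEX idx_job_name (JobNumber, Surname, Firstname);
    ```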

    For Dates, you have UNIQUE(ID_User, DateType, JobNumber), correct? Let's get rid of ID_Date from that table, then replace the UNIQUE with

    PRIMARY KEY(JobNumber, ID_User, DateType)
    

    This will make the lookups more efficient because the bottom of the BTree will contain the DateAssigned and the three rows needed will be adjacent because of the "clustering" of the PK.
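
    A sketch of that restructuring against the `ATL_V2_Assigned_Dates` DDL shown in the question (the switch to InnoDB is my own assumption: MyISAM does not cluster rows by the primary key, so the adjacency argument only holds for InnoDB; test on a copy of the table first):

    ```sql
    ALTER TABLE ATL_V2_Assigned_Dates
      DROP COLUMN ID_Date,           -- also removes the old PK and the duplicate UNIQUE on it
      DROP KEY unq_idx,              -- superseded by the new PK below
      DROP KEY JobNumber,            -- redundant: leftmost prefix of the new PK
      DROP KEY ID_User,
      DROP KEY DateType,
      ADD PRIMARY KEY (JobNumber, ID_User, DateType),
      ENGINE = InnoDB;               -- assumption: needed for PK clustering
    ```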

    Unless you have some other queries (read or modify) touching Dates, there should be no other indexes on that table.

    How big are these tables? You realize that you will be reading both of them entirely. However, my suggestions will lead to reading each row only once, not many times.