This third part of MySQL optimizing tutorial mini series shows an importance of table order during JOIN. The difference between regular JOIN and optimized one is dramatic, so keep reading! This post may save your time and headaches in some MySQL problem handling.
Once upon a time I bought an application for article submissions, written in PHP. Yes, it sounds incredible, but it happened. Primary reason was my laziness to write another application from scratch, secondary one was a time pressure. Nevertheless, I did so. Today I knew that it was a mistake, because the time I spent correcting its issues was at least the same or very close to the time I would have spent developing an application from scratch. The bug described in this post was just one of tens found in this commercial (and not cheap!) web application. But let’s move deeper!
Neverending Page Loads
At beginning the application worked fine, but along with increasing number of database (MySQL) entries, the page loading time began increase too. After couple of months, when key database table had about 50.000 entries, the page loads exceded acceptable time and visitors patience. The site was absolutely useless! First I started to investigate security of the web site and server as itself. No security issues were found, so I moved into the application source code. I enabled MySQL log_slow_queries as described in my previous part of this MySQL optimization mini series.
What a Surprise!
After couple of hours I checked MySQL slow queries log and found very interesting entries in there. As I mentioned earlier, I set up the long_query_time to 5 seconds but the query_time found in slow_log was pretty far from this treshold. You probably wouldn’t believe (I did not too) but number which I saw there was 00:56:14. Yes, you can see right, it was almost one minute to perform MySQL query! I’m sure you all agree with me, that this query time is strongly NOT acceptable and must be fixed as soon as I lose all visitors.
What Now?
Once I found potencial problem described in paragraph above I started to investigate the problematic query. Slow log field named sql_text showed me that doubtful query and I immediately had my enemy in front of me.
select *, unix_timestamp(StartDate) as SD from al_articles as art inner join al_users as u on art.AuthorID = u.UserID left join al_categoryassociations as ca on art.ArticleID = ca.ArticleID inner join al_categories as c on ca.CategoryID = c.CategoryID where art.Featured=0 and art.Visible=1 and c.inrecent=1 and art.Status=1 and unix_timestamp(art.StartDate) <= unix_timestamp(now()) and art.IsFinished=1 and (unix_timestamp(art.ExpiryDate) >= unix_timestamp(now()) or art.EnableExpiry=0) order by art.StartDate DESC, art.ArticleID DESC limit 7
First I checked whether all indexes are set up properly, but there seemed to be everything OK. I’m not going to bother you with long story about hours of MySQL query, manual, google outputs and source code research. I rather jump at the end, to share the solution with you.
Remedial Action
Well, after couple of hours I found that problem will probably be the JOIN, more exactly – the MySQL optimizer puts the tables in the wrong order while joining them. This causes those neverending query executions. What I did now, is just add the STRAIGHT_JOIN into the SELECT statement as shown below.
select STRAIGHT_JOIN *, unix_timestamp(StartDate) as SD from al_articles as art inner join al_users as u on art.AuthorID = u.UserID left join al_categoryassociations as ca on art.ArticleID = ca.ArticleID inner join al_categories as c on ca.CategoryID = c.CategoryID where art.Featured=0 and art.Visible=1 and c.inrecent=1 and art.Status=1 and unix_timestamp(art.StartDate) <= unix_timestamp(now()) and art.IsFinished=1 and (unix_timestamp(art.ExpiryDate) >= unix_timestamp(now()) or art.EnableExpiry=0) order by art.StartDate DESC, art.ArticleID DESC limit 7
I was pretty excited reloading the page after this correction. It was done in a second and slow_log didn’t write any entry, which means that execution had to be under 5 seconds! What a difference comparing with 56 seconds, doesn’t it? I was pretty happy to solve this problem and saved my visitors!
Summary
As you can see from the story above, order of table joins may have dramatic influence on MySQL performance. The only sentence I found in MySQL manual was: STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order. But you never know whether your case is not that one of those few! So, once you encounter disproportionately long page loads, I suggest to read my previous MySQL optimization article and let MySQL log slow queries. You’ll get extremely valuable information which could lead to fix an issue. Use the STRAIGHT_JOIN if it’s necessary to tell MySQL to join the tables in the order that they are specified in your statement.
unix_timestamp(art.StartDate) = unix_timestamp(now())
will NEVER use the fields index,
because the date is calculated/converted.
The indexed field must be “pure”,
for instance, if you have an indexed datetime field and do date comparison with it, it won’t uses the index on that field.
You should rewrite it as:
art.StartDate >= current_date
art.ExpiryDate >= current_date
or if it’s a DateTime
art.StartDate >= now()
art.ExpiryDate >= now()
If you really need some calculation to be done, then you must rewrite the condition such that the indexed field is pure and the condition is not and can be computed before the index is used…
For instance:
DATE_SUB(CURRENT_DATE,INTERVAL 30 DAY) <= art.ExpiryDate;
== (constant <= field)
is faster than:
CURRENT_DATE <= DATE_ADD(art.ExpiryDate,INTERVAL 30 DAY);
== (constant <= computed(field))
Also, you are using two fields to be ordered by which are not "explicitly" being stated in your SELECT uncomputed…
Normally you should never uses *,
except for debugging, if you upgrade your schema, then your application logic may break.
Did you try EXPLAIN SELECT ?
I absolutely understand what you are writing about, Fred. But in this case I’ve just pasted whole statement as it was originaly writen by the authors of that software. I realize that there are more deficiencies, but this one was used for demonstration purposes only. Nevertheless, I appreciate your comment, I’m sure it will be helpfull for all readers.
Do you proof read your articles?
If so, you should have someone else do it for you.
Good topic though.
Why? Is there something wrong with it?
Simple but great stuff. I hit with the same problem and your article helped me. Thanks Teddy.
Join order does not matter in MYSQL. The optimizer does a quite good job at figuring out the best join order. Do you have indexes on the columns you are using in the join condition? If not, MySQL has no data to use for choosing the best join order and seems to just use the same order as used in the query. I did not consider
such a case as you usually have those indexes to get those such queries done
in a reasonable time.If MySQL gets the join order wrong although there are indexes, try to run OPTIMIZE TABLE on the tables to get up to date statistics.
Great. I was in trouble by the same problem and your article helped me very much.
Thank you in advance.
Hello Teddy,
thank you very much for your article which has been very helpful to me.
I did not know about the STRAIGHT_JOIN – command in MySQL. I had written down my joins in the best order, used indexes and multiple join conditions to decrease the number of rows, and wondered why the execution time was still so long – until I read, that the tables can be put into the wrong order by MySQL.
After that I added STRAIGHT_JOIN to my command and voila – the execution time decreased from 2 minutes to 2 seconds…
Thank you so much.
Greetings from Germany.
I’ve never commented on an article before but this one saved my life. My queries were running so slow joining up 5 tables at a time and the STRAIGHT_JOIN saved the day. Henry, you are incorrect as it makes a world of difference in my queries running mySQL 5.0.
Thanks
I share Mack’s comment. I also never commented in an article and you truly saved me so much time. My query took more than 4 minutes and with your hint, now it takes only 1.2 seconds.
Thanks a lot
It was also useful for me, not for a performance issue, but because I wanted to keep the order of the records in my first table while attaching some information from other tables. STRAIGHT_JOIN did that for me.
Thanks!
P.S. I don’t know if you did that, but you could’ve helped the company and yourself by giving them your improvements, and negotiating some benefit they’d give you in return. Hmm?
Holly s**t! I spent several days trying to optimize one of the query which joins one large table multiple times. This query took longer than 30 minute!!! And what finally helped? It was enough to add “STRAIGHT_JOIN” and the query takes only 5 seconds! Many thanks for this tip!
Thanks for sharing your solution!
My query executed in 75 seconds,
the EXPLAIN showed me the tables were joined in reverse order, while the ORDER BY needed only columns from the first table in the query.
Adding STRAIGHT_JOIN to the query resulted in a execution time of 0,08 seconds, yes I can optimize my queries, and yes, mysql can be wrong sometimes 🙂