If you like DNray Forum, you can support it by - BTC: bc1qppjcl3c2cyjazy6lepmrv3fh6ke9mxs7zpfky0 , TRC20 and more...

 

Long Query Execution Time in PHP + MySQL Project

Started by cbinstrument, Oct 19, 2023, 01:00 AM


cbinstrument (topic starter)

Good afternoon. To my surprise, one of the projects I'm working on is growing rapidly, but we now have a problem with the database. We are using a PHP + MySQL stack.

In particular, there is a table that already contains around 25 million records. Here is the structure of the table:

+---------+---------------+------+-----+---------+-------+
| Field   | Type          | Null | Key | Default | Extra |
+---------+---------------+------+-----+---------+-------+
| task_id | int(9)        | NO   | PRI | NULL    |       |
| user_id | int(9)        | NO   | PRI | NULL    |       |
| checked | enum('0','1') | NO   | MUL | 0       |       |
| taken   | enum('0','1') | NO   | MUL | 0       |       |
+---------+---------------+------+-----+---------+-------+

The issue we are facing is that the execution time of the query is very long, approximately 3 minutes, which is unacceptable.

I have read about indexes and have made sure everything is properly set up:

- The first index (task_id) cannot be removed, since the table's primary key is based on it.
- The hardware seems capable enough for faster performance: Intel® Core™ i7-2600, 16 GB of RAM, and the entire table should fit in memory.

I have also tuned the machine with the mysqltuner script, which has always served me well until this high-load situation.

Could you please advise me on what I might have done wrong and how to fix this? Thank you.


bombaysagar

It's quite possible that despite having an index, MySQL may not be using it optimally. This may be due to suboptimal queries, certain MySQL server settings, or table structure. That being said, below are some strategies to diagnose and improve the situation:

Analyze Your Query: Post the problematic query, and we can help you optimize it. It might be using suboptimal joins, failing to take full advantage of existing indexes, or creating temporary tables on disk rather than in memory, which can slow things down significantly.

EXPLAIN Query: Use the EXPLAIN statement in front of your problematic query. This command will give you insight into how MySQL is executing the query, including whether it's using your indices. If indices are not being used as expected, you may need to refactor your query.
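For example, taking the tasks_pending query posted later in this thread as a stand-in for your slow query, you would run something like:

```sql
-- Prefix the slow query with EXPLAIN; substitute your actual statement.
EXPLAIN
SELECT task_id
FROM   tasks_pending
WHERE  checked = '0' AND taken = '0'
LIMIT  50;
```

In the output, pay particular attention to the `type` column (ALL means a full table scan), `key` (which index, if any, was chosen), and `rows` (the optimizer's estimate of rows examined).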

Normalize Your Data: From your table structure, it appears that a task can have multiple users and each user can have multiple tasks (task_id and user_id together form a composite primary key). This is a many-to-many relationship, and this table is effectively the junction table for it. Ensure you're following normalization practices and avoid potential redundancy.

Adjusting MySQL Configuration: Review and possibly adjust your MySQL configuration settings. Look at variables like innodb_buffer_pool_size (commonly set to about 70-80% of a dedicated server's memory) and, on MySQL 5.7 and earlier, query_cache_size and query_cache_type (note the query cache was removed entirely in MySQL 8.0). MySQL Tuner is great, but there may still be some room for manual adjustments.
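As a minimal my.cnf sketch for a dedicated 16 GB box like the one described above (the values are illustrative starting points, not tuned recommendations):

```ini
[mysqld]
# ~70-80% of 16 GB RAM on a dedicated database host
innodb_buffer_pool_size = 12G
innodb_log_file_size    = 1G
# Query cache applies to MySQL 5.7 and earlier only (removed in 8.0):
# query_cache_type = 1
# query_cache_size = 64M
```

Benchmark after each change rather than applying several at once.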

Index Tuning: You mentioned that your checked and taken fields are indexed. Since these are enum types with only two possible values, an index may not help much in terms of performance due to low cardinality, and might even degrade performance due to the overhead of maintaining the index. Consider dropping these indices and see if performance improves.
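A sketch of how you might test that (the index names are assumptions; check the real names first):

```sql
-- List the table's indexes and their names/cardinality first.
SHOW INDEX FROM tasks_pending;

-- Drop the low-cardinality single-column indexes and re-measure.
ALTER TABLE tasks_pending
  DROP INDEX checked,
  DROP INDEX taken;
```

If performance regresses, the indexes can be recreated just as easily.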

Partitioning Your Table: For larger tables, sometimes it becomes beneficial to partition the data based on certain frequently accessed columns (one of the primary keys, maybe). But be careful, partitioning comes with its set of considerations.
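One illustrative sketch, hash-partitioning on task_id (this satisfies MySQL's rule that every unique key, including the primary key, must contain the partitioning column, since task_id is part of the composite primary key):

```sql
-- Illustrative only: spread rows across 16 hash partitions by task_id.
-- Rewrites the whole 25M-row table, so schedule a maintenance window.
ALTER TABLE tasks_pending
PARTITION BY HASH(task_id)
PARTITIONS 16;
```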

Upgrading Hardware: Lastly, while hardware doesn't seem to be a bottleneck, it's worth mentioning anyway: if your data growth is aggressive and persistent, you may need to consider scaling up your hardware or moving to a distributed DBMS that can handle larger quantities of data more efficiently.


Database Optimization Techniques:

Hold Off On Full-Table Counts: Counting rows in a large InnoDB table is slow because it has to scan an index. Note that COUNT(*), COUNT(1), and COUNT(pk) are optimized identically in modern MySQL, so switching between them won't help; the real win is avoiding unfiltered counts, for example by maintaining a running counter or only counting over an indexed, filtered range.

Query Optimization: Make sure you're restricting the scope of your queries as much as possible. If you're requesting more data than you need, you'll cause unnecessary load on the server. Use LIMIT when you can, and avoid SELECT *.

Batched Insertions: If the majority of your load comes from inserts rather than selects, look into batching them. (The old INSERT DELAYED command let an insert return immediately and queued the row, but it only worked with MyISAM-style engines, was deprecated in MySQL 5.6, and was removed in 5.7, so don't build on it.)
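A rough sketch of the multi-row batching alternative (the values are made up for illustration):

```sql
-- One statement inserting many rows cuts per-statement parse/commit
-- overhead dramatically compared to N single-row INSERTs.
INSERT INTO tasks_pending (task_id, user_id, checked, taken)
VALUES (1001, 1, '0', '0'),
       (1001, 2, '0', '0'),
       (1002, 1, '0', '0');
```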

Use IN() instead of OR in your queries: If you have queries that use the OR operator, consider changing it to IN(). Using IN() can often be computationally less expensive than OR.
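For example, these two queries are logically equivalent, but the IN() form gives the optimizer a simple list it can turn into index range lookups:

```sql
-- OR chain: harder for the optimizer to plan.
SELECT * FROM tasks_pending
WHERE  task_id = 1 OR task_id = 7 OR task_id = 42;

-- Equivalent IN() form: typically planned as index range access.
SELECT * FROM tasks_pending
WHERE  task_id IN (1, 7, 42);
```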

Server Adjustments:

Check MySQL's slow query log: This can be an excellent source of information to find out which queries are taking a long time to execute. You can then focus on optimizing these as they are likely to give you the best performance boost.
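Enabling it at runtime looks roughly like this (the file path is an example; persist the same settings in my.cnf so they survive a restart):

```sql
SET GLOBAL slow_query_log      = 'ON';
SET GLOBAL long_query_time     = 1;   -- log anything slower than 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
```

The mysqldumpslow utility that ships with MySQL can then aggregate the log by query pattern.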

SSD vs. HDD: SSDs are much faster than spinning hard drives for random I/O. If you're not already using one, moving the database to a solid-state drive could improve performance.

About Your Schema:

Change ENUM to TINYINT: ENUMs are internally stored as integers, but they come with some overhead due to their string-like behavior. Consider using TINYINT instead, if practical.
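A sketch of that conversion. One caution: in numeric context MySQL treats an ENUM as its 1-based internal index rather than its label, so converting through a string type first is the safe way to preserve the '0'/'1' values (rewriting 25M rows takes time and locks, so use a maintenance window):

```sql
-- Step 1: ENUM -> CHAR keeps the literal '0'/'1' labels.
ALTER TABLE tasks_pending
  MODIFY checked CHAR(1) NOT NULL DEFAULT '0',
  MODIFY taken   CHAR(1) NOT NULL DEFAULT '0';

-- Step 2: CHAR -> TINYINT converts '0'/'1' to the numbers 0/1.
ALTER TABLE tasks_pending
  MODIFY checked TINYINT UNSIGNED NOT NULL DEFAULT 0,
  MODIFY taken   TINYINT UNSIGNED NOT NULL DEFAULT 0;
```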

Primary Key consideration: Depending on your use case, you might want to reconsider the composite primary key. Composite keys can sometimes slow down lookups on large data sets compared to a single-column surrogate key.


Advanced techniques you could consider:

Create a Summary Table: If you have read-heavy applications with lots of aggregate queries, creating a table that summarizes your data could be beneficial. An example could be daily or monthly summaries, depending on the most common queries for your app.
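As a hypothetical sketch for this thread's schema (the task_summary table and its columns are invented for illustration), a per-task count of pending rows could look like:

```sql
-- Hypothetical summary table, refreshed periodically (or kept current
-- by application logic / triggers) so hot queries never scan 25M rows.
CREATE TABLE task_summary (
  task_id       INT UNSIGNED NOT NULL PRIMARY KEY,
  pending_users INT UNSIGNED NOT NULL
);

INSERT INTO task_summary (task_id, pending_users)
SELECT task_id, COUNT(*)
FROM   tasks_pending
WHERE  checked = '0' AND taken = '0'
GROUP  BY task_id;
```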

Archiving Data: If the table contains historical data that's rarely accessed, consider moving this data to an archive table.
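A rough sketch of batched archiving, assuming rows with checked='1' and taken='1' count as fully processed (the archive table name and batch size are illustrative):

```sql
-- Archive table with the same structure as the source.
CREATE TABLE tasks_archive LIKE tasks_pending;

-- Copy a batch of processed rows.
INSERT INTO tasks_archive
SELECT * FROM tasks_pending
WHERE  checked = '1' AND taken = '1'
LIMIT  10000;

-- Delete only the rows that were actually copied, by joining on the key.
DELETE t
FROM   tasks_pending t
JOIN   tasks_archive a USING (task_id, user_id);
```

Small batches keep lock times short on a live table.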

Introducing Caching Layer: Introduce a caching system to your application (like Redis or Memcached). This way, you can store the results of common queries in the cache, drastically reducing the computing needs and the risk of hitting the database every time.

Using MySQL Full-Text Search Indexes: If your performance issues are related to text search, consider using MySQL's full-text search indexes. These can be beneficial for speeding up text-based searches.

Right Size Your Fields: Ensure your field types are appropriately sized for the data contained within them. To save space, you could for example change INT fields that don't contain overly large values to MEDIUMINT or SMALLINT.

Using Stored Procedures: Offloading some of the heavy processing to the database itself might allow for optimization by reducing network traffic between your server and database.

Scale-Up or Scale-Out: If updating queries and indexes doesn't work and you're still facing issues, then you may need to consider scaling up (adding more resources like CPU, RAM to the current database server) or scaling out (distributing your database across multiple servers).

refkaz

task_id_2 is redundant: any query that uses it can equally use the leading portion of the primary key. It duplicates an existing index, so it should be removed.

Standalone indexes on checked and on taken provide little benefit on their own.

The composite index (checked, taken, task_id) suggested above is worth experimenting with. It may improve query performance by enabling more efficient data retrieval.
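A sketch of creating that index (the index name is illustrative):

```sql
-- Covering index for the "unchecked, untaken" lookup: the query can be
-- answered from the index alone, without touching the table rows.
ALTER TABLE tasks_pending
  ADD INDEX idx_checked_taken_task (checked, taken, task_id);
```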

Peertan

Consider a composite index on (checked, taken), or perhaps even (task_id, checked, taken).

This might optimize your database performance.

You could also alter your query to reflect something akin to this:
select task_id from tasks_pending where checked='0' and taken='0' group by task_id limit 50;

(Note the quotes: with ENUM('0','1') columns, comparing against the bare number 0 matches the enum's internal index rather than the '0' label, so checked=0 would return nothing.)

This is an efficient way to retrieve tasks that have not been checked or taken yet. By limiting the output to 50, you ensure that you're only tackling manageable amounts of data at a time. The "group by" is there to deduplicate task_id: since the primary key is (task_id, user_id), the same task can appear once per user.

