Solutions for a Growing Database

Started by ericstuart, Mar 11, 2023, 01:24 AM


ericstuart (topic starter)

I have limited experience managing large projects, and I set up my LAMP stack by following online instructions. My MySQL database has now grown to 1.1 TB, with a single MyISAM table holding 340 million records.

Apache and MySQL are hosted on a Kimsufi server, but it's struggling to cope with an average load of 400 simultaneous INSERT connections. There have been times when the database went down because of MyISAM table locking, and my code is partly to blame. I believe sharding is a viable way for the project to keep growing within a budget of $250-300 per month, so I'd like to know what the optimal size of one shard is and which server characteristics it depends on. Or would migrating to a different DBMS be a better solution?


These are some suggestions to optimize your database:
1) Consider partitioning your data into separate tables according to ownerId or another relevant column, which can improve performance, save storage space, and make it easier to shard across multiple servers.
2) Use JSON archiving to reduce the amount of data stored in the database.
3) Archive old data by generating reports, caches, etc. for client requests, and moving the data itself to archive storage.
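4) Try another database management system, such as PostgreSQL, whose JSONB columns can be indexed and whose large values are compressed automatically, which may cut the space used by long varchar fields, or a NoSQL store such as MongoDB for potentially faster writes and better storage efficiency.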

Partitioning can also be used to group data by relevant columns, such as by day and owner, reducing index size and speeding up data retrieval. These tips have the potential to significantly reduce database size and improve performance, depending on your specific data and usage.
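To illustrate the idea of grouping data by day and owner, here is a minimal Python sketch that derives a per-partition table name in application code. The function name `partition_table`, the base name `events`, and the bucket count of 16 are assumptions made up for this example, not anything from the original setup.

```python
from datetime import date

OWNER_BUCKETS = 16  # assumed number of owner buckets per day (hypothetical)


def partition_table(base: str, owner_id: int, day: date) -> str:
    """Return the name of the table holding rows for this owner and day."""
    bucket = owner_id % OWNER_BUCKETS  # owners are spread over a fixed set of buckets
    return f"{base}_{day:%Y%m%d}_o{bucket:02d}"


# Rows for owner 1234 written on 2023-03-11 would go to this table.
print(partition_table("events", 1234, date(2023, 3, 11)))
```

Because each table only holds one day and one bucket of owners, its indexes stay small, and a whole day can later be archived or dropped as a unit.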


One approach to improving MySQL query performance is to examine the slow query log (mysql-slow.log), identify the heaviest queries, and optimize them. There are usually many server variables that can be tuned as well. It also helps to log all queries and check each one with EXPLAIN to confirm that it uses the right index. Removing unused or redundant indexes can free up a significant amount of space (up to 20-30 percent). In general, optimization should come first: consider sharding only once there are no further optimization options left.
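As a rough illustration of the slow-log review described above, the following Python sketch groups slow-log entries by query shape and ranks them by total time. It assumes the usual `# Query_time: ... Rows_examined: ...` header line and, as a simplification, that each statement fits on one line. The sample log, table names, and column names are invented.

```python
import re
from collections import defaultdict

# Header line written before each statement in the slow query log.
HEADER = re.compile(
    r"^# Query_time: (?P<seconds>[\d.]+)\s+Lock_time: [\d.]+\s+"
    r"Rows_sent: \d+\s+Rows_examined: (?P<rows>\d+)"
)

# Hypothetical excerpt; tables and columns are made up for the example.
SAMPLE = """\
# Time: 2023-03-11T01:24:00.000000Z
# Query_time: 4.200000  Lock_time: 0.000100 Rows_sent: 10  Rows_examined: 3400000
SET timestamp=1678497840;
SELECT * FROM records WHERE owner_id = 42 ORDER BY created_at DESC LIMIT 10;
# Query_time: 3.100000  Lock_time: 0.000100 Rows_sent: 10  Rows_examined: 3300000
SET timestamp=1678497850;
SELECT * FROM records WHERE owner_id = 7 ORDER BY created_at DESC LIMIT 10;
# Query_time: 1.500000  Lock_time: 0.000050 Rows_sent: 1  Rows_examined: 90000
SET timestamp=1678497860;
SELECT name FROM owners WHERE email = 'a@example.com';
"""


def normalize(sql: str) -> str:
    """Replace literals with '?' so statements of the same shape group together."""
    sql = re.sub(r"'[^']*'", "?", sql)
    sql = re.sub(r"\b\d+\b", "?", sql)
    return re.sub(r"\s+", " ", sql).strip()


def summarize(lines):
    """Return (pattern, count, total_seconds, rows_examined), slowest first."""
    totals = defaultdict(lambda: [0, 0.0, 0])
    pending = None  # (seconds, rows) from the most recent header line
    for line in lines:
        match = HEADER.match(line)
        if match:
            pending = (float(match["seconds"]), int(match["rows"]))
        elif pending is not None and line.strip() and not line.startswith("#"):
            if line.lower().startswith(("set timestamp", "use ")):
                continue  # bookkeeping statements, not the slow query itself
            entry = totals[normalize(line)]
            entry[0] += 1
            entry[1] += pending[0]
            entry[2] += pending[1]
            pending = None
    rows = [(p, c, s, r) for p, (c, s, r) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for pattern, count, seconds, examined in summarize(SAMPLE.splitlines()):
    print(f"{seconds:6.1f}s  x{count}  rows_examined={examined}  {pattern}")
```

A pattern that examines millions of rows to return a handful is usually the first candidate for an index check with EXPLAIN.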


For those seeking to improve MySQL performance, MySQL forks such as MariaDB are worth exploring. Partitioning tables can also help make better use of server resources and speed up queries. Other potential solutions include caching with Memcached or Redis, or moving specific tables or fields to MongoDB.
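The caching suggestion usually follows the cache-aside pattern: look in the cache first, and only query the database on a miss. Below is a minimal Python sketch of that pattern. The `TTLCache` class is an in-process stand-in written for this example, not a real Memcached or Redis client, and `load_owner_report` is a hypothetical placeholder for an expensive query.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-process stand-in for Memcached/Redis, for illustration only."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, calling loader() only on a miss or expiry."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader()  # the expensive database query would run here
        self._store[key] = (now + self._ttl, value)
        return value


calls = []


def load_owner_report():
    """Hypothetical placeholder for a heavy SELECT against the big table."""
    calls.append(1)
    return {"owner_id": 42, "rows": 1000}


cache = TTLCache(ttl_seconds=60)
cache.get_or_load("report:42", load_owner_report)
cache.get_or_load("report:42", load_owner_report)  # served from the cache
print("database hits:", len(calls))
```

Caching mostly relieves read traffic; it does not by itself reduce the INSERT load, so it complements rather than replaces the other suggestions.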


When dealing with big data, it's important to consider storage options that can accommodate millions of rows of information. For example, data such as a year's worth of store sales, monthly internal bank transactions, or years' worth of sociological research requires an appropriate storage system. There are a variety of popular big-data storage solutions, ranging from simple text documents suitable for one-time or test projects to full-fledged databases like MySQL, which can store and process hundreds of millions of records.

ClickHouse is another option designed for fast analytics and big-data storage. While it offers high-speed processing, it requires more advanced knowledge of data storage formats and query systems. Ultimately, choosing the right storage solution depends on the specific needs of the project, including the amount and type of data to be stored and how quickly access to that data is required.


Managing a large project can be challenging, especially when dealing with a growing MySQL database. Sharding can indeed be a viable option to handle the increasing load and ensure scalability within your budget. Sharding involves distributing your data across multiple database servers, which can help alleviate the performance issues you are currently facing.

The optimal size of one shard depends on various factors, including the specifics of your project, the data access patterns, and the hardware resources available. Generally, it's recommended to shard based on logical divisions that make sense for your application, such as by customer, region, or another relevant criterion. Each shard should ideally be small enough to be efficiently managed and scaled, but large enough to provide a significant performance improvement compared to a single server setup.
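As a sketch of sharding by a logical division such as customer, the Python snippet below maps a customer ID to one of a fixed set of shards with a stable hash. The shard names and the `shard_for` function are hypothetical, and hash-modulo routing is just one possible scheme, chosen here for brevity.

```python
import hashlib

# Hypothetical shard names; in practice these would be connection settings.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(customer_id: int) -> str:
    """Pick the shard for a customer using a hash that is stable across runs."""
    digest = hashlib.sha256(str(customer_id).encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


# All rows for one customer land on the same shard.
print(shard_for(42), shard_for(42), shard_for(43))
```

One caveat with plain modulo routing is that changing the number of shards remaps most customers, so data has to be moved. A lookup table or consistent hashing reduces that cost at the price of extra complexity.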

Regarding migrating to a different DBMS, it is worth considering if your current database management system is unable to meet your performance requirements even after sharding. However, keep in mind that transitioning to a different DBMS may involve additional challenges, such as rewriting queries and adapting your application code. It's important to weigh the potential benefits against the cost and effort required for migration.

To make an informed decision, consider evaluating the performance gains achievable through sharding and compare them with the benefits offered by alternative DBMS options. Additionally, assess the long-term scalability potential of your chosen solution to ensure it can accommodate future growth within your budget constraints.

Additional considerations when it comes to sharding and migrating to a different DBMS:

1. Sharding:
   - Choosing a sharding strategy: Determine how you want to divide your data across shards, considering factors like data distribution, query patterns, and maintenance requirements.
   - Shard key selection: Select a shard key that evenly distributes the data and balances the workload across your shards. Careful consideration of your application's access patterns is crucial in selecting an effective shard key.
   - Data consistency and integrity: Ensure that your sharding strategy maintains data consistency and integrity across shards, especially when dealing with related data or complex queries that span multiple shards.
   - Shard management: Consider how you will manage and monitor your shards efficiently, including data backups, recovery procedures, and load balancing mechanisms.

2. Migrating to a different DBMS:
   - Assessing compatibility: Evaluate whether your application code, queries, and data models can be easily migrated to the new DBMS, or if significant modifications will be required.
   - Performance considerations: Research the performance characteristics of the alternative DBMS and consider how it will handle your specific workload and query patterns. Ensure it offers adequate scalability options for future growth.
   - Cost and resources: Analyze the cost implications of migrating to a different DBMS, including licensing fees, hardware requirements, and any necessary retraining or consulting expenses.
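Returning to the shard key point above, one way to compare candidate keys before committing is to hash a sample of real key values and see how evenly the rows would spread. The Python sketch below does this on invented data. The `skew` helper, the four-shard count, and the sample keys (one owner with 70% of the rows) are all assumptions for illustration.

```python
import hashlib
from collections import Counter


def shard_index(key: str, shard_count: int) -> int:
    """Stable hash of the key, reduced to a shard number."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def skew(keys, shard_count: int) -> float:
    """Rows on the busiest shard divided by the mean; 1.0 is perfectly even."""
    counts = Counter(shard_index(k, shard_count) for k in keys)
    mean = len(keys) / shard_count
    return max(counts.values()) / mean


# Invented sample: 1000 rows, 700 of which belong to a single owner.
rows_by_owner = ["owner-1"] * 700 + [f"owner-{i}" for i in range(2, 302)]
rows_by_id = [f"row-{i}" for i in range(1000)]

print("skew when sharding by owner: ", round(skew(rows_by_owner, 4), 2))
print("skew when sharding by row id:", round(skew(rows_by_id, 4), 2))
```

A key that spreads rows evenly may still be a poor choice if most queries then have to touch every shard, so the skew number should be weighed against the access patterns.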