
Creating a database that contains 100,000,000 (100 million) records

Started by yangss01, Apr 19, 2023, 12:12 AM


yangss01 (Topic starter)

We are currently setting up a database containing 100,000,000 records using PHP and MySQL. After two days of inserting with this approach, it does not look feasible to finish the task in a timely manner.
We are exploring other options and wondering whether there is another database we could use instead. Our current approach is a loop that inserts records one at a time, but it takes too much time to process.

We need to find a more efficient method to complete the task quickly. Our main objective is to test the record search function, which searches two fields of the database: code and activation status.
So far we have only managed to add 1 million records in two days, and with a lot of trouble, leaving 99 million more to go.

iplaytheme

1) Show the structure of your table.
2) Show a sample of the data you are loading.

The most efficient method is to load the data directly from a file with LOAD DATA INFILE 'mydata.csv' INTO TABLE `xxx`, which will populate the table very quickly.
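For example, a minimal PHP/PDO sketch (the table name codes, the columns code and activated, and the CSV path are my assumptions, not something from this thread; LOAD DATA LOCAL INFILE also has to be allowed by both the client and the server):

<?php
// Bulk-load a prepared CSV in one statement instead of row-by-row INSERTs.
// Assumed table: codes(code, activated); assumed file: /tmp/mydata.csv.
$pdo = new PDO(
    'mysql:host=127.0.0.1;dbname=testdb;charset=utf8mb4',
    'user',
    'password',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true]   // needed for the LOCAL variant
);

$pdo->exec("
    LOAD DATA LOCAL INFILE '/tmp/mydata.csv'
    INTO TABLE codes
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\\n'
    (code, activated)
");

With the non-LOCAL form the file has to sit on the database server itself and the MySQL account needs the FILE privilege.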

Teeproria

There is a well-known technique for efficiently loading large amounts of data into MySQL. In one write-up the author generated 80 million records of about 4 KB each, wrote them to a RAM disk, and then loaded them into MySQL with LOAD DATA. That was much faster than INSERT INTO, which would have taken around 11 hours, and the author recommends LOAD DATA for exactly this kind of task.
The only requirement is that the file must be local to the MySQL server.
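A rough sketch of that flow in PHP, under the assumption that /dev/shm (the usual RAM-backed tmpfs on Linux) is available and that the target table is again a hypothetical codes(code, activated):

<?php
// Generate one chunk of random test data on a RAM disk, then bulk-load it.
$path = '/dev/shm/codes_chunk.csv';
$rows = 1000000;                              // one million rows per chunk

$fh = fopen($path, 'w');
for ($i = 0; $i < $rows; $i++) {
    // random 16-character code plus a 0/1 activation flag
    fputcsv($fh, [bin2hex(random_bytes(8)), random_int(0, 1)]);
}
fclose($fh);

$pdo = new PDO('mysql:host=127.0.0.1;dbname=testdb', 'user', 'password',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true]);
$pdo->exec("
    LOAD DATA LOCAL INFILE '$path'
    INTO TABLE codes
    FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'
    (code, activated)
");

unlink($path);   // free the RAM disk before writing the next chunk

Repeating that in chunks (say 100 runs of 1 million rows each) keeps the RAM-disk file small while still avoiding per-row INSERT overhead.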

Ghazala

Good day. You can choose any DBMS for this kind of load as long as it supports table partitioning. Postgres is a great option, although it has some nuances depending on how you structure your SQL queries; like any other database, it has its own features that you should be aware of.

Using a composite category+key value as the primary key instead of a numeric ID is not recommended, especially since the category alone is already 100 characters long.

Since this is the primary key, an index will be built on it, and looking up a key in that index means comparing all 100 characters. It's not critical, but it's not an ideal approach.

If the category value repeats, it's best to normalize the table: move the category values into a separate lookup table and store a foreign key (the category id) in the MainTable table.
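Something like this, schema-wise; a sketch only, where the MainTable name comes from the post above and everything else, including the composite index on code + activation status for the search the topic starter described, is an assumption:

<?php
// Hypothetical normalized schema: categories live in their own table and
// MainTable stores a compact integer foreign key plus a surrogate primary key.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=testdb', 'user', 'password');

$pdo->exec("
    CREATE TABLE Category (
        id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(100) NOT NULL,
        UNIQUE KEY uq_category_name (name)
    )
");

$pdo->exec("
    CREATE TABLE MainTable (
        id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        category_id INT UNSIGNED NOT NULL,
        code        VARCHAR(64)  NOT NULL,
        activated   TINYINT(1)   NOT NULL DEFAULT 0,
        KEY idx_code_activated (code, activated),
        CONSTRAINT fk_maintable_category
            FOREIGN KEY (category_id) REFERENCES Category (id)
    )
");

Lookups then compare a small integer instead of a 100-character string, and the (code, activated) index serves the search the topic starter needs.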

greenNonstick

It sounds like you're facing some challenges with your current approach of using PHP and MySQL to set up a database with 100 million records. There are a few alternative options you can consider to improve efficiency.

One option is to explore databases that are known for handling large datasets efficiently. Popular choices for big data include MongoDB and Apache Cassandra, as well as the wider Apache Hadoop ecosystem. These systems are designed to scale horizontally and can handle high volumes of data more effectively than a traditional relational database like MySQL.

Another approach is optimizing your existing code. Without details about your implementation it's hard to be precise, but some general best practices include batch processing, appropriate indexing, query optimization, and efficient memory usage.

Additionally, you mentioned that you have been using a loop to add records, but it's taking too much time. In situations like these, it's often helpful to use bulk insertion methods provided by the database or its corresponding libraries. This allows you to insert multiple records in a single query, significantly improving the efficiency of data insertion.
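If you stay with plain INSERTs, here is a hedged example of what "multiple records in a single query" can look like with PDO; the table and column names are assumptions again:

<?php
// Batched multi-row INSERTs: 1,000 rows per statement inside one transaction.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=testdb', 'user', 'password',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$batchSize    = 1000;
$placeholders = rtrim(str_repeat('(?, ?),', $batchSize), ',');
$stmt = $pdo->prepare("INSERT INTO codes (code, activated) VALUES $placeholders");

$pdo->beginTransaction();
for ($batch = 0; $batch < 100; $batch++) {       // 100 x 1,000 = 100,000 rows
    $params = [];
    for ($i = 0; $i < $batchSize; $i++) {
        $params[] = bin2hex(random_bytes(8));    // code
        $params[] = random_int(0, 1);            // activation status
    }
    $stmt->execute($params);
}
$pdo->commit();

This will still be slower than LOAD DATA at the 100-million scale, but it is usually far faster than committing one INSERT per row.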

Lastly, it's essential to ensure that the hardware and infrastructure you're using can support the task at hand. If you're working with such a large dataset, make sure you have enough resources, such as sufficient server memory and processing power, to handle the workload efficiently.

Overall, I would recommend exploring alternative databases designed for handling big data, optimizing your code, and utilizing bulk insertion methods to improve the efficiency of setting up your database with 100 million records.

