In which DB is it better to store minute-by-minute cryptocurrencies values

Started by JanviArora, Mar 27, 2023, 03:51 AM


JanviArora (Topic starter)

The CoinMarketCap API is used to gather data for analysis in the system. Currently, the collected data is being displayed in the form of graphs.
Considering the rate at which new records are added (1,600 per minute, which works out to roughly 2.3 million per day), we are weighing the optimal storage solution.
Although we initially planned to use MySQL for storage, we believe there might be a more appropriate solution for handling these specific tasks.
Thus, we are seeking guidance from individuals experienced in database management and Big Data.

RickyChhajed

The choice of storage solution depends on various factors such as the database structure, data types, and planned operations with their frequencies.

When it comes to handling 1,600 records per minute, MySQL can handle this load quite well. How many rows you can address depends on the key type. With an unsigned INT primary key, up to 4,294,967,295 records fit before the key is exhausted (a BIGINT raises that ceiling to 2^64 − 1). At the stated rate, an unsigned INT key would last about five years. However, the row count alone does not give a complete picture.
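The key-exhaustion figure above is easy to check; a quick back-of-the-envelope calculation (the rate of 1,600 rows per minute comes from the question, the rest is plain arithmetic):

```python
# How long an unsigned 32-bit INT primary key lasts at 1,600 inserts/minute.
UINT32_MAX = 4_294_967_295              # largest unsigned INT key in MySQL
ROWS_PER_YEAR = 1_600 * 60 * 24 * 365   # 1,600/min -> 840,960,000 rows/year

years_until_exhaustion = UINT32_MAX / ROWS_PER_YEAR
print(round(years_until_exhaustion, 1))  # → 5.1
```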

It's worth considering that you may encounter limitations in terms of hard drive space before exhausting the database resources.

jayden89

To sustain a throughput of 1,600 rows per minute on any database, one can employ batch insertion: buffer records in memory and defer writing until many of them are accumulated, then insert them in a single transaction. With this approach, even SQLite can reach on the order of 100K rows per second.
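A minimal sketch of the batching idea, using an in-memory SQLite database and a made-up `ticks` table (the table, symbols, and prices are illustrative, not from the post):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, ts INTEGER, price REAL)")

# Accumulate rows in memory first, then flush them in ONE transaction;
# committing per row is what kills insert throughput.
batch = [("BTC", 1_679_900_000 + i * 60, 27_000.0 + i) for i in range(10_000)]
with conn:  # opens and commits a single transaction
    conn.executemany("INSERT INTO ticks VALUES (?, ?, ?)", batch)

count = conn.execute("SELECT COUNT(*) FROM ticks").fetchone()[0]
print(count)  # → 10000
```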

For maximum efficiency, an alternative is to store values sequentially in a separate file per currency, omitting the date field entirely: the timestamp can be derived from the value's position within the file.
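A sketch of that file-per-currency layout, assuming one fixed-size 8-byte double per minute and a hypothetical series start time (both are assumptions, not from the post):

```python
import struct
import tempfile
from pathlib import Path

INTERVAL = 60          # seconds between samples (assumed minute cadence)
START = 1_679_900_000  # epoch of the first sample (hypothetical)

def append_price(path: Path, price: float) -> None:
    # Fixed-size 8-byte little-endian double; no timestamp is written.
    with path.open("ab") as f:
        f.write(struct.pack("<d", price))

def read_price(path: Path, index: int) -> tuple[int, float]:
    # The timestamp is reconstructed from the value's position in the file.
    with path.open("rb") as f:
        f.seek(index * 8)
        (price,) = struct.unpack("<d", f.read(8))
    return START + index * INTERVAL, price

path = Path(tempfile.mkdtemp()) / "BTC.bin"
for p in (27_000.0, 27_001.5, 26_999.9):
    append_price(path, p)
ts, price = read_price(path, 2)
print(ts - START, price)  # → 120 26999.9
```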

If the workload is moderate, a viable option is to lean on the table's physical ordering: the clustered index in MySQL (InnoDB), index-organized tables in Oracle, or a one-off CLUSTER in PostgreSQL (note that PostgreSQL does not maintain that ordering on subsequent writes).

Furthermore, micro-optimizations can be implemented. For example, if data is known to arrive at fixed minute intervals, then instead of storing a full date-time value (up to 8 bytes) or a Unix epoch (4 bytes), one can store just the measurement number.
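The measurement-number trick is just a fixed epoch plus a minute counter; a tiny sketch (the epoch value is hypothetical):

```python
EPOCH = 1_679_900_000  # series start, aligned to a minute (hypothetical)

def to_measurement(ts: int) -> int:
    # Minutes elapsed since EPOCH fit comfortably in a small integer column.
    return (ts - EPOCH) // 60

def to_timestamp(n: int) -> int:
    return EPOCH + n * 60

n = to_measurement(EPOCH + 3_600)
print(n, to_timestamp(n) == EPOCH + 3_600)  # → 60 True
```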

It's important to strike a balance between optimizing storage and retrieval efficiency while considering the specific requirements and constraints of the system.

kr1e

ClickHouse stands out as a purpose-built database designed specifically for handling and analyzing this type of data. Notably, Yandex.Metrica utilizes ClickHouse for its operations.

One of ClickHouse's notable strengths is its optimization for large batch inserts, ingesting on the order of 400-500 thousand rows per second; note that it is not designed for frequent small transactional writes.

Additionally, ClickHouse excels in compressing substantial volumes of data, resulting in a smaller disk footprint compared to other database management systems. Moreover, it demonstrates efficient search capabilities even when dealing with millions of rows, significantly outpacing traditional RDBMSs like MySQL.

Scalability is another area where ClickHouse performs well by effectively implementing sharding techniques.

Furthermore, one of the significant advantages lies in the familiarity of ClickHouse's query language, which closely resembles SQL with minimal modifications.
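For a feel of what this looks like in practice, a hypothetical MergeTree table for minute-level ticks might be declared like this (table and column names are illustrative, not from the post):

```sql
-- Hypothetical schema for minute-level price data.
CREATE TABLE ticks
(
    symbol  LowCardinality(String),
    ts      DateTime,
    price   Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (symbol, ts);
```

Sorting by `(symbol, ts)` makes per-currency range scans fast, and monthly partitioning keeps merges and old-data cleanup cheap.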