
In which DB is it better to store minute-by-minute cryptocurrency values?

Started by JanviArora, Mar 27, 2023, 03:51 AM


JanviArora (Topic starter)

The CoinMarketCap API is used to gather data for analysis in the system. Currently, the collected data is being displayed in the form of graphs.
Considering the rate at which new records are added (1,600 per minute, or roughly 2.3 million per day), we are weighing the optimal storage solution.
Although we initially planned to use MySQL for storage, we believe there might be a more appropriate solution for handling these specific tasks.
Thus, we are seeking guidance from individuals experienced in database management and Big Data.


RickyChhajed

The choice of storage solution depends on various factors such as the database structure, data types, and planned operations with their frequencies.

When it comes to handling 1,600 records per minute, MySQL can cope with this load quite well. How many records a table can hold depends on the key type used: with an unsigned INT primary key the theoretical ceiling is 4,294,967,295 rows, which at the stated rate would be reached in roughly five years, while an unsigned BIGINT key raises the ceiling to about 18.4 quintillion rows. Keep in mind, though, that the number of records alone does not give a comprehensive picture.
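As a quick sanity check on those figures, here is a small back-of-the-envelope sketch (plain Python, no database involved); the 1,600-rows-per-minute rate comes from the question, and the key limits are just the standard MySQL integer ranges:

```python
# Rough estimate of how long MySQL integer key space lasts at the stated rate.
ROWS_PER_MINUTE = 1_600
ROWS_PER_YEAR = ROWS_PER_MINUTE * 60 * 24 * 365   # ~841 million rows per year

KEY_LIMITS = {
    "INT UNSIGNED":    2**32 - 1,   # 4,294,967,295
    "BIGINT UNSIGNED": 2**64 - 1,   # 18,446,744,073,709,551,615
}

for key_type, max_rows in KEY_LIMITS.items():
    print(f"{key_type}: key space exhausted after ~{max_rows / ROWS_PER_YEAR:,.1f} years")
```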

It's worth considering that you may encounter limitations in terms of hard drive space before exhausting the database resources.

jayden89

To handle a throughput of 1,600 rows per minute on any database, one can employ batch insertion: defer writing until multiple records have accumulated, then insert them in one go. With this approach, even SQLite can sustain on the order of 100K rows per second.
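To illustrate the batching idea, here is a minimal sketch using Python's built-in sqlite3 module; the table layout and the batch size of 5,000 are arbitrary assumptions, and the same accumulate-then-executemany pattern carries over to MySQL or PostgreSQL drivers:

```python
import sqlite3
import time

conn = sqlite3.connect("ticks.db")
conn.execute("""CREATE TABLE IF NOT EXISTS ticks (
                    currency TEXT,
                    ts       INTEGER,   -- Unix epoch, seconds
                    price    REAL)""")

buffer = []           # rows accumulate here instead of being inserted one by one
BATCH_SIZE = 5_000    # arbitrary; tune to your latency and durability needs

def add_tick(currency, price):
    buffer.append((currency, int(time.time()), price))
    if len(buffer) >= BATCH_SIZE:
        flush()

def flush():
    # One executemany inside a single transaction is far cheaper than
    # thousands of individual INSERTs, each with its own commit.
    with conn:
        conn.executemany(
            "INSERT INTO ticks (currency, ts, price) VALUES (?, ?, ?)", buffer)
    buffer.clear()
```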

For maximum efficiency, an alternative is to append values sequentially to a separate file per currency, omitting the date field entirely, since the timestamp can be derived from the value's position within the file.
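A minimal sketch of that file-per-currency layout, assuming one 8-byte float per minute and a known start time for each file (both assumptions are mine, not from the post above):

```python
import struct
from datetime import datetime, timedelta

RECORD = struct.Struct("<d")        # one little-endian double per minute
START = datetime(2023, 1, 1)        # assumed first minute covered by each file

def append_price(currency: str, price: float) -> None:
    # Each append implicitly covers "the next minute"; no timestamp is stored.
    with open(f"{currency}.bin", "ab") as f:
        f.write(RECORD.pack(price))

def read_price(currency: str, minute_index: int) -> tuple[datetime, float]:
    # The timestamp is reconstructed from the record's position in the file.
    with open(f"{currency}.bin", "rb") as f:
        f.seek(minute_index * RECORD.size)
        (price,) = RECORD.unpack(f.read(RECORD.size))
    return START + timedelta(minutes=minute_index), price
```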

If the workload is not too demanding, a viable option is to lean on the table's physical ordering, such as InnoDB's clustered primary key in MySQL, the CLUSTER command in PostgreSQL, or index-organized tables in Oracle.
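For example, with MySQL/InnoDB the clustering falls out of the primary key you pick; a hypothetical schema keyed by (currency_id, minute) keeps each currency's time series physically contiguous, so range scans read adjacent pages (table and column names are made up for illustration):

```python
# Hypothetical DDL; execute it with whatever MySQL driver you already use.
DDL = """
CREATE TABLE price_minute (
    currency_id SMALLINT UNSIGNED NOT NULL,
    minute      INT UNSIGNED      NOT NULL,   -- measurement number, see next paragraph
    price       DECIMAL(18, 8)    NOT NULL,
    PRIMARY KEY (currency_id, minute)         -- InnoDB clusters rows by this key
) ENGINE=InnoDB
"""
```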

Furthermore, micro-optimizations can be implemented. For example, if it is known that data is received at minute intervals, instead of storing the time (7 bytes) or Unix epoch (4 bytes), one can store the measurement number.
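A sketch of that encoding; the epoch below is an assumption, and any fixed reference minute works as long as every writer and reader agrees on it:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(2023, 1, 1, tzinfo=timezone.utc)   # assumed shared reference minute

def to_minute_index(ts: datetime) -> int:
    # Store this small integer instead of a full timestamp; ts must be timezone-aware.
    return int((ts - EPOCH).total_seconds() // 60)

def from_minute_index(idx: int) -> datetime:
    return EPOCH + timedelta(minutes=idx)
```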

It's important to strike a balance between optimizing storage and retrieval efficiency while considering the specific requirements and constraints of the system.

kr1e

ClickHouse stands out as a purpose-built database designed specifically for handling and analyzing this type of data. Notably, Yandex.Metrica utilizes ClickHouse for its operations.

One of ClickHouse's notable strengths is its optimization for large bulk inserts: it can ingest on the order of 400-500 thousand rows per second.
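A minimal sketch of what that looks like in practice, assuming the clickhouse-driver Python client, a local ClickHouse instance, and a made-up MergeTree table; the point is that data goes in as large blocks, not row by row:

```python
import random
from datetime import datetime

from clickhouse_driver import Client   # assumes `pip install clickhouse-driver`

client = Client("localhost")           # assumed local ClickHouse instance

client.execute("""
    CREATE TABLE IF NOT EXISTS ticks (
        ts       DateTime,
        currency LowCardinality(String),
        price    Float64
    ) ENGINE = MergeTree
    ORDER BY (currency, ts)
""")

# ClickHouse prefers large blocks: one insert of tens of thousands of rows
# rather than tens of thousands of single-row inserts.
batch = [(datetime.utcnow(), "BTC", 27_000 + random.random()) for _ in range(50_000)]
client.execute("INSERT INTO ticks (ts, currency, price) VALUES", batch)
```

From there, a rollup such as SELECT toStartOfHour(ts) AS h, avg(price) FROM ticks GROUP BY h reads like ordinary SQL, which ties into the familiarity point below.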

Additionally, ClickHouse excels in compressing substantial volumes of data, resulting in a smaller disk footprint compared to other database management systems. Moreover, it demonstrates efficient search capabilities even when dealing with millions of rows, significantly outpacing traditional RDBMSs like MySQL.

Scalability is another area where ClickHouse performs well by effectively implementing sharding techniques.

Furthermore, one of the significant advantages lies in the familiarity of ClickHouse's query language, which closely resembles SQL with minimal modifications.

Saurav Tiwari

While MySQL is a robust and widely-used relational database management system, for this specific use case, where the rate of data collection is exceptionally high, there are alternative solutions that may offer better scalability and performance.
One potential solution to consider is the implementation of a NoSQL database such as MongoDB or Cassandra. NoSQL databases are designed to handle large volumes of unstructured data and are well-suited for real-time data processing. Their flexible schema and distributed architecture make them ideal for accommodating the continuous stream of data from the CoinMarketCap API. Additionally, these databases are highly scalable and can efficiently handle the rapid growth of data without sacrificing performance.
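To make the MongoDB option concrete, here is a small sketch using pymongo; the connection string, database, collection, and document shape are placeholders of mine, not anything prescribed by the API:

```python
from datetime import datetime, timezone

from pymongo import MongoClient   # assumes `pip install pymongo`

client = MongoClient("mongodb://localhost:27017")   # placeholder connection string
ticks = client["crypto"]["ticks"]
ticks.create_index([("currency", 1), ("ts", 1)])    # keeps per-currency range queries fast

def store_snapshot(quotes: dict[str, float]) -> None:
    """quotes: currency symbol -> latest price, i.e. one poll of the API."""
    now = datetime.now(timezone.utc)
    docs = [{"currency": sym, "ts": now, "price": price} for sym, price in quotes.items()]
    ticks.insert_many(docs, ordered=False)   # one round trip for the whole minute's batch
```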

Another consideration is the adoption of cloud-based storage and database solutions, such as Amazon S3 combined with Amazon DynamoDB or Google Cloud Storage with Google Cloud Bigtable. These cloud platforms offer virtually limitless scalability and can seamlessly accommodate the massive volume of data generated by the API. By leveraging cloud-based solutions, you can benefit from high availability, fault tolerance, and automated scaling to meet the dynamic demands of your data storage needs.
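On the DynamoDB side, a sketch with boto3 might look like the following; the table name, key schema, and region are assumptions, and prices are wrapped in Decimal because DynamoDB does not accept floats:

```python
from datetime import datetime, timezone
from decimal import Decimal

import boto3   # assumes AWS credentials are configured in the environment

# Assumed table: partition key "currency" (string), sort key "ts" (ISO-8601 string).
table = boto3.resource("dynamodb", region_name="us-east-1").Table("crypto_ticks")

def store_snapshot(quotes: dict[str, float]) -> None:
    ts = datetime.now(timezone.utc).isoformat(timespec="minutes")
    # batch_writer buffers items and flushes them in batches behind the scenes.
    with table.batch_writer() as writer:
        for symbol, price in quotes.items():
            writer.put_item(Item={
                "currency": symbol,
                "ts": ts,
                "price": Decimal(str(price)),
            })
```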

In parallel with selecting the appropriate database solution, it's crucial to devise efficient data processing and querying strategies to extract meaningful insights from the collected data. Techniques such as data partitioning, indexing, and data aggregation will be instrumental in optimizing the storage and retrieval of data, facilitating timely analysis and decision-making.
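As one small example of the aggregation point, here is a sketch that rolls minute-level prices up to hourly averages, reusing the hypothetical pymongo collection from the earlier sketch ($dateTrunc needs MongoDB 5.0+; in a relational store the same rollup is a plain GROUP BY):

```python
from pymongo import MongoClient   # assumes `pip install pymongo`

ticks = MongoClient("mongodb://localhost:27017")["crypto"]["ticks"]   # placeholder names

pipeline = [
    {"$match": {"currency": "BTC"}},
    {"$group": {
        "_id": {"$dateTrunc": {"date": "$ts", "unit": "hour"}},   # hourly buckets
        "avg_price": {"$avg": "$price"},
        "samples": {"$sum": 1},
    }},
    {"$sort": {"_id": 1}},
]
hourly = list(ticks.aggregate(pipeline))
```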
In short, handling the data inflow from the CoinMarketCap API comes down to a NoSQL database or a cloud-based storage solution; both can provide the scalability, performance, and flexibility needed to manage an ever-growing dataset. Assess the specific requirements of your system, choose a solution that matches the velocity and volume of the data, and pair it with efficient processing and querying techniques to derive actionable insights from the collected data.