Storing data in database

jackgrant1 · Feb 13, 2023, 04:14 AM

Hello! The title of my question may be unclear, but essentially I'm curious about the most efficient way to store data in a database when each record belongs to a certain type. For instance, an article could be categorized as news or as a blog post. To accommodate this scenario, we often create a field in the database that designates the object's type.

Now, I'm wondering whether it would be better to assign an id number to each type, so that instead of storing the entire word "blog", for example, we store "1". The actual word is clearer to read, but what would be the best approach if we're dealing with millions of records?

Kickera · Feb 13, 2023, 05:21 AM

Honestly, I'm not entirely sure how MySQL implements this, but in Oracle an integer can actually take more than one byte (the NUMBER format is variable-length, so the size depends on the value). Here's an example:


CREATE TABLE TEST (ID INTEGER, vc VARCHAR2(10 CHAR));
INSERT INTO TEST VALUES (1,'1');
SELECT DUMP(ID), DUMP(vc) FROM TEST;

DUMP(ID)              DUMP(VC)
Typ=2 Len=2: 193,2    Typ=1 Len=1: 49

Notice the difference in length: the integer 1 takes two bytes, while the string '1' takes one. Short strings usually index about as well as integers for equality lookups, although range predicates can cause some issues. This mainly applies to Oracle, but it's worth considering for other database systems as well. It's always best to double-check information and not trust everything you read.
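
To make the DUMP output less mysterious, here is a minimal sketch of how a positive integer ends up as those bytes. It assumes the commonly described Oracle NUMBER layout (an exponent byte offset by 193, then base-100 digits each offset by 1) and only covers positive integers; the function name oracle_number_bytes is made up for this illustration and is not an Oracle API.

```python
def oracle_number_bytes(n: int) -> list:
    """Encode a positive integer the way DUMP() reports an Oracle NUMBER.

    Sketch only: zero, negative numbers and fractions are not handled.
    """
    if n <= 0:
        raise ValueError("this sketch only covers positive integers")

    # Split the value into base-100 digits, most significant first.
    digits = []
    while n:
        n, d = divmod(n, 100)
        digits.append(d)
    digits.reverse()

    # The first byte carries the base-100 exponent, offset by 193.
    exponent_byte = 193 + len(digits) - 1

    # Trailing zero digits are not stored.
    while digits and digits[-1] == 0:
        digits.pop()

    # Each stored digit is offset by 1.
    return [exponent_byte] + [d + 1 for d in digits]


print(oracle_number_bytes(1))       # [193, 2], same bytes as DUMP(ID)
print(list("1".encode("ascii")))    # [49], same byte as DUMP(vc)
```

So the extra byte in the integer comes from the exponent, not from the digit itself.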

eetplus · Feb 13, 2023, 07:21 AM

Let's brainstorm together. To start, consider whether you need the full INT data type at all: how many different article types do you have? A TINYINT UNSIGNED (one byte, values 0 to 255) may suffice and prevent any issues with blocking.

Another important factor to consider is the length of a varchar string. If it's only a few characters long, then this may not be an issue. However, for longer strings it's probably better to use a number type.

It's also quite likely that your queries will filter on several fields at once, so you'll need an index that covers all of them. In this case, the size of the index will also be an important consideration.
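
As a rough sketch of the multi-field index idea, the snippet below builds one index over two columns and asks the database whether a query uses it. SQLite (via Python's sqlite3 module) is only a stand-in here because it runs without a server; the table, column, and index names are hypothetical, and MySQL or Oracle will plan queries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (
        id           INTEGER PRIMARY KEY,
        type_id      INTEGER NOT NULL,   -- 1 = news, 2 = blog (hypothetical codes)
        published_at TEXT    NOT NULL,
        title        TEXT    NOT NULL
    );
    -- One index covering both columns used in the WHERE clause.
    CREATE INDEX idx_articles_type_date ON articles (type_id, published_at);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM articles WHERE type_id = ? AND published_at >= ?",
    (2, "2023-01-01"),
).fetchall()

# The last column of each plan row is a human-readable description.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

Every entry in such an index repeats the type value, which is why a one-byte code keeps the index smaller than a string would.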

Newyorklimous · Feb 13, 2023, 08:48 AM

When choosing a data type for database records, it's important to keep in mind that the system and not a human will be selecting and reading these records. Therefore, it's better to opt for a smaller and more efficient data type that's better suited for machines.

For example, TINYINT UNSIGNED is a great option for a record-type column, as it's both fast and compact. However, if you want to prioritize clarity, ENUM is another good option.

It's best to avoid using VARCHAR because it takes longer to process: the length of the string must first be determined before it can be read. Additionally, comparing numbers is faster than comparing strings, and VARCHAR takes up more space in both the table itself and the index.
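
If you go with a small integer column but still want readable names in application code, one possible approach is to keep the mapping in an enum on the application side. This is only a sketch: the ArticleType class, its codes, and the helper functions are hypothetical, and it is not the same thing as MySQL's ENUM column type.

```python
from enum import IntEnum


class ArticleType(IntEnum):
    # Hypothetical codes; they must fit the column type
    # (0 to 255 for TINYINT UNSIGNED).
    NEWS = 1
    BLOG = 2


def to_db(article_type: ArticleType) -> int:
    """Value to write into the type column."""
    return int(article_type)


def from_db(value: int) -> ArticleType:
    """Turn a stored code back into a named type; raises ValueError if unknown."""
    return ArticleType(value)


stored = to_db(ArticleType.BLOG)
print(stored)                  # 2
print(from_db(stored).name)    # BLOG
```

The database sees only the number, while the code reads and writes ArticleType.BLOG.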

tayabak · Jan 16, 2024, 01:14 AM

When it comes to storing data with affiliations to specific types, such as articles categorized as news or blog posts, the decision on how to represent these types in the database is crucial.

The approach of using numerical ids to represent types has several advantages, especially when dealing with millions of records. By assigning an id number to each type and storing the numerical id instead of the entire word, we can significantly reduce the storage space required. This is because numerical values typically take up less space than string values in most database systems. With a large number of records, this reduction in storage requirements can have a measurable impact on the overall database size and performance.
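
To put rough numbers on the storage claim, here is a back-of-the-envelope sketch using MySQL's documented sizes: 1 byte for TINYINT, and the encoded text plus a 1-byte length prefix for a short VARCHAR value. The row count and the example labels are hypothetical, and the estimate ignores row overhead, indexes, and engine-specific details.

```python
def varchar_bytes(value: str) -> int:
    """Bytes for one short MySQL VARCHAR value: encoded text plus a 1-byte length prefix.

    Assumes utf-8 text and a column whose values never exceed 255 bytes.
    """
    return len(value.encode("utf-8")) + 1


TINYINT_BYTES = 1      # MySQL TINYINT
ROWS = 10_000_000      # hypothetical table size

for label in ("blog", "news", "press release"):
    as_text = varchar_bytes(label) * ROWS
    as_tinyint = TINYINT_BYTES * ROWS
    saved_mb = (as_text - as_tinyint) / 1_000_000
    print(f"{label!r}: about {saved_mb:.0f} MB saved in the column itself")
```

For a four-letter word like "blog" that works out to roughly 40 MB over ten million rows, before counting any index on the column.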

Furthermore, utilizing numerical ids for types can lead to faster query execution times. When querying the database based on type affiliations, comparing and indexing numerical values is generally more efficient than working with string values. This can result in improved query performance, especially when filtering, sorting, or joining records based on their type.

However, it's important to consider the trade-off between efficiency and maintainability. While numerical ids can optimize storage and query performance, they make the data harder to understand for anyone who is not familiar with the mapping between ids and their corresponding types. Therefore, it's essential to keep that mapping clearly documented and easy to find for all stakeholders, including database administrators, developers, and analysts.

When dealing with a significant volume of records, employing numerical ids to represent types in a database can offer substantial advantages in terms of storage optimization and query performance. However, it's crucial to balance these benefits with the need for maintainability and clarity, and to proactively address potential challenges related to understanding and interpreting the numerical ids within the context of the data they represent.
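
One common way to keep the id-to-name mapping visible is to store it in the database itself as a small lookup table and join to it when a readable name is needed. The sketch below uses SQLite through Python's sqlite3 module purely so it runs without a server; the table names, column names, and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The mapping lives next to the data, so nobody has to memorize it.
    CREATE TABLE article_types (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE articles (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        type_id INTEGER NOT NULL REFERENCES article_types (id)
    );
    INSERT INTO article_types (id, name) VALUES (1, 'news'), (2, 'blog');
    INSERT INTO articles (title, type_id) VALUES
        ('Election results', 1),
        ('My weekend project', 2);
""")

# Articles store only the small id; the join brings the name back.
rows = conn.execute("""
    SELECT a.title, t.name
    FROM articles AS a
    JOIN article_types AS t ON t.id = a.type_id
    ORDER BY a.id
""").fetchall()

print(rows)  # [('Election results', 'news'), ('My weekend project', 'blog')]
```

Adding a new type then means inserting one row into article_types rather than updating documentation somewhere else.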
