Storing data in database

Started by jackgrant1, Feb 13, 2023, 04:14 AM

Previous topic - Next topic

jackgrant1Topic starter

Hello! The title of my question may be unclear, but essentially I'm curious about the most efficient way to store data in a database that has an affiliation to a certain type. For instance, an article could be categorized as both news and a blog post. To accommodate this scenario, we often create a field in the database that designates the object's type.

Now, I'm wondering if it would be more advantageous to assign an id number to each type so that we're not storing the entire word "blog" for example, but instead inserting "1". While it may be more clear to use the actual word, I'm wondering what would be the best approach if we're dealing with millions of records?


Honestly, I'm not entirely sure how MySQL implements this, but in Oracle, integers can actually be larger than one byte (sometimes up to 9, depending on the bit depth). Here's an example:

1 Typ=2 Len=2: 193,2 Typ=1 Len=1: 49

Notice the difference in length. Short strings are typically indexed for unique occurrences just as well as integers, although range predicates can cause some issues. This mainly applies to Oracle, but it's worth noting and considering for other database systems as well. It's always best to double-check information and not trust everything you read.


Let's brainstorm together. To start, consider whether it's necessary to use the full int data type - how many different article types do you have? A TinyInt UNSIGNED may suffice and prevent any issues with blocking.

Another important factor to consider is the length of a varchar string. If it's only a few characters long, then this may not be an issue. However, for longer strings it's probably better to use a number type.

It's also quite likely that you'll need to use a sample and create an index that applies to several fields. In this case, the size of the index will also be an important consideration.


When choosing a data type for database records, it's important to keep in mind that the system and not a human will be selecting and reading these records. Therefore, it's better to opt for a smaller and more efficient data type that's better suited for machines.

For example, TINYINT UNSIGNED is a great option for all record types, as it's both fast and efficient. However, if you want to prioritize clarity, using ENUM is another good option.

It's best to avoid using VARCHAR because it takes longer to process - the length of the string must first be determined before it can be read. Additionally, comparing numbers is faster than comparing strings, and using VARCHAR takes up more space in both the table itself and the index.