Database

Sevad · Nov 14, 2023, 01:08 AM

What is a Database?
A database is an organized collection of data, stored and accessed electronically. Databases are more complex than spreadsheets because they can contain more complex data, and can store millions of records.

Relational Databases: The most common type of database is a relational database. These databases use a structure that allows us to identify and access data in relation to another piece of data in the database. For example, a customer record might be related to an order, which is related to a specific product.
Object-oriented Databases: These databases store data in the form of objects, as in object-oriented programming.
Hierarchical Databases: In these databases, data is organized into a tree-like structure, with a single root, to which all the other data is linked. The hierarchy starts from the root, and expands like a tree, adding child nodes to the parent nodes.

How Do Databases Work?
There are a few key components in working with any database:

Database Management System (DBMS): A DBMS is software that interacts with end users, applications, and the database itself to capture and analyze data.
Schema: A database schema is the structure described in a database language. It provides the blueprint that defines the manner of storing data in a database.
Tables: In relational databases, tables are where all the data in a database is stored. A table is a collection of related data entries and it consists of columns and rows where columns are the category of information and rows are individual data records.
Query Language: SQL (Structured Query Language) is the standard language for dealing with Relational Databases. SQL can be used to insert, search, update, and delete database records.

Types of Databases:

Centralized Databases: All data is located at a single site.
Distributed Databases: Distributes data across different sites, such as physical locations or across a network.
NoSQL Databases: NoSQL databases (aka "not only SQL") are non-tabular, and store data differently than relational tables. NoSQL databases come in a variety of types based on their data model.

Applications of Databases:

Online Transaction Processing (OLTP): These kinds of systems are application-oriented and designed for transaction-intensive applications.
Data Warehouses: These are systems used for reporting and data analysis, and are the fundamental components of business intelligence.
Airline Reservations: Databases provide the functionality for reservations of flights, their ticketing, and cancellations.
Banking Systems: Databases handle transactions, customer data, credit/debit card data, and so on.
E-commerce: Robust and scalable databases are at the heart of successful e-commerce businesses.

Database Structure & Principles

Understanding the structure and principles underlying databases is crucial to effectively using them:

Entities and Attributes: Databases are organized around entities, which represent objects or concepts. Each entity has attributes, which are pieces of information about that entity. These could be things like a person's name, a product's price, or the date and time of a transaction.
Keys: These are specific fields used to sort data, retrieve it and establish relationships between the different tables. The primary key uniquely identifies a record, while the foreign key is used to link two tables together.
Normalization: This is a process designed to prevent data redundancy and improve data integrity. It involves organizing data in a database into tables properly, to promote the efficient handling of the data.

Types of Databases

While we've already touched upon relational databases and NoSQL databases, there are also other types:

Graph Databases: These databases use graph structures for semantic queries. Nodes represent entities or instances such as people, businesses, accounts, or any other item to be tracked.
Time Series Databases (TSDB): These are optimized for handling time-series data, i.e., data that is indexed by time (a timestamp).
In-Memory Databases (IMDB): Data is stored in main memory instead of traditional disk storage, which allows for faster data access and processing. However, these databases can be limited by memory space.

Database Operations

Core database operations include:

Create: This involves creating new databases, tables within those databases, or new entries in those tables.
Read (or Retrieve): This operation fetches or reads the data from databases.
Update: This operation modifies or updates the existing records in a database.
Delete: This operation removes records or data in a database.

These operations are often abbreviated as CRUD, and many interfaces for interacting with databases are designed around these primary functions.

Database Security

Database security is a critical component in safeguarding the information they contain:

Access Control: This involves ensuring that only authorized individuals can access the database.
Data Encryption: Encrypting data makes it unreadable without the decryption key, safeguarding the data, even if it is compromised.
Audit and Monitoring: Regular audits can help ensure that the database is not being improperly accessed or altered. Various tools exist to help automate this process.

Distributed Databases

As data requirements grow, many systems opt for distributed databases to improve accessibility, efficiency, and scalability.

Horizontal Partitioning: Also known as sharding, this involves dividing the database into rows and distributing the data across different nodes. Each node holds a different set of tuples, but they all share the same schema.
Vertical Partitioning: This method involves dividing the database into columns, where each partition holds different attributes of the tuple. A common use case is to separate frequently accessed columns from those that are infrequently accessed.
Replication: Involves creating and maintaining duplicate copies of the same database, usually in different server instances, to ensure that the data remains available even in case of failure of one of the nodes.

Enhanced Security Features

Security is an utmost priority in databases to prevent unauthorized access and protect privacy.

Multi-factor Authentication: Many database systems now allow for multi-factor authentication, which adds an extra layer of security by requiring multiple forms of proof of identity when logging in.
Masking and Anonymization: To protect sensitive data, databases often employ masking and anonymization techniques. Masking replaces sensitive data with fabricated data, while anonymization aims to make it impossible to link data back to individual users.
Intrusion Detection Systems (IDS): These systems monitor network traffic for suspicious activity and issues alerts when they detect threats.

Cloud-Based Database Services

In contemporary times, the advent of cloud computing has led to a shift from traditional on-premises databases to cloud-based databases, providing several advantages:

Scalability: Cloud-based database services enable businesses to easily scale up or scale down their database operations based on their requirements.
Backup and Recovery: Data backup and recovery are much simpler and faster in cloud databases. Most cloud service providers offer automatic backup services along with quick disaster recovery mechanisms.
High Availability: Many service providers offer strong commitments to keep databases available with minimal downtime.

Examples of popular cloud-based database services include Amazon RDS, Google Cloud SQL, Microsoft Azure SQL Database, and many others.

Indexing in Databases

Indexing is a data structure technique used to quickly locate and access the data in a database.

B-Tree Indexes: This is the most common form of indexing. It allows for faster reads and is capable of ordering entries, which is very beneficial for range queries.
Hash Indexes: This is ideal for unique values and allows for faster constant time complexity searches. However, they aren't good for range of values as there's no ordering.
Bitmap Indexes: This type of index is used mostly in data warehousing environments. It performs well for queries that contain multiple conditions in a WHERE clause.

Schema in Databases

A database schema is the skeleton structure that represents the logical view of the entire database.

Physical Schema: This describes the database design on the physical level, including storage structures and access paths.
Logical Schema: This describes the database design at the logical level, defining how the conceptual elements are organized and interact.
View Schema: This pertains to the user interface level and can be very different from the logical schema.

Transactions in Databases

A database transaction is a unit of work, typically encapsulating a number of operations.

Atomicity: This ensures that all operations within the work unit are completed successfully; otherwise, the transaction is aborted at the point of error, and all the previous operations are rolled back.
Consistency: This ensures that the database properly changes states upon a successfully committed transaction.
Isolation: Mainly applicable when multiple transactions are happening simultaneously, it ensures that the result of successfully committing a transaction is independent of the order in which they happen.
Durability: Guarantees that once the transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors.

Database Management System (DBMS)

A DBMS is software for creating and managing databases.

Relational Database Management System (RDBMS): Here, data is structured in database tables, fields, and records. Each RDBMS table represents a database entity. The rows in the table represent instances of that entity, while columns store the attributes of that instance.
NoSQL Databases: These types of DBMS are designed to handle data that doesn't fit neatly into a traditional RDBMS. Types include key-value databases, dоcument databases, column-store databases, and graph databases.
NewSQL Databases: NewSQL databases attempt to combine the scalable performance of NoSQL systems with the ACID guarantees of traditional DBMS.

Data Warehouses and Data Lakes

Data warehouses and data lakes are used for storing large amounts of data, but they serve different purposes and require different kinds of processing.

Data Warehouses: A data warehouse is a large store of data collected from a wide range of sources within a company and used to guide management decisions. It is structured to allow analysts, data scientists, and business owners to pull in-depth reports.
Data Lakes: A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. It's a place to store all types of data, irrespective of the source and structure, with an ability to process them later as per need.

NoSQL Databases

NoSQL databases are used when large quantities of distributed data are needed. There are four main types:

Key-Value Stores: Data is stored as a collection of key-value pairs. The key is a simple, unique identifier. Examples include Redis, Riak, and Amazon DynamoDB.
Column Stores: Here, data is stored in columns rather than rows. It can quickly serve data for a specific range of columns. Examples include Cassandra and HBase.
dоcument Databases: In this type of database, data is stored semistructured as dоcuments and can be nested. The data in a dоcument database is self-describing and hierarchically organized. Examples include MongoDB and Couchbase.
Graph Databases: These are databases where relationships between data are as important as the data itself. It allows network and relationship analyses. Examples include Neo4j and Amazon Neptune.

Databases and Big Data

In handling big data, databases play a critical role. Big data can come from web clickstreams, social media, sensor data, and many other sources, and handling the volume, variety, and velocity of such data is a challenge.

Data Volume: We're talking about a phenomenal amount of data, and databases need to efficiently handle and store this.
Data Variety: The data can range from structured database data to unstructured text, images, or video. NoSQL databases and data lakes are often used to handle this variety.
Data Velocity: The speed at which new data is created and needs to be processed is rapid, and databases need to keep up.
SQL on Big Data: SQL-on-Hadoop solutions like Apache Hive help bring the advantages of SQL to big data processing. These often include connectors to integrate with existing SQL-based BI tools, supporting a bridge between big data and traditional database processing.

Understanding databases – from design and management through to use cases and data analysis methodologies – increases efficiency in handling data-related tasks within an organization, providing a major advantage in informed decision-making processes.

Nibintono · Mar 20, 2025, 10:51 AM

Databases are a major bottleneck in the data management process, slowing down queries and limiting the speed of analysis. They're often designed with the needs of the data analyst in mind, rather than the needs of the business. By prioritizing data processing speed and scalability over data integrity and security, databases can actually increase the risk of data breaches and errors.

Database

Sevad

Nibintono