top of page
hand-businesswoman-touching-hand-artificial-intelligence-meaning-technology-connection-go-

What is Star Schema?

Star schema is a way of arranging data in a database. In star schema, data is divided into one fact table and multiple dimension tables connected to the fact table. This schema is widely adapted in data warehouses as it is suitable for simple queries. Before delving deep into star schema let's understand the key terms Normalization and Denormalization.


Normalization and Denormalization

Normalization of data reduces data redundancy and improves data consistency. To achieve normalized data, data is divided into multiple tables so that data is not repeated in any of the tables. Whereas in denormalization, tables are combined into single table for easy access. Although normalization can be used to maintain consistency, it can impact time taken for simple queries since it requires multiple joins and complex queries. Let's take an example to understand these terms better. Say, we have a table named 'Medicine' which contains columns Medicine_id, name, Drug class, Side effects. In denormalized form we will have these columns in a single table but whereas in normalized form this data is divided into multiple tables like Drug and Medicine. Medicine table will have the foreign key in it to connect with Drug table.


Star Schema

Now let's see how is normalization and denormalization are related to star schema. Star schema uses denormalized data, hence making querying easier. Here we have a fact table which is also known as transactional table and multiple dimension tables that connect with the single fact table. The physical appearance of this database is like that of a star.


Fact tables generally record events in a period. They are usually very large tables having many rows since all the events are recorded on a single table. These tables usually contain numerical values and the foreign keys to dimension tables.


Dimension tables contain primary key, and the details related to that dimension. The size of dimension table is small compared to the fact table. All the dimension tables have Surrogate key as primary key. Surrogate key is an artificial key that is generated for data handling purposes, and they do not have any physical meaning or business importance. For example, Patient_id in a patient table is used for ease of handling the patient records in Database but they do mean anything in the real world.


Let us see an example of Patient Visit data for better understanding. In this example, Fact_Patient_visit is a Fact table. This table records information of every visit, along with foreign keys of dimension tables. This table contains redundant data. Patient, Doctor, Nurse, Medication are some of possible dimension tables. These dimension tables give descriptive information of each dimension like name. All the dimension tables are connected to one fact table.




If you see this Database relationship diagram, the center fact table is connected all the dimension tables with a foreign key. It gives easy access to the required information by simply using joins between dim tables with fact table.


Slowly changing Dimensions (SCD)

Some dimensions change does not change very frequently but they can change over a period. For example, a person's phone number ,email id or address. They do not change on a regular basis, but they may change in a span of a few years. Such dimensions are known as Slowly changing dimensions. Star schema perceives SCDs in two ways, Type 1 and Type 2 SCDs. In Type 1 SCD, the changed dimension is updated in the table, that means the existing value is replaced by the new value as if the older value never existed. In type 2 SCDs, a new row is generated with new values but old information is still stored. Example for Type 2 SCD is an employee work location or designation. They are prone to change and it is important that previous information is stored unlike email id or phone number. Usually, a new row with a different unique id is created in the employee table for the same employee with a different designation. Start time, End time are maintained to know what the start and end times of that designation for the employee are. Typically, a flag column which indicates the current designation is present.


Difference between Star and snowflake schemas

Snowflake schema uses normalized form of tables, that means there is no data redundancy in tables. To achieve normalization, a greater number of tables need to be created. In contrast to star schema where all tables are connected to one single table, in snowflake schema each table can be connected to more than one table. If we consider the above Patient visit example, the patient table can be further divided to Patient, Address, Gender tables.


Why choose Star Schema?

  • Easy to understand and implement when compared to snowflake schema.

  • Efficient query performance. Since the data is denormalized it does not require complex join queries to retrieve data and perform calculations/ analysis on it.

  • It is heavily used in OLAP systems

  • It is convenient to use where there is transactional data like hospital visits, sales, orders.

Drawbacks of Star Schema

The major drawback of this model is its bulky. Since it is denormalized data, there is lot of duplicate data that is stored, which has an impact on its size. Data redundency may lead to compromised data integrity.


Conclusion

Data models must be chosen based on the requirement. If the applications need complex outputs or storage space is a constraint or data integrity is vital, then star schema is not the right choice. But if the application demands a good amount of querying but simple querying, star schema is their best bet.

35 views0 comments

Recent Posts

See All
bottom of page