SQL or Structured Query Language is a critical tool for data professionals. It is undoubtedly the most important language for getting a job in the field of data analysis or data sciences.
Millions of data points are being generated every minute and raw data does not have any story to tell. After all this data gets stored in databases and professionals use SQL to extract this data for further analysis.
SQL manages a large amount of data, especially if there is a lot of data that is being written simultaneously and there are too many data transactions.
There are different versions and frameworks for SQL, the most commonly used is MySQL. MySQL is an open-source solution that helps facilitate SQL’s role in managing back-end data for web applications. Companies such as Facebook, Instagram, WhatsApp, etc. all use SQL for back-end data storage and data processing solutions. When an SQL query is written & run (or parsed), it is processed by a query optimiser. The query reaches SQL server, where it compiles in three phases; Parsing, Binding and Optimisation.
Parsing – A process to check the syntax
Binding – A process to check the query semantics
Optimization – A process to generate the query execution plan
In the third step, all possible permutations and combinations are generated to find the most effective query execution plan in a reasonable time. The shorter the query takes, the better it is.
What is SQL Used for?
Now that we understand what is SQL and how does it work, let’s try to see what SQL can do. This programming language has various uses for data analysts & data science professionals. It is particularly helpful because it can:
Execute queries against a database
Retrieve data from a database
Insert records into a database
Update records in a database
Delete records from a database
Create new databases or new tables in a database
Create stored procedures & views in a database
Set permissions on tables, procedures, and views
Imagine how difficult life would be if we did not have a way to control the database in this format. After getting a sense of what SQL is used for, let’s try our hands at learning SQL on an actual database.
People that master SQL understand the three main parts of a functioning SQL system:
First, they understand how and why a SQL database is used.
Second, they can identify the key structures of SQL databases.
Third, they get the information they need efficiently.
Now you might be wondering, where do you start? Here’s the deal:
We will flesh out what exactly the purpose of SQL (structured Query Language) is and how you are likely to interact with your company database.
Once you understand the use of the SQL database, we will look at how different databases are structured and why.
10 essential areas where we are focusing in this blog:
1. SQL is Relational Database
There are various database systems such as Relational, Hierarchical, Document, etc. Out of which SQL is a Relational Database. RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL, and for all modern database systems like MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access. In simple terms, RDMS is a database where data is stored in the form of tables viz. In rows and columns. Thus, the table is basically a collection of related data entries and it consists of numerous columns and rows.
2. Keys in SQL
A key is a single or combination of multiple fields in a table. It is used to fetch or retrieve records/data rows from the data table according to the condition/requirement. Keys are also used to create a relationship among different database tables or views. Some of the SQL keys are,
Primary Key - A primary key is a field in a table that uniquely identifies each row/record in a database table. Primary keys must contain unique values. A table can have only one primary key, which may consist of single or multiple fields. When multiple fields are used as a primary key, they are the composite key.
Foreign Key - A foreign key is a key used to link two tables together. This is sometimes called a referencing key. The relationship between 2 tables matches the primary Key in one of the tables with a Foreign Key in the second.
Unique Key - A unique key is a set of one or more than one fields/column of a table uniquely identifies a record in a database table. You can say that it is a little like a primary key but it can accept only one null value and it has duplicate values.
3. SQL view
Views in SQL are a kind of virtual tables. A view also has rows and columns as they are in a real table in the database. We can create a view by selecting fields from one or more tables present in the database. Views improve the security of the database by showing only the intended data to authorized users. They hide sensitive data and Restrict access to the data in such a way that a user can see and (sometimes) modify exactly what they need and no more.
4. SQL joins
A join is how we connect two or more tables together. This allows us to summarize data that would normally be stored in different tables into one result set. This is useful when analyzing data for report creation. A JOIN is a means for combining fields from two tables by using values common to each. It helps to retrieve data from two or more database tables. The tables are mutually related using primary and foreign keys. There are various types of Joins such as Cross Joins, Inner Joins, Outer Joins, Left/Right outer Joins, Full Outer Join etc.
5.Normalization of Database
Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency. Normalization divides larger tables into smaller tables and links them using relationships.
The purpose of Normalization is to eliminate redundant (useless) data and ensure data is stored logically. There are various Normalization rules such as 1NF, 2NF, 3NF, and BCNF.
6. Table types:
When creating a database, we use two types of tables to split up our data: dimension tables and fact tables.
What is a Fact table?
A Fact table (also known as a measure table) as the name suggests stores a record of all the facts related to a subject. In our example the Fact table is the Orders table, here we have all the facts needed in a transaction.
We have the who (Customer ID), the what (Product ID), and the when (Order Date). This allows each line of the order table to summarise everything about a transaction.
As well as the who, what and when the Fact table holds the quantities and totals involved in the transaction e.g. Product quantity, Order total, Cost and Profit.
What is a Dimension table?
Dimension tables give structure to our Fact tables. If we look at our orders table, the Product ID and Customer ID do not mean much on their own. However, with our Dimension tables we know that Product ID 1 is a Stylish Kettle and Customer ID is Carl from Germany.
Using Dimension tables to store the details of each area of the transaction allows the database to save space and speed up processing time.
Why do we use fact and dimension tables?
We use Fact and Dimension tables to improve the performance of our database and allow for easy maintenance.
If we were to store all the text information from our dimension table in the fact table, it would look nice to the user. However, due to the duplication of text fields, the server would really struggle to handle large datasets or process queries.
Also, if an update to the details stored in our database was required, it would have to make many more changes, for example:
If Product ID 2 changed from being “Stylish Kettle” to “Normal Kettle”, we would have to find and change every row that has “Stylish Kettle”.
With Fact and Dimension tables, we only update one row within the Product Dimension table and all references within the Orders table are updated.
8. Cloning Tables in SQL
There may be a situation when you just want to create an exact copy or clone of an existing table to test or perform something without affecting the original table. If you are using MySQL, you can handle this situation by using to steps given below −
Use the SHOW CREATE TABLE command to get a CREATE TABLE statement that specifies the source table's structure, indexes, and all.
Modify the statement to change the table name to that of the clone table and execute the statement. This way you will have an exact clone table.
Optionally, if you need the table contents copied as well, issue an INSERT INTO or a SELECT statement too.
9. SQL Sequences
The sequence is a set of integers 1, 2, 3, … etc that is generated and supported by some database systems to produce unique values on demand. A sequence is a user-defined schema-bound object that generates a sequence of numeric values. They are frequently used in many databases because many applications require each row in a table to contain a unique value and sequences provide an easy way to generate them.
10. Temporary Tables in SQL
There is RDBMS, which supports temporary tables. Temporary Tables are a great feature that lets you store and process intermediate results by using the same selection, update, and join capabilities that you can use with typical SQL Server tables. The most important thing that should be known about temporary tables is that they will be deleted when the current client session terminates.