Cleaning data in SQL
SQL stands for Structured Query Language. It is well suited to large datasets: a database engine can often process millions of rows in seconds.
SQL is the language we use to interact with database programs and pull information from the different tables stored in a database.
Cleaning and completing data are important steps in data analysis, and SQL can help with both in various ways. The following are some of the SQL statements that can be used:
To extract data from the database, we use the SELECT statement.
SELECT - specify the columns we want to retrieve
FROM - specify the table in the database to read from
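As a minimal sketch of SELECT and FROM, the snippet below runs a query through Python's built-in sqlite3 module. The choice of SQLite, the in-memory database, and the customers table with its columns are all assumptions made for illustration; they are not part of any particular dataset.

```python
import sqlite3

# Hypothetical in-memory database and table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "Lisbon"), (2, "Ben", "Oslo")],
)

# SELECT names the columns to return; FROM names the table to read.
rows = conn.execute("SELECT name, city FROM customers").fetchall()
print(rows)
conn.close()
```

Listing the columns explicitly, rather than using SELECT *, keeps the result limited to the data we actually want to inspect.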
We can also add new rows to a table by using the INSERT INTO statement.
INSERT INTO - specify the table name (prefixed with the database name if needed) and, in parentheses, the columns to fill
VALUES - specify the values to add, one per listed column
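The INSERT INTO and VALUES parts can be sketched as follows, again assuming SQLite through Python's sqlite3 module and a hypothetical customers table. The second insert uses ? placeholders, which is how sqlite3 passes values separately from the SQL text.

```python
import sqlite3

# Hypothetical table, created here only so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")

# INSERT INTO lists the target table and columns; VALUES gives one value per column.
conn.execute("INSERT INTO customers (id, name, city) VALUES (1, 'Ana', 'Lisbon')")

# The same statement with placeholders, so values are not pasted into the SQL string.
conn.execute(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    (2, "Ben", "Oslo"),
)
conn.commit()

inserted = conn.execute(
    "SELECT id, name, city FROM customers ORDER BY id"
).fetchall()
print(inserted)
conn.close()
```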
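In a similar way, we can update existing data with new information.
UPDATE - specify the table name (prefixed with the database name if needed)
SET - specify the columns to change and their new values
WHERE - specify the condition that selects which rows to change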
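A short sketch of UPDATE, SET and WHERE working together, assuming SQLite via Python's sqlite3 module. The customers table and the lowercase city value being corrected are made up for the example.

```python
import sqlite3

# Hypothetical table with one badly formatted value ("lisbon").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "lisbon"), (2, "Ben", "Oslo")],
)

# SET gives the new value; WHERE limits the change to the matching rows.
# Without a WHERE clause, every row in the table would be updated.
cur = conn.execute("UPDATE customers SET city = 'Lisbon' WHERE id = 1")
changed = cur.rowcount  # number of rows the UPDATE modified
conn.commit()

updated = conn.execute("SELECT id, city FROM customers ORDER BY id").fetchall()
print(changed, updated)
conn.close()
```

Checking the number of changed rows after an UPDATE is a cheap way to confirm that the WHERE condition matched what we expected.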
If we need to create a new table in the database, we can do so with
CREATE TABLE IF NOT EXISTS - creates the table only when no table with that name already exists
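The sketch below shows CREATE TABLE IF NOT EXISTS under the assumption that the database is SQLite, accessed through Python's sqlite3 module. The customers_clean table name and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table definition used for illustration.
ddl = (
    "CREATE TABLE IF NOT EXISTS customers_clean ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)
conn.execute(ddl)
conn.execute(ddl)  # running it again does nothing instead of raising an error

# sqlite_master is SQLite's catalog of the objects in the database.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(tables)
conn.close()
```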
To clean up the database by removing duplicate or leftover tables:
DROP TABLE IF EXISTS - removes the named table, and does nothing if it is not there
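Dropping a leftover table can be sketched like this, assuming SQLite via Python's sqlite3 module. The customers_copy table stands in for a hypothetical duplicate that is no longer needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE customers_copy (id INTEGER, name TEXT)")  # hypothetical duplicate

# IF EXISTS makes the statement safe to run even when the table is already gone.
conn.execute("DROP TABLE IF EXISTS customers_copy")
conn.execute("DROP TABLE IF EXISTS customers_copy")  # no error the second time

remaining = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(remaining)
conn.close()
```

Dropping a table deletes its data as well, so it is worth confirming the table name before running the statement.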
To remove duplicates and extract unique values, we use DISTINCT with our SELECT statement
SELECT DISTINCT <column name> FROM <table name>
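Here is a small sketch of SELECT DISTINCT, assuming SQLite via Python's sqlite3 module and a hypothetical customers table in which one city appears twice. The ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ana", "Lisbon"), ("Ben", "Oslo"), ("Cara", "Lisbon")],
)

# DISTINCT collapses repeated values so each city is returned once.
cities = [
    row[0]
    for row in conn.execute("SELECT DISTINCT city FROM customers ORDER BY city")
]
print(cities)
conn.close()
```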
When handling string columns, we can check whether all values have the same length with the LENGTH function (named LEN in SQL Server)
SELECT LENGTH(<string column>) AS <result label> FROM <table name>
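The length check can be sketched as below, assuming SQLite (where the function is spelled LENGTH) via Python's sqlite3 module. The country_code column and its values are hypothetical; the second query is one possible way to see at a glance whether more than one length is present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, country_code TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ana", "PT"), ("Ben", "NOR"), ("Cara", "US")],
)

# LENGTH returns the number of characters; AS gives the result column a label.
lengths = conn.execute(
    "SELECT country_code, LENGTH(country_code) AS code_length "
    "FROM customers ORDER BY name"
).fetchall()

# If this returns more than one value, the strings are not all the same length.
distinct_lengths = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT LENGTH(country_code) FROM customers ORDER BY 1"
    )
]
print(lengths, distinct_lengths)
conn.close()
```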
To sort the results in descending order, we use ORDER BY with DESC
SELECT <column name> FROM <database name>.<table name>
ORDER BY <column name> DESC
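A minimal sketch of sorting in descending order, assuming SQLite via Python's sqlite3 module. The orders table and its amounts are made up; main is the name SQLite gives to the primary database of a connection, used here to illustrate the database.table form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (2, 75.5), (3, 10.0)],
)

# ORDER BY ... DESC returns the largest amounts first.
# "main.orders" qualifies the table with SQLite's default database name.
ordered = conn.execute(
    "SELECT id, amount FROM main.orders ORDER BY amount DESC"
).fetchall()
print(ordered)
conn.close()
```

Sorting in descending order is a quick way to bring unusually large values to the top when looking for outliers.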