  • Upma

Cleaning data in SQL

SQL stands for Structured Query Language. It is well suited to large datasets: a database engine can often process millions of rows of data in seconds.

SQL is a language used to interact with database programs and pull information from the different tables stored in them.

Cleaning and completing data are important steps in data analysis. SQL can help us do that in various ways. The following are some of the SQL statements that can be used:

To extract data from the database, we use the SELECT statement.

SELECT - specifies the columns we want to retrieve

FROM - specifies the table in the database to read them from
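As a rough illustration of SELECT and FROM working together, here is a minimal sketch using Python's built-in sqlite3 module. The customers table and its columns are made up for the example and are not from any real dataset.

```python
import sqlite3

# Hypothetical in-memory database and table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ben", "Leeds")],
)

# SELECT names the columns; FROM names the table to read them from.
selected_rows = conn.execute("SELECT name, city FROM customers").fetchall()
print(selected_rows)
conn.close()
```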

We can also add new data to the database by using the INSERT INTO statement.

INSERT INTO - specify the table name (qualified with the database name if needed), followed by the columns to fill in parentheses

VALUES - specify the values to add, in the same order as the columns
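A small sketch of an insert, again with Python's sqlite3 module and a hypothetical customers table. The `?` placeholders are sqlite3's way of passing values safely; other database drivers may use a different placeholder style.

```python
import sqlite3

# Hypothetical table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")

# INSERT INTO names the table and columns; VALUES supplies one value per column.
conn.execute(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    (1, "Asha", "Pune"),
)
conn.commit()

inserted_rows = conn.execute("SELECT id, name, city FROM customers").fetchall()
print(inserted_rows)
conn.close()
```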

In a similar way, we can update the existing data with new information.

UPDATE - the table name (qualified with the database name if needed)

SET - the columns to change and their new values

WHERE - the condition that selects which rows to change
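The three parts of an update can be sketched like this, assuming a hypothetical customers table where one city was typed in lowercase. Without the WHERE clause, every row in the table would be changed.

```python
import sqlite3

# Hypothetical table with one value that needs cleaning ("pune").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Asha", "pune"), (2, "Ben", "Leeds")],
)

# UPDATE names the table, SET gives the new value, WHERE limits the rows changed.
cursor = conn.execute("UPDATE customers SET city = ? WHERE id = ?", ("Pune", 1))
rows_changed = cursor.rowcount

updated_rows = conn.execute("SELECT id, city FROM customers ORDER BY id").fetchall()
print(rows_changed, updated_rows)
conn.close()
```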

If we need to create a new table in the database, we can do so with

CREATE TABLE IF NOT EXISTS <table name> (<column definitions>)
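Here is a minimal sketch of creating a table only when it is missing, using sqlite3 and a hypothetical customers table. Running the same statement twice shows why IF NOT EXISTS is handy: the second run does nothing instead of raising an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table definition for illustration only.
create_sql = "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)"
conn.execute(create_sql)
conn.execute(create_sql)  # second run is a no-op because the table already exists

# sqlite_master is SQLite's own catalog of tables; other databases differ here.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(table_names)
conn.close()
```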

To clean up the database by removing tables that are no longer needed, such as duplicates:

DROP TABLE IF EXISTS <table name>
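A short sketch of dropping a table safely, assuming a hypothetical leftover table called customers_copy. With IF EXISTS, the statement can be repeated without an error even after the table is gone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical duplicate table that we want to get rid of.
conn.execute("CREATE TABLE customers_copy (id INTEGER, name TEXT)")

conn.execute("DROP TABLE IF EXISTS customers_copy")
conn.execute("DROP TABLE IF EXISTS customers_copy")  # no error: nothing left to drop

remaining_tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(remaining_tables)
conn.close()
```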

To remove duplicates and extract unique values, we use DISTINCT with our SELECT statement

SELECT DISTINCT <column name> FROM <table name>
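To see DISTINCT in action, here is a small sketch with a hypothetical customers table in which one city appears twice. The ORDER BY is added only so the output order is predictable.

```python
import sqlite3

# Hypothetical table where "Pune" is repeated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Asha", "Pune"), ("Ben", "Leeds"), ("Chitra", "Pune")],
)

# DISTINCT returns each city once, however many rows contain it.
unique_cities = conn.execute(
    "SELECT DISTINCT city FROM customers ORDER BY city"
).fetchall()
print(unique_cities)
conn.close()
```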

When handling string columns, to check whether all string values have the same length, we use the length function (LEN in SQL Server, LENGTH in most other databases)

SELECT LEN(<string column>) AS <result label name> FROM <table name>
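A possible way to run this check is sketched below with sqlite3, which spells the function LENGTH rather than LEN. The products table and its code column are invented for the example; more than one distinct length in the result means the values are not all the same size.

```python
import sqlite3

# Hypothetical product codes; "EF5" is one character too short.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (code TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("AB12",), ("CD34",), ("EF5",)],
)

# List every distinct length found in the column.
code_lengths = conn.execute(
    "SELECT DISTINCT LENGTH(code) AS code_length FROM products ORDER BY code_length"
).fetchall()

# Pull out the rows that do not match the expected length of 4.
bad_codes = conn.execute(
    "SELECT code, LENGTH(code) AS code_length FROM products WHERE LENGTH(code) <> 4"
).fetchall()
print(code_lengths, bad_codes)
conn.close()
```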

To sort data in descending order

SELECT <column name>

FROM <table name>

ORDER BY <column name> DESC
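Sorting in descending order can be sketched as follows, using a hypothetical orders table with made-up amounts.

```python
import sqlite3

# Hypothetical orders table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_ref TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10), ("b", 30), ("c", 20)],
)

# ORDER BY ... DESC puts the largest amount first.
sorted_orders = conn.execute(
    "SELECT order_ref, amount FROM orders ORDER BY amount DESC"
).fetchall()
print(sorted_orders)
conn.close()
```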
