Why Connecting SQL with Python ?
In the ever – evolving field of Data sciences and analytics , the ability to seamlessly integrating SQL databases with Python is not just a luxury – It’s a necessity
Overview
What is Structured Query Language :
SQL is a domain specific language used for managing and manipulating relational databases. This can be implemented in various DBMS like PostgreSQL, MySQL, SQLite, Oracle and Microsoft SQL server.
What is Jupyter :
Jupyter Notebook is a Interactive Development Environment (IDE) type , open source web application that allows to create and share documents that contain live code, equations, visualizations and narrative texts. It supports various programming languages ., most commonly Python.
The Purpose of this combination:
By combining SQL with Jupyter you seamlessly pull data from databases perform complex analysis and push results back to database if needed.
The robust data handling capabilities of SQL with powerful analytical tool like Python , including libraries for machine learning like scikit-learn, Tensor Flow , data visualization like matplotlib, seaborn and for statistical data like NumPy, Pandas and more ., achieve and enhances efficiency, scalability, flexibility.
This integration streamlines workflow easier to derive insights from the data .
In this blog , I would like to show you how to save the Pandas data frames directly to PostgreSQL using single line codes.
We will see 3 different methods of saving the databases through Jupyter .
Retrieve database from PostgreSQL into Jupyter
Creating new database into Postgres from Jupyter
Using SQLAlchemy combining both pandas data frame and Postgres
Psycopg2 is popular PostgreSQL database adapter for python programming language . It has feature rich capabilities in making it widely used tool for interacting with PostgreSQL databases in Python applications.
Psycopg2 enables to execute SQL commands and manage database connections.
The python package manager is !pip install psycopg2 .
SQLAlchemy is powerful and flexible SQL toolkit and ORM library (Object Relational Mapping) for python. This set of tools make it easier while working with databases interactions and to map python objects to database tables.
Its easier to perform CRUD operations (Create, Read, Update, Delete) and manage complex database schemas.
!pip install sqlalchemy
And installing both 'psycopg2' and 'sqlalchemy' at the same time is often necessary because they complement each other when working with PosgreSQL data base and Python data frames.
!pip install psycopg2 sqlalchemy
Part -1 Retrieve data from existing PostgreSQL data base
This code is used to establish connection to postgresql database.
psycopg2.connect is in built function enabling to perform database operations.
cursor object allows to execute SQL commands and queries. Its acts like an intermediate source between python code and the data base . pgconn.cursor() is the important method to follow for creating an object.
This code helps to fetch the results and convert to Pandas data frame.
pgcursor.execute("select * from actor") , is the sql command passed to retrieve all the records from the existing actor table from dvd Rental database.
rows = pgcursor.fetchall() , This line fetches all the rows returned by select query and stores them in variable "rows".
df is the data frame created from fetched rows.
The column parameter helps to set the column names
columns = desc[0] for desc , is the list comprehension iterates over each description .
pgcursor.description is the attribute helps to extract column names from cursor description, extracts the first item desc[0] with the column name.
To Modify the existing data through Pandas is also possible
pgconn.commit() is a function used after executing any SQL command to save the changes made in DB .
so, from the code above ia about updation to be changed to set last_name to 'Dove' whose actor_id is 1
so we can see the updation of the command is done successfully.
Old record
New updated record
The output from both previous and updated results are shown .
So, this method allows to retrieve and update data from PostgreSQL database and convert to Pandas data frame.
We can modify the data as per requirement .
Part-2 Creating new database into Postgres from Jupyter
This code above is in detailed steps to follow by creating the database to SQL.. then you can able to see this newly created DB in your PostgreSQL interface . (Refresh)
Once the creation is happened successfully , the cursor code allows to execute SQL commands and this "select current_database()" statement passed to clarify which database we are connected to.
Therefore , we successfully created and connected to following database . Now, lets insert data into the database .
This is the list of tuples , where each tuple contains , Location, School type, Grades level of the school and their Ranking .
Now, to created school_table passing if that table doesn't already existed create a new table ,
ID as the primary key with auto increment
Location as variable character with maximum length of 50 characters
School , Grade accordingly
Ranking as an integer field .
Now insert values from the school_list to the school_table
'execute_values' is the function from 'psycopg2.extras' is used to efficiently execute multi row insert operation into PostgreSQL . It can significantly improve performance compared to individually passing the insert statements in a loop.
Now , pgcursor.execute function and row in pg.cursor commands helps to execute all rows from school_table .
This approach is especially useful when dealing with large datasets or when performance is concern.
Part 3. SQLAlchemy combining Pandas data frames & PostgreSQL
Firstly, we are going to read the pandas data frame from CSV file
Skills is the sample excel data to work with our example on connecting and achieving result using SQLAlchemy
Setup:
Awaiting approach!! Most powerful SQL toolkit and ORM library for Python is SQLAlchemy !!
It simplifies the interaction with DB and provides a more pythonic way of querying and manipulations in database.
We just created a new database into PostgreSQL, named , 'skills_db'
Connect to the skills_db by your actual credentials
To create an engine using SQLAlchemy we typically use 'create_engine' function
To create our engine first we pass our dialect which is postgresql , then we pass our driver or database API , then username , password , host and then database .
Now the engine is successfully created and connected .
Lets see how to connect data frame to PgSQL
'to_sql ' is the method in pandas used to store data to sql data base from pandas dataframe.
data is the panda data frame name given
skills_table where data frame will be written.
engine SQLAlchemy engine is the one that provides connection to database.
if_exists is the parameter specifies to append the values if table already exists. We can use 'replace' or fail' according to the requirement.
'read_sql_query' is the method used to read the data back from sql table to pandas data frame to verify the data.
SQLAlchemy handles connection pooling and transaction management , ensuring that data base connections can be reused efficiently. And provides other powerful tools for building complex queries. SQL Alchemy's features can help with read, write , clean , maintain efficient database code whether it is smaller or larger project applications.
One more interesting point I would like to show to play with these data frames "Last but not least" !!
to_excel method in pandas help to export our updated version of data base into excel to our file explorer.
The same way we can convert and save our data into CSV file in our file explorer.
to_csv method , lets verify with file explorer
Periodic schedule of this data export to a location will definitely help the customers to retain or retrieve the data incase of any accidental data loss and if systems are unavailable.
References :
Comments