Getting started with Pandas

Updated: Jul 21, 2020

Pandas is an open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools. It is built on top of NumPy and offers high-level data structures that streamline work with tabular data and time series. Pandas is used for data alignment, missing-data-friendly statistics, and groupby, merge and join operations. It allows you to do fast analysis as well as data cleaning and preparation.

Here we are going to look at some of the Pandas functions that help us get ready to predict gas consumption. I used Kaggle for the example below.


Step 1: Import the library


import pandas as pd
import numpy as np

Step 2: Read the CSV file

The file I used: ../input/car-consume/measurements.csv


data = pd.read_csv('../input/car-consume/measurements.csv')
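A quick way to confirm that the file loaded as expected is to check its shape and peek at the first rows. The sketch below is only an illustration: it reads a tiny made-up sample from memory instead of the Kaggle file, and the values in it are hypothetical (only the column names are taken from the dataset).

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for measurements.csv.
# The values are invented; only the column names come from the dataset.
sample_csv = io.StringIO(
    "distance,consume,speed,gas_type\n"
    "28,5,26,E10\n"
    "12,4,30,E10\n"
    "11,5,38,SP98\n"
)
sample = pd.read_csv(sample_csv)

# shape is a (rows, columns) tuple; head(n) returns the first n rows.
print(sample.shape)
first_rows = sample.head(2)
print(first_rows)
```

With the real file, the same two calls would be data.shape and data.head().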

Step 3: Look at the details of the file.



data.info()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 388 entries, 0 to 387
Data columns (total 12 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   distance       388 non-null    object
 1   consume        388 non-null    object
 2   speed          388 non-null    int64 
 3   temp_inside    376 non-null    object
 4   temp_outside   388 non-null    int64 
 5   specials       93 non-null     object
 6   gas_type       388 non-null    object
 7   AC             388 non-null    int64 
 8   rain           388 non-null    int64 
 9   sun            388 non-null    int64 
 10  refill liters  13 non-null     object
 11  refill gas     13 non-null     object
dtypes: int64(5), object(7)
memory usage: 36.5+ KB
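The output above shows two things worth noting: columns such as distance and consume are stored as object (text) rather than as numbers, and temp_inside has only 376 non-null values out of 388. A minimal sketch of how such columns could be converted and the gaps counted is given below. It assumes the text values use a comma as the decimal separator, which is a guess about the cause and not something the output confirms, and it runs on a small made-up sample.

```python
import pandas as pd

# Hypothetical sample; the decimal commas are an ASSUMPTION about why
# numeric-looking columns show up with the object dtype.
sample = pd.DataFrame({
    "distance": ["28", "12", "11,2"],
    "consume": ["5", "4,2", "5,5"],
    "temp_inside": ["21,5", None, "22"],
})

numeric_cols = ["distance", "consume", "temp_inside"]
for col in numeric_cols:
    # Swap the decimal comma for a point, then parse the text as numbers.
    # errors="coerce" turns anything unparseable into NaN instead of raising.
    sample[col] = pd.to_numeric(
        sample[col].str.replace(",", ".", regex=False), errors="coerce"
    )

# Count the missing values in each column.
missing_per_column = sample.isna().sum()
print(sample.dtypes)
print(missing_per_column)
```

If the decimal-comma assumption holds for the real file, passing decimal="," to pd.read_csv is another way to get numeric columns at load time.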

Step 4: Describe the data. This way we can examine the data carefully before splitting it into training and test sets.
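As a rough sketch of this step, describe() summarizes the numeric columns (count, mean, standard deviation, minimum, quartiles and maximum), while value_counts() does a similar job for a categorical column. The sample values below are made up for illustration; only the column names are borrowed from the dataset.

```python
import pandas as pd

# Hypothetical sample with invented values.
sample = pd.DataFrame({
    "speed": [26, 30, 38, 36],
    "temp_outside": [12, 13, 15, 14],
    "gas_type": ["E10", "E10", "SP98", "SP98"],
})

# describe() covers only the numeric columns by default.
numeric_summary = sample.describe()

# value_counts() shows how often each category appears.
category_counts = sample["gas_type"].value_counts()

print(numeric_summary)
print(category_counts)
```

On the full dataset the equivalent call would be data.describe().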


Conclusion:

I have just penned down the basics. With the help of other libraries such as seaborn and matplotlib.pyplot, we can build our training and test sets and complete our linear regression.




© Numpy Ninja.