Getting started with Pandas
Pandas is an open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools. It runs on top of NumPy and offers high-level data structures, streamlined handling of tabular data, and time series functionality. Pandas is used for data alignment, missing-data-friendly statistics, and groupby, merge, and join operations. It allows you to do fast analysis as well as data cleaning and preparation.
Here we are going to look at some of the Pandas functions that help us predict gas consumption. I used Kaggle for the example below.
Step 1: Import the library
import pandas as pd
import numpy as np
Step 2: Read the CSV file
File I used: ../input/car-consume/measurements.csv
data = pd.read_csv('../input/car-consume/measurements.csv')
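To try the same call without the Kaggle file, here is a minimal sketch that reads a small in-memory CSV. The rows are made up for illustration and only imitate a few of the column names; `sample` and `sample_csv` are hypothetical names, not part of the original notebook.

```python
import io
import pandas as pd

# Made-up rows imitating some columns of measurements.csv (illustrative values only).
sample_csv = io.StringIO(
    "distance,consume,speed,temp_outside,gas_type\n"
    "28,5,26,12,E10\n"
    "12,4,30,13,E10\n"
    "11,6,38,15,SP98\n"
)

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(sample_csv)

print(sample.shape)   # (number of rows, number of columns)
print(sample.head())  # the first rows of the table
```

Checking the shape and the first few rows right after loading is a quick way to confirm that the file was read as expected.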
Step 3: Look at the details of the file. Calling data.info() prints a summary like this:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 388 entries, 0 to 387
Data columns (total 12 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   distance       388 non-null    object
 1   consume        388 non-null    object
 2   speed          388 non-null    int64
 3   temp_inside    376 non-null    object
 4   temp_outside   388 non-null    int64
 5   specials       93 non-null     object
 6   gas_type       388 non-null    object
 7   AC             388 non-null    int64
 8   rain           388 non-null    int64
 9   sun            388 non-null    int64
 10  refill liters  13 non-null     object
 11  refill gas     13 non-null     object
dtypes: int64(5), object(7)
memory usage: 36.5+ KB
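In this summary, distance, consume and temp_inside are reported as object rather than as numbers, so they cannot be used in arithmetic yet. One possible cause, assumed here and not confirmed by the output, is that the file writes decimals with a comma (for example 21,5). The sketch below uses made-up rows and a hypothetical helper, `to_float`, to show one way such columns could be converted.

```python
import io
import pandas as pd

# Made-up rows; the decimal commas are an assumption about the real file.
sample_csv = io.StringIO(
    'distance,consume,speed,temp_inside\n'
    '28,5,26,"21,5"\n'
    '12,"4,2",30,"21,5"\n'
    '"11,2","5,5",38,\n'
)
sample = pd.read_csv(sample_csv)
sample.info()  # the text columns are reported with a non-numeric dtype


def to_float(column):
    """Turn text such as '21,5' into 21.5; values that cannot be parsed become NaN."""
    as_text = column.astype(str).str.replace(",", ".", regex=False)
    return pd.to_numeric(as_text, errors="coerce")


for name in ["distance", "consume", "temp_inside"]:
    sample[name] = to_float(sample[name])

print(sample.dtypes)  # the converted columns are now float64
```

If the real file does use decimal commas, passing `decimal=","` to `pd.read_csv` is another option worth trying.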
Step 4: Describe the data. Calling data.describe() gives summary statistics for the numeric columns, so we can examine the data carefully before building our training and test sets.
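As a rough illustration of this step, the sketch below calls `describe()` on a small DataFrame. The values are made up for the example and `sample` is a hypothetical name; on the real data the same call would be made on `data`.

```python
import pandas as pd

# Made-up values for illustration; not taken from measurements.csv.
sample = pd.DataFrame({
    "speed": [26, 30, 38, 36],
    "consume": [5.0, 4.2, 5.5, 3.9],
    "gas_type": ["E10", "E10", "SP98", "SP98"],
})

# By default describe() summarises only the numeric columns:
# count, mean, std, min, quartiles and max.
summary = sample.describe()
print(summary)

# Text columns can be inspected separately, for example by counting categories.
print(sample["gas_type"].value_counts())
```

Columns that are still stored as text are left out of the default summary, which is one reason to check the dtypes first.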
I have just penned down the basics. With the help of other libraries like seaborn and matplotlib.pyplot, we can build our training and test sets and complete our linear regression.
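To give a feel for that last step, here is a minimal sketch of a train/test split followed by a straight-line fit. It is not the author's model: the numbers are made up, and it uses `np.polyfit` as a simple stand-in for a full regression library.

```python
import numpy as np
import pandas as pd

# Made-up (speed, consume) pairs for illustration only.
sample = pd.DataFrame({
    "speed": [26, 30, 38, 36, 46, 50, 43, 40],
    "consume": [5.0, 4.2, 5.5, 3.9, 4.5, 6.4, 4.4, 5.0],
})

# Keep 75% of the rows for training and hold out the rest for testing;
# random_state makes the split repeatable.
train = sample.sample(frac=0.75, random_state=0)
test = sample.drop(train.index)

# Fit consume = slope * speed + intercept by ordinary least squares.
slope, intercept = np.polyfit(train["speed"], train["consume"], deg=1)

# Evaluate on the held-out rows with the mean absolute error.
predicted = slope * test["speed"] + intercept
mean_abs_error = (predicted - test["consume"]).abs().mean()
print(f"slope={slope:.3f}, intercept={intercept:.3f}, MAE={mean_abs_error:.3f}")
```

With so few made-up points the fitted line means little; the point is only the order of operations: split first, fit on the training rows, then measure the error on the test rows.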