Python is known for its simple syntax and readability, which cut down the time data analysts would otherwise spend learning a programming language. This gentle learning curve sets it apart from older languages with more complicated syntax. Combined with its versatility and rich ecosystem of libraries, this ease of use has made Python widely used in data analytics.
Python offers powerful libraries like "pandas" and "NumPy" for data manipulation and preprocessing tasks. Let's dive into "NumPy", one of the most widely used data analysis libraries.
What is "NumPy"?
NumPy, short for Numerical Python, is an open-source Python library that provides support for large, multi-dimensional arrays and matrices. It is widely used in science and engineering.
Why use NumPy?
In Python, lists can serve the purpose of arrays, but they are slow to process for large numerical workloads. NumPy provides an array object that is often cited as being up to 50x faster than traditional Python lists, although the actual speedup depends on the operation and the size of the data. NumPy is very useful for performing logical and mathematical calculations on arrays and matrices. It also typically uses less memory than an equivalent list, which is another major advantage.
Why is NumPy faster than a list?
· Faster to read, with fewer bytes of memory used.
· No type checking when iterating through objects.
NumPy arrays are stored in one contiguous block of memory, unlike lists, so processes can access and manipulate them very efficiently. This behaviour is called locality of reference in computer science.
This is the main reason why NumPy is faster than lists. It is also optimized to work with the latest CPU architectures.
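A rough way to observe the difference is to time the same element-wise operation on a list and on an array. The sketch below is only illustrative: the array size and the repeat count are arbitrary choices, and the measured ratio will vary by machine and NumPy version.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for illustration
py_list = list(range(n))
np_array = np.arange(n)

# Double every element: a Python-level loop versus one vectorized operation.
list_time = timeit.timeit(lambda: [v * 2 for v in py_list], number=20)
array_time = timeit.timeit(lambda: np_array * 2, number=20)

print(f"list:  {list_time:.4f} s")
print(f"array: {array_time:.4f} s")

# Both approaches compute the same values.
doubled_list = [v * 2 for v in py_list]
doubled_array = np_array * 2
```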
How are lists different from NumPy arrays:
Lists: insertion, deletion, sorting, appending, concatenation, etc. are supported, but the available methods are minimal.
NumPy: insertion, deletion, appending, concatenation, etc. are supported, with many more options available.
Key Features:
1. Multi-dimensional arrays and matrices
2. Vectorized operations (fast element-wise computations)
3. Broadcasting (operating on arrays with different shapes)
4. Useful linear algebra functions
5. Random number generation
6. Data type support (e.g., integers, floats, complex numbers)
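The features listed above can each be touched on in a line or two. The following is a minimal sketch rather than a full tour; the variable names and the seed value are arbitrary choices made for illustration.

```python
import numpy as np

# 1. Multi-dimensional array (2 rows, 3 columns)
m = np.array([[1, 2, 3], [4, 5, 6]])

# 2. Vectorized operation: every element is squared without an explicit loop
squared = m ** 2

# 3. Broadcasting: the 1D row is added to each row of the 2D array
row = np.array([10, 20, 30])
shifted = m + row

# 4. Linear algebra: matrix product of m (2x3) and its transpose (3x2)
product = m @ m.T

# 5. Random number generation with a seeded generator (seed is arbitrary)
rng = np.random.default_rng(0)
samples = rng.random(3)  # three floats in [0, 1)

# 6. Data types: the same values stored as floats and as complex numbers
as_float = m.astype(np.float64)
as_complex = m.astype(np.complex128)

print(squared)
print(shifted)
print(product)
print(samples.shape, as_float.dtype, as_complex.dtype)
```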
Install Python NumPy:
NumPy can be installed using the pip package installer. Run the following line in a command prompt or terminal:
pip install numpy
This will download and install the most recent NumPy version.
Once NumPy is installed, import it in an application using the 'import' keyword. Note that the module name is lowercase:
import numpy as np
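One simple way to confirm that the installation worked is to import the library and print its version. This is just a quick check; the version string you see depends on what pip installed on your machine.

```python
import numpy as np

# If this runs without an ImportError, NumPy is installed.
print(np.__version__)
```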
Data structure:
The main data structure in NumPy is the ndarray, which is a shorthand name for N-dimensional array. When working with NumPy, data in an ndarray is simply referred to as an array.
It is a fixed-sized array in memory that contains data of the same type, such as integers or floating point values. The data type supported by an array can be accessed via the “dtype” attribute on the array. The dimensions of an array can be accessed via the “shape” attribute that returns a tuple describing the length of each dimension.
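As a small illustration of the dtype and shape attributes, and of what "data of the same type" means in practice, consider the sketch below. The array name grid is a hypothetical one chosen for this example.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.dtype.kind)  # 'i' means a signed integer type
print(grid.shape)       # (2, 3): 2 rows, 3 columns

# Because the array holds integers only, an assigned float is truncated.
grid[0, 0] = 9.7
print(grid[0, 0])
```

The last assignment stores 9 rather than 9.7, since the array cannot change its element type after creation.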
What is an “array”?
A NumPy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. A list is the Python equivalent of an array, but it is resizable and can contain elements of different types.
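The contrast with a list can be sketched as follows. The names used here are illustrative only.

```python
import numpy as np

# A list can mix element types and grow in place.
mixed_list = [1, 2.5, "three"]
mixed_list.append(4)

# NumPy converts mixed numeric input to one common type (here, float).
values = np.array([1, 2.5, 3])
print(values.dtype)  # float64

# A 2D array is indexed by a tuple of integers: (row, column).
table = np.array([[1, 2, 3], [4, 5, 6]])
print(table[(1, 2)])  # 6, the same element as table[1, 2]
```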
The Basics of NumPy Arrays:
· Attributes of arrays : Determining the size, shape, memory consumption and data types of arrays.
· Indexing of arrays : Getting and setting the value of individual array elements.
· Slicing of arrays : Getting and setting smaller subarrays within a larger array.
· Reshaping of arrays : Changing the shape of a given array.
· Joining and splitting of arrays : Combining multiple arrays into one and splitting one array into many.
NumPy Array Attributes:
First let’s discuss some useful array attributes. We will start by defining three arrays: a one-dimensional, a two-dimensional and a three-dimensional array. NumPy arrays can be defined using Python sequences such as lists and tuples, which are written with [...] and (...), respectively. The nesting of the sequence determines the dimensions of the resulting array:
· a list of numbers will create a 1D array,
· a list of lists will create a 2D array,
· further nested lists will create higher-dimensional arrays.
In general, any array object is called an ndarray in NumPy.
Example:
import numpy as np
# Array declaration
x1 = np.array([1, 2, 3, 4])  # one-dimensional array
x2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # two-dimensional array
x3 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # three-dimensional array
print(x1)
Output:
[1 2 3 4]
Example:
x2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(x2)
Output:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Each array has the attributes ndim (the number of dimensions), shape (the size of each dimension), and size (the total number of elements in the array).
Example:
print(x2.ndim)
print(x2.shape)
print(x2.size)
Output:
2
(3, 3)
9
Another useful attribute is dtype. The dtype of an array describes the layout of its elements: the type of the data (integer, float, Python object, etc.) and the size of the data (number of bytes).
How to find the data type:
Example:
x4 = np.array([5.5, 4.5])
print(x4.dtype)
Output:
float64
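Size of the data:
Example:
x2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(x2.itemsize)  # bytes per element
print(x2.nbytes)  # total bytes used by the elements
Output:
8
72
These values assume the default 64-bit integer type; on platforms where the default integer is 32-bit, the output is 4 and 36. In general, nbytes is equal to itemsize times size.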
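The relationship between nbytes, itemsize and size can be checked directly. The sketch below also converts the array to a smaller integer type to show how the choice of dtype affects memory use; the use of int8 here is just an example and is only safe because all values fit in one byte.

```python
import numpy as np

x2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Holds for any dtype, whatever the platform's default integer size is.
print(x2.nbytes == x2.itemsize * x2.size)  # True

# Choosing a smaller dtype explicitly reduces the memory footprint.
small = x2.astype(np.int8)
print(small.itemsize, small.nbytes)  # 1 9
```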
Array Indexing:
Array indexing is the same as accessing an array element. You can access an array element by referring to its index number. The indexes in NumPy arrays start with 0, meaning that the first element has index 0, and the second has index 1 etc.
Example a):
x1 = np.array([1, 2, 3, 4])
print(x1[3])
Output:
4
Example b):
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])
Output:
6
Explanation:
arr[0, 1, 2] prints the value 6.
And this is why:
The first number represents the first dimension, which contains two arrays:
[[1, 2, 3], [4, 5, 6]]
and:
[[7, 8, 9], [10, 11, 12]]
Since we selected 0, we are left with the first array:
[[1, 2, 3], [4, 5, 6]]
The second number represents the second dimension, which also contains two arrays:
[1, 2, 3]
and:
[4, 5, 6]
Since we selected 1, we are left with the second array:
[4, 5, 6]
The third number represents the third dimension, which contains three values:
4
5
6
Since we selected 2, we end up with the third value:
6
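The step-by-step explanation above can be reproduced in code by applying one index at a time. The intermediate variable names are chosen only for this illustration.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

first_block = arr[0]         # [[1, 2, 3], [4, 5, 6]]
second_row = first_block[1]  # [4, 5, 6]
third_value = second_row[2]  # 6

# Indexing one dimension at a time gives the same element as arr[0, 1, 2].
print(third_value == arr[0, 1, 2])  # True
```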
Negative Indexing:
To index from the end of the array, we can use negative indices.
Example:
x1 = np.array([1, 2, 3, 4])
print(x1[-1])
Output:
4
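Negative indices work in every dimension, and they can be used for setting values as well as getting them. A short sketch, reusing arrays like those defined earlier:

```python
import numpy as np

x1 = np.array([1, 2, 3, 4])
print(x1[-2])  # 3, the second element from the end

x2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(x2[-1, -1])  # 9, last row and last column

# Negative indices can also be used to set a value.
x1[-1] = 40
print(x1)  # [ 1  2  3 40]
```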
Array Slicing:
Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list. To access a slice of an array x, use this:
x[start:stop:step]
If any of these are unspecified, they default to the values start = 0, stop = size of the dimension, step = 1.
Example:
Slice elements from index 1 up to, but not including, index 5 of the following array:
a = np.array([1, 2, 3, 4, 5, 6, 7])
print(a[1:5])
Output:
[2 3 4 5]
Step: use the step value to determine the stride of the slicing.
Example:
Return every other element from index 1 up to, but not including, index 5:
a = np.array([1, 2, 3, 4, 5, 6, 7])
print(a[1:5:2])
Output:
[2 4]
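The default values of start, stop and step can be seen by leaving each of them out in turn. The sketch below also applies slices to a two-dimensional array, one slice per dimension, as an illustration of getting a smaller subarray from a larger one.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])

print(a[:3])   # start omitted: [1 2 3]
print(a[4:])   # stop omitted:  [5 6 7]
print(a[::3])  # only step given: [1 4 7]

# Slicing also works per dimension on multi-dimensional arrays.
x2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
sub = x2[:2, 1:]  # first two rows, last two columns
print(sub)
```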
Reshaping of Arrays:
Another useful type of operation is reshaping of arrays. The most flexible way of doing this is with the reshape method.
Note: the size of the initial array must match the size of the reshaped array, that is, both must hold the same total number of elements.
Example:
x = np.array([1, 2, 3, 4])
# reshape the four elements into a 2x2 array
x.reshape((2, 2))
Output:
array([[1, 2],
       [3, 4]])
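To see what the size requirement means in practice, the sketch below first requests a shape that needs more elements than the array has, and then one that fits. The target shapes are arbitrary examples.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

try:
    x.reshape((3, 2))  # 3 * 2 = 6 elements requested, but x has only 4
except ValueError as err:
    print("reshape failed:", err)

# Shapes whose product equals x.size work.
column = x.reshape((4, 1))
print(column.shape)  # (4, 1)
```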
Array concatenation and splitting:
All of the preceding routines worked on single arrays. It’s also possible to combine multiple arrays into one, and conversely to split a single array into multiple arrays. We‘ll take a look at those operations here.
Concatenation of arrays:
Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines np.concatenate, np.vstack and np.hstack.
np.concatenate takes a tuple or list of arrays as its first argument, as we can see here:
Example:
x = np.array([1, 2, 3])
y = np.array([8, 12, 14])
np.concatenate([x, y])
Output:
array([ 1,  2,  3,  8, 12, 14])
We can also concatenate more than two arrays at once.
z = [55, 55, 55]
print(np.concatenate([x, y, z]))
Output:
[ 1  2  3  8 12 14 55 55 55]
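The np.vstack and np.hstack routines mentioned above stack arrays vertically (as rows) and horizontally (side by side). A minimal sketch with two one-dimensional arrays:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([8, 12, 14])

stacked_rows = np.vstack([x, y])  # shape (2, 3): x and y become rows
side_by_side = np.hstack([x, y])  # shape (6,): one longer 1D array

print(stacked_rows)
print(side_by_side)
```

For one-dimensional inputs, np.hstack gives the same result as np.concatenate.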
Splitting of arrays:
The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit and np.vsplit. For each of these, we can pass a list of indices giving the split points.
Example:
x = [1, 2, 3, 55, 55, 8, 9, 7]
x1, x2 = np.split(x, [2])
print(x1, x2)
Output:
[1 2] [ 3 55 55  8  9  7]
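np.vsplit and np.hsplit do the same for two-dimensional arrays, splitting along rows and columns respectively. The 4x4 array below is a hypothetical example built only for this illustration.

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

upper, lower = np.vsplit(grid, [2])  # split rows before row index 2
left, right = np.hsplit(grid, [2])   # split columns before column index 2

print(upper)
print(left)
```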
Conclusion:
In this overview, we explored the fundamentals of NumPy arrays. The fundamental operations of creation, indexing, slicing and manipulation enable data analysts to clean and process data. By understanding arrays, data professionals can enhance their analytical workflows and drive business value.