Feature scaling
It is a data preprocessing technique that transforms feature values to a similar scale, ensuring all features contribute equally to the model.
Feature scaling is a crucial step prior to training a machine learning model.
Algorithms that depend on distance metrics (like k-NN, SVM) or utilize gradient descent (such as linear regression and neural networks) perform better when the features are scaled to a uniform range.
The main feature scaling techniques are normalization and standardization.
Normalization
It is the process of scaling data into the range [0, 1]. It is often called min-max scaling.
It's more useful and common for regression tasks.
Scikit-Learn provides the MinMaxScaler for this purpose.
from sklearn.preprocessing import MinMaxScaler

# Rescales each feature column of `data` (a 2D array) to the [0, 1] range
norm = MinMaxScaler()
transformed_data = norm.fit_transform(data)
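As a quick illustration, here is a minimal sketch on a made-up toy array showing that MinMaxScaler applies (x - min) / (max - min) to each column:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy single-column data (purely illustrative values)
data = np.array([[10.0], [20.0], [30.0], [50.0]])

scaled = MinMaxScaler().fit_transform(data)
# Equivalent manual computation: (x - min) / (max - min), per column
manual = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))

print(scaled.ravel())               # [0.   0.25 0.5  1.  ]
print(np.allclose(scaled, manual))  # True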
Standardization
It is the process of scaling data so that it has a mean of 0 and a standard deviation of 1.
It's more useful and common for classification tasks.
Scikit-Learn provides us with the StandardScaler for this purpose.
from sklearn.preprocessing import StandardScaler

# Rescales each feature column of `data` to zero mean and unit variance
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
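Similarly, a minimal sketch on a made-up toy array showing that StandardScaler subtracts each column's mean and divides by its standard deviation:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy single-column data (purely illustrative values)
data = np.array([[10.0], [20.0], [30.0], [40.0]])

scaled = StandardScaler().fit_transform(data)
# Equivalent manual computation: (x - mean) / std, per column
manual = (data - data.mean(axis=0)) / data.std(axis=0)

print(np.allclose(scaled, manual))  # True
print(scaled.mean(), scaled.std())  # ≈ 0.0, ≈ 1.0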
Scaling vs Normalization:
When scaling, you're changing the range of your data, while
when normalizing, you're changing the shape of its distribution.
Here’s an example of how to perform both standardization and normalization using Python:
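The sketch below assumes a pandas DataFrame df with a numeric 'BMI' column; both the DataFrame and the column name are illustrative placeholders.

import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Assumes a DataFrame `df` with a numeric 'BMI' column (hypothetical dataset)
df['BMI_standardized'] = StandardScaler().fit_transform(df[['BMI']]).ravel()
df['BMI_normalized'] = MinMaxScaler().fit_transform(df[['BMI']]).ravel()

# Plot histograms of the original and transformed columns side by side
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, col in zip(axes, ['BMI', 'BMI_standardized', 'BMI_normalized']):
    ax.hist(df[col], bins=30)
    ax.set_title(col)
plt.tight_layout()
plt.show()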
Running this code transforms the BMI data using both standardization and normalization techniques and visualizes the results.
The plotted histograms highlight how the data distribution and range have changed after applying both transformations:
The original BMI histogram shows the raw distribution of the data.
The standardized BMI histogram demonstrates how the data has been rescaled to center around 0 with unit variance.
The normalized BMI histogram illustrates how the data fits within the [0, 1] range, rescaling the values without significantly altering the shape of the distribution.
These changes enhance the model’s ability to learn efficiently and accurately without being skewed by features with differing scales. Additionally, scaling should always be done after splitting the data to prevent information from the test set leaking into the training process.
Common Pitfalls and Best Practices
Don’t fit the scaler on test data. Always fit the scaler on the training data and then apply the same fitted scaler to the test data (see the sketch after this list).
Misunderstanding the difference between normalization and standardization. StandardScaler standardizes data, rescaling it to mean = 0 and variance = 1, whereas normalization scales data to a fixed range (usually [0, 1]).
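A sketch of the first point, where X and y stand in for your own feature matrix and target:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# X, y stand in for your feature matrix and target (not defined here)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on the training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics on the test data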
Why is Feature Scaling Important?
Some machine learning algorithms are sensitive to the scale of input data. For example, if one feature has a much larger range than others, it could dominate the learning process, leading to a biased model. By standardizing the data, we ensure that all features contribute equally to the model.
Reasons to perform feature scaling:
It improves model convergence: Feature scaling allows models, particularly gradient-descent-based models, to find optimal parameters efficiently and converge more easily.
Prevents feature dominance: It ensures that no single feature dominates or is given more weight than the others simply because of differences in units of measurement.
Reduces sensitivity to outliers: Standardization can dampen the influence of extreme values on the model, although min-max normalization itself is sensitive to outliers.
Enhances Generalization: It helps your model generalize better, allowing it to perform well on unseen data.
Improves Algorithm Performance: It improves the performance of models such as regression models and neural networks.
When Should We Scale Our Data?
Feature scaling is not necessary all the time or for all models, only for those that are sensitive to variations in the scale of the input features. The reason we do it so often is that most popular models, such as linear regression and logistic regression, are sensitive to differences in feature scale because of how they optimize their parameters and learn.
Here are some algorithms based on their learning nature:
Gradient-Descent Based Algorithms: Machine learning algorithms that use gradient descent as their optimization technique, such as Linear Regression, Logistic Regression, and Neural Networks, require data to be scaled.
Distance-Based Algorithms: Distance-based algorithms such as k-nearest neighbours, clustering, and support vector machines are the most affected by the range of features (see the sketch after this list).
Tree-Based and Other Algorithms: Decision Trees, Naive Bayes, etc. are fairly insensitive to the scale of features, so scaling isn’t always necessary.
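For the distance-based case, here is a minimal sketch with made-up numbers showing how a feature with a large range dominates a Euclidean distance:

import numpy as np

# Toy samples: [annual income in dollars, age in years]
a = np.array([50_000.0, 25.0])
b = np.array([52_000.0, 60.0])

# The distance is almost entirely the income difference;
# the 35-year age gap barely registers.
print(np.linalg.norm(a - b))   # ~2000.31
print(abs(a[0] - b[0]))        # 2000.0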
Models that may not need Scaling
Tree Models
Naive Bayes
Some features in your dataset can be normalized while others are standardized; it depends on the nature of each feature, which you will have to examine in order to choose the appropriate scaling method.
Feature Scaling Through Scikit-Learn Pipelines
Finally, let's go ahead and train a model on scaled features. When working on machine learning projects, we typically have a pipeline for the data before it arrives at the model we're fitting.
We'll be using the Pipeline class, which lets us automate this process, even though we have just two steps (scaling the data and fitting a model):
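The sketch below assumes a generic feature matrix X, a target y, and a LinearRegression model as illustrative placeholders.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# X, y stand in for your feature matrix and target (not defined here)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipeline = Pipeline([
    ('scaler', StandardScaler()),    # step 1: scale the features
    ('model', LinearRegression()),   # step 2: fit the model
])

pipeline.fit(X_train, y_train)           # the scaler is fit on the training data only
print(pipeline.score(X_test, y_test))    # scaling is applied automatically before predicting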
Feature scaling is the process of bringing feature values onto a comparable scale; it is performed during the preprocessing phase, before these features are fed into algorithms that are affected by scale. Any learning algorithm that depends on the scale of features will typically see major benefits from feature scaling.