Gestational diabetes mellitus (GDM) can lead to complications for both mother and baby. Treatment usually includes a special meal plan and scheduled physical activity, and it may also include daily blood glucose testing and insulin injections. Early screening improves pregnancy outcomes by reducing complications such as emergency cesarean section, neonatal hypoglycemia and macrosomia. While working with gestational diabetes data, the question arose whether GDM could be predicted at a patient's first visit from a few basic biomarkers, since an early prediction would help patients. Machine learning has therefore been applied to the gestational diabetes data to predict a patient's chance of developing GDM in later trimesters.
Importing Libraries :
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.graph_objs as go
import plotly.offline as py
from collections import Counter
from sklearn import metrics, preprocessing
from sklearn.datasets import make_classification
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.linear_model import LogisticRegression  # needed for the logistic regression model (logmodel)
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, precision_score, roc_auc_score)
from xgboost import XGBClassifier
from imblearn.under_sampling import RandomUnderSampler
# Loading data: upload the file into the Colab session
from google.colab import files
uploaded = files.upload()
Reading the data :
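A minimal sketch of this step, assuming the dataset was uploaded as a single CSV file: `files.upload()` returns a dict that maps each uploaded filename to its raw bytes, so the first entry can be wrapped in `io.BytesIO` and parsed with `pandas.read_csv`. The helper name `read_uploaded_csv` is hypothetical, and the CSV format is an assumption about the original file.

```python
import io

import pandas as pd


def read_uploaded_csv(uploaded: dict) -> pd.DataFrame:
    """Parse the first file of a Colab upload dict as CSV (hypothetical helper).

    `uploaded` maps filename -> raw bytes, as returned by google.colab.files.upload().
    Assumes the uploaded file is a CSV.
    """
    if not uploaded:
        raise ValueError("no file was uploaded")
    content = next(iter(uploaded.values()))
    # read_csv accepts a file-like object, so wrap the raw bytes in BytesIO
    return pd.read_csv(io.BytesIO(content))
```

In the notebook this would be called as `df = read_uploaded_csv(uploaded)`, followed by `df.head()` to check the first rows.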
Understanding the dataset :
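A minimal sketch of a first look at the data, assuming it has been loaded into a DataFrame whose label column is `GDM_Diagnoised`. `df.info()` and `df.describe()` show column types and summary statistics; the hypothetical helper below also collects the missing-value count per column and the number of patients in each class, the two facts that motivate the imputation and resampling steps later on. The toy frame is illustrative only, not real patient data.

```python
import numpy as np
import pandas as pd


def summarize_dataset(df: pd.DataFrame, target: str = "GDM_Diagnoised") -> dict:
    """Quick overview of a DataFrame (hypothetical helper)."""
    return {
        "shape": df.shape,
        # number of missing values in each column
        "missing": df.isnull().sum().to_dict(),
        # number of patients in each class of the target column
        "class_counts": df[target].value_counts().to_dict(),
    }


# Tiny illustrative frame, not real patient data
toy = pd.DataFrame({
    "BMI": [20.4, np.nan, 38.4, 27.1],
    "GDM_Diagnoised": ["No", "No", "Yes", "No"],
})
summary = summarize_dataset(toy)
print(summary)
```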
Transforming all categorical columns into numerical columns :
A label encoder can be fitted to the 'Vit D Deficiency' column in the same way; alternatively, a one-hot encoder can be used.
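A minimal sketch of label encoding, assuming the categorical columns (for example 'Age>30', 'Vit D Deficiency' and 'GDM_Diagnoised') hold text values such as 'Yes'/'No'. The helper name `encode_categorical` and the toy values are hypothetical; `LabelEncoder` is scikit-learn's class.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_categorical(df: pd.DataFrame) -> pd.DataFrame:
    """Label-encode every non-numeric column and return a new DataFrame (hypothetical helper)."""
    encoded = df.copy()
    for col in encoded.columns:
        if not pd.api.types.is_numeric_dtype(encoded[col]):
            # LabelEncoder sorts the distinct values and maps them to 0..n-1,
            # so "No" becomes 0 and "Yes" becomes 1
            encoded[col] = LabelEncoder().fit_transform(encoded[col])
    return encoded


# Illustrative values only; column names follow the text
toy = pd.DataFrame({
    "BMI": [20.4, 38.4, 27.1],
    "Age>30": ["No", "Yes", "Yes"],
    "Vit D Deficiency": ["Yes", "No", "No"],
    "GDM_Diagnoised": ["No", "Yes", "No"],
})
encoded = encode_categorical(toy)
print(encoded)
```

For the one-hot alternative, `pd.get_dummies(df, columns=["Vit D Deficiency"])` produces one indicator column per category instead of a single integer code.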
The biomarkers recorded at visit 1 (SystolicBP, DiastolicBP, Weight, BMI, Age>30, Vit D Deficiency) are used as the features X, and GDM_Diagnoised is used as the target y.
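A minimal sketch of this selection, assuming the columns are spelled exactly as listed above and have already been encoded as numbers. The constant names `FEATURES` and `TARGET`, the helper `split_features_target` and the toy values are hypothetical.

```python
import pandas as pd

# Visit-1 biomarkers listed in the text; exact column spellings are assumed
FEATURES = ["SystolicBP", "DiastolicBP", "Weight", "BMI", "Age>30", "Vit D Deficiency"]
TARGET = "GDM_Diagnoised"


def split_features_target(df: pd.DataFrame):
    """Return the feature matrix X and the label vector y (hypothetical helper)."""
    return df[FEATURES], df[TARGET]


# Illustrative, already-encoded values only
toy = pd.DataFrame({
    "SystolicBP": [120.0, 138.0],
    "DiastolicBP": [80.0, 63.0],
    "Weight": [60.6, 94.5],
    "BMI": [22.1, 38.4],
    "Age>30": [0, 1],
    "Vit D Deficiency": [0, 0],
    "GDM_Diagnoised": [0, 1],
})
X, y = split_features_target(toy)
print(X.shape, y.shape)
```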
Most machine learning models, including scikit-learn's logistic regression, cannot be trained on data that contains null or missing values, so these must be imputed or removed first.
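One way to handle missing values is imputation, and the notebook imports both `KNNImputer` and `SimpleImputer` for this. A minimal sketch of the KNN variant, assuming X is a numeric DataFrame: each missing value is replaced by the mean of that feature over the `n_neighbors` most similar rows, with similarity measured on the features both rows have. The helper name `impute_missing` and the toy values are hypothetical; `SimpleImputer(strategy="mean")` would be the simpler alternative that fills each column with its mean.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def impute_missing(X: pd.DataFrame, n_neighbors: int = 5) -> pd.DataFrame:
    """Fill each NaN with the mean of that feature over the k most similar rows."""
    imputer = KNNImputer(n_neighbors=n_neighbors)
    filled = imputer.fit_transform(X)
    # fit_transform returns a NumPy array, so restore the column names and index
    return pd.DataFrame(filled, columns=X.columns, index=X.index)


# Illustrative values only: row 1 is missing its BMI
toy = pd.DataFrame({
    "SystolicBP": [120.0, 122.0, 150.0],
    "BMI": [22.0, np.nan, 35.0],
})
# With one neighbour, row 1 copies the BMI of row 0, whose SystolicBP is closest
print(impute_missing(toy, n_neighbors=1))
```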
In this dataset the two classes, patients with and without GDM, are imbalanced. To balance them, either undersampling or oversampling can be used; undersampling is implemented here.
!pip install imbalanced-learn  # imblearn is published on PyPI as imbalanced-learn
from imblearn import under_sampling, over_sampling
from collections import Counter
from imblearn.under_sampling import RandomUnderSampler

# Randomly drop majority-class rows until both classes are the same size
rus = RandomUnderSampler(random_state=0)
X_resampled, y_resampled = rus.fit_resample(X, y)
print(sorted(Counter(y_resampled).items()), y_resampled.shape)
df2 = pd.DataFrame(X_resampled)
df2.head()
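As an alternative to the undersampling above, random oversampling duplicates minority-class rows instead of discarding majority-class rows, so no patient records are thrown away. imbalanced-learn's `RandomOverSampler` does this directly; the sketch below implements the same idea with scikit-learn's `sklearn.utils.resample` so the mechanics are visible. The helper name `random_oversample` and the toy data are hypothetical.

```python
import pandas as pd
from sklearn.utils import resample


def random_oversample(X: pd.DataFrame, y: pd.Series, random_state: int = 0):
    """Randomly duplicate minority-class rows until every class matches the largest one."""
    counts = y.value_counts()
    target_size = int(counts.max())
    X_parts, y_parts = [], []
    for label, size in counts.items():
        X_cls, y_cls = X[y == label], y[y == label]
        # keep every original row of this class
        X_parts.append(X_cls)
        y_parts.append(y_cls)
        if size < target_size:
            # draw the missing rows with replacement from this class only
            X_extra, y_extra = resample(
                X_cls, y_cls, replace=True,
                n_samples=int(target_size - size), random_state=random_state,
            )
            X_parts.append(X_extra)
            y_parts.append(y_extra)
    return pd.concat(X_parts, ignore_index=True), pd.concat(y_parts, ignore_index=True)


# Illustrative imbalanced data: four patients in class 0, one in class 1
X_toy = pd.DataFrame({"BMI": [20.0, 22.0, 25.0, 30.0, 38.0]})
y_toy = pd.Series([0, 0, 0, 0, 1])
X_over, y_over = random_oversample(X_toy, y_toy)
print(y_over.value_counts().to_dict())
```

If oversampling is used, it should be applied after the train/test split and only to the training data, so that duplicated rows do not end up in both sets and inflate the test score.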
The undersampled data is then split into training and test sets and used to train a logistic regression model, which reached an accuracy of around 67%.
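A minimal sketch of the split-and-train step, assuming `X_resampled` and `y_resampled` from the undersampling step are passed in as X and y. It uses scikit-learn's `train_test_split`, `LogisticRegression` and `accuracy_score`; the 80/20 split, `stratify`, `max_iter=1000` and the helper name `train_logistic_model` are assumptions, not settings taken from the original notebook. The toy data is synthetic and well separated, so its accuracy says nothing about the roughly 67% reported for the GDM data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_logistic_model(X, y, test_size=0.2, random_state=0):
    """Split the data, fit logistic regression and return (model, test accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    # max_iter is raised because unscaled biomarkers can slow convergence
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


# Synthetic toy data, not the GDM dataset: two well-separated BMI groups
rng = np.random.default_rng(0)
X_toy = pd.DataFrame({"BMI": np.concatenate([rng.normal(22, 1, 50), rng.normal(36, 1, 50)])})
y_toy = pd.Series([0] * 50 + [1] * 50)
logmodel, acc = train_logistic_model(X_toy, y_toy)
print(f"test accuracy: {acc:.2f}")
```

Because accuracy alone can hide poor detection of GDM cases, the F1 score or ROC AUC (both imported at the top) could be reported alongside it.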
Given a patient's visit-1 biomarkers as input, the trained model predicts the value of the GDM_Diagnoised column for that patient: 'Yes' if the patient is expected to develop GDM and 'No' otherwise (or the corresponding numeric code, if the label column was encoded).
If a patient is known to be at risk of gestational diabetes, the necessary precautions can be taken ahead of time.
def gdm_diagnosis(input_data):
    # Convert the tuple of biomarkers into a NumPy array
    input_data_as_numpy_array = np.asarray(input_data)
    # Reshape to a single row, since we are predicting for one instance
    input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)
    return logmodel.predict(input_data_reshaped)

# Values follow the column order of X:
# SystolicBP, DiastolicBP, Weight, BMI, Age>30, Vit D Deficiency
print(gdm_diagnosis((165.0, 112.0, 60.6, 20.407797, 0, 0)))
print(gdm_diagnosis((138.0, 63.0, 94.5, 38.387155, 1.0, 0.0)))