We live in a world in which machine learning is at the core of the fourth industrial revolution.
Linear regression is one of the simplest and most widely used machine learning techniques.
Linear regression has many practical applications.
For example, obesity can be used to predict the chances of developing type 2 diabetes.
Or, a student’s GPA can be predicted based on the number of hours they spend studying.
Simply put, simple linear regression models the relationship between two variables as a straight line.
The independent variable, X, is called the predictor, and the dependent variable, y, is called the response.
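In symbols, the straight-line relationship between the predictor and the response can be written as below. The names β0 (intercept), β1 (slope), and ε (error term) are standard textbook notation introduced here for illustration; they do not come from the original post.

```latex
y = \beta_0 + \beta_1 X + \varepsilon
```

Fitting the model means estimating β0 and β1 from the data so that the line passes as close as possible to the observed points.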
In the previous blog post, we learned how to build a linear regression model in Qlik Sense using built-in statistical functions.
Linear Regression in Python
Let’s create a linear regression model from the same data set in a Jupyter notebook so that we can compare both models for accuracy.
Step 1: Let’s first import the required Python libraries and then create a data frame using the pandas read_csv function.
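A minimal sketch of this step might look like the following. The column names `poverty_pct` and `birth_rate_15_17` are hypothetical, and the rows are made-up illustrative values, not the data set used in this post. The CSV text is read from an in-memory string so the example runs on its own.

```python
import io

import pandas as pd

# Illustrative rows only (made-up values, NOT the data set used in this post).
# With a real file you would pass its path to pd.read_csv instead.
csv_text = """poverty_pct,birth_rate_15_17
10.0,15.0
12.0,18.5
14.0,21.0
16.0,26.5
18.0,29.0
20.0,33.0
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Show the first rows to confirm the data loaded as expected.
print(df.head())
```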
Step 2: Now, let’s create a scatter plot to visualize the relationship between the independent and dependent variables.
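One way to draw such a chart is with matplotlib, as sketched below. The choice of matplotlib is an assumption, and the column names and values are hypothetical placeholders rather than the data from this post.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up illustrative values with hypothetical column names.
df = pd.DataFrame({
    "poverty_pct": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "birth_rate_15_17": [15.0, 18.5, 21.0, 26.5, 29.0, 33.0],
})

# Independent variable on the x-axis, dependent variable on the y-axis.
fig, ax = plt.subplots()
ax.scatter(df["poverty_pct"], df["birth_rate_15_17"])
ax.set_xlabel("Poverty rate (%)")
ax.set_ylabel("Teenage birth rate, ages 15-17")
ax.set_title("Poverty rate vs. teenage birth rate")
```

In a Jupyter notebook the figure typically renders below the cell. Points that cluster around a rising straight line suggest a positive linear relationship.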
Step 3: Let’s calculate the coefficient of correlation and the coefficient of determination.
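The coefficient of correlation ranges from -1 to 1 and indicates the direction and strength of the linear relationship between the two variables. A value closer to 1 indicates a stronger positive correlation, and a value closer to -1 indicates a stronger negative correlation.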
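The coefficient of determination indicates how well a simple linear regression fits the data set: it is the proportion of the variance in the dependent variable that the model explains. A value closer to 1 indicates a good fit, whereas a value closer to 0 indicates a poor one.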
Since our data set shows a positive correlation between the independent and dependent variables, along with a positive coefficient of determination, we can proceed.
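Both coefficients can be computed in a few lines. The sketch below uses NumPy, which is an assumption about tooling, and made-up illustrative values rather than the data from this post. For simple linear regression with an intercept, the coefficient of determination equals the square of the correlation coefficient, which is what the code relies on.

```python
import numpy as np

# Made-up illustrative values (NOT the data set used in this post).
x = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])  # predictor
y = np.array([15.0, 18.5, 21.0, 26.5, 29.0, 33.0])  # response

# Pearson correlation coefficient: the off-diagonal entry of the 2x2 matrix.
r = np.corrcoef(x, y)[0, 1]

# For simple linear regression, R^2 is the square of r.
r_squared = r ** 2

print(f"coefficient of correlation:   {r:.3f}")
print(f"coefficient of determination: {r_squared:.3f}")
```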
Step 4: Let’s create a linear regression object from the linear_model module and train it on our data. We will use this object to predict the teenage birth rate for the 15-17 age group based on a given poverty rate.
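Assuming the linear_model module here is scikit-learn’s `sklearn.linear_model`, creating and training the model could look roughly like this. The values are made up for illustration and are not the data from this post.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up illustrative values (NOT the data set used in this post).
# scikit-learn expects the predictors as a 2-D array: one row per sample.
X = np.array([[10.0], [12.0], [14.0], [16.0], [18.0], [20.0]])
y = np.array([15.0, 18.5, 21.0, 26.5, 29.0, 33.0])

# Create the regression object and fit it by ordinary least squares.
model = LinearRegression()
model.fit(X, y)

# The fitted line is y = intercept + slope * X.
print("slope:    ", model.coef_[0])
print("intercept:", model.intercept_)
```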
Step 5: Now that we have trained the model, let’s predict the teenage birth rate for a poverty rate of 15% so that we can compare the prediction with the model we built in Qlik Sense.
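A prediction for a single poverty rate can be sketched as follows, again assuming scikit-learn and using made-up illustrative values. Because the data here is a placeholder, the printed number will not reproduce the result reported in this post.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up illustrative values (NOT the data set used in this post).
X = np.array([[10.0], [12.0], [14.0], [16.0], [18.0], [20.0]])
y = np.array([15.0, 18.5, 21.0, 26.5, 29.0, 33.0])

model = LinearRegression().fit(X, y)

# predict() also expects a 2-D array, even for a single input value.
poverty_rate = np.array([[15.0]])
predicted = model.predict(poverty_rate)[0]

print(f"Predicted birth rate at 15% poverty: {predicted:.3f}")
```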
This teenage birth rate of 24.867% precisely matches the one produced by our Qlik Sense linear regression model.
We accomplished our goal of comparing the linear regression model that we created in Qlik Sense against the one that we created in Python. Both models output identical teenage birth rates.