Linear Regression

We live in a world in which machine learning is at the core of the fourth industrial revolution.

Basics

Linear regression is one of the simplest and most widely used machine learning techniques.

Linear regression has a plethora of practical applications.

For example, a measure of obesity, such as body mass index, can be used to predict the risk of developing type 2 diabetes.

Or, a student’s GPA can be predicted based on the number of hours they spend studying.

Simply put, simple linear regression models the relationship between two variables as a straight line.

There is an independent variable, called the predictor, and a dependent variable, called the response.

By convention, x denotes the independent variable and y denotes the dependent variable.
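To make the relationship concrete, here is the textbook form of the simple linear regression model along with the usual least-squares estimates of its slope and intercept. The symbols (β₀ for the intercept, β₁ for the slope, ε for the error term) are standard notation rather than anything taken from the Qlik Sense post.

```latex
% Simple linear regression model
y = \beta_0 + \beta_1 x + \varepsilon

% Ordinary least-squares estimates from n observations (x_i, y_i)
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

Once the two coefficients are estimated, a prediction for a new x is simply the intercept plus the slope times x.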

In the previous blog post, we learned how to build a linear regression model in Qlik Sense using built-in statistical functions.

Linear Regression in Python

Let’s create a linear regression model from the same data set in a Jupyter notebook so that we can compare the two models’ predictions.

Step 1: Let’s first import the required Python libraries and then create a data frame using the pandas read_csv function.
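A minimal sketch of this step might look like the following. The column names poverty_rate and teen_birth_rate and the five data rows are hypothetical stand-ins made up for illustration; they are not the data set used in this article, so substitute the path to your own CSV file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: made-up values for illustration
# only, NOT the data set used in the article.
csv_text = """poverty_rate,teen_birth_rate
10.0,18.0
12.5,21.5
15.0,24.0
17.5,29.5
20.0,32.0
"""

# read_csv accepts a file path or any file-like object. In a notebook you
# would normally pass the path to your CSV file instead of a StringIO buffer.
df = pd.read_csv(io.StringIO(csv_text))

# Quick sanity checks: first rows and inferred column types.
print(df.head())
print(df.dtypes)
```

Reading from an in-memory buffer keeps the sketch self-contained; everything after the read_csv call works the same way with a real file.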

Step 2: Now, let’s create a scatterplot chart to visualize the relationship between the independent and dependent variables.
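One way to draw such a scatterplot is with matplotlib, sketched below. The data frame is rebuilt from made-up values with hypothetical column names (poverty_rate, teen_birth_rate), and the plotting library is an assumption, since the article does not say which one was used.

```python
import matplotlib

# Assumption: a non-interactive backend so the sketch also runs outside a
# notebook. In Jupyter the figure renders inline and this line can be dropped.
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import pandas as pd

# Made-up illustrative values, NOT the article's data set.
df = pd.DataFrame({
    "poverty_rate": [10.0, 12.5, 15.0, 17.5, 20.0],
    "teen_birth_rate": [18.0, 21.5, 24.0, 29.5, 32.0],
})

fig, ax = plt.subplots()

# Independent variable on the x-axis, dependent variable on the y-axis.
ax.scatter(df["poverty_rate"], df["teen_birth_rate"])
ax.set_xlabel("Poverty rate")
ax.set_ylabel("Teenage birth rate")
ax.set_title("Poverty rate vs. teenage birth rate")

# Save the chart to a file (hypothetical file name).
fig.savefig("poverty_vs_birth_rate.png")
```

If the points fall roughly along a rising straight line, a simple linear regression is a sensible model to try.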

Step 3: Let’s calculate the coefficient of correlation and coefficient of determination.

The coefficient of correlation ranges from -1 to 1 and indicates whether the two variables are positively or negatively correlated. A value closer to 1 indicates a stronger positive correlation, and a value closer to -1 indicates a stronger negative correlation.

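The coefficient of determination indicates how well a simple linear regression fits the data set: it is the proportion of the variance in the dependent variable that the model explains. A value closer to 1 indicates a good fit, whereas a value closer to 0 indicates a poor one.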

Since our data set shows a positive correlation between the independent and dependent variables, and its coefficient of determination is not close to 0, a simple linear regression is a reasonable choice and we can proceed further.
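Both coefficients can be computed in a couple of lines with pandas, as sketched here. The values and column names are made up for illustration, so the printed numbers will not match the article's data set. For a simple linear regression with one predictor, the coefficient of determination equals the square of the Pearson correlation coefficient, which is the shortcut used below.

```python
import pandas as pd

# Made-up illustrative values, NOT the article's data set.
df = pd.DataFrame({
    "poverty_rate": [10.0, 12.5, 15.0, 17.5, 20.0],
    "teen_birth_rate": [18.0, 21.5, 24.0, 29.5, 32.0],
})

# Series.corr computes the Pearson correlation coefficient by default.
r = df["poverty_rate"].corr(df["teen_birth_rate"])

# With a single predictor, R^2 is the square of the correlation coefficient.
r_squared = r ** 2

print(f"Coefficient of correlation (r): {r:.4f}")
print(f"Coefficient of determination (R^2): {r_squared:.4f}")
```

With more than one predictor this shortcut no longer applies, and R² should be taken from the fitted model instead.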

Step 4: Let’s create a linear regression object using the linear_model module and train it on our data. We will use this object to predict the teenage birth rate for the age group 15-17 based on a given poverty rate.
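Assuming the linear_model module mentioned here is scikit-learn's sklearn.linear_model, creating and training the model could be sketched as follows. The data values and column names are hypothetical and made up for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up illustrative values, NOT the article's data set.
df = pd.DataFrame({
    "poverty_rate": [10.0, 12.5, 15.0, 17.5, 20.0],
    "teen_birth_rate": [18.0, 21.5, 24.0, 29.5, 32.0],
})

# scikit-learn expects a 2-D feature matrix, hence the double brackets.
X = df[["poverty_rate"]]
y = df["teen_birth_rate"]

# Create the regression object and fit it by ordinary least squares.
model = LinearRegression()
model.fit(X, y)

# The fitted line is: y = intercept_ + coef_[0] * x
print(f"Intercept: {model.intercept_:.3f}")
print(f"Slope: {model.coef_[0]:.3f}")
```

The fitted intercept and slope are the two numbers to compare against the Qlik Sense model, since together they fully determine every prediction.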

Step 5: Now that we have trained the model, let’s predict the teenage birth rate for a poverty rate of 15% so that we can compare the prediction with the model we built in Qlik Sense.

This predicted teenage birth rate of 24.867% precisely matches the one that our Qlik Sense linear regression model produced.
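The prediction step can be sketched like this, again assuming scikit-learn and using made-up data with hypothetical column names. Because the data is illustrative, the result will not reproduce the 24.867 figure above; the point is the shape of the call.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up illustrative values, NOT the article's data set.
df = pd.DataFrame({
    "poverty_rate": [10.0, 12.5, 15.0, 17.5, 20.0],
    "teen_birth_rate": [18.0, 21.5, 24.0, 29.5, 32.0],
})

model = LinearRegression()
model.fit(df[["poverty_rate"]], df["teen_birth_rate"])

# predict expects the same 2-D shape and column name used for training.
new_data = pd.DataFrame({"poverty_rate": [15.0]})
predicted = model.predict(new_data)[0]

# Cross-check by evaluating the fitted line by hand.
manual = model.intercept_ + model.coef_[0] * 15.0

print(f"Predicted teenage birth rate: {predicted:.3f}")
print(f"Manual check: {manual:.3f}")
```

Evaluating the line by hand is a quick way to confirm that two tools agree: if the intercept and slope match, every prediction will match as well.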

Summary

We accomplished our goal of comparing the linear regression model that we created in Qlik Sense against the one that we created in Python. Both models output identical teenage birth rates.

Watch this video to see how I built this linear regression model in Python.
