Regression Definition

What is Regression?

Regression is a statistical tool used across many disciplines to determine the strength and direction of the relationship between variables. These variables fall into two kinds, dependent and independent:

  • Dependent Variable: In a cause-and-effect relationship between two variables, the dependent variable is the effect. 
  • Independent Variable: In a cause-and-effect relationship between two variables, the independent variable is the cause.

Types of Regression

There are two basic types of regression:

  1. Simple Linear Regression: In this type of regression, there is only one x and one y variable. 
  2. Multiple Linear Regression: In this type of regression, there is one y variable and two or more x variables. 

Note that both of these are methods of linear regression and are not suited to data with a non-linear relationship. Linear regression relates variables with a straight line, while nonlinear regression relates them with a curve. For data that follows a curved relationship, methods of nonlinear regression are used instead.

Simple Linear Regression

Simple linear regression involves using one independent variable (x) to explain the outcome of the dependent variable (y). 

The formula for simple linear regression is: 

Y = a + bX + u

Where,

  • Y = the variable that you are trying to predict (dependent variable).
  • X = the variable that you are using to predict Y (independent variable).
  • a = the intercept.
  • b = the slope.
  • u = the regression residual (the part of Y that the line does not explain).
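
As a rough illustration of the formula above, the following sketch estimates a and b by ordinary least squares using NumPy. Least squares is an assumption here, since the article does not name an estimation method, and the function name `fit_simple_linear` and the sample numbers are made up for the example.

```python
import numpy as np

def fit_simple_linear(x, y):
    """Estimate intercept a and slope b of Y = a + bX + u by least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: makes the fitted line pass through the point of means.
    a = y_mean - b * x_mean
    # Residuals u: what the fitted line leaves unexplained.
    u = y - (a + b * x)
    return a, b, u

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, u = fit_simple_linear(x, y)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```

The slope b is read as the expected change in Y for a one-unit change in X, and the residuals show how far each observation falls from the fitted line.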

To understand when simple linear regression is appropriate, let’s consider the following example: 

If we were to assume height as the singular determinant of body weight, we could use the simple linear regression model to predict or explain the impact of a change in height on weight. 

Multiple Linear Regression

Multiple linear regression involves using two or more independent variables (x) to explain the outcome of the dependent variable (y).  

The formula for multiple linear regression is as follows: 

Y = a + b1X1 + b2X2 + b3X3 + … + btXt + u

Multiple linear regression is used when simple linear regression is not enough to account for the multiple real-life factors that influence the outcome of a dependent variable. 

Let’s continue with the previous example involving height and weight. Realistically, height is not the only determinant of weight. Many other factors influence a person’s weight, such as diet and exercise, so a more realistic model would contain multiple x variables (independent variables).
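
The multiple-variable case can be sketched in the same spirit. The example below is a minimal illustration, assuming ordinary least squares via NumPy's `np.linalg.lstsq`; the function name `fit_multiple_linear`, the choice of exercise hours as a second variable, and all numbers are hypothetical.

```python
import numpy as np

def fit_multiple_linear(X, y):
    """Estimate a and b1..bt of Y = a + b1X1 + ... + btXt + u by least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept a.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

# Made-up rows: [height in cm, weekly exercise hours]; y is weight in kg.
X = [[160, 2], [165, 5], [170, 1], [175, 4], [180, 3], [185, 6]]
y = [58, 59, 70, 71, 78, 79]
a, b = fit_multiple_linear(X, y)
print("intercept:", round(a, 3), "slopes:", np.round(b, 3))
```

Each slope is read as the expected change in Y for a one-unit change in that x variable while the other x variables are held fixed.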

Overfitting in Regression

Overfitting is a modelling error that occurs quite frequently in regression analysis. It takes place when a function or a model is too complex for the data and too many parameters are being estimated from a sample size that is too small. Although an overfitted model may fit your data well, it won’t align with additional test samples or the overall target population. 

When a model is overfitted, its p-values, R-Squared, and regression coefficients are likely to be very misleading. So how can we avoid overfitting? 

These are a few ways in which you can avoid overfitting your data: 

  • Gather More Data: A larger sample gives the model more information per estimated parameter, which generally improves accuracy and reduces the risk of fitting noise. 
  • Cross-Validation: Cross-validation involves dividing the initial training data into multiple smaller train-test splits, fitting the model on one part and evaluating it on the held-out part, so that these results can be used to tune your model. 
  • Data Augmentation: Data augmentation involves making the available data appear more diverse by slightly altering each sample every time it is passed to the model. 
  • Regularization: This is a technique that involves adding a penalty to the loss function to discourage overly complex models. 
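
To make the cross-validation idea more concrete, here is a minimal sketch using NumPy only. It uses polynomial degree as a stand-in for model complexity, which is an assumption made for illustration; the function name `kfold_mse` and the synthetic data are likewise hypothetical.

```python
import numpy as np

def kfold_mse(x, y, degree, k=5, seed=0):
    """Average held-out mean squared error of a polynomial fit over k folds."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Shuffle the indices once, then cut them into k roughly equal folds.
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Fit on the training part only.
        coeffs = np.polyfit(x[train], y[train], degree)
        # Score on the part the model has not seen.
        pred = np.polyval(coeffs, x[test])
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))

# Synthetic data: a straight line plus noise (made up for illustration).
rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.2, size=x.size)

for degree in (1, 3, 6):
    print(degree, kfold_mse(x, y, degree))
```

If the held-out error rises as the degree increases while the fit to the training data keeps improving, the extra parameters are likely fitting noise rather than the underlying relationship.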

 

FAQs on Regression

Regression refers to the approach of modelling the relationship between variables to determine the strength and direction of their relationship.

The two main types of linear regression are simple linear regression and multiple linear regression.

Simple linear regression involves modelling the relationship between one independent variable (x) and one dependent variable (y). It is used when a dependent variable only has one determinant.

Multiple linear regression involves modelling the relationship between two or more independent variables (x) and one dependent variable (y). It is used when a dependent variable has multiple determinants.

Linear regression involves relating variables with a straight line while nonlinear regression relates the variables in a nonlinear (curved) relationship.
