# Basics of Multiple Regression and Underlying Assumptions

## Introduction

Multiple linear regression uses two or more independent variables to describe the variation of the dependent variable, rather than just one as in simple linear regression. It allows the analyst to estimate more complex models with multiple explanatory variables and, if used correctly, may lead to better predictions, better portfolio construction, or a better understanding of the drivers of security returns. If used incorrectly, however, multiple linear regression may yield spurious relationships, lead to poor predictions, and offer a poor understanding of relationships.

The analyst must first specify the model and make several decisions in this process. The analyst must answer the following questions: What is the dependent variable of interest? What independent variables are important? What form should the model take? What is the goal of the model—prediction or understanding of the relationship?

The analyst specifies the dependent and independent variables and then employs software to estimate the model and produce related statistics. The good news is that the software does the estimation (see Exhibit 1). The analyst’s primary tasks are to specify the model and interpret the output from this software, which are the main subjects of this content.
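To make the estimation step concrete, the sketch below shows roughly what such software computes, assuming ordinary least squares (the standard estimation method for linear regression) solved through the normal equations. The data are synthetic and noise-free, and the helper names (`solve_linear_system`, `fit_ols`, `predict`) are hypothetical, chosen only for illustration. In practice an analyst would call a tested library routine rather than hand-rolled code like this.

```python
# Illustrative sketch only: synthetic data, hypothetical helper names.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so row operations update both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pivot on the largest entry in this column for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_ols(x_rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    design = [[1.0] + list(row) for row in x_rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, x_row):
    """Evaluate b0 + b1*X1 + ... + bk*Xk for one observation."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], x_row))


# Synthetic, noise-free data generated from Y = 1 + 2*X1 - 3*X2.
x_rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in x_rows]

coefs = fit_ols(x_rows, y)
print("estimated [b0, b1, b2]:", [round(b, 6) for b in coefs])

# Raise X1 by one unit while keeping X2 fixed: the prediction moves by b1.
base = predict(coefs, (4, 3))
bumped = predict(coefs, (5, 3))
effect_of_x1 = bumped - base
print("change in prediction:", round(effect_of_x1, 6))
```

Because the data contain no noise, the fit should recover the generating coefficients up to floating-point error. The last lines illustrate how a slope is read: it is the change in the predicted dependent variable for a one-unit change in one independent variable while the others are held fixed.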

**Exhibit 1**

| Software | Programs/Functions |
| --- | --- |
| Excel | Data Analysis > Regression |
| Python | `scipy.stats.linregress`, `statsmodels.api.OLS`, `sklearn.linear_model.LinearRegression` |
| R | `lm` |
| SAS | `PROC REG`, `PROC GLM` |
| STATA | `regress` |

## Learning Outcomes

- describe the types of investment problems addressed by multiple linear regression and the regression process;
- formulate a multiple linear regression model, describe the relation between the dependent variable and several independent variables, and interpret estimated regression coefficients; and
- explain the assumptions underlying a multiple linear regression model and interpret residual plots indicating potential violations of these assumptions.

## Summary

- Multiple linear regression is used to model the linear relationship between one dependent variable and two or more independent variables.
- In practice, multiple regressions are used to explain relationships between financial variables, to test existing theories, or to make forecasts.
- The regression process covers several decisions the analyst must make, such as identifying the dependent and independent variables, selecting the appropriate regression model, testing if the assumptions behind linear regression are satisfied, examining goodness of fit, and making needed adjustments.
- We have presented the multiple linear regression model and discussed violations of regression assumptions, model specification and misspecification, and models with qualitative variables.
- A multiple regression model is represented by the following equation:

$$
Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i} + \dots + b_k X_{ki} + \varepsilon_i, \quad i = 1, 2, 3, \dots, n,
$$

where $Y$ is the dependent variable, the $X$s are the independent variables from 1 to $k$, and the model is estimated using $n$ observations.

- Coefficient $b_0$ is the model’s “intercept,” representing the expected value of $Y$ if all independent variables are zero.
- Parameters $b_1$ to $b_k$ are the slope coefficients (or partial regression coefficients) for independent variables $X_1$ to $X_k$. Slope coefficient $b_j$ describes the impact of independent variable $X_j$ on $Y$, holding all the other independent variables constant.
- Five main assumptions underlying multiple regression models must be satisfied: (1) linearity, (2) homoskedasticity, (3) independence of errors, (4) normality, and (5) independence of independent variables.
- Diagnostic plots can help detect whether these assumptions are satisfied. Scatterplots of the dependent variable versus the independent variables are useful for detecting nonlinear relationships, while residual plots are useful for detecting violations of homoskedasticity and independence of errors.
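The residual check in the last point can be sketched numerically. The example below is a text-only stand-in for a residual plot: it fits a line to synthetic data whose errors grow with the regressor (heteroskedastic by construction) and compares the residual spread at low and high values. It uses a single regressor for brevity, which is a simplifying assumption, and the helper name `fit_line` is hypothetical.

```python
# Illustrative sketch only: synthetic data, hypothetical helper names.
from statistics import mean


def fit_line(x, y):
    """Closed-form OLS intercept and slope for a single regressor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Synthetic data: a linear signal plus errors whose size grows with X,
# which violates homoskedasticity by construction.
x = [1, 2, 3, 4, 5, 6, 7, 8]
errors = [0.5 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
y = [2 + 3 * xi + e for xi, e in zip(x, errors)]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# A residual plot would chart `residuals` against `fitted` (or against x).
# As a stand-in, compare residual spread in the low-X and high-X halves.
half = len(x) // 2
spread_low = mean(abs(r) for r in residuals[:half])
spread_high = mean(abs(r) for r in residuals[half:])
print(f"mean |residual|, low X:  {spread_low:.2f}")
print(f"mean |residual|, high X: {spread_high:.2f}")
```

Under homoskedasticity the two spreads would be of similar size. Here the high-X spread is noticeably larger, which is the fan-shaped pattern a residual plot would reveal. Charting the same `residuals` in observation order is the analogous check for independence of errors.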
