
Extensions of Multiple Regression

2024 Curriculum CFA Program Level II Quantitative Methods

Learning Outcomes

The member should be able to:
  • describe influence analysis and methods of detecting influential data points
  • formulate and interpret a multiple regression model that includes qualitative independent variables
  • formulate and interpret a logistic regression model

Summary

  • Two kinds of observations may potentially influence regression results: (1) a high-leverage point, an observation with an extreme value of an independent variable, and (2) an outlier, an observation with an extreme value of the dependent variable. 
  • A measure for identifying a high-leverage point is leverage. If an observation's leverage is greater than 3(k + 1)/n, where k is the number of independent variables and n is the number of observations, then the observation is potentially influential.
  • A measure for identifying an outlier is the studentized residual. If the absolute value of the studentized residual is greater than the critical value of the t-statistic with n – k – 2 degrees of freedom, then the observation is potentially influential.
  • Cook’s distance, or Cook’s D (Di), is a metric for identifying influential data points. It measures how much the estimated values of the regression change if observation i is deleted. If Di > 2√(k/n), then the observation is highly likely to be influential. An influence plot visually presents leverage, studentized residuals, and Cook’s D for each observation (see the influence-analysis sketch after this list).
  • Dummy, or indicator, variables represent qualitative independent variables and take a value of 1 (for true) or 0 (for false) to indicate whether a specific condition applies, such as whether a company belongs to a certain industry sector. To capture n possible categories, the model must include n – 1 dummy variables. 
  • An intercept dummy adds to or reduces the original intercept if a specific condition is met. When the intercept dummy is 1, the regression line shifts up or down parallel to the base regression line.
  • A slope dummy allows for a change in slope if a specific condition is met. When the slope dummy is 1, the relevant term becomes (bj + dj) × Xj, so the slope on Xj changes from bj to (bj + dj), where dj is the coefficient on the slope dummy and bj is the slope of Xj in the base regression line (see the dummy-variable sketch after this list).
  • A logistic regression model is one with a qualitative (i.e., categorical) dependent variable, so logistic regression is often used in binary classification problems, which are common in machine learning and neural networks. 
  • To estimate a logistic regression, the logistic transformation of the event probability (P) into the log odds, ln[P/(1 − P)], is applied, which linearizes the relation between the transformed dependent variable and the independent variables. 
  • Logistic regression coefficients are typically estimated using the maximum likelihood estimation (MLE) method, and slope coefficients are interpreted as the change in the log odds that the event happens per unit change in the independent variable, holding all other independent variables constant (see the logistic regression sketch after this list).
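
The sketch below illustrates the influence-analysis measures summarized above (leverage, studentized residuals, and Cook’s D) using statsmodels on simulated data; the variable names, sample size, and data-generating process are illustrative assumptions, not part of the reading, while the cutoffs are the rules of thumb stated in the summary.

```python
# A minimal sketch, assuming simulated data and statsmodels; names and
# sample are illustrative, not from the reading.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(42)
n, k = 100, 2                                   # n observations, k independent variables
X = rng.normal(size=(n, k))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=n)

model = sm.OLS(y, sm.add_constant(X)).fit()
influence = model.get_influence()

leverage = influence.hat_matrix_diag            # h_ii for each observation
studentized = influence.resid_studentized_external
cooks_d = influence.cooks_distance[0]

# Rules of thumb from the summary:
high_leverage = leverage > 3 * (k + 1) / n      # leverage cutoff
t_crit = stats.t.ppf(0.975, df=n - k - 2)       # two-sided 5% critical value
outliers = np.abs(studentized) > t_crit         # outlier cutoff
influential = cooks_d > 2 * np.sqrt(k / n)      # Cook's D cutoff

print("High leverage:", np.flatnonzero(high_leverage))
print("Outliers:     ", np.flatnonzero(outliers))
print("Cook's D flag:", np.flatnonzero(influential))
```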
 
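The next sketch shows an intercept dummy and a slope (interaction) dummy in one regression. The sector indicator "tech", the coefficients, and the simulated data are hypothetical; the statsmodels formula interface is used so the slope-dummy term is explicit.

```python
# A minimal sketch, assuming a hypothetical sector dummy ("tech") and
# simulated data; names and coefficients are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
tech = rng.integers(0, 2, size=n)   # 1 if the firm is in the (hypothetical) tech sector
y = 2.0 + 1.0 * tech + 0.5 * x + 0.4 * tech * x + rng.normal(scale=0.3, size=n)

df = pd.DataFrame({"y": y, "x": x, "tech": tech})

# "tech" is the intercept dummy; "tech:x" is the slope dummy (interaction term).
fit = smf.ols("y ~ x + tech + tech:x", data=df).fit()
print(fit.params)
# When tech = 1, the intercept is (Intercept + tech) and the slope on x
# is (x + tech:x), i.e., b_j + d_j in the notation of the summary.
```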

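Finally, a sketch of a logistic regression estimated by maximum likelihood. The binary event and the two predictors are simulated, illustrative assumptions; sm.Logit performs the MLE fit, and the coefficients are read as changes in the log odds ln[P/(1 − P)].

```python
# A minimal sketch, assuming a simulated binary event and two predictors;
# sm.Logit estimates the coefficients by maximum likelihood.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 2))
log_odds = -0.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1]   # ln[P/(1 - P)] is linear in X
p = 1.0 / (1.0 + np.exp(-log_odds))               # invert the logit transform
y = rng.binomial(1, p)                            # 1 if the event occurs

fit = sm.Logit(y, sm.add_constant(X)).fit(disp=False)
print(fit.params)          # each slope = change in log odds per unit change in X, others held constant
print(np.exp(fit.params))  # exponentiating gives odds ratios
```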