
Extensions of Multiple Regression

2024 Curriculum CFA Program Level II Quantitative Methods

Learning Outcomes

The member should be able to:
  • describe influence analysis and methods of detecting influential data points
  • formulate and interpret a multiple regression model that includes qualitative independent variables
  • formulate and interpret a logistic regression model

Summary

  • Two kinds of observations may potentially influence regression results: (1) a high-leverage point, an observation with an extreme value of an independent variable, and (2) an outlier, an observation with an extreme value of the dependent variable. 
  • A measure for identifying a high-leverage point is leverage. If an observation's leverage is greater than 3(k + 1)/n, where k is the number of independent variables and n is the number of observations, then the observation is potentially influential.
  • A measure for identifying an outlier is the studentized residual. If the absolute value of the studentized residual is greater than the critical value of the t-statistic with n – k – 2 degrees of freedom, then the observation is potentially influential.
  • Cook’s distance, or Cook’s D (Di), is a metric for identifying influential data points. It measures how much the estimated values of the regression change if observation i is deleted. If Di > 2√(k/n), then the observation is highly likely to be influential. An influence plot visually presents leverage, studentized residuals, and Cook’s D for each observation (see the influence-analysis sketch after this list).
  • Dummy, or indicator, variables represent qualitative independent variables and take a value of 1 (for true) or 0 (for false) to indicate whether a specific condition applies, such as whether a company belongs to a certain industry sector. To capture n possible categories, the model must include n – 1 dummy variables. 
  • An intercept dummy adds to or reduces the original intercept if a specific condition is met. When the intercept dummy is 1, the regression line shifts up or down parallel to the base regression line.
  • A slope dummy allows for a change in slope if a specific condition is met. When the slope dummy is 1, the relevant term becomes (bj + dj) × Xj, so the slope on Xj changes from bj to (bj + dj), where dj is the coefficient on the slope dummy and bj is the slope of Xj in the base regression line (see the dummy-variable sketch after this list).
  • A logistic regression model is one with a qualitative (i.e., categorical) dependent variable, so logistic regression is often used in binary classification problems, which are common in machine learning and neural networks. 
  • To estimate a logistic regression, the logistic transformation of the event probability (P) into the log odds, ln[P/(1 − P)], is applied, which linearizes the relation between the transformed dependent variable and the independent variables. 
  • Logistic regression coefficients are typically estimated using the maximum likelihood estimation (MLE) method, and slope coefficients are interpreted as the change in the log odds that the event happens per unit change in the independent variable, holding all other independent variables constant (see the logistic regression sketch after this list).
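
The sketch below illustrates the influence-analysis measures summarized above (leverage, studentized residuals, and Cook’s D) using statsmodels on simulated data; the variable names, sample size, and data-generating process are illustrative assumptions, not part of the reading, while the cutoffs are the rules of thumb stated in the summary.

```python
# A minimal sketch, assuming simulated data and statsmodels; names and
# sample are illustrative, not from the reading.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(42)
n, k = 100, 2                                   # n observations, k independent variables
X = rng.normal(size=(n, k))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=n)

model = sm.OLS(y, sm.add_constant(X)).fit()
influence = model.get_influence()

leverage = influence.hat_matrix_diag            # h_ii for each observation
studentized = influence.resid_studentized_external
cooks_d = influence.cooks_distance[0]

# Rules of thumb from the summary:
high_leverage = leverage > 3 * (k + 1) / n      # leverage cutoff
t_crit = stats.t.ppf(0.975, df=n - k - 2)       # two-sided 5% critical value
outliers = np.abs(studentized) > t_crit         # outlier cutoff
influential = cooks_d > 2 * np.sqrt(k / n)      # Cook's D cutoff

print("High leverage:", np.flatnonzero(high_leverage))
print("Outliers:     ", np.flatnonzero(outliers))
print("Cook's D flag:", np.flatnonzero(influential))
```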
 
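The next sketch shows an intercept dummy and a slope (interaction) dummy in one regression. The sector indicator "tech", the coefficients, and the simulated data are hypothetical; the statsmodels formula interface is used so the slope-dummy term is explicit.

```python
# A minimal sketch, assuming a hypothetical sector dummy ("tech") and
# simulated data; names and coefficients are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
tech = rng.integers(0, 2, size=n)   # 1 if the firm is in the (hypothetical) tech sector
y = 2.0 + 1.0 * tech + 0.5 * x + 0.4 * tech * x + rng.normal(scale=0.3, size=n)

df = pd.DataFrame({"y": y, "x": x, "tech": tech})

# "tech" is the intercept dummy; "tech:x" is the slope dummy (interaction term).
fit = smf.ols("y ~ x + tech + tech:x", data=df).fit()
print(fit.params)
# When tech = 1, the intercept is (Intercept + tech) and the slope on x
# is (x + tech:x), i.e., b_j + d_j in the notation of the summary.
```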

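Finally, a sketch of a logistic regression estimated by maximum likelihood. The binary event and the two predictors are simulated, illustrative assumptions; sm.Logit performs the MLE fit, and the coefficients are read as changes in the log odds ln[P/(1 − P)].

```python
# A minimal sketch, assuming a simulated binary event and two predictors;
# sm.Logit estimates the coefficients by maximum likelihood.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 2))
log_odds = -0.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1]   # ln[P/(1 - P)] is linear in X
p = 1.0 / (1.0 + np.exp(-log_odds))               # invert the logit transform
y = rng.binomial(1, p)                            # 1 if the event occurs

fit = sm.Logit(y, sm.add_constant(X)).fit(disp=False)
print(fit.params)          # each slope = change in log odds per unit change in X, others held constant
print(np.exp(fit.params))  # exponentiating gives odds ratios
```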