Refresher Reading
Machine Learning
2020 Curriculum CFA Program Level II Quantitative Methods
Introduction
Investment firms are increasingly using financial technology (fintech) at every step of the investment management value chain—from improving their understanding of clients, to uncovering new sources of alpha, to executing trades more efficiently. Machine learning techniques, a central part of that technology, are the subject of this reading. These techniques first appeared in finance in the 1990s and have since flourished with the explosion of data and cheap computing power.
This reading provides a high-level view of machine learning (ML). It covers a selection of key ML algorithms and their investment applications. Investment practitioners should be equipped with a basic understanding of the types of investment problems that machine learning can address, an idea of how the algorithms work, and the vocabulary to interact with machine learning and data science experts. While investment practitioners need not master the details and mathematics of machine learning, as domain experts in investments they can play an important role by being able to source appropriate model inputs, interpret model outputs, and translate outputs into appropriate investment actions.
Section 2 gives an overview of machine learning in investment management. Section 3 defines machine learning and the types of problems that can be addressed by supervised and unsupervised learning. Section 4 describes evaluating machine learning algorithm performance. Key supervised machine learning algorithms are covered in Section 5, while Section 6 describes key unsupervised machine learning algorithms. Neural networks, deep learning nets, and reinforcement learning are covered in Section 7. The reading concludes with a summary.
Learning Outcomes
The member should be able to:
distinguish between supervised machine learning, unsupervised machine learning, and deep learning;

describe overfitting and identify methods of addressing it;

describe supervised machine learning algorithms (including penalized regression, support vector machine, k-nearest neighbor, classification and regression tree, ensemble learning, and random forest) and determine the problems for which they are best suited;

describe unsupervised machine learning algorithms (including principal components analysis, k-means clustering, and hierarchical clustering) and determine the problems for which they are best suited;

describe neural networks, deep learning nets, and reinforcement learning.
Summary
Machine learning methods are gaining usage at many stages in the investment management value chain. Among the major points made are the following:

Machine learning aims at extracting knowledge from large amounts of data by learning from known examples to determine an underlying structure in the data. The emphasis is on generating structure or predictions without human intervention. An elementary way to think of ML algorithms is to “find the pattern, apply the pattern.”

Supervised learning depends on labeled training data: matched sets of observed inputs (X’s, or features) and the associated output (Y, or target). It can be divided into two categories: regression and classification. If the target variable to be predicted is continuous, then the task is one of regression. If the target variable is categorical or ordinal (e.g., determining a firm’s rating), then it is a classification problem.

With unsupervised learning, algorithms are trained with no labeled data, so they must infer relations between features, summarize them, or present an interesting underlying structure in their distributions that has not been explicitly provided. Two important types of problems well suited to unsupervised ML are dimension reduction and clustering.

Another category of ML algorithms includes deep learning (based on neural networks) and reinforcement learning (in which a computer learns from interacting with itself or with its environment). These sophisticated algorithms address highly complex tasks such as image classification, face recognition, speech recognition, and natural language processing.

Generalization describes the degree to which an ML model retains its explanatory power when predicting out-of-sample. Overfitting, a primary reason for lack of generalization, is the tendency of ML algorithms to tailor models to the training data at the expense of generalization to new data points.
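To make overfitting concrete, here is a minimal Python sketch on hypothetical simulated data (y = 2x plus Gaussian noise; the data generator, model names, and seed are illustrative assumptions). A "memorizer" that returns the target of the nearest training point fits the training data perfectly but does worse on a validation sample than a simple straight line, which is the loss of generalization described above.

```python
import random

def make_data(n, rng):
    """Hypothetical data: y = 2x + Gaussian noise with standard deviation 1."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 1)) for x in xs]

def fit_line(data):
    """Simple model: ordinary least squares fit of y = a + b*x."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    b = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x, _ in data)
    a = my - b * mx
    return lambda x: a + b * x

def fit_memorizer(data):
    """Overfit model: predict the y of the closest training x, i.e., memorize the noise."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    """Mean squared prediction error of a model on a data set."""
    return sum((y - model(x)) ** 2 for x, y in data) / len(data)

rng = random.Random(42)
train, validation = make_data(200, rng), make_data(200, rng)
line, memo = fit_line(train), fit_memorizer(train)
for name, model in [("linear", line), ("memorizer", memo)]:
    print(f"{name:9s} train MSE = {mse(model, train):.2f}, validation MSE = {mse(model, validation):.2f}")
```

The memorizer's training error is exactly zero because each training point is its own nearest neighbor; its higher validation error reflects the noise it learned.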

Bias error is the degree to which a model fits the training data. Variance error describes how much a model’s results change in response to new data from validation and test samples. Base error is due to randomness in the data. Out-of-sample error equals bias error plus variance error plus base error.
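The decomposition above is stated in words. For squared-error loss, assuming the data are generated as y = f(x) + ε, where ε has mean zero and variance σ², and taking expectations over training samples and noise, the standard textbook form is the following (the reading's "bias error" corresponds to the squared-bias term):

```latex
\mathbb{E}\left[\big(y - \hat{f}(x)\big)^2\right]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias error}}
+ \underbrace{\mathbb{E}\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\right]}_{\text{variance error}}
+ \underbrace{\sigma^2}_{\text{base error}}
```

More complex models tend to shrink the first term and enlarge the second; the base error cannot be reduced by any model.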

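K-fold cross-validation is a technique for mitigating the holdout sample problem (excessive reduction of the training set size). The data (excluding the test sample and fresh data) are shuffled randomly and then divided into k equal subsamples, with k – 1 subsamples used as training samples and one subsample, the kth, used as a validation sample. The process is repeated k times so that each subsample serves as the validation sample exactly once, and the k validation errors are averaged to estimate out-of-sample error.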
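The procedure can be sketched in plain Python as follows. This is an illustrative implementation, not a reference one: the helper names (`k_fold_indices`, `cross_validate`, `fit_mean`) and the mean-only baseline model are hypothetical, and the loss is assumed to be mean squared error.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1, then deal them into k (nearly) equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, k, fit, loss, seed=0):
    """Average validation loss over k rounds; each fold is the validation sample exactly once."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        # the other k - 1 folds form the training sample for this round
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        scores.append(loss([ys[j] for j in val_idx], [model(xs[j]) for j in val_idx]))
    return sum(scores) / k

def fit_mean(X, Y):
    """Hypothetical baseline model: always predict the training mean of Y."""
    m = sum(Y) / len(Y)
    return lambda x: m

def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

xs = list(range(20))
ys = [3.0 + (x % 2) for x in xs]  # hypothetical target
print(cross_validate(xs, ys, k=5, fit=fit_mean, loss=mse))
```

Any model can be plugged in through `fit`, as long as it returns a callable that maps one input to one prediction.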

Regularization describes methods that reduce statistical variability in high dimensional data estimation or prediction problems.

LASSO (least absolute shrinkage and selection operator) is a popular type of penalized regression where the penalty term involves summing the absolute values of the regression coefficients. The greater the number of included features (i.e., features with nonzero coefficients), the larger the penalty. So, a feature must make a sufficient contribution to model fit to offset the penalty from including it.
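A minimal sketch of how the LASSO penalty drops weak features, assuming the objective is the sum of squared residuals plus λ (`lam`) times the sum of absolute coefficients. It is solved here by cyclic coordinate descent with soft-thresholding, one common solver that the reading does not prescribe. The data are hypothetical and pre-centered, so no intercept is fitted.

```python
def soft_threshold(rho, t):
    """Shrink rho toward zero by t; anything inside [-t, t] becomes exactly 0."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0

def lasso(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for: minimize SSE + lam * sum(|b_k|).
    Assumes centered columns (no intercept) and no all-zero column."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residuals: remove the fit of every feature except j
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j) for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            b[j] = soft_threshold(rho, lam / 2) / z
    return b

# Hypothetical centered data: x1 drives y strongly, x2 only weakly (y = 2*x1 + 0.1*x2).
X = [[-2, 1], [-1, -2], [0, 2], [1, -2], [2, 1]]
y = [-3.9, -2.2, 0.2, 1.8, 4.1]
for lam in (0, 4, 100):
    print(lam, [round(c, 3) for c in lasso(X, y, lam)])
```

With λ = 0 the estimates match ordinary least squares (about 2.0 and 0.1); at λ = 4 the weak feature's coefficient is set exactly to zero while the strong one is only shrunk, and at λ = 100 both coefficients are zero.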

Support vector machine (SVM) is a linear classifier that seeks the optimal hyperplane, the one that separates the two sets of data points by the maximum margin; it is therefore typically used for classification.
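As an illustration of the maximum-margin idea, the sketch below trains a soft-margin linear SVM by full-batch subgradient descent on the hinge loss. The solver, the hyperparameters (`lam`, `lr`, `epochs`), and the toy data are assumptions for illustration; production libraries use specialized solvers.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=1000):
    """Full-batch subgradient descent on the soft-margin SVM objective
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))), with labels y_i in {-1, +1}."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0  # start from the regularization gradient
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    """Which side of the hyperplane w . x + b = 0 the point falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable two-feature data.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

Only points with a margin below 1 (the support vectors and any violators) contribute to the update, which is why the fitted hyperplane depends on the points nearest the boundary.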

K-nearest neighbor (KNN) is a supervised learning technique most often used for classification. The idea is to classify a new observation by finding similarities (“nearness”) between it and its k nearest neighbors in the existing data set.
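A minimal KNN classifier, assuming Euclidean distance and a simple majority vote among the k neighbors. The bond-like features and the "IG"/"HY" labels are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, x, k=3):
    """Label x by majority vote among its k nearest training points (Euclidean distance).
    `train` is a list of (features, label) pairs."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Hypothetical bonds: (debt-to-equity, interest coverage) -> "IG" or "HY" rating bucket.
train = [((0.2, 8.0), "IG"), ((0.3, 7.5), "IG"), ((0.25, 9.0), "IG"),
         ((1.5, 1.5), "HY"), ((1.8, 2.0), "HY"), ((1.2, 1.0), "HY")]
print(knn_predict(train, (0.4, 7.0)), knn_predict(train, (1.4, 1.8)))
```

Because distance drives the result, features measured on very different scales usually need to be standardized first; in this toy data the second feature dominates the distance.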

Classification and regression tree (CART) can be applied to predict either a categorical target variable, producing a classification tree, or a continuous target variable, producing a regression tree.

A binary CART is a combination of an initial root node, decision nodes, and terminal nodes. The root node and each decision node represent a single feature (f) and a cutoff value (c) for that feature. The CART algorithm iteratively partitions the data into subgroups until terminal nodes are formed that contain the predicted label.
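The sketch below shows these mechanics in plain Python. At each node it searches every feature f and cutoff c, scores each candidate split by the weighted Gini impurity of the two child nodes (a common criterion, assumed here), and recurses until a node is pure or a depth limit is reached. A decision node is represented as a tuple (f, c, left, right) and a terminal node as a label; the data and names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y):
    """Try every feature f and cutoff c (rows with x[f] <= c go left) and return the
    (f, c, score) with the lowest weighted Gini impurity of the two child nodes."""
    best = (None, None, float("inf"))
    for f in range(len(X[0])):
        for c in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= c]
            right = [yi for row, yi in zip(X, y) if row[f] > c]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if score < best[2]:
                    best = (f, c, score)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Recursively partition the data; a terminal node stores the majority label."""
    if depth == max_depth or gini(y) == 0.0:
        return Counter(y).most_common(1)[0][0]
    f, c, _ = best_split(X, y)
    if f is None:  # no split separates the rows (e.g., identical feature values)
        return Counter(y).most_common(1)[0][0]
    go_left = [row[f] <= c for row in X]

    def subset(keep):
        return ([r for r, g in zip(X, go_left) if g == keep],
                [l for l, g in zip(y, go_left) if g == keep])

    return (f, c, build_tree(*subset(True), depth + 1, max_depth),
            build_tree(*subset(False), depth + 1, max_depth))

def predict(tree, x):
    """Walk from the root through decision nodes to a terminal node's label."""
    while isinstance(tree, tuple):
        f, c, left, right = tree
        tree = left if x[f] <= c else right
    return tree

# Hypothetical data: two features, binary label.
X = [[1, 5], [2, 6], [3, 5], [6, 1], [7, 2], [8, 1]]
y = [0, 0, 0, 1, 1, 1]
tree = build_tree(X, y)
print(tree, predict(tree, [2.5, 5]), predict(tree, [7.5, 1.5]))
```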

Ensemble learning is a technique of combining the predictions from a collection of models. It typically produces more accurate and more stable predictions than the best single model.
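A toy illustration of why combining models can help, using hard (majority) voting. The three "models" are hypothetical rules, each deliberately wrong on a different observation; the gain shown here depends on that assumption, since ensembles help most when the members' errors are not strongly correlated.

```python
from collections import Counter

def majority_vote(predictions):
    """Hard voting: return the most common prediction among the models."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical setup: the true label is 1 when x >= 5, for x = 0..9.
xs = list(range(10))
truth = [int(x >= 5) for x in xs]

# Three hypothetical imperfect models, each wrong on a different observation.
models = [
    lambda x: int(x >= 4),                   # wrong at x = 4
    lambda x: int(x >= 6),                   # wrong at x = 5
    lambda x: int(x >= 5) if x != 9 else 0,  # wrong at x = 9
]

def accuracy(predict):
    """Share of observations where the prediction matches the true label."""
    return sum(predict(x) == t for x, t in zip(xs, truth)) / len(xs)

for i, m in enumerate(models):
    print(f"model {i}: accuracy {accuracy(m):.1f}")
print(f"ensemble: accuracy {accuracy(lambda x: majority_vote([m(x) for m in models])):.1f}")
```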

A random forest classifier is a collection of many different decision trees generated by a bagging method and by randomly reducing the number of features available during training.
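A compact sketch of the two sources of randomness in a random forest: each tree is fit on a bootstrap sample of the rows (bagging) and may use only a random subset of the features. To stay short, each "tree" here is a depth-1 stump that minimizes misclassifications; real random forests grow deeper trees and resample features at every split. All names, parameters (`n_trees`, `n_features`), and data are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label (ties go to the first one encountered)."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, features):
    """Depth-1 tree: the single split among the allowed features with the fewest misclassifications."""
    best = None
    for f in features:
        for c in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= c]
            right = [yi for row, yi in zip(X, y) if row[f] > c]
            if not left or not right:
                continue
            lmaj, rmaj = majority(left), majority(right)
            errors = sum(v != lmaj for v in left) + sum(v != rmaj for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, c, lmaj, rmaj)
    if best is None:  # no usable split (e.g., all sampled rows identical on these features)
        label = majority(y)
        return lambda x: label
    _, f, c, lmaj, rmaj = best
    return lambda x: lmaj if x[f] <= c else rmaj

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Bagging plus random feature selection; tree predictions are combined by majority vote."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, drawn with replacement
        feats = rng.sample(range(p), n_features)     # random subset of the features
        trees.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return lambda x: majority([tree(x) for tree in trees])

# Hypothetical data: two informative features; label 0 for low values, 1 for high.
X = [[1, 1], [2, 2], [3, 1], [4, 2], [6, 8], [7, 9], [8, 8], [9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(forest([0, 0]), forest([10, 10]))
```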

Principal components analysis (PCA) is an unsupervised ML algorithm that reduces highly correlated features into fewer uncorrelated composite variables by transforming the feature covariance matrix. PCA produces eigenvectors that define the principal components (i.e., the new uncorrelated composite variables) and eigenvalues, each of which gives the amount of total variance in the initial data explained by its eigenvector and associated principal component; dividing an eigenvalue by the sum of all eigenvalues gives that component’s proportion of total variance.
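For two features the covariance matrix is 2×2, so its eigenvalues and eigenvectors have a closed form. This sketch assumes exactly two features (general implementations use a numerical eigen-solver or singular value decomposition) and hypothetical data; it reports each component's share of total variance as its eigenvalue divided by the trace, which equals the sum of the eigenvalues.

```python
import math

def covariance_matrix(data):
    """Sample covariance matrix of the features (columns) in data."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    centered = [[v - m for v, m in zip(row, means)] for row in data]
    p = len(means)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)] for i in range(p)]

def pca_2d(data):
    """Closed-form eigen-decomposition of a 2x2 covariance matrix [[a, b], [b, c]].
    Returns eigenvalues (largest first), unit eigenvectors, and explained-variance shares."""
    (a, b), (_, c) = covariance_matrix(data)
    mid, half_gap = (a + c) / 2, math.hypot((a - c) / 2, b)
    eigenvalues = [mid + half_gap, mid - half_gap]
    if abs(b) < 1e-12:  # features already uncorrelated: the original axes are the components
        vectors = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        vectors = []
        for lam in eigenvalues:
            norm = math.hypot(b, lam - a)
            vectors.append((b / norm, (lam - a) / norm))  # solves (a - lam)*v1 + b*v2 = 0
    explained = [lam / (a + c) for lam in eigenvalues]  # eigenvalue / total variance
    return eigenvalues, vectors, explained

# Hypothetical observations of two highly correlated features.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
eigenvalues, vectors, explained = pca_2d(data)
print([round(e, 4) for e in explained])
```

Because the two features move almost in lockstep, the first principal component captures nearly all of the variance, so the data could be summarized by that single composite variable.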

K-means is an unsupervised ML algorithm that partitions observations into a fixed number (k) of nonoverlapping clusters. Each cluster is characterized by its centroid, and each observation belongs to the cluster with the centroid to which that observation is closest.
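A minimal sketch of the standard K-means procedure (Lloyd's algorithm), assuming Euclidean distance and initial centroids drawn at random from the observations; the data and seed are hypothetical. It alternates between assigning each observation to its nearest centroid and recomputing each centroid as the mean of its cluster, stopping when the centroids no longer change.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid, then move each
    centroid to the mean of its assigned points; repeat until nothing changes."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct observations
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # keep the old centroid if a cluster ends up empty
        new = [tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels

# Hypothetical two-feature observations forming two well-separated groups.
points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(points, 2)
print(centroids, labels)
```

Results can depend on the initial centroids, so in practice the algorithm is often run from several random starts and the tightest solution is kept.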

Hierarchical clustering is an unsupervised iterative algorithm that is used to build a hierarchy of clusters. Two main strategies are used to define the intermediary clusters (i.e., those clusters between the initial data set and the final set of clustered data).

Agglomerative (bottom-up) hierarchical clustering begins with each observation being its own cluster. Then, the algorithm finds the two closest clusters, defined by some measure of distance, and combines them into a new, larger cluster. This process is repeated until all observations are clumped into a single cluster.
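Agglomerative clustering can be sketched as follows, assuming single linkage (cluster distance equals the smallest pairwise distance between members) as the distance measure; complete and average linkage are equally common choices. The function returns the sequence of merges, which is the information a dendrogram displays. The data are hypothetical one-dimensional points.

```python
import math
from itertools import combinations

def single_linkage(a, b):
    """Distance between clusters = smallest distance between any pair of their members."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, linkage=single_linkage):
    """Start with singleton clusters; repeatedly merge the two closest clusters until one
    remains. Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        history.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return history

points = [(1.0,), (1.2,), (5.0,), (5.3,), (9.0,)]
for a, b, d in agglomerative(points):
    print(a, "+", b, f"at distance {d:.2f}")
```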

Divisive (top-down) hierarchical clustering starts with all observations belonging to a single cluster. The observations are then divided into two clusters based on some measure of distance. The algorithm then progressively partitions the intermediate clusters into smaller clusters until each cluster contains only one observation.
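A divisive sketch under a simplifying assumption: the data are one-dimensional, and each cluster is split at the largest gap between neighboring sorted values. Other divisive methods split clusters differently (for example, with a two-cluster K-means step). The nested tuples returned record the hierarchy from the single starting cluster down to individual observations; the data are hypothetical.

```python
def divisive(values):
    """Top-down clustering of 1-D data: start with one cluster and recursively split
    each cluster at its largest gap until every cluster holds a single observation.
    Returns a nested tuple (binary tree) whose leaves are the observations."""
    values = sorted(values)
    if len(values) == 1:
        return values[0]
    gaps = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    cut = gaps.index(max(gaps)) + 1  # split just after the largest gap
    return (divisive(values[:cut]), divisive(values[cut:]))

print(divisive([5.0, 1.0, 9.0, 1.2, 5.3]))
```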

Neural networks consist of nodes connected by links. They have three types of layers: an input layer, hidden layers, and an output layer. Learning takes place in the hidden layer nodes, each of which consists of a summation operator and an activation function. Neural networks have been successfully applied to a variety of investment tasks characterized by nonlinearities and complex interactions among variables.
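A forward pass through a tiny network makes the node description concrete: each hidden node applies a summation operator (weighted sum plus a bias) followed by an activation function, assumed here to be a sigmoid. The weights are hypothetical fixed numbers; in practice they are learned from data, typically by backpropagation, which is not shown.

```python
import math

def sigmoid(z):
    """Activation function squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def node(inputs, weights, bias, activation=sigmoid):
    """One hidden-layer node: summation operator followed by an activation function."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias  # summation operator
    return activation(total)                                     # activation function

def forward(x, hidden_layer, output_node):
    """Input layer -> one hidden layer -> single linear output node."""
    h = [node(x, w, b) for w, b in hidden_layer]
    w_out, b_out = output_node
    return sum(w * hi for w, hi in zip(w_out, h)) + b_out

# Hypothetical weights: 2 inputs, 3 hidden nodes, 1 output.
hidden_layer = [([0.5, -0.2], 0.1), ([-0.3, 0.8], 0.0), ([0.2, 0.2], -0.1)]
output_node = ([1.0, -1.0, 0.5], 0.05)
print(forward([0.8, -1.2], hidden_layer, output_node))
```

The nonlinear activation is what lets stacked nodes represent the nonlinearities and interactions mentioned above; without it, the whole network would collapse into a single linear function.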

Neural networks with many hidden layers (at least 3 but often more than 20) are known as deep learning nets (DLNs) and are the backbone of the artificial intelligence revolution.

Reinforcement learning (RL) involves an agent that should perform actions that will maximize its rewards over time, taking into consideration the constraints of its environment.
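As one concrete RL algorithm, the sketch below uses tabular Q-learning (a specific technique chosen for illustration, not named in the reading) on a hypothetical five-state corridor where the agent earns a reward of 1 only on reaching the right end. The agent balances exploring random actions with exploiting its current value estimates (epsilon-greedy), and the discount factor `gamma` makes sooner rewards worth more. The environment, parameters, and seed are assumptions.

```python
import random

N_STATES, GOAL = 5, 4   # hypothetical corridor: states 0..4, goal at the right end
ACTIONS = (-1, +1)      # move left or right

def step(state, action):
    """Environment: move within [0, GOAL]; reward 1 only on reaching the goal."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn Q[state][action], the estimated discounted reward of taking an action in a state."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                best = max(Q[s])
                a = rng.choice([i for i in range(2) if Q[s][i] == best])  # exploit, random tie-break
            s2, r, done = step(s, ACTIONS[a])
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])  # move the estimate toward the target
            s = s2
            if done:
                break
    return Q

Q = q_learning()
policy = ["right" if q[1] > q[0] else "left" for q in Q[:GOAL]]
print(policy)
```

After training, the greedy policy should move right in every non-goal state, because that path reaches the reward soonest and so has the highest discounted value.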