CFA Level 1 — Quantitative Methods
Intuition-First Study Guide · All 11 Readings
Rates and Returns
Before you can analyse any investment, you need to measure its return. This reading gives you a toolkit of return measures, each suited to a different question: "How did this one security do?" (HPR), "What did the average manager return over time?" (time-weighted), "How did my portfolio do given my specific cash flows?" (money-weighted), and "How should I compare apples to oranges across compounding frequencies?" (EAR/APR conversions). Each measure tells a different story — the exam loves testing whether you pick the right one.
Holding Period Return (HPR) — The Foundation
The most basic return measure: what did you earn on one investment over one holding period? It captures both income (dividends/coupons) and price appreciation in a single number.
\(HPR = \dfrac{P_1 - P_0 + CF_1}{P_0}\)
Where \(P_0\) = beginning price, \(P_1\) = ending price, \(CF_1\) = cash received (dividend, coupon).
You buy a stock at €50. It pays a €2 dividend and rises to €56 at year-end.
HPR = (56 − 50 + 2) / 50 = 8/50 = 16%
Without the dividend you might think: "I earned 12% on price appreciation." The HPR correctly shows you earned 16% total.
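The calculation is a one-liner; a minimal Python sketch (function name is illustrative):

```python
def holding_period_return(p0: float, p1: float, cf: float = 0.0) -> float:
    """Price change plus income received, per unit invested."""
    return (p1 - p0 + cf) / p0

# The worked example: buy at €50, receive a €2 dividend, end at €56
print(holding_period_return(50, 56, 2))  # 0.16, i.e. 16%
```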
Three Types of Mean Returns Critical
1. Arithmetic Mean Return
Simple average of periodic returns. Best use: estimate the expected return in a single future period, given a history of returns.
2. Geometric Mean Return
The compound annual growth rate (CAGR). Best use: describe the actual historical growth rate of a portfolio that was left to compound. Always ≤ arithmetic mean (equal only when all returns are identical).
Year 1: +50%. Year 2: −50%.
Arithmetic mean = (50% + (−50%)) / 2 = 0% — sounds fine.
Geometric mean = √(1.50 × 0.50) − 1 = √0.75 − 1 = −13.4%
You invested €100. After Year 1: €150. After Year 2: €75. You lost money! The geometric mean reflects reality; the arithmetic mean was dangerously misleading.
3. Harmonic Mean Return
Used specifically for dollar-cost averaging, when you invest a fixed amount of money each period (not a fixed number of shares). The average cost per share when investing equal amounts is the harmonic mean of the purchase prices.
Relationship: Harmonic Mean ≤ Geometric Mean ≤ Arithmetic Mean (equality holds only if all values are identical).
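The three means as minimal Python sketches (function names are illustrative); the +50%/−50% example reproduces the geometric-mean warning above:

```python
import math

def arithmetic_mean(returns):
    return sum(returns) / len(returns)

def geometric_mean_return(returns):
    # Compound growth rate of a buy-and-hold investment
    return math.prod(1 + r for r in returns) ** (1 / len(returns)) - 1

def harmonic_mean(values):
    # Average cost per share when equal amounts are invested at each price
    return len(values) / sum(1 / v for v in values)

rets = [0.50, -0.50]
print(arithmetic_mean(rets))        # 0.0 — looks fine
print(geometric_mean_return(rets))  # ≈ -0.1340 — reflects the actual loss
print(harmonic_mean([10, 20]))      # ≈ 13.33 — average cost per share
```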
Money-Weighted vs. Time-Weighted Returns Critical
This is one of the most heavily tested concepts in the entire CFA curriculum. The key question: whose decision are we evaluating?
⏱ Time-Weighted Return (TWR)
Purpose: Evaluate the portfolio manager's skill — eliminates the impact of the investor's own cash flow timing decisions.
Method: Divide the period at each external cash flow. Calculate HPR for each sub-period. Compound them.
CFA standard for comparing managers.
💰 Money-Weighted Return (MWR)
Purpose: Evaluate the investor's actual experience — includes timing of contributions and withdrawals.
Method: It is the IRR of all cash flows (outflows = investments, inflows = withdrawals + ending value).
Best for personal wealth tracking.
Setup: Start of Year 1: Invest €100. Year 1 return = +50% → Portfolio = €150. End of Year 1: Invest additional €150 → Portfolio = €300. Year 2 return = −10% → Portfolio = €270.
TWR: Sub-period 1 HPR = +50%. Sub-period 2 HPR = −10%.
TWR = (1.50 × 0.90)^(1/2) − 1 = (1.35)^0.5 − 1 = 16.2% per year
MWR: Cash flows: t=0: −100, t=1: −150, t=2: +270. Solve for IRR.
−100 − 150/(1+r) + 270/(1+r)² = 0. Solving: r ≈ +5.6% per year
The investor had poor timing (invested big just before the bad year). TWR says the manager was skilled (+16.2% per year); MWR (+5.6%) is far lower, because the large contribution sat through the losing year.
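Both measures can be sketched in a few lines of Python; the MWR is the IRR of the cash flows, found here with a simple bisection root-finder (any solver works, and the helper names are illustrative):

```python
def twr(subperiod_returns):
    # Compound the sub-period HPRs, then annualise over the number of periods
    growth = 1.0
    for r in subperiod_returns:
        growth *= 1 + r
    return growth ** (1 / len(subperiod_returns)) - 1

def irr(cash_flows, lo=-0.99, hi=10.0):
    # Money-weighted return: the rate at which the NPV of all flows is zero.
    # cash_flows[t] is the flow at time t; solved by bisection.
    def npv(r):
        return sum(cf / (1 + r) ** t for t, cf in enumerate(cash_flows))
    assert npv(lo) * npv(hi) < 0, "need a sign change on [lo, hi]"
    for _ in range(200):
        mid = (lo + hi) / 2
        if npv(lo) * npv(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

print(twr([0.50, -0.10]))      # ≈ 0.1619 → 16.2% per year
print(irr([-100, -150, 270]))  # ≈ 0.0562 →  5.6% per year
```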
Interest Rate Conversions — EAR, APR, and Compounding
Effective Annual Rate (EAR)
EAR is the true annual return after accounting for intra-year compounding. It's the standardised way to compare investments with different compounding frequencies.
\(EAR = (1 + r_s/m)^m - 1\)
Where \(r_s\) = stated annual rate, \(m\) = number of compounding periods per year.
Bank A offers 6% compounded monthly. Bank B offers 6.1% compounded annually. Which is better?
Bank A EAR = (1 + 0.06/12)^12 − 1 = (1.005)^12 − 1 = 6.168%
Bank B EAR = 6.1% (already annual)
Bank A is better despite a lower stated rate. Compounding frequency matters enormously.
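The conversion is mechanical; a minimal sketch (function name illustrative):

```python
def ear(stated_rate: float, m: int) -> float:
    # Effective annual rate with m compounding periods per year
    return (1 + stated_rate / m) ** m - 1

print(ear(0.06, 12))  # ≈ 0.06168 — Bank A
print(ear(0.061, 1))  # ≈ 0.061   — Bank B
```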
Real vs. Nominal Returns
A nominal return includes inflation. A real return strips it out, showing purchasing power growth. Exact relationship: \((1+r_{real}) = \dfrac{1+r_{nom}}{1+r_{inf}}\).
The Time Value of Money in Finance
TVM is the engine behind every valuation model in the CFA curriculum. Bond pricing, equity valuation (dividend discount model), capital budgeting (NPV), mortgage payments, pension funding — all reduce to one question: how do we move cash flows across time? Master the mechanics here and every subsequent topic becomes easier. This reading is also extremely high-yield: TVM calculator questions appear in almost every exam.
The Core Principle
A euro today is worth more than a euro tomorrow, because today's euro can be invested to start earning interest immediately. Moving money forward in time is compounding; moving it backward is discounting.
Future Value and Present Value Formula
\(FV = PV(1+r)^N \qquad PV = \dfrac{FV}{(1+r)^N}\)
Where \(r\) = interest rate per period, \(N\) = number of periods.
Annuities — Repeated Equal Cash Flows Critical
An annuity is a series of equal payments at equal intervals. Mortgages, lease payments, pension distributions — all are annuities.
Ordinary Annuity (payments at END of period — most common)
\(PV = PMT \times \dfrac{1 - (1+r)^{-N}}{r}\)
Annuity Due (payments at BEGINNING of period)
Simply multiply the ordinary annuity by (1+r) — you receive each payment one period earlier, so each is worth more by one period of interest.
A €200,000 mortgage at 5% annually, monthly payments, 30-year term. What is the monthly payment?
Monthly rate: r = 5%/12 = 0.4167%. Periods: N = 360.
200,000 = PMT × [1 − (1.004167)^{−360}] / 0.004167
200,000 = PMT × 186.28
PMT = 200,000 / 186.28 = €1,073.64/month
Note: Over 30 years you pay 360 × €1,073.64 = €386,510 — nearly double the loan amount!
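The payment can be solved directly from the ordinary-annuity formula; a minimal sketch (function name illustrative):

```python
def annuity_payment(pv: float, r: float, n: int) -> float:
    # PMT for an ordinary annuity: PV = PMT * [1 - (1+r)^-n] / r
    return pv * r / (1 - (1 + r) ** -n)

pmt = annuity_payment(200_000, 0.05 / 12, 360)
print(round(pmt, 2))        # ≈ 1073.64 per month
print(round(pmt * 360, 2))  # ≈ 386,511 paid over the life of the loan
```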
Perpetuities — Payments Forever
A perpetuity pays the same amount forever: \(PV = \dfrac{PMT}{r}\). While no real instrument literally pays forever, this formula is used to value preferred stocks and is the basis for the Gordon Growth Model.
Net Present Value (NPV) and Internal Rate of Return (IRR) Critical
NPV Decision Rule: Invest if NPV > 0 (project adds value). IRR is the discount rate that makes NPV = 0. IRR Decision Rule: Invest if IRR > required rate of return (hurdle rate).
NPV — Preferred Method
Directly measures value added in currency (€/£/$). Assumes cash flows are reinvested at the discount rate. Always gives the correct decision for independent projects.
IRR — Popular but Flawed
Gives a percentage return — easy to communicate. Assumes reinvestment at the IRR (often unrealistic). Can give multiple values or misleading rankings for mutually exclusive projects.
Loan Amortisation
Each loan payment consists of an interest component and a principal repayment component. Over time, the proportion shifts: early payments are mostly interest; later payments are mostly principal. This is why you barely reduce your mortgage principal in the early years.
€10,000 loan, 8% annual, 5-year term, annual payments.
Annual PMT = 10,000 / [(1 − 1.08^{−5})/0.08] = 10,000 / 3.9927 = €2,504.56
Year 1: Interest = 10,000 × 8% = €800. Principal = 2,504.56 − 800 = €1,704.56. Remaining balance = €8,295.44.
Year 2: Interest = 8,295.44 × 8% = €663.64. Principal = 2,504.56 − 663.64 = €1,840.92. Remaining balance = €6,454.52.
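The schedule above generalises to a short loop; a minimal sketch on the €10,000 example (names illustrative):

```python
def amortisation_schedule(principal: float, r: float, n: int):
    """Yield (period, interest, principal_repaid, remaining_balance) per payment."""
    pmt = principal * r / (1 - (1 + r) ** -n)  # level payment
    balance = principal
    for period in range(1, n + 1):
        interest = balance * r          # interest accrues on remaining balance
        repaid = pmt - interest         # the rest of the payment reduces principal
        balance -= repaid
        yield period, interest, repaid, balance

for row in amortisation_schedule(10_000, 0.08, 5):
    print(row)  # balance reaches ~0 after the final payment
```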
Statistical Measures of Asset Returns
Risk in finance is primarily measured through statistics. This reading gives you the vocabulary: mean (where returns tend to be), variance/standard deviation (how spread out they are), skewness (are losses worse than gains?), and kurtosis (how often do extreme events occur?). These measures directly connect to risk management — a portfolio with negative skew and high kurtosis (fat tails) is far more dangerous than its standard deviation alone suggests.
Measures of Central Tendency
Arithmetic Mean
Geometric Mean (for returns — covered in R1)
Use geometric mean for compounding problems; arithmetic mean for expected value of a single future period.
Weighted Mean
Used for portfolio returns: weight each asset's return by its portfolio weight.
Median and Mode
Median = middle value when sorted (50th percentile). Mode = most frequently occurring value. For skewed distributions, the median is a better measure of "typical" because it's less affected by outliers.
Measures of Dispersion
Variance and Standard Deviation Formula
\(s^2 = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}\), and \(s = \sqrt{s^2}\) (sample versions divide by n−1; population versions divide by N).
Coefficient of Variation (CV) — Relative Risk Exam Favourite
\(CV = \dfrac{s}{\bar{X}}\)
CV answers: "How much risk am I taking per unit of return?" Lower CV = better risk-adjusted performance (all else equal). Unlike standard deviation, CV allows comparison across investments with different return scales.
Fund A: Mean return 10%, Std Dev 8% → CV = 8/10 = 0.80
Fund B: Mean return 20%, Std Dev 14% → CV = 14/20 = 0.70
Fund B has higher absolute risk, but less risk per unit of return. An investor focused on risk-adjusted returns would prefer Fund B.
Sharpe Ratio — Risk-Adjusted Performance
The Sharpe ratio measures excess return (above the risk-free rate) per unit of total risk. Unlike CV, it uses excess return, making it the standard for comparing risky portfolios:
\(S_p = \dfrac{\bar{R}_p - R_f}{\sigma_p}\)
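Both ratios are simple quotients; a minimal sketch in which the 3% risk-free rate in the usage line is an assumed number, not from the text:

```python
def coefficient_of_variation(std_dev: float, mean_return: float) -> float:
    # Risk per unit of return — lower is better, all else equal
    return std_dev / mean_return

def sharpe_ratio(mean_return: float, risk_free: float, std_dev: float) -> float:
    # Excess return per unit of total risk
    return (mean_return - risk_free) / std_dev

print(coefficient_of_variation(8, 10))   # 0.8 — Fund A
print(coefficient_of_variation(14, 20))  # 0.7 — Fund B (better risk-adjusted)
print(sharpe_ratio(0.10, 0.03, 0.08))    # ≈ 0.875 (3% risk-free assumed)
```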
Skewness — Asymmetry of the Distribution Critical
| Skewness | Shape | Mean vs. Median vs. Mode | Implication for Finance |
|---|---|---|---|
| Positive (right skew) | Long right tail | Mode < Median < Mean | Rare large gains; most returns are below the mean. Think: lottery tickets, venture capital. |
| Zero (symmetric) | Normal distribution | Mode = Median = Mean | The ideal assumption. Reality rarely looks like this. |
| Negative (left skew) | Long left tail | Mean < Median < Mode | Rare large losses; most returns are above the mean. Think: most equity portfolios, credit strategies, short volatility. |
Kurtosis — Fat Tails Critical
Kurtosis measures the "tailedness" of a distribution — how much weight is in the extreme outcomes compared to a normal distribution.
| Type | Excess Kurtosis | Tails vs. Normal | Finance Example |
|---|---|---|---|
| Leptokurtic | > 0 | Fatter tails (more extreme events) | Most financial return series — market crashes happen more often than the normal distribution predicts |
| Mesokurtic | = 0 | Normal distribution tails | The theoretical benchmark |
| Platykurtic | < 0 | Thinner tails (fewer extremes) | Rare in finance |
Probability Trees and Conditional Expectations
Probability is the mathematical language of uncertainty. Every investment decision involves uncertain outcomes, and this reading gives you the formal tools to reason about them. The practical payoff: probability trees help you model scenarios (recession/expansion, default/no default); expected value helps you price risky assets; Bayes' theorem lets you update beliefs rationally as new information arrives — which is exactly what good analysts do.
Core Probability Rules
| Concept | Definition | Key Property |
|---|---|---|
| Mutually exclusive | Cannot both occur: P(A∩B) = 0 | P(A∪B) = P(A) + P(B) (no subtraction needed) |
| Exhaustive | Cover all possibilities: P(A) + P(B) + … = 1 | Forms a complete probability space |
| Independent | P(A|B) = P(A) — knowing B tells you nothing about A | P(A∩B) = P(A) × P(B) |
| Dependent | Knowing B changes the probability of A | Must use conditional probability |
Conditional Probability
"The probability of A given B has occurred" — we've restricted our probability space to only the scenarios where B is true.
60% of companies in a sector are investment grade (IG). Of IG companies, 5% default within 5 years. Of non-IG companies, 40% default within 5 years.
P(Default | IG) = 5%; P(Default | Not IG) = 40%; P(IG) = 0.60; P(Not IG) = 0.40.
P(Default) = P(D|IG)×P(IG) + P(D|Not IG)×P(Not IG) = 0.05×0.60 + 0.40×0.40 = 0.03 + 0.16 = 19%
This is the Total Probability Rule — a fundamental tool for building scenario models.
Bayes' Theorem — Updating Beliefs Exam Favourite
Bayes' theorem answers: "Given that an event occurred, what is the revised probability of its cause?" It's how rational people update beliefs when new information arrives:
\(P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}\)
Prior probability of a recession: P(Rec) = 30%. During recessions, a yield curve inverts 80% of the time: P(Invert|Rec) = 80%. During expansions, inversions still occur 20% of the time: P(Invert|Exp) = 20%. The yield curve has just inverted. What is the revised probability of recession?
P(Invert) = P(Invert|Rec)×P(Rec) + P(Invert|Exp)×P(Exp) = 0.80×0.30 + 0.20×0.70 = 0.24 + 0.14 = 0.38
P(Rec|Invert) = P(Invert|Rec) × P(Rec) / P(Invert) = (0.80 × 0.30) / 0.38 = 0.24 / 0.38 = 63.2%
Prior: 30% recession. After observing inversion: 63.2% recession. Bayes updated our belief rationally.
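The update can be written as one small function; a minimal sketch (names illustrative) reproducing the yield-curve example:

```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    # P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|~H)P(~H)]
    numer = p_e_given_h * prior
    denom = numer + p_e_given_not_h * (1 - prior)
    return numer / denom

# Prior 30% recession; inversion 80% likely in recessions, 20% in expansions
print(bayes_update(0.30, 0.80, 0.20))  # ≈ 0.632
```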
Expected Value and Variance of a Random Variable
\(E(X) = \sum_i P(x_i)\,x_i \qquad Var(X) = \sum_i P(x_i)\,[x_i - E(X)]^2\)
Properties of Expected Value
- \(E(aX + b) = a \cdot E(X) + b\)
- \(E(X + Y) = E(X) + E(Y)\) (always, regardless of dependence)
- \(Var(aX + b) = a^2 \cdot Var(X)\) (adding a constant doesn't change variance)
- \(Var(X + Y) = Var(X) + Var(Y) + 2 \cdot Cov(X,Y)\) (the portfolio variance formula!)
Example: three possible outcomes with probabilities 0.40, 0.35, 0.25 and returns 30%, 5%, −20%.
E(R) = 0.4×30 + 0.35×5 + 0.25×(−20) = 8.75%
Var(R) = 0.4×(30−8.75)² + 0.35×(5−8.75)² + 0.25×(−20−8.75)² = 0.4×451.56 + 0.35×14.06 + 0.25×826.56 = 180.63 + 4.92 + 206.64 = 392.19 (so σ = 19.8%)
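A minimal sketch of both moments on the same numbers (function names illustrative):

```python
def expected_value(probs, outcomes):
    return sum(p * x for p, x in zip(probs, outcomes))

def variance_rv(probs, outcomes):
    mu = expected_value(probs, outcomes)
    return sum(p * (x - mu) ** 2 for p, x in zip(probs, outcomes))

probs, rets = [0.40, 0.35, 0.25], [30.0, 5.0, -20.0]
print(expected_value(probs, rets))      # 8.75
print(variance_rv(probs, rets))         # ≈ 392.19
print(variance_rv(probs, rets) ** 0.5)  # ≈ 19.8
```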
Portfolio Mathematics
This is the mathematical core of Modern Portfolio Theory. The key insight: portfolio risk is NOT the weighted average of individual risks. Because assets don't move in perfect lockstep (they're not perfectly correlated), combining them reduces total risk. The lower the correlation, the more risk reduction diversification provides. This is why diversification is sometimes called "the only free lunch in finance."
Portfolio Expected Return
Portfolio expected return IS the weighted average of individual expected returns — simple and linear:
\(E(R_p) = \sum_{i=1}^{n} w_i\,E(R_i)\)
Portfolio Variance — The Non-Trivial Part Critical
Two-Asset Portfolio
\(\sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2\,w_1 w_2\,Cov(R_1,R_2)\)
Since \(Cov(R_1,R_2) = \rho_{12}\,\sigma_1\sigma_2\), the formula can also be written:
\(\sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2\,w_1 w_2\,\rho_{12}\,\sigma_1\sigma_2\)
Asset A: E(R) = 10%, σ = 20%. Asset B: E(R) = 8%, σ = 15%. Correlation ρ = 0.3. Portfolio: 50/50 split.
Portfolio Expected Return = 0.5×10% + 0.5×8% = 9%
Portfolio Variance = (0.5)²(0.20)² + (0.5)²(0.15)² + 2(0.5)(0.5)(0.3)(0.20)(0.15)
= 0.25×0.04 + 0.25×0.0225 + 2×0.25×0.3×0.03 = 0.01 + 0.005625 + 0.0045 = 0.020125
Portfolio Std Dev = √0.020125 = 14.19%
Simple average of standard deviations = 0.5×20% + 0.5×15% = 17.5%. Portfolio std dev = 14.19% — lower than the weighted average! That's the diversification benefit from ρ = 0.3 < 1.
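The two-asset calculation above, as a minimal Python sketch (function name illustrative):

```python
import math

def two_asset_portfolio(w1, mu1, sigma1, mu2, sigma2, rho):
    """Return (expected return, standard deviation) of a two-asset portfolio."""
    w2 = 1 - w1
    mu_p = w1 * mu1 + w2 * mu2
    var_p = (w1 * sigma1) ** 2 + (w2 * sigma2) ** 2 \
            + 2 * w1 * w2 * rho * sigma1 * sigma2
    return mu_p, math.sqrt(var_p)

mu_p, sigma_p = two_asset_portfolio(0.5, 0.10, 0.20, 0.08, 0.15, 0.3)
print(mu_p)     # 0.09
print(sigma_p)  # ≈ 0.1419 — below the 0.175 weighted average of the σs
```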
The Role of Correlation in Diversification Critical
| Correlation (ρ) | Diversification Benefit | Can Portfolio Risk = 0? |
|---|---|---|
| ρ = +1 (perfect positive) | None — portfolio risk = weighted average of individual risks | No — this is the maximum risk scenario |
| 0 < ρ < 1 | Partial — risk is below weighted average | No — but significantly reduced |
| ρ = 0 (uncorrelated) | Maximum for long positions without shorting | No (unless infinite assets) |
| −1 < ρ < 0 | Greater than zero-correlation case | No — but substantial reduction |
| ρ = −1 (perfect negative) | Maximum possible | Yes — perfect hedge exists |
Covariance Matrix Formula
\(\sigma_p^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_i\,w_j\,Cov(R_i,R_j)\)
For a portfolio with n assets, variance requires n variances and n(n−1)/2 unique covariances. With 100 assets, you need 100 variances + 4,950 covariances = 5,050 pieces of information. This is why covariance estimation is the hardest part of portfolio construction in practice.
Note: when i = j, Cov(R_i, R_i) = Var(R_i) = σ_i² — so the variance terms are just special cases of the covariance matrix diagonal.
Minimum Variance Portfolio (Two Assets)
The weight in Asset 1 that minimises portfolio variance:
\(w_1^{*} = \dfrac{\sigma_2^2 - Cov(R_1,R_2)}{\sigma_1^2 + \sigma_2^2 - 2\,Cov(R_1,R_2)}\)
Example: σ₁ = 25%, σ₂ = 15%, Cov = 0.015, with weights w₁ = 0.6, w₂ = 0.4:
σ²_p = (0.6)²(0.25)² + (0.4)²(0.15)² + 2(0.6)(0.4)(0.015)
= 0.36×0.0625 + 0.16×0.0225 + 0.48×0.015
= 0.0225 + 0.0036 + 0.0072 = 0.0333 → σ_p = 18.25%
Weighted average = 0.6×25%+0.4×15% = 21%. Diversification saved you 2.75 percentage points of risk.
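The minimum-variance weight itself is a one-line formula; a minimal sketch (function name illustrative). Note that on these inputs the variance-minimising weight in the riskier asset is only about 13.6%:

```python
def min_variance_weight(sigma1: float, sigma2: float, cov12: float) -> float:
    # Two-asset minimum-variance weight in Asset 1
    return (sigma2 ** 2 - cov12) / (sigma1 ** 2 + sigma2 ** 2 - 2 * cov12)

print(min_variance_weight(0.25, 0.15, 0.015))  # ≈ 0.136
```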
Simulation Methods
Many financial problems — option pricing with complex payoffs, portfolio risk under non-normal distributions, retirement planning under uncertainty — cannot be solved analytically. Simulation methods attack these problems by generating thousands of possible scenarios and observing the resulting distribution of outcomes. The three methods (historical simulation, Monte Carlo, and bootstrap) differ in how they generate those scenarios, and each has strengths and weaknesses the exam tests directly.
Historical Simulation
Use actual past data (historical returns, historical factor moves) directly — resample from the empirical distribution, preserving the real-world distribution including skewness, kurtosis, and any correlations present in the data.
✅ Strengths
No distributional assumptions needed. Automatically captures fat tails, skewness, and real correlations. Includes actual historical crises (2008, 2000, 1987).
❌ Weaknesses
Constrained by history — if a scenario has never occurred, it has zero probability. Can't simulate scenarios more extreme than the historical worst case. Sample size is limited.
Monte Carlo Simulation Critical
Specify a probability model (e.g., normal distribution with given mean and variance, or a correlated multivariate distribution), then use a computer to draw thousands of random samples from that model and simulate outcomes.
Assume stock returns are normally distributed: μ = 7%, σ = 15%. Simulate 10,000 possible 30-year return paths. For each path, calculate whether the investor's portfolio lasts through retirement. Count the percentage of paths that succeed. Result: "There's a 73% probability your portfolio lasts 30 years at your current spending rate."
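A Monte Carlo sketch of this retirement experiment, under the stated normal-return assumption; the starting wealth and spending figures here are assumed for illustration and are not from the text:

```python
import random

def retirement_success_rate(mu=0.07, sigma=0.15, years=30,
                            start=1_000_000.0, annual_spend=40_000.0,
                            n_paths=10_000, seed=42):
    """Fraction of simulated paths on which the portfolio survives the horizon."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(n_paths):
        wealth = start
        for _ in range(years):
            # One simulated year: apply a random normal return, then withdraw
            wealth = wealth * (1 + rng.gauss(mu, sigma)) - annual_spend
            if wealth <= 0:
                break
        else:
            survived += 1
    return survived / n_paths

print(retirement_success_rate())  # success probability under the assumed inputs
```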
✅ Strengths
Can simulate ANY scenario — including ones that have never occurred. Can incorporate complex dependencies, option payoffs, path-dependent features. Highly flexible.
❌ Weaknesses
"Garbage in, garbage out" — wrong distributional assumptions produce wrong results. Computationally intensive. The normal distribution assumption is particularly dangerous for tail risk.
Bootstrap Simulation
A hybrid approach: resample WITH REPLACEMENT from the actual historical data. Unlike historical simulation which uses each observation once, bootstrap can draw the same observation multiple times, creating new synthetic histories.
| Method | Source of Randomness | Key Advantage | Key Limitation |
|---|---|---|---|
| Historical | Actual past returns, used sequentially | No assumptions; captures real crises | Limited to what happened; can't exceed historical extremes |
| Monte Carlo | Specified probability distribution | Flexible; can model any scenario | Results depend entirely on assumed distribution |
| Bootstrap | Past data, resampled with replacement | Grounded in data; more scenarios than history | Can't generate truly new tail events |
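Bootstrap resampling is just drawing with replacement; a minimal sketch (names and the sample returns are illustrative):

```python
import random

def bootstrap_sample_paths(historical_returns, horizon, n_paths, seed=0):
    # Resample WITH replacement: each draw can reuse any past observation,
    # so many synthetic "histories" come from one real one
    rng = random.Random(seed)
    return [[rng.choice(historical_returns) for _ in range(horizon)]
            for _ in range(n_paths)]

paths = bootstrap_sample_paths([0.12, -0.08, 0.05, 0.20, -0.15],
                               horizon=3, n_paths=4)
for p in paths:
    print(p)
```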
Estimation and Inference
Every financial analysis is built on estimates — the expected return, the beta, the mean earnings. But estimates from samples are imperfect. This reading teaches you to quantify that imperfection: how large is the sampling error? How confident can we be that the true parameter lies within a given range? These concepts (standard error, confidence intervals, the Central Limit Theorem) underpin all of hypothesis testing and regression analysis.
Sampling and the Central Limit Theorem
Point Estimates
A single number used to estimate a population parameter. The sample mean \(\bar{x}\) estimates population mean \(\mu\). The sample variance \(s^2\) estimates population variance \(\sigma^2\).
Standard Error of the Mean Formula
\(SE_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\) (use the sample \(s\) in place of \(\sigma\) when the population value is unknown)
The standard deviation of the sampling distribution of the mean — how much would \(\bar{x}\) vary across different random samples?
Central Limit Theorem (CLT) Critical
Regardless of the shape of the underlying population distribution, the sampling distribution of the sample mean approaches a normal distribution as n increases (roughly n ≥ 30). This is the foundational justification for using normal distribution tools in hypothesis testing.
Confidence Intervals Critical
A confidence interval gives a range of values that is expected to contain the true population parameter with a specified probability (confidence level): \(\bar{x} \pm z_{\alpha/2} \times SE_{\bar{x}}\).
| Confidence Level | α | z-critical value (two-tailed) |
|---|---|---|
| 90% | 0.10 | ±1.645 |
| 95% | 0.05 | ±1.960 |
| 99% | 0.01 | ±2.576 |
A fund's monthly returns over 36 months have a mean of 1.2% and a sample standard deviation of 3.6%. Construct a 95% CI for the true mean monthly return.
SE = 3.6% / √36 = 3.6% / 6 = 0.6%
Since n = 36 (large), use z: 95% CI = 1.2% ± 1.96 × 0.6% = 1.2% ± 1.176% = (0.024%, 2.376%)
Interpretation: We are 95% confident the true mean monthly return lies between 0.024% and 2.376%. Since the entire interval is above zero, we have evidence the manager generates positive returns.
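The interval construction, as a minimal Python sketch reproducing the fund example (function name illustrative):

```python
import math

def confidence_interval(mean, std_dev, n, z=1.96):
    # mean ± z * standard error, with SE = s / sqrt(n)
    se = std_dev / math.sqrt(n)
    return mean - z * se, mean + z * se

lo, hi = confidence_interval(1.2, 3.6, 36)
print(lo, hi)  # ≈ 0.024, 2.376 (in % per month)
```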
When to Use z vs. t Distribution Exam Favourite
| Scenario | Distribution | Reasoning |
|---|---|---|
| Population σ known, normally distributed population | z | Exact z-test applies |
| Population σ unknown, n ≥ 30 | z or t | CLT makes z approximately valid; t is technically more correct but difference is small |
| Population σ unknown, n < 30, normally distributed | t (with n−1 df) | t distribution has fatter tails to account for extra uncertainty from estimating σ |
| Population σ unknown, n < 30, non-normal | Neither — use non-parametric | Both z and t assume approximate normality |
Hypothesis Testing
Hypothesis testing is the engine behind quantitative research. Every claim in finance — "this manager beats the benchmark," "momentum is a real factor," "this economic variable predicts returns" — must survive a hypothesis test. The framework is always the same: state a null hypothesis (the boring baseline), compute how unlikely your sample result would be if the null were true, and decide whether to reject the null. This reading is one of the most conceptually dense in the curriculum — and one of the most heavily tested.
The Hypothesis Testing Framework
Step 1: State the Hypotheses
H₀ (null hypothesis): The baseline claim to be tested. Usually the "no effect" or "no difference" statement. Always includes equality (=, ≤, ≥).
Hₐ (alternative hypothesis): What we're trying to find evidence for. Sets the direction of the test.
| Test Type | H₀ | Hₐ | Rejection Region |
|---|---|---|---|
| Two-tailed | θ = θ₀ | θ ≠ θ₀ | Both tails (split α/2 each side) |
| One-tailed (right) | θ ≤ θ₀ | θ > θ₀ | Right tail only (full α) |
| One-tailed (left) | θ ≥ θ₀ | θ < θ₀ | Left tail only (full α) |
Step 2: Choose Significance Level (α)
α = the maximum probability of making a Type I error (rejecting a true H₀) you're willing to accept. Common choices: 1%, 5%, 10%.
Step 3: Compute the Test Statistic
General form: (sample statistic − hypothesised value) / standard error. For a test of a single mean: \(t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}\).
Step 4: Compare to Critical Value and Make Decision
If |test statistic| > critical value (or equivalently, p-value < α): Reject H₀. Otherwise: Fail to reject H₀.
Type I and Type II Errors Critical
| | H₀ is TRUE | H₀ is FALSE |
|---|---|---|
| Reject H₀ | Type I Error (α) — False Positive | Correct Decision (Power = 1−β) |
| Fail to reject H₀ | Correct Decision (1−α) | Type II Error (β) — False Negative |
The key trade-off: Lowering α (being more strict) reduces Type I errors but increases Type II errors. You can only reduce BOTH by increasing sample size (n).
p-Value — The Modern Standard Exam Favourite
The p-value is the probability of obtaining a test statistic as extreme or more extreme than the one observed, assuming H₀ is true.
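For a z-test, the two-tailed p-value follows directly from the standard normal CDF; a minimal sketch using the error function (helper names illustrative):

```python
import math

def normal_cdf(z: float) -> float:
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_value_two_tailed(z_stat: float) -> float:
    # Probability of a result at least this extreme, in either tail, under H0
    return 2 * (1 - normal_cdf(abs(z_stat)))

print(p_value_two_tailed(1.96))   # ≈ 0.05
print(p_value_two_tailed(2.576))  # ≈ 0.01
```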
Tests Beyond the Mean
Chi-Square Test (χ²) — Testing Variance
Tests whether a population variance equals a hypothesised value. The chi-square distribution is always positive and right-skewed, so critical values are asymmetric for one-tailed tests.
F-Test — Comparing Two Variances
Tests whether two population variances are equal. Convention: put the larger variance in the numerator (F ≥ 1). Used heavily in regression to test overall model fit.
Parametric and Non-Parametric Tests of Independence
Much of investment research tests for relationships: "Does past performance predict future returns?" "Is this factor correlated with stock returns?" "Do earnings revisions lead price movements?" This reading covers the statistical tools for answering such questions: parametric tests (which assume distributions) and non-parametric alternatives (which make no distributional assumptions). Knowing when to use which is a key exam skill.
Parametric Tests — When Distribution Assumptions Hold
t-Test for Correlation Formula
Tests whether the population correlation coefficient ρ is significantly different from zero (i.e., whether two variables are actually linearly related).
\(t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\), with \(n-2\) degrees of freedom.
Where \(r\) = sample correlation coefficient, \(n\) = number of observations.
A researcher calculates r = 0.45 between GDP growth and stock returns using n = 30 observations. Is this significant at α = 5% (two-tailed)?
t = 0.45 × √(30−2) / √(1−0.45²) = 0.45 × √28 / √(0.7975) = 0.45 × 5.292 / 0.893 = 2.67
Critical value: t₀.₀₂₅, 28 ≈ 2.048. Since 2.67 > 2.048, reject H₀ (ρ = 0). The correlation is statistically significant.
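The test statistic is one line of arithmetic; a minimal sketch on the example's numbers (function name illustrative):

```python
import math

def correlation_t_stat(r: float, n: int) -> float:
    # t-test for H0: population correlation = 0, with n - 2 degrees of freedom
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

print(correlation_t_stat(0.45, 30))  # ≈ 2.67 — exceeds the 2.048 critical value
```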
Paired Comparison t-Test
Used when observations come in natural pairs (e.g., the same company before and after a policy change, or matched pairs of funds). You test whether the mean difference is zero.
\(t = \dfrac{\bar{d} - \mu_{d_0}}{s_d/\sqrt{n}}\), with \(n-1\) degrees of freedom.
Where \(\bar{d}\) = mean of paired differences, \(s_d\) = std dev of differences, \(\mu_{d_0}\) = hypothesised mean difference (usually 0).
Non-Parametric Tests — When Assumptions Break Down
Spearman Rank Correlation Exam Favourite
The non-parametric analogue of Pearson correlation. Rank both variables, then apply the standard Pearson correlation formula to the ranks.
\(r_s = 1 - \dfrac{6\sum d_i^2}{n(n^2-1)}\)
Where \(d_i\) = difference in ranks for the i-th pair. This formula assumes no tied ranks.
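Given the two rank vectors, the coefficient is a one-liner; a minimal sketch assuming no tied ranks (function name illustrative):

```python
def spearman_from_ranks(ranks_x, ranks_y):
    # r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); valid when there are no ties
    n = len(ranks_x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(spearman_from_ranks([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
print(spearman_from_ranks([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0
```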
| Condition | Use Parametric (Pearson t-test) | Use Non-Parametric (Spearman) |
|---|---|---|
| Distribution | Approximately normal | Non-normal, unknown, or heavily skewed |
| Sample size | Any (CLT helps for large n) | Small samples especially |
| Data type | Continuous ratio/interval data | Ordinal data, or ranked data |
| Outliers | Sensitive to outliers | Robust to outliers (uses ranks) |
| Power | Higher (when assumptions met) | Lower (loses information by ranking) |
Runs Test — Testing for Independence/Randomness
Tests whether a sequence of observations is random (independent). Counts "runs" — consecutive sequences of the same sign (e.g., consecutive positive or negative returns).
Too few runs: positive serial correlation (trending). Too many runs: negative serial correlation (mean-reverting). A truly random sequence produces a number of runs consistent with chance.
Simple Linear Regression
Regression is the workhorse of quantitative finance. Factor models (CAPM, Fama-French), earnings forecasting, economic prediction — all rely on regression. Simple linear regression (one dependent variable, one independent variable) gives you the conceptual foundation. Understand the mechanics here and multiple regression (not in Level 1) follows naturally. The exam tests both the mechanics AND interpretation of outputs.
The Regression Model
\(Y_i = b_0 + b_1 X_i + \varepsilon_i\)
Where \(Y_i\) = dependent variable (what we're predicting), \(X_i\) = independent variable (the predictor), \(b_0\) = intercept, \(b_1\) = slope coefficient, \(\varepsilon_i\) = error term (residual).
Ordinary Least Squares (OLS) Estimation Formula
OLS minimises the sum of squared residuals (SSE), giving the "best fit" line through the data:
\(\hat{b}_1 = \dfrac{Cov(X,Y)}{Var(X)} \qquad \hat{b}_0 = \bar{Y} - \hat{b}_1\bar{X}\)
Measuring Fit: R², SEE, and ANOVA Critical
The total variation in Y (SST) can be decomposed into explained variation (SSR) and unexplained variation (SSE):
\(SST = SSR + SSE\)
\(R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}\) — the proportion of variation in Y explained by X. R² = 1 means perfect fit; R² = 0 means X explains nothing about Y.
\(SEE = \sqrt{\dfrac{SSE}{n-2}}\) — the typical size of a prediction error. Lower SEE = better model fit.
Regression of excess stock returns (Y) on market excess returns (X):
ŷ = 0.5% + 1.3X, with R² = 0.65, SEE = 4.2%
• Intercept (0.5%): The stock earns 0.5% per period even when the market return equals zero (Jensen's α)
• Slope (1.3): For every 1% move in the market, this stock is expected to move 1.3% (β > 1, aggressive stock)
• R² = 0.65: 65% of the variation in the stock's return is explained by market movements; 35% is idiosyncratic
• SEE = 4.2%: Typical prediction error is ±4.2%
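The whole estimation fits in a dozen lines; a minimal sketch (function name illustrative; the x/y data are made-up numbers for demonstration, not the regression quoted above):

```python
def ols_simple(xs, ys):
    """Return (intercept, slope, r_squared) for a simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # n * Cov(X, Y)
    sxx = sum((x - mx) ** 2 for x in xs)                    # n * Var(X)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sst = sum((y - my) ** 2 for y in ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, 1 - sse / sst

x = [1.0, -2.0, 3.0, 0.5, -1.0]   # illustrative market excess returns
y = [1.8, -2.1, 4.3, 1.2, -0.9]   # illustrative stock excess returns
print(ols_simple(x, y))           # (intercept, slope, R^2)
```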
Testing the Slope Coefficient Critical
H₀: b₁ = 0 (the independent variable has no linear relationship with Y). The t-test:
\(t = \dfrac{\hat{b}_1 - 0}{s_{\hat{b}_1}}\), with \(n-2\) degrees of freedom.
Where \(s_{\hat{b}_1}\) = standard error of the slope estimate.
F-Test — Overall Regression Significance
In simple regression, the F-test and the t-test for the slope coefficient test the same hypothesis. F = t². Always one-tailed (reject if F > critical value).
Assumptions of OLS Regression
| Assumption | What It Means | If Violated… |
|---|---|---|
| Linearity | The true relationship is linear | Model is misspecified; residuals show patterns |
| No heteroskedasticity | Var(ε) is constant across all X values | Standard errors are wrong; t-tests and F-tests invalid |
| No autocorrelation | Residuals are not correlated with each other over time | Standard errors understated; t-statistics inflated |
| No multicollinearity | Predictors are not highly correlated (multiple regression) | Coefficients unstable, large standard errors |
| Normal residuals | ε ~ N(0, σ²) | t and F tests may be unreliable in small samples |
Introduction to Big Data Techniques
The investment industry is being transformed by big data. Hedge funds analyse satellite imagery of parking lots to forecast retail earnings. Algorithms mine earnings call transcripts for sentiment signals before human analysts can read them. Machine learning models identify non-linear patterns that classical regression misses. This reading introduces the vocabulary and core concepts — you won't be asked to code, but you must understand how these methods work, their strengths and limitations, and the critical problem of overfitting.
Types of Data
📊 Structured Data
Organised in rows and columns; easily queried. Examples: financial statements, price/volume data, economic indicators.
Traditional statistical methods designed for this.
📝 Unstructured Data
Does not fit neatly into tables. Examples: earnings call transcripts, news articles, social media, satellite images, web traffic.
Requires ML/NLP to extract information.
Machine Learning Categories Critical
| Category | What It Does | Investment Application | Example Algorithms |
|---|---|---|---|
| Supervised Learning | Learns from labelled training data (input → known output). Predicts outputs for new inputs. | Credit scoring, stock return prediction, fraud detection | Linear regression, logistic regression, decision trees, random forests, neural networks |
| Unsupervised Learning | Finds patterns in data without labelled outputs. Groups similar observations. | Portfolio clustering, market regime detection, factor discovery | k-means clustering, principal component analysis (PCA), hierarchical clustering |
| Deep Learning | Multi-layer neural networks; learns complex non-linear patterns automatically. | Image recognition (satellite), speech recognition, NLP | Convolutional networks (CNN), recurrent networks (RNN), transformer models |
Overfitting — The Most Dangerous Problem Critical
Overfitting occurs when a model learns the training data too well — including the noise — and fails to generalise to new data. In investment research, an overfit model shows spectacular backtested returns but performs poorly (or disastrously) in live trading.
Bias-Variance Tradeoff
| High Bias (Underfitting) | High Variance (Overfitting) | |
|---|---|---|
| Training error | High (doesn't fit training data) | Low (fits training data perfectly) |
| Test error | High (poor predictions) | High (fails on new data) |
| Problem | Model too simple; misses real patterns | Model too complex; memorises noise |
| Solution | Increase model complexity | Regularisation, more data, cross-validation |
Preventing Overfitting: Regularisation and Cross-Validation
Regularisation — Penalise Complexity
Add a penalty to the loss function that discourages large coefficients. Two main types:
- LASSO (L1 regularisation): Penalty = λ × Σ|βᵢ|. Drives some coefficients exactly to zero — performs automatic variable selection. Use when you want a sparse model.
- Ridge (L2 regularisation): Penalty = λ × Σβᵢ². Shrinks all coefficients toward zero but rarely to exactly zero. Use when you believe all variables are relevant.
Higher λ = stronger regularisation = simpler model. The optimal λ is chosen by cross-validation.
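For a single centred predictor, both penalties have simple closed forms, which makes the shrinkage behaviour easy to see. This is a sketch under that one-variable assumption (data and λ values are hypothetical), not a general solver:

```python
# Hypothetical centred data with true slope 2 (y = 2x exactly, for clarity)
xs = [-2, -1, 0, 1, 2]
ys = [-4, -2, 0, 2, 4]

def ridge_slope(xs, ys, lam):
    # L2 penalty shrinks the coefficient toward zero: b = Σxy / (Σx² + λ)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def lasso_slope(xs, ys, lam):
    # L1 penalty soft-thresholds: the coefficient hits exactly zero for large λ
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return (1 if sxy >= 0 else -1) * shrunk / sxx

print([round(ridge_slope(xs, ys, lam), 3) for lam in (0, 1, 10, 100)])
# → [2.0, 1.818, 1.0, 0.182]  — shrinks, but never reaches zero
print([round(lasso_slope(xs, ys, lam), 3) for lam in (0, 1, 10, 100)])
# → [2.0, 1.95, 1.5, 0.0]     — reaches exactly zero (variable selected out)
```

This is the exam point in miniature: Ridge shrinks but keeps every variable; LASSO can drop a variable entirely.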
k-Fold Cross-Validation
Divide the training data into k equal "folds." Train on k−1 folds, test on the remaining fold. Repeat k times, rotating the test fold each time. Average the k test errors. This gives a reliable estimate of out-of-sample performance — the true test of a model's value.
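A minimal hand-rolled version of the procedure, using a hypothetical data set and the simplest possible "model" (predicting the training mean), just to show the fold rotation:

```python
# Hypothetical data; k = 5 folds built by round-robin assignment
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
k = 5
folds = [data[i::k] for i in range(k)]

errors = []
for i in range(k):
    test_fold = folds[i]                                   # held-out fold
    train_data = [x for j, f in enumerate(folds) if j != i for x in f]
    model = sum(train_data) / len(train_data)              # "train": fit the mean
    mse = sum((x - model) ** 2 for x in test_fold) / len(test_fold)
    errors.append(mse)

cv_error = sum(errors) / k   # average out-of-sample error across the k rotations
print(round(cv_error, 3))
```

Every observation is used for testing exactly once and for training k−1 times, which is why the averaged error is a fair out-of-sample estimate.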
Natural Language Processing (NLP)
NLP converts unstructured text into structured, analysable data. Applications in investment analysis:
- Sentiment analysis: Classify text as positive/negative/neutral. Mine earnings call transcripts, analyst reports, news articles for market-moving tone shifts.
- Named entity recognition: Automatically identify companies, people, dates, and amounts in text.
- Word embeddings (Word2Vec, BERT): Represent words as numeric vectors that capture semantic meaning — "excellent" and "outstanding" will have similar vectors.
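To give a sense of the simplest form of sentiment analysis, here is a toy lexicon-based scorer. The word lists are made up for illustration — production systems use trained classifiers or embeddings, not hand-written lists:

```python
# Hypothetical sentiment lexicons (illustrative only)
POSITIVE = {"growth", "beat", "strong", "record", "upgrade"}
NEGATIVE = {"miss", "weak", "decline", "impairment", "downgrade"}

def sentiment(text):
    # Net tone = count of positive words minus count of negative words
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Record revenue and strong margin growth"))  # → positive
print(sentiment("Earnings miss on weak demand"))             # → negative
```

The same classify-by-tone idea scales up to earnings-call transcripts, where shifts in net tone quarter over quarter are the signal of interest.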
Neural Networks
Inspired by the human brain: layers of interconnected nodes (neurons). Each connection has a weight; the network learns by adjusting weights to minimise prediction error (backpropagation).
| Layer Type | Role |
|---|---|
| Input layer | Receives raw features (prices, fundamentals, text embeddings) |
| Hidden layers | Extract increasingly abstract patterns; more layers = "deeper" network = more complex representations |
| Output layer | Produces final prediction (buy/sell signal, return forecast, default probability) |
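The three layers map directly onto code. Below is a minimal forward pass with hand-picked (hypothetical) weights — no training, just the input → hidden → output flow from the table:

```python
import math

def relu(z):
    # Hidden-layer activation: pass positives through, clip negatives to zero
    return max(0.0, z)

def sigmoid(z):
    # Output activation: squash any real number into a (0, 1) probability
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    # Hidden layer: each neuron takes a weighted sum of all inputs, then ReLU
    h = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_hidden, b_hidden)]
    # Output layer: weighted sum of hidden activations → probability
    return sigmoid(sum(w * hi for w, hi in zip(w_out, h)) + b_out)

# Two input features, three hidden neurons, one output (weights are arbitrary)
w_hidden = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b_hidden = [0.0, 0.1, -0.1]
w_out, b_out = [1.0, -0.5, 0.7], 0.2
p = forward([1.0, 2.0], w_hidden, b_hidden, w_out, b_out)
print(round(p, 4))   # a probability strictly between 0 and 1
```

Training would adjust `w_hidden`, `b_hidden`, `w_out`, and `b_out` via backpropagation to reduce prediction error; this sketch stops at prediction.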
Master Formula Sheet — All 11 Readings
| Formula | Description | Reading |
|---|---|---|
| \(HPR = \frac{P_1 - P_0 + CF_1}{P_0}\) | Holding period return | 1 |
| \(\bar{R}_G = \left[\prod(1+R_t)\right]^{1/T} - 1\) | Geometric mean return | 1 |
| \(\bar{P}_H = \frac{N}{\sum 1/P_t}\) | Harmonic mean — average cost per share under dollar-cost averaging | 1 |
| \((1+r_{real}) = \frac{1+r_{nom}}{1+r_{inf}}\) | Fisher equation — exact real return | 1 |
| \(EAR = (1 + r_s/m)^m - 1\) | Effective annual rate | 1 |
| \(EAR = e^{r_s} - 1\) | EAR with continuous compounding | 1 |
| \(FV = PV(1+r)^N\) | Future value | 2 |
| \(PV = FV/(1+r)^N\) | Present value | 2 |
| \(PV_{annuity} = PMT \cdot \frac{1-(1+r)^{-N}}{r}\) | Present value of ordinary annuity | 2 |
| \(FV_{annuity} = PMT \cdot \frac{(1+r)^N-1}{r}\) | Future value of ordinary annuity | 2 |
| \(PV_{perp} = PMT/r\) | Present value of perpetuity | 2 |
| \(PV_{grow.perp} = PMT_1/(r-g)\) | Growing perpetuity (r > g required) | 2 |
| \(NPV = \sum CF_t / (1+r)^t\) | Net present value | 2 |
| \(s^2 = \sum(x_i-\bar{x})^2/(n-1)\) | Sample variance (n−1 denominator) | 3 |
| \(CV = s/\bar{x}\) | Coefficient of variation | 3 |
| \(Sharpe = (\bar{R}_p - R_f)/\sigma_p\) | Sharpe ratio | 3 |
| \(P(A|B) = P(A \cap B)/P(B)\) | Conditional probability | 4 |
| \(P(A|B) = P(B|A)P(A)/P(B)\) | Bayes' theorem | 4 |
| \(E(X) = \sum x_i P(x_i)\) | Expected value of a random variable | 4 |
| \(Var(X) = E(X^2) - [E(X)]^2\) | Variance of a random variable | 4 |
| \(E(R_p) = \sum w_i E(R_i)\) | Portfolio expected return | 5 |
| \(\sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1w_2\rho_{12}\sigma_1\sigma_2\) | Two-asset portfolio variance | 5 |
| \(Cov(R_1,R_2) = \rho_{12}\sigma_1\sigma_2\) | Covariance from correlation | 5 |
| \(SE = \sigma/\sqrt{n}\) | Standard error of the mean | 7 |
| \(CI = \bar{x} \pm t_{\alpha/2} \cdot s/\sqrt{n}\) | Confidence interval (σ unknown) | 7 |
| \(t = (\bar{x} - \mu_0)/(s/\sqrt{n})\) | t-test statistic for mean | 8 |
| \(\chi^2 = (n-1)s^2/\sigma_0^2\) | Chi-square test for variance | 8 |
| \(F = s_1^2/s_2^2\) | F-test for equality of variances | 8 |
| \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) | t-test for correlation coefficient | 9 |
| \(r_S = 1 - 6\sum d_i^2/[n(n^2-1)]\) | Spearman rank correlation | 9 |
| \(b_1 = Cov(X,Y)/Var(X)\) | OLS slope coefficient | 10 |
| \(b_0 = \bar{Y} - b_1\bar{X}\) | OLS intercept | 10 |
| \(R^2 = SSR/SST = 1-SSE/SST\) | Coefficient of determination | 10 |
| \(SEE = \sqrt{SSE/(n-2)}\) | Standard error of estimate | 10 |
| \(F = MSR/MSE\) | F-test for regression significance | 10 |
| \(t = \hat{b}_1/s_{\hat{b}_1}\) | t-test for slope coefficient | 10 |
| \(SST = SSR + SSE\) | Total = Explained + Unexplained variation | 10 |
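Several formula-sheet entries can be sanity-checked in a few lines, reusing the worked examples from Reading 1 (the 8% stated rate in the EAR line is an arbitrary illustration):

```python
# HPR = (P1 − P0 + CF1) / P0 — buy at 50, receive 2 dividend, sell at 56
hpr = (56 - 50 + 2) / 50
print(hpr)  # 0.16 → 16%

# Geometric mean of +50% then −50%: [(1.50)(0.50)]^(1/2) − 1
geo = (1.50 * 0.50) ** 0.5 - 1
print(round(geo, 4))  # -0.134 → the "lost money" result, not the 0% arithmetic mean

# EAR for a hypothetical 8% stated rate, compounded monthly: (1 + rs/m)^m − 1
ear = (1 + 0.08 / 12) ** 12 - 1
print(round(ear, 4))  # slightly above 8%, since compounding happens 12 times
```

The geometric-mean line reproduces the −13.4% from the worked example, confirming why geometric (not arithmetic) means describe compounded history.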
- TWR vs. MWR: Use TWR to evaluate manager skill; MWR for investor's personal experience. TWR is the CFA GIPS standard for comparing managers.
- Geometric ≤ Arithmetic mean: Always use geometric for historical compounding; arithmetic for expected future single-period returns.
- Sample variance uses n−1: Not n. The extra division ensures the estimator is unbiased. Population variance uses N.
- "Fail to reject" ≠ "Accept H₀": Lack of evidence against H₀ ≠ proof of H₀. Always say "fail to reject."
- Portfolio variance ≠ weighted average: The covariance/correlation term makes portfolio variance lower than the weighted average of individual variances (for ρ < 1). This IS the diversification benefit.
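The last point is easy to verify numerically with made-up inputs (50/50 weights, σ₁ = 20%, σ₂ = 30%, ρ = 0.3):

```python
# Hypothetical two-asset portfolio
w1, w2 = 0.5, 0.5
s1, s2, rho = 0.20, 0.30, 0.3

# Two-asset portfolio variance: w1²σ1² + w2²σ2² + 2·w1·w2·ρ·σ1·σ2
var_p = w1**2 * s1**2 + w2**2 * s2**2 + 2 * w1 * w2 * rho * s1 * s2
sigma_p = var_p ** 0.5

# Naive weighted average of the individual volatilities
weighted_avg = w1 * s1 + w2 * s2

print(round(sigma_p, 4), weighted_avg)  # → 0.2037 0.25 — portfolio σ is lower
```

Because ρ = 0.3 < 1, portfolio volatility (≈20.4%) sits below the 25% weighted average — that gap is the diversification benefit in numbers.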