CFA Level 1

CFA Level 1 — Quantitative Methods

Intuition-First Study Guide · All 11 Readings

Every formula with worked examples · Exam traps called out explicitly · Logic explained before mechanics

Reading 1

Rates and Returns

How we measure performance — and why the method you choose can dramatically change the number you get.
🗺 Big Picture

Before you can analyse any investment, you need to measure its return. This reading gives you a toolkit of return measures, each suited to a different question: "How did this one security do?" (HPR), "What did the average manager return over time?" (time-weighted), "How did my portfolio do given my specific cash flows?" (money-weighted), and "How should I compare apples to oranges across compounding frequencies?" (EAR/APR conversions). Each measure tells a different story — the exam loves testing whether you pick the right one.

Holding Period Return (HPR) — The Foundation

The most basic return measure: what did you earn on one investment over one holding period? It captures both income (dividends/coupons) and price appreciation in a single number.

\[ HPR = \frac{P_1 - P_0 + CF_1}{P_0} \]

Where \(P_0\) = beginning price, \(P_1\) = ending price, \(CF_1\) = cash received (dividend, coupon).

✏️ Worked Example

You buy a stock at €50. It pays a €2 dividend and rises to €56 at year-end.

HPR = (56 − 50 + 2) / 50 = 8/50 = 16%

Without the dividend you might think: "I earned 12% on price appreciation." The HPR correctly shows you earned 16% total.

HPR can be calculated over ANY holding period — a day, a month, 3 years. It is NOT annualised by default. Be careful: a 20% HPR over 4 years is very different from 20% per year.
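In code the convention is a one-liner; this is a minimal sketch (the function name is mine):

```python
def holding_period_return(p0, p1, cash_received=0.0):
    """HPR = (P1 - P0 + CF) / P0 over whatever period the prices span (not annualised)."""
    return (p1 - p0 + cash_received) / p0

# The worked example: buy at 50, receive a 2 dividend, end at 56.
holding_period_return(50, 56, 2)   # 0.16 (= 16%)
```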

Three Types of Mean Returns (Critical)

1. Arithmetic Mean Return

Simple average of periodic returns. Best use: estimate the expected return in a single future period, given a history of returns.

\[ \bar{R}_A = \frac{1}{T} \sum_{t=1}^{T} R_t \]

2. Geometric Mean Return

The compound annual growth rate (CAGR). Best use: describe the actual historical growth rate of a portfolio that was left to compound. Always ≤ arithmetic mean (equal only when all returns are identical).

\[ \bar{R}_G = \left[\prod_{t=1}^{T}(1+R_t)\right]^{1/T} - 1 \]
✏️ Worked Example — Why They Differ

Year 1: +50%. Year 2: −50%.

Arithmetic mean = (50% + (−50%)) / 2 = 0% — sounds fine.

Geometric mean = √(1.50 × 0.50) − 1 = √0.75 − 1 = −13.4%

You invested €100. After Year 1: €150. After Year 2: €75. You lost money! The geometric mean reflects reality; the arithmetic mean was dangerously misleading.
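The divergence is easy to reproduce in a few lines of Python (function names are mine; a minimal sketch):

```python
import math

def arithmetic_mean(returns):
    """Simple average: best estimate of a single future period's return."""
    return sum(returns) / len(returns)

def geometric_mean(returns):
    """Compound growth rate actually experienced by a buy-and-hold investor."""
    return math.prod(1 + r for r in returns) ** (1 / len(returns)) - 1

rets = [0.50, -0.50]     # +50% then -50%
arithmetic_mean(rets)     # 0.0
geometric_mean(rets)      # ≈ -0.134, matching the worked example
```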

3. Harmonic Mean Return

Used specifically for dollar-cost averaging — when you invest a fixed €amount each period (not a fixed number of shares). The average cost per share when investing equal amounts is the harmonic mean of prices.

\[ \bar{P}_H = \frac{N}{\sum_{t=1}^{N}\frac{1}{P_t}} \]

Note that the harmonic mean is taken over the purchase prices \(P_t\), not over returns — it gives the average cost per share when a fixed amount is invested each period.

Relationship: Harmonic Mean ≤ Geometric Mean ≤ Arithmetic Mean (equality holds only if all values are identical).

🎯 Likely Exam Question
An investor's portfolio returns +20%, −10%, and +15% over three years. What is the geometric mean annual return?
Answer: G = (1.20 × 0.90 × 1.15)^(1/3) − 1 = (1.2420)^(1/3) − 1 = 1.0749 − 1 = 7.49%. The arithmetic mean would be (20 − 10 + 15)/3 = 8.33% — higher than the geometric mean, as it always is unless every return is identical. Use geometric to measure actual wealth accumulation.

Money-Weighted vs. Time-Weighted Returns (Critical)

This is one of the most heavily tested concepts in the entire CFA curriculum. The key question: whose decision are we evaluating?

⏱ Time-Weighted Return (TWR)

Purpose: Evaluate the portfolio manager's skill — eliminates the impact of the investor's own cash flow timing decisions.

Method: Divide the period at each external cash flow. Calculate HPR for each sub-period. Compound them.

Required by the GIPS standards for comparing managers.

💰 Money-Weighted Return (MWR)

Purpose: Evaluate the investor's actual experience — includes timing of contributions and withdrawals.

Method: It is the IRR of all cash flows (outflows = investments, inflows = withdrawals + ending value).

Best for personal wealth tracking.

If an investor puts in more money right before a bad period and withdraws right before a good period, their MWR will be terrible even if the manager had great skill. TWR strips this out, making it the fair way to compare managers who had different client cash flow patterns.
✏️ Worked Example — TWR vs MWR

Setup: Start of Year 1: Invest €100. Year 1 return = +50% → Portfolio = €150. End of Year 1: Invest additional €150 → Portfolio = €300. Year 2 return = −10% → Portfolio = €270.

TWR: Sub-period 1 HPR = +50%. Sub-period 2 HPR = −10%.

TWR = (1.50 × 0.90)^(1/2) − 1 = (1.35)^0.5 − 1 = 16.2% per year

MWR: Cash flows: t=0: −100, t=1: −150, t=2: +270. Solve for IRR.

−100 − 150/(1+r) + 270/(1+r)² = 0. Solving the quadratic: r ≈ +5.6% per year

The investor had poor timing (invested big just before the bad year). TWR says the manager was skilled (+16.2% per year); MWR says the investor's own, timing-dependent experience was much weaker (+5.6% per year).
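Both measures need only the standard library. The IRR solver below uses simple bisection and assumes the rate lies between −99% and +100% (function names and the bracket are my choices; a minimal sketch):

```python
def twr(subperiod_returns):
    """Time-weighted return: geometrically link sub-period HPRs, then annualise."""
    growth = 1.0
    for r in subperiod_returns:
        growth *= 1 + r
    return growth ** (1 / len(subperiod_returns)) - 1

def mwr(cash_flows, lo=-0.99, hi=1.0):
    """Money-weighted return: the IRR of dated cash flows, found by bisection.
    cash_flows[t] occurs at time t; outflows negative, inflows positive."""
    def npv(rate):
        return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))
    for _ in range(200):
        mid = (lo + hi) / 2
        if npv(lo) * npv(mid) <= 0:   # sign change -> root in [lo, mid]
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

twr([0.50, -0.10])       # ≈ 0.162 per year (credits the manager's skill)
mwr([-100, -150, 270])   # ≈ 0.056 per year (the investor's actual experience)
```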

Interest Rate Conversions — EAR, APR, and Compounding

Effective Annual Rate (EAR)

EAR is the true annual return after accounting for intra-year compounding. It's the standardised way to compare investments with different compounding frequencies.

\[ EAR = \left(1 + \frac{r_{stated}}{m}\right)^m - 1 \]

Where \(m\) = number of compounding periods per year.

\[ EAR_{\text{continuous}} = e^{r_s} - 1 \]
✏️ Worked Example — EAR Comparison

Bank A offers 6% compounded monthly. Bank B offers 6.1% compounded annually. Which is better?

Bank A EAR = (1 + 0.06/12)^12 − 1 = (1.005)^12 − 1 = 6.168%

Bank B EAR = 6.1% (already annual)

Bank A is better despite a lower stated rate. Compounding frequency matters enormously.

The stated rate (APR) is what banks advertise. The EAR is what you actually earn. They are only equal when m = 1 (annual compounding). More frequent compounding → higher EAR for same APR. Continuous compounding gives the maximum possible EAR for a given stated rate.
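The conversion is mechanical; a minimal sketch (function names are mine):

```python
import math

def ear(stated_rate, m):
    """Effective annual rate for a stated (APR) rate compounded m times per year."""
    return (1 + stated_rate / m) ** m - 1

def ear_continuous(stated_rate):
    """The maximum EAR achievable for a given stated rate."""
    return math.exp(stated_rate) - 1

ear(0.06, 12)         # ≈ 0.061678 -> Bank A beats Bank B's 6.1%
ear_continuous(0.06)  # ≈ 0.061837 -> the ceiling for a 6% stated rate
```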

Real vs. Nominal Returns

A nominal return includes inflation. A real return strips it out, showing purchasing power growth.

\[ (1 + r_{real}) = \frac{1 + r_{nominal}}{1 + r_{inflation}} \quad \text{(Fisher equation — exact form)} \]
\[ r_{real} \approx r_{nominal} - r_{inflation} \quad \text{(approximation, valid for small rates)} \]
🎯 Likely Exam Question
An investment returns 8% nominally while inflation is 3%. What is the exact real return?
Answer: (1.08 / 1.03) − 1 = 1.0485 − 1 = 4.85%. The approximation gives 8% − 3% = 5%. The exam will sometimes give you the approximate method as the wrong answer. Always use the exact Fisher equation unless told otherwise.
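The exact Fisher relation is a one-liner (the function name is mine; a minimal sketch):

```python
def real_return(nominal, inflation):
    """Exact Fisher relation: (1 + nominal) / (1 + inflation) - 1."""
    return (1 + nominal) / (1 + inflation) - 1

real_return(0.08, 0.03)   # ≈ 0.048544; the approximation 8% - 3% = 5% overstates it
```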

Reading 2

The Time Value of Money in Finance

A dollar today is worth more than a dollar tomorrow — this single principle underlies all of finance.
🗺 Big Picture

TVM is the engine behind every valuation model in the CFA curriculum. Bond pricing, equity valuation (dividend discount model), capital budgeting (NPV), mortgage payments, pension funding — all reduce to one question: how do we move cash flows across time? Master the mechanics here and every subsequent topic becomes easier. This reading is also extremely high-yield: TVM calculator questions appear in almost every exam.

The Core Principle

You're given a choice: €1,000 today or €1,000 in 3 years. You'd take today, obviously. Why? You could invest it and have more than €1,000 in 3 years. This "opportunity cost of waiting" is the interest rate. TVM formalises this: every cash flow has a time-specific value, and we can only compare them after adjusting to the same point in time.

Future Value and Present Value Formula

\[ FV = PV \times (1 + r)^N \]
\[ PV = \frac{FV}{(1 + r)^N} \]

Where \(r\) = interest rate per period, \(N\) = number of periods.

The interest rate \(r\) and the number of periods \(N\) must be in the same time unit. If you have a 6% annual rate and a 3-month problem, use r = 6%/4 = 1.5% and N = 1 (or r = 6% and N = 0.25 — not both!).
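A direct translation, keeping rate and periods in the same time unit (function names are mine; a minimal sketch):

```python
def future_value(pv, rate, n):
    """FV = PV * (1 + r)^N, with r and N in the same time unit."""
    return pv * (1 + rate) ** n

def present_value(fv, rate, n):
    """PV = FV / (1 + r)^N."""
    return fv / (1 + rate) ** n

# 6% annual rate, 3-month horizon: quarterly rate 1.5% for one quarter.
future_value(1000, 0.06 / 4, 1)    # ≈ 1015.0
present_value(1015, 0.06 / 4, 1)   # ≈ 1000.0
```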

Annuities — Repeated Equal Cash Flows (Critical)

An annuity is a series of equal payments at equal intervals. Mortgages, lease payments, pension distributions — all are annuities.

Ordinary Annuity (payments at END of period — most common)

\[ PV_{\text{annuity}} = PMT \times \frac{1 - (1+r)^{-N}}{r} \]
\[ FV_{\text{annuity}} = PMT \times \frac{(1+r)^N - 1}{r} \]

Annuity Due (payments at BEGINNING of period)

\[ PV_{\text{annuity due}} = PV_{\text{ordinary}} \times (1+r) \]

Simply multiply the ordinary annuity by (1+r) — you receive each payment one period earlier, so each is worth more by one period of interest.

✏️ Worked Example — Mortgage Payment

A €200,000 mortgage at 5% annually, monthly payments, 30-year term. What is the monthly payment?

Monthly rate: r = 5%/12 = 0.4167%. Periods: N = 360.

200,000 = PMT × [1 − (1.004167)^{−360}] / 0.004167

200,000 = PMT × 186.28

PMT = 200,000 / 186.28 = €1,073.64/month

Note: Over 30 years you pay 360 × €1,073.64 = €386,510 — nearly double the loan amount!
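Rearranging the ordinary-annuity PV formula for PMT gives a small helper (the function name is mine; a minimal sketch):

```python
def annuity_payment(pv, rate, n):
    """Level payment that amortises pv over n periods at per-period rate."""
    return pv * rate / (1 - (1 + rate) ** -n)

annuity_payment(200_000, 0.05 / 12, 360)   # ≈ 1073.64 per month
```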

Perpetuities — Payments Forever

A perpetuity pays the same amount forever. While no real instrument literally pays forever, this formula is used to value preferred stocks and is the basis for the Gordon Growth Model.

\[ PV_{\text{perpetuity}} = \frac{PMT}{r} \]
\[ PV_{\text{growing perpetuity}} = \frac{PMT_1}{r - g} \quad \text{(requires } r > g \text{)} \]
A perpetuity paying €100/year at a 5% discount rate is worth €100/0.05 = €2,000. Why? If you invested €2,000 at 5%, you'd earn exactly €100/year — same as the perpetuity. The formula just inverts this relationship.

Net Present Value (NPV) and Internal Rate of Return (IRR) (Critical)

\[ NPV = \sum_{t=0}^{N} \frac{CF_t}{(1+r)^t} = -CF_0 + \frac{CF_1}{(1+r)} + \frac{CF_2}{(1+r)^2} + \cdots \]

NPV Decision Rule: Invest if NPV > 0 (project adds value). IRR is the discount rate that makes NPV = 0. IRR Decision Rule: Invest if IRR > required rate of return (hurdle rate).

NPV — Preferred Method

Directly measures value added in currency (€/£/$). Assumes cash flows are reinvested at the discount rate. Always gives the correct decision for independent projects.

IRR — Popular but Flawed

Gives a percentage return — easy to communicate. Assumes reinvestment at the IRR (often unrealistic). Can give multiple values or misleading rankings for mutually exclusive projects.

When NPV and IRR conflict on mutually exclusive projects (e.g., NPV says choose A, IRR says choose B), always follow NPV. NPV directly measures value creation. IRR can be distorted by timing and scale differences. The CFA curriculum consistently reinforces: NPV is theoretically superior.
🎯 Likely Exam Question
A project costs €10,000 today and generates €4,000 per year for 4 years. The required return is 10%. Should the project be accepted?
NPV = −10,000 + 4,000 × [1 − (1.10)^{−4}]/0.10 = −10,000 + 4,000 × 3.1699 = −10,000 + 12,679 = +€2,679. NPV > 0, so accept. The IRR would also be > 10% (approximately 21.9%), confirming the decision. Both methods agree when there is no conflict.
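The NPV sum maps directly onto a generator expression (the function name is mine; a minimal sketch):

```python
def npv(rate, cash_flows):
    """cash_flows[t] occurs at time t (cash_flows[0] is today's outlay)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

project = [-10_000, 4_000, 4_000, 4_000, 4_000]
npv(0.10, project)   # ≈ 2679.46 -> positive, so accept
```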

Loan Amortisation

Each loan payment consists of an interest component and a principal repayment component. Over time, the proportion shifts: early payments are mostly interest; later payments are mostly principal. This is why you barely reduce your mortgage principal in the early years.

✏️ Worked Example — First Two Payments of a Loan

€10,000 loan, 8% annual, 5-year term, annual payments.

Annual PMT = 10,000 / [(1 − 1.08^{−5})/0.08] = 10,000 / 3.9927 = €2,504.56

Year 1: Interest = 10,000 × 8% = €800. Principal = 2,504.56 − 800 = €1,704.56. Remaining balance = €8,295.44.

Year 2: Interest = 8,295.44 × 8% = €663.64. Principal = 2,504.56 − 663.64 = €1,840.92. Remaining balance = €6,454.52.
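The payment-splitting logic above generalises to a full schedule; this sketch (function name is mine) reproduces the worked example and confirms the balance reaches zero at maturity:

```python
def amortisation_schedule(principal, rate, n_periods):
    """Return (period, interest, principal_paid, balance) rows for a level-payment loan."""
    pmt = principal * rate / (1 - (1 + rate) ** -n_periods)
    balance = principal
    rows = []
    for period in range(1, n_periods + 1):
        interest = balance * rate          # interest accrues on the remaining balance
        principal_paid = pmt - interest    # the rest of the payment reduces principal
        balance -= principal_paid
        rows.append((period, round(interest, 2), round(principal_paid, 2), round(balance, 2)))
    return rows

amortisation_schedule(10_000, 0.08, 5)[0]   # (1, 800.0, 1704.56, 8295.44)
```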


Reading 3

Statistical Measures of Asset Returns

To describe a distribution of returns, you need its centre, its spread, its asymmetry, and the thickness of its tails.
🗺 Big Picture

Risk in finance is primarily measured through statistics. This reading gives you the vocabulary: mean (where returns tend to be), variance/standard deviation (how spread out they are), skewness (are losses worse than gains?), and kurtosis (how often do extreme events occur?). These measures directly connect to risk management — a portfolio with negative skew and high kurtosis (fat tails) is far more dangerous than its standard deviation alone suggests.

Measures of Central Tendency

Arithmetic Mean

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Geometric Mean (for returns — covered in R1)

Use geometric mean for compounding problems; arithmetic mean for expected value of a single future period.

Weighted Mean

\[ \bar{x}_w = \sum_{i=1}^{n} w_i x_i \quad \text{where } \sum w_i = 1 \]

Used for portfolio returns: weight each asset's return by its portfolio weight.

Median and Mode

Median = middle value when sorted (50th percentile). Mode = most frequently occurring value. For skewed distributions, the median is a better measure of "typical" because it's less affected by outliers.

If a portfolio of 10 stocks has 9 returning +5% and one returning −100% (went bankrupt), the mean return is (9 × 5% − 100%)/10 = −5.5%. But the median is +5%. The mean is distorted by the one catastrophic loss. This is exactly why return distributions in practice are negatively skewed — the few catastrophic outcomes drag the mean below the median.

Measures of Dispersion

Variance and Standard Deviation Formula

\[ \sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N} \quad \text{(population variance)} \]
\[ s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} \quad \text{(sample variance — uses n−1)} \]
Why n−1 (not n) for sample variance? Because the sample mean is calculated from the same data, which uses up one "degree of freedom." Dividing by n−1 makes the sample variance an unbiased estimator of the true population variance. This is called Bessel's correction. The CFA exam will test whether you use the correct denominator.
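The two denominators correspond to two different functions in Python's `statistics` module; a quick check (the sample data is mine):

```python
import statistics

returns = [0.05, 0.02, -0.03, 0.07, 0.01]   # five monthly returns (illustrative)

statistics.pvariance(returns)  # population variance: divides by n   -> 0.001184
statistics.variance(returns)   # sample variance: divides by n - 1   -> 0.00148
```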

Coefficient of Variation (CV) — Relative Risk (Exam Favourite)

\[ CV = \frac{s}{\bar{x}} = \frac{\text{Standard Deviation}}{\text{Mean Return}} \]

CV answers: "How much risk am I taking per unit of return?" Lower CV = better risk-adjusted performance (all else equal). Unlike standard deviation, CV allows comparison across investments with different return scales.

✏️ Worked Example — CV Comparison

Fund A: Mean return 10%, Std Dev 8% → CV = 8/10 = 0.80

Fund B: Mean return 20%, Std Dev 14% → CV = 14/20 = 0.70

Fund B has higher absolute risk, but less risk per unit of return. An investor focused on risk-adjusted returns would prefer Fund B.

Sharpe Ratio — Risk-Adjusted Performance

\[ Sharpe\ Ratio = \frac{\bar{R}_p - R_f}{\sigma_p} = \frac{\text{Excess Return}}{\text{Total Risk}} \]

The Sharpe ratio measures excess return (above the risk-free rate) per unit of total risk. Unlike CV, it uses excess return, making it the standard for comparing risky portfolios.

CV vs. Sharpe Ratio: CV uses total return in the numerator; Sharpe uses excess return (above Rf). If the risk-free rate is 0%, they give the same ranking. In practice, Sharpe is preferred for portfolio evaluation because it correctly benchmarks against the opportunity cost of capital (the risk-free rate).

Skewness — Asymmetry of the Distribution (Critical)

| Skewness | Shape | Mean vs. Median vs. Mode | Implication for Finance |
|---|---|---|---|
| Positive (right skew) | Long right tail | Mode < Median < Mean | Rare large gains; most returns are below the mean. Think: lottery tickets, venture capital. |
| Zero (symmetric) | Normal distribution | Mode = Median = Mean | The ideal assumption. Reality rarely looks like this. |
| Negative (left skew) | Long left tail | Mean < Median < Mode | Rare large losses; most returns are above the mean. Think: most equity portfolios, credit strategies, short volatility. |
Equity markets tend to exhibit negative skew — they drift up most of the time (which is good), but occasionally crash severely (which is catastrophic). A strategy that looks good on mean and standard deviation alone can still be dangerous if it has significant negative skew. This is why kurtosis and skewness matter beyond just variance.

Kurtosis — Fat Tails (Critical)

Kurtosis measures the "tailedness" of a distribution — how much weight is in the extreme outcomes compared to a normal distribution.

\[ \text{Excess Kurtosis} = \text{Kurtosis} - 3 \]
| Type | Excess Kurtosis | Tails vs. Normal | Finance Example |
|---|---|---|---|
| Leptokurtic | > 0 | Fatter tails (more extreme events) | Most financial return series — market crashes happen more often than the normal distribution predicts |
| Mesokurtic | = 0 | Normal distribution tails | The theoretical benchmark |
| Platykurtic | < 0 | Thinner tails (fewer extremes) | Rare in finance |
Most real financial return series are leptokurtic (fat tails) with negative skewness. This means the normal distribution UNDERESTIMATES the probability of extreme losses. This is the central insight behind why standard Value-at-Risk (VaR) models failed in the 2008 crisis — they assumed normality when the true distribution had much fatter left tails.
🎯 Likely Exam Question
A return series has excess kurtosis of +2.5 and negative skewness. Compared to a normal distribution, this series is best described as having:
Answer: Fatter tails (leptokurtic) with more extreme negative outcomes. Excess kurtosis > 0 means leptokurtic (fatter tails). Negative skewness means the left tail (losses) is longer than the right. Together: more frequent extreme losses than a normal distribution would predict. This is the typical financial return profile.
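Both shape measures are standardised moments. The sketch below uses the simple population form (the curriculum's sample formulas add small-sample corrections; function names are mine):

```python
import statistics

def skewness(xs):
    """Third standardised moment (population form): asymmetry of the distribution."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

def excess_kurtosis(xs):
    """Fourth standardised moment minus 3; zero for a normal distribution."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return sum((x - m) ** 4 for x in xs) / (len(xs) * s ** 4) - 3

skewness([-1, 0, 1])         # 0.0  -> symmetric
excess_kurtosis([-1, 0, 1])  # -1.5 -> thinner-tailed than normal (platykurtic)
```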

Reading 4

Probability Trees and Conditional Expectations

How to reason about uncertainty — from basic probability rules to Bayes' theorem and expected value calculations.
🗺 Big Picture

Probability is the mathematical language of uncertainty. Every investment decision involves uncertain outcomes, and this reading gives you the formal tools to reason about them. The practical payoff: probability trees help you model scenarios (recession/expansion, default/no default); expected value helps you price risky assets; Bayes' theorem lets you update beliefs rationally as new information arrives — which is exactly what good analysts do.

Core Probability Rules

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \quad \text{(Addition Rule)} \]
\[ P(A \cap B) = P(A|B) \cdot P(B) \quad \text{(Multiplication Rule)} \]
\[ P(A \cap B) = P(A) \cdot P(B) \quad \text{(if A and B are independent)} \]
| Concept | Definition | Key Property |
|---|---|---|
| Mutually exclusive | Cannot both occur: P(A∩B) = 0 | P(A∪B) = P(A) + P(B) (no subtraction needed) |
| Exhaustive | Cover all possibilities: P(A) + P(B) + … = 1 | Forms a complete probability space |
| Independent | P(A\|B) = P(A) — knowing B tells you nothing about A | P(A∩B) = P(A) × P(B) |
| Dependent | Knowing B changes the probability of A | Must use conditional probability |

Conditional Probability

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \quad \text{(requires } P(B) > 0\text{)} \]

"The probability of A given B has occurred" — we've restricted our probability space to only the scenarios where B is true.

✏️ Worked Example — Credit Analysis

60% of companies in a sector are investment grade (IG). Of IG companies, 5% default within 5 years. Of non-IG companies, 40% default within 5 years.

P(Default | IG) = 5%; P(Default | Not IG) = 40%; P(IG) = 0.60; P(Not IG) = 0.40.

P(Default) = P(D|IG)×P(IG) + P(D|Not IG)×P(Not IG) = 0.05×0.60 + 0.40×0.40 = 0.03 + 0.16 = 19%

This is the Total Probability Rule — a fundamental tool for building scenario models.
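The rule is a weighted sum over exhaustive scenarios; a minimal sketch (the function name is mine):

```python
def total_probability(cond_probs, priors):
    """P(D) = sum of P(D | scenario) * P(scenario) over exhaustive scenarios."""
    return sum(p_cond * p for p_cond, p in zip(cond_probs, priors))

# The credit example: P(D|IG) = 5%, P(D|not IG) = 40%, P(IG) = 60%.
total_probability([0.05, 0.40], [0.60, 0.40])   # 0.19
```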

Bayes' Theorem — Updating Beliefs (Exam Favourite)

Bayes' theorem answers: "Given that an event occurred, what is the revised probability of its cause?" It's how rational people update beliefs when new information arrives.

\[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]
Imagine a medical test. 1% of people have a disease (prior). The test is 90% accurate. You test positive. What's the probability you actually have the disease? Bayes gives you the answer — and it's far lower than 90%, because the disease is rare. Analysts use the same logic: "Given that earnings were strong (the test result), what's the revised probability the company is a strong long-term performer (the underlying condition)?"
✏️ Worked Example — Bayes' Theorem in Finance

Prior probability of a recession: P(Rec) = 30%. During recessions, a yield curve inverts 80% of the time: P(Invert|Rec) = 80%. During expansions, inversions still occur 20% of the time: P(Invert|Exp) = 20%. The yield curve has just inverted. What is the revised probability of recession?

P(Invert) = P(Invert|Rec)×P(Rec) + P(Invert|Exp)×P(Exp) = 0.80×0.30 + 0.20×0.70 = 0.24 + 0.14 = 0.38

P(Rec|Invert) = P(Invert|Rec) × P(Rec) / P(Invert) = (0.80 × 0.30) / 0.38 = 0.24 / 0.38 = 63.2%

Prior: 30% recession. After observing inversion: 63.2% recession. Bayes updated our belief rationally.
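For a binary hypothesis, the update combines the multiplication rule with the total probability rule in the denominator; a minimal sketch (the function name is mine):

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H | E): revise a prior after observing evidence E."""
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence

# Yield-curve example: prior 30% recession, P(invert|rec) = 0.80, P(invert|exp) = 0.20.
bayes_update(0.30, 0.80, 0.20)   # ≈ 0.632
```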

Expected Value and Variance of a Random Variable

\[ E(X) = \sum_i x_i \cdot P(x_i) \]
\[ Var(X) = E[(X - \mu)^2] = \sum_i (x_i - \mu)^2 \cdot P(x_i) = E(X^2) - [E(X)]^2 \]

Properties of Expected Value

\[ E(aX + b) = aE(X) + b \qquad E(X + Y) = E(X) + E(Y) \qquad Var(aX + b) = a^2\,Var(X) \]

Adding a constant shifts the mean but leaves the variance unchanged; scaling by \(a\) scales the mean by \(a\) and the variance by \(a^2\).

🎯 Likely Exam Question
A stock returns +30% with probability 0.4, +5% with probability 0.35, and −20% with probability 0.25. What is the expected return and variance?
E(R) = 0.4×30 + 0.35×5 + 0.25×(−20) = 12 + 1.75 − 5 = 8.75%
Var(R) = 0.4×(30−8.75)² + 0.35×(5−8.75)² + 0.25×(−20−8.75)² = 0.4×451.56 + 0.35×14.06 + 0.25×826.56 = 180.63 + 4.92 + 206.64 = 392.19 (so σ = 19.8%)
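The probability-weighted sums translate directly to code (function names are mine; a minimal sketch):

```python
def expected_value(outcomes, probs):
    """E(X) = sum of x * P(x) over all outcomes."""
    return sum(x * p for x, p in zip(outcomes, probs))

def variance(outcomes, probs):
    """Var(X) = sum of P(x) * (x - mu)^2."""
    mu = expected_value(outcomes, probs)
    return sum(p * (x - mu) ** 2 for x, p in zip(outcomes, probs))

rets, probs = [30, 5, -20], [0.40, 0.35, 0.25]
expected_value(rets, probs)   # 8.75
variance(rets, probs)         # ≈ 392.19
```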

Reading 5

Portfolio Mathematics

How diversification works mathematically — and why combining two risky assets can produce a less risky portfolio.
🗺 Big Picture

This is the mathematical core of Modern Portfolio Theory. The key insight: portfolio risk is NOT the weighted average of individual risks. Because assets don't move in perfect lockstep (they're not perfectly correlated), combining them reduces total risk. The lower the correlation, the more risk reduction diversification provides. This is why diversification is sometimes called "the only free lunch in finance."

Portfolio Expected Return

\[ E(R_p) = \sum_{i=1}^{n} w_i \cdot E(R_i) = w_1 E(R_1) + w_2 E(R_2) + \cdots \]

Portfolio expected return IS the weighted average of individual expected returns — simple and linear.

Portfolio Variance — The Non-Trivial Part (Critical)

Two-Asset Portfolio

\[ \sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1w_2\text{Cov}(R_1, R_2) \]
\[ \text{Cov}(R_1, R_2) = \rho_{12} \cdot \sigma_1 \cdot \sigma_2 \]

So the formula can also be written:

\[ \sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1w_2\rho_{12}\sigma_1\sigma_2 \]
✏️ Worked Example — Diversification Benefit

Asset A: E(R) = 10%, σ = 20%. Asset B: E(R) = 8%, σ = 15%. Correlation ρ = 0.3. Portfolio: 50/50 split.

Portfolio Expected Return = 0.5×10% + 0.5×8% = 9%

Portfolio Variance = (0.5)²(0.20)² + (0.5)²(0.15)² + 2(0.5)(0.5)(0.3)(0.20)(0.15)

= 0.25×0.04 + 0.25×0.0225 + 2×0.25×0.3×0.03 = 0.01 + 0.005625 + 0.0045 = 0.020125

Portfolio Std Dev = √0.020125 = 14.19%

Simple average of standard deviations = 0.5×20% + 0.5×15% = 17.5%. Portfolio std dev = 14.19% — lower than the weighted average! That's the diversification benefit from ρ = 0.3 < 1.
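The two-asset formulas fit in one helper (the function name is mine; a minimal sketch reproducing the worked example):

```python
def portfolio_stats(w1, r1, r2, s1, s2, rho):
    """Two-asset expected return and standard deviation (weights sum to 1)."""
    w2 = 1 - w1
    exp_ret = w1 * r1 + w2 * r2
    var = w1**2 * s1**2 + w2**2 * s2**2 + 2 * w1 * w2 * rho * s1 * s2
    return exp_ret, var ** 0.5

portfolio_stats(0.5, 0.10, 0.08, 0.20, 0.15, 0.3)   # (0.09, ≈0.1419)
```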

The Role of Correlation in Diversification (Critical)

| Correlation (ρ) | Diversification Benefit | Can Portfolio Risk = 0? |
|---|---|---|
| ρ = +1 (perfect positive) | None — portfolio risk = weighted average of individual risks | No — this is the maximum risk scenario |
| 0 < ρ < 1 | Partial — risk is below weighted average | No — but significantly reduced |
| ρ = 0 (uncorrelated) | Substantial — risk falls well below the weighted average | No (unless infinite assets) |
| −1 < ρ < 0 | Greater than the zero-correlation case | No — but substantial reduction |
| ρ = −1 (perfect negative) | Maximum possible | Yes — a perfect hedge exists |
When ρ = +1, both assets move identically — combining them is like putting all eggs in one basket (no benefit). When ρ = −1, when one goes up, the other goes down — you can construct a portfolio that never fluctuates. Real assets have correlations between these extremes, so real diversification is always beneficial but never perfect.

Covariance Matrix Formula

For a portfolio with n assets, variance requires n variances and n(n−1)/2 unique covariances. With 100 assets, you need 100 variances + 4,950 covariances = 5,050 pieces of information. This is why covariance estimation is the hardest part of portfolio construction in practice.

\[ \sigma_p^2 = \sum_i \sum_j w_i w_j \text{Cov}(R_i, R_j) \]

Note: when i = j, Cov(R_i, R_i) = Var(R_i) = σ_i² — so the variance terms are just special cases of the covariance matrix diagonal.

Minimum Variance Portfolio (Two Assets)

The weight in Asset 1 that minimises portfolio variance:

\[ w_1^* = \frac{\sigma_2^2 - \text{Cov}(R_1, R_2)}{\sigma_1^2 + \sigma_2^2 - 2\text{Cov}(R_1, R_2)} \]
🎯 Likely Exam Question
Asset X has σ = 25%, Asset Y has σ = 15%. Their correlation is +0.4. What is the portfolio variance if you invest 60% in X and 40% in Y?
Cov(X,Y) = 0.4 × 0.25 × 0.15 = 0.015.
σ²_p = (0.6)²(0.25)² + (0.4)²(0.15)² + 2(0.6)(0.4)(0.015)
= 0.36×0.0625 + 0.16×0.0225 + 0.48×0.015
= 0.0225 + 0.0036 + 0.0072 = 0.0333 → σ_p = 18.25%
Weighted average = 0.6×25%+0.4×15% = 21%. Diversification saved you 2.75 percentage points of risk.
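Applying the minimum-variance formula to the exam question's inputs (σ₁ = 25%, σ₂ = 15%, ρ = 0.4) is a one-liner; this sketch uses my own function name:

```python
def min_variance_weight(s1, s2, rho):
    """Weight in asset 1 that minimises two-asset portfolio variance."""
    cov = rho * s1 * s2
    return (s2**2 - cov) / (s1**2 + s2**2 - 2 * cov)

min_variance_weight(0.25, 0.15, 0.4)   # ≈ 0.136 -> only ~14% in the riskier asset
```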

Reading 6

Simulation Methods

When closed-form mathematics breaks down, simulation lets you generate the answer empirically.
🗺 Big Picture

Many financial problems — option pricing with complex payoffs, portfolio risk under non-normal distributions, retirement planning under uncertainty — cannot be solved analytically. Simulation methods attack these problems by generating thousands of possible scenarios and observing the resulting distribution of outcomes. The three methods (historical simulation, Monte Carlo, and bootstrap) differ in how they generate those scenarios, and each has strengths and weaknesses the exam tests directly.

Historical Simulation

Use actual past data (historical returns, historical factor moves) directly — resample from the empirical distribution, preserving the real-world distribution including skewness, kurtosis, and any correlations present in the data.

✅ Strengths

No distributional assumptions needed. Automatically captures fat tails, skewness, and real correlations. Includes actual historical crises (2008, 2000, 1987).

❌ Weaknesses

Constrained by history — if a scenario has never occurred, it has zero probability. Can't simulate scenarios more extreme than the historical worst case. Sample size is limited.

Monte Carlo Simulation (Critical)

Specify a probability model (e.g., normal distribution with given mean and variance, or a correlated multivariate distribution), then use a computer to draw thousands of random samples from that model and simulate outcomes.

✏️ How Monte Carlo Works — Retirement Planning

Assume stock returns are normally distributed: μ = 7%, σ = 15%. Simulate 10,000 possible 30-year return paths. For each path, calculate whether the investor's portfolio lasts through retirement. Count the percentage of paths that succeed. Result: "There's a 73% probability your portfolio lasts 30 years at your current spending rate."

✅ Strengths

Can simulate ANY scenario — including ones that have never occurred. Can incorporate complex dependencies, option payoffs, path-dependent features. Highly flexible.

❌ Weaknesses

"Garbage in, garbage out" — wrong distributional assumptions produce wrong results. Computationally intensive. The normal distribution assumption is particularly dangerous for tail risk.

Bootstrap Simulation

A hybrid approach: resample WITH REPLACEMENT from the actual historical data. Unlike historical simulation which uses each observation once, bootstrap can draw the same observation multiple times, creating new synthetic histories.

Imagine the historical data is a bag of 250 coloured balls (one per trading day). Historical simulation draws all 250 once. Bootstrap draws 250 times WITH replacement — you might draw "October 19, 1987" three times in your simulated year. This allows you to explore scenarios beyond what literally happened in sequence, while still being grounded in real data.
| Method | Source of Randomness | Key Advantage | Key Limitation |
|---|---|---|---|
| Historical | Actual past returns, used sequentially | No assumptions; captures real crises | Limited to what happened; can't exceed historical extremes |
| Monte Carlo | Specified probability distribution | Flexible; can model any scenario | Results depend entirely on assumed distribution |
| Bootstrap | Past data, resampled with replacement | Grounded in data; more scenarios than history | Can't generate truly new tail events |
🎯 Likely Exam Question
An analyst wants to estimate the distribution of a portfolio's Value-at-Risk but is concerned the returns are non-normally distributed with fat tails. Which simulation method is most appropriate?
Answer: Historical simulation or Bootstrap. Both avoid imposing a specific distribution on the data, so they naturally capture any non-normality, fat tails, or skewness present in the historical data. Monte Carlo is inappropriate here because it requires specifying a distribution — if you choose normal, you've assumed away the very problem you're trying to capture.
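The bootstrap's resample-with-replacement step is a few lines of standard-library Python (function name, sample data, and path count are mine; a minimal sketch):

```python
import random

def bootstrap_mean_returns(historical_returns, n_paths=1000, seed=42):
    """Resample WITH replacement to build a distribution of mean returns."""
    rng = random.Random(seed)
    n = len(historical_returns)
    means = []
    for _ in range(n_paths):
        sample = rng.choices(historical_returns, k=n)   # same day can be drawn twice
        means.append(sum(sample) / n)
    return means

# Each entry is the mean of one synthetic history built from real observations.
means = bootstrap_mean_returns([0.01, -0.02, 0.03, 0.00])
```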

Reading 7

Estimation and Inference

We never know the true population parameters — we only have samples. Statistics tells us how confident to be in our estimates.
🗺 Big Picture

Every financial analysis is built on estimates — the expected return, the beta, the mean earnings. But estimates from samples are imperfect. This reading teaches you to quantify that imperfection: how large is the sampling error? How confident can we be that the true parameter lies within a given range? These concepts (standard error, confidence intervals, the Central Limit Theorem) underpin all of hypothesis testing and regression analysis.

Sampling and the Central Limit Theorem

Point Estimates

A single number used to estimate a population parameter. The sample mean \(\bar{x}\) estimates population mean \(\mu\). The sample variance \(s^2\) estimates population variance \(\sigma^2\).

Standard Error of the Mean Formula

The standard deviation of the sampling distribution of the mean — how much would \(\bar{x}\) vary across different random samples?

\[ SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \quad \text{(when population σ is known)} \]
\[ SE_{\bar{x}} = \frac{s}{\sqrt{n}} \quad \text{(when using sample standard deviation)} \]
Standard error shrinks as sample size grows (by the square root of n). Doubling your sample size divides the standard error by √2, a reduction of about 29%. To halve the standard error, you need to quadruple the sample size. This is why large samples give much more reliable estimates — but with diminishing returns.

Central Limit Theorem (CLT) (Critical)

Regardless of the shape of the underlying population distribution, the sampling distribution of the sample mean approaches a normal distribution as n increases (roughly n ≥ 30). This is the foundational justification for using normal distribution tools in hypothesis testing.

The CLT does NOT say the data is normally distributed. It says the MEAN of a large sample will be approximately normally distributed, regardless of how the individual observations are distributed. This distinction is critical: you can apply normal-distribution-based hypothesis tests to stock returns (which are not normally distributed) as long as your sample is large enough.

Confidence Intervals (Critical)

A confidence interval gives a range of values that is expected to contain the true population parameter with a specified probability (confidence level).

\[ CI = \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \quad \text{(use z when σ known and/or n is large)} \]
\[ CI = \bar{x} \pm t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}} \quad \text{(use t when σ unknown — the most common case)} \]
| Confidence Level | α | z-critical value (two-tailed) |
|---|---|---|
| 90% | 0.10 | ±1.645 |
| 95% | 0.05 | ±1.960 |
| 99% | 0.01 | ±2.576 |
✏️ Worked Example — Confidence Interval

A fund's monthly returns over 36 months have a mean of 1.2% and a sample standard deviation of 3.6%. Construct a 95% CI for the true mean monthly return.

SE = 3.6% / √36 = 3.6% / 6 = 0.6%

Since n = 36 (large), use z: 95% CI = 1.2% ± 1.96 × 0.6% = 1.2% ± 1.176% = (0.024%, 2.376%)

Interpretation: We are 95% confident the true mean monthly return lies between 0.024% and 2.376%. Since the entire interval is above zero, we have evidence the manager generates positive returns.

Correct interpretation of a 95% CI: "If we repeated this sampling process many times, 95% of the intervals constructed this way would contain the true parameter." It does NOT mean "there is a 95% chance the true mean is in this interval" — the true mean is fixed; it's either in the interval or not. This subtle distinction is a favourite exam trap.
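The worked example above maps directly to a few lines of code; a minimal sketch, using the 1.96 critical value from the z-table (valid here because n is large):

```python
import math

def z_confidence_interval(xbar: float, s: float, n: int, z: float = 1.96):
    """CI for the mean using a normal critical value (sigma known or n large)."""
    se = s / math.sqrt(n)
    return xbar - z * se, xbar + z * se

# Figures from the worked example: mean 1.2%, s = 3.6%, n = 36 months
low, high = z_confidence_interval(1.2, 3.6, 36)   # (0.024, 2.376)
```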

When to Use z vs. t Distribution Exam Favourite

| Scenario | Distribution | Reasoning |
|---|---|---|
| Population σ known, normally distributed population | z | Exact z-test applies |
| Population σ unknown, n ≥ 30 | z or t | CLT makes z approximately valid; t is technically more correct but the difference is small |
| Population σ unknown, n < 30, normally distributed | t (with n−1 df) | t distribution has fatter tails to account for extra uncertainty from estimating σ |
| Population σ unknown, n < 30, non-normal | Neither — use non-parametric | Both z and t assume approximate normality |

Reading 8

Hypothesis Testing

The formal framework for asking: "Is this result real, or just noise?"
🗺 Big Picture

Hypothesis testing is the engine behind quantitative research. Every claim in finance — "this manager beats the benchmark," "momentum is a real factor," "this economic variable predicts returns" — must survive a hypothesis test. The framework is always the same: state a null hypothesis (the boring baseline), compute how unlikely your sample result would be if the null were true, and decide whether to reject the null. This reading is one of the most conceptually dense in the curriculum — and one of the most heavily tested.

The Hypothesis Testing Framework

Step 1: State the Hypotheses

H₀ (null hypothesis): The baseline claim to be tested. Usually the "no effect" or "no difference" statement. Always includes equality (=, ≤, ≥).

Hₐ (alternative hypothesis): What we're trying to find evidence for. Sets the direction of the test.

| Test Type | H₀ | Hₐ | Rejection Region |
|---|---|---|---|
| Two-tailed | θ = θ₀ | θ ≠ θ₀ | Both tails (split α/2 each side) |
| One-tailed (right) | θ ≤ θ₀ | θ > θ₀ | Right tail only (full α) |
| One-tailed (left) | θ ≥ θ₀ | θ < θ₀ | Left tail only (full α) |

Step 2: Choose Significance Level (α)

α = the maximum probability of making a Type I error (rejecting a true H₀) you're willing to accept. Common choices: 1%, 5%, 10%.

Step 3: Compute the Test Statistic

\[ \text{Test Statistic} = \frac{\text{Sample estimate} - \text{Hypothesised value}}{\text{Standard error of the estimate}} \]
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \quad \text{(t-test for population mean)} \]

Step 4: Compare to Critical Value and Make Decision

If |test statistic| > critical value (or equivalently, p-value < α): Reject H₀. Otherwise: Fail to reject H₀.

Never say "accept H₀." You either reject H₀ or fail to reject H₀. Failing to reject doesn't mean H₀ is true — it only means you don't have sufficient evidence to reject it. This is an extremely common source of exam errors.

Type I and Type II Errors Critical

| | H₀ is TRUE | H₀ is FALSE |
|---|---|---|
| Reject H₀ | Type I Error (α) — False Positive | Correct Decision (Power = 1−β) |
| Fail to reject H₀ | Correct Decision (1−α) | Type II Error (β) — False Negative |
Type I error = False alarm. The drug doesn't work, but your test says it does. You paid for a useless treatment. Type II error = Miss. The drug works, but your test didn't detect it. You missed a beneficial treatment. In finance: Type I = you think the manager has skill when they don't (you hire them and waste fees). Type II = you think the manager has no skill when they actually do (you pass on a great manager).

The key trade-off: Lowering α (being more strict) reduces Type I errors but increases Type II errors. You can only reduce BOTH by increasing sample size (n).

p-Value — The Modern Standard Exam Favourite

The p-value is the probability of obtaining a test statistic as extreme or more extreme than the one observed, assuming H₀ is true.

The p-value answers: "If the null hypothesis is true, how surprising is this data?" A p-value of 0.02 means: if H₀ were true, there's only a 2% chance of seeing results this extreme by random chance. That's suspicious enough to reject H₀ at the 5% level.
Decision rule using p-value: If p-value < α, reject H₀. If p-value ≥ α, fail to reject H₀. The p-value does NOT give the probability that H₀ is true. It gives the probability of the data given H₀ is true — not the probability of H₀ given the data. This distinction is subtle but important for the exam.
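The decision rule can be sketched with the standard library alone, using the normal approximation for a large sample (the fund figures below are hypothetical):

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_test(xbar: float, mu0: float, s: float, n: int, tail: str = "two"):
    """Test statistic and p-value for a mean; normal approximation (large n)."""
    z = (xbar - mu0) / (s / math.sqrt(n))
    if tail == "right":
        p = 1.0 - normal_cdf(z)
    elif tail == "left":
        p = normal_cdf(z)
    else:  # two-tailed: both tails beyond |z|
        p = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p

# Hypothetical fund: mean 0.8%, s = 2.4%, n = 36; H0: mu <= 0 vs Ha: mu > 0
z_stat, p_value = z_test(0.8, 0.0, 2.4, 36, tail="right")
# z = 2.0, p ~ 0.023 < 0.05, so reject H0 at the 5% level
```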

Tests Beyond the Mean

Chi-Square Test (χ²) — Testing Variance

\[ \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} \quad \text{with } (n-1) \text{ degrees of freedom} \]

Tests whether a population variance equals a hypothesised value. The chi-square distribution is always positive and right-skewed, so critical values are asymmetric for one-tailed tests.

F-Test — Comparing Two Variances

\[ F = \frac{s_1^2}{s_2^2} \quad \text{with } (n_1-1, n_2-1) \text{ degrees of freedom} \]

Tests whether two population variances are equal. Convention: put the larger variance in the numerator (F ≥ 1). Used heavily in regression to test overall model fit.
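Both statistics are one-liners; a sketch with made-up variance inputs:

```python
def chi_square_stat(n: int, s_sq: float, sigma0_sq: float) -> float:
    """Chi-square statistic for H0: variance = sigma0^2, with n - 1 df."""
    return (n - 1) * s_sq / sigma0_sq

def f_stat(s1_sq: float, s2_sq: float) -> float:
    """F statistic comparing two variances; larger variance on top so F >= 1."""
    larger, smaller = max(s1_sq, s2_sq), min(s1_sq, s2_sq)
    return larger / smaller

# Hypothetical inputs: n = 25, s^2 = 0.0025, hypothesised sigma0^2 = 0.0016
chi2 = chi_square_stat(25, 0.0025, 0.0016)   # 24 * 0.0025 / 0.0016 = 37.5
f = f_stat(0.0049, 0.0025)                   # 1.96, regardless of argument order
```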

🎯 Likely Exam Question
A researcher tests whether a fund's mean monthly return is greater than 0%. The sample mean is 0.8%, sample std dev is 2.4%, and n = 36. The test statistic is ___. At α = 5%, the critical value for a one-tailed test is 1.645. Conclusion?
t = (0.8 − 0) / (2.4/√36) = 0.8 / 0.4 = 2.0. Since 2.0 > 1.645 (critical value), reject H₀. There is sufficient evidence at the 5% significance level that the fund's mean monthly return is greater than zero. p-value ≈ 0.023 < 0.05.

Reading 9

Parametric and Non-Parametric Tests of Independence

Are two variables related — and how do we test that rigorously?
🗺 Big Picture

Much of investment research tests for relationships: "Does past performance predict future returns?" "Is this factor correlated with stock returns?" "Do earnings revisions lead price movements?" This reading covers the statistical tools for answering such questions: parametric tests (which assume distributions) and non-parametric alternatives (which make no distributional assumptions). Knowing when to use which is a key exam skill.

Parametric Tests — When Distribution Assumptions Hold

t-Test for Correlation Formula

Tests whether the population correlation coefficient ρ is significantly different from zero (i.e., whether two variables are actually linearly related).

\[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \quad \text{with } (n-2) \text{ degrees of freedom} \]

Where \(r\) = sample correlation coefficient, \(n\) = number of observations.

✏️ Worked Example

A researcher calculates r = 0.45 between GDP growth and stock returns using n = 30 observations. Is this significant at α = 5% (two-tailed)?

t = 0.45 × √(30−2) / √(1−0.45²) = 0.45 × √28 / √(0.7975) = 0.45 × 5.292 / 0.893 = 2.67

Critical value: t₀.₀₂₅, 28 ≈ 2.048. Since 2.67 > 2.048, reject H₀ (ρ = 0). The correlation is statistically significant.
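The calculation above, as a small helper (the inputs reproduce the worked example):

```python
import math

def corr_t_stat(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

t = corr_t_stat(0.45, 30)   # ~2.67, vs. a critical value of ~2.048 at 5% two-tailed
```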

Paired Comparison t-Test

Used when observations come in natural pairs (e.g., the same company before and after a policy change, or matched pairs of funds). You test whether the mean difference is zero.

\[ t = \frac{\bar{d} - \mu_{d_0}}{s_d / \sqrt{n}} \quad \text{with } (n-1) \text{ degrees of freedom} \]

Where \(\bar{d}\) = mean of paired differences, \(s_d\) = std dev of differences, \(\mu_{d_0}\) = hypothesised mean difference (usually 0).
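A minimal sketch of the paired test, on made-up before/after observations:

```python
import math
import statistics

def paired_t_stat(before, after, mu_d0: float = 0.0) -> float:
    """Paired-comparison t statistic on the differences, with n - 1 df."""
    d = [a - b for b, a in zip(before, after)]   # paired differences
    n = len(d)
    d_bar = statistics.fmean(d)
    s_d = statistics.stdev(d)                    # sample std dev of differences
    return (d_bar - mu_d0) / (s_d / math.sqrt(n))

# Hypothetical paired data (e.g., a metric before and after a policy change)
t = paired_t_stat([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 6.0])   # t = 5.0
```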

Non-Parametric Tests — When Assumptions Break Down

Non-parametric tests work with ranks rather than actual values. Since ranking doesn't assume any distribution, these tests are robust to outliers, non-normality, and small samples. The trade-off: when distributional assumptions ARE met, parametric tests are more powerful (better at detecting true effects).

Spearman Rank Correlation Exam Favourite

The non-parametric analogue of Pearson correlation. Rank both variables, then apply the standard Pearson correlation formula to the ranks.

\[ r_S = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} \]

Where \(d_i\) = difference in ranks for the i-th pair. This formula assumes no tied ranks.
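Ranking and the d² shortcut are easy to verify in code; a sketch assuming no tied values:

```python
def to_ranks(values):
    """Assign ranks 1..n by sorted order (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def spearman(x, y) -> float:
    """Spearman rank correlation: r_S = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    rx, ry = to_ranks(x), to_ranks(y)
    n = len(x)
    d_sq = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d_sq / (n * (n * n - 1))

# Any perfectly monotone relationship scores +1 or -1, even a non-linear one
rs_up = spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])    # +1.0 (y = x^2)
rs_down = spearman([1, 2, 3, 4, 5], [9, 7, 5, 2, 0])    # -1.0
```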

| Condition | Use Parametric (Pearson t-test) | Use Non-Parametric (Spearman) |
|---|---|---|
| Distribution | Approximately normal | Non-normal, unknown, or heavily skewed |
| Sample size | Any (CLT helps for large n) | Small samples especially |
| Data type | Continuous ratio/interval data | Ordinal data, or ranked data |
| Outliers | Sensitive to outliers | Robust to outliers (uses ranks) |
| Power | Higher (when assumptions met) | Lower (loses information by ranking) |

Runs Test — Testing for Independence/Randomness

Tests whether a sequence of observations is random (independent). Counts "runs" — consecutive sequences of the same sign (e.g., consecutive positive or negative returns).

Too few runs: positive serial correlation (trending). Too many runs: negative serial correlation (mean-reverting). A truly random sequence: a number of runs consistent with what chance alone would produce.

🎯 Likely Exam Question
An analyst tests whether analyst rankings correlate with future excess returns. The ranking data is ordinal and the sample is small (n = 15). Which test is most appropriate?
Answer: Spearman rank correlation. Two reasons: (1) the data is ordinal (analyst rankings) — Pearson correlation requires continuous data; (2) the small sample (n = 15) means CLT doesn't apply and the normality assumption required for a parametric test is harder to justify. Spearman makes no distributional assumptions and is designed for ranked data.

Reading 10

Simple Linear Regression

Quantifying how one variable is linearly related to another — and testing whether that relationship is statistically real.
🗺 Big Picture

Regression is the workhorse of quantitative finance. Factor models (CAPM, Fama-French), earnings forecasting, economic prediction — all rely on regression. Simple linear regression (one dependent variable, one independent variable) gives you the conceptual foundation. Understand the mechanics here and multiple regression (not in Level 1) follows naturally. The exam tests both the mechanics AND interpretation of outputs.

The Regression Model

\[ Y_i = b_0 + b_1 X_i + \varepsilon_i \]

Where \(Y_i\) = dependent variable (what we're predicting), \(X_i\) = independent variable (the predictor), \(b_0\) = intercept, \(b_1\) = slope coefficient, \(\varepsilon_i\) = error term (residual).

CAPM is a regression model: R_i − Rf = α + β(Rm − Rf) + ε. The dependent variable is the stock's excess return. The independent variable is the market excess return. β (beta) is the slope coefficient. α (Jensen's alpha) is the intercept. The regression quantifies how much of the stock's return is explained by market movements.

Ordinary Least Squares (OLS) Estimation Formula

OLS minimises the sum of squared residuals (SSE). This gives the "best fit" line through the data.

\[ b_1 = \frac{\text{Cov}(X,Y)}{\text{Var}(X)} = \frac{\sum_{i}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i}(X_i - \bar{X})^2} \]
\[ b_0 = \bar{Y} - b_1 \bar{X} \]
The slope b₁ tells you: for a one-unit increase in X, Y is expected to change by b₁ units. The intercept b₀ tells you the expected value of Y when X = 0. In CAPM: if β = 1.2, for every 1% the market moves, this stock is expected to move 1.2%.
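The two OLS formulas translate directly into code. A sketch on a toy dataset that lies exactly on the line y = 2 + 3x, so the estimates are recovered exactly:

```python
def ols_fit(x, y):
    """OLS estimates: b1 = Cov(X, Y) / Var(X), b0 = ybar - b1 * xbar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = ols_fit([1, 2, 3, 4], [5, 8, 11, 14])   # recovers b0 = 2, b1 = 3
```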

Measuring Fit: R², SEE, and ANOVA Critical

The total variation in Y (SST) can be decomposed into explained variation (SSR) and unexplained variation (SSE):

\[ SST = SSR + SSE \]
\[ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \]

R² = the proportion of variation in Y explained by X. R² = 1 means perfect fit; R² = 0 means X explains nothing about Y.

\[ SEE = \sqrt{\frac{SSE}{n-2}} \quad \text{(Standard Error of Estimate)} \]

SEE = the typical size of a prediction error. Lower SEE = better model fit.
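The decomposition can be checked numerically. This sketch fits a toy dataset by OLS and then computes SST, SSE, R², and SEE (all numbers are made up):

```python
import math

def fit_and_score(x, y):
    """Fit y = b0 + b1*x by OLS, then decompose SST = SSR + SSE."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    r_squared = 1.0 - sse / sst        # equivalently SSR / SST
    see = math.sqrt(sse / (n - 2))     # typical prediction error
    return r_squared, see

r2, see = fit_and_score([1, 2, 3, 4], [2, 4, 5, 9])
```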

✏️ Interpreting Regression Output

Regression of excess stock returns (Y) on market excess returns (X):

ŷ = 0.5% + 1.3X, with R² = 0.65, SEE = 4.2%

Intercept (0.5%): The stock earns 0.5% per period even when the market return equals zero (Jensen's α)

Slope (1.3): For every 1% move in the market, this stock is expected to move 1.3% (β > 1, aggressive stock)

R² = 0.65: 65% of the variation in the stock's return is explained by market movements; 35% is idiosyncratic

SEE = 4.2%: Typical prediction error is ±4.2%

Testing the Slope Coefficient Critical

H₀: b₁ = 0 (the independent variable has no linear relationship with Y). The t-test:

\[ t = \frac{\hat{b}_1 - b_{1,0}}{s_{\hat{b}_1}} \quad \text{with } (n-2) \text{ degrees of freedom} \]

Where \(s_{\hat{b}_1}\) = standard error of the slope estimate.

F-Test — Overall Regression Significance

\[ F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)} \quad \text{(k = 1 for simple regression)} \]

In simple regression, the F-test and the t-test for the slope coefficient test the same hypothesis. F = t². Always one-tailed (reject if F > critical value).

Assumptions of OLS Regression

| Assumption | What It Means | If Violated… |
|---|---|---|
| Linearity | The true relationship is linear | Model is misspecified; residuals show patterns |
| No heteroskedasticity | Var(ε) is constant across all X values | Standard errors are wrong; t-tests and F-tests invalid |
| No autocorrelation | Residuals are not correlated with each other over time | Standard errors understated; t-statistics inflated |
| No multicollinearity | Predictors are not highly correlated (multiple regression) | Coefficients unstable, large standard errors |
| Normal residuals | ε ~ N(0, σ²) | t and F tests may be unreliable in small samples |
Heteroskedasticity and autocorrelation are the two most frequently tested regression problems. Both cause OLS standard errors to be incorrect (usually too small), making test statistics look more significant than they really are. Detection: heteroskedasticity via a Breusch-Pagan test or plotting residuals vs. fitted values; autocorrelation via the Durbin-Watson test or plotting residuals over time.
🎯 Likely Exam Question
A regression of stock returns on earnings surprise has b₁ = 0.42 with a standard error of 0.15, and n = 42 observations. Is the slope significant at the 5% level (two-tailed)?
t = (0.42 − 0) / 0.15 = 2.80. Degrees of freedom = 42 − 2 = 40. t-critical (40 df, α=0.025 each tail) ≈ 2.021. Since 2.80 > 2.021, reject H₀. The positive relationship between earnings surprises and stock returns is statistically significant at the 5% level. The coefficient of 0.42 means a 1-unit earnings surprise is associated with a 0.42% higher stock return on average.

Reading 11

Introduction to Big Data Techniques

How machine learning, natural language processing, and alternative data are reshaping investment analysis.
🗺 Big Picture

The investment industry is being transformed by big data. Hedge funds analyse satellite imagery of parking lots to forecast retail earnings. Algorithms mine earnings call transcripts for sentiment signals before human analysts can read them. Machine learning models identify non-linear patterns that classical regression misses. This reading introduces the vocabulary and core concepts — you won't be asked to code, but you must understand how these methods work, their strengths and limitations, and the critical problem of overfitting.

Types of Data

📊 Structured Data

Organised in rows and columns; easily queried. Examples: financial statements, price/volume data, economic indicators.

Traditional statistical methods designed for this.

📝 Unstructured Data

Does not fit neatly into tables. Examples: earnings call transcripts, news articles, social media, satellite images, web traffic.

Requires ML/NLP to extract information.

Structured data is like a spreadsheet. Unstructured data is everything else — all the messy information humans generate constantly that doesn't fit in a table. 80%+ of data generated in the world is unstructured. Machine learning unlocks this data for investment analysis.

Machine Learning Categories Critical

| Category | What It Does | Investment Application | Example Algorithms |
|---|---|---|---|
| Supervised Learning | Learns from labelled training data (input → known output). Predicts outputs for new inputs. | Credit scoring, stock return prediction, fraud detection | Linear regression, logistic regression, decision trees, random forests, neural networks |
| Unsupervised Learning | Finds patterns in data without labelled outputs. Groups similar observations. | Portfolio clustering, market regime detection, factor discovery | k-means clustering, principal component analysis (PCA), hierarchical clustering |
| Deep Learning | Multi-layer neural networks; learns complex non-linear patterns automatically. | Image recognition (satellite), speech recognition, NLP | Convolutional networks (CNN), recurrent networks (RNN), transformer models |

Overfitting — The Most Dangerous Problem Critical

Overfitting occurs when a model learns the training data too well — including the noise — and fails to generalise to new data. In investment research, an overfit model shows spectacular backtested returns but performs poorly (or disastrously) in live trading.

Imagine fitting a polynomial curve to 10 data points. A 9th-degree polynomial can pass through all 10 points perfectly (R² = 100%), but it will wiggle wildly between points and predict terrible values for new inputs. A simpler line might not fit the training data as well, but it generalises far better. This is the bias-variance tradeoff.
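The polynomial analogy is easy to reproduce. This sketch interpolates 10 noisy points exactly (zero training error, i.e., R² = 100%) and then evaluates the curve between the points, where the overfit polynomial drifts away from the simple underlying line. The data-generating line y = x and the noise level are arbitrary choices for illustration:

```python
import random

random.seed(0)  # reproducible noise

# Ten noisy training points scattered around the simple underlying line y = x
xs = [i / 9 for i in range(10)]
ys = [x + random.gauss(0, 0.2) for x in xs]

def interp9(x: float) -> float:
    """Degree-9 Lagrange polynomial through all 10 points: the 'overfit' model."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Zero error on the training points by construction...
train_err = max(abs(interp9(x) - y) for x, y in zip(xs, ys))
# ...but between the points the curve swings away from the true line y = x
midpoints = [(xs[i] + xs[i + 1]) / 2 for i in range(9)]
test_err = max(abs(interp9(m) - m) for m in midpoints)
```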

Bias-Variance Tradeoff

| | High Bias (Underfitting) | High Variance (Overfitting) |
|---|---|---|
| Training error | High (doesn't fit training data) | Low (fits training data perfectly) |
| Test error | High (poor predictions) | High (fails on new data) |
| Problem | Model too simple; misses real patterns | Model too complex; memorises noise |
| Solution | Increase model complexity | Regularisation, more data, cross-validation |

Preventing Overfitting: Regularisation and Cross-Validation

Regularisation — Penalise Complexity

Add a penalty to the loss function that discourages large coefficients. Two main types: LASSO (an L1 penalty on the sum of absolute coefficient values, which can shrink coefficients exactly to zero and so performs automatic feature selection) and Ridge (an L2 penalty on the sum of squared coefficient values, which shrinks coefficients toward zero without eliminating any).

Higher λ = stronger regularisation = simpler model. The optimal λ is chosen by cross-validation.

k-Fold Cross-Validation

Divide the training data into k equal "folds." Train on k−1 folds, test on the remaining fold. Repeat k times, rotating the test fold. Average the test errors. This gives an unbiased estimate of out-of-sample performance — the true test of a model's value.
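The fold rotation is simple index bookkeeping; a sketch that omits the model-fitting step itself:

```python
def k_fold_splits(n: int, k: int):
    """Yield (train, test) index lists: each of k folds is held out exactly once."""
    folds = [list(range(start, n, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test

splits = list(k_fold_splits(10, 5))
# 5 rotations; every index lands in exactly one test fold and is excluded
# from that rotation's training set
```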

Natural Language Processing (NLP)

NLP converts unstructured text into structured, analysable data. Applications in investment analysis include sentiment scoring of earnings call transcripts and news stories, classifying and summarising regulatory filings, and flagging shifts in management language over time.

Neural Networks

Inspired by the human brain: layers of interconnected nodes (neurons). Each connection has a weight; the network learns by adjusting weights to minimise prediction error (backpropagation).

| Layer Type | Role |
|---|---|
| Input layer | Receives raw features (prices, fundamentals, text embeddings) |
| Hidden layers | Extract increasingly abstract patterns; more layers = "deeper" network = more complex representations |
| Output layer | Produces final prediction (buy/sell signal, return forecast, default probability) |
Deep learning (many hidden layers) is extremely powerful for complex pattern recognition (images, speech, text) but requires enormous amounts of data and is a "black box" — it's very hard to explain why it makes a particular prediction. In regulated industries where explainability is required (e.g., credit decisions), simpler interpretable models are often preferred despite lower accuracy.
🎯 Likely Exam Question
An analyst builds a machine learning model to predict stock returns. The model has an extremely high R² in training (0.95) but a much lower R² out of sample (0.12). This is most consistent with:
Answer: Overfitting. A model that fits training data extremely well but fails to generalise to new data has memorised the training data's noise rather than learning true patterns. Solutions include regularisation (LASSO/Ridge), cross-validation to select model complexity, reducing the number of features, or using more training data. The high in-sample R² is actually a warning sign here, not a positive signal.

Reference

Master Formula Sheet — All 11 Readings

Every key formula, organised for last-minute revision.
| Formula | Description | Reading |
|---|---|---|
| \(HPR = \frac{P_1 - P_0 + CF_1}{P_0}\) | Holding period return | 1 |
| \(\bar{R}_G = \left[\prod(1+R_t)\right]^{1/T} - 1\) | Geometric mean return | 1 |
| \(\bar{R}_H = \frac{N}{\sum 1/R_t}\) | Harmonic mean (average cost per share under dollar-cost averaging) | 1 |
| \((1+r_{real}) = \frac{1+r_{nom}}{1+r_{inf}}\) | Fisher equation — exact real return | 1 |
| \(EAR = (1 + r_s/m)^m - 1\) | Effective annual rate | 1 |
| \(EAR = e^{r_s} - 1\) | EAR with continuous compounding | 1 |
| \(FV = PV(1+r)^N\) | Future value | 2 |
| \(PV = FV/(1+r)^N\) | Present value | 2 |
| \(PV_{annuity} = PMT \cdot \frac{1-(1+r)^{-N}}{r}\) | Present value of ordinary annuity | 2 |
| \(FV_{annuity} = PMT \cdot \frac{(1+r)^N-1}{r}\) | Future value of ordinary annuity | 2 |
| \(PV_{perp} = PMT/r\) | Present value of perpetuity | 2 |
| \(PV_{grow.perp} = PMT_1/(r-g)\) | Growing perpetuity (r > g required) | 2 |
| \(NPV = \sum CF_t / (1+r)^t\) | Net present value | 2 |
| \(s^2 = \sum(x_i-\bar{x})^2/(n-1)\) | Sample variance (n−1 denominator) | 3 |
| \(CV = s/\bar{x}\) | Coefficient of variation | 3 |
| \(Sharpe = (\bar{R}_p - R_f)/\sigma_p\) | Sharpe ratio | 3 |
| \(P(A\mid B) = P(A \cap B)/P(B)\) | Conditional probability | 4 |
| \(P(A\mid B) = P(B\mid A)P(A)/P(B)\) | Bayes' theorem | 4 |
| \(E(X) = \sum x_i P(x_i)\) | Expected value of a random variable | 4 |
| \(Var(X) = E(X^2) - [E(X)]^2\) | Variance of a random variable | 4 |
| \(E(R_p) = \sum w_i E(R_i)\) | Portfolio expected return | 5 |
| \(\sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1w_2\rho_{12}\sigma_1\sigma_2\) | Two-asset portfolio variance | 5 |
| \(Cov(R_1,R_2) = \rho_{12}\sigma_1\sigma_2\) | Covariance from correlation | 5 |
| \(SE = \sigma/\sqrt{n}\) | Standard error of the mean | 7 |
| \(CI = \bar{x} \pm t_{\alpha/2} \cdot s/\sqrt{n}\) | Confidence interval (σ unknown) | 7 |
| \(t = (\bar{x} - \mu_0)/(s/\sqrt{n})\) | t-test statistic for mean | 8 |
| \(\chi^2 = (n-1)s^2/\sigma_0^2\) | Chi-square test for variance | 8 |
| \(F = s_1^2/s_2^2\) | F-test for equality of variances | 8 |
| \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) | t-test for correlation coefficient | 9 |
| \(r_S = 1 - 6\sum d_i^2/[n(n^2-1)]\) | Spearman rank correlation | 9 |
| \(b_1 = Cov(X,Y)/Var(X)\) | OLS slope coefficient | 10 |
| \(b_0 = \bar{Y} - b_1\bar{X}\) | OLS intercept | 10 |
| \(R^2 = SSR/SST = 1-SSE/SST\) | Coefficient of determination | 10 |
| \(SEE = \sqrt{SSE/(n-2)}\) | Standard error of estimate | 10 |
| \(F = MSR/MSE\) | F-test for regression significance | 10 |
| \(t = \hat{b}_1/s_{\hat{b}_1}\) | t-test for slope coefficient | 10 |
| \(SST = SSR + SSE\) | Total = Explained + Unexplained variation | 10 |
📋 Top 5 Exam Traps — Quick Reference
  1. TWR vs. MWR: Use TWR to evaluate manager skill; MWR for investor's personal experience. TWR is the CFA GIPS standard for comparing managers.
  2. Geometric ≤ Arithmetic mean: Always use geometric for historical compounding; arithmetic for expected future single-period returns.
  3. Sample variance uses n−1: Not n. Dividing by n−1 rather than n corrects the downward bias that comes from estimating the mean from the same data, making the estimator unbiased. Population variance uses N.
  4. "Fail to reject" ≠ "Accept H₀": Lack of evidence against H₀ ≠ proof of H₀. Always say "fail to reject."
  5. Portfolio variance ≠ weighted average: The covariance/correlation term makes portfolio variance lower than the weighted average of individual variances (for ρ < 1). This IS the diversification benefit.