Abstract

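In capitalist societies, initial wealth gaps have widened exponentially over time (wealth polarization). Under 'Too Big to Fail' and 'Winner Takes All', those who join the top propertied class, whether by starting out wealthy or by an extraordinary stroke of luck along the way, even see the volatility of their assets decline. If this structure of capitalist society is a reality, a highly efficient equity fund strategy follows. Whether through inherited status at the outset or through winner-takes-all success later on, the excess returns of the stocks held by the 'haves' (ones who have) are expected to almost stochastically dominate the excess returns of all stocks (the Matthew effect: the rich get richer and the poor get poorer), so it suffices to follow them. Empirically, a stock portfolio that followed them earned returns comparable to the market portfolio while being considerably safer and more stable. Although the difference in excess returns looks like a martingale, the average returns are similar and the downside risk cost is smaller, so the strategy should be the more efficient one. The stocks in the tracking portfolio had low turnover, so transaction costs were also low. Finally, the main propositions behind this strategy are likely de facto features of the market rather than anomalies.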

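An efficient portfolio strategy indirectly predicts stock rates of return while taking market inefficiency into account, and the TBTF portfolio shows that investment efficiency can be improved by lowering rebalancing costs while maintaining the average return. The paper emphasizes that two structural factors, the market premium and the size premium, matter most in predicting the cost of capital. It also exposes the limits of complex numerical models and argues for making their underlying simplicity explicit.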

Keywords: No Arbitrage Principle, Realism, Practicism, Prediction, Machine Learning, Theoretical Models, Symmetric Measure, Duality, Performance Measures, Empirical Comparison

1. Introduction

1.1 Wealth Inequality in the Stock Market

Most publicly traded stocks are owned by a small number of wealthy people. In the United States in 2023, the top 1% by wealth (1 person in 100) owned about 53% of all listed stocks (53 in 100), while the bottom 50% (50 people in 100) owned about 1% (1 in 100). Think of stocks as pizzas. In a world of 100 people and 100 pizzas, one person eats 53 pizzas while 50 people share a single pizza. If the people in the bottom class split that one pizza evenly, each gets 1/50 of a pizza, whereas the one person in the top class eats 53 pizzas alone: a difference of more than 2,500 times (53 ÷ 1/50 = 2,650).
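The per-person arithmetic behind the pizza analogy can be checked with a short sketch. The function name `per_capita_ratio` is illustrative only, and the inputs are the rounded ownership shares quoted above rather than exact survey figures.

```python
from fractions import Fraction

def per_capita_ratio(top_share, top_people, bottom_share, bottom_people):
    """Ratio of per-person holdings between a top group and a bottom group."""
    top_per_person = Fraction(top_share, top_people)
    bottom_per_person = Fraction(bottom_share, bottom_people)
    return top_per_person / bottom_per_person

# 53 pizzas for 1 person versus 1 pizza shared by 50 people
ratio = per_capita_ratio(53, 1, 1, 50)
print(ratio)  # 2650
```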

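Why did such an enormous wealth gap in the stock market arise? Two broad economic-structural causes come to mind.

First, the price growth rate of almost all assets has remained nearly constant over the long run. The capitalist 'economic structure' of advanced countries has been extremely conservative. It legally guarantees, 'equally', the ownership of 'all' property by 'all' citizens and the 'free' exercise of that ownership, so the excessively large holdings of the top class have been protected just as firmly as the excessively small holdings of the bottom class. From the standpoint of social polarization, the gap in initial wealth can therefore be seen as having determined the gap in future wealth.

Second, the economic opportunity to accumulate enough wealth to move from the bottom class to the top class is offered to 'all' citizens, but the probability of being the winner who seizes it has been as small as that of winning the lottery (a winner-takes-all structure, as in the US presidential election).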

1.2 Matthew Effect and Initial Condition

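We use US data because it is the most abundant and is therefore expected to carry the least measurement error. The World Inequality Database divides society into four classes by wealth (in dollars) and tracks each class's wealth over time.

$x(1)$ : Top 1% wealth = wealth of the single richest person out of 100
$x(2)$ : Top 10% wealth = 'average' wealth of the 10 people ranked 1st to 10th out of 100, including the richest
$x(3)$ : 50-90th percentile wealth = 'average' wealth of the 40 people ranked 11th to 50th out of 100
$x(4)$ : Bottom half wealth = 'average' wealth of the 50 people ranked 51st to 100th (last) out of 100

Viewed on a log scale over the long run, the wealth of every class has grown linearly over time, that is, exponentially in levels. Since wealth growth can be modeled as $\ln(x)=\ln(x_0)+ r \cdot t$ , the widening gap in wealth levels between classes can be attributed not to differences in growth rates across classes but largely to an economic structure that preserves the original difference in initial conditions (initial endowment levels), i.e. an intertemporal order-preserving map.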
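A minimal numerical sketch of this log-linear model, assuming a common growth rate for two classes, illustrates the point: the wealth ratio between classes stays fixed while the level gap widens. The growth rate and starting values are hypothetical and are not taken from the World Inequality Database.

```python
import math

def wealth(x0, r, t):
    """Wealth level implied by ln(x) = ln(x0) + r * t."""
    return x0 * math.exp(r * t)

# Hypothetical inputs: both classes grow at the same rate r.
r = 0.05
rich0, poor0 = 100.0, 1.0

for t in (0, 30, 60):
    rich, poor = wealth(rich0, r, t), wealth(poor0, r, t)
    ratio = rich / poor   # stays at rich0 / poor0
    gap = rich - poor     # grows in proportion to exp(r * t)
    print(f"t={t:2d}  ratio={ratio:.1f}  level gap={gap:.1f}")
```

The ordering of the classes never changes under this map, which is what "order-preserving" means here.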

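The American sociologist Robert K. Merton used the term Matthew effect (i.e. a rich-get-richer phenomenon) to describe gaps in scientific research that stem from initial conditions. In short: the rich get richer and the poor get poorer.

For to every one who has will more be given, and he will have abundance; but from him who has not, even what he has will be taken away. (Matthew 25:29)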

1.3 Winner-Takes-All Market

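This is a winner-takes-all economic structure, and it is not limited to individual ownership of stocks. The force that moves overall stock prices comes from a small number of large stocks. The excess return of the market portfolio is driven by large stocks (i.e. large market capitalization).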

In economics, a winner-take-all market is a market in which a product or service that is favored over the competitors, even if only slightly, receives an extremely large share of the revenues for that class of products or services.

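The problem is that the current capitalist economic structure is designed to favor winner-takes-all outcomes. Monopolies and oligopolies are quietly protected behind the scenes, while on the surface they are publicly restrained through legislation. How can we explain why so many people still enter a market whose winner-takes-all structure is plainly unfair? The simplest and most reasonable answer is probably 'because there is no other way to make a living'.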

1.4 Industry Classification as a Basis for Diversification

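The most important criterion for dividing stocks into inhomogeneous groups is the industry to which each company belongs. An industry classification scheme lets us sort all stocks, at least on the surface, into mutually exclusive groups. Different stocks within the same industry can be expected to share similar characteristics, while stocks in different industries can be expected to differ substantially.

The financial market is where households trade the financial products of industries they expect to do well. A household that wants to smooth consumption over time will try to diversify the risk of its human capital investment (labor income) by making physical capital investments in 'other industries' in which its human capital is not employed.

The current market capitalization of industry A reflects 1) the current demand for A's products 'heavily' and 2) the future demand for A's products only 'to some extent'. Likewise, the current stock price of company a in industry A reflects 1) a's current competitiveness within A 'heavily' and 2) a's future competitiveness within A only 'to some extent'.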

2. Theoretical Framework

2.1 Philosophy of Prediction

Milton Friedman (Essays in Positive Economics, 1953). Economic assumptions need not be "realistic" to serve as scientific hypotheses; they merely need to make significant predictions.

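Friedman's position on economic models is highly pragmatic. It would be ideal if a model explained past and present phenomena well and also predicted future ones, but he would rather have it predict the important phenomena of the future than explain the miscellaneous phenomena of the past and present.

Adopting a less extreme determinism, if the world is determined to some degree, then we can infer that future events are also predictable to some degree. The higher the predictability of a theory (model), the higher its practicality for us.

The object of prediction is a system's future state. Failing to predict the future state of a system accurately does not mean that the structure which materially shapes that future state does not exist. One reason prediction accuracy is low may be that we have not identified the system's main invariant structure (structure, operator, constant) and the principle by which that structure operates (mechanism, principle, operation).

Linguistically, truth means one of two things.

One is truth expressed as an is-statement. Socially, this is the reality or fact that the majority has objectively verified and believes to be true.

The other is truth expressed as an ought-statement. Socially, this is justice, the set of values the majority believes to be right, or the ethics that follows from it.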

2.3 Expected Return vs. Uncertain Risk

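'Expected' return in economics usually refers to an average or a mean operator over past return data. It is merely a statistical 'average' return, not the return that asset buyers or sellers actually 'expect'.

In economics, the risk of asset returns is quite different from uncertainty. Uncertainty is the situation in which probability-based predictions of future returns keep failing, so that we are regarded as knowing almost nothing. Risk is the situation in which such predictions are right often enough that we are regarded as knowing something.

The degrees of freedom inherent in numerical models that predict the future (statistical optimization models) include: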

estimator for the predicted variable
type of observable predictors
unit and scaling of predictors
number of predictors and model parameters
relationship between predictors
objective cost function
the testable relationship model between the predicted and the predictors
the size ratio between the training data and the testing data
look-back period and look-forward period
type and number of hyperparameters
type of model testing estimators

2.4 Market Structure Hypotheses

2.4.1 Efficient Market Hypothesis and Information Technology

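It is generally expected that information driving stock price volatility is quickly reflected in prices, so that short-term arbitrage opportunities do not exist in the stock market (the efficient market hypothesis). Thanks to advances in information technology, information that can move prices through asset trading is transmitted more transparently and more quickly than before.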

2.4.2 One-period Markov Chain Hypothesis

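If the market is efficient, so that tomorrow's stochastically varying price information for all financial products is publicly available to all market participants, then participants can be thought of as trading on today's and tomorrow's price information to set today's asset prices. With $n$ financial products and $m$ stochastic states, $n \times m$ numbers would have to be predicted.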

2.4.3 One-price Hypothesis

The one-price hypothesis, or "the law of one price" (LOP) condition, is a necessary condition for the no-arbitrage condition.

The same asset trades at the same price on all markets
Two assets with identical cash flows trade at the same price

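The key point in stock investing is this: the types and strengths of the stochastic determinants (risk factors) of a stock's required return (risk premium), and even the way they are reflected in the required return, can keep changing over time, so the task is to identify the elements that change the least.

This paper assumes that size is one of the most important determinants of the relationship between investors' required returns and volatility. We infer that most rational stock investors are aware of 'Too Big to Fail' and therefore regard currently large stocks and currently small stocks as different (inhomogeneous) financial products.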

2.4.4 Arbitrage-free Market Hypothesis

Proposition. The absence of a linear dominant arbitrage strategy in a financial market does not imply a fair trade structure in that market. However, the presence of a linear dominant arbitrage strategy implies an unfair trade structure in a real financial market. The no-arbitrage condition is expected to hold only within the same group of assets (i.e. homogeneous assets), not between different groups of assets. This is why asset selection is important in portfolio optimization.

2.5 Optimality and Efficiency: A Critique

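Economic theories and models built on mathematics solve an arithmetic problem whose answer has been fixed in advance and then label that answer 'optimal', 'equilibrium', or 'efficient', which confuses many people.

In reality, the effective expected return on capital investment attainable with one's existing wealth, and with the capabilities that wealth buys, is not the same for the prince and the pauper; the difference is enormous. Can such an outcome be regarded as socially optimal? Even at the individual level, it is a stretch to interpret doing one's best just to survive as an optimal decision.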

3. Theoretical Models

3.1 Model Uncertainty

Fama and French (1997), Industry Costs of Equity: Most textbooks in corporate finance emphasize the uncertainty in projections of cash flows $x$ . Our main point is that the cost of capital estimates $m$ used to discount cash flows are also unavoidably imprecise.

In a discounted cash flow model such as $y \approx m \cdot x$ , if the estimates $\hat{m},\hat{x}$ of both $m$ and $x$ are highly uncertain, then the model itself is simply uncertain: in effect it says 'we know nothing about $y$ '.

The authors conclude the paper as follows:

Our message is that the task of project valuation is beset with massive uncertainty. [...] our guess is that whatever the formal approach, two of the ubiquitous tools in capital budgeting are a wing and a prayer, and serendipity is an important force in outcomes.

3.2 Linear Algebra Representation Models

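The linear state-space model of economics is almost identical to the linear time-invariant system model used mainly in mechanical engineering. The biggest and most important difference lies in how the state-transition equation is specified. In engineering it is written using experimentally determined physical quantities and physical laws; in economics it is written using variables and models that each author chooses arbitrarily.

The way most economic models express economic theory mathematically amounts to nothing more than linear algebra built on fields, linear spaces, linear maps, and bilinear maps. In particular, they focus on structures endowed with an inner product and on positive-definite kernels.

Samuelson (1947): Meaningful theorems are hypotheses about empirical data that could conceivably be refuted by empirical data.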

3.3 Utility Maximization and Risk Aversion

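The 'utility maximization hypothesis' states that the increment in human happiness diminishes as one keeps gaining money. It directly expresses the human tendency toward downside-risk aversion.

In mathematical notation, when current money $X=\mu$ rises in value to $X_i=\mu+\sigma$ , the resulting change in utility $u(X_i)-u(X)$ is never larger than the change in utility $u(X)-u(X_j)$ when it falls to $X_j=\mu-\sigma$ , i.e. $u(X)-u(X_j) \geq u(X_i)-u(X)$ .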
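The inequality can be checked numerically for any concave utility. The sketch below uses log utility and hypothetical values $\mu=100$ and $\sigma=20$ ; the helper name `utility_changes` is illustrative only.

```python
import math

def utility_changes(u, mu, sigma):
    """Return (utility lost on a fall of sigma, utility gained on a rise of sigma)."""
    loss = u(mu) - u(mu - sigma)
    gain = u(mu + sigma) - u(mu)
    return loss, gain

# Log utility is strictly concave, so the loss outweighs the equal-sized gain.
loss, gain = utility_changes(math.log, mu=100.0, sigma=20.0)
print(loss > gain)  # True

# A linear utility (risk-neutral case) makes the two changes equal.
linear_loss, linear_gain = utility_changes(lambda c: 2.0 * c, mu=100.0, sigma=20.0)
```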

Here 'rational' means 'linearly proportional' (as in a ratio), not 'reasonable'.

3.4 Mean Operator and Covariance Operator

Cauchy–Schwarz inequality: For all vectors $m$ and $x$ of an inner product space,

| \langle m, x \rangle|^2 \leq \langle m, m \rangle \cdot \langle x, x \rangle

Applying this to the covariance operator on a probability space gives:

\sigma_{m,x}:=\rho_{m,x} \ \sigma_m \ \sigma_x \leq \sigma_m \ \sigma_x

where $\rho_{m,x}$ is the Pearson correlation coefficient.

Properties of covariance operator on a probability space with the inner product:

Bilinear: $cov(a_1X_1+a_2X_2, Y)=a_1cov(X_1,Y)+a_2cov(X_2,Y)$
Symmetric: $cov(X_1,X_2)=cov(X_2,X_1)$
Positive-semi-definite: $var(Y):=cov(Y,Y)\geq0$ for all $Y$

Application to financial economics: when the no-arbitrage trade condition $E_t(m_{t+1}\cdot x_{t+1})=0$ holds for an excess return $x$ , applying the Cauchy-Schwarz inequality to it yields the Hansen-Jagannathan bound.
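The steps from the no-arbitrage condition to the bound can be written out as follows. This is a sketch of the standard textbook derivation, assuming $E(m)>0$ and $\sigma_x>0$ and dropping time subscripts for readability.

```latex
\begin{aligned}
0 &= E(m\,x) = \operatorname{cov}(m,x) + E(m)\,E(x)
   = \rho_{m,x}\,\sigma_m\,\sigma_x + E(m)\,E(x) \\
\Rightarrow\quad E(x) &= -\rho_{m,x}\,\frac{\sigma_m}{E(m)}\,\sigma_x \\
\Rightarrow\quad \frac{|E(x)|}{\sigma_x} &= |\rho_{m,x}|\,\frac{\sigma_m}{E(m)}
   \;\leq\; \frac{\sigma_m}{E(m)}
\end{aligned}
```

The last step uses $|\rho_{m,x}| \leq 1$ , which is the Cauchy-Schwarz inequality applied to the covariance operator: the Sharpe ratio of any excess return is bounded by the volatility of the discount factor relative to its mean.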

3.5 Markov Chain Linear State-space Model

Lucas (1978) model: $m(t+1):=\beta \frac{u'(c_{t+1})}{u'(c_t)}$ where $0<\beta \leq1$ and utility function $u(c)$ is CONCAVE.

Power utility: $u(c):=\frac{c^\gamma-1}{\gamma}$ where $c \geq 0$ . Risk-neutral if $\gamma=1$ , Risk-seeking if $\gamma >1$ , and Risk-averse CRRA if $\gamma <1$ .
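Under this parameterization the coefficient of relative risk aversion, $-c\,u''(c)/u'(c)$ , works out to $1-\gamma$ , which is what sorts the three cases. A minimal sketch, with illustrative function names and an arbitrary consumption level $c=2$ :

```python
def marginal_utility(c, gamma):
    """u'(c) for u(c) = (c**gamma - 1) / gamma."""
    return c ** (gamma - 1.0)

def second_derivative(c, gamma):
    """u''(c) for the same utility function."""
    return (gamma - 1.0) * c ** (gamma - 2.0)

def relative_risk_aversion(c, gamma):
    """-c * u''(c) / u'(c), which simplifies to 1 - gamma."""
    return -c * second_derivative(c, gamma) / marginal_utility(c, gamma)

for gamma in (0.5, 1.0, 2.0):
    rra = relative_risk_aversion(2.0, gamma)
    if rra > 0:
        attitude = "risk-averse"
    elif rra == 0:
        attitude = "risk-neutral"
    else:
        attitude = "risk-seeking"
    print(gamma, attitude)
```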

3.6 No-Arbitrage and Asset Pricing

Equivalent Representation of the NA Condition (Fundamental Theorems of Asset Pricing)

1st Theorem: LOP and NA in a perfectly competitive market $\iff$ The $m(t+1)\geq0$ fairly affects ALL traded homogeneous assets and discounts them to have no excess return.

2nd Theorem: Furthermore, the market is complete $\iff$ the $m(t+1)$ is unique.

The Euler equation of the NA condition model:

1 \approx E_t(m \cdot R_i)=cov(m,R_i)+E(m)E(R_i)

Rearranging gives:

E(R_i)=R_f+\beta_{i,m}\cdot \lambda

where $\beta_{i,m}:=$ linear projection coefficient and $\lambda:=$ market price of risk.
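The rearrangement from the Euler equation to the beta representation can be made explicit. This sketch assumes that a risk-free asset exists, so that $R_f = 1/E(m)$ ; the definitions of $\beta_{i,m}$ and $\lambda$ are the usual textbook ones.

```latex
\begin{aligned}
1 &= E(m R_i) = \operatorname{cov}(m, R_i) + E(m)\,E(R_i),
   \qquad R_f = \frac{1}{E(m)} \\
E(R_i) &= R_f - R_f \operatorname{cov}(m, R_i)
        = R_f + \underbrace{\frac{\operatorname{cov}(m, R_i)}{\operatorname{var}(m)}}_{\beta_{i,m}}
          \cdot \underbrace{\left(-\frac{\operatorname{var}(m)}{E(m)}\right)}_{\lambda}
\end{aligned}
```

Here $\beta_{i,m}$ is the regression coefficient of $R_i$ on $m$ , and $\lambda$ is the same for every asset.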

Unsystematic (i.e. idiosyncratic) risk, not linearly correlated with the discount factor, neither generates the risk-premium nor affects the estimated price under the models with the NA hypothesis. Only the component of a cash flow perfectly correlated with the discount factor generates an extra return.

Linear Risk Factor Models

The expected-return-beta representation:

\bar{r}_i \approx \hat{\mathbf{b}}_i' \cdot \mathbf{\lambda} +\alpha_i

CAPM 1 factor: $\bar{r}_i \approx \hat{\beta}_{m,i} \cdot r_m + \alpha_i$
Fama-French 3 factor: $\bar{r}_i \approx \hat{\beta}_{m,i} r_m + \hat{\beta}_{smb,i} r_{smb} + \hat{\beta}_{hml,i} r_{hml} + \alpha_i$

Cross-sectional Sharpe ratio representation:

\frac{E(r_i)}{\sigma_i(r_i)}\approx \rho_i(r_i,f) \cdot \frac{E(f)}{\sigma_f(f)}

"Asset pricing is all about covariances" – Lars Peter Hansen

Financial market risk factors are said to represent some aspect or dimension of undiversifiable systematic risk which should be compensated with higher expected returns. But why should the risk factors be compensated? It is rather the ought-to statement than the is-statement.

4. Machine Learning and Econometric Models

4.1 Autoregressive Models

4.1.1 Adaptive Expectation Model

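The adaptive expectations model holds that the forecast of future returns converges to a weighted average of past returns. Its core assumption is that economic agents correct their forecast errors gradually.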

\hat{r}_{t+1} = \hat{r}_t + \lambda(r_t - \hat{r}_t)

Written as an infinite geometric lag model, this becomes $\hat{r}_{t+1} = \lambda \sum_{k=0}^{\infty}(1-\lambda)^k r_{t-k}$ , which has the same structure as an exponentially weighted moving average (EWMA) of past returns.
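The equivalence between the recursive update and the geometric lag form can be verified on a finite sample. In the sketch below the function names, the sample returns, and the initial forecast are all hypothetical; with a finite history, the geometric form carries an extra term for the initial forecast that vanishes as the history grows.

```python
def adaptive_forecast(returns, lam, initial=0.0):
    """Apply r_hat <- r_hat + lam * (r - r_hat) over the sample, oldest first."""
    forecast = initial
    for r in returns:
        forecast = forecast + lam * (r - forecast)
    return forecast

def geometric_lag_forecast(returns, lam, initial=0.0):
    """Same forecast as a geometrically weighted sum of past returns."""
    n = len(returns)
    weighted = sum(lam * (1.0 - lam) ** k * returns[n - 1 - k] for k in range(n))
    # Residual weight on the initial forecast; it decays like (1 - lam)**n.
    return weighted + (1.0 - lam) ** n * initial

sample = [0.01, -0.02, 0.03, 0.00]  # hypothetical monthly returns
recursive = adaptive_forecast(sample, lam=0.3, initial=0.005)
closed_form = geometric_lag_forecast(sample, lam=0.3, initial=0.005)
```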

4.1.2 AR, VAR, ARIMA Models

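AR(p) model:

r_t = \mu + \sum_{k=1}^{p} \phi_k r_{t-k} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \sigma^2)

For an AR model to have valid predictive power, covariance stationarity is required. However, individual stock returns are usually close to i.i.d., so the AR coefficients are typically not estimated to be statistically significant.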
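A small simulation sketches the contrast between a persistent series and a near-i.i.d. one. It uses an AR(1) with Gaussian shocks and an ordinary least-squares estimate of $\phi_1$ ; all parameter values are hypothetical and the simulated series are not stock data.

```python
import random

def simulate_ar1(mu, phi, sigma, n, seed=0):
    """Simulate r_t = mu + phi * r_{t-1} + eps_t with Gaussian white noise."""
    rng = random.Random(seed)
    r = mu / (1.0 - phi)  # start at the unconditional mean (requires |phi| < 1)
    path = []
    for _ in range(n):
        r = mu + phi * r + rng.gauss(0.0, sigma)
        path.append(r)
    return path

def estimate_ar1(path):
    """OLS estimates (mu_hat, phi_hat) from regressing r_t on r_{t-1}."""
    x, y = path[:-1], path[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    phi_hat = sxy / sxx
    return my - phi_hat * mx, phi_hat

_, phi_persistent = estimate_ar1(simulate_ar1(0.0, 0.5, 0.05, 5000))
_, phi_iid = estimate_ar1(simulate_ar1(0.0, 0.0, 0.05, 5000))
print(round(phi_persistent, 2), round(phi_iid, 2))
```

When the true coefficient is zero, the estimate hovers around zero and carries no usable forecast signal.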

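VAR(p) model:

\mathbf{r}_t = \boldsymbol{\mu} + \sum_{k=1}^{p} \mathbf{\Phi}_k \mathbf{r}_{t-k} + \boldsymbol{\varepsilon}_t

Empirical studies that put macroeconomic variables and stock returns together in a VAR to test return predictability (Campbell & Shiller, 1988; Cochrane, 2008) have provided important evidence on the predictability of long-horizon returns.

The ARIMA(p,d,q) model applies the differencing operator $(1-L)^d$ to a non-stationary time series to make it stationary and then fits an ARMA(p,q) model.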

4.2 ARCH / GARCH: Volatility Clustering Models

Forecasting the volatility of stock returns is as important as forecasting their level. Mandelbrot (1963) observed that financial returns exhibit 'volatility clustering'.

ARCH(q) model:

h_t = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2

GARCH(p,q) model (Bollerslev, 1986):

h_t = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j h_{t-j}

Stationarity condition: $\sum_{i=1}^q \alpha_i + \sum_{j=1}^p \beta_j < 1$ . Empirically, estimates for stock markets are around $\alpha_1 + \beta_1 \approx 0.97 \sim 0.99$ , so shocks to volatility are highly persistent.
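For GARCH(1,1), the persistence $\alpha_1+\beta_1$ determines both the long-run variance $\omega/(1-\alpha_1-\beta_1)$ and how slowly a volatility shock dies out. The sketch below uses hypothetical parameters with $\alpha_1+\beta_1=0.98$ ; the function names are illustrative.

```python
import math

def garch11_next_variance(h, shock, omega, alpha, beta):
    """One step of h_t = omega + alpha * eps_{t-1}^2 + beta * h_{t-1}."""
    return omega + alpha * shock ** 2 + beta * h

def unconditional_variance(omega, alpha, beta):
    """Long-run variance; only defined when alpha + beta < 1."""
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be below 1")
    return omega / (1.0 - alpha - beta)

def shock_half_life(alpha, beta):
    """Periods until the expected gap to the long-run variance halves."""
    return math.log(0.5) / math.log(alpha + beta)

# Hypothetical parameters with alpha + beta = 0.98
omega, alpha, beta = 0.02, 0.08, 0.90
h_bar = unconditional_variance(omega, alpha, beta)
half_life = shock_half_life(alpha, beta)
print(round(half_life, 1))
```

With monthly data, a persistence of 0.98 means that a variance shock takes roughly three years to lose half of its expected effect.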

Extensions: EGARCH (Nelson, 1991), GJR-GARCH (Glosten et al., 1993), DCC-GARCH (Engle, 2002).

4.3 Cross-sectional Models: Factor Models and Penalized Regression

Fama-MacBeth (1973) two-pass regression: in the first pass, time-series regressions estimate $\hat{\boldsymbol{\beta}}_i$ ; in the second pass, a cross-sectional regression is run in every period.
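The two passes can be sketched for a single factor. The data below are synthetic and noise-free, constructed so that $r_{i,t}=\beta_i f_t$ ; this makes the estimated risk premium equal to the factor's sample mean. The helper names `ols` and `fama_macbeth` are illustrative and not taken from any library.

```python
def ols(x, y):
    """Intercept and slope of a univariate least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def fama_macbeth(returns, factor):
    """returns[i][t] is asset i's excess return in period t; factor[t] is the factor."""
    # Pass 1: one time-series regression per asset gives its beta.
    betas = [ols(factor, asset_returns)[1] for asset_returns in returns]
    # Pass 2: one cross-sectional regression per period gives that period's premium.
    premia = []
    for t in range(len(factor)):
        cross_section = [asset_returns[t] for asset_returns in returns]
        premia.append(ols(betas, cross_section)[1])
    # The risk-premium estimate is the time-series average of the period premia.
    return betas, sum(premia) / len(premia)

# Synthetic, noise-free data (purely illustrative): r_it = beta_i * f_t
true_betas = [0.5, 1.0, 1.5]
factor = [0.02, -0.01, 0.03, 0.00, 0.01]
returns = [[b * f for f in factor] for b in true_betas]
betas, risk_premium = fama_macbeth(returns, factor)
```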

GMM / SDF representation: Hansen's (1982) GMM is used to estimate the moment condition $E(m_{t+1} \cdot R_{i,t+1}) = 1$ directly.

Penalized Regression

The 'factor zoo' problem (Cochrane, 2011): more than 300 cross-sectional anomaly factors have been proposed. The basic structure is:

\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \left\{ \sum_{i=1}^n \left(r_i - \boldsymbol{\beta}' \mathbf{x}_i\right)^2 + P_\lambda(\boldsymbol{\beta}) \right\}

Ridge ( $L_2$ penalty): effective when multicollinearity is severe
LASSO ( $L_1$ penalty): performs automatic variable selection. Well suited to picking out the meaningful factors in the factor zoo
Elastic Net: a mixture of LASSO and Ridge
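The difference between the three penalties is easiest to see in the special case of orthonormal predictors, where each coefficient has a closed form in terms of its OLS estimate. That orthonormality is an assumption of this sketch, as are the penalty forms $\lambda\sum_j\beta_j^2$ (Ridge), $\lambda\sum_j|\beta_j|$ (LASSO), and $\lambda_1\sum_j|\beta_j|+\lambda_2\sum_j\beta_j^2$ (Elastic Net) added to the squared-error sum.

```python
def soft_threshold(b, threshold):
    """Shrink b toward zero by threshold, setting small values exactly to zero."""
    if b > threshold:
        return b - threshold
    if b < -threshold:
        return b + threshold
    return 0.0

def ridge(b_ols, lam):
    """Ridge shrinks every coefficient proportionally but never to zero."""
    return b_ols / (1.0 + lam)

def lasso(b_ols, lam):
    """LASSO subtracts lam / 2 in magnitude and zeroes out weak coefficients."""
    return soft_threshold(b_ols, lam / 2.0)

def elastic_net(b_ols, lam1, lam2):
    """Elastic Net applies the LASSO threshold and then the Ridge scaling."""
    return soft_threshold(b_ols, lam1 / 2.0) / (1.0 + lam2)

ols_coefficients = [1.0, 0.3, -0.8]  # hypothetical OLS estimates
ridge_fit = [ridge(b, 1.0) for b in ols_coefficients]
lasso_fit = [lasso(b, 1.0) for b in ols_coefficients]
```

Only the LASSO-type threshold drops the weak middle coefficient entirely, which is the variable-selection property referred to above.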

4.4 Tree-based Models

Random Forest (Breiman, 2001): trains many trees by bootstrap aggregation, but uses only $m < p$ randomly selected features at each split to de-correlate the trees.

Gradient Boosting (Friedman, 2001):

\hat{f}_m(\mathbf{x}) = \hat{f}_{m-1}(\mathbf{x}) + \eta \cdot h_m(\mathbf{x})
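The update rule can be sketched with the simplest possible weak learner, a one-split regression stump fitted to the current residuals under squared loss. This is a toy illustration on made-up one-dimensional data, not the algorithm of any particular library; `fit_stump`, `boost`, and `mse` are hypothetical helper names.

```python
def fit_stump(x, residuals):
    """Best single-split regression stump under squared loss."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def boost(x, y, rounds, eta=0.1):
    """f_m = f_{m-1} + eta * h_m, where h_m is fitted to the residuals of f_{m-1}."""
    base = sum(y) / len(y)
    learners = []
    predictions = [base] * len(y)
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        h = fit_stump(x, residuals)
        learners.append(h)
        predictions = [pi + eta * h(xi) for pi, xi in zip(predictions, x)]
    return lambda xi: base + eta * sum(h(xi) for h in learners)

def mse(model, x, y):
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)

x_train = list(range(10))
y_train = [xi ** 2 for xi in x_train]  # made-up nonlinear target
errors = [mse(boost(x_train, y_train, rounds), x_train, y_train) for rounds in (0, 5, 50)]
```

The in-sample error falls with every round, which is exactly why the learning rate $\eta$ and the number of rounds have to be chosen on out-of-sample data.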

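Gu, Kelly, and Xiu (2020, Review of Financial Studies) applied Random Forest, Gradient Boosting, Neural Networks, and other methods to US stock market data and reported out-of-sample return predictability that was meaningfully better than that of linear models.

4.5 Neural Networks and Deep Learning

Feedforward Neural Network (FNN): By the universal approximation theorem, a sufficiently wide network with a single hidden layer can approximate any continuous function on a compact domain to any desired precision. However, financial data impose fundamental limits:

Financial datasets are far smaller than image or text datasets and have a low signal-to-noise ratio
Without strong regularization, overfitting is severe
Sensitivity to hyperparameters is high, so out-of-sample performance is hard to reproduce

LSTM: used to capture long-range dependence in financial time series. Empirical results are mixed.

Transformer and Attention Mechanism: financial applications show promising results in long-horizon time-series forecasting, but their economic significance is still being tested.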

4.6 Reinforcement Learning for Portfolio Optimization

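RL frames portfolio management as a sequential decision problem rather than a prediction problem.

State $s_t$ : current portfolio weights and market state
Action $a_t$ : portfolio rebalancing weight vector $\mathbf{w}_t$
Reward $r_t$ : portfolio return (net of transaction costs)
Policy $\pi(a_t | s_t)$ : stochastic mapping from states to actions

However, the practical limits of RL-based portfolio strategies are clear: poor generalization due to non-stationarity, low sample efficiency, subjectivity in reward function design, and difficulty in modeling real-world constraints.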

4.7 Model Evaluation and Limits

Bias-Variance Tradeoff

E\left[(y - \hat{f}(\mathbf{x}))^2\right] = \underbrace{\left(f(\mathbf{x}) - E[\hat{f}(\mathbf{x})]\right)^2}_{\text{Bias}^2} + \underbrace{E\left[\left(\hat{f}(\mathbf{x}) - E[\hat{f}(\mathbf{x})]\right)^2\right]}_{\text{Variance}} + \underbrace{\sigma_\varepsilon^2}_{\text{Irreducible Noise}}

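Because irreducible noise is overwhelmingly large in financial data, the difference in actual predictive performance between complex and simple models is often negligible. Even Gu, Kelly, and Xiu (2020) report that the out-of-sample $R^2$ of their models is at most around 2~3%.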

The Structural Critique of ML in Finance

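Key takeaway: financial ML is less a 'better predictor' than a 'more flexible function approximator'. Flexibility is an advantage when historical data are abundant and patterns are stable, but the non-stationarity and adaptive nature of financial markets offset that advantage. Only model design grounded in economic structure, combined with rigorous out-of-sample evaluation, can avoid the trap of data snooping.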

5. The TBTF Portfolio Strategy

5.1 Motivation and Design Principles

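Changes in the economy and the social selection of companies appear to involve a Matthew effect, in which productive resources, including physical capital and human capital, become ever more concentrated in the few companies that already hold the most resources.

Core design principles:

The world before and after 1995 is different, so market data from 1994 to 2023 were used.
Build a portfolio whose average return is similar to the excess return of the overall stock market portfolio but whose volatility and loss risk are lower.
Use monthly rebalancing, the main rebalancing period of large funds.
Hold no short-sell positions; the portfolio is always long-only.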

5.2 Portfolio Allocation Strategy

First, fix the number of stocks. The TBTF strategy uses 100 (about 3% of all US stocks).

Inter-industry weighting scheme: value-weighted. Follow the market-cap weights at the monthly rebalancing date. This exploits industry beta momentum.
Intra-industry asset selection scheme: within each industry, pick a fixed number of stocks from the top by market cap. This exploits individual size momentum.
Intra-industry asset weighting: equal-weighted. Because market-cap rankings within an industry can change next month, $1/n$ is assumed to be the optimal choice under uncertainty.
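The three rules above can be sketched as a single weighting function. This is a minimal illustration under stated assumptions rather than the implementation used for the reported results: the tickers, industries, and market caps are made up, each industry's weight is taken from the total market cap of all its listed stocks, and the same number of leaders (`top_k`) is picked in every industry.

```python
from collections import defaultdict

def tbtf_weights(stocks, top_k):
    """stocks: list of (ticker, industry, market_cap). Returns long-only weights."""
    by_industry = defaultdict(list)
    for ticker, industry, cap in stocks:
        by_industry[industry].append((cap, ticker))
    total_cap = sum(cap for _, _, cap in stocks)

    weights = {}
    for industry, members in by_industry.items():
        # Inter-industry: value-weighted by the industry's share of total market cap.
        industry_weight = sum(cap for cap, _ in members) / total_cap
        # Intra-industry selection: the top_k largest stocks by market cap.
        leaders = sorted(members, reverse=True)[:top_k]
        # Intra-industry weighting: equal (1/n) among the selected leaders.
        for _, ticker in leaders:
            weights[ticker] = industry_weight / len(leaders)
    return weights

# Hypothetical universe of five stocks in two industries
universe = [
    ("A1", "tech", 600.0), ("A2", "tech", 300.0), ("A3", "tech", 100.0),
    ("B1", "energy", 700.0), ("B2", "energy", 300.0),
]
weights = tbtf_weights(universe, top_k=2)
```

Rerunning the function on each monthly rebalancing date with updated market caps gives the new target weights; turnover arises only when industry shares shift or a leader drops out of its industry's top ranks.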

5.3 Formal Model

Notation:

$x$ = monthly excess return of a stock over a bond
$D$ = number of stochastically independent industries (e.g. $D=10$ )
$T$ = number of historical time-series observations (e.g. $T=60$ months)
$\mathbf{x}_d=(x_1, x_2, ..., x_T)'_d$ is a ( $T \times 1$ ) time-series vector of excess returns on the value-weighted portfolio of industry $d$
$w=(w_1,\dots,w_D)'$ is a ( $D \times 1$ ) vector of the current market-cap weights of the industries at time $T$

Hypothesis 1 (Sectoral independence): The random variables $\mathbf{x}_{d_i}$ and $\mathbf{x}_{d_j}$ are mutually independent for all $i \neq j$ with $i,j=1,2,\dots,D$ , and they follow a joint probability distribution with mean vector $\mu:=E(X)$ and covariance matrix $V(X)=\Omega_{D\times D}$ , which is diagonal because of the independence.

Hypothesis 2 (Linear risk-premium): $\Omega$ is a bijective linear map between the conditional expected return $\mu$ and the revealed preference $w$ for the future such that $\mu=\Omega w$ .

Hypothesis 3: The industry portfolio with the current market-cap weights $w_t$ is a conditionally optimal portfolio because it minimizes uncertainties in the future consumption basket or maximizes diversification benefit given a fixed amount of current capital.

For the expected-return-beta representation, if $\hat{\mu}_{t+1} = (\hat{\Omega} w)_t+\mathbf{\alpha}_{t+1}$ where $E(\alpha_{t+1})=0$ , then the current market-cap weight on each industry $w_t$ is the present value of the future cash flow $\hat{\mu}_{t+1}$ discounted with the factor $\hat{\Omega}^{-1}$ .
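Because $\Omega$ is diagonal under Hypothesis 1, the map $\mu=\Omega w$ and its inverse reduce to element-wise multiplication and division. A minimal sketch with hypothetical variances and weights for $D=3$ industries:

```python
def expected_returns(variances, weights):
    """mu = Omega w for a diagonal Omega with the given variances."""
    return [v * w for v, w in zip(variances, weights)]

def implied_weights(variances, mu):
    """w = Omega^{-1} mu, the inverse map (requires strictly positive variances)."""
    return [m / v for v, m in zip(variances, mu)]

# Hypothetical monthly variances and market-cap weights for D = 3 industries
variances = [0.0025, 0.0036, 0.0049]
cap_weights = [0.5, 0.3, 0.2]
mu = expected_returns(variances, cap_weights)
recovered = implied_weights(variances, mu)
```

The round trip recovers the original weights, which is what bijectivity of $\Omega$ amounts to in the diagonal case.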

5.4 Why Equal-weighted Within Industry Leading Stocks?

If asset prices do not change in perfect synchrony, a diversified portfolio will have less variance than the weighted average variance of its constituent assets, and often less volatility than the least volatile of its constituents.
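A two-asset example makes the statement concrete. The volatilities (20% and 30%) and the correlations below are hypothetical; `portfolio_variance` is an illustrative helper that evaluates $w'Cw$ .

```python
def portfolio_variance(weights, cov):
    """w' C w for a covariance matrix given as a list of rows."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))

equal_weights = [0.5, 0.5]

# Uncorrelated assets: variances 0.04 and 0.09, zero covariance
cov_uncorrelated = [[0.04, 0.0], [0.0, 0.09]]
var_uncorrelated = portfolio_variance(equal_weights, cov_uncorrelated)

# Perfectly correlated assets: covariance = 0.2 * 0.3 = 0.06
cov_perfect = [[0.04, 0.06], [0.06, 0.09]]
var_perfect = portfolio_variance(equal_weights, cov_perfect)

# Weighted average of the two individual variances
average_variance = sum(w * cov_uncorrelated[i][i] for i, w in enumerate(equal_weights))
```

With zero correlation the equal-weighted portfolio's variance falls below that of the less volatile asset. With perfect correlation it stays above the less volatile asset's variance but is still below the weighted average variance.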

6. Conclusion

An arbitrage opportunity exists only for people with a vested interest in an exchange market. There also exist limits to the no-arbitrage condition in the exchange market, due to competitive market failures. If there are no dominant trading strategies, then the one-price hypothesis holds. However, the converse is not necessarily true.

Over long horizons, passively managed funds consistently outperform actively managed funds. Relative return is a measure of the return or profit of an investment portfolio relative to a theoretical passive reference portfolio or benchmark.

Factor ETFs are index funds that use enhanced indexing, which combines active management with passive management in an attempt to beat the returns of an index. My TBTF portfolio is a factor-strategy ETF of up to 100 stocks drawn from Nasdaq, NYSE, and AMEX combined. It is judged by comparing the distribution of its monthly returns with those of DIA (30 stocks, 1998), QQQ (100 stocks, 1999), SPY (500 stocks, 1993), and the FF market portfolio (5,000 stocks).

If real markets are inefficient, then the probability that a model which predicts the future by assuming market efficiency and rationality is actually right is about the same as the probability that putting garbage into a garbage can produces a delicious meal. I hope that economic research will focus less on predictive models that always assume efficient and rational markets and more on ways to make realistically inefficient financial markets more efficient.

6.1 Empirical Results Summary

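This paper indirectly predicts the course of stock rates of return in the US stock market. It assumes that the structures of the stock market most clearly preserved in the near future are, across industries, the market beta (covariance with the overall stock market) and, within each industry, size (current market-cap rank).

Using data from the past 30 years, we ran an out-of-sample test of a monthly-rebalanced stock portfolio. Results for the TBTF portfolio:

From 1999 to 2023 the stock market went through three major shocks: the dot-com bubble of the early 2000s, the subprime mortgage crisis of 2008, and the COVID-19 shock of 2020
Turnover among the assets selected into the TBTF portfolio was low: starting from 100 companies, about 300 companies in total were held over 25 years
Despite the three major shocks, the distribution of TBTF portfolio excess returns was symmetric
The absolute value of the minimum excess return was much smaller than that of the market portfolio
Over the roughly 40% probability range in which losses are expected, the TBTF portfolio excess return stochastically dominated the market portfolio excess return
The distribution of TBTF portfolio excess returns approximates a normal distribution centered on the mean of the market portfolio excess return with a relatively small variance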

6.2 Academic Contribution

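The importance of the structural factors in the TBTF portfolio offers a reason why, in predicting the cost of capital of any company's stock, the market premium and the size premium are given more weight in practice than any other risk factor. Expressed as a linear model of stock returns in direct risk-compensation form, the predictive model of this paper is a two-factor model.

In addition, by connecting the mathematical expressions used in various numerical models to principles of algebra and geometry, we hope to show more clearly both the limits of numerical models and the simplicity hidden in what looks complex.

Because the TBTF portfolio maintained the average return while stochastically dominating in the probability of avoiding disastrous losses, we expect it to be useful for running safer ETFs. And because its rebalancing turnover was low, we expect it to reduce transaction costs for investors trading with small amounts of capital and thereby improve the efficiency of their investments.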

References

Arrow, K. J. & Debreu, G. (1954). Existence of an Equilibrium for a Competitive Economy.
Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics.
Breiman, L. (2001). Random Forests. Machine Learning.
Campbell, J. Y. & Shiller, R. J. (1988). The Dividend-Price Ratio and Expectations of Future Dividends and Discount Factors. Review of Financial Studies.
Cochrane, J. H. (2001). Asset Pricing. Princeton University Press.
Cochrane, J. H. (2011). Presidential Address: Discount Rates. Journal of Finance.
Delbaen, F. & Schachermayer, W. (2006). The Mathematics of Arbitrage. Springer Finance.
Engle, R. F. (1982). Autoregressive Conditional Heteroskedasticity. Econometrica.
Engle, R. F. (2002). Dynamic Conditional Correlation. Journal of Business & Economic Statistics.
Fama, E. F. (1965, 1970). Efficient Capital Markets. Journal of Finance.
Fama, E. F. (1991). Efficient Capital Markets: II. Journal of Finance.
Fama, E. F. & French, K. R. (1993). Common Risk Factors in the Returns on Stocks and Bonds. Journal of Financial Economics.
Fama, E. F. & French, K. R. (1997). Industry Costs of Equity. Journal of Financial Economics.
Fama, E. F. & MacBeth, J. D. (1973). Risk, Return, and Equilibrium. Journal of Political Economy.
Frank, R. H. & Cook, P. J. (1996). The Winner-Take-All Society.
Freyberger, J., Neuhierl, A., & Weber, M. (2020). Dissecting Characteristics Nonparametrically. Review of Financial Studies.
Friedman, J. H. (2001). Greedy Function Approximation. Annals of Statistics.
Friedman, M. (1953). Essays in Positive Economics.
Gu, S., Kelly, B., & Xiu, D. (2020). Empirical Asset Pricing via Machine Learning. Review of Financial Studies.
Hansen, L. P. (1982). Large Sample Properties of Generalized Method of Moments Estimators. Econometrica.
Harvey, C. R., Liu, Y., & Zhu, H. (2016). …and the Cross-Section of Expected Returns. Review of Financial Studies.
Kozak, S., Nagel, S., & Santosh, S. (2020). Shrinking the Cross-Section. Journal of Financial Economics.
Lo, A. W. (2004). The Adaptive Markets Hypothesis. Journal of Portfolio Management.
Lucas, R. E. (1978). Asset Prices in an Exchange Economy. Econometrica.
Mandelbrot, B. (1963). The Variation of Certain Speculative Prices. Journal of Business.
Merton, R. K. (1968). The Matthew Effect in Science. Science.
Samuelson, P. A. (1947). Foundations of Economic Analysis.
Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. JRSS Series B.
Vaswani, A. et al. (2017). Attention Is All You Need. NeurIPS.