Julian Henry's Blog

🔍 Ráfaga: Mean-Reverting Logarithmic Modeling of VIX

09 Mar 2026

View on GitHub

Ráfaga – Spanish for “gust” – began as a gust of hubris, a conviction that I could model the CBOE Volatility Index with a handful of characteristic functions, some complex integration, and a prayer. The source material: Qunfang Bao’s 2013 PhD thesis, Mean-Reverting Logarithmic Modeling of VIX, a paper I found sitting in the MPRA archives like a loaded weapon waiting for someone dumb enough to pick it up.

I picked it up.

The idea is seductive in its clarity. The logarithm of the VIX follows an Ornstein-Uhlenbeck process. Mean-reverting. Elegant. You bolt on Poisson jumps with exponential size distributions to capture the fat tails, and suddenly you have a model that prices VIX futures and options with ~97% accuracy. The theory is beautiful. The implementation is where Python tried to kill me.

But before we get to the carnage, let me build the math from scratch. If you survived a semester of calculus and remember what an integral is, you can follow this. I will take you from a rubber band to a jump-diffusion characteristic function, one layer at a time.

Part I: The Rubber Band – What Mean Reversion Means

Ordinary Differential Equations and Springs

Imagine you have a number $Y$ that represents, say, the log of a stock’s volatility. Right now $Y$ is at some value, and it has a “home” it wants to return to, which we call $\theta$. The simplest model of this homing behavior is an ordinary differential equation:

\[\frac{dY}{dt} = \kappa(\theta - Y)\]

This says: the rate of change of $Y$ is proportional to how far $Y$ is from home. If $Y$ is above $\theta$, the quantity $(\theta - Y)$ is negative, so $Y$ decreases. If $Y$ is below $\theta$, the quantity is positive, so $Y$ increases. The constant $\kappa > 0$ controls the speed. Big $\kappa$, fast return. Small $\kappa$, slow drift.

You solved this in your first ODE course. Separate variables:

\[\frac{dY}{\theta - Y} = \kappa\,dt\]

Integrate both sides:

\[-\ln\lvert\theta - Y\rvert = \kappa t + C\]

Exponentiate, apply the initial condition $Y(0) = Y_0$, and you get:

\[Y(t) = \theta + (Y_0 - \theta)\,e^{-\kappa t}\]

This is exponential decay toward $\theta$. A rubber band. A spring. Nothing stochastic yet, just a deterministic pull toward home. The “half-life” of a displacement from $\theta$ is $t_{1/2} = \frac{\ln 2}{\kappa}$. If $\kappa = 5$ (annualized), the half-life is about 35 trading days – roughly 50 calendar days. If the VIX spikes from 15 to 40, you expect it to be about halfway back to its long-term mean in under two months. This is not a guess. This is what 20 years of daily VIX data confirm when you fit an AR(1) regression to the log-VIX time series.
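The arithmetic is easy to check. A minimal sketch – my numbers, not the repo's, with $\theta$ pinned at $\ln 15$ purely for illustration:

```python
import math

kappa = 5.0                                  # annualized reversion speed
t_half = math.log(2) / kappa                 # half-life in years
print(round(t_half * 252))                   # half-life in trading days

# Deterministic solution Y(t) = theta + (Y0 - theta) * exp(-kappa * t),
# applied to Y = ln(VIX) with a spike from 15 to 40
# (theta = ln 15 is an illustrative choice, not a fitted value)
theta, y0 = math.log(15), math.log(40)
for months in (1, 2, 3):
    t = months / 12
    vix = math.exp(theta + (y0 - theta) * math.exp(-kappa * t))
    print(months, round(vix, 1))
```

The displacement halves roughly every seven calendar weeks, which is why a 40-handle VIX with no fresh bad news is a decaying asset.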

Adding Noise: The Stochastic Differential Equation

The deterministic model is too clean. The VIX does not slide back to $\theta$ along a smooth exponential curve. It wobbles. It jitters. It sometimes mean-reverts in a day and sometimes takes six months. We need randomness.

In calculus you learned about the Riemann integral. In stochastic calculus, there is an analogous object called a Wiener process (or Brownian motion), denoted $W_t$. The key properties:

  1. $W_0 = 0$
  2. Increments $W_{t+\Delta t} - W_t$ are normally distributed with mean 0 and variance $\Delta t$
  3. Non-overlapping increments are independent

Think of $W_t$ as the cumulative sum of infinitely many infinitesimal coin flips. At any finite time $t$, $W_t \sim \mathcal{N}(0, t)$. The paths are continuous but nowhere differentiable – a fractal jitter.

We bolt this onto our rubber band:

\[dY_t = \kappa(\theta - Y_t)\,dt + \sigma\,dW_t\]

This is the Ornstein-Uhlenbeck (OU) process, and it is the foundation of everything that follows. The first term is the deterministic pull toward $\theta$. The second term is random noise scaled by $\sigma$, the “volatility of volatility” (or vol-of-vol). Every instant, the process gets tugged toward home and simultaneously kicked by a random perturbation.

What does the solution look like? Apply the integrating factor $e^{\kappa t}$ (just like you would for a first-order linear ODE) and integrate:

\[Y_t = \theta + (Y_0 - \theta)e^{-\kappa t} + \sigma\int_0^t e^{-\kappa(t-s)}\,dW_s\]

The first two terms are the deterministic solution you already know. The third term is a stochastic integral – a weighted sum of all the random shocks from time 0 to $t$, where recent shocks (small $t - s$) are weighted heavily and old shocks (large $t - s$) are exponentially forgotten. This is the mean-reverting memory of the process.

Since the stochastic integral is a sum of normals (with deterministic weights), $Y_t$ is itself normally distributed:

\[Y_t \sim \mathcal{N}\left(\theta + (Y_0 - \theta)e^{-\kappa t},\;\frac{\sigma^2}{2\kappa}(1 - e^{-2\kappa t})\right)\]

The mean decays exponentially toward $\theta$. The variance saturates at $\frac{\sigma^2}{2\kappa}$ as $t \to \infty$. This is the stationary distribution. No matter where the process starts, it eventually forgets its initial condition and fluctuates around $\theta$ with a fixed spread determined by the ratio $\sigma^2 / \kappa$. High noise and low reversion means wide fluctuations. High reversion and low noise means the process hugs $\theta$ tightly.
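You can verify the stationary moments by brute force. An Euler-Maruyama sketch with illustrative parameters (not fitted values):

```python
import numpy as np

# Euler-Maruyama simulation of dY = kappa*(theta - Y)dt + sigma dW,
# checked against the closed-form stationary moments
rng = np.random.default_rng(0)
kappa, theta, sigma = 5.0, 2.8, 1.0
dt, n_steps, n_paths = 1/252, 252 * 10, 2000

y = np.full(n_paths, theta + 1.0)            # start displaced from home
for _ in range(n_steps):
    y += kappa * (theta - y) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

print(round(float(y.mean()), 2))             # near theta = 2.8
print(round(float(y.var()), 2))              # near sigma^2/(2*kappa) = 0.1
```

After ten simulated years the initial displacement is gone ($e^{-50}$ of it survives) and the cross-sectional mean and variance sit on the stationary values.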

This is the MRLR model: Mean-Reverting Logarithmic. We set $Y_t = \ln(\text{VIX}_t)$, so the VIX itself is log-normally distributed around $e^\theta$. The model has three parameters: $\kappa$ (reversion speed), $\theta$ (long-term log-VIX level), and $\sigma$ (vol-of-vol). Estimate them from data, and you can price VIX futures and options.

But Why Mean-Reverting? Why Not a Random Walk?

This is worth pausing on, because the choice of mean reversion is not cosmetic. It is the single most consequential modeling decision in the entire project, and if you get it wrong, everything downstream is garbage.

Stock prices are typically modeled as geometric Brownian motion – a random walk with drift. The idea is that Apple’s stock price today does not “want” to return to any particular level. There is no gravitational home. If AAPL is at 200, it is just as likely to wander to 250 as to drift back to 150. The efficient market hypothesis says the current price already reflects all information, so there is no force pulling it anywhere. You model it as a random walk and that is a reasonable first approximation.

The VIX is not a stock price. The VIX is a measure of fear.

Fear has a baseline. Humans cannot sustain panic indefinitely. Markets cannot sustain 80-vol indefinitely. When the VIX is at 40, traders are terrified, hedging is expensive, and the entire options market is priced for Armageddon. But Armageddon, as a rule, does not last. The panic subsides. The hedges get unwound. Implied volatility collapses back toward its long-term average. Conversely, when the VIX is at 10, complacency reigns. Everyone is short vol. Everyone is selling puts. And then some exogenous shock – a pandemic, a war, a bank failure, a leveraged fund blowing up – arrives, and the VIX spikes back up. The floor is soft but real.

This is mean reversion. The VIX has a home. It wanders away from home – sometimes violently – but it always comes back. Twenty-two years of daily data confirm this: fit an AR(1) regression to log-VIX, and the coefficient $\beta$ is significantly less than 1. If $\beta = 1$, you have a random walk. If $\beta < 1$, you have mean reversion. For the VIX, $\beta \approx 0.98$ at a daily frequency, which implies an annualized $\kappa$ of roughly 5. The VIX is not a random walk. It is a rubber band anchored to a long-term mean of about $e^{2.8} \approx 16$.

If you modeled the VIX as a random walk, your futures prices would be wrong (they would not converge to a long-term level), your option prices would be wrong (you would misprice the term structure), and your hedging ratios would be wrong (you would ignore the gravitational pull). Mean reversion is not an assumption. It is a fact about what the VIX is – a fear gauge with a home address.

The Ornstein-Uhlenbeck process is the simplest continuous-time model that captures this fact. It is exactly a random walk plus gravity. That is why Bao chose it, and that is why it works.

Part II: Pricing – From Process to Product

VIX Futures

A VIX futures contract is an agreement to exchange cash based on the VIX at some future date $T$. Under risk-neutral pricing, the futures price $F(t, T)$ is the expected value of $\text{VIX}_T$ given today’s information:

\[F(t, T) = \mathbb{E}^{\mathbb{Q}}[\text{VIX}_T \,\vert\, \mathcal{F}_t]\]

Since $Y_T = \ln(\text{VIX}_T)$ is normal under MRLR, $\text{VIX}_T = e^{Y_T}$ is log-normal, and the expected value of a log-normal random variable $e^X$ where $X \sim \mathcal{N}(\mu, \sigma^2)$ is $e^{\mu + \sigma^2/2}$. Plug in our expressions for the mean and variance of $Y_T$:

\[F(t, T) = \exp\!\left(e^{-\kappa\tau}\ln(\text{VIX}_t) + \theta(1 - e^{-\kappa\tau}) + \frac{\sigma^2}{4\kappa}(1 - e^{-2\kappa\tau})\right)\]

where $\tau = T - t$. Written more compactly:

\[F(t, T) = \text{VIX}_t^{\,\phi} \cdot M\]

where $\phi = e^{-\kappa\tau}$ is the mean-reversion factor and $M$ absorbs the rest. As $\tau \to \infty$, $\phi \to 0$, and the futures price converges to a constant determined solely by $\theta$ and $\sigma/\kappa$. The current VIX level becomes irrelevant for long-dated futures. This is the term structure of VIX futures, and it is why VIX futures are almost always in contango (upward-sloping) – the long-term expected VIX sits above the current spot when VIX is low.
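The contango story falls out of the formula directly. A sketch with illustrative parameters (`mrlr_futures` is my name, not the repo's):

```python
import numpy as np

# VIX futures under MRLR:
#   F(t,T) = exp(phi*ln(VIX_t) + theta*(1-phi)
#                + sigma^2/(4*kappa)*(1 - exp(-2*kappa*tau))),  phi = exp(-kappa*tau)
def mrlr_futures(vix_t, tau, kappa=5.0, theta=2.8, sigma=1.0):
    phi = np.exp(-kappa * tau)
    conv = sigma**2 / (4 * kappa) * (1 - np.exp(-2 * kappa * tau))
    return np.exp(phi * np.log(vix_t) + theta * (1 - phi) + conv)

# Low spot VIX: the curve slopes up (contango) toward the long-run level
for tau in (1/12, 3/12, 6/12, 2.0):
    print(round(float(mrlr_futures(12.0, tau)), 2))
```

With spot at 12 the curve climbs monotonically toward $e^{\theta + \sigma^2/(4\kappa)} \approx 17.3$, the $\tau \to \infty$ limit where the spot has been entirely forgotten.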

VIX Options (Black-Scholes Variant)

A European call option on the VIX with strike $K$ and expiry $T$ pays $\max(\text{VIX}_T - K, 0)$ at expiry. Under risk-neutral pricing:

\[C(t, T, K) = e^{-r\tau}\,\mathbb{E}^{\mathbb{Q}}[\max(\text{VIX}_T - K, 0)]\]

Since $\text{VIX}_T$ is log-normal under MRLR, this expectation has a closed-form solution identical in structure to the Black-Scholes formula. Define:

\[\sigma_{\text{eff}}^2 = \frac{\sigma^2}{2\kappa}(1 - e^{-2\kappa\tau})\]

This is the total variance of $\ln(\text{VIX}_T)$ over the interval $[t, T]$. Then:

\[d_1 = \frac{\ln(F/K) + \frac{1}{2}\sigma_{\text{eff}}^2}{\sigma_{\text{eff}}}, \qquad d_2 = d_1 - \sigma_{\text{eff}}\]

\[C(t, T, K) = e^{-r\tau}\left(F\,\Phi(d_1) - K\,\Phi(d_2)\right)\]

where $\Phi$ is the standard normal CDF. This is it. The MRLR option price. It looks like Black-Scholes because it is Black-Scholes – just with a mean-reverting forward and an effective volatility that accounts for the OU dynamics instead of geometric Brownian motion.
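As a sketch (illustrative parameters; `mrlr_call` is my name, not the repo's):

```python
from math import exp, log, sqrt
from statistics import NormalDist

# MRLR call on VIX: Black-Scholes on the futures price F with effective
# total log-variance sigma_eff^2 = sigma^2/(2*kappa)*(1 - exp(-2*kappa*tau))
def mrlr_call(vix_t, K, tau, r=0.05, kappa=5.0, theta=2.8, sigma=1.0):
    phi = exp(-kappa * tau)
    var_eff = sigma**2 * (1 - exp(-2 * kappa * tau)) / (2 * kappa)
    F = exp(phi * log(vix_t) + theta * (1 - phi) + 0.5 * var_eff)
    s = sqrt(var_eff)
    d1 = (log(F / K) + 0.5 * var_eff) / s
    d2 = d1 - s
    N = NormalDist().cdf
    return exp(-r * tau) * (F * N(d1) - K * N(d2))

# Near-the-money call, 33 days out, spot VIX 16.25
print(round(mrlr_call(16.25, 17.0, 33/365), 2))
```

Three closed-form lines: build the forward, build the effective volatility, hand both to Black-Scholes.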

The problem: this works fine for at-the-money options but fails on the wings. The log-normal distribution from pure diffusion does not have fat enough tails to explain the prices of deep out-of-the-money VIX calls. The market prices those calls as though the VIX can suddenly spike 10 points in a day – because it can. The MRLR model, with its smooth Gaussian increments, says this is vanishingly unlikely. The market disagrees.

We need jumps. Jump like Jordan – the VIX does not get to 80 by dribbling.

Part III: The Jumps – Making the Model Honest

What Is a Jump Process?

In the OU process, the path of $Y_t$ is continuous. It wiggles, but it never teleports. In reality, the VIX can leap from 15 to 30 overnight (February 5, 2018, anyone?). A continuous process cannot produce this behavior without an absurdly large $\sigma$, which would then overestimate volatility the rest of the time.

Enter the Poisson process $N_t$. This is a counting process: $N_t$ counts the number of “events” (jumps) that have occurred by time $t$. The jumps arrive randomly at rate $\lambda$ – on average, $\lambda$ jumps per unit time. The probability of exactly $k$ jumps in an interval of length $\Delta t$ is:

\[P(N_{t+\Delta t} - N_t = k) = \frac{(\lambda \Delta t)^k}{k!}\,e^{-\lambda \Delta t}\]

For small $\Delta t$, there is at most one jump (probability $\approx \lambda \Delta t$) or no jump (probability $\approx 1 - \lambda \Delta t$). The Poisson process is the simplest model of rare, discrete events.

Now, each time a jump arrives, the VIX does not just nudge – it leaps by a random amount $J$. In Bao’s model, the jump sizes are exponentially distributed with parameter $\eta > 0$:

\[J \sim \text{Exp}(\eta), \qquad f_J(j) = \eta\,e^{-\eta j}, \quad j \geq 0\]

The mean jump size is $1/\eta$. The jumps are always positive – the VIX jumps up. This is physically correct: volatility spikes are sudden upward moves driven by panic. Volatility does not “spike” downward; it decays downward, which the mean-reversion term already handles. The asymmetry between jumps (sudden, upward) and decay (gradual, gravitational) is the fundamental dynamic of the VIX, and the MRLRJ model captures both.
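The compound-Poisson bookkeeping is worth sanity-checking: with intensity $\lambda$ jumps per year and $\text{Exp}(\eta)$ sizes, the expected total jump per year is $\lambda/\eta$. A sketch with illustrative parameters:

```python
import numpy as np

# Compound Poisson sanity check: expected total jump per year = lam / eta
rng = np.random.default_rng(1)
lam, eta, n_years = 2.0, 2.0, 50_000

counts = rng.poisson(lam, size=n_years)          # jumps per simulated year
totals = np.array([rng.exponential(1/eta, size=k).sum() for k in counts])

print(round(float(counts.mean()), 2))            # near lam = 2
print(round(float(totals.mean()), 2))            # near lam/eta = 1
```

Two jumps a year averaging half a log-point each: a full log-point of upward log-VIX shock per year, for the mean-reversion term to grind back down.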

The MRLRJ Stochastic Differential Equation

Bolt the Poisson jumps onto the OU process:

\[dY_t = \kappa(\theta - Y_t)\,dt + \sigma\,dW_t + J_t\,dN_t\]

Three terms:

  1. $\kappa(\theta - Y_t)\,dt$ – the deterministic pull toward home
  2. $\sigma\,dW_t$ – continuous random noise
  3. $J_t\,dN_t$ – discrete random jumps: at each Poisson event (on average $\lambda$ times per year), $Y_t$ increases by a random amount $J_t \sim \text{Exp}(\eta)$

This is the MRLRJ model: Mean-Reverting Logarithmic with Random Jumps. Five parameters now: $\kappa$, $\theta$, $\sigma$, $\lambda$ (jump frequency), and $\eta$ (inverse mean jump size).
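An Euler sketch of one MRLRJ path, all five parameters illustrative:

```python
import numpy as np

# Euler step for MRLRJ: dY = kappa*(theta - Y)dt + sigma dW + J dN,
# J ~ Exp(eta), N Poisson with intensity lam. One simulated year of log-VIX.
rng = np.random.default_rng(2)
kappa, theta, sigma, lam, eta = 5.0, 2.8, 1.0, 2.0, 2.0
dt, n = 1/252, 252

y = np.empty(n + 1)
y[0] = np.log(16.0)
for i in range(n):
    n_jumps = rng.poisson(lam * dt)                       # usually 0, rarely 1
    jump = rng.exponential(1/eta, size=n_jumps).sum()     # upward only
    y[i + 1] = (y[i] + kappa * (theta - y[i]) * dt
                + sigma * np.sqrt(dt) * rng.standard_normal() + jump)

vix = np.exp(y)
print(round(float(vix.min()), 1), round(float(vix.max()), 1))
```

The path wiggles around $e^\theta$, occasionally teleports upward, then gets dragged back down by the drift – the asymmetry described above, in thirteen lines.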

Why Jumps Break Closed-Form Pricing

Under MRLR, $Y_T$ was normally distributed, so $\text{VIX}_T$ was log-normal, and we used the Black-Scholes formula. Under MRLRJ, $Y_T$ is not normally distributed. The jump component adds a mixture of shifted exponentials to the Gaussian base, creating a distribution with a heavier right tail – exactly the fat tail the market is pricing.

There is no closed-form expression for the CDF of this distribution. You cannot write down $d_1$ and $d_2$ and call it a day. But there is another way in.

The Characteristic Function

Here is where we need one idea from analysis that you may not have seen in a standard calculus sequence, but which is straightforward once stated.

The characteristic function of a random variable $X$ is:

\[\psi_X(u) = \mathbb{E}[e^{iuX}] = \int_{-\infty}^{\infty} e^{iux}\,f_X(x)\,dx\]

This is the Fourier transform of the probability density $f_X$. The function $\psi_X(u)$ encodes all distributional information about $X$. If you know $\psi_X$, you can recover $f_X$ by inverse Fourier transform. You can compute any moment by differentiating $\psi_X$ at $u = 0$. And crucially, you can compute option prices.

Why use characteristic functions instead of the density? Because for many processes – including jump-diffusions – the characteristic function has a closed-form expression even when the density does not.

For the MRLRJ model, after considerable algebra (Bao’s paper works through it in detail), the characteristic function of $Y_T = \ln(\text{VIX}_T)$ given today’s state is:

\[\psi(u) = \exp\!\Big(iu\phi\ln(\text{VIX}_t) + iu\theta(1-\phi) - \frac{u^2\sigma^2}{4\kappa}(1 - e^{-2\kappa\tau}) + \frac{\lambda}{\kappa}\ln\frac{\eta - iu\phi}{\eta - iu}\Big)\]

where $\phi = e^{-\kappa\tau}$ as before. Let me walk through each term so you see where it comes from:

Term 1: $iu\phi\ln(\text{VIX}_t)$. This is the initial condition, decayed by mean reversion. The factor $\phi = e^{-\kappa\tau}$ tells you how much the current VIX level matters at expiry. For short-dated options ($\tau$ small), $\phi \approx 1$ and the current VIX dominates. For long-dated options, $\phi \to 0$ and the initial condition fades.

Term 2: $iu\theta(1-\phi)$. This is the mean reversion target. As $\tau \to \infty$, $(1-\phi) \to 1$, and this term becomes $iu\theta$ – the characteristic function converges to one centered at $\theta$.

Term 3: $-\frac{u^2\sigma^2}{4\kappa}(1 - e^{-2\kappa\tau})$. This is the diffusion variance. It is the $-u^2\sigma^2/2$ term you see in the characteristic function of any Gaussian, modified by the mean-reversion-adjusted total variance $\frac{\sigma^2}{2\kappa}(1 - e^{-2\kappa\tau})$.

Term 4: $\frac{\lambda}{\kappa}\ln\frac{\eta - iu\phi}{\eta - iu}$. This is the jump contribution, and it is the term that makes everything interesting and difficult. It arises from integrating the Poisson-exponential jump kernel against the mean-reverting dynamics. The $\lambda/\kappa$ prefactor reflects the interplay between jump frequency and reversion speed: if $\kappa$ is large, the process reverts so fast that jumps are quickly absorbed, reducing their cumulative impact. The logarithm of the complex ratio introduces branch cuts in the complex plane, which is exactly why Python’s float64 arithmetic could not handle the numerical integration. More on that shortly.

If you set $\lambda = 0$ (no jumps), term 4 vanishes and you recover the pure MRLR characteristic function, which is Gaussian. The jump term deforms the Gaussian into a heavier-tailed distribution.
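For the record, here is the closed form in plain float64 Python – fine at moderate $u$ with tame parameters, and exactly the thing that degrades at large $u$, as Part IV recounts. Parameter values are illustrative:

```python
import numpy as np

# MRLRJ characteristic function of ln(VIX_T), float64 sketch of the closed form
def psi(u, vix_t, tau, kappa=5.0, theta=2.8, sigma=1.0, lam=2.0, eta=2.0):
    u = complex(u)
    phi = np.exp(-kappa * tau)
    t1 = 1j * u * phi * np.log(vix_t)                               # initial condition, decayed
    t2 = 1j * u * theta * (1 - phi)                                 # mean-reversion target
    t3 = -u**2 * sigma**2 * (1 - np.exp(-2 * kappa * tau)) / (4 * kappa)  # diffusion variance
    t4 = (lam / kappa) * np.log((eta - 1j * u * phi) / (eta - 1j * u))    # jump contribution
    return np.exp(t1 + t2 + t3 + t4)

print(psi(0.0, 16.25, 33/365))                   # psi(0) must equal 1 for any distribution
print(round(abs(psi(5.0, 16.25, 33/365)), 2))    # modulus decays as u grows
```

The $\psi(0) = 1$ check is the cheapest possible unit test for a characteristic function; if your implementation fails it, stop and fix it before integrating anything.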

Gil-Pelaez Inversion: From Characteristic Function to Option Price

We have $\psi(u)$ in closed form. We need option prices. The bridge is the Gil-Pelaez inversion theorem (1951), which expresses cumulative probabilities directly in terms of the characteristic function:

\[\Pi = \frac{1}{2} + \frac{1}{\pi}\int_0^{\infty} \frac{\text{Im}\!\left[\psi(u)\,e^{-iu\ln K}\right]}{u}\,du\]

This integral recovers the probability that $\text{VIX}_T > K$ (or a risk-adjusted variant thereof). The option price is:

\[C(t, T, K) = e^{-r\tau}\left(F\,\Pi_1 - K\,\Pi_2\right)\]

where $\Pi_1$ and $\Pi_2$ use slightly modified characteristic functions (one shifted to account for the asset-or-nothing payoff, one unshifted for the cash-or-nothing payoff). This is structurally identical to the Black-Scholes decomposition $C = e^{-r\tau}(F\,\Phi(d_1) - K\,\Phi(d_2))$, except $\Phi(d_1)$ and $\Phi(d_2)$ are replaced by $\Pi_1$ and $\Pi_2$ computed via numerical integration of the characteristic function.

The integral converges because $\lvert\psi(u)\rvert \to 0$ as $u \to \infty$ (the variance term $-u^2\sigma^2/(\cdots)$ in the exponent dominates). In practice, we truncate the integration at some upper limit ($u = 100$ for BigFloat, $u = 50$ for Float64) and use adaptive Gauss-Kronrod quadrature (QuadGK in Julia).

This is the entire pricing pipeline. Three layers of math: (1) the SDE defines the dynamics, (2) the characteristic function encodes the distribution in closed form, (3) Gil-Pelaez inversion recovers option prices via a single numerical integral. No Monte Carlo. No PDE solvers. One integral per option price.
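To see the inversion work without trusting me, check it against a case with a known answer: the pure-diffusion model ($\lambda = 0$), where $\ln(\text{VIX}_T)$ is Gaussian and the tail probability has a closed form. Illustrative parameters:

```python
import numpy as np
from scipy.integrate import quad
from statistics import NormalDist

# Gil-Pelaez tail probability P(VIX_T > K):
#   Pi = 1/2 + (1/pi) * Int_0^inf Im[psi(u) e^{-iu ln K}] / u du
# Pure-diffusion case, so the result can be checked against the Gaussian CDF.
kappa, theta, sigma = 5.0, 2.8, 1.0
vix_t, tau, K = 16.25, 33/365, 17.0

phi = np.exp(-kappa * tau)
mu = phi * np.log(vix_t) + theta * (1 - phi)               # mean of ln(VIX_T)
var = sigma**2 * (1 - np.exp(-2 * kappa * tau)) / (2 * kappa)

def psi(u):
    return np.exp(1j * u * mu - 0.5 * u**2 * var)          # Gaussian char. function

def integrand(u):
    return (psi(u) * np.exp(-1j * u * np.log(K))).imag / u

tail, _ = quad(integrand, 0, 100)                          # truncated at u = 100
pi_gp = 0.5 + tail / np.pi

pi_exact = 1 - NormalDist(mu, np.sqrt(var)).cdf(np.log(K))
print(round(pi_gp, 6), round(pi_exact, 6))                 # should agree
```

The two numbers match to integrator tolerance. Swap in the jump-augmented $\psi$ and the same five lines of inversion price the fat-tailed model.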

Part IV: The Python Disaster

The original codebase was Python. I wrote it in October 2022 under files named, with escalating confidence, cool.py, cooler.py, and coolest.py. The naming convention should tell you everything about the trajectory of the project.

The MRLR model worked fine. Black-Scholes with a mean-reverting forward. Python handles this. Python is happy.

Then I introduced jumps.

The characteristic function’s jump term – $\frac{\lambda}{\kappa}\ln\frac{\eta - iu\phi}{\eta - iu}$ – involves a logarithm of a ratio of complex numbers. In Python’s complex128 (which is just two float64 values glued together), this term is numerically treacherous. When the integration variable $u$ is large, the imaginary parts of $\eta - iu\phi$ and $\eta - iu$ dominate the real part $\eta$, and the logarithm of their ratio develops branch cut sensitivity. The exponential of the full characteristic function then involves $e^z$ where the real part of $z$ can be $-200$ or worse. Python’s float64 maps this to exactly zero – underflow – and the integrand becomes discontinuous in ways that scipy.integrate.quad silently integrates through, occasionally returning a probability greater than 1.0.

A probability. Greater than one. I stared at that output for an hour before I understood what was happening.

I suppressed the warnings. I actually wrote this with my own hands:

warnings.simplefilter(action='ignore', category=np.ComplexWarning)
warnings.simplefilter(action='ignore', category=integrate.IntegrationWarning)

If you are suppressing integration warnings in a project whose entire purpose is numerical integration, you have lost the plot. I had lost the plot.

The calibration optimizer would converge to parameters that made no physical sense. Negative jump intensities. Mean reversion speeds of $10^{12}$. The Nelder-Mead simplex would wander into regions of parameter space where the characteristic function returned NaN, and scipy.optimize would shrug and keep going. I spent weeks debugging numerical issues that were not bugs in my code but fundamental limitations of double-precision arithmetic applied to equations that demand better.

My God.

Part V: Julia

Julia solved the problem in a weekend.

Not because Julia is magic. Because Julia has BigFloat as a first-class citizen. You declare your struct fields as BigFloat, and the entire computation – every intermediate exponential, every complex logarithm, every term in the characteristic function – propagates at arbitrary precision. No wrapper libraries. No mpmath imported from some dusty corner of PyPI. It is just… how the language works.

The core module is 122 lines. Here is the characteristic function, the thing that Python could not compute without lying to me:

function characteristic_function(m::MRLRJ, t, T, s, VIX_t)
    τ = BigFloat(T - t)
    ϕ = exp(-m.κ * τ)
    s_bf = Complex{BigFloat}(s)
    
    term1 = im * s_bf * ϕ * log(BigFloat(VIX_t))
    term2 = im * s_bf * m.θ * (1 - ϕ)
    term3 = -s_bf^2 * m.σ^2 * (1 - exp(-2*m.κ*τ))/(4*m.κ)
    term4 = (m.λ/m.κ) * log((m.η - im*s_bf*ϕ)/(m.η - im*s_bf))
    
    return ComplexF64(exp(term1 + term2 + term3 + term4))
end

Every term computed in BigFloat. The final result cast back to ComplexF64 for the integrator. No underflow. No overflow. No lies. The Gil-Pelaez inversion integrates cleanly with QuadGK at rtol=1e-6, and the option prices come out correct.

Julia’s type dispatch also meant I could define separate vix_option methods for MRLR and MRLRJ without any inheritance boilerplate. The pure diffusion model uses the analytic Black-Scholes formula. The jump model uses Fourier inversion. Same function name. The compiler figures it out. It is the kind of thing that makes you wonder why you spent years writing if isinstance(model, MRLRJ) branches in Python like some kind of animal.

I also built a fast Float64 variant for calibration loops where you are evaluating the objective function thousands of times and BigFloat arithmetic is too slow. Same math, narrower integration range (50.0 vs 100.0), looser tolerance (rtol=1e-4 vs 1e-6). Use Float64 for parameter search, BigFloat for final pricing. The two modules coexist. It works brilliantly.

Part VI: Calibration – Fitting the Model to Reality

A model with five free parameters is a toy until you show it can reproduce market prices. Calibration is the process of choosing $(\kappa, \theta, \sigma, \lambda, \eta)$ so that the model’s theoretical option prices match the observed market prices as closely as possible.

Time-Series Calibration (the $\mathbb{P}$-measure)

The simplest approach: fit an AR(1) regression to daily log-VIX levels. If $y_t = \ln(\text{VIX}_t)$, then the discrete-time version of the OU process is:

\[y_{t+1} = \alpha + \beta\,y_t + \varepsilon_t\]

where $\varepsilon_t \sim \mathcal{N}(0, \sigma_\varepsilon^2)$. The relationship to the continuous parameters is:

\[\beta = e^{-\kappa\,\Delta t}, \quad \alpha = \theta(1 - \beta), \quad \sigma_\varepsilon^2 = \frac{\sigma^2}{2\kappa}\left(1 - e^{-2\kappa\,\Delta t}\right)\]

For small $\Delta t$ these reduce to the familiar approximations $\alpha \approx \theta\,\kappa\,\Delta t$ and $\sigma \approx \sigma_\varepsilon/\sqrt{\Delta t}$.

with $\Delta t = 1/252$ (one trading day). Run OLS on 20 years of daily VIX data (5,500+ observations), and you get the historical $\kappa$, $\theta$, $\sigma$ under the real-world $\mathbb{P}$-measure.
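A round-trip check of the mapping – simulate the exact discretization, run OLS, map back to $(\kappa, \theta, \sigma)$. True parameters are illustrative:

```python
import numpy as np

# Round-trip check of the AR(1) <-> OU parameter mapping
rng = np.random.default_rng(3)
kappa, theta, sigma, dt = 5.0, 2.8, 1.0, 1/252
n = 5500                                             # ~22 years of daily data

beta = np.exp(-kappa * dt)
alpha = theta * (1 - beta)
sd_eps = np.sqrt(sigma**2 * (1 - beta**2) / (2 * kappa))

y = np.empty(n)
y[0] = theta
for i in range(1, n):
    y[i] = alpha + beta * y[i - 1] + sd_eps * rng.standard_normal()

# OLS of y_{t+1} on y_t, then invert the mapping
b, a = np.polyfit(y[:-1], y[1:], 1)
resid_sd = np.std(y[1:] - (a + b * y[:-1]))
kappa_hat = -np.log(b) / dt
theta_hat = a / (1 - b)
sigma_hat = resid_sd * np.sqrt(2 * kappa_hat / (1 - b**2))
print(round(kappa_hat, 1), round(theta_hat, 2), round(sigma_hat, 2))
```

$\theta$ and $\sigma$ come back sharply; $\kappa$ is the noisy one – with $\beta \approx 0.98$, a tiny estimation error in the slope is a large relative error in $\ln\beta$. This is a known weakness of AR(1) calibration, not of the model.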

For the jump parameters, I separate diffusion from jumps by identifying residuals that exceed 2.5 standard deviations from the AR(1) fit. These outliers are classified as jump days. The jump intensity $\lambda_{\mathbb{P}}$ is estimated as the fraction of jump days per year, and $\eta_{\mathbb{P}}$ is estimated from the mean size of the detected jumps. It is crude. It works. The filtered AR(1) (excluding jump days) gives cleaner estimates of the diffusion parameters.
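A sketch of the filter on synthetic residuals (the real pipeline feeds it AR(1) residuals from log-VIX; the threshold censors small jumps, so expect the estimates to be rough – that is the "crude" part):

```python
import numpy as np

# Jump-day filter sketch: flag residuals beyond 2.5 standard deviations,
# estimate lambda as flagged days per year, eta from the mean flagged size.
rng = np.random.default_rng(4)
n_days = 5040                                     # ~20 years
resid = rng.normal(0, 0.06, n_days)               # synthetic diffusion residuals
jump_days = rng.random(n_days) < 2.0 / 252        # ~2 true jumps per year
resid[jump_days] += rng.exponential(0.5, jump_days.sum())

cut = 2.5 * resid.std()
flagged = resid > cut                             # upward outliers only
lam_hat = flagged.sum() / (n_days / 252)
eta_hat = 1 / resid[flagged].mean()
print(round(lam_hat, 1), round(eta_hat, 1))
```

The truth here is $\lambda = 2$, $\eta = 2$; the filter recovers the right order of magnitude, biased by the jumps too small to clear the cut.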

Options-Chain Calibration (the $\mathbb{Q}$-measure)

The more powerful approach: fit the model to live option prices. Take a snapshot of the VIX options chain – say, April 16, 2021, with 33 days to May 19 expiry and spot VIX at 16.25. You have 20-30 strikes with observed last prices. Define the objective function:

\[\text{MAPE}(\kappa, \theta, \sigma, \lambda, \eta) = \frac{1}{N}\sum_{i=1}^N \frac{\lvert C_{\text{model}}(K_i) - C_{\text{market}}(K_i)\rvert}{C_{\text{market}}(K_i)}\]

Minimize this using Nelder-Mead (a derivative-free simplex method, appropriate because the objective is non-smooth and the characteristic function integration makes autodiff impractical). Starting guess: $\kappa = 5$, $\theta = 2.8$, $\sigma = 1$, $\lambda = 2$, $\eta = 2$. The optimizer converges to physically meaningful parameters. The MAPE settles under 3%.
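A stripped-down version of the calibration loop. To stay self-contained it fits only $(\theta, \sigma)$ of the pure-diffusion model to synthetic "market" prices generated from known parameters, with $\kappa$ pinned – but the Nelder-Mead-on-MAPE machinery is the same shape:

```python
import numpy as np
from math import exp, log, sqrt
from statistics import NormalDist
from scipy.optimize import minimize

# Pure-diffusion MRLR pricer (closed form from Part II), kappa held fixed
def mrlr_call(vix_t, K, tau, theta, sigma, kappa=5.0, r=0.05):
    phi = exp(-kappa * tau)
    var_eff = sigma**2 * (1 - exp(-2 * kappa * tau)) / (2 * kappa)
    F = exp(phi * log(vix_t) + theta * (1 - phi) + 0.5 * var_eff)
    s = sqrt(var_eff)
    d1 = (log(F / K) + 0.5 * var_eff) / s
    N = NormalDist().cdf
    return exp(-r * tau) * (F * N(d1) - K * N(d1 - s))

vix_t, tau = 16.25, 33/365
strikes = [14, 15, 16, 17, 18, 20, 22, 25]
market = [mrlr_call(vix_t, K, tau, 2.8, 1.0) for K in strikes]   # synthetic chain

def mape(p):
    th, sg = p
    if sg <= 0:
        return 1e6                       # keep the simplex in valid territory
    model = [mrlr_call(vix_t, K, tau, th, sg) for K in strikes]
    return float(np.mean([abs(m - c) / c for m, c in zip(model, market)]))

res = minimize(mape, x0=[2.5, 0.8], method='Nelder-Mead')
print(round(res.fun, 6), np.round(res.x, 2))     # recovers theta = 2.8, sigma = 1.0
```

The real objective has five parameters, a real chain, and the Fourier pricer in the loop, but the skeleton – price, compare, let the simplex crawl – is identical.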

The critical subtlety: the parameters recovered from options calibration are $\mathbb{Q}$-measure (risk-neutral) parameters, not $\mathbb{P}$-measure (historical) parameters. The two sets of parameters are different because investors are risk-averse. They pay a premium for protection, which inflates the implied jump intensity $\lambda_{\mathbb{Q}}$ relative to the historical $\lambda_{\mathbb{P}}$. This gap between $\mathbb{P}$ and $\mathbb{Q}$ is the volatility risk premium, and it is where the money lives. More on this in the trading section.

Part VII: Results

The Julia implementation confirms Bao’s primary findings:

  1. Jump impact: The addition of the jump component significantly improves pricing accuracy for deep out-of-the-money options. Without jumps, the MRLR model systematically underprices OTM VIX calls because the log-normal distribution cannot generate the fat right tail that the market expects. With jumps, the MRLRJ model captures the skewness of the VIX options surface. The jump intensity $\lambda$ and mean jump size $1/\eta$ together control the weight of the right tail.

  2. Mean reversion dominance: The parameters $\kappa$ and $\theta$ remain stable and physically interpretable across different calibration windows. The implied long-term VIX level $e^\theta$ sits consistently in the 16-20 range, matching the historical unconditional mean. The reversion speed $\kappa$ sits in the 3-8 range, implying a half-life of roughly 20-60 trading days. This is not a fitted artifact. It is the dominant dynamic of the VIX.

  3. Characteristic function integrity: The BigFloat implementation solves the numerical instability that plagues standard-precision routines. The probabilities $\Pi_1$ and $\Pi_2$ integrate cleanly. No warnings suppressed. No probabilities greater than one.

  4. Cross-temporal validation: The model was calibrated against the April 16, 2021 VIX options chain (33 days to May 19 expiry, spot VIX at 16.25) and, independently, against live-scraped March 2026 data (spot VIX at 29.49, 40 strikes from 11 to 95). Both calibrations converge to plausible parameters. Both produce sub-3% MAPE. The model works in low-vol and high-vol regimes.

From a 2013 paper. In 122 lines of Julia. After Python tried to gaslight me with probabilities greater than one.

Part VIII: How to Actually Make Money With This

Fitting the market to 97% accuracy does not give you a crystal ball. What it gives you is a highly precise mathematical map of how the market currently values risk across different strikes and expirations. The gap between what the model implies and what the market does is where the edge lives. Let me be specific.

Strategy 1: Harvesting the Volatility Risk Premium ($\mathbb{P}$ vs. $\mathbb{Q}$)

There are two probability worlds. The real world ($\mathbb{P}$, estimated from historical data) and the risk-neutral world ($\mathbb{Q}$, implied by option prices). Investors are structurally terrified of VIX spikes – volatility is the one asset class where fear is a permanent fixture, because a VIX spike means your equity portfolio is hemorrhaging. So they overpay for VIX call options as insurance.

This overpayment is measurable. Calibrate the MRLRJ to both worlds:

  • Historical ($\mathbb{P}$): Fit AR(1) + jump detection to 20 years of daily log-VIX. Extract $\lambda_{\mathbb{P}}$ (how often jumps actually happen) and $\eta_{\mathbb{P}}$ (how large they actually are).
  • Implied ($\mathbb{Q}$): Calibrate to the live options chain. Extract $\lambda_{\mathbb{Q}}$ and $\eta_{\mathbb{Q}}$.

If $\lambda_{\mathbb{Q}} \gg \lambda_{\mathbb{P}}$ – the options market implies jumps happen far more often than they historically do – the tail insurance is overpriced. This is the volatility risk premium. You sell OTM VIX calls, dynamically hedge your directional exposure using VIX futures, and collect the premium as the statistically expected jumps fail to materialize.

The sizing is precise: the MRLRJ model tells you exactly how much premium each strike contains in excess of its actuarially fair value. You are not selling vol blindly. You are selling the specific strikes where the $\mathbb{P}$-$\mathbb{Q}$ gap is widest, weighted by the model’s confidence.

Warning: when jumps do materialize, they materialize violently. The VIX went from 13 to 37 overnight in February 2018 (the “Volmageddon” event). If you are short OTM VIX calls without proper tail hedging, you will be annihilated. The MRLRJ model gives you the tools to measure this tail risk – the jump intensity $\lambda$ and the mean jump size $1/\eta$ together determine the expected loss from a single jump event – but no model can save you from insufficient capital or reckless position sizing.

Strategy 2: Relative Value on the Skew

The MRLRJ model produces a smooth “fair value” curve across strikes for a given expiry. The residuals – strikes where the market price deviates from the model’s theoretical price – are where supply-demand dislocations create opportunity.

Example: a large pension fund buys a massive block of VIX 35 calls to hedge its equity portfolio. This specific strike gets temporarily bid up. The model says the VIX 35 Call should be $0.60 but it is trading at $0.80. Meanwhile, the neighboring VIX 40 Call is perfectly priced at $0.48. You sell the expensive 35, buy the fair 40 – a bear call spread – and isolate the dislocation. You are not betting on the direction of the VIX. You are betting that one specific strike will revert to the smooth mathematical curve that the rest of the chain already sits on.

This requires a model accurate enough to distinguish genuine mispricing from model error. At 97% accuracy and sub-3% MAPE, the MRLRJ gives you that resolution. At 85% accuracy, you are trading your own model’s noise.

Strategy 3: Jump-Aware Delta Hedging

You cannot buy or sell “spot VIX.” It is a computed index, not a tradeable asset. If you sell VIX options to collect premium (from Strategy 1 or as a market maker), you must hedge your directional risk using VIX futures. The question is: how many futures per option?

The answer is the delta – the sensitivity of the option price to the underlying. A Black-Scholes delta is wrong for VIX options. Not slightly wrong. Categorically wrong. Black-Scholes assumes continuous paths. The VIX jumps. When it jumps, the continuous-path delta underestimates the position’s sensitivity, and your hedge breaks.

The MRLRJ delta, computed by bumping VIX_t in the Gil-Pelaez integral and taking the numerical derivative, accounts for the probability-weighted impact of jumps. For a short-dated OTM VIX call, the MRLRJ delta can exceed the Black-Scholes delta by 30-40%. That gap is the jump risk that Black-Scholes ignores. If you hedge with the wrong delta, a 5-point VIX spike will blow through your position. If you hedge with the MRLRJ delta, the same spike is accounted for.
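A bump-and-reprice sketch. To stay self-contained it uses the pure-diffusion closed form from Part II as the pricer (the actual jump-aware delta bumps the Gil-Pelaez integral of the jump model); `delta` and its step size are my choices:

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Pure-diffusion MRLR pricer (closed form from Part II), illustrative parameters
def mrlr_call(vix_t, K, tau, r=0.05, kappa=5.0, theta=2.8, sigma=1.0):
    phi = exp(-kappa * tau)
    var_eff = sigma**2 * (1 - exp(-2 * kappa * tau)) / (2 * kappa)
    F = exp(phi * log(vix_t) + theta * (1 - phi) + 0.5 * var_eff)
    s = sqrt(var_eff)
    d1 = (log(F / K) + 0.5 * var_eff) / s
    N = NormalDist().cdf
    return exp(-r * tau) * (F * N(d1) - K * N(d1 - s))

# Central finite difference: dC/dVIX_t via bump-and-reprice
def delta(vix_t, K, tau, h=0.01):
    return (mrlr_call(vix_t + h, K, tau) - mrlr_call(vix_t - h, K, tau)) / (2 * h)

print(round(delta(16.25, 20.0, 33/365), 3))      # OTM call sensitivity to spot
```

Note how small this delta is compared to a stock option's: the $\phi = e^{-\kappa\tau}$ factor dampens the forward's sensitivity to spot, because the model knows part of today's level will have mean-reverted away by expiry.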

This is what allows institutional market makers to safely capture the bid-ask spread on VIX options while remaining directionally neutral. The model is the hedge.

Strategy 4: Calendar Spreads via Mean Reversion

When the VIX spikes to 40 during a panic, near-term options become absurdly expensive. The entire term structure inverts – nearby futures trade above deferred futures (backwardation), and short-dated implied volatility explodes. But the MRLRJ model explicitly quantifies how fast the VIX will collapse back to $\theta$.

The half-life of a VIX spike is $t_{1/2} = \ln(2)/\kappa$. With $\kappa = 5$ (annualized), the half-life is $\ln(2)/5 \approx 0.14$ years, roughly 35 trading days. If the VIX is at 40 and $\theta$ implies a long-term mean of 18, the model predicts (to first approximation, since the mean reversion acts on the log):

  • After ~35 days: VIX $\approx$ 29 (halfway back)
  • After ~70 days: VIX $\approx$ 23.5
  • After ~105 days: VIX $\approx$ 20.75
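The same decay schedule in a few lines of Python, with the annualized-to-trading-days conversion made explicit:

```python
import math

# Sketch: the decay schedule implied by kappa. kappa is annualized, so the
# half-life ln(2)/kappa is in years; ~252 trading days per year converts it
# to days. For simplicity the halving is applied to the VIX level's distance
# from the long-run mean (the model strictly mean-reverts the log, so this
# is a first approximation).

kappa = 5.0          # annualized mean-reversion speed
theta_level = 18.0   # long-run VIX level implied by theta
spike = 40.0         # current VIX during the panic

half_life_days = math.log(2) / kappa * 252
levels = [theta_level + (spike - theta_level) * 0.5**n for n in (1, 2, 3)]

print(f"half-life = {half_life_days:.0f} trading days")
for n, level in enumerate(levels, start=1):
    print(f"after {n * half_life_days:.0f} days: VIX = {level:.2f}")
```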

This decay curve dictates the relative pricing of near-term vs. deferred options. If the market’s term structure implies a slower decay than $\kappa$ predicts, deferred options are underpriced relative to near-term options. You sell near-term VIX calls (which are pricing in sustained panic) and buy deferred-month options (which are underpricing the speed of mean reversion). The model gives you the exact weighting by computing theoretical prices at both expirations and identifying the term-structure dislocation.

This is not a vibes trade. This is a calibrated bet on the speed of panic dissipation, sized by a model that has been validated against 20 years of VIX history and live options chains.

Strategy 5: VIX/SPX Variance Risk Premium Arbitrage

The VIX is derived from SPX options. It is, by construction, the square root of the 30-day expected variance of the S&P 500, calculated from a strip of SPX option prices. This means you can compute a “theoretical VIX” directly from the live SPX options chain and compare it to the traded VIX futures/options.
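That strip computation can be sketched in a few lines. This is a simplified version of the CBOE variance formula; the strikes, quotes, and forward below are made-up illustrative numbers, and the real methodology adds strike-selection and interpolation rules omitted here:

```python
import math

# Simplified sketch of the CBOE-style variance strip: the VIX is 100 times
# the square root of the 30-day variance replicated from a strip of OTM
# SPX option quotes. All inputs below are illustrative, not market data.

def theoretical_vix(strikes, quotes, F, K0, T, r=0.0):
    """strikes: sorted OTM strikes; quotes: midquotes Q(K); F: forward;
    K0: first strike at or below F; T: time to expiry in years."""
    var = 0.0
    for i, K in enumerate(strikes):
        # Central strike spacing, one-sided at the edges of the strip.
        if i == 0:
            dK = strikes[1] - strikes[0]
        elif i == len(strikes) - 1:
            dK = strikes[-1] - strikes[-2]
        else:
            dK = (strikes[i + 1] - strikes[i - 1]) / 2
        var += (dK / K**2) * math.exp(r * T) * quotes[i]
    var = (2 / T) * var - (1 / T) * (F / K0 - 1) ** 2
    return 100 * math.sqrt(var)

# Toy strip around a forward of 5005, 30 days out.
strikes = [4800, 4900, 5000, 5100, 5200]
quotes  = [40.0, 70.0, 110.0, 65.0, 35.0]
vix_theo = theoretical_vix(strikes, quotes, F=5005.0, K0=5000, T=30 / 365)
print(f"theoretical VIX = {vix_theo:.2f}")
```

With live SPX quotes in place of the toy strip, the output is the "theoretical VIX" leg of the VRP comparison.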

The difference between the theoretical VIX (from SPX math) and the traded VIX (from the VIX options market) is the Variance Risk Premium (VRP). When fear is elevated, traders bid up VIX derivatives beyond what the underlying SPX options math justifies. The VRP stretches.

You can construct a relative-value trade: short the overpriced VIX options and long a delta-neutral SPX straddle. You are hedged against actual market movement – if the S&P drops and volatility rises, your SPX straddle profits while your short VIX position loses, and the net exposure is the VRP. When the premium collapses back to fair value, you collect.

The MRLRJ model’s role here is to provide the theoretical anchor – the “fair” VIX options price given the model’s parameters – against which you measure the VRP. Without a precise model, you cannot distinguish a stretched VRP (tradeable) from genuine repricing of tail risk (not tradeable).

Part IX: Future Work

The MRLRJ is a 2013 model. It works. But modern quantitative finance has moved in directions that suggest several extensions, each with specific trading implications.

Rough Volatility

Recent literature (Gatheral, Jaisson, Rosenbaum, 2018) demonstrates that volatility is “rough” – it scales locally like a fractional Brownian motion with Hurst parameter $H < 0.5$, rather than the $H = 0.5$ of standard Brownian motion. The practical implication: standard MRLRJ misprices short-dated VIX options because it cannot generate the steep volatility skew observed in sub-2-week expirations without relying on unrealistic jump parameters. Replacing the Brownian driver with a fractional Brownian motion (a “Rough-MRLRJ” model) would capture the short-term skew, enabling systematic selling of short-dated OTM VIX options that classical models leave overpriced.

Self-Exciting Jumps (Hawkes Processes)

In the standard MRLRJ, jumps arrive independently via a Poisson process with constant intensity $\lambda$. In reality, financial panics cluster. One jump makes the next jump more likely. The VIX does not spike once and return to calm; it spikes, spikes again, spikes a third time, and then slowly decays. Upgrading the Poisson process to a Hawkes process – a self-exciting point process where each jump temporarily increases $\lambda$ – would allow quantification of the “decay rate of panic.” This enables precise timing of calendar spreads: you wait for the Hawkes intensity function to peak and begin decaying, then sell near-term volatility (priced for continued contagion) and buy deferred volatility (underpricing the coming calm).
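A Hawkes process is straightforward to simulate by Ogata-style thinning, which makes the clustering effect easy to see. A minimal sketch with illustrative (uncalibrated) parameters:

```python
import math
import random

# Sketch: Ogata-style thinning simulation of a Hawkes process with an
# exponential kernel. The conditional intensity is
#   lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i))
# over past jump times t_i. Parameters below are illustrative only.

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    rng = random.Random(seed)
    events = []

    def intensity(t):
        return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)

    t = 0.0
    while t < horizon:
        lam_bar = intensity(t)           # bounds the decaying intensity ahead
        t += rng.expovariate(lam_bar)    # candidate arrival under the bound
        if t < horizon and rng.random() * lam_bar <= intensity(t):
            events.append(t)             # accept: each jump raises lambda by alpha
    return events

# alpha/beta < 1 keeps the process stable; self-excitation inflates the
# expected count above the mu * horizon = 25 jumps of a plain Poisson process.
jumps = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.5, horizon=50.0)
print(f"{len(jumps)} jumps over a 50-unit horizon")
```

Fitting $\mu$, $\alpha$, and $\beta$ to realized VIX jump times would give the "decay rate of panic" directly: once the fitted intensity rolls over, the contagion premium in near-term options is decaying on a known clock.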

Neural SDEs and Deep Calibration

Classical calibration (Nelder-Mead on MAPE) is an end-of-day procedure. You recalibrate when new option prices arrive, which means your parameters are always stale. Universal Differential Equations (Neural SDEs) replace the static drift and volatility functions with neural networks trained continuously on tick-by-tick data. The neural components can detect micro-regime shifts – a sudden vanishing of bid-side liquidity on SPX futures, a spike in VIX call open interest – and adjust the model parameters before the VIX index actually moves. This is the frontier of intraday volatility trading: front-running regime shifts by milliseconds, buying VIX call butterflies at “stale” prices before the market reprices.

Deep Hedging (Reinforcement Learning)

Classical delta hedging assumes frictionless markets. VIX options have wide bid-ask spreads. If you delta-hedge every time the model says your delta shifted by 0.01, you will cross the spread a thousand times and bleed your alpha through transaction costs. Deep Reinforcement Learning wraps the MRLRJ model inside a simulated trading environment and trains an agent to hedge optimally – accounting for transaction costs, spread width, gamma risk, and time-of-day liquidity patterns. The agent learns that sometimes the correct hedge is no hedge at all. This is the difference between a model that is theoretically correct and a model that makes money.

Joint SPX-VIX Modeling

The MRLRJ models the VIX in isolation, but the VIX is a derivative of SPX options. A coupled model – SPX following a local-stochastic volatility model with jumps, VIX deterministically derived from the SPX options strip, with a stochastic VRP overlay modeled by MRLRJ – would enable true cross-asset arbitrage. You would compute the theoretical VIX in real-time from SPX options, compare to traded VIX derivatives, and trade the VRP when it is historically stretched. This is the cleanest expression of the volatility risk premium trade: structurally hedged, model-driven, and continuously re-priced.

The Bitter Lesson, Again

The same bitter lesson from Vakyume applies here, rotated 90 degrees. In Vakyume, the lesson was that brute-force metaprogramming loses to data-driven approaches. In Ráfaga, the lesson is that the right tool for the job is not always the popular tool.

Python is a magnificent language for prototyping, data wrangling, and machine learning. It is a terrible language for high-precision complex arithmetic on characteristic functions with exponential terms that span 80 orders of magnitude. I spent weeks fighting float64 limitations that Julia resolved by simply… not having them. BigFloat is there. It works. You use it.

The original Python code suppressed its own warnings. The Julia code has no warnings to suppress.

There is also a second lesson, specific to quantitative finance: a model from 2013, implemented correctly, can price VIX derivatives to 97% accuracy in 2026. The MRLRJ is not a neural network. It is not a transformer. It is five parameters with physical interpretations, a characteristic function, and a Fourier integral. It works because the VIX genuinely mean-reverts, genuinely jumps, and the market genuinely prices these dynamics into the options chain. Sometimes the old math is the right math. You just need a language that can compute it without lying to you.

Five files. One module. Two models. Five trading strategies. ~97% accuracy.

Ráfaga. A gust. It came and went. But the numbers hold up. And the VIX, as always, reverts to the mean.