In recent years, CompatibL has been pushing the boundaries of machine learning to tackle the downsides of traditional interest rate models. In this article, we will provide an overview of the working paper on autoencoder market models (AEMM) by CompatibL’s founder and head of quant research, Alexander Sokol. We will also discuss their applications and the potential impact they can have on the financial sector.


In his recent paper, Autoencoder Market Models for Interest Rates, Alexander Sokol proposes a highly optimized latent factor representation of the yield curve, obtained by training a variational autoencoder (VAE) on curve data from multiple currencies. A curious by-product of such training is a “world map of latent space,” where neighbors have similar curve shapes and distant lands have disparate curve shapes. The proposed VAE-based mapping offers a high degree of parsimony, in some cases matching the accuracy of classical methods that use one additional state variable.

Alexander describes four types of autoencoder market models in the Q- and P-measures. Each autoencoder-based model starts as a popular classical model and replaces its state variables with autoencoder latent variables. This replacement leads to a greater similarity between the curves generated by the model and the historically observed curves, a desirable feature in both the Q- and P-measures. By aggressively eliminating invalid curve shapes from its latent space, the VAE prevents them from appearing in the model, without the intricate constraints on the stochastic process that classical models need for the same purpose. This makes VAE-based models more robust and simplifies their calibration.

Alexander also discusses the potential applications of the new models and the VAE-based latent factor representation on which they are based.

The paper is organized as follows:

  • First, Alexander introduces the core ideas of the research
  • Then, in Chapter 2, he describes the machine learning architecture and presents the results of using autoencoders to compress LIBOR and OIS swap curves
  • After that, he describes how to convert four distinct classical model types to AEMM by switching from classical state variables to VAE latent variables
  • The conversion of the Q-measure models is discussed in Chapter 3
  • The P-measure models are discussed in Chapter 4
  • The paper concludes with Chapter 5, where Alexander summarizes his key results

A New Way of Representing the Yield Curve

The properties of a term structure interest rate model are to a large extent determined by the choice of its state variables. Having too many of these negatively affects performance and causes parameter estimation issues. Having too few, or choosing them poorly, causes the model to miss certain risks. Dimension reduction, that is, decreasing the number of state variables with the least possible loss of accuracy, is therefore of paramount importance.

In one of the early successes of using machine learning for dimension reduction, Oleksiy Kondratyev demonstrated that feedforward neural networks outperform classical techniques such as principal component analysis (PCA) in describing the evolution of interest rate curve shapes under the P-measure. Bergeron and Buehler used a VAE to reduce the dimension of the volatility surface.

In his paper, Alexander derives a highly optimized VAE-based representation of the yield curve and proposes a new category of interest rate models in Q- and P-measures that produce VAE-generated curve shapes to which minimal corrections are applied to keep the model arbitrage-free.

Dimension Reduction with Autoencoders

Dimension reduction is a form of compression, similar to the algorithms used to compress images. The maximum possible degree of compression depends on the universe of images for which the algorithm is designed. Because JPEG and similar general-purpose image compression algorithms impose no restrictions on what the image can depict, they have a moderate rate of compression (around 10x for JPEG). For these general-purpose algorithms, the dimensions of the compressed data (i.e., bits of the compressed file) are local in the sense that each of them encodes information from a group of nearby pixels.

Variational autoencoders are machine learning algorithms that provide a fundamentally different type of compression. The rate of compression they can achieve is multiple orders of magnitude higher than that of general-purpose compression algorithms. Such a tremendous performance gain can only be achieved by training the algorithm to compress a specific type of image, such as the image of a human face. In the process of aggressively eliminating implausible combinations of pixels in pursuit of better compression, something quite remarkable happens: the dimensions of the compressed image acquire meaning.

When using a VAE to encode images of a human face, the dimensions of the compressed data (the latent variables, from Latin “lateo” meaning “hidden”) become associated with realistic changes to the image of a human face, such as adding a smile or changing hair color. This happens for the simple reason that the only combinations of pixels not eliminated by training on a large library of human face images are those that correspond to realistic faces. In machine learning, this “feature extraction” effect is frequently a more important objective than the compression itself.

The latent factors obtained in this manner are global because they can affect pixels that may be far away from each other in the image (e.g., a dimension that encodes hair color).

What can a similar approach do for interest rate term structure models?

The first thing to consider is what we should be compressing. To build a VAE-based counterpart to a stochastic volatility term structure model such as SABR-LMM, we would need to compress both the yield curve and the volatility surface into a single latent space (we use the term “volatility surface” generically to describe a volatility surface or volatility cube). For deterministic volatility term structure models, the volatility surface is a function of the yield curve, and accordingly the yield curve is the only thing we need to compress.
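To make the idea of compressing a yield curve into latent variables more concrete, here is a minimal sketch of a VAE forward pass and loss in Python with NumPy. It is not the architecture from the paper: the layer sizes, the tanh activations, the weight names, and the random placeholder curves are all assumptions made for illustration, and the weights are untrained.

```python
import numpy as np

def init_params(n_tenors, n_hidden, n_latent, seed=0):
    """Random, untrained weights for a one-hidden-layer encoder and decoder
    (hypothetical layout, not the paper's architecture)."""
    rng = np.random.default_rng(seed)

    def w(rows, cols):
        return rng.normal(scale=0.1, size=(rows, cols))

    return {
        "enc_w": w(n_tenors, n_hidden), "enc_b": np.zeros(n_hidden),
        "mu_w": w(n_hidden, n_latent), "mu_b": np.zeros(n_latent),
        "lv_w": w(n_hidden, n_latent), "lv_b": np.zeros(n_latent),
        "dec_w1": w(n_latent, n_hidden), "dec_b1": np.zeros(n_hidden),
        "dec_w2": w(n_hidden, n_tenors), "dec_b2": np.zeros(n_tenors),
    }

def encode(p, curves):
    """Map each curve to the mean and log-variance of its latent distribution."""
    h = np.tanh(curves @ p["enc_w"] + p["enc_b"])
    return h @ p["mu_w"] + p["mu_b"], h @ p["lv_w"] + p["lv_b"]

def decode(p, z):
    """Map latent variables back to a curve on the original tenor grid."""
    h = np.tanh(z @ p["dec_w1"] + p["dec_b1"])
    return h @ p["dec_w2"] + p["dec_b2"]

def vae_loss(p, curves, rng, beta=1.0):
    """Reconstruction error plus beta times the KL divergence to N(0, I)."""
    mu, log_var = encode(p, curves)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps  # reparameterization trick
    recon = decode(p, z)
    recon_err = np.mean(np.sum((recon - curves) ** 2, axis=1))
    kl = np.mean(-0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1))
    return recon_err + beta * kl, recon_err, kl

# Placeholder input: 5 random "curves" on 8 tenors (not market data).
n_tenors = 8
rng = np.random.default_rng(1)
curves = 0.02 + 0.01 * rng.standard_normal((5, n_tenors))
params = init_params(n_tenors, n_hidden=16, n_latent=2)
mu, log_var = encode(params, curves)
total, recon_err, kl = vae_loss(params, curves, rng)
print(mu.shape, float(total))
```

Training would adjust the weights to minimize this loss over a library of historical curves. In this sketch the two-dimensional `mu` plays the role of the latent variables that stand in for the classical state variables.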

Continuing with the image analogy, a smoothing spline fit to the yield curve is similar to JPEG in the sense that its dimensions and the structure it imposes are both local. The Nelson-Siegel basis is similar to the VAE in the sense that its dimensions (roughly corresponding to the level, slope, and convexity) and the structure it imposes (for example, not permitting a curve to have both a minimum and a maximum at the same time) are both global. Having global dimensions leads to a higher degree of compression for the Nelson-Siegel basis compared with the smoothing spline.
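The global character of the Nelson-Siegel basis can be seen in a few lines of Python. The sketch below uses the standard Nelson-Siegel zero-rate formula. The function names, the maturity grid, and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def nelson_siegel_loadings(tau, lam):
    """Level, slope and curvature loadings at maturity tau (years), decay scale lam."""
    x = tau / lam
    slope = (1.0 - math.exp(-x)) / x
    curvature = slope - math.exp(-x)
    return 1.0, slope, curvature

def nelson_siegel_yield(tau, beta0, beta1, beta2, lam):
    """Zero rate at maturity tau as a weighted sum of the three loadings."""
    level, slope, curvature = nelson_siegel_loadings(tau, lam)
    return beta0 * level + beta1 * slope + beta2 * curvature

def count_interior_extrema(values, tol=1e-12):
    """Count sign changes in successive differences, ignoring near-zero moves."""
    diffs = (b - a for a, b in zip(values, values[1:]))
    signs = [1 if d > 0 else -1 for d in diffs if abs(d) > tol]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

# Quarterly grid from 3 months to 30 years (illustrative).
maturities = [0.25 * k for k in range(1, 121)]
humped = [nelson_siegel_yield(t, 0.03, -0.02, 0.05, 2.0) for t in maturities]
upward = [nelson_siegel_yield(t, 0.03, -0.02, 0.0, 2.0) for t in maturities]

print(count_interior_extrema(humped), count_interior_extrema(upward))
```

Each coefficient moves the whole curve at once, which is what makes the dimensions global. The three-factor form admits at most one interior extremum, so a curve with both a hump and a trough cannot be represented exactly.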

Is aggressive dimension reduction necessary for term structure interest rate models? Can machine learning help find a more effective way to achieve it than the Nelson-Siegel basis?

Alexander answers both questions in the affirmative.

Many of the interest rate models popular with practitioners, including multi-factor short-rate models, the Cheyette model and others, are Markovian in a small number of state variables, usually between two and four. Considering the aggressive dimension reduction that must occur when a very wide variety of historical yield curve shapes is compressed into a small number of state variables, a sophisticated compression algorithm is clearly required. And yet, most classical models use an exogenously specified stochastic differential equation (SDE) or factor basis whose selection is driven by criteria unrelated to optimal compression.
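As an illustration of what "Markovian in a small number of state variables" means in practice, consider a two-factor additive Vasicek model, used here only as a simple textbook example and not as one of the models from the paper. The short rate is the sum of two independent mean-reverting factors, and the entire yield curve is a deterministic function of the two-dimensional state. The parameter values below are hypothetical.

```python
import math

def vasicek_yield(x, tau, a, theta, sigma):
    """Zero-coupon yield for maturity tau when the factor x follows
    dx = a * (theta - x) dt + sigma dW under the pricing measure."""
    b = (1.0 - math.exp(-a * tau)) / a
    log_a = (theta - sigma**2 / (2.0 * a**2)) * (b - tau) - sigma**2 * b**2 / (4.0 * a)
    return (b * x - log_a) / tau

def two_factor_curve(state, maturities, factor_params):
    """Yield curve implied by the state (x1, x2), with short rate r = x1 + x2.
    For independent factors the bond price factorizes, so the yields add."""
    return [
        sum(vasicek_yield(x, tau, *params) for x, params in zip(state, factor_params))
        for tau in maturities
    ]

# Hypothetical (a, theta, sigma) for a slow factor and a fast factor.
factor_params = [(0.05, 0.03, 0.005), (0.8, 0.0, 0.01)]
maturities = [0.5, 1, 2, 5, 10, 20, 30]

curve_low = two_factor_curve((0.02, -0.01), maturities, factor_params)
curve_high = two_factor_curve((0.02, 0.01), maturities, factor_params)
for tau, lo, hi in zip(maturities, curve_low, curve_high):
    print(f"{tau:>4}y  {lo:.4%}  {hi:.4%}")
```

Every curve this model can produce is one point of a two-parameter family whose shape is fixed by the SDE rather than fitted to historical curve shapes. This is the sense in which the factor basis is exogenously specified.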

Further, Alexander describes several models in the Q- and P-measures that start from a popular classical model specification and replace the classical model’s state variables with autoencoder latent variables. This replacement leads to a greater similarity between the curves generated by the model and the historically observed curves, a desirable feature in both the Q- and P-measures. This new model category is called Autoencoder Market Models.

The Benefits of Autoencoder Market Models and Key Results

Aggressively eliminating infeasible curve shapes through VAE training creates state variables that represent only valid curves. This prevents an AEMM from generating unrealistic curve shapes without using intricate constraints on curve dynamics.

The improvement in accuracy of mapping historical curve observations to the model state variables achieved with the use of autoencoders can be measured from first principles, providing a rigorous way to compare the proposed machine learning approach with its classical counterparts.

The results indicate that the use of autoencoders leads to a significant and measurable improvement in the accuracy of representing complex curve shapes compared with classical methods with the same number of state variables. In turn, this makes AEMM perform better than the corresponding classical models.
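A comparison of this kind needs a common accuracy metric. One natural choice, assumed here and not necessarily the one used in the paper, is the root-mean-square error between each historical curve and its reconstruction from a fixed number of state variables. The sketch below computes it for a classical PCA baseline on synthetic Nelson-Siegel curves. The data is generated, not historical, and no VAE is included.

```python
import numpy as np

def synthetic_curves(n_curves, maturities, seed=0):
    """Generate illustrative Nelson-Siegel curves with random coefficients
    and a random decay scale (synthetic data, not market observations)."""
    rng = np.random.default_rng(seed)
    beta0 = rng.uniform(0.01, 0.05, n_curves)[:, None]
    beta1 = rng.uniform(-0.03, 0.01, n_curves)[:, None]
    beta2 = rng.uniform(-0.03, 0.03, n_curves)[:, None]
    lam = rng.uniform(1.0, 5.0, n_curves)[:, None]
    x = maturities[None, :] / lam
    slope = (1.0 - np.exp(-x)) / x
    return beta0 + beta1 * slope + beta2 * (slope - np.exp(-x))

def pca_reconstruction_rmse(curves, n_factors):
    """RMSE of rebuilding each curve from its first n_factors PCA scores."""
    mean = curves.mean(axis=0)
    centered = curves - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_factors]           # principal directions, one per row
    scores = centered @ basis.T      # the "state variables" of each curve
    recon = mean + scores @ basis
    return float(np.sqrt(np.mean((recon - curves) ** 2)))

maturities = np.array([0.5, 1, 2, 3, 5, 7, 10, 15, 20, 30], dtype=float)
curves = synthetic_curves(500, maturities)
rmse_by_factors = {k: pca_reconstruction_rmse(curves, k) for k in (1, 2, 3, 4)}
for k, rmse in rmse_by_factors.items():
    print(f"{k} factors: {rmse * 1e4:.2f} bp")
```

The same function shape works for any encoder and decoder pair: replace the PCA projection with the encode and decode steps of a trained autoencoder, then compare the errors at equal numbers of state variables.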


This is an extensive and ongoing research area that covers a wide range of important details. A short article cannot fully capture its depth and complexity, so we invite you to download the source paper from SSRN.

CompatibL’s autoencoder market models are now open source and available to the quant community. Please get in touch with us if you want to become a contributor or use them in your project.

You can also subscribe to our AEMM LinkedIn page for timely updates on this research or join a discussion with Alexander Sokol on the topic.

Interested in Learning More?

Check out Alexander’s recent interviews on the validation challenges of machine learning models and, with WatersTechnology, on enhancing traditional models using machine learning.
