background
Denoising
Financial ML.
6

Denoising

A procedure for reducing the noise and enhancing the signal contained in an empirical covariance matrix.
Marcenko–Pastur
Signal/Noise Ratio
MV Portfolio
Max SR Portfolio
Targeted Shrinkage
Denoising
S.Alireza Mousavizade
Sat Mar 05 2022

Detoning & Denoising

Inspiration

Covariance matrices are ubiquitous in finance. We use them to run regressions, evaluate risks, optimize portfolios, run Monte Carlo simulations, discover clusters, reduce the dimensionality of a vector space, and so on. Empirical covariance matrices are computed from a series of observations of a random vector, in order to estimate the linear comovement between the random variables that comprise the random vector. Because these observations are finite and nondeterministic, the estimated covariance matrix contains a certain amount of noise. Empirical covariance matrices built from estimated factors are likewise numerically ill-conditioned, because those factors are themselves estimated with error. Unless we address this noise, it will propagate into every calculation that uses the covariance matrix, potentially rendering the analysis meaningless.

Here we describe a procedure for reducing the noise and enhancing the signal contained in an empirical covariance matrix. Throughout this Element, we will assume that empirical covariance and correlation matrices have been subjected to this procedure.

The Marcenko-Pastur Theorem

Consider a matrix $X$ of independent and identically distributed random observations, of size $T \times N$, drawn from an underlying process with zero mean and variance $\sigma^{2}$. The matrix $C=T^{-1} X^{\prime} X$ has eigenvalues $\lambda$ that asymptotically converge (as $N \rightarrow+\infty$ and $T \rightarrow+\infty$ with $1<T/N<+\infty$) to the Marcenko-Pastur probability density function (PDF),

$$f[\lambda]= \begin{cases}\frac{T}{N} \frac{\sqrt{\left(\lambda_{+}-\lambda\right)\left(\lambda-\lambda_{-}\right)}}{2 \pi \lambda \sigma^{2}} & \text { if } \lambda \in\left[\lambda_{-}, \lambda_{+}\right] \\ 0 & \text { if } \lambda \notin\left[\lambda_{-}, \lambda_{+}\right]\end{cases}$$

where $\lambda_{+}=\sigma^{2}(1+\sqrt{N / T})^{2}$ is the greatest expected eigenvalue and $\lambda_{-}=\sigma^{2}(1-\sqrt{N / T})^{2}$ is the least expected eigenvalue. When $\sigma^{2}=1$, $C$ is the correlation matrix associated with $X$. Code Snippet 1 implements the Marcenko-Pastur PDF.

Eigenvalues $\lambda \in\left[\lambda_{-}, \lambda_{+}\right]$ are consistent with random behavior, and eigenvalues $\lambda \notin\left[\lambda_{-}, \lambda_{+}\right]$ are consistent with nonrandom behavior. Specifically, we associate eigenvalues $\lambda \in\left[0, \lambda_{+}\right]$ with noise. Figure 1 and Code Snippet 1 demonstrate how closely the Marcenko-Pastur distribution explains the eigenvalues of a random matrix $X$.

SNIPPET 1 TESTING THE MARCENKO-PASTUR THEOREM


using LinearAlgebra, Statistics, Distributions, DataFrames, KernelDensity

function pdfMarcenkoPastur(
    var, # variance of observations
    ratio, # T/N
    points # number of points for lambda
)
    λmin = var*(1 - sqrt(1/ratio))^2 # minimum expected eigenvalue
    λmax = var*(1 + sqrt(1/ratio))^2 # maximum expected eigenvalue
    eigenValues = range(λmin, stop = λmax, length = points) # grid of eigenvalues
    diffλ = (λmax .- eigenValues).*(eigenValues .- λmin) # (λ₊ - λ)(λ - λ₋)
    diffλ[diffλ .< 0] .= 0. # clip small negative values caused by numerical error
    pdf = ratio./(2*pi*var*eigenValues).*sqrt.(diffλ) # Marcenko-Pastur probability density function
    return DataFrames.DataFrame(index = eigenValues, values = pdf)
end

function PCA(matrix) # Hermitian matrix
    eigenValues, eigenVectors = LinearAlgebra.eigen(matrix) # compute eigenvalues, eigenvectors
    indices = sortperm(eigenValues, rev = true) # permutation sorting eigenvalues in descending order
    eigenValues, eigenVectors = eigenValues[indices], eigenVectors[:, indices] # sort eigenvalues, eigenvectors
    eigenValues = Diagonal(eigenValues) # diagonal matrix with eigenvalues
    return eigenValues, eigenVectors
end

function KDE(
    observations; # series of observations
    bandWidth = 0.25,
    kernel = Distributions.Normal, # type of kernel
    valuesForEvaluating = nothing # values on which the fitted KDE is evaluated
)
    density = kde(observations, kernel = kernel, bandwidth = bandWidth) # kernel density
    if valuesForEvaluating === nothing
        valuesForEvaluating = reshape(reverse(unique(observations)), :, 1) # default evaluation grid
    end
    density = KernelDensity.pdf(density, valuesForEvaluating[:]) # evaluate the density
    return DataFrames.DataFrame(index = vec(valuesForEvaluating), values = density)
end

# Random matrix of standard normal observations
X = rand(Normal(0, 1), 10000, 1000)
# Eigenvalues and eigenvectors of the correlation matrix of X
eVal0, eVec0 = PCA(cor(X))
# Marcenko-Pastur pdf
pdf0 = pdfMarcenkoPastur(1., size(X)[1]/size(X)[2], 1000)
# Kernel density estimate of the empirical eigenvalues
pdf1 = KDE(diag(eVal0), bandWidth = 0.01, kernel = Distributions.Normal, valuesForEvaluating = nothing);

Figure 1 depicts the Marcenko-Pastur theorem: the analytical PDF closely matches the empirical distribution of the eigenvalues of a random matrix.
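The overlay in Figure 1 can be reproduced with a short sketch like the one below, which plots the two densities returned by Snippet 1. It assumes the Plots.jl package is available; the plotting calls are illustrative and not part of the original snippet.

using Plots

# Overlay the analytical Marcenko-Pastur PDF (pdf0) and the kernel density
# estimate of the empirical eigenvalues (pdf1), both computed in Snippet 1.
plot(pdf0.index, pdf0.values, label = "Marcenko-Pastur PDF",
     xlabel = "eigenvalue", ylabel = "probability density")
plot!(pdf1.index, pdf1.values, label = "empirical KDE", linestyle = :dash)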


Random Matrix with Signal

Not all eigenvectors in an empirical correlation matrix are necessarily random. Because Code Snippet 2 generates a covariance matrix that is not totally random, its eigenvalues will only approximately follow the Marcenko-Pastur PDF. Of the numberColumns random variables that make up the covariance matrix produced by randomCov, only numberFactors carry some signal. To dilute the signal further, we blend that covariance matrix with a purely random matrix, using a weight alpha.

Marcenko-Pastur Distribution Fitting

Here we apply the procedure proposed by Laloux et al. (2000). Because random eigenvectors account for only a portion of the variance, we may adjust $\sigma^{2}$ in the previous equations accordingly. For example, if we assume that the eigenvector associated with the greatest eigenvalue is not random, we should replace $\sigma^{2}$ in the earlier equations with $\sigma^{2}\left(1-\lambda_{+} / N\right)$. In practice, we may obtain the implied $\sigma^{2}$ by fitting the function $f[\lambda]$ to the empirical distribution of eigenvalues. This yields the variance explained by the random eigenvectors in the correlation matrix, as well as the cutoff level $\lambda_{+}$, adjusted for the presence of nonrandom eigenvectors.

Snippet 3 fits the Marcenko-Pastur PDF to a random covariance matrix that contains signal. The fit seeks the value of $\sigma^{2}$ that minimizes the sum of squared differences between the analytical PDF and the kernel density estimate (KDE) of the observed eigenvalues (for references on KDE, see Rosenblatt 1956; Parzen 1962). The value $\lambda_{+}$ is reported as emax0, $\sigma^{2}$ is stored as var0, and the number of factors is recovered as numberFactors0.

SNIPPET 2 ADDING SIGNAL TO A RANDOM COVARIANCE MATRIX


function randomCov(
    numberColumns, # number of columns
    numberFactors # number of factors
)
    data = rand(Normal(), numberColumns, numberFactors) # random data
    covData = data*data' # covariance carrying the signal (rank = numberFactors)
    covData += Diagonal(rand(Uniform(), numberColumns)) # add noise to make the matrix full rank
    return covData
end

function covToCorr(cov) # covariance matrix
    std = sqrt.(diag(cov)) # standard deviations
    corr = cov./(std.*std') # correlation matrix
    corr[corr .< -1] .= -1 # clip numerical error
    corr[corr .> 1] .= 1 # clip numerical error
    return corr
end

alpha, numberColumns, numberFactors, ratio = .995, 1000, 100, 10
covv = cov(rand(Normal(0, 1), numberColumns*ratio, numberColumns)) # purely random covariance
covv = alpha*covv + (1 - alpha)*randomCov(numberColumns, numberFactors) # noise + signal
corr0 = covToCorr(covv)
eval0, evec0 = PCA(corr0);


SNIPPET 3 FITTING THE MARCENKO-PASTUR PDF


using Optim

function errorPDFs(
    var, # variance
    eigenValues, # empirical eigenvalues
    ratio, # T/N
    bandWidth; # bandwidth for the kernel
    points = 1000 # points for pdfMarcenkoPastur
)
    pdf0 = pdfMarcenkoPastur(var, ratio, points) # theoretical pdf
    pdf1 = KDE(eigenValues, bandWidth = bandWidth, kernel = Distributions.Normal, valuesForEvaluating = pdf0.index) # empirical pdf
    sse = sum((pdf1.values .- pdf0.values).^2) # sum of squared errors
    return sse
end

function findMaxEval(
    eigenValues, # empirical eigenvalues
    ratio, # T/N
    bandWidth # bandwidth for the kernel
)
    out = optimize(var -> errorPDFs(var, eigenValues, ratio, bandWidth), 1E-5, 1 - 1E-5) # minimize the pdf errors
    if Optim.converged(out) == true
        var = Optim.minimizer(out) # variance that minimizes the pdf errors
    else
        var = 1
    end
    λmax = var*(1 + (1/ratio)^.5)^2 # maximum random eigenvalue
    return λmax, var
end

emax0, var0 = findMaxEval(diag(eval0), ratio, .01)
numberFactors0 = size(eval0)[1] - searchsortedfirst(reverse(diag(eval0)), emax0) + 1 # number of eigenvalues above the cutoff


Figure 2: Fitting the Marcenko-Pastur PDF to a noisy covariance matrix.

Figure 2 depicts the eigenvalue histogram and the PDF of the fitted Marcenko-Pastur distribution. Eigenvalues to the right of the fitted Marcenko-Pastur distribution cannot be associated with noise, so they must be associated with signal. The code returns 100 for numberFactors0, the same number of factors we injected into the covariance matrix. Despite the weak signal in the covariance matrix, the procedure was able to separate the eigenvalues associated with noise from the eigenvalues associated with signal. The fitted distribution yields $\sigma^{2} \approx .6768$, implying that the signal accounts for only about $32.32\%$ of the variance. This is one way of measuring the signal-to-noise ratio in financial data sets, which is well known to be low as a result of arbitrage effects.
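As a small check of that arithmetic, the sketch below derives the implied signal share directly from the outputs of Snippet 3 (var0, emax0, and eval0); the names signalShare and noiseCount are illustrative and do not appear in the snippets.

# The fitted σ² (var0) is the share of variance attributed to the random
# eigenvectors, so 1 - var0 approximates the share of variance carried by the signal.
signalShare = 1 - var0 # ≈ 0.3232 in the example above
noiseCount = count(diag(eval0) .<= emax0) # eigenvalues attributed to noise
println("implied signal share ≈ ", round(signalShare, digits = 4),
        "; eigenvalues attributed to noise: ", noiseCount)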

Denoising

Shrinking a numerically ill-conditioned covariance matrix is popular in financial applications (Ledoit and Wolf 2004). Shrinkage reduces the condition number of the covariance matrix by pulling it closer to a diagonal matrix. However, shrinkage accomplishes this without discriminating between noise and signal; as a result, it can further dilute an already weak signal. In the previous section we learned how to separate the eigenvalues associated with noise components from the eigenvalues associated with signal components. This section looks at how to use that information to denoise the correlation matrix.
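To make the contrast concrete, here is a minimal sketch of naive linear shrinkage toward the identity. It only illustrates the idea of pulling a correlation matrix toward a diagonal; it is not the Ledoit-Wolf estimator (which selects the shrinkage intensity analytically), and the function shrinkCorr and the intensity δ are hypothetical.

using LinearAlgebra

# Naive linear shrinkage of a correlation matrix toward the identity.
# δ ∈ [0, 1]: δ = 0 keeps the original matrix, δ = 1 returns the identity.
function shrinkCorr(corr, δ)
    return (1 - δ)*corr + δ*I # I is the UniformScaling identity from LinearAlgebra
end

shrunk = shrinkCorr(corr0, 0.1) # corr0 from Snippet 2
println("condition number before: ", cond(corr0), ", after: ", cond(shrunk))

Note that this transformation pulls every eigenvalue, signal and noise alike, toward 1, which is exactly the indiscriminate behavior the targeted approaches below avoid.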

Method of Constant Residual Eigenvalues

This approach consists of setting a constant eigenvalue for all random eigenvectors. Let $\left\{\lambda_{n}\right\}_{n=1, \ldots, N}$ be the set of all eigenvalues, ordered in descending order, and let $i$ be the position of the eigenvalue such that $\lambda_{i}>\lambda_{+}$ and $\lambda_{i+1} \leq \lambda_{+}$. Then we set $\lambda_{j}=\frac{1}{N-i} \sum_{k=i+1}^{N} \lambda_{k}$ for $j=i+1, \ldots, N$, thereby preserving the trace of the correlation matrix. Given the eigenvector decomposition $V W=W \Lambda$, we form the denoised correlation matrix $C_{1}$ as

$$\begin{aligned} \widetilde{C}_{1} &=W \widetilde{\Lambda} W^{\prime} \\ C_{1} &=\widetilde{C}_{1}\left[\left(\operatorname{diag}\left[\widetilde{C}_{1}\right]\right)^{\frac{1}{2}}\left(\operatorname{diag}\left[\widetilde{C}_{1}\right]\right)^{\frac{1}{2}\prime}\right]^{-1}, \end{aligned}$$

where the apostrophe ($'$) denotes the transpose of a matrix, and $\operatorname{diag}[\cdot]$ zeroes all non-diagonal elements of a square matrix. The second transformation rescales the matrix $\widetilde{C}_{1}$ so that the main diagonal of $C_{1}$ is an array of 1s. Code Snippet 4 implements this method, and Figure 3 compares the logarithms of the eigenvalues before and after this denoising approach.
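As a quick check of the trace-preservation claim, replacing the noise eigenvalues by their average leaves their sum, and hence the trace of $\widetilde{\Lambda}$, unchanged:

$$\sum_{j=i+1}^{N} \widetilde{\lambda}_{j}=(N-i)\,\frac{1}{N-i} \sum_{k=i+1}^{N} \lambda_{k}=\sum_{k=i+1}^{N} \lambda_{k} \quad\Rightarrow\quad \operatorname{tr}\left[\widetilde{\Lambda}\right]=\operatorname{tr}[\Lambda].$$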

SNIPPET 4 DENOISING BY CONSTANT RESIDUAL EIGENVALUE


function denoisedCorr(eigenValues, # eigenvalues (diagonal matrix, sorted descending)
                      eigenVectors, # eigenvectors
                      numberFactors) # number of factors
    λ = copy(diag(eigenValues)) # copy eigenvalues
    λ[numberFactors+1:end] .= sum(λ[numberFactors+1:end])/(size(λ)[1] - numberFactors) # replace noise eigenvalues with their average, preserving the trace
    λdiag = Diagonal(λ) # diagonal matrix with λ
    cov = eigenVectors * λdiag * eigenVectors' # reconstructed covariance matrix
    corr2 = covToCorr(cov) # rescale to a correlation matrix
    return corr2
end

corr1 = denoisedCorr(eval0, evec0, numberFactors0)
eval1, evec1 = PCA(corr1);


Figure 3 compares the logarithms of the eigenvalues before and after the constant residual eigenvalue method was applied.
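The comparison in Figure 3 can be reproduced with a sketch along the following lines, assuming Plots.jl is available and that eval0 and eval1 are the eigenvalue matrices produced by Snippets 2 and 4.

using Plots

# Compare the log-eigenvalues of the original and the denoised correlation matrices.
plot(log.(diag(eval0)), label = "original eigenvalues",
     xlabel = "eigenvalue number", ylabel = "log(eigenvalue)")
plot!(log.(diag(eval1)), label = "denoised eigenvalues")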

Targeted Shrinkage

The numerical method described above is preferable to shrinkage because it removes the noise while preserving the signal. Alternatively, we could target the application of shrinkage strictly to the random eigenvectors. Consider the correlation matrix $C_{1}$:

$$C_{1}=W_{L} \Lambda_{L} W_{L}^{\prime}+\alpha W_{R} \Lambda_{R} W_{R}^{\prime}+(1-\alpha) \operatorname{diag}\left[W_{R} \Lambda_{R} W_{R}^{\prime}\right]$$

where $W_{R}$ and $\Lambda_{R}$ are the eigenvectors and eigenvalues associated with $\left\{n \mid \lambda_{n} \leq \lambda_{+}\right\}$, $W_{L}$ and $\Lambda_{L}$ are the eigenvectors and eigenvalues associated with $\left\{n \mid \lambda_{n}>\lambda_{+}\right\}$, and $\alpha$ regulates the amount of shrinkage among the eigenvectors and eigenvalues associated with noise ($\alpha \rightarrow 0$ for total shrinkage). Code Snippet 5 implements this method, and Figure 4 compares the logarithms of the eigenvalues before and after this denoising approach.


Figure 4 compares the logarithms of the eigenvalues before and after the targeted shrinkage method was applied.

SNIPPET 5 DENOISING BY TARGETED SHRINKAGE


function denoisedCorrShrinkage(
    eigenValues, # eigenvalues (diagonal matrix, sorted descending)
    eigenVectors, # eigenvectors
    numberFactors; # number of factors
    α = 0 # shrinkage parameter
)
    eigenValuesL = eigenValues[1:numberFactors, 1:numberFactors] # signal eigenvalues
    eigenVectorsL = eigenVectors[:, 1:numberFactors] # signal eigenvectors
    eigenValuesR = eigenValues[numberFactors+1:end, numberFactors+1:end] # noise eigenvalues
    eigenVectorsR = eigenVectors[:, numberFactors+1:end] # noise eigenvectors
    corr0 = eigenVectorsL * eigenValuesL * transpose(eigenVectorsL) # signal part of the correlation matrix
    corr1 = eigenVectorsR * eigenValuesR * transpose(eigenVectorsR) # noise part of the correlation matrix
    corr2 = corr0 + α*corr1 + (1 - α)*diagm(diag(corr1)) # targeted shrinkage of the noise part
    return corr2
end

corr1 = denoisedCorrShrinkage(eval0, evec0, numberFactors0, α = .5)
eval1, evec1 = PCA(corr1);
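As a sanity check on the role of α, the sketch below evaluates the two extremes, assuming the variables from Snippets 2 and 3 are still in scope; the names corrRecovered and corrTotalShrink are illustrative. With α = 1 the noise block is kept in full and the original correlation matrix is recovered up to numerical error, whereas α = 0 shrinks the noise block entirely to its diagonal.

# α = 1 keeps the full noise block, so the reconstruction equals the original matrix.
corrRecovered = denoisedCorrShrinkage(eval0, evec0, numberFactors0, α = 1.0)
println("max deviation from corr0: ", maximum(abs.(corrRecovered .- corr0)))

# α = 0 applies total shrinkage to the noise block.
corrTotalShrink = denoisedCorrShrinkage(eval0, evec0, numberFactors0, α = 0.0)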


Detoning