Overview of the ksm package

Leo Belzile

2026-06-06

The ksm package provides functionalities for the kernel density estimation for random symmetric positive definite matrices. It provides C++ wrappers for estimating the density, with additional functionalities for estimating the optimal bandwidth according to the least square cross validation (LSCV) and the leave-one-out cross (LCV) validation criteria. Three kernels are implemented: the Gaussian (smnorm), log-Gaussian (smlnorm) and Wishart (Wishart).

Evaluation of the kernel and bandwidth optimization

set.seed(2025)
library(ksm)
#> 
#> Attaching package: 'ksm'
#> The following object is masked from 'package:stats':
#> 
#>     rWishart
d <- 4L
# Equicorrelation matrix
S <- diag(rep(0.5, d)) + matrix(0.5, d, d)
# Generate simulated data from an inverse Wishart (d by d by n array).
samp <- rinvWishart(n = 100, df = 10, S)
dim(samp)
#> [1]   4   4 100
# Optimize the bandwidth for kernel using leave-one-out cross validation
b <- bandwidth_optim(
  x = samp, 
  kernel = "smlnorm", 
  criterion = "lcv")
# Evaluate kernel with a new psd matrix data point
newsamp <- rinvWishart(n = 2, df = 10, S = S)
# Kernel density (default to log scale)
kdens_symmat(
  x = newsamp, 
  xs = samp, 
  kernel = "smlnorm", 
  b = b)
#> [1] 15.710544  3.460046

We can also consider stationary data and exclude observations at lag \(h\) to account for serial dependence. This is illustrated below with a time series of realized variance covariance matrix between two assets over 250 days.

data(realvar, package = "ksm")
dim(realvar)
#> [1]   2   2 250
# Select suitable lag for least square cross validation
h <- ceiling(dim(realvar)[3]^0.25)
# Calculate optimal bandwidth
bopt <- bandwidth_optim(
  x = realvar, 
  kernel = "Wishart", 
  criterion = "lscv",
  h = h)
# Evaluate at multiple values of bandwidth
bseq <- seq(0.5 * bopt, 2 * bopt, length.out = 101)
lscv <- ksm::lscv_kdens_symmat(
  x = realvar, 
  b = bseq, 
  h = h, 
  kernel = "Wishart")
with(lscv, 
     plot(x = b, 
          y = lscv,
          type = "l", 
          panel.first = {
            abline(v = bopt, lty = 2, col = "grey")
            },
          xlab = "bandwidth",
          ylab = "least square cross validation"
          )
     )

Integrating over positive definite matrices

We can also make integration in dimensions \(d \in \{2, 3\}\) to evaluate numerically the performance of kernels on simulated data. The function integrate_spd performs a transformation of inputs for numerical integration with routines from the cubature package. This is useful for checking correct implementation: below, we check that the density of a Wishart integrates to unity over the space of positive definite matrices.

Different methods for integration are provided: we recommend in particular cuhre, suave or hcubature. The lb argument controls the smaller value for the eigenvalues, which sometimes returns matrices that are close to being non-positive definite. If such an error mistake arises, consider setting lb to a small value.

# Check that Wishart density integrates to unity
(int <- integrate_spd(
  dim = 2L,
  neval = 1e5L,
  f = function(x, S){
   dWishart(x, df = 10, S = S, log = FALSE)},
  S = diag(2),lb = 0, method = "hcubature"))
#> $integral
#> [1] 0.9999433
#> 
#> $error
#> [1] 0.0009986894
#> 
#> $neval
#> [1] 30723
#> 
#> $returnCode
#> [1] 0
isTRUE(all.equal(int$integral, 1, tol = 2*int$error))
#> [1] TRUE

The main use for the integrate_spd function is to calculate the integrated squared error for data simulated from some density. We do this here for different models that generate independent and identically distributed matrices from a mixture of Wishart densities. The function here relies on predefined models from functions simu_rdens and simu_fdens used in the simulation studies of the paper for the independent and identically distributed scenario.

# Integrated squared error
ISE <- function(
  S,
  x,
  bandwidth,
  model = 1:6,
  kernel = c("Wishart", "smlnorm", "smnorm"),
  ...
) {
  model <- match.arg(as.character(model), choices = 1:6)
  kernel <- match.arg(kernel)
  dim <- ncol(x)
  # Compute the squared difference between the kernel and the target density
  (
    ksm::kdens_symmat(
      x = S,
      xs = x,
      b = bandwidth,
      kernel = kernel,
      log = FALSE
    ) -
      ksm::simu_fdens(S, model = model, d = dim)
  )^2
}

# Calculate numerical integral
d <- 2L
xs <- simu_rdens(n = 100, model = 2, d = d)
b <- bandwidth_optim(
  x = xs, 
  criterion = "lcv", 
  kernel = "smnorm")
# Computing ISE
ISE(
  S = xs[,,1, drop = FALSE], 
  x = xs, 
  bandwidth = b, 
  model = 2, 
  kernel = "smnorm")
#> [1] 8.525459e-05
# Next integrate over the space of psd matrices
ISE_int <- try(
            ksm::integrate_spd(
              f = function(S) {
                ISE(
                  S,
                  x = xs,
                  kernel = "smnorm",  # (only for lcv)
                  bandwidth = b,
                  model = 2
                )
              },
              dim = d,
              method = "hcubature",
              lb = 0,
              ub = Inf,
              neval = 1e6L
            ),
            silent = TRUE
          )
ISE_int
#> $integral
#> [1] 0.002425163
#> 
#> $error
#> [1] 2.42419e-06
#> 
#> $neval
#> [1] 66561
#> 
#> $returnCode
#> [1] 0