ustats is a thin R interface to the Python package u-stats, which
computes higher-order U-statistics efficiently using Einstein summation
(numpy.einsum / torch.einsum). This vignette
covers the one part of the package that needs a little care —
setting up the Python environment — and then shows
basic usage.
install.packages("ustats")
library(ustats)
H <- matrix(rnorm(100), 10, 10)
ustat(list(H, H), "ab,bc->")
That is all most users need: on the first call,
reticulate (>= 1.41) automatically downloads a private
Python together with the required packages (u-stats,
numpy, torch) into a cached environment, and
reuses it in later sessions.
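To make concrete what the quick-start expression "ab,bc->" describes, here is a minimal base-R sketch that evaluates it by brute force. It assumes that distinct letters range over pairwise distinct indices and that averaging divides by the number of ordered index triples, n(n-1)(n-2). The function ustat_bruteforce() is a hypothetical name used only for illustration and is not part of ustats. Because this vignette does not state whether ustat() averages by default, the sketch returns both the raw sum and the average.

```r
# Brute-force reference for the expression "ab,bc->" (illustration only).
# The loop variables i, j, k play the roles of the letters a, b, c.
ustat_bruteforce <- function(H1, H2) {
  n <- nrow(H1)
  total <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (j == i) next
      for (k in seq_len(n)) {
        if (k == i || k == j) next
        total <- total + H1[i, j] * H2[j, k]
      }
    }
  }
  # Assumption: averaging divides by the number of ordered triples
  # of pairwise distinct indices.
  n_tuples <- n * (n - 1) * (n - 2)
  list(sum = total, average = total / n_tuples)
}

set.seed(1)
H <- matrix(rnorm(100), 10, 10)
ref <- ustat_bruteforce(H, H)
ref$average
```

The triple loop costs O(n^3) and is only practical for small n, so it is best used as a sanity check against ustat() on small inputs.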
ustats declares its Python dependencies via
reticulate::py_require() when the package is loaded. When
Python is first needed, reticulate resolves these requirements as
follows:
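If you have explicitly selected a Python environment (with
reticulate::use_virtualenv(), reticulate::use_condaenv(), or the
RETICULATE_PYTHON environment variable), that environment
is used. It must contain u-stats, numpy, and
(recommended) torch.
Otherwise, reticulate provisions the private cached environment described above.
There are therefore three ways to set things up, from least to most manual.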
Option 1: do nothing. The first call that touches Python (for example, the first ustat() call) triggers the automatic setup.
Two things to know:
Option 2: setup_ustats()
setup_ustats() creates a dedicated environment and
installs all dependencies into it. By default it installs the
CPU-only PyTorch build:
library(ustats)
setup_ustats() # virtualenv/conda + CPU-only torch
setup_ustats(gpu = TRUE) # default PyPI torch (CUDA-enabled on Linux)
setup_ustats(
  method = "virtualenv", # or "conda"
  envname = "r-ustats",
  persist = TRUE # print the RETICULATE_PYTHON line to add
) # to your .Rprofile (no files are written)
For GPU builds on Windows, or for a wheel matching a specific CUDA version, see https://pytorch.org/get-started/locally/ and use Option 3.
Option 3: use an existing environment. If you already maintain a conda or virtualenv environment (for example, one with a carefully chosen CUDA-enabled PyTorch), install the one missing piece, u-stats, into it (pip install u-stats),
and tell reticulate to use that environment before Python
initializes (i.e. right after loading the package, before the
first ustat() call):
library(ustats)
reticulate::use_condaenv("your_env_name", required = TRUE)
# or: reticulate::use_virtualenv("~/.virtualenvs/your_env")
Alternatively, set RETICULATE_PYTHON to the path of the
Python binary in .Renviron or .Rprofile, which
takes effect for all sessions.
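As a minimal sketch of the RETICULATE_PYTHON route, the lines below show what such an entry could look like. The path is a placeholder, not a real location; substitute the Python binary of your own environment (on Windows this is typically a python.exe inside the environment's Scripts folder).

```r
# Placeholder path (assumption): replace with your environment's Python binary.
python_path <- "/path/to/your_env/bin/python"

# Equivalent .Renviron entry (a single line):
#   RETICULATE_PYTHON=/path/to/your_env/bin/python

# In .Rprofile, or at the top of a script before Python initializes:
Sys.setenv(RETICULATE_PYTHON = python_path)

# Confirm the value reticulate will see when it first starts Python.
Sys.getenv("RETICULATE_PYTHON")
```

Setting the variable has no effect on a session in which Python has already been initialized, so it belongs in a startup file or at the very top of a script.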
check_ustats_setup()
#> === ustats Environment Status ===
#>
#> [OK] Python: /path/to/python
#> Version: 3.12
#> [OK] u_stats available
#> [OK] NumPy available
#> [OK] PyTorch available (version 2.5.1, CUDA available)
#>
#> ---------------------------------
#> Environment fully ready (Torch backend available)
ustat() takes a list of kernel tensors (R vectors or
matrices) and an Einstein summation expression describing how their
indices are contracted, with distinct letters ranging over distinct
observation indices:
library(ustats)
set.seed(1)
n <- 300
H1 <- rnorm(n)
H2 <- matrix(rnorm(n * n), n, n)
H3 <- rnorm(n)
result <- ustat(
  tensors = list(H1, H2, H2, H3),
  expression = "a,ab,bc,c->",
  backend = "torch", # falls back to numpy if torch is unavailable
  average = TRUE, # divide by the number of index tuples
  dtype = NULL # auto: float32 on GPU, float64 on CPU
)
print(result)
The index structure can equivalently be given as a list of numeric index vectors, which is convenient when the expression is built programmatically.
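As a rough illustration of how the two forms correspond, the hypothetical helper below (not part of ustats) converts a list of numeric index vectors into the equivalent expression string. It assumes that index 1 maps to the letter a, index 2 to b, and so on; the exact numbering convention that ustat() expects for the list form is not shown here, so treat this mapping as an assumption.

```r
# Hypothetical helper (not part of ustats): build an einsum-style expression
# from a list of numeric index vectors, one vector per tensor.
# Assumption: index 1 corresponds to "a", 2 to "b", and so on (at most 26).
indices_to_expression <- function(index_list) {
  terms <- vapply(
    index_list,
    function(idx) paste(letters[idx], collapse = ""),
    character(1)
  )
  # The empty right-hand side after "->" means all indices are summed out.
  paste0(paste(terms, collapse = ","), "->")
}

index_list <- list(1, c(1, 2), c(2, 3), 3)
indices_to_expression(index_list)
#> [1] "a,ab,bc,c->"
```

Generating the index list in a loop and converting it once keeps programmatically built expressions readable and easy to inspect before they are passed to ustat().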
check_ustats_setup() reports a missing
module. The session is bound to a Python environment that lacks
the dependency. Either install it there
(pip install u-stats), or restart R and select a different
environment (Options 2-3 above).
use_condaenv() / RETICULATE_PYTHON has no effect. reticulate binds to a single
Python per R session, at the moment Python is first initialized. Restart
R and configure the environment before anything touches
Python.
ustat() warns “Torch backend not
available”. The bound environment has no PyTorch; the
computation falls back to NumPy, which is slower and can be less
numerically stable. Install torch with setup_ustats() or
pip install torch --index-url https://download.pytorch.org/whl/cpu.
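The checks behind these troubleshooting items can also be run by hand with reticulate. The sketch below is one possible way to do so; missing_modules() is a hypothetical helper, not part of ustats, and the module names are the import names shown in the check_ustats_setup() output above.

```r
# Hypothetical helper: return the Python modules that cannot be imported.
# `probe` defaults to reticulate::py_module_available(); calling that function
# initializes Python and therefore binds the session to one environment.
missing_modules <- function(modules,
                            probe = reticulate::py_module_available) {
  ok <- vapply(modules, probe, logical(1))
  names(ok)[!ok]
}

if (requireNamespace("reticulate", quietly = TRUE)) {
  # FALSE means no Python is bound yet, so use_condaenv() or
  # RETICULATE_PYTHON can still take effect in this session.
  print(reticulate::py_available(initialize = FALSE))

  # Uncomment to probe the bound environment (this initializes Python):
  # missing_modules(c("u_stats", "numpy", "torch"))
}
```

If py_available() already returns TRUE, the session is bound to a Python and changing the environment requires restarting R first.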