---
title: "Introduction to AIBias: Longitudinal Bias Auditing"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to AIBias: Longitudinal Bias Auditing}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5,
  warning = FALSE,
  message = FALSE
)
library(AIBias)
```

## Overview

Standard fairness audits treat bias as a **snapshot**: a single-point disparity measurement at one moment in time. But in sequential decision systems (loan approvals, parole reviews, hiring pipelines, credit scoring), decisions at time $t$ feed back into the features available at time $t+1$.

**AIBias** treats algorithmic bias as a longitudinal process. It tracks:

- How group disparities **evolve** over repeated decisions
- Where disparities **compound** via transition dynamics
- Whether earlier decisions **amplify** later inequality

---

## The Synthetic Lending Dataset

The package ships with `lending_panel`, a 600-applicant × 6-year panel of synthetic loan decisions across three racial groups.

```{r data-overview}
data(lending_panel)
head(lending_panel)
```

```{r approval-rates}
# Pooled approval rates by group
tapply(lending_panel$approved, lending_panel$race, mean) |> round(3)
```

Even from this simple tabulation we see a gap. But it misses the *dynamic* story: are disadvantaged groups less able to recover from a denial? Are gaps widening over time?

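
To illustrate what a pooled tabulation can hide, here is a minimal base-R sketch on a hypothetical toy panel. The data frame `toy` and its values are invented for illustration and are not `lending_panel`; only the column names mirror the ones used in this vignette.

```r
# Hypothetical toy panel: 4 applicants, 3 years, 2 groups (invented values).
toy <- data.frame(
  applicant_id = rep(1:4, each = 3),
  year         = rep(1:3, times = 4),
  race         = rep(c("A", "A", "B", "B"), each = 3),
  approved     = c(1, 1, 1,   # applicant 1 (group A)
                   1, 0, 1,   # applicant 2 (group A)
                   1, 0, 0,   # applicant 3 (group B)
                   1, 0, 0)   # applicant 4 (group B)
)

# Pooled approval rate per group: one number each, no time dimension.
pooled <- tapply(toy$approved, toy$race, mean)

# Group-by-year approval rates: rows are groups, columns are years.
rates <- tapply(toy$approved, list(toy$race, toy$year), mean)

# Year-by-year gap of group B relative to group A
# (negative means B is approved less often than A in that year).
gap <- rates["B", ] - rates["A", ]

print(round(pooled, 3))
print(gap)
```

In this toy example the two groups start with identical approval rates and the gap only opens in later years, which the pooled rates alone cannot show.
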

---

## Step 1: Build the Audit Object

```{r build}
obj <- aib_build(
  data = lending_panel,
  id = "applicant_id",
  time = "year",
  group = "race",
  decision = "approved"
)
print(obj)
```

---

## Step 2: Describe Bias Trajectories

`aib_describe()` computes:

- $\hat{\pi}_g(t)$: group-specific decision rate at each wave
- $\hat{B}_{g,r}(t) = \hat{\pi}_g(t) - \hat{\pi}_r(t)$: raw bias trajectory
- $\hat{B}^*_{g,r}(t)$: standardized (SMD) trajectory
- $CB_{g,r}(T)$: cumulative bias burden

```{r describe}
obj <- aib_describe(obj, ref_group = "White")
obj$bias$cumulative
```

The cumulative burden (`CB_normalized`) summarizes the average disparity experienced across all waves, giving a single policy-facing number.

### Trajectory Plot

```{r plot-trajectory, fig.alt="Bias trajectory plot"}
plot(obj, type = "trajectory")
```

Both non-reference groups show a persistent negative disparity from wave 1. The gap is relatively stable, suggesting a **persistent** rather than purely growing pattern, but the dynamic analysis below reveals compounding beneath the surface.

### Heatmap: Disparity Surface

```{r plot-heatmap, fig.alt="Group-time disparity heatmap"}
plot(obj, type = "heatmap")
```

The heatmap displays the full group × time disparity surface. Red cells indicate disadvantaged periods.

---

## Step 3: Transition Analysis

The key question for **compounding bias**: are disadvantaged groups less likely to *recover* after a denial, and less likely to *retain* approval?

```{r transition}
obj <- aib_transition(obj, ref_group = "White")

# Recovery and retention gaps
obj$transitions$recovery_gap
obj$transitions$retention_gap
```

```{r plot-transition, fig.alt="Transition probabilities plot"}
plot(obj, type = "transition")
```

The transition plot reveals the mechanism of compounding.
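
For intuition, recovery and retention rates of this kind can be sketched by hand in base R. The example below uses a hypothetical toy panel with invented values, and it assumes the gaps are defined as group minus reference, mirroring $\hat{B}_{g,r}(t)$. It is not the internal implementation of `aib_transition()`.

```r
# Hypothetical toy panel: 4 applicants, 4 years, 2 groups (invented values).
toy <- data.frame(
  applicant_id = rep(1:4, each = 4),
  year         = rep(1:4, times = 4),
  race         = rep(c("A", "B"), each = 8),
  approved     = c(1, 0, 1, 1,   # applicant 1 (group A)
                   0, 1, 1, 1,   # applicant 2 (group A)
                   1, 0, 0, 1,   # applicant 3 (group B)
                   0, 0, 1, 0)   # applicant 4 (group B)
)

# Sort so that the lag is taken in time order within each applicant.
toy <- toy[order(toy$applicant_id, toy$year), ]

# Previous-year decision for each applicant (NA in the first year).
toy$prev <- ave(toy$approved, toy$applicant_id,
                FUN = function(x) c(NA, head(x, -1)))

# Keep only rows that have a previous decision, i.e. observed transitions.
trans <- toy[!is.na(toy$prev), ]

# Recovery: P(approved at t | denied at t - 1), by group.
denied_before <- trans$prev == 0
recovery <- tapply(trans$approved[denied_before],
                   trans$race[denied_before], mean)

# Retention: P(approved at t | approved at t - 1), by group.
approved_before <- trans$prev == 1
retention <- tapply(trans$approved[approved_before],
                    trans$race[approved_before], mean)

# Gaps of group B relative to reference group A (assumed sign convention).
recovery_gap  <- recovery["B"] - recovery["A"]
retention_gap <- retention["B"] - retention["A"]

print(rbind(recovery, retention))
```

A negative recovery gap means that applicants in group B move from denial back to approval less often than applicants in group A.
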
Beyond the overall disparity in approval rates, the **recovery gap** is the most consequential finding: after a denial, Black applicants recover at a much lower rate than White applicants, locking them into unfavorable states.

### Markov State Evolution

The Markov amplification operator $A^{state}_{g,r}(T) = \sum_t \|v_g(t) - v_r(t)\|$ quantifies cumulative divergence in state distributions:

```{r amp-state}
obj$transitions$amp_state
```

---

## Step 4: Amplification Analysis

The amplification index measures:

$$A_{g,r}(t) = B_{g,r}(t \mid 1) - B_{g,r}(t \mid 0)$$

Here $B_{g,r}(t \mid k)$ is the disparity at time $t$ conditional on prior decision state $k$. If $A_{g,r}(t) \neq 0$, prior decision state is **modifying** the group disparity, the hallmark of dynamic rather than static bias.

```{r amplify}
obj <- aib_amplify(obj, ref_group = "White")
obj$amplification$cumulative
```

```{r plot-amplification, fig.alt="Amplification index plot"}
plot(obj, type = "amplification")
```

### Narrative Interpretation

```{r narratives}
obj$amplification$narratives
```

---

## Step 5: Covariate Adjustment

To separate "case mix" differences from residual disparity, fit a covariate-adjusted model:

```{r adjust, eval=FALSE}
obj <- aib_adjust(
  obj,
  formula = ~ income + credit_score,
  method = "glm",
  ref_group = "White"
)

# Adjusted trajectory
head(obj$adjusted$trajectory)
```

---

## Step 6: Bootstrap Confidence Intervals

```{r bootstrap, eval=FALSE}
obj <- aib_bootstrap(obj, B = 500, seed = 2024, conf = 0.95)
plot(obj, type = "trajectory") # Now includes ribbon CIs
```

---

## One-Shot: `aib_audit()`

Run the full pipeline in one call:

```{r audit, eval=FALSE}
result <- aib_audit(
  lending_panel,
  id = "applicant_id",
  time = "year",
  group = "race",
  decision = "approved",
  ref_group = "White",
  bootstrap = TRUE,
  B = 200,
  seed = 42
)
summary(result)
```

---

## Formal Definition of Bias Amplification

A decision system exhibits **bias amplification** for group $g$ relative to reference group $r$ over times $1, \ldots, T$ if:

1. $|B_{g,r}(t)| > |B_{g,r}(s)|$ for some $t > s$ (disparity grows), **and**
2. Either $A_{g,r}(t) = B_{g,r}(t \mid 1) - B_{g,r}(t \mid 0) \neq 0$ (prior decisions modulate current disparity), **or**
3. $P_g(t) \neq P_r(t)$ (group transition matrices are unequal).

That is, condition 1 must hold together with at least one of conditions 2 and 3.

**Proposition:** If $p_g^{11}(t) < p_r^{11}(t)$ (lower retention) and $p_g^{01}(t) < p_r^{01}(t)$ (lower recovery) for all $t$, then under common initial conditions the favorable-decision probability for group $g$ weakly decreases relative to group $r$ over time, implying nonnegative cumulative disparity against group $g$.

This distinguishes *static persistent bias* (constant gap) from *dynamic compounding bias* (self-reinforcing gap driven by the decision process itself).

---

## Summary of Core Estimands

| Estimand | Function | Formula |
|---|---|---|
| Bias trajectory | `aib_describe()` | $B_{g,r}(t)$ |
| Standardized trajectory | `aib_describe()` | $B^*_{g,r}(t)$ |
| Cumulative burden | `aib_describe()` | $CB_{g,r}(T)$ |
| Recovery gap | `aib_transition()` | $\Delta^{01}_{g,r}$ |
| Retention gap | `aib_transition()` | $\Delta^{11}_{g,r}$ |
| Amplification index | `aib_amplify()` | $A_{g,r}(t)$ |
| Bias persistence | `aib_persistence()` | $PB_{g,r}(c)$ |
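
The proposition in the formal definition can be illustrated numerically with a two-state Markov sketch. The transition probabilities below are hypothetical and held constant over time (a simplifying assumption; they are not estimated from `lending_panel`), and the code uses only base R rather than any AIBias function.

```r
# Hypothetical, time-constant transition probabilities for group g and
# reference group r. Group g has lower retention and lower recovery.
p11 <- c(g = 0.80, r = 0.90)  # retention: P(approved at t+1 | approved at t)
p01 <- c(g = 0.30, r = 0.50)  # recovery:  P(approved at t+1 | denied at t)

n_waves <- 6
pi_t <- matrix(NA_real_, nrow = n_waves, ncol = 2,
               dimnames = list(NULL, c("g", "r")))

# Common initial condition: both groups start with the same approval probability.
pi_t[1, ] <- 0.5

# One-step update of the approval probability:
# pi(t + 1) = pi(t) * p11 + (1 - pi(t)) * p01
for (t in seq_len(n_waves - 1)) {
  pi_t[t + 1, ] <- pi_t[t, ] * p11 + (1 - pi_t[t, ]) * p01
}

# Bias trajectory B_{g,r}(t) and its sum over waves.
B  <- pi_t[, "g"] - pi_t[, "r"]
CB <- sum(B)

print(round(cbind(pi_t, B = B), 4))
print(CB)
```

In this example the two groups start from the same approval probability, yet the gap is never positive and the summed disparity is negative for group $g$, even though no group-specific rule was applied to the first wave. The differences in transition probabilities alone produce the gap.
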