Algorithmic fairness — Glossary Aria Research

Extended definition

Algorithmic fairness is the interdisciplinary subfield of machine learning that studies bias and discrimination in algorithmic systems. It articulates formal criteria for measuring and mitigating disparities between groups defined by protected attributes such as race, gender, age, and disability. Barocas, Hardt, and Narayanan (2019, Fairness and Machine Learning: Limitations and Opportunities, online at fairmlbook.org) consolidated the contemporary theoretical reference; Mehrabi et al. (2021, ACM Computing Surveys) surveyed bias types and mitigation methods. The central formal criteria are demographic parity (equal proportions of positive classifications across groups), equal opportunity (equal true-positive rate, TPR), equalized odds (equal TPR and equal false-positive rate, FPR), and per-group calibration (the predicted probability matches the observed frequency within each group). A fundamental result (Chouldechova 2017; Kleinberg, Mullainathan, Raghavan 2016) shows that when prevalence differs across groups, it is mathematically impossible to satisfy demographic parity, equalized odds, and calibration simultaneously except in trivial cases; equalized odds and calibration alone already conflict unless prediction is perfect. Value-laden choices are therefore unavoidable. Mitigation can act at three points in the pipeline: pre-processing (resampling, reweighting), in-processing (loss-function constraints, adversarial models), and post-processing (per-group threshold adjustment).
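The criteria above can be checked directly from per-group confusion counts. The following is a minimal sketch in plain Python, not a reference implementation: the function names `group_rates` and `gap`, the group labels "a" and "b", and the toy data are all hypothetical and chosen only for illustration. It assumes binary labels and binary predictions.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR, and FPR for binary labels (0/1)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        # "t"/"f" = prediction correct or not, "p"/"n" = predicted class
        key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
        counts[g][key] += 1

    rates = {}
    for g, c in counts.items():
        n = sum(c.values())
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[g] = {
            # Demographic parity compares this value across groups.
            "selection_rate": (c["tp"] + c["fp"]) / n,
            # Equal opportunity compares TPR; equalized odds compares TPR and FPR.
            "tpr": c["tp"] / positives if positives else float("nan"),
            "fpr": c["fp"] / negatives if negatives else float("nan"),
        }
    return rates


def gap(rates, metric):
    """Largest between-group difference for one metric (0 means parity)."""
    values = [r[metric] for r in rates.values()]
    return max(values) - min(values)


# Hypothetical toy data: two groups with different prevalence (0.5 vs 0.75).
y_true = [1, 1, 0, 0, 1, 1, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
for metric in ("selection_rate", "tpr", "fpr"):
    print(metric, round(gap(rates, metric), 3))
```

In this toy example both groups receive positive predictions at the same rate, so demographic parity holds, yet the TPR and FPR gaps are nonzero, so equalized odds does not. Which gap matters is the context-dependent choice the definition describes.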

When it applies

Algorithmic fairness applies to any ML system that makes or supports decisions with substantive human impact: credit, insurance, hiring, criminal justice, public resource allocation, medical diagnosis, and content moderation. It is increasingly a regulatory requirement under the EU AI Act (2024), the US Equal Credit Opportunity Act, and sector-specific regulations. It applies in published research when the data or model can reflect systematic biases: NeurIPS and FAccT (ACM Conference on Fairness, Accountability, and Transparency) require impact declarations. It applies to the critical review of production systems, where fairness auditing is an emerging practice in large organizations. It also applies in scientific ML research where the training data under-represent relevant groups (e.g., medical image datasets dominated by light skin tones).

When it does not apply

It does not apply as an isolated technical solution to structural social problems: algorithmic bias reflects biases in society and in data, and a purely technical correction is insufficient without changes to process, policy, and data. It does not provide a single set of universally valid criteria: the appropriate criterion depends on context, values, and the trade-offs acceptable to the affected community. It does not replace human and participatory auditing: stakeholders from affected groups should be involved in defining the criteria. It does not apply to the rare problems in which the protected attribute is causally appropriate for the outcome; these cases require care in distinguishing a spurious association from a genuine causal relation. In systems where the final decision is made by a human, algorithmic fairness is part of the audit but does not exhaust ethical responsibility.

Applications by field

- Criminal justice: systems such as COMPAS came under scrutiny after the ProPublica investigation (Angwin et al. 2016); subsequent research addresses equity-aware risk assessment.
- Credit and finance: regulators require models with parity across demographic groups; the FICO score is under audit.
- Health: clinical ML trained on under-representative data can amplify disparities; audits disaggregate by race, gender, and socioeconomic class.
- Human resources: resume-screening systems with documented gender bias (Amazon 2018); ongoing research on hiring algorithms.

Common pitfalls

The first pitfall is trying to satisfy all formal criteria simultaneously: the mathematical impossibility (Chouldechova 2017) forces value-laden choices, and transparency about the choice is better than an illusion of technical neutrality. The second is removing the protected attribute from the features (“fairness through unawareness”) without realizing that correlated features (ZIP code, name, educational institution) act as proxies. The third is trusting aggregate metrics without disaggregation: a model with 90% overall accuracy can reach only 60% in a critical subgroup, so per-group analysis is essential. The fourth is treating fairness as a compliance checkbox: technical implementation without organizational and process change produces fairness theater, not real fairness. The fifth is failing to involve affected stakeholders in defining the criteria: a purely technical definition imposed by developers reproduces existing hierarchies.
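The third pitfall can be made concrete with a small disaggregation check. This is an illustrative sketch only: the helper `accuracy_by_group`, the group names "majority" and "minority", and the synthetic data are hypothetical, constructed so that the arithmetic matches the 90% and 60% figures mentioned above.

```python
def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for each group label."""
    totals, correct = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] = totals.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + int(t == p)
    return {g: correct[g] / totals[g] for g in totals}


# Synthetic data (assumption): 90 majority examples, 84 predicted correctly;
# 10 minority examples, 6 predicted correctly. All true labels are 1.
y_true = [1] * 100
y_pred = [1] * 84 + [0] * 6 + [1] * 6 + [0] * 4
groups = ["majority"] * 90 + ["minority"] * 10

overall = sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)
by_group = accuracy_by_group(y_true, y_pred, groups)

print("overall:", overall)
for name, acc in by_group.items():
    print(name, round(acc, 3))
```

Because the minority group contributes only 10 of the 100 examples, its 60% accuracy barely moves the aggregate, which stays at 90%. The same per-group breakdown applies to any metric, not only accuracy.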