AI and machine learning
Object Detection Beyond ImageNet: When the Domain Leaves the Training Set
Almost all object detectors are evaluated on ImageNet or COCO, but real deployment domains have their own data distributions. A detector with high benchmark performance can collapse when the deployment data leaves the training distribution. In one study, the same detector fell from 96.79% to 60.18% mAP out of domain. A standard benchmark score does not validate a detector for the deployment domain.
Embeddings and Cultural Bias: What Pretrained Models Learn and Forget
An embedding is a compressed imprint of the text that trained it: it learns the culture of that corpus, with its stereotypes and its silences. Pretrained does not mean neutral. For under-represented populations there are two failures: the encoded stereotype and the thin representation. And the bias is measurable: on a health benchmark, a biomedicine model encoded stronger ethnic associations than a legal one.
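One common way to put a number on such associations is a WEAT-style score: the mean cosine similarity of a target term to one attribute set minus its mean similarity to another. The sketch below is only an illustration of that idea; the text does not say which measure the health benchmark used, and the function names and the 2-dimensional vectors are hypothetical stand-ins for real embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(target, attrs_a, attrs_b):
    """WEAT-style score: mean similarity to set A minus mean similarity to set B.

    A positive value means the target term sits closer to set A in the
    embedding space; zero means no measured preference.
    """
    mean_a = sum(cosine(target, a) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(target, b) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b


# Hypothetical toy vectors; real ones would come from the model under audit.
group_term = [0.8, 0.6]
attrs_a = [[1.0, 0.0], [0.9, 0.1]]
attrs_b = [[0.0, 1.0], [0.1, 0.9]]

score = association(group_term, attrs_a, attrs_b)
print(f"association score: {score:.3f}")
```

Running the same score on two models with the same term lists gives a like-for-like comparison, which is the kind of contrast the biomedical-versus-legal finding describes.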
Generative AI in Systematic Review: Tool or Shortcut?
Generative AI speeds up the systematic review, but it becomes a shortcut the moment it replaces, rather than assists, human judgment under a documented protocol. The data show why: LLM screeners trade sensitivity for specificity. What makes the use legitimate is the protocol: pre-registration, validation, the model as a second screener with human arbitration, and reporting of prompt, model and version.
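The validation step in such a protocol can be as simple as comparing the model's decisions with human reference decisions on a labelled sample. The following is a minimal sketch under that assumption; the function name and the ten toy labels are made up for illustration and are not results from any study.

```python
def screening_metrics(human_labels, model_labels):
    """Return (sensitivity, specificity) of model decisions vs. human reference.

    Labels are booleans: True means "include the record".
    """
    pairs = list(zip(human_labels, model_labels))
    tp = sum(1 for h, m in pairs if h and m)
    fn = sum(1 for h, m in pairs if h and not m)
    tn = sum(1 for h, m in pairs if not h and not m)
    fp = sum(1 for h, m in pairs if not h and m)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("validation sample needs both includes and excludes")
    return tp / (tp + fn), tn / (tn + fp)


# Toy validation sample (hypothetical labels).
human = [True, True, True, True, False, False, False, False, False, False]
model = [True, True, True, False, False, False, False, False, False, True]

sensitivity, specificity = screening_metrics(human, model)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

In a review, sensitivity is the number to watch: a missed relevant study is lost from the evidence base, while a false inclusion is caught at full-text assessment.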
Predictive Modeling in Social Sciences: Why AUC Alone Is Not Enough
AUC is the metric everyone reports and the one that says least about whether the model is any good. It measures ranking only: it is blind to calibration, to decision value, and to the predictability ceiling. Worse, high discrimination at derivation often does not survive external validation. In 158 external validations of 104 models, the median c-statistic fell from 0.76 to 0.64, so the number reported at derivation overstates performance.
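The blindness to calibration is easy to demonstrate: any strictly increasing transformation of the predicted probabilities leaves AUC unchanged, while a calibration-sensitive score such as the Brier score moves. The sketch below uses a hand-rolled pairwise AUC and made-up predictions; the data and the fourth-root distortion are assumptions chosen only to illustrate the point.

```python
def auc(y_true, scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true, probs):
    """Mean squared error between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Toy outcomes and predicted probabilities (hypothetical).
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.35, 0.9]

# Strictly increasing distortion: same ranking, inflated probabilities.
overconfident = [x ** 0.25 for x in p]

print("AUC  :", auc(y, p), "->", auc(y, overconfident))
print("Brier:", round(brier(y, p), 4), "->", round(brier(y, overconfident), 4))
```

The two AUC values are identical because only the ordering of scores enters the calculation, whereas the Brier score worsens for the distorted predictions. A model can therefore rank well and still assign probabilities that no one should act on.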
LDA vs. BERTopic in academic corpora
LDA models each document as a probabilistic mixture of topics, each topic being a distribution over words; BERTopic clusters documents by dense semantic similarity. Which to choose depends on the evaluation dimension that matters for the analytical objective.
Semantic embeddings for systematic review screening
Large-scale manual screening carries a 5-12% human error rate and leaves no documented audit trail. Screening with semantic embeddings preserves recall above 90% and makes every exclusion auditable against a declared threshold.
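What "auditable against a declared threshold" could look like in practice is a log that stores, for every record, its similarity score, the threshold in force, and the resulting decision. This is a minimal sketch under stated assumptions: the record IDs, the 3-dimensional vectors, and the 0.5 threshold are hypothetical, and real vectors would come from whatever embedding model the protocol names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def screen(records, query_vec, threshold):
    """Return one audit entry per record: score, threshold, and decision."""
    log = []
    for rec_id, vec in records.items():
        score = cosine(vec, query_vec)
        log.append({
            "id": rec_id,
            "score": round(score, 4),
            "threshold": threshold,
            "decision": "include" if score >= threshold else "exclude",
        })
    return log


# Toy vectors standing in for abstract embeddings (hypothetical values).
query = [1.0, 0.0, 0.0]  # embedding of the inclusion criteria
records = {
    "rec-001": [0.9, 0.1, 0.0],
    "rec-002": [0.1, 0.9, 0.2],
    "rec-003": [0.6, 0.6, 0.1],
}

audit_log = screen(records, query, threshold=0.5)
for entry in audit_log:
    print(entry)
```

Because the threshold is stored with each decision, a reviewer can re-check any exclusion later, or rerun the whole screen at a different threshold and compare the two logs.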
Computer vision in medical imaging: high AUC is not enough
Computer vision pipelines for medical imaging fail review at Q1 journals not on the accuracy metric but on what is missing around it: documented external validation, a demographic subgroup breakdown, and explicit human-in-the-loop intervention. Models with an internal AUC of 0.95 can drop to 0.54 on data from another hospital, and the STARD-AI, TRIPOD+AI, and CLAIM frameworks consolidated this editorial expectation between 2020 and 2025.
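Reporting discrimination separately for each site or subgroup, rather than as one pooled figure, is straightforward to set up. The sketch below assumes a flat list of prediction rows; the field names (`site`, `label`, `score`) and the eight toy rows are hypothetical and are not data from any hospital.

```python
from collections import defaultdict


def auc(labels, scores):
    """Pairwise AUC: share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(rows, key):
    """Compute AUC separately for each value of `key` (site, sex, age band...)."""
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        labels, scores = groups[row[key]]
        labels.append(row["label"])
        scores.append(row["score"])
    return {group: auc(lbls, scrs) for group, (lbls, scrs) in groups.items()}


# Toy predictions from two sites (hypothetical values).
rows = [
    {"site": "internal", "label": 1, "score": 0.9},
    {"site": "internal", "label": 1, "score": 0.8},
    {"site": "internal", "label": 0, "score": 0.3},
    {"site": "internal", "label": 0, "score": 0.2},
    {"site": "external", "label": 1, "score": 0.6},
    {"site": "external", "label": 1, "score": 0.4},
    {"site": "external", "label": 0, "score": 0.5},
    {"site": "external", "label": 0, "score": 0.7},
]

report = auc_by_group(rows, key="site")
print(report)
```

The same function works for a demographic column by changing `key`, which yields the subgroup breakdown from the same prediction table.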