Insights

AI and machine learning


Object Detection Beyond ImageNet: When the Domain Leaves the Training Set

Almost all object detection is evaluated on ImageNet or COCO, but real deployment domains have their own distributions. A detector with high benchmark performance can collapse when the deployment data leaves the training distribution. In one study, the same detector fell from 96.79% to 60.18% mAP out of domain. A strong score on a standard benchmark does not validate the detector for the deployment domain.

object detection, out-of-distribution domain, ImageNet

AI and machine learning 5 min

Embeddings and Cultural Bias: What Pretrained Models Learn and Forget

An embedding is a compressed imprint of the text that trained it: it learns the culture of that corpus, with its stereotypes and its silences. Pretrained does not mean neutral. For under-represented populations there are two failures: the encoded stereotype and the thin representation. And the bias is measurable: on a health benchmark, a biomedical model encoded stronger ethnic associations than a legal one.

embeddings, cultural bias, under-represented populations

AI and machine learning 5 min

Generative AI in Systematic Review: Tool or Shortcut?

Generative AI speeds up the systematic review, but it becomes a shortcut the moment it replaces, rather than assists, human judgment under a documented protocol. The data show why: LLM screeners trade sensitivity for specificity. What makes the use legitimate is the protocol: pre-registration, validation, the model as a second screener with human arbitration, and reporting of prompt, model and version.
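The validation step in that protocol can be sketched in a few lines: compare the model's include/exclude decisions with human gold-standard labels on a pilot sample and report sensitivity and specificity separately. This is a minimal illustration, not a prescribed method; the function name and the toy labels are hypothetical.

```python
def sensitivity_specificity(gold, predicted):
    """Return (sensitivity, specificity) for binary include=1 / exclude=0 labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical pilot sample: human consensus labels vs. model decisions.
gold_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(gold_labels, model_labels)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting the two numbers side by side makes the trade-off visible: a missed relevant study (false negative) lowers sensitivity even when overall agreement looks high.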

systematic review, generative AI, abstract screening

AI and machine learning 5 min

Predictive Modeling in Social Sciences: Why AUC Alone Is Not Enough

AUC is the metric everyone reports and the one that says least about whether the model is any good. It measures ranking, and is blind to calibration, to decision value, and to the predictability ceiling. Worse, high discrimination at derivation does not survive external validation. In 158 external validations of 104 models, the median c-statistic falls from 0.76 to 0.64, so a single number overstates performance.
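A small sketch of why AUC is blind to calibration: because it depends only on the ranking of scores, any order-preserving distortion of the predicted probabilities leaves it unchanged, while a calibration-sensitive measure such as the Brier score moves. The data below are toy values invented for illustration, and cubing the probabilities is just one arbitrary monotone distortion.

```python
def auc(y_true, scores):
    """Share of positive/negative pairs where the positive scores higher (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Toy outcomes and predicted probabilities (hypothetical).
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

# Monotone distortion: same ranking, very different probabilities.
p_distorted = [v ** 3 for v in p]

print("AUC  :", auc(y, p), "->", auc(y, p_distorted))
print("Brier:", brier(y, p), "->", brier(y, p_distorted))
```

The two AUC values are identical, while the Brier score worsens for the distorted probabilities, which is the sense in which a single discrimination number can hide a badly calibrated model.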

predictive modeling, social sciences, AUC

AI and machine learning 4 min

LDA vs. BERTopic in academic corpora

LDA models each document as a probabilistic mixture of topics, each topic being a distribution over words; BERTopic clusters documents by dense semantic similarity. The choice between the two depends on which evaluative dimension matters for the analytical objective.

topic modeling, BERTopic, LDA

AI and machine learning 4 min

Semantic embeddings for systematic review screening

Large-scale manual screening has a 5-12% human error rate and zero documented traceability. Semantic embeddings preserve recall above 90% and make every exclusion auditable against a declared threshold.
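What "auditable against a declared threshold" can look like in practice is sketched below. The sketch assumes each record and the inclusion criteria have already been turned into vectors by a sentence-embedding model such as SBERT; the three-dimensional toy vectors, the record IDs, the `screen` function and the 0.6 threshold are all hypothetical placeholders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def screen(criteria_vec, records, threshold):
    """Score every record and log the decision together with the threshold used."""
    log = []
    for rec_id, vec in records.items():
        sim = cosine(criteria_vec, vec)
        log.append({
            "id": rec_id,
            "similarity": round(sim, 4),
            "threshold": threshold,
            "decision": "include" if sim >= threshold else "exclude",
        })
    return log


# Toy stand-ins for real embeddings (hypothetical values).
criteria = [1.0, 0.0, 1.0]
records = {
    "rec-001": [0.9, 0.1, 0.8],
    "rec-002": [0.0, 1.0, 0.1],
    "rec-003": [0.5, 0.5, 0.5],
}

audit_log = screen(criteria, records, threshold=0.6)
for entry in audit_log:
    print(entry)
```

Every exclusion carries its similarity score and the threshold it was judged against, so a reviewer can later re-check borderline records or rerun the screen with a different cut-off.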

systematic review, semantic embeddings, SBERT

AI and machine learning 10 min

Computer vision in medical imaging: high AUC is not enough

Computer vision pipelines for medical imaging are rejected by Q1 journals not over the accuracy metric but over the absence of documented external validation, demographic subgroup breakdown, and explicit human-in-the-loop intervention. Models with an internal AUC of 0.95 can drop to 0.54 on data from another hospital, and the STARD-AI, TRIPOD+AI, and CLAIM frameworks consolidated this editorial expectation between 2020 and 2025.

computer vision, medical imaging, deep learning