Insights · 21 essays

Insights.
Methods and scientific production.

Argument-driven technical analyses, calibrated for researchers who need to publish in indexed journals. In Portuguese and English, published under the Aria Research name.

Writing and publishing 5 min

COPE, ICMJE and CRediT as Standard Editorial Practice

Recognizing contribution is too important to leave to informal negotiation. COPE, ICMJE and CRediT form the standard editorial practice that documents who did what and makes authorship auditable. Without that standard, misattribution is common: in a survey of six high-impact journals, one in four research articles had an honorary author, and ghost authorship was present too.

editorial practice · authorship · ICMJE
AI and machine learning 5 min

Object Detection Beyond ImageNet: When the Domain Leaves the Training Set

Most object detectors are evaluated on ImageNet or COCO, but real deployment domains have their own distributions. A detector with high benchmark performance can collapse when the domain leaves the training set. In one study, the same detector fell from 96.79% to 60.18% mAP out of domain. A standard benchmark score does not validate the deployment domain.

object detection · out-of-distribution domain · ImageNet
AI and machine learning 5 min

Embeddings and Cultural Bias: What Pretrained Models Learn and Forget

An embedding is a compressed imprint of the text that trained it: it learns the culture of that corpus, with its stereotypes and its silences. Pretrained does not mean neutral. For under-represented populations there are two failures: the encoded stereotype and the thin representation. And the bias is measurable: on a health benchmark, a biomedicine model encoded stronger ethnic associations than a legal one.

embeddings · cultural bias · under-represented populations
AI and machine learning 5 min

Generative AI in Systematic Review: Tool or Shortcut?

Generative AI speeds up the systematic review, but it becomes a shortcut the moment it replaces, rather than assists, human judgment under a documented protocol. The data show why: LLM screeners trade sensitivity for specificity. What makes the use legitimate is the protocol: pre-registration, validation, the model as a second screener with human arbitration, and reporting of prompt, model and version.
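
The sensitivity and specificity trade-off can be made concrete with a small calculation. This is a minimal sketch: the function name `screening_metrics` and the counts are hypothetical, chosen only to illustrate the arithmetic, and do not come from any study cited here.

```python
def screening_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of eligible abstracts the screener keeps
    specificity = tn / (tn + fp)  # share of ineligible abstracts the screener excludes
    return sensitivity, specificity

# Hypothetical counts: 1000 abstracts, 100 of them truly eligible.
sens, spec = screening_metrics(tp=80, fp=45, fn=20, tn=855)
missed = 20  # the false negatives: eligible studies that never reach full-text review

print(f"sensitivity={sens:.2f} specificity={spec:.2f} missed={missed}")
```

In a review, the false negatives are the costly error, which is why a screener with high specificity but reduced sensitivity needs a human arbiter rather than the last word.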

systematic review · generative AI · abstract screening
Data and statistics 5 min

Missing Data Is Not a Technical Detail: The Mechanism Decides

Missing data is not a cleanup step. The choice between deleting cases and imputing changes estimates and standard errors, and Q1 reviewers read that decision closely. Validity is governed by the assumed missingness mechanism, not by how much is missing. In one simulation, imputation error was similar under MCAR and MAR but exploded under NMAR, where missingness depends on the missing value itself.
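
A toy simulation can show why the mechanism, not the amount, governs validity. This is a sketch under stated assumptions: the data-generating model, the missingness probabilities, and the helper `simulate` are all hypothetical, and the imputation is a single regression fill, a deliberately simple stand-in for multiple imputation.

```python
import random
import statistics

def simulate(mechanism, n=20000, seed=0):
    """Bias in the mean of y for complete-case analysis and regression imputation."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]   # x is always observed
    ys = [x + rng.gauss(0, 1) for x in xs]     # y is the variable that goes missing

    def p_missing(x, y):
        if mechanism == "MCAR":
            return 0.3                         # unrelated to any data
        if mechanism == "MAR":
            return 0.6 if x > 0 else 0.0       # depends only on the observed x
        if mechanism == "NMAR":
            return 0.6 if y > 0 else 0.0       # depends on the missing value itself
        raise ValueError(mechanism)

    missing = [rng.random() < p_missing(x, y) for x, y in zip(xs, ys)]
    obs_x = [x for x, m in zip(xs, missing) if not m]
    obs_y = [y for y, m in zip(ys, missing) if not m]

    # Ordinary least squares of y on x, fitted on complete cases only.
    mx, my = statistics.fmean(obs_x), statistics.fmean(obs_y)
    slope = (sum((x - mx) * (y - my) for x, y in zip(obs_x, obs_y))
             / sum((x - mx) ** 2 for x in obs_x))
    intercept = my - slope * mx

    # Fill each missing y with its regression prediction.
    filled = [intercept + slope * x if m else y
              for x, y, m in zip(xs, ys, missing)]
    true_mean = statistics.fmean(ys)
    return {
        "complete_case_bias": my - true_mean,
        "imputed_bias": statistics.fmean(filled) - true_mean,
    }

results = {m: simulate(m) for m in ("MCAR", "MAR", "NMAR")}
for name, r in results.items():
    print(name, {k: round(v, 3) for k, v in r.items()})
```

In this setup, deletion is roughly harmless only under MCAR, imputation from the observed covariate repairs the MAR case, and neither approach recovers the mean under NMAR, because the information needed to correct the estimate is exactly what is missing.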

missing data · multiple imputation · MAR
AI and machine learning 5 min

Predictive Modeling in Social Sciences: Why AUC Alone Is Not Enough

AUC is the metric everyone reports and the one that says least about whether the model is any good. It measures ranking, and is blind to calibration, to decision value, and to the predictability ceiling. Worse, high discrimination at derivation often does not survive external validation. In 158 external validations of 104 models, the median c-statistic fell from 0.76 to 0.64, so a single number overstates performance.
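
That AUC measures only ranking can be checked directly: a strictly monotone distortion of the predicted probabilities leaves AUC unchanged while calibration degrades. The sketch below uses simulated data; the helpers `auc` and `brier`, the cubic distortion, and the sample size are illustrative assumptions.

```python
import random

def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)

rng = random.Random(0)
true_probs = [rng.random() for _ in range(1000)]
outcomes = [1 if rng.random() < p else 0 for p in true_probs]

calibrated = true_probs                    # predictions equal the true risk
distorted = [p ** 3 for p in true_probs]   # same ranking, systematically too low

auc_cal, auc_dis = auc(outcomes, calibrated), auc(outcomes, distorted)
brier_cal, brier_dis = brier(outcomes, calibrated), brier(outcomes, distorted)

print(f"AUC   calibrated={auc_cal:.3f} distorted={auc_dis:.3f}")
print(f"Brier calibrated={brier_cal:.3f} distorted={brier_dis:.3f}")
```

The two models are indistinguishable by AUC, yet the distorted one would systematically understate risk for anyone acting on its probabilities.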

predictive modeling · social sciences · AUC
Data and statistics 6 min

Publishable vs Exploratory Visualization: Two Objects, Two Rule Sets

Exploratory visualization serves the analyst: fast, disposable, optimized to see. Publishable visualization serves the reader: it is read once, and it has to decode unaided. They are different objects, not two finish levels of one chart. And the publishing format changes interpretation: a controlled experiment found that 'better' graphs were read more accurately (OR 1.55) and more clearly (OR 1.91) than 'normed' ones.

exploratory visualization · publishable visualization · graphical perception
Data and statistics 5 min

SEM for Multiple Mediation: When Linear Regression Stops Answering

Multiple mediation asks through which mechanism an effect operates, and the quantity of interest is the indirect effect, a product of paths. Linear regression estimates isolated paths; it does not by itself provide inference on that product, nor does it model simultaneous mediators. SEM estimates the whole system and accommodates latent variables and serial chains. For the interval, the choice of bootstrap changes the false-positive rate by a measurable amount.
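
The "product of paths" can be written out for the simplest multiple-mediation case, two parallel mediators in a linear model. The notation below (paths a, b, c', intercepts i, errors ε) follows common convention and is an assumption of this sketch, not taken from the essay.

```latex
\begin{align}
M_1 &= i_1 + a_1 X + \varepsilon_1 \\
M_2 &= i_2 + a_2 X + \varepsilon_2 \\
Y   &= i_3 + c' X + b_1 M_1 + b_2 M_2 + \varepsilon_3 \\[4pt]
\text{specific indirect effects:} &\quad a_1 b_1, \qquad a_2 b_2 \\
\text{total indirect effect:}     &\quad a_1 b_1 + a_2 b_2 \\
\text{total effect:}              &\quad c = c' + a_1 b_1 + a_2 b_2
\end{align}
```

Each indirect effect is a product of two estimated coefficients, so its sampling distribution is generally not normal, which is why resampling intervals are preferred over a symmetric normal-theory interval.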

multiple mediation · SEM · indirect effect
Data and statistics 5 min

Web Scraping in Academic Research: Public Is Not the Same as Collectable

That a datum sits on an open page is a statement about access, not about permission, and still less about ethics. Web scraping in research forces the distinction: terms of use, privacy expectations, and risk of harm draw the line that technical accessibility ignores. A review of 367 studies using public Twitter data measured the gap: most reported no ethics approval, and informed consent was attempted in none of them.

web scraping · research ethics · public data
Writing and publishing 5 min

Strategic Venue Selection After the First Rejection

Resubmitting by reflex to a lower-tier journal treats rejection as a verdict on quality. Evidence on submission flows shows that what preserves a paper's citation trajectory is fit, not tier, and that the jump across distinct journal communities is where citations are lost.

submission strategy · editorial rejection · journal selection
Writing and publishing 5 min

Literal Translation Is the First Cause of PT→EN Rejection in Q1

Rejection of literally translated manuscripts is rarely a vocabulary problem. It is the rhetorical structure of Portuguese, carried over intact, that an Anglophone reviewer reads as a poorly built argument. The fix is reconstruction in the target register, not word-by-word editing.

academic translation · editorial rejection · contrastive rhetoric
Writing and publishing 4 min

The Structured 250-Word Abstract: The Architecture That Decides Reading

Editors and reviewers triage on the abstract; readers decide to read on it. The 250-word limit is not bureaucracy but the IMRaD compression that exposes whether a declarable contribution exists. Structured abstracts beat unstructured ones on completeness and clarity, and they set the paper's visibility before the merits of the body are ever weighed.

structured abstract · academic writing · editorial triage