Insights.
Methods and scientific production.
Argument-driven technical analyses, calibrated for researchers who need to publish in indexed journals. In Portuguese and English, published under the Aria Research name.
COPE, ICMJE and CRediT as Standard Editorial Practice
Recognizing contribution is too important to leave to informal negotiation. COPE, ICMJE and CRediT form the standard editorial practice that documents who did what and makes authorship auditable. Without that standard, misattribution is common: in a survey of six high-impact journals, one in four research articles had an honorary author, and ghost authorship was present too.
Object Detection Beyond ImageNet: When the Domain Leaves the Training Set
Most object detectors are evaluated on ImageNet or COCO, but real deployment domains have their own distributions. A detector with high benchmark performance can collapse when the domain leaves the training set. In one study, the same detector fell from 96.79% to 60.18% mAP out of domain. A standard benchmark does not validate the deployment domain.
Embeddings and Cultural Bias: What Pretrained Models Learn and Forget
An embedding is a compressed imprint of the text that trained it: it learns the culture of that corpus, with its stereotypes and its silences. Pretrained does not mean neutral. For under-represented populations there are two failures: the encoded stereotype and the thin representation. And the bias is measurable: on a health benchmark, a biomedical model encoded stronger ethnic associations than a legal one.
Generative AI in Systematic Review: Tool or Shortcut?
Generative AI speeds up the systematic review, but it becomes a shortcut the moment it replaces, rather than assists, human judgment under a documented protocol. The data show why: LLM screeners trade sensitivity for specificity. What makes the use legitimate is the protocol: pre-registration, validation, the model as a second screener with human arbitration, and reporting of prompt, model and version.
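The sensitivity and specificity trade-off can be made concrete with a small sketch. The function name `screening_metrics` and the ten decisions below are hypothetical, invented for illustration only; the human decisions stand in for the reference standard that the protocol's validation step would supply.

```python
def screening_metrics(llm_include, human_include):
    """Sensitivity and specificity of an LLM screener against human decisions."""
    pairs = list(zip(llm_include, human_include))
    tp = sum(1 for l, h in pairs if l and h)          # relevant, kept
    fn = sum(1 for l, h in pairs if not l and h)      # relevant, wrongly excluded
    tn = sum(1 for l, h in pairs if not l and not h)  # irrelevant, excluded
    fp = sum(1 for l, h in pairs if l and not h)      # irrelevant, wrongly kept
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}


# Hypothetical decisions on ten records (True = include).
human = [True, True, True, True, False, False, False, False, False, False]
llm = [True, True, False, False, False, False, False, False, False, False]

metrics = screening_metrics(llm, human)
print(metrics)
```

In this invented case the screener never includes an irrelevant record but misses half of the relevant ones, which is the failure a systematic review can least afford.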
Missing Data Is Not a Technical Detail: The Mechanism Decides
Missing data is not a cleanup step. The choice between deleting cases and imputing changes estimates and standard errors, and Q1 reviewers read that decision closely. Validity is governed by the assumed missingness mechanism, not by how much is missing. In one simulation, imputation error was similar under MCAR and MAR but exploded under NMAR, where missingness depends on the missing value itself.
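A minimal simulation sketch of that point, not the study cited above: all three mechanisms below delete about 30% of the outcome, yet only one of them breaks a simple regression imputation. The data-generating model, the missingness probabilities, and the function name `imputation_bias` are assumptions made for illustration.

```python
import random
import statistics


def imputation_bias(mechanism, n=20000, seed=0):
    """Bias of the regression-imputed mean of y against the full-data mean."""
    rng = random.Random(seed)
    xs, ys, missing = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)            # covariate, always observed
        y = 0.5 * x + rng.gauss(0, 1)  # outcome, subject to missingness
        if mechanism == "MCAR":
            p_miss = 0.3                     # unrelated to any variable
        elif mechanism == "MAR":
            p_miss = 0.6 if x > 0 else 0.0   # depends on observed x only
        elif mechanism == "NMAR":
            p_miss = 0.6 if y > 0 else 0.0   # depends on the missing value itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        xs.append(x)
        ys.append(y)
        missing.append(rng.random() < p_miss)

    # Fit y = intercept + slope * x on complete cases only.
    x_obs = [x for x, m in zip(xs, missing) if not m]
    y_obs = [y for y, m in zip(ys, missing) if not m]
    mx, my = statistics.fmean(x_obs), statistics.fmean(y_obs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(x_obs, y_obs))
    sxx = sum((x - mx) ** 2 for x in x_obs)
    slope = sxy / sxx
    intercept = my - slope * mx

    # Replace each missing y with its regression prediction.
    completed = [intercept + slope * x if m else y
                 for x, y, m in zip(xs, ys, missing)]
    return statistics.fmean(completed) - statistics.fmean(ys)


for mech in ("MCAR", "MAR", "NMAR"):
    print(mech, round(imputation_bias(mech), 3))
```

Under MCAR and MAR the imputed mean stays close to the full-data mean, because the regression of y on x is still estimated without bias. Under NMAR the same procedure is biased, even though the share of missing values is unchanged.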
Predictive Modeling in Social Sciences: Why AUC Alone Is Not Enough
AUC is the metric everyone reports and the one that says least about whether the model is any good. It measures ranking, and is blind to calibration, to decision value, and to the predictability ceiling. Worse, high discrimination at derivation does not survive external validation. In 158 external validations of 104 models, the median c-statistic fell from 0.76 to 0.64, so the derivation figure alone overstates performance.
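The blindness to calibration follows from AUC depending only on the order of the scores. The sketch below, on eight made-up predictions, applies a monotone distortion to the probabilities: the ranking, and therefore the AUC, is untouched, while the Brier score (used here as one assumed measure that is sensitive to calibration) gets worse.

```python
def auc(scores, labels):
    """Share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


# Hypothetical outcomes and predicted probabilities.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

# Cubing is strictly increasing on [0, 1]: same ranking, different probabilities.
distorted = [p ** 3 for p in probs]

print("AUC  ", auc(probs, labels), auc(distorted, labels))
print("Brier", brier(probs, labels), brier(distorted, labels))
```

Both sets of predictions have an AUC of 0.9375, yet the distorted ones systematically understate the risk and score a higher (worse) Brier value.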
Publishable vs Exploratory Visualization: Two Objects, Two Rule Sets
Exploratory visualization serves the analyst: fast, disposable, optimized to see. Publishable visualization serves the reader: read once, and it has to decode unaided. They are different objects, not two finish levels of one chart. And the publishing format changes interpretation: a controlled experiment found that 'better' graphs were read more accurately (OR 1.55) and more clearly (OR 1.91) than 'normed' ones.
SEM for Multiple Mediation: When Linear Regression Stops Answering
Multiple mediation asks through which mechanism an effect operates, and the quantity of interest is the indirect effect, a product of paths. Linear regression estimates isolated paths; it supports neither inference on that product nor simultaneous mediators. SEM estimates the whole system and accommodates latent variables and chained mediators. For the interval, the choice of bootstrap changes the false-positive rate by a measurable amount.
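To show what "a product of paths" means in practice, here is a deliberately reduced sketch: one observed mediator, ordinary least squares, and a percentile bootstrap for the indirect effect a·b. It is not an SEM fit and covers only one of the possible bootstrap variants; the simulated data, the path values, and the function names are assumptions for illustration.

```python
import random


def paths(x, m, y):
    """Return a (x -> m) and b (m -> y, adjusting for x) from two OLS fits."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a, b


def bootstrap_indirect(x, m, y, reps=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect a * b."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        a, b = paths([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        draws.append(a * b)
    draws.sort()
    return draws[int(reps * alpha / 2)], draws[int(reps * (1 - alpha / 2)) - 1]


def simulate_mediation(n=500, seed=1):
    """Hypothetical data: x -> m (0.5), m -> y (0.4), direct x -> y (0.2)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
    y = [0.4 * mi + 0.2 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
    return x, m, y


x, m, y = simulate_mediation()
a, b = paths(x, m, y)
lo, hi = bootstrap_indirect(x, m, y)
print("indirect effect:", a * b, "95% percentile interval:", (lo, hi))
```

The interval is built on the product itself rather than on a or b separately, which is the inference that separate regression tables do not provide. Swapping the percentile rule for a bias-corrected variant would change only the last lines of `bootstrap_indirect`.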
Web Scraping in Academic Research: Public Is Not the Same as Collectable
That a datum sits on an open page is a statement about access, not about permission, and still less about ethics. Web scraping in research forces the distinction: terms of use, privacy expectations, and risk of harm draw the line that technical accessibility ignores. A review of 367 studies using public Twitter data measured the gap: most reported no ethics approval, and informed consent was attempted in none of them.
Strategic Venue Selection After the First Rejection
Resubmitting by reflex to a lower-tier journal treats rejection as a verdict on quality. Evidence on submission flows shows that what preserves a paper's citation trajectory is fit, not tier, and that the jump across distinct journal communities is where citations are lost.
Literal Translation Is the First Cause of PT→EN Rejection in Q1
Rejection of literally translated manuscripts is rarely a vocabulary problem. It is the rhetorical structure of Portuguese, carried over intact, that an Anglophone reviewer reads as a poorly built argument. The fix is reconstruction in the target register, not word-by-word editing.
The Structured 250-Word Abstract: The Architecture That Decides Reading
Editors and reviewers triage on the abstract; readers decide to read on it. The 250-word limit is not bureaucracy but the IMRaD compression that exposes whether a declarable contribution exists. Structured abstracts beat unstructured ones on completeness and clarity, and they set the paper's visibility before the merits of the body are ever weighed.