Natural language processing (NLP) — Glossary Aria Research

Extended definition

Natural language processing (NLP) is the field at the intersection of artificial intelligence, computational linguistics, and computer science dedicated to representing, processing, and generating human language with computational systems. Its history is usually described in four broad methodological waves: handcrafted rules and dictionaries (roughly 1950–1980), classical statistical methods (roughly 1990–2010, formalized in Manning & Schütze, 1999), deep learning with recurrent networks and embeddings (2013–2018), and the current era of large-scale language models based on the Transformer architecture (2018+). Canonical tasks include tokenization, part-of-speech tagging, syntactic parsing, named entity recognition, document classification, sentiment analysis, summarization, machine translation, question answering, and text generation. The contemporary pedagogical reference is Jurafsky & Martin, Speech and Language Processing (third edition in open development), a standard textbook in graduate programs in the field.

When it applies

NLP is appropriate for projects that involve unstructured text at scale, such as analysis of medical records, case law classification, parliamentary or social media discourse analysis, scientific literature mining, automatic summarization, semantic search systems, chatbots and virtual assistants, and analysis of reviews and open-ended feedback. In empirical research, NLP is now the standard approach for studies that need to extract signal from more text than humans can read systematically.

When it does not apply

NLP does not apply when simple regular expressions or keyword counts solve the problem: heavier machinery is overkill and introduces unnecessary fragility. It does not apply to very small corpora (tens to a few hundred documents), where careful human reading is more robust and informative. It does not replace qualitative analysis in serious interpretive research: NLP classifies and measures, but contextual human interpretation remains necessary in discourse studies, hermeneutics, and critical analysis. In domains with highly technical vocabulary and in low-resource languages, generic models perform poorly and domain adaptation is required.
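To illustrate the first case, a keyword count often needs nothing beyond the Python standard library. The sketch below is only an example under stated assumptions: the function name `count_keywords`, the sample documents, and the keyword list are all hypothetical and chosen for illustration.

```python
import re
from collections import Counter


def count_keywords(documents, keywords):
    """Count whole-word, case-insensitive keyword matches across documents."""
    # One alternation pattern with word boundaries; re.escape guards against
    # keywords that happen to contain regex metacharacters.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        flags=re.IGNORECASE,
    )
    counts = Counter()
    for text in documents:
        for match in pattern.findall(text):
            counts[match.lower()] += 1
    return counts


# Hypothetical example data (not from any real corpus).
docs = [
    "The patient reported nausea and a mild headache.",
    "No nausea today; headache persists. Headache worse at night.",
]
print(count_keywords(docs, ["nausea", "headache"]))
```

Because the pattern uses word boundaries, inflected forms such as "headaches" are not counted. Once cases like that start to matter, the problem may have outgrown keyword matching.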

Applications by field

— Health: information extraction from electronic medical records, biomedical literature mining, classification of adverse event reports.
— Law: automatic case law classification, argument extraction, contract analysis at scale.
— Social sciences and digital humanities: discourse analysis over large corpora, conceptual mapping in historical archives, sentiment studies on social networks.
— Bibliometrics: automatic topic detection in scientific literature, thematic paper classification, emerging-fronts identification.

Common pitfalls

The first pitfall is assuming that tools pre-trained on generic corpora transfer well to specialized domains: on Brazilian case law or clinical records, generic models perform substantially worse than specialized ones. The second is ignoring the linguistic and cultural bias of models: practically all current models were trained predominantly on English text and carry documented social biases. The third is trusting generic benchmark metrics without validating in the application domain: a model that scores 92% on GLUE can drop to 60% accuracy on a specific corpus. The fourth is treating NLP as a black box without methodological documentation: manuscripts that use generative models must declare the model version, prompt, settings, and human validation protocol, per emerging guidelines (COPE, ICMJE, individual journals). The fifth is confusing tasks: automatic classification does not replace manual coding by multiple raters in qualitative research that requires inter-rater reliability.
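The third and fifth pitfalls both come down to checking model output against human judgment on the target corpus. The following is a minimal sketch, assuming a small hand-labeled sample from the application domain. It computes plain accuracy and Cohen's kappa, a standard chance-corrected agreement statistic for two raters. The function names and the labels are hypothetical, made up purely for illustration.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Share of items where the prediction matches the human label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement if both raters labeled independently at their own rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up labels for ten domain documents (illustration only).
human = ["pos", "neg", "neg", "pos", "neg", "pos", "neg", "neg", "pos", "neg"]
model = ["pos", "neg", "pos", "pos", "neg", "neg", "neg", "neg", "pos", "pos"]

print(f"accuracy: {accuracy(human, model):.2f}")
print(f"kappa:    {cohen_kappa(human, model):.2f}")
```

With these made-up labels, accuracy is 0.70 while kappa is 0.40, which shows how raw agreement can overstate reliability once chance agreement is removed. Comparing a model against one human rater in this way is a validation step. It does not substitute for multi-rater coding where the research design requires it.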