A large language model (LLM) is a type of artificial intelligence system, trained on massive text datasets, that can generate, summarize, translate, and reason about language with fluency that often approaches human performance.
A large language model (LLM) is an AI system built on a neural network architecture (typically a transformer) that has been trained on extremely large volumes of text data—web pages, books, code repositories, and more. Through this training process, the model develops the ability to process text input (a "prompt") and generate coherent, contextually relevant text in response.
The "large" in LLM refers to scale: both the volume of training data and the number of parameters (internal variables) the model uses to process information. Modern LLMs have hundreds of billions of parameters, trained on datasets that represent a significant fraction of publicly available human writing.
What distinguishes LLMs from earlier AI systems is their generality. Rather than being trained to perform a single narrowly defined task, LLMs can adapt to a wide range of tasks—answering questions, writing and editing text, summarizing documents, translating languages, explaining code, generating code, and reasoning through multi-step problems—based solely on how they're prompted.
LLMs appear in enterprise environments in several forms, each with distinct security implications:
Standalone AI assistants—Tools like ChatGPT, Claude, and Google Gemini that employees access directly, typically through a browser, to assist with individual work tasks. These often exist outside IT visibility and formal data governance.
Developer and coding tools—Tools like GitHub Copilot and Cursor that integrate LLMs into the development workflow. These process actual source code, which may include sensitive logic or hardcoded credentials.
Embedded AI features—LLM capabilities built into existing SaaS applications. Salesforce Einstein, Microsoft Copilot integrated into Office 365, Notion AI, Slack AI—many applications employees already use now incorporate LLM features that process organizational data.
Custom LLM deployments—Organizations building their own applications on top of LLM APIs (from OpenAI, Anthropic, Google, or open-source models). These create custom data flows between organizational systems and LLM infrastructure.
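To make that last data flow concrete, the sketch below builds a request payload in the general shape used by chat-style LLM APIs. The helper function, model identifier, and record contents are illustrative assumptions, not any specific vendor's API—the point is simply that organizational data travels verbatim inside the request body sent to the provider's infrastructure:

```python
import json

# Hypothetical helper: wrap internal data in a chat-style LLM API payload.
# The field names mirror the request shape common to hosted LLM APIs; the
# model name and record contents are placeholders for illustration only.
def build_llm_request(system_prompt: str, user_text: str) -> str:
    payload = {
        "model": "example-model",  # placeholder model identifier
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
    }
    return json.dumps(payload)

# An internal CRM record pasted into a summarization prompt: the customer
# data is now embedded, unmodified, in the outbound API request.
record = "Customer: Acme Corp, ARR: $1.2M, renewal risk: high"
body = build_llm_request("Summarize this account.", record)
print("Acme Corp" in body)  # → True
```

Every custom deployment defines flows like this one; evaluating them means knowing what ends up in `user_text` and where the request goes.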
The core security concern with LLMs is data. Users input information into LLM prompts—sometimes carefully, often without much thought about what that information is or where it goes. That information may include customer data, confidential contracts, internal financial models, source code, or employee records.
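One way to see the problem concretely: a minimal sketch (our own illustration, not a feature described here) that scans a prompt for obviously sensitive patterns before it leaves the organization. The pattern list is a deliberately simplified assumption; production data-loss-prevention tooling goes far beyond a few regexes:

```python
import re

# Illustrative patterns for data that should not leave the org in a prompt.
SENSITIVE_PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "api key": re.compile(r"(?i)\b(?:api[_-]?key|secret)\b\s*[:=]\s*\S+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def flag_sensitive(prompt: str) -> list[str]:
    """Return the names of sensitive patterns found in a prompt."""
    return [name for name, pat in SENSITIVE_PATTERNS.items() if pat.search(prompt)]

prompt = "Summarize this: contact jane.doe@example.com, api_key = sk-123abc"
print(flag_sensitive(prompt))  # → ['email address', 'api key']
```

Even this toy check would flag a surprising share of real-world prompts—which is exactly the exposure the rest of this section describes.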
What happens to that data depends entirely on which tool is being used, under which account type, and what the vendor's data handling policies are. Consumer versions of AI tools often include terms that allow prompt data to be used for model training or retained for quality review. Enterprise versions typically provide stronger protections—but only if the employee is using the enterprise version through an approved account.
The challenge for security teams is that LLM usage has outpaced governance. Employees adopted AI tools quickly and independently. The tools are easy to access, often free or low-cost for individuals, and provide genuine value. By the time most organizations began developing AI use policies, significant ungoverned usage was already underway.
Additionally, LLM capabilities are now embedded in SaaS applications organizations already use. An AI summary feature added to a project management tool, or a generative assistant built into a CRM platform, creates LLM exposure that organizations may not have evaluated or even noticed.
See how Nudge Security identifies LLM tool usage and AI-related exposure across the organization →