A Large Language Model (LLM) is an artificial intelligence system trained on massive amounts of text data to understand and generate human-like language.
At its core, an LLM:
- Uses machine learning (deep learning) — often based on the transformer architecture — to process text, recognize patterns, and predict the next word or token in a sequence.
- Is trained on large, diverse datasets (books, articles, websites, code, transcripts) so it can respond to many types of prompts.
- Generates outputs probabilistically: rather than retrieving stored facts, it predicts a probability distribution over the next token, so responses are likely but not guaranteed to be correct.
- Can be adapted (fine-tuned) for specific domains or tasks, such as legal writing, customer support, or code generation.
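The "predict the next token probabilistically" idea above can be sketched in a few lines. This is a toy illustration, not a real model: the scores (logits) are made-up values standing in for a transformer's output, and `sample_next_token` shows how temperature-scaled sampling turns those scores into one chosen token.

```python
import math
import random

# Toy next-token scores (logits). In a real LLM these come from a
# transformer; here they are invented values for illustration only.
logits = {"cat": 2.0, "dog": 1.5, "car": 0.2}

def softmax(scores):
    """Convert raw scores into a probability distribution that sums to 1."""
    exps = {tok: math.exp(s) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample_next_token(scores, temperature=1.0, rng=random):
    """Sample one token; lower temperature concentrates probability
    on the highest-scoring token, higher temperature flattens it."""
    scaled = {tok: s / temperature for tok, s in scores.items()}
    probs = softmax(scaled)
    r = rng.random()
    cumulative = 0.0
    for tok, p in probs.items():
        cumulative += p
        if r < cumulative:
            return tok
    return tok  # fallback for floating-point rounding

print(sample_next_token(logits, temperature=0.7))
```

Because the output is sampled, running this twice can give different tokens, which is exactly why the same prompt to an LLM can yield different completions.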
Key traits of an LLM:
- Input/Output: Works with natural language or code as input and generates text-based output.
- General-purpose: Can answer questions, summarize content, write creatively, translate languages, or help with coding — but relies on prompts for each task.
- No inherent agency: On its own, an LLM is reactive — it waits for input and responds — unlike an AI agent, which can take autonomous actions.
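The reactive-vs-agent distinction above can be made concrete with a small sketch. Both functions here are stand-ins, not real APIs: `fake_llm` plays the role of a model that only responds when called, and `agent_loop` shows the extra machinery an agent adds, repeatedly invoking the model and acting on its output without waiting for a new user prompt.

```python
def fake_llm(prompt: str) -> str:
    """Reactive: produces one output when given input, then stops.
    A stand-in for a real model call, used for illustration only."""
    return f"response to: {prompt}"

def agent_loop(goal: str, max_steps: int = 3) -> list[str]:
    """Agent: wraps the model in a loop that decides and acts on its own.
    Each step feeds the previous result back in as the next observation."""
    actions = []
    observation = goal
    for _ in range(max_steps):
        decision = fake_llm(observation)  # model proposes the next action
        actions.append(decision)          # agent "executes" it (stubbed here)
        observation = decision            # result drives the next step
    return actions
```

The model itself is identical in both cases; agency comes entirely from the surrounding loop, which is why the same LLM can power both a chat box and an autonomous agent.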
Examples: GPT-4o (OpenAI), Claude (Anthropic), Gemini (Google DeepMind), LLaMA (Meta), Mistral models.