"A neural network is not a model of the brain so much as a model of one thing the brain does — turning patterns into responses, and getting better at it with practice."- Claude 2026
How a network of simple units learns to behave intelligently — and how researchers use such networks to model the human mind.
By the end of this page you should be able to:
An artificial neural network (ANN) — a computing system made of many small processing units connected together, loosely inspired by how brain cells connect — learns to perform a task by adjusting the strengths of those connections rather than by following hand-written rules. The whole field rests on one small building block, repeated thousands or millions of times.
The basic unit is the artificial neuron (also called a node or, in its simplest form, a perceptron). It does three things in order: it takes several numbers as input, combines them, and produces a single number as output. Each input arrives with a weight — a number that says how much that input matters — and the neuron multiplies each input by its weight and adds the results together (a weighted sum). It then adds a bias (a fixed offset that shifts the result up or down) and passes the total through an activation function — a rule that decides the neuron's final output, often squashing it into a fixed range or deciding whether the neuron "fires."
Compactly, a neuron computes y = φ(Σ wᵢxᵢ + b), where the xᵢ are the inputs, the wᵢ are the weights, b is the bias, and φ is the activation function. Geometrically, the weights and bias define a hyperplane (a flat decision boundary) through the input space, so a single neuron is a linear classifier: it splits its inputs into two halves. A monotonic activation such as the sigmoid rescales the output but does not bend that boundary for a single unit; the non-linearity pays off when units are stacked in layers, where it lets the network as a whole draw curved boundaries.
Weights: one number per input, controlling its influence. A weight that is large in magnitude means the input strongly affects the output; a weight near zero means the input is mostly ignored. These are the values the network adjusts as it learns.
Bias: a single extra number added to the weighted sum. It lets the neuron shift its threshold for firing, so its decision boundary does not have to pass through the origin, which makes the unit far more flexible.
Activation function: a rule applied to the total (common choices include the S-shaped sigmoid and the ReLU, which keeps positive values and zeroes the rest). It introduces non-linearity, letting networks model complex relationships.
Note: ReLU is now a common default in hidden layers because sigmoids saturate and cause vanishing gradients in deep stacks; an output layer doing classification typically ends in softmax, which turns raw scores into a probability distribution over the classes.
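The weighted sum, bias, and activation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the input, weight, and bias values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def sigmoid(z):
    """S-shaped activation: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """ReLU activation: keeps positive values and zeroes the rest."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Compute y = activation(sum(w_i * x_i) + b) for a single unit."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum + bias)

# Hypothetical values, for illustration only.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
b = 0.5

# Weighted sum: 0.5 - 2.0 + 0.75 = -0.75; adding the bias gives -0.25.
print(neuron(x, w, b, relu))     # ReLU zeroes the negative total
print(neuron(x, w, b, sigmoid))  # sigmoid maps it to a value just below 0.5
```

Swapping the activation changes only the last step: the same weighted sum and bias feed either function.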
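A single neuron can only draw simple distinctions. The power comes from arranging many of them in layers and connecting the output of each layer to the input of the next. A typical network has three kinds of layer: an input layer that receives the raw data, one or more hidden layers that transform it step by step, and an output layer that produces the network's answer.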
Passing data forward through the layers to produce an answer is called the forward pass. When data flows in one direction only, with no loops back to earlier layers, the network is called feedforward; layers in which every unit connects to every unit in the next are termed fully connected or dense. A feedforward network of dense layers is the most basic design and the foundation for the rest.
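A forward pass through a small feedforward network might look like the sketch below. The layer sizes (2 inputs, 3 hidden units, 2 output classes) and the hand-picked weights are assumptions made for illustration; a trained network would have learned its weights rather than had them written in.

```python
import math

def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: every input feeds every unit.

    weights[j] holds the incoming weights of unit j.
    """
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(activation(total) if activation else total)
    return outputs

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical hand-picked parameters (not learned).
W_HIDDEN = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
B_HIDDEN = [0.0, 0.0, 0.0]
W_OUT = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]
B_OUT = [0.0, 0.0]

def forward(x):
    """Input layer -> hidden layer (ReLU) -> output layer (softmax)."""
    hidden = dense(x, W_HIDDEN, B_HIDDEN, relu)
    scores = dense(hidden, W_OUT, B_OUT)
    return softmax(scores)

probs = forward([2.0, 1.0])
print(probs)  # two class probabilities; the first class is favored
```

Data moves strictly left to right here: nothing computed in the output layer feeds back into the hidden layer, which is what makes the design feedforward.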
Crucially, a network is not programmed with the right weights — it learns them. It is shown examples, compares its output to the correct answer using a loss function (a measure of how wrong it is), and adjusts its weights to reduce that loss. The standard procedure is gradient descent driven by backpropagation — the chain rule applied backward through the network to find how each weight affects the loss — repeated over many examples until the network performs the task. For the plain-language version, it is enough to know that learning means changing weights; the technical version is that it means descending a loss surface.
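The learning loop can be sketched for the simplest possible case: one sigmoid neuron trained by gradient descent to compute logical OR. The task, the cross-entropy loss, the learning rate, and the epoch count are all assumptions chosen for illustration. With only one unit, backpropagation reduces to a single application of the chain rule; a multi-layer network repeats that step backward through each layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Training examples for logical OR: (inputs, correct answer).
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def mean_loss(weights, bias):
    """Cross-entropy averaged over the data: a measure of how wrong the neuron is."""
    total = 0.0
    for x, t in DATA:
        y = predict(weights, bias, x)
        total += -(t * math.log(y) + (1.0 - t) * math.log(1.0 - y))
    return total / len(DATA)

def train(epochs=2000, learning_rate=1.0):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, t in DATA:
            y = predict(weights, bias, x)
            # Chain rule: for sigmoid + cross-entropy, dLoss/dz = y - t,
            # and z = sum(w_i * x_i) + b gives dz/dw_i = x_i and dz/db = 1.
            error = y - t
            for i, xi in enumerate(x):
                grad_w[i] += error * xi / len(DATA)
            grad_b += error / len(DATA)
        # Gradient descent: step each parameter against its gradient.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

weights, bias = train()
print("loss before:", mean_loss([0.0, 0.0], 0.0))
print("loss after: ", mean_loss(weights, bias))
print([round(predict(weights, bias, x)) for x, _ in DATA])
```

Nothing in the code states the rule for OR; the weights that implement it emerge from repeatedly reducing the loss.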
A cognitive model is a working system — usually a computer program — built to reproduce some aspect of human thinking precisely enough to be tested. The aim is not just to get the right answer but to get it the way a person would: making the same kinds of mistakes, taking longer on harder problems, showing the same memory limits. If a model behaves like a human on a task, it becomes evidence for how the mind might actually work. Cognitive models fall into a few broad traditions.
Connectionist models — also called parallel distributed processing (PDP) models — use neural networks directly as theories of cognition. Knowledge is not stored as explicit facts but is spread across the connection weights of a network (a distributed representation, where each concept is a pattern of activity over many units rather than a single symbol), and behavior emerges from many simple units acting together. These models are well suited to explaining abilities that feel automatic and pattern-based — recognizing a familiar face, learning the past tense of verbs, filling in a half-heard word. Their strength is that they learn from examples and show graceful degradation — performance falls off gradually rather than collapsing when information is noisy, incomplete, or units are damaged — much as human cognition does.
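One way to see distributed storage and tolerance of noisy input is a tiny Hopfield-style network. The choice of this network type is an illustrative assumption, as are the two stored patterns and the Hebbian learning rule; the text above does not single out any particular model.

```python
def train_hebbian(patterns):
    """Store patterns in the weights: units that agree get a stronger link."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j]
    return weights

def recall(weights, state, steps=5):
    """Update every unit from the weighted sum of the others until stable."""
    state = list(state)
    for _ in range(steps):
        new_state = []
        for i, row in enumerate(weights):
            total = sum(w * s for w, s in zip(row, state))
            # On an exact tie the unit keeps its current value.
            new_state.append(1 if total > 0 else -1 if total < 0 else state[i])
        if new_state == state:
            break
        state = new_state
    return state

# Two hypothetical "concepts", each a pattern of activity over 8 units.
pattern_a = [1, 1, 1, 1, -1, -1, -1, -1]
pattern_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hebbian([pattern_a, pattern_b])

noisy_a = [-1, 1, 1, 1, -1, -1, -1, -1]  # pattern_a with its first unit flipped
print(recall(weights, noisy_a) == pattern_a)  # True
```

Both patterns live in the same weight matrix, and no single weight holds either one, yet a corrupted input is still pulled back to the stored pattern it most resembles.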
A cognitive architecture is a unified theory of the mind expressed as software: a fixed set of mechanisms — for memory, perception, and action — that together aim to model cognition as a whole, rather than one isolated task. Two are especially influential:
ACT-R (Adaptive Control of Thought—Rational), developed by John Anderson at Carnegie Mellon, divides the mind into specialized modules (visual, manual, declarative memory, and others), each accessed through a small holding area called a buffer. It separates declarative knowledge (facts — "Paris is the capital of France") from procedural knowledge (rules for action — how to type a letter), and produces step-by-step simulations whose timing and errors can be compared directly with human data.
Soar, developed by John Laird, Allen Newell, and Paul Rosenbloom, models intelligent behavior as a continual cycle of applying rules to reach goals. When the system reaches a point where it does not know what to do (an impasse), it sets up a sub-goal to work it out, and remembers the solution for next time (a learning process called chunking). Soar emphasizes general problem-solving across many tasks.
These architectures are usually classified as symbolic — they represent knowledge as discrete, readable symbols and manipulate them with explicit rules — in contrast to the connectionist approach, where knowledge is distributed across numerical weights. The two traditions answer different questions: symbolic architectures excel at modeling deliberate, step-by-step reasoning, while connectionist models excel at fast, intuitive pattern recognition.
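The symbolic style can be sketched with a toy example. It is loosely inspired by the retrieve-or-work-it-out cycle described above and is not the actual mechanism or API of Soar or ACT-R; the addition task, the `solve` function, and the trace labels are all hypothetical.

```python
def solve(problem, memory, trace):
    """Answer an addition problem such as (3, 4), learning from impasses.

    memory maps problems to stored answers (declarative facts);
    trace records the discrete steps taken, for comparison across runs.
    """
    # Rule: if the answer is already a stored fact, retrieve it.
    if problem in memory:
        trace.append("retrieve")
        return memory[problem]

    # Impasse: no stored fact applies, so fall back to a sub-goal
    # that works the answer out step by step by counting up.
    trace.append("impasse")
    a, b = problem
    total = a
    for _ in range(b):
        total += 1
        trace.append("count")

    # Chunking-like step: remember the result so the impasse
    # does not recur for this problem.
    memory[problem] = total
    return total

memory = {}
first_trace, second_trace = [], []
print(solve((3, 4), memory, first_trace), first_trace)
print(solve((3, 4), memory, second_trace), second_trace)
```

The second call takes one step instead of five. A step count of this kind is what gets compared against human response times, and every piece of knowledge involved is a readable symbol rather than a numerical weight.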
Because the two traditions have complementary strengths, a growing line of work tries to combine them. A neuro-symbolic model — a system that joins a neural network's pattern-learning with a symbolic component's explicit rules and reasoning — aims to do both at once: a neural part turns raw, messy input into meaningful pieces, and a symbolic part reasons over those pieces using logic and stored knowledge. In a typical arrangement, the neural component looks at a scene and identifies the objects in it, and the symbolic component then applies rules to draw conclusions — for example, checking that an interpretation obeys commonsense constraints (a cup cannot rest in mid-air without support) and rejecting it if it does not.
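The division of labor can be sketched as follows. The neural half is replaced by a stub that returns made-up scored interpretations, since a real detector would be a trained network; the scene encoding (each object mapped to what it rests on) and the support rule are likewise assumptions for illustration.

```python
def detect_scene(_image):
    """Stand-in for a neural detector: returns (score, interpretation) pairs.

    Each interpretation maps an object to what it rests on. The scores and
    scenes are invented; a real system would get them from a trained network.
    """
    return [
        (0.55, {"cup": "mid-air", "table": "floor"}),
        (0.45, {"cup": "table", "table": "floor"}),
    ]

def is_supported(obj, scene, seen=None):
    """Symbolic rule: an object is supported if it rests on the floor,
    or on another object that is itself supported."""
    seen = seen or set()
    if obj in seen:
        return False  # circular support (A on B on A) is not allowed
    below = scene.get(obj)
    if below == "floor":
        return True
    if below not in scene:
        return False  # resting on nothing known, e.g. "mid-air"
    return is_supported(below, scene, seen | {obj})

def interpret(image):
    """Return the highest-scoring interpretation that obeys the rule."""
    candidates = sorted(detect_scene(image), key=lambda c: c[0], reverse=True)
    for _score, scene in candidates:
        if all(is_supported(obj, scene) for obj in scene):
            return scene
    return None

print(interpret(None))  # {'cup': 'table', 'table': 'floor'}
```

The stub's top-scoring guess puts the cup in mid-air, so the symbolic check rejects it and the lower-scoring but physically sensible interpretation is returned.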
This pairing is appealing as a cognitive model because it mirrors a familiar picture of human thought: a fast, automatic mode that recognizes patterns at a glance, and a slower, deliberate mode that reasons step by step (an idea often described as dual-process cognition). Concrete systems include DeepProbLog, which adds neural predicates to probabilistic logic programming, and Logic Tensor Networks, which embed logical constraints into a network's training signal. Both are examples of differentiable reasoning, where symbolic rules are made smooth enough to be trained by gradient descent. Combining learning and reasoning in one system is an active research area, sometimes called the third wave of AI after the earlier symbolic and neural waves, and it remains an open challenge to get the two halves to work together smoothly. Researchers pursue it partly because the symbolic half can make a system's decisions easier to inspect and check, and partly because explicit rules can let a system generalize from far fewer examples than a neural network alone would need.
The line between these models and ordinary AI is the intent. An engineer building a face-recognition system wants accuracy and does not care whether it works like a human. A cognitive modeler wants the system to match human behavior — its speed, its limits, its errors — because the goal is explanation, not just performance.
Different cognitive abilities have different shapes, and over time researchers have developed network designs (also called architectures) whose structure matches the structure of the problem. Choosing a design is really a claim about what kind of processing the task requires. Four families cover most cognitive applications.
The pattern across all four is the same: the structure of the network encodes an assumption about the structure of the cognitive task. A convolutional design assumes that what matters can appear anywhere in an image; a recurrent design assumes the order of inputs carries meaning; an attention-based design assumes that only part of the input is relevant at any moment. Analyzing a design therefore means asking what claim its structure makes — and whether that claim fits the ability being modeled.
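The three structural assumptions named above can each be reduced to a few lines. These are deliberately stripped-down sketches with hypothetical weights and inputs, not working models: a one-dimensional convolution stands in for an image filter, a single scalar state stands in for a recurrent layer, and dot-product scoring stands in for a full attention mechanism.

```python
import math

def conv1d(signal, kernel):
    """Convolution: slide one shared set of weights across every position."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def recurrent(sequence, w_state=0.5, w_in=1.0):
    """Recurrence: carry a state forward so earlier inputs shape later ones."""
    state = 0.0
    for x in sequence:
        state = math.tanh(w_state * state + w_in * x)
    return state

def attention_weights(query, keys):
    """Attention: score every item against a query, then normalize (softmax)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# The same detector fires wherever the step occurs in the signal.
edge = [-1, 1]  # hypothetical "step up" detector
early = conv1d([0, 1, 1, 1, 1, 1], edge)
late = conv1d([0, 0, 0, 0, 1, 1], edge)
print(early, late)

# The same inputs in a different order leave a different final state.
print(recurrent([1, 0]), recurrent([0, 1]))

# Most of the weight goes to the item that best matches the query.
focus = attention_weights([1, 0], [[1, 0], [0, 1], [-1, 0]])
print(focus)
```

Each function builds its assumption into the computation itself: position does not matter to the convolution, order matters to the recurrence, and relevance to the query decides what the attention step emphasizes.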
When a network is built as a cognitive model rather than just an engineering tool, this matching becomes the central question. A researcher modeling how people recognize objects would reach for a convolutional design, because its feature-detecting structure mirrors a known property of biological vision. A researcher modeling how people understand a sentence word by word would reach for a recurrent or attention-based design, because comprehension clearly depends on what came earlier. The design is not chosen for raw accuracy alone but because its internal organization makes a testable claim about how the corresponding human ability is organized.
Neural networks have real strengths as cognitive models: they learn from experience rather than requiring every rule to be specified, they tolerate noisy and incomplete input, and their layered feature-building resembles real perceptual systems. These properties make them natural models of fast, automatic cognition.
They also have limits: a network that performs a task well does not automatically explain how humans do it; it can be hard to interpret why a network responds as it does; and matching human accuracy is not the same as matching human process, which is the real test of a cognitive model.
Where a neural network is built from one repeated unit, a symbolic cognitive architecture is built from several specialized modules that pass information between them — a very different picture of how a mind might be organized.