Why graph structure can't just be flattened
A graph has no canonical node ordering: relabel the nodes and it's still the same graph. Flatten its adjacency matrix and features into a vector and you've baked in an arbitrary order; train an MLP on that and the model will treat node 3 differently from node 7 even when they play identical structural roles. On top of that, nodes have variable numbers of neighbours (atom 1 might bond to two others, atom 2 to four), so there is no natural fixed-size input per node. GNNs address both: the aggregation step is permutation-invariant (sum / mean / max don't care about order), and it works on any neighbourhood size.
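To make the ordering argument concrete, here is a minimal NumPy sketch. The neighbour vectors, the fixed permutation, and the weight matrix `W` (a stand-in for the first layer of an MLP on a flattened input) are all made-up illustration values, not part of any particular GNN library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 4 neighbour states, each a 3-dimensional vector.
neighbours = rng.normal(size=(4, 3))

# The same neighbours listed in a different order (a fixed, non-identity permutation).
perm = np.array([2, 0, 3, 1])
shuffled = neighbours[perm]

# Sum, mean and max over the neighbour axis ignore the ordering.
sum_same = np.allclose(neighbours.sum(axis=0), shuffled.sum(axis=0))
mean_same = np.allclose(neighbours.mean(axis=0), shuffled.mean(axis=0))
max_same = np.allclose(neighbours.max(axis=0), shuffled.max(axis=0))

# Flattening followed by a linear layer does depend on the ordering.
W = rng.normal(size=(12, 5))  # assumed stand-in for an MLP's first layer
flat_same = np.allclose(neighbours.reshape(-1) @ W, shuffled.reshape(-1) @ W)

# The aggregators also accept any neighbourhood size and return the same shape.
two_neighbours = neighbours[:2].sum(axis=0)
four_neighbours = neighbours.sum(axis=0)
same_shape = two_neighbours.shape == four_neighbours.shape

print(sum_same, mean_same, max_same, flat_same, same_shape)
```

The aggregators give identical results for both orderings and the same output shape for two or four neighbours, while the flattened input gives a different output as soon as the rows are reordered.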
How message passing works intuitively
Picture each node holding a small vector — its current "state". One round of message passing is:
- Every node sends its state along each outgoing edge.
- Every node collects everything it received and combines it (sum, mean, max, or learned attention).
- Every node mixes that aggregate with its old state to produce a new state.
Repeat K times. After one round, a node knows about its immediate neighbours. After two, its neighbours' neighbours. After K, its entire K-hop neighbourhood. Click a node on the edge of the figure and count how many rounds the signal needs to cross the graph.
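The three steps above can be sketched in a few lines of NumPy. This is a deliberately stripped-down version: the function name `message_passing_round`, the path graph, and the update rule (plain addition, with no learned weights or nonlinearity) are assumptions chosen so the spread of information is easy to watch.

```python
import numpy as np

def message_passing_round(adj, states):
    """One round of send, aggregate (sum), update on a dense adjacency matrix.

    adj[i, j] == 1 means node j sends its state to node i.
    """
    # Send + aggregate: row i of adj @ states is the sum of the states
    # of every node that has an edge into i.
    aggregated = adj @ states
    # Update: mix the aggregate with the old state. Plain addition here;
    # a real GNN layer would typically apply learned weights and a nonlinearity.
    return states + aggregated

# Undirected path graph 0 - 1 - 2 - 3 - 4.
n = 5
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

# Put a signal on node 0 only and watch how far it travels per round.
states = np.zeros((n, 1))
states[0, 0] = 1.0

reached = []
for k in range(1, n):
    states = message_passing_round(adj, states)
    reached.append(int(np.count_nonzero(states)))

print(reached)  # [2, 3, 4, 5]
```

Each round extends the set of nodes that have heard from node 0 by exactly one hop, so crossing this five-node path takes four rounds.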
Why depth matters — but also hurts
More layers means longer-range information, which sounds good. The catch is over-smoothing: keep averaging neighbours into yourself and, after enough rounds, the nodes of a connected graph converge toward nearly the same blurry summary of the whole graph. You've lost the very distinctions you were trying to learn. That's why many GNNs in practice use only 2–4 layers, adding skip connections or jumping-knowledge connections when more depth is needed.
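Over-smoothing is easy to reproduce numerically. The sketch below assumes the simplest possible layer: mean aggregation over each node's neighbours plus itself, with no learned weights or nonlinearity, on a hypothetical six-node ring graph. Real layers behave less cleanly, but the trend is the same.

```python
import numpy as np

# Undirected ring graph with 6 nodes.
n = 6
adj = np.zeros((n, n))
for i in range(n):
    j = (i + 1) % n
    adj[i, j] = adj[j, i] = 1.0

# Add self-loops so each node averages its neighbours *and* itself,
# then row-normalise so every row is a mean.
a_hat = adj + np.eye(n)
p = a_hat / a_hat.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
states = rng.normal(size=(n, 4))  # random initial node states

# Track how different the nodes still are after each layer:
# the largest (max - min) across nodes over all feature dimensions.
spreads = []
for layer in range(32):
    states = p @ states
    spread = (states.max(axis=0) - states.min(axis=0)).max()
    spreads.append(float(spread))

print(spreads[0], spreads[3], spreads[-1])
```

The spread between nodes shrinks with every layer and is close to zero by the end, which is the "blurry mean" described above: the node states have become nearly indistinguishable.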
Node, edge, or graph-level tasks
The same backbone serves three task families:
- Node-level — one prediction per node (which user will churn? what role does this protein play?).
- Edge-level — predict whether an edge should exist or what kind it is (link prediction, knowledge-graph completion).
- Graph-level — one prediction for the whole graph, formed by pooling all node embeddings into a single vector (will this molecule bind to the target?).
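As a rough illustration of how one set of node embeddings can feed all three task families, the sketch below uses random numbers in place of a trained backbone. The names `w_node`, `w_graph`, and `candidate_edges`, the dot-product edge score, and the mean pooling are assumptions picked for brevity; other heads and pooling functions are common.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim = 5, 8

# Stand-in for the backbone's output: one embedding per node.
h = rng.normal(size=(num_nodes, dim))

# Node-level: one prediction per node (here, logits over 3 hypothetical classes).
w_node = rng.normal(size=(dim, 3))
node_logits = h @ w_node                      # shape (5, 3)

# Edge-level: score candidate node pairs, here by a dot product of embeddings.
candidate_edges = np.array([[0, 1], [2, 4]])  # hypothetical (source, target) pairs
src, dst = candidate_edges[:, 0], candidate_edges[:, 1]
edge_scores = (h[src] * h[dst]).sum(axis=1)   # shape (2,)

# Graph-level: pool all node embeddings into one vector, then predict once.
w_graph = rng.normal(size=dim)
graph_vector = h.mean(axis=0)                 # shape (8,)
graph_logit = graph_vector @ w_graph          # a single scalar

print(node_logits.shape, edge_scores.shape, graph_vector.shape)
```

Mean pooling is itself permutation-invariant, so the graph-level prediction does not change if the nodes are listed in a different order.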