Why the bottleneck matters
If the code were as wide as the input, the network could copy values straight through and learn nothing beyond the trivial identity function. The bottleneck rules that out: only a small, fixed number of dimensions get to pass, so the encoder has to choose what's worth keeping. The decoder, working from those few numbers alone, acts as a learned reconstruction prior: it fills in what it has learned tends to be there. Trained jointly, the two halves converge on a compact summary of the data's structure.
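To make the bottleneck concrete, here is a minimal sketch in NumPy, assuming a purely linear encoder and decoder and toy data that secretly varies along only two directions. The helper names (make_toy_data, train_autoencoder, mse) and the hyperparameters are illustrative assumptions, not anything taken from the figure.

```python
import numpy as np

def make_toy_data(n=200, dim=8, factors=2, seed=0):
    # Assumed toy data: 8-D points that only vary along 2 hidden directions.
    rng = np.random.default_rng(seed)
    basis = rng.normal(size=(factors, dim))
    return rng.normal(size=(n, factors)) @ basis

def mse(X, W_enc, W_dec):
    # Mean squared reconstruction error of decode(encode(X)) against X.
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

def train_autoencoder(X, code_dim, steps=2000, lr=0.01, seed=1):
    # Linear encoder and decoder trained jointly with plain gradient descent.
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    W_enc = rng.normal(scale=0.1, size=(dim, code_dim))
    W_dec = rng.normal(scale=0.1, size=(code_dim, dim))
    for _ in range(steps):
        code = X @ W_enc                      # squeeze through the bottleneck
        err = code @ W_dec - X                # the target is the input itself
        grad_dec = code.T @ err / n
        grad_enc = X.T @ (err @ W_dec.T) / n
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec

X = make_toy_data()
W_enc, W_dec = train_autoencoder(X, code_dim=2)
code = X @ W_enc   # each 8-D example summarised by 2 numbers
print(code.shape, mse(X, W_enc, W_dec))
```

With code_dim equal to the input width, setting both weight matrices to the identity would already give zero error, which is the trivial solution the bottleneck exists to rule out. With code_dim=2 the network has to find the two directions that matter, and on this toy data the reconstruction error should fall well below that of predicting all zeros.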
What the latent space encodes
The 2-D latent in the figure is doing a job a lot like PCA, but non-linear. Each preset lands at its own spot in the plane; points between presets decode to blends. The strongest variations across the training set tend to show up as directions in latent space. Walk along one of those directions and the reconstruction morphs smoothly, without you ever telling the network what "morph" means.
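Walking through latent space can be sketched in a few lines. The decode function below is a hypothetical stand-in (a fixed tanh layer) rather than a trained decoder, and the two codes are made-up positions for two presets; the point is only the mechanics of interpolating codes and decoding each step.

```python
import numpy as np

def decode(z):
    # Stand-in decoder (assumption): a fixed smooth map from 2-D codes to 4-D outputs.
    W = np.array([[1.0, -0.5, 0.3, 0.8],
                  [0.2, 0.9, -0.7, 0.1]])
    return np.tanh(z @ W)

def interpolate_codes(z_a, z_b, steps):
    # Evenly spaced points on the straight line from z_a to z_b.
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * z_a + t * z_b

z_a = np.array([-1.0, 0.5])    # hypothetical code of one preset
z_b = np.array([1.5, -0.25])   # hypothetical code of another
path = interpolate_codes(z_a, z_b, steps=5)
frames = decode(path)          # one decoded output per step along the walk
print(frames.round(3))
```

Because the decoder is a smooth function, neighbouring codes decode to neighbouring outputs, which is why the in-between points look like blends.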
Why this is unsupervised
There are no labels. The target is the input — every example supervises its own reconstruction. That's the appeal: any pile of unlabelled data is enough to learn an embedding you can then plug into a classifier, a clustering algorithm, or an anomaly detector.
Denoising autoencoders
A small twist with outsized benefits: corrupt the input (add noise, mask pixels) and train the network to recover the clean original. The model can no longer get by on memorising inputs, because the corrupted input it sees changes every step; it has to learn what should be there. This is a conceptual ancestor of masked-token pretraining in language models.
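The only change from a plain autoencoder is which tensor goes in and which one the loss compares against. A rough sketch, assuming Gaussian noise plus random masking as the corruption; the function names, noise_std, and mask_prob are illustrative choices, and reconstruct stands for whatever encoder-decoder pair is being trained.

```python
import numpy as np

def corrupt(x, rng, noise_std=0.1, mask_prob=0.25):
    # Add Gaussian noise, then zero out a random subset of entries.
    noisy = x + rng.normal(scale=noise_std, size=x.shape)
    keep = rng.random(x.shape) >= mask_prob
    return noisy * keep

def denoising_loss(x_clean, reconstruct, rng):
    # The network sees the corrupted input but is scored against the clean one.
    x_corrupt = corrupt(x_clean, rng)
    x_hat = reconstruct(x_corrupt)
    return float(np.mean((x_hat - x_clean) ** 2))

rng = np.random.default_rng(0)
x = np.ones((4, 6))
copy_loss = denoising_loss(x, lambda v: v, rng)   # a model that just copies its input
print(copy_loss)
```

A model that simply copies its input pays for every corrupted entry, so the identity shortcut stops being a zero-loss solution even without a narrow bottleneck.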
Turning it generative: the VAE
A plain AE's latent space has holes: sample a random code and the decoder usually returns garbage, because training never showed it anything near that point. The variational autoencoder fixes that: the encoder outputs a distribution over codes, and each of those distributions is pulled toward a shared prior, so the whole space becomes samplable. That turns an autoencoder into a proper generative model, and it gets its own treatment (the ELBO, the reparameterisation trick, why it's a lower bound, sampling) on the Variational Autoencoders page. The VAE mode toggle in the figure above is a taste of it: the code becomes a little cloud rather than a point.
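As a small taste of the mechanics, here is a sketch of the two pieces that make the code a cloud, assuming the common choice of a diagonal Gaussian encoder output and a standard normal prior (the text above only says "a shared prior", so that choice is an assumption). The function names are illustrative.

```python
import numpy as np

def sample_code(mu, log_var, rng):
    # Reparameterised sample: mean plus standard deviation times unit noise.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over code dimensions.
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

rng = np.random.default_rng(0)
mu = np.array([[1.0, 0.0]])        # hypothetical encoder output for one example
log_var = np.zeros((1, 2))
z = sample_code(mu, log_var, rng)  # one point drawn from that example's cloud
print(z, kl_to_standard_normal(mu, log_var))
```

The KL term is zero only when the encoder's distribution matches the prior exactly, so adding it to the reconstruction loss is what pulls every cloud toward the same region of the space.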