Deep Learning

Lectures & Suggested Readings:

Reports of errors in the resources below are always welcome.

2025.03.07 (theory)
Introduction [pdf]
AI spring? Artificial Intelligence, Machine Learning, Deep Learning: facts, myths and a few reflections.
2025.03.07 (theory)
Fundamentals: Artificial Neural Networks [pdf]
Foundations of machine learning: dataset, representation, evaluation, optimization. Feed-forward neural networks as universal approximators.
2025.03.14 (theory)
Flow Graphs and Automatic Differentiation [pdf]
Tensorial representation, flow graphs. Automatic differentiation: primal graph, adjoint graph.
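The sketch below illustrates the primal graph and the adjoint pass for scalars. It is a minimal reverse-mode automatic differentiation, not the construction used in the lecture slides. The class `Var` and the functions `sin` and `backward` are hypothetical names, not any library's API.

The forward computation records each node's parents together with its local partial derivatives. `backward` then visits the nodes in reverse topological order and applies the chain rule.

```python
import math

class Var:
    """A node of the primal graph; `grad` holds its adjoint after backward()."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent_node, local_partial_derivative)
        self.grad = 0.0         # adjoint: d(output) / d(this node)

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def sin(x):
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))


def backward(output):
    """Adjoint pass: visit nodes in reverse topological order, accumulating
    each node's adjoint into its parents via the chain rule."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)  # appended only after all of its parents

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


x, y = Var(2.0), Var(3.0)
f = x * y + sin(x)   # forward pass builds the primal graph
backward(f)          # df/dx = y + cos(x), df/dy = x
print(f.value, x.grad, y.grad)
```

Note that `x` feeds two nodes, so its adjoint is the sum of two contributions. This accumulation is the reason adjoints are added with `+=`.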
2025.03.14 (theory)
Deep Networks [pdf]
Deeper networks: potential advantages and new challenges. Tensorial layerwise representation. Softmax and cross-entropy.
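Softmax and cross-entropy can be sketched for a single example in plain Python. The function names are illustrative, and the target is assumed to be a class index (a one-hot label).

Subtracting the maximum logit is the usual trick for numerical stability. The gradient of the cross-entropy with respect to the logits reduces to `softmax(z) - one_hot(target)`.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtracting the max leaves the result unchanged."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def cross_entropy_grad(logits, target):
    """Gradient w.r.t. the logits: softmax(logits) - one_hot(target)."""
    p = softmax(logits)
    return [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]

logits = [2.0, 1.0, 0.1]
print(softmax(logits))
print(cross_entropy(logits, 0))
print(cross_entropy_grad(logits, 0))
```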

Aside 1: Tensor Broadcasting [pdf]
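The broadcasting rule can be sketched as a shape computation. This assumes the NumPy-style convention, which the aside may present with different notation.

Shapes are aligned from the trailing dimension. Each pair of dimensions must either be equal or contain a 1, and a missing leading dimension counts as 1. The helper `broadcast_shape` is hypothetical.

```python
def broadcast_shape(a, b):
    """Result shape of broadcasting shapes a and b under NumPy-style rules."""
    result = []
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1  # missing leading dims act as 1
        db = b[-i] if i <= len(b) else 1
        if da != db and 1 not in (da, db):
            raise ValueError(f"incompatible dimensions {da} and {db}")
        result.append(da if db == 1 else db)
    return tuple(reversed(result))

print(broadcast_shape((3, 1), (4,)))            # a column against a row
print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))
```

In NumPy itself, `numpy.broadcast_shapes` performs the same computation.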

Shannon Entropy (Wikipedia)

Cross Entropy (Wikipedia)
2025.03.21 (theory)
Learning as Optimization [pdf]
Vanishing and exploding gradients. First- and second-order optimization, approximations, optimizers. Further tricks.

Aside 2: Exponential Moving Average [pdf]
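An exponential moving average can be sketched as v_t = beta * v_{t-1} + (1 - beta) * x_t, starting from v_0 = 0.

The optional division by (1 - beta^t) is the bias correction used in the Adam optimizer. Whether the aside covers this step is an assumption. The function `ema` and its defaults are hypothetical.

```python
def ema(values, beta=0.9, bias_correction=True):
    """Exponential moving average with v_0 = 0; optionally bias-corrected as in Adam."""
    v, out = 0.0, []
    for t, x in enumerate(values, start=1):
        v = beta * v + (1 - beta) * x
        # without correction, early values are biased toward the initial v_0 = 0
        out.append(v / (1 - beta ** t) if bias_correction else v)
    return out

print(ema([5.0] * 5))
print(ema([5.0] * 5, bias_correction=False))
```

On a constant input, the corrected average equals the input from the first step. The uncorrected one starts near zero and only gradually approaches it.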

Aside 3: Predictions [pdf]
From in-sample optimization to out-of-sample generalization.
2025.03.28 (theory)
Deep Convolutional Neural Networks [pdf]
Convolutional filter, filter banks, feature maps, pooling, layerwise gradients.
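A single filter and a pooling step can be sketched on plain nested lists. The filter is applied as a 'valid' 2D cross-correlation, which is what most deep learning frameworks call convolution (the kernel is not flipped). Pooling is non-overlapping max pooling.

The function names and the list-of-lists representation are illustrative assumptions. Real layers apply a whole filter bank over several input channels.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel without padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)]
            for i in range(oh)]

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling with a size x size window (stride = size)."""
    return [[max(fmap[i + u][j + v] for u in range(size) for v in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(conv2d(image, [[1, 0], [0, -1]]))  # diagonal difference filter -> feature map
```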
2025.04.04 (theory)
Deep Convolutional Neural Networks and Beyond [pdf]
Some insight into what happens in convolution layers. Different DCNN architectures. Transfer learning. Segmentation and object detection.

J Yosinski, J Clune, Y Bengio, H Lipson, "How transferable are features in deep neural networks?" in Advances in Neural Information Processing Systems (NIPS 2014) [link]

Aside 4: Hardware for Deep Learning [pdf]
Main differences between CPUs and GPUs, SIMT parallelism, bus-oriented communication, a few caveats.

Aside 5: Differentiating Algorithms [pdf]
Wengert list, ahead-of-time and runtime autodiff, lazy mode, just-in-time compilation, differences among TensorFlow, PyTorch and JAX.
2025.04.11 (theory)
Deep Learning and Time Series [pdf]
Recurrent Neural Networks (RNN), temporal unfolding, LSTM cells, GRU cells, encoder/decoder, convolution, time series analysis.

Aside 6: Auto-Encoders [pdf]
A very popular and powerful network architecture pattern, which is also the basis for diffusion models. The relation between Auto-Encoders and Principal Component Analysis.
2025.05.09 (theory)
Aside 7: Word Embedding [pdf]
Skip-grams, probability distributions of context and center words, training and results, continuous bag of words (CBOW) model.

Attention and Transformers [pdf]
Attention as a kernel, attention maps, queries, keys and values, attention-based encoder and decoder, transformer architecture, translator.

A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, A N Gomez, L Kaiser, I Polosukhin, "Attention Is All You Need" in Advances in Neural Information Processing Systems (NIPS 2017) [link]
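The scaled dot-product attention of Vaswani et al., softmax(Q K^T / sqrt(d_k)) V, can be sketched on small lists of lists.

The function names are hypothetical. Multi-head projections and masking are omitted, and frameworks would compute the same thing with batched tensor operations. The returned `weights` matrix is the attention map: each row is a probability distribution over the keys.

```python
import math

def softmax(row):
    m = max(row)
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def attention(Q, K, V):
    """Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v. Returns (output, attention map)."""
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(qrow, krow)) / math.sqrt(d_k) for krow in K]
              for qrow in Q]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, V), weights

out, w = attention([[10.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print(w, out)  # the query aligns with the first key, so the output is close to V[0]
```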
2025.05.16 (theory)
Aside 8: Kullback-Leibler divergence [pdf]
Shannon's entropy in the theory of information: intuition and formalism. Cross-entropy and KL divergence: intuition and formalism.

Kullback-Leibler Divergence (Wikipedia)
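The three quantities can be computed for discrete distributions, here in bits. This makes it easy to check the identity H(p, q) = H(p) + D_KL(p || q), together with the non-negativity of the KL divergence.

The function names are illustrative. The code assumes q_i > 0 wherever p_i > 0, and terms with p_i = 0 contribute 0.

```python
import math

def entropy(p, base=2):
    """Shannon entropy H(p) = -sum p_i log p_i."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

def cross_entropy(p, q, base=2):
    """H(p, q) = -sum p_i log q_i."""
    return -sum(pi * math.log(qi, base) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q, base=2):
    """D_KL(p || q) = sum p_i log(p_i / q_i) = H(p, q) - H(p)."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]
print(entropy(p), cross_entropy(p, q), kl_divergence(p, q))
```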

Generative Networks: VAE [pdf]
Generative Adversarial Networks (GAN). Variational Auto-Encoders (VAE): structuring the latent space. Gaussian-Mixture VAE: adapting to multiple classes.
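Two standard VAE ingredients can be sketched, assuming a Gaussian encoder with a diagonal covariance and a standard normal prior:

- the reparameterization trick, z = mu + sigma * eps with eps ~ N(0, 1);
- the closed-form KL term between the encoder's distribution and the prior.

The encoder is assumed to output `mu` and `log_var`, which is a common convention but an assumption here. The function names are hypothetical, and the GAN and Gaussian-Mixture VAE parts of the lecture are not covered.

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    """z = mu + sigma * eps, eps ~ N(0, 1): the sample is a differentiable
    function of (mu, log_var), so gradients can flow through the encoder."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) = 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

print(reparameterize([0.0, 1.0], [0.0, -2.0], random.Random(0)))
print(kl_to_standard_normal([1.0], [0.0]))
```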
2025.05.23 (theory)
Generative Networks: Diffusion Models [pdf]
Denoising Diffusion Probabilistic Models (DDPM), mathematical foundations, practical implementation, conditioning on multimodal labels.
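The DDPM forward (noising) process has a closed form: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps, with alpha_bar_t the cumulative product of (1 - beta_s). A network is trained to predict eps from x_t.

The linear schedule from 1e-4 to 0.02 over T = 1000 steps follows Ho et al. (2020). Whether the lecture uses the same schedule is an assumption, and the function names are hypothetical. Steps are 0-indexed here, so index 0 corresponds to t = 1.

```python
import math
import random

def linear_beta_schedule(T=1000, beta_1=1e-4, beta_T=0.02):
    """Linearly increasing noise variances beta_1..beta_T (index 0 is t = 1)."""
    return [beta_1 + (beta_T - beta_1) * t / (T - 1) for t in range(T)]

def alpha_bars(betas):
    """alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng=random):
    """Draw x_t ~ N(sqrt(abar_t) x_0, (1 - abar_t) I); also return the noise eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

abar = alpha_bars(linear_beta_schedule())
xt, eps = q_sample([1.0, -1.0], 500, abar, random.Random(0))
print(abar[0], abar[-1], xt)  # abar_T is close to 0: x_T is almost pure noise
```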
2025.05.30 (theory)
Aside 9: Reinforcement Learning [pdf]
A short recap about RL foundations, Markov decision process, state value function, policy, optimality, action value function, Q-learning.
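Tabular Q-learning can be sketched on a hypothetical toy problem: a deterministic chain of four states, where reaching state 3 ends the episode with reward 1. The environment, the function names and the hyperparameters are all illustrative assumptions.

Each update moves Q(s, a) toward r + gamma * max_a' Q(s', a'). Because the target uses the greedy value of the next state, Q-learning is off-policy with respect to the epsilon-greedy behaviour policy.

```python
import random

N_STATES, GOAL = 4, 3   # hypothetical chain: states 0..3, state 3 is terminal
ACTIONS = (0, 1)        # 0 = left, 1 = right

def step(s, a):
    """Deterministic transition; reward 1 only when the goal is reached."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy behaviour policy, ties broken at random
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.choice(ACTIONS)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2, r, done = step(s, a)
            # Q-learning target: greedy value of the next state (0 if terminal)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(GOAL)]
print(policy)  # greedy action in each non-terminal state
```

With gamma = 0.9, the optimal values for "right" are 0.81, 0.9 and 1 in states 0, 1 and 2.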

Deep Reinforcement Learning [pdf]
Integrating DNNs into the RL paradigm, DQN algorithm, policy gradient, Actor-Critic methods.
2025.06.06 (theory)
Monte Carlo Tree Search [pdf]
Game trees, Monte Carlo strategy, Monte Carlo Tree Search (MCTS), Upper Confidence Bounds applied to Trees (UCT).
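The selection step of UCT can be sketched with the UCB1 score: mean value plus c * sqrt(ln N / n). Here N is the parent's visit count, n is the child's, and c = sqrt(2) is the classic UCB1 constant. Unvisited children get an infinite score, so each is tried once.

Representing children as `(value_sum, visits)` pairs, and the function names, are hypothetical choices for illustration.

```python
import math

def uct_score(child_value_sum, child_visits, parent_visits, c=math.sqrt(2)):
    """Exploitation (mean value) + exploration bonus; unvisited children first."""
    if child_visits == 0:
        return math.inf
    return (child_value_sum / child_visits
            + c * math.sqrt(math.log(parent_visits) / child_visits))

def select_child(children, parent_visits, c=math.sqrt(2)):
    """children: list of (value_sum, visits); returns the index maximizing UCT."""
    return max(range(len(children)),
               key=lambda i: uct_score(children[i][0], children[i][1], parent_visits, c))

children = [(6.0, 10), (1.0, 2)]
print(select_child(children, 12))         # exploration bonus favours the rarely visited child
print(select_child(children, 12, c=0.0))  # pure exploitation picks the higher mean
```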
2025.06.13 (theory)
AlphaZero, MuZero [pdf]
MCTS + DNN, network architecture, replacing MCTS rollout with estimation, network training, model-free MCTS: MuZero.

D J Mankowitz et al., "Faster sorting algorithms discovered using deep reinforcement learning", Nature 618, 257-263 (2023) [link]

Instructor

Marco Piastra
Contact: marco.piastra@unipv.it

Kiro

Course info

Exams

See Faculty website

Further resources:

Video recordings and Colab notebooks are available on Kiro.

(There are no required textbooks for this course. The following books are recommended as optional readings.)

Christopher Bishop, Hugh Bishop
Deep Learning: Foundations and Concepts
Springer, 2024
[Online version]
Aston Zhang, Zachary Lipton, Mu Li, Alexander Smola
Dive into Deep Learning
Cambridge University Press, 2024
[Online version, with exercises]
Ian Goodfellow, Yoshua Bengio, Aaron Courville
Deep Learning
MIT Press, 2017
[Online version]
Kevin P. Murphy
Probabilistic Machine Learning: Advanced Topics
MIT Press, 2023
[Pre-print]
Richard S. Sutton, Andrew G. Barto
Reinforcement Learning: An Introduction (second edition)
MIT Press, 2018
[Online version]

Università degli Studi di Pavia

Facoltà di Ingegneria