04-650   Mathematical Foundations of Machine Learning

Location: Africa

Units: 12

Semester Offered: Fall

Course description

This course offers a comprehensive mathematical foundation for machine learning, covering essential topics from linear algebra, calculus, probability theory, and optimization to advanced concepts including information theory, statistical inference, regularization, and kernel methods. The course aims to equip students with the necessary mathematical tools to understand, analyze, and implement various machine-learning algorithms and models at a deeper level.

Learning objectives

In this course, students will:

  • Learn the foundational concepts and techniques of linear algebra, including vector and matrix operations, eigenvectors, and eigenvalues, with a focus on their application in machine learning
  • Learn calculus concepts, such as derivatives and optimization techniques, and apply them to solve machine-learning problems
  • Gain a comprehensive understanding of probability theory and statistics, including multivariate random variables and maximum likelihood estimation, and their role in machine learning
  • Learn various optimization methods, including gradient descent and convex optimization, and their application in machine learning
  • Learn information theory and its relevance to machine learning


Upon the completion of this course, students will be able to:

  • Use linear algebra concepts such as matrices, vectors, and eigenvalues to represent and manipulate data
  • Use calculus concepts such as differentiation and gradients to optimize machine-learning models
  • Use probability and statistical concepts to model and infer from data
  • Use optimization techniques such as gradient descent and convex optimization to train machine-learning models
  • Explain the role of entropy-based measures, such as cross-entropy and KL divergence, in assessing how well a model fits data

Content details

This course includes:

  • Linear algebra: vectors and matrices, vector spaces, systems of linear equations, eigenvalue decomposition, singular value decomposition, least squares
  • Calculus: chain rule, Jacobians, gradients
  • Probability: probability axioms, Bayes' rule, random variables, probability distributions
  • Statistics: descriptive statistics, inferential statistics, sampling and Markov chain Monte Carlo (MCMC) methods, statistical tests
  • Optimization: convex functions and convex optimization problems, duality, Lagrange multipliers
  • Information theory: entropy, mutual information, KL divergence, cross-entropy




Moise Busogi