Mathematics for Machine Learning

Mathematics for Machine Learning

Maths is Essential for Machine Learning???

Math and code are highly intertwined in machine learning workflows. Code is often built directly from mathematical intuition, and it even shares the syntax of mathematical notation. In fact, modern data science frameworks (e.g. NumPy) make it intuitive and efficient to translate mathematical operations (e.g. matrix/vector products) to readable code.

I encourage you to embrace code as a way to solidify your learning. Both math and code depend on precision in understanding and notation. For instance, practicing the manual implementation of loss functions or optimization algorithms can be a great way to truly understanding the underlying concepts. The cynical view of machine learning research points to plug-and-play systems where more compute is thrown at models to squeeze out higher performance. In some circles, researchers remain skeptical that empirical methods lacking in mathematical rigor (e.g. certain deep learning methods) can carry us to the holy grail of human-level intelligence.

Recently, there has been an upsurge in the availability of many easy-to-use machine and deep learning packages such as scikit-learn, Weka, Tensorflow, R-caret etc. Machine Learning theory is a field that intersects statistical, probabilistic, computer science and algorithmic aspects arising from learning iteratively from data and finding hidden insights which can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary for a good grasp of the inner workings of the algorithms and getting good results.

To inform this section, To figure out where math was most helpful in debugging their systems. The following are questions that engineers found themselves answering with mathematical insights.

  • What clustering method should I use to visualize my high-dimensional customer data?

            ○ Approach: PCA vs. tSNE

  • How should I calibrate a threshold (e.g. confidence-level 0.9 vs. 0.8?) for “blocking” fraudulent user transactions?

            ○ Approach: Probability calibration

  • What’s the best way to characterize the bias of my satellite data to specific regions of the world (Silicon Valley vs. Alaska)?

○ Approach: Open research question. Perhaps, aim for demographic parity?

Generally, statistics and linear algebra can be employed in some way for each of these questions. However, to arrive at satisfactory answers often requires a domain-specific approach.

Why the mathematics of Machine Learning is important

There are many reasons why the mathematics of Machine Learning is important and I will highlight some of them below:

  1. Selecting the right algorithm which includes giving considerations to accuracy, training time, model complexity, number of parameters and number of features.
  2. Choosing parameter settings and validation strategies.
  3. Identifying under-fitting and over-fitting by understanding the Bias-Variance trade off.
  4. Estimating the right confidence interval and uncertainty.

What Level of Mathematics Do You Need?

The main question when trying to understand an interdisciplinary field such as Machine Learning is the amount of mathematics necessary and the level of mathematics needed to understand these techniques. The answer to this question is multidimensional and depends on the level and interest of the individual. Research in mathematical formulations and theoretical advancement of Machine Learning is ongoing and some researchers are working on more advance techniques. I will state what I believe to be the minimum level of mathematics needed to be a Machine Learning Scientist/Engineer and the importance of each mathematical concept.

math data

  1. Linear Algebra: In ML, Linear Algebra comes up everywhere. Topics such as Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Eigende composition of a matrix, LU Decomposition, QR Decomposition/Factorization, Symmetric Matrices, Orthogonalization & Orthonormalization, Matrix Operations, Projections, Eigenvalues & Eigenvectors, Vector Spaces and Norms are needed for understanding the optimization methods used for machine learning. The amazing thing about Linear Algebra is that there are so many online resources.
  2. Probability Theory and Statistics: Machine Learning and Statistics aren’t very different fields. Actually, someone recently defined Machine Learning as ‘doing statistics on a Mac’. Some of the fundamental Statistical and Probability Theory needed for ML are Combinatorics, Probability Rules & Axioms, Bayes’ Theorem, Random Variables, Variance and Expectation, Conditional and Joint Distributions, Standard Distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian), Moment Generating Functions, Maximum Likelihood Estimation (MLE), Prior and Posterior, Maximum a Posteriori Estimation (MAP) and Sampling Methods.
  3. Multivariate Calculus: Some of the necessary topics include Differential and Integral Calculus, Partial Derivatives, Vector-Values Functions, Directional Gradient, Hessian, Jacobian, Laplacian and Lagragian Distribution.
  4. Algorithms and Complex Optimizations: This is important for understanding the computational efficiency and scalability of our Machine Learning Algorithm and for exploiting sparsity in our datasets. Knowledge of data structures (Binary Trees, Hashing, Heap, Stack etc), Dynamic Programming, Randomized & Sublinear Algorithm, Graphs, Gradient/Stochastic Descents and Primal-Dual methods are needed.
  5. Others: This comprises of other Math topics not covered in the four major areas described above. They include Real and Complex Analysis (Sets and Sequences, Topology, Metric Spaces, Single-Valued and Continuous Functions, Limits, Cauchy Kernel, Fourier Transforms), Information Theory (Entropy, Information Gain), Function Spaces and Manifolds.

Learning Math as per your need

Diving headfirst into a machine learning workflow, you might find that there are some steps that you get stuck at, especially while debugging. When you’re stuck, do you know what to look up? How reasonable are your weights? Why isn’t your model converging with a particular loss definition? What’s the right way to measure success? At this point, it may be helpful to make assumptions about the data, constrain your optimization differently, or try different algorithms.
Often, you’ll find that there’s mathematical intuition baked into the modeling/debugging process (e.g. selecting loss functions or evaluation metrics) that could be instrumental to making informed, engineering decisions.

Machine learning focuses more on the concepts of Linear Algebra as it serves as the main stage for all the complex processes to take place (besides the efficiency aspect). On the other hand, multivariate calculus deals with the aspect of numerical optimization, which is the driving force behind most machine learning algorithms.
Mathematics in data science and machine learning is not about crunching numbers, but about what is happening, why it’s happening, and how we can play around with different things to obtain the results we want.

The main purpose of this blog post is to give a well-intention advice about the importance of Mathematics in Machine Learning and the necessary topics and useful resources for a mastery of these topics.

I hope this post helps you to understand the “Importance of Mathematics behind the Machine Learning”.

Keep learning 🙂

3 thoughts on “Mathematics for Machine Learning”

Leave a Reply

Your email address will not be published. Required fields are marked *