Recreating a Machine Learning Master’s degree with Online Courses


A Bachelor’s study usually takes six semesters; a Master’s study takes four. But this is only an outline. I’ve witnessed people doing their BA in three semesters and some taking nine semesters. Sometimes there are so many exciting courses that you voluntarily stay longer to learn it all. Therefore, I’ve loosely structured the recreated curriculum into four semesters (as opposed to six semesters for a custom-made Bachelor's degree). The primary focus is on Machine Learning and Deep Learning.

First semester

Artificial Intelligence

As the first course in your curriculum, I recommend taking AI for Everyone. It’s an entry-level course and an ideal start to learn about the common terminology, the capabilities of AI systems, how to handle ML projects, and ethical aspects. Although it’s non-technical, I still propose taking this course early on. This way, you’ll get comfortable with the concepts and will be well prepared for the upcoming classes. As an alternative, you can also do IBM’s Introduction to AI course, which has a similar focus.

Mathematics for Machine Learning

Machine Learning is a math-heavy field. While packages like TensorFlow and PyTorch make it easy to create, train, and deploy neural networks, you are in a better position if you know the concepts behind them. The required math is not rocket science but builds on linear algebra and probability theory. Also, once you get to examine custom datasets, you will often find yourself working with a statistical description.

Courses can make the learning process easier and more accessible. Take the Mathematics for Machine Learning specialization, for example. The complete package costs a small fee, but you can audit the individual courses free of charge. The curriculum covers discrete mathematics, graph theory, derivatives, linear algebra, and probability theory. The chapters are accompanied by practical exercises in Python, too.

Machine Learning

The Stanford Machine Learning course gives a broad introduction to the topics of Machine Learning. It is taught by Andrew Ng, one of the most prominent figures in Deep Learning research. The course starts with supervised learning, which includes classic ML techniques and neural networks alike. Next, it covers unsupervised learning, which is used, for example, in clustering algorithms. Lastly, it goes over best practices used in Machine Learning. Real-world examples accompany all of this. You can take the course either for a fee with a certificate or free of charge without one.

Second semester

Deep Learning

To get an in-depth introduction to the field of Deep Learning, you can select between three courses.

The first one is CS230 Deep Learning and is offered by Stanford University. The lecture videos are available at no charge and cover a broad range of topics. They include adversarial attacks, interpretability, and how to read papers.

The second course, Deep Learning Specialization, also taught by Andrew Ng, is more practice-focused. In this course, you will actively code neural networks, train them, and change hyperparameters. The topics covered include CNNs, LSTMs, and word-embedding techniques. All this is done with the help of TensorFlow.

The third course, Introduction to Deep Learning, is offered by MIT. I attended this lecture series last year and found it very well done and thought out. It is taught by two Ph.D. researchers who cover sequence modelling, computer vision, generative networks, and reinforcement learning. Apart from this standard schedule, the presentations include talks by invited speakers from Nvidia, Google, and similar companies.

Full Stack Deep Learning

The Full Stack Deep Learning course is offered by UC Berkeley as an official course, but all material is available at no charge. To do this course, you should bring some experience with both Python and model training. The lectures then focus on the production side of Deep Learning: estimating project costs, selecting compute infrastructure, and deploying models at scale.

The course begins with a review of the fundamentals (CNNs, RNNs, Transformers) and then proceeds to cover project management and experimentation topics. This includes managing data, monitoring, ethical considerations, and teamwork. Most of these lectures are accompanied by practical lab sessions. If you don’t have the time for the complete lecture video or only want to review select parts, you can consult the detailed notes written below each video.

Generative Networks

Since the “Generative Adversarial Nets” paper was published back in 2014, generative techniques have become very popular. Many improvements have since been proposed to the original GAN structure, but most still rely on the idea of a two-player game. (See here for my thoughts on this progress).

In this setting, one player (one network) generates artificial samples. The quality of the samples is judged by the second player (another network). This second player sees both real and generated data samples and learns to distinguish them. With the feedback of the second player, commonly termed the discriminator, the first player, called the generator, learns to produce more realistic samples. During training, both networks get better: the generator creates better artificial samples, and the discriminator, the adversary (hence the name), becomes better at detecting fake samples.
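To make the two-player game a bit more concrete, here is a minimal sketch of the two loss functions, assuming the common binary cross-entropy formulation with the so-called non-saturating generator loss. It is not taken from the course mentioned below; the function names and the discriminator outputs are made up for illustration.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real samples should score 1, generated ones 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating loss: the generator wants its samples scored as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs (probability that a sample is real).
early_real, early_fake = [0.9, 0.8], [0.1, 0.2]    # fakes are easy to spot
later_real, later_fake = [0.6, 0.55], [0.45, 0.5]  # fakes look realistic

print("generator loss, early:", generator_loss(early_fake))
print("generator loss, later:", generator_loss(later_fake))
print("discriminator loss, early:", discriminator_loss(early_real, early_fake))
print("discriminator loss, later:", discriminator_loss(later_real, later_fake))
```

As the generated samples become harder to tell apart from real ones, the generator's loss drops while the discriminator's loss rises. This tension is what drives both networks during training.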

A course that teaches this is the Generative Adversarial Networks Specialization. In this course, you learn the fundamentals by building basic architectures and then improve your models. The complete specialization is fee-based, but you can audit the individual sub-courses free of charge.

Third semester

Reinforcement Learning

The Reinforcement Learning lecture is a teaching partnership between DeepMind and UCL and covers the core techniques of RL. Among these techniques are Markov Decision Processes, a fundamental way of modelling an environment. With such a model, you can use value functions to derive an (optimal) policy. By following this optimal policy, your agent can reach its goal in the best possible way. If that sounds interesting to you, then head over to the playlist or visit an alternative lecture series (also by DeepMind) here.
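As a rough illustration of how a model, value functions, and a policy fit together, here is a minimal value iteration sketch. The environment is a hypothetical four-state corridor that I made up for this example (it is not from the lecture): the agent moves left or right and receives a reward of 1 when it reaches the goal state.

```python
# Hypothetical corridor MDP: states 0..3, state 3 is the terminal goal.
N_STATES = 4
GOAL = 3
ACTIONS = (-1, +1)  # move left or move right
GAMMA = 0.9         # discount factor

def step(state, action):
    """Deterministic transition model: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def action_value(state, action, values):
    """Immediate reward plus discounted value of the resulting state."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]

def value_iteration(tolerance=1e-8):
    """Repeat Bellman optimality updates until the values stop changing."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue  # the terminal state keeps a value of 0
            best = max(action_value(s, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tolerance:
            return values

def greedy_policy(values):
    """Pick, in every non-terminal state, the action with the highest value."""
    return {
        s: max(ACTIONS, key=lambda a: action_value(s, a, values))
        for s in range(N_STATES)
        if s != GOAL
    }

values = value_iteration()
policy = greedy_policy(values)
print(values)
print(policy)
```

In this toy setting, the values grow the closer a state is to the goal, and the greedy policy moves right in every state.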

Natural Language Processing

The field of Natural Language Processing has advanced enormously. Starting from simple one-hot encoded vectors for classification, we now witness the widespread use of attention-based architectures. This advance has not happened overnight; there are a couple of fascinating intermediate steps, which I cover in detail in a separate deep dive.

Take embeddings as an example. Instead of encoding a word (or anything textual) as a single integer, you use the context and world knowledge to increase the informative value of its representation. For the term “tree,” this means we no longer encode it as 17 (an arbitrary index here) but represent it as a vector of float values. This vector captures more information about “tree”: That it cleanses the air, that it gives shelter to animals of all kinds, that it provides shade.
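The difference between an index and an embedding can be sketched in a few lines. All numbers below are hypothetical: real embeddings are learned from data and typically have hundreds of dimensions, not three.

```python
import math

# Integer encoding: arbitrary indices that carry no meaning on their own.
vocabulary = {"tree": 17, "forest": 42, "car": 99}

# Hypothetical 3-dimensional embeddings, hand-picked for illustration.
embeddings = {
    "tree": [0.9, 0.8, 0.1],
    "forest": [0.8, 0.9, 0.2],
    "car": [0.1, 0.0, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

tree_forest = cosine_similarity(embeddings["tree"], embeddings["forest"])
tree_car = cosine_similarity(embeddings["tree"], embeddings["car"])
print("tree vs. forest:", round(tree_forest, 3))
print("tree vs. car:", round(tree_car, 3))
```

With these made-up vectors, "tree" ends up much closer to "forest" than to "car". The integer indices 17, 42, and 99 cannot express such a relationship at all.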

But embeddings are only one technique and are complemented (or superseded) by other ones. To learn more about NLP, you can take the Natural Language Processing specialization. It begins by covering classic NLP techniques, advances to deep-learning-based methods, and lastly covers attention models. As with the other specializations, you can audit the individual courses free of charge to get a first-hand impression.

Computer Vision

The ImageNet dataset remains a popular benchmark for the classification ability of neural networks. It is often described as a crucial factor in the advances of computer vision technologies. Back in 2010, when the ImageNet Large Scale Visual Recognition Challenge was hosted for the first time, the classification accuracy was around 50%. Nowadays, we are at 90% accuracy, a truly impressive advance.

There are a couple of things that enabled this progression:

  1. Model training is more comfortable now.
  2. Data processing is more accessible.
  3. Researchers utilized many image augmentation techniques throughout the years.
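To give an idea of what the image augmentation in the last point looks like, here is a minimal sketch of two classic techniques, a horizontal flip and a random crop. It uses plain nested lists as a stand-in for images; in practice you would rely on the augmentation utilities of your deep learning framework. The function names are my own.

```python
import random

def horizontal_flip(image):
    """Mirror every row; the image is a list of rows of pixel values."""
    return [row[::-1] for row in image]

def random_crop(image, height, width, rng):
    """Cut out a randomly placed window of the given size."""
    top = rng.randint(0, len(image) - height)
    left = rng.randint(0, len(image[0]) - width)
    return [row[left:left + width] for row in image[top:top + height]]

# A tiny 3x3 "image" with made-up pixel values.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

flipped = horizontal_flip(image)
crop = random_crop(image, 2, 2, random.Random(0))
print(flipped)
print(crop)
```

Each call produces a slightly different view of the same picture, so the network sees more variety without anyone having to collect new data.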

To learn more about the basics, you can head over to the Computer Vision Basics course. Once you have finished it, you can proceed to the Advanced Computer Vision with TensorFlow course. This course covers image classification, object detection, segmentation, class-activation maps, and much more. Also, have a look at TensorFlow Lucid to dive deeper into neural networks for image-related tasks.

Fourth semester

Bioinformatics

Machine Learning and Computer Science are not isolated fields, standing separate from others. Instead, like other sciences, both unfold their power when applied to real-world problems.

Think about protein folding: given a sequence of amino acids, it’s unclear how they fold into a three-dimensional structure. And proteins fulfil many important tasks: they transport material, receive signals from the cell’s surface, and are part of hormones.

The Bioinformatics Specialization, offered by UC San Diego, covers this and much more. It begins by introducing DNA, sequencing, and genome comparison, and finishes with genome sequencing on actual data. It is an extensive specialization, featuring seven individual courses. I think using computers to run the sequencing algorithms is what makes this field so interesting to explore.

Medical AI

The previous specialization covered bioinformatics, so medical informatics is a natural next step. The Introduction to Healthcare course, offered by Stanford and auditable free of charge, is a good resource to explore the healthcare system. While it focuses on the United States, most of the knowledge applies generally. After all, you need medical personnel in any country.

After you have gained an overview of the system, you can proceed with the AI for Medicine Specialization to gain practical experience. In this three-part program, you’ll learn to classify diseases with CNNs, predict the likelihood of injury, extract data from health records, and assess the effectiveness of treatments.

Pascal Janetzky

Avid reader & computer scientist