Note: As an Amazon Associate I earn from qualifying purchases, and I receive commissions for purchases made through links in this post. See a fuller disclaimer here
Another note: This series assumes some high school maths, which you will need to get the most from the notes and takeaways sections.
Bruh, why are you starting from scratch? There’s PyTorch, TensorFlow, etc…
Around Thanksgiving 2019, I was able to visit my grandparents and talk with my grandfather. That conversation was so life-giving in so many ways and will stay in my memory. One of the golden nuggets of wisdom he told me was:
“If you wish to get good at anything, learn the theory behind what you seek to learn/master”.
This wisdom has stayed with me to this day, including in starting this Deep Learning (DL) journey. I am much more interested in learning the fundamentals and sharing what I’ve learned with others (in my language, of course) than in diving headfirst into the most popular frameworks. While I am not against frameworks (and will use them later), this is the route I have chosen for myself. In this scenario, I’d rather learn what makes the car drive than just get in the car and drive 🤷🏾‍♂️. To continue with the metaphor, you can’t learn about the car without the manual, so below is my chosen manual 👇🏾.
Resource used during this journey
Out of the multitude of books and online learning resources, I’ve picked /Grokking Deep Learning/ by Andrew W. Trask as my beginning guide into the DL world. This pick was based on reading the trial version and being impressed with Andrew Trask’s articulation of complex topics in very easy-to-understand language. For someone starting on the theory side of DL, I view this as very important. Therefore, the posts in this series will be more of a companion to the book. I recommend purchasing the book, as the posts and Jupyter notebooks (more on those here, if you are unfamiliar) that follow will be derived from the book’s content.
If you are looking to purchase the ebook version, which is cheaper, go to Manning Publications. If you like having physical copies of your books, check out the Amazon link below 👇🏾:
Without further ado, let’s get started 😃
Jupyter Notebook & Major Takeaways From Chapters 2 & 3
Seeing as the book is more in-depth, the takeaways in this series will be a summary of what I took from the chapters (plus other thoughts), with a link to my Jupyter notebook at the end. My Jupyter notebooks go deeper into the concepts explained in the book with code and pictures/diagrams. These blog posts will therefore stay big-picture in scope and iron out the more difficult topics in DL. So let’s dive into some big takeaways from Chapters 2 and 3 🏊🏾‍♂️
Chapter 2 Summaries/Notes
- Artificial Intelligence (AI) and DL are not synonymous! Rather, DL is a subset of Machine Learning (ML), which in turn is a subset of AI.
- Supervised Machine Learning is just a fancy term for “taking /what you know/ as input and quickly transforming it into /what you want to know/”.
- Unsupervised Machine Learning is wanting to know how your input data points relate to one another without having prior knowledge/labeling of what the data is exactly.
Chapter 3 Summaries/Notes
- A Neural Network helps you make a prediction based on the input values given and their corresponding weights. For a simple example, your favorite memorized math formula from middle school and high school, y = mx + b would be considered a “neural network”.
- y -> is the predicted value
- m -> the weight/“knob” that alters the prediction made from the input. Andrew says, “Another way to think about a neural network’s weight value is as a measure of sensitivity between the input of the network and its prediction. If the weight is very high, then even the tiniest input can create a really large prediction!”
- x -> your input values
- b -> the bias (Chapter 3 doesn’t discuss this at all; I learned that this term is called the bias from Becoming Human)
- When a network has multiple inputs, the weighted sum is computed by multiplying each input by its corresponding weight and then adding those products together. This weighted sum is the y when you have multiple inputs (y = w1x1 + w2x2 + w3x3).
- Multiple inputs and multiple outputs (here wij means the weight applied to input xj when computing output yi):
- y1 = w11x1 + w12x2 + w13x3
- y2 = w21x1 + w22x2 + w23x3
- y3 = w31x1 + w32x2 + w33x3
- y1, y2, y3 are your outputs. As you can see above, the x values never change from one output to the next, but each output uses its own set of weights.
- Stacked Neural Networks -> you’re making predictions on your predictions, with new sets of weights 🤯 (following the previous bullet, with newWij being the weight applied to yj when computing newYi):
- newY1 = newW11y1 + newW12y2 + newW13y3
- newY2 = newW21y1 + newW22y2 + newW23y3
- newY3 = newW31y1 + newW32y2 + newW33y3
- newY1, newY2, newY3 would be the final values of the neural network, while y1, y2, y3 would be considered your “hidden” values.
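To make the Chapter 3 bullets a little more concrete, here is a minimal pure-Python sketch of the four ideas above: a single input with a single weight, a weighted sum over multiple inputs, multiple outputs, and a stacked network. The function names (`predict_single`, `weighted_sum`, `predict_layer`, `predict_stacked`) and all the numbers are my own illustrative assumptions, not code from the book.

```python
def predict_single(x, weight, bias=0.0):
    """One input, one weight: y = m*x + b."""
    return weight * x + bias


def weighted_sum(inputs, weights):
    """Multiple inputs, one output: multiply each input by its weight, then add."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(w * x for w, x in zip(weights, inputs))


def predict_layer(inputs, weight_rows):
    """Multiple inputs, multiple outputs: one weighted sum per set (row) of weights."""
    return [weighted_sum(inputs, row) for row in weight_rows]


def predict_stacked(inputs, hidden_weights, output_weights):
    """Stacked network: predict the hidden values, then predict again from them."""
    hidden = predict_layer(inputs, hidden_weights)
    return predict_layer(hidden, output_weights)


# Made-up example values, chosen only so the arithmetic is easy to check by hand.
inputs = [1.0, 2.0, 3.0]
hidden_weights = [
    [1.0, 0.0, 0.0],  # weights used for y1
    [0.0, 1.0, 0.0],  # weights used for y2
    [1.0, 1.0, 1.0],  # weights used for y3
]
output_weights = [
    [1.0, 1.0, 1.0],  # weights used for newY1
    [2.0, 0.0, 0.0],  # weights used for newY2
    [0.0, 0.0, 1.0],  # weights used for newY3
]

print(predict_single(2.0, weight=3.0, bias=1.0))                # 7.0
print(weighted_sum(inputs, [0.5, 0.5, 0.5]))                    # 3.0
print(predict_layer(inputs, hidden_weights))                    # [1.0, 2.0, 6.0]
print(predict_stacked(inputs, hidden_weights, output_weights))  # [9.0, 2.0, 6.0]
```

Notice that `predict_stacked` is nothing more than `predict_layer` called twice: the same inputs feed every hidden value, and the hidden values then become the inputs for the final predictions.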
As stated at the beginning, the Jupyter notebook attached to each post in this series will go more in-depth with code, diagrams, and my explanation of what the book is covering. Check out the notebook below and leave any comments on Kaggle 👇🏾
Next Sunday’s post will be an overview of Gradient Descent!
Until next time ✌🏾
-  “Fundamental Concepts: How Do Machines Learn?” /Grokking Deep Learning/, by Andrew W. Trask, Manning Publications, 2019, p. 95.
-  “Introduction to neural prediction: forward propagation” /Grokking Deep Learning/, by Andrew W. Trask, Manning Publications, 2019, p. 126.