Tag: machine-learning

  • Part 2: From a Line to a Language Model

    TLDR: While learning about transformers, I built (with Claude) a first-principles interactive tutorial that I think others may find helpful. Link: https://aabdelfattah.github.io/neural-first-principle/

    Over the last few weeks, I went on an interesting learning journey into the transformer architecture. I know the topic is getting tiresome, with everyone writing about AI here and AI there, but ever since ChatGPT first appeared I have been curious to understand how the simple probabilistic neural networks I studied at university could produce such coherent sentences.

    I like the first-principles approach to learning. While learning with Claude and from sources like Andrej Karpathy’s Deep Dive into LLMs and 3Blue1Brown’s intuitive explanation of attention, I built an interactive tutorial for my own use that I thought was worth sharing with a larger audience (side note: it is quite astonishing to me how those guys can explain with such breadth and depth at the same time, absolute legends). I worked through the progression from simple linear regression to the complex transformer architecture, building bridges in between and connecting the dots.

    Who is it for?

    The tutorial is intended for anyone who is a bit curious and has some math background (though that is not strictly necessary). I tried to keep it high-level and capture the intuition, with enough detail to reward readers who do a second pass.

    How to read it?

    Start with the introduction and try to grasp the meaning. The chapters are meant to read as one consistent story. Then try the interactive visuals, and afterwards read the details.

    That was a long introduction; let’s get to the tutorial. It has seven stages to click through, and the full version opens at the link above.

  • From Lines to Neurons

    A statement that has been circulating on Twitter recently:

    “Thinking can be outsourced. Understanding cannot.”

    An understanding of engineering fundamentals will remain valuable, even as LLMs grow more capable of generating plausible code and text.

    Start with linear regression

    Neural networks approximate functions.

    Linear regression is the simplest mathematical way to do the same thing. Picture a cloud of points and an infinite number of candidate lines. Which line fits best? The one where the distance between its predictions and the actual points is smallest.

    That distance has a name: the loss. Linear regression fits the line by minimizing it. Pick weights → measure the loss → adjust → repeat.
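The loop above (pick weights, measure the loss, adjust, repeat) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the tutorial's own code: the data points, the mean-squared-error loss, the learning rate, and the names `mse_loss` and `fit_line` are all assumptions made for the example.

```python
# Toy data generated from y = 2x + 1 (hypothetical values for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def mse_loss(w, b):
    """Mean squared distance between the line's predictions and the points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(steps=2000, learning_rate=0.05):
    """Fit y = w * x + b by gradient descent (assumed hyperparameters)."""
    w, b = 0.0, 0.0  # pick starting weights
    for _ in range(steps):
        # Measure: gradients of the mean squared error w.r.t. w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
        # Adjust: step against the gradient, then repeat.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_line()
print(f"w={w:.3f}, b={b:.3f}, loss={mse_loss(w, b):.6f}")
```

On this toy data the fitted line should end up very close to the slope 2 and intercept 1 used to generate the points.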

    Neural networks build on the same concept.

    A neuron is just a line, bent

    Linear regression can only fit straight lines. Real data isn’t straight.

    A neuron takes w · x + b and runs the result through a curved function (for example, a sigmoid). That single bend is the entire upgrade. Now you can fit S-curves. Combine two neurons and you can fit bumps. Stack thousands across many layers and you can approximate almost any continuous curve.
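To make the "line, bent" idea concrete, here is a minimal sketch of one sigmoid neuron, plus a bump built from two of them. The specific weights and the choice to subtract one neuron's output from the other are assumptions picked for illustration, and the names `neuron` and `bump` are hypothetical.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    """A line (w * x + b) passed through a sigmoid bend."""
    return sigmoid(w * x + b)


def bump(x):
    # Two S-curves: one rises near x = -1, the other rises near x = +1.
    # Their difference is high only between the two steps.
    return neuron(x, 5.0, 5.0) - neuron(x, 5.0, -5.0)


for x in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(f"x={x:+.1f}  neuron={neuron(x, 5.0, 5.0):.3f}  bump={bump(x):.3f}")
```

The bump is close to 1 around x = 0 and close to 0 far away on either side, a shape that no single straight line can produce.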

    The training algorithm is the same one that fits the line: gradient descent. The only addition is backpropagation: the chain rule from calculus, applied backwards to work out how much each weight contributes to the loss.
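As a small illustration of the chain rule applied backwards, the sketch below computes the gradients for a single sigmoid neuron by hand and compares them with a numerical estimate. The squared-error loss, the input values, and the function names are assumptions for the example; real networks repeat the same bookkeeping across many layers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared error of one sigmoid neuron on one example."""
    return (sigmoid(w * x + b) - y) ** 2


def backprop(w, b, x, y):
    # Forward pass, keeping the intermediate values.
    z = w * x + b
    a = sigmoid(z)
    # Backward pass: chain rule from the loss back to each weight.
    dloss_da = 2 * (a - y)       # d(loss)/d(activation)
    da_dz = a * (1 - a)          # derivative of the sigmoid
    dloss_dz = dloss_da * da_dz
    return dloss_dz * x, dloss_dz  # dz/dw = x, dz/db = 1


w, b, x, y = 0.5, -0.2, 1.5, 1.0  # hypothetical values
grad_w, grad_b = backprop(w, b, x, y)

# Sanity check: nudge each weight slightly and see how the loss changes.
eps = 1e-6
numeric_w = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
numeric_b = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)

print(f"backprop: dw={grad_w:.6f}, db={grad_b:.6f}")
print(f"numeric:  dw={numeric_w:.6f}, db={numeric_b:.6f}")
```

The two rows should agree to several decimal places, which is a common way to check a hand-written backward pass.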

    Scale this up

    Stack thousands of these neurons. Add an attention mechanism so each token dynamically picks which other tokens’ outputs from the previous layer to care about. Train on text from across the internet.

    You get GPT. I will dive deeper into that in my next post.
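For a rough feel of what "dynamically picks which inputs to care about" means, here is a heavily simplified sketch of scaled dot-product attention in plain Python. It is an assumption-laden toy: the token vectors are made up, and each vector serves as its own query, key, and value, whereas a real transformer first applies learned projections. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention(queries, keys, values):
    """For each query, mix the values using similarity-based weights."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score every key against this query, scaled by sqrt(dimension).
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three made-up token vectors; each acts as its own query, key, and value.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token draws on every token in the sequence. The weights depend on the inputs themselves, which is the "dynamic" part.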


    Full interactive guide with live code and a training playground:

    Neural Networks: From Lines to Neurons