  • Part 2: From a Line to a Language Model

    TLDR: While learning about transformers, I built with Claude a first-principles interactive tutorial that I think can be helpful to others. Link: https://aabdelfattah.github.io/neural-first-principle/

    Over the last few weeks, I went on an interesting learning journey into the transformer architecture. I know the topic is getting tiresome, with everyone writing about AI here and AI there, but ever since ChatGPT first emerged I have been curious how the simple probabilistic neural networks I studied at university could produce such coherent sentences.

    I like learning from first principles. While learning with Claude and from sources like Andrej Karpathy’s Deep Dive into LLMs and 3Blue1Brown’s intuitive explanation of attention, I built an interactive tutorial for my own use, and I found it useful enough to share with a larger audience. (Side note: it is quite astonishing to me how those guys can explain with such breadth and depth at the same time. Absolute legends.) I worked through the progression from simple linear regression to the complex transformer architecture, building bridges in between and connecting the dots.

    Who is it for?

    The tutorial is intended for anyone who is a bit curious and has some math knowledge (though that is not strictly necessary). I tried to keep it high-level and capture the intuition, with enough detail to reward readers who do a second pass.

    How to read it?

    Start with the introduction and try to grasp the meaning. The chapters are meant to tell one consistent story. Then try the interactive visuals, and afterwards read the details.

    Enough introduction; let’s get to the tutorial:

    Click through the seven stages above — or open the full tutorial in a new tab.

  • From Lines to Neurons

    A statement that has been circulating on Twitter recently:

    “Thinking can be outsourced. Understanding cannot.”

    An understanding of engineering fundamentals will remain valuable, even as LLMs grow more capable of generating plausible code and text.

    Start with linear regression

    Neural networks approximate functions.

    Linear regression is the simplest mathematical approach to the same goal. Picture a cloud of points and an infinite number of candidate lines. Which line fits best? The one that minimizes the distance between its predictions and the actual points.

    That distance has a name: the loss. Linear regression fits the line by minimizing it. Pick weights → measure the loss → adjust → repeat.
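
    The pick, measure, adjust, repeat loop can be sketched in a few lines of plain Python. This is only a minimal illustration, not the code from the tutorial: the function names `mse_loss` and `fit_line`, the toy data, the learning rate and the step count are all assumptions made for the example, and the loss used here is the mean squared error.

```python
# Minimal sketch: fit y = w*x + b by gradient descent on the mean squared error.
# Data, learning rate and step count are illustrative assumptions.

def mse_loss(w, b, xs, ys):
    """Mean squared distance between the line's predictions and the points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0                       # pick initial weights
    n = len(xs)
    for _ in range(steps):                # repeat
        # measure: prediction errors and the gradient of the loss
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # adjust: take a small step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]            # points that lie on y = 2x + 1
w, b = fit_line(xs, ys)
print(w, b, mse_loss(w, b, xs, ys))
```

    On this toy data the loop should recover values close to w = 2 and b = 1. A learning rate that is too large would make the updates overshoot and diverge, which is why the step size here is kept small.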

    Neural networks build on the same concept.

    A neuron is just a line, bent

    Linear regression can only fit straight lines. Real data isn’t straight.

    A neuron takes w · x + b and runs the result through a curved function (for example, a sigmoid). That single bend is the entire upgrade. Now you can fit S-curves. Stack two neurons and you can fit bumps. Stack thousands across many layers and you can approximate practically any continuous function.
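
    To make the bend concrete, here is a small sketch of a single sigmoid neuron, plus two neurons subtracted from each other to form a bump. The helper names `neuron` and `bump` and the specific weights (4.0 and ±4.0) are hypothetical values chosen for illustration, not learned from data.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    """One neuron: the line w*x + b, bent by a sigmoid."""
    return sigmoid(w * x + b)


def bump(x):
    """Two neurons combined: an S-curve that rises near x = -1
    minus one that rises near x = +1 leaves a bump around x = 0."""
    return neuron(x, 4.0, 4.0) - neuron(x, 4.0, -4.0)


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, round(neuron(x, 4.0, 0.0), 3), round(bump(x), 3))
```

    The second column traces an S-curve, while the third is close to zero far from the origin and close to one at x = 0.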

    The training algorithm is the same one used for linear regression: gradient descent. The only addition is backpropagation: the chain rule from calculus, applied backwards to work out how much each weight contributes to the loss.
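
    As a rough sketch of what the chain rule looks like in code, the example below backpropagates through one sigmoid neuron on a single training example, checks the result against a finite-difference estimate, and then runs gradient descent with it. The squared-error loss, the starting weights, the example (x, y) and the learning rate of 0.5 are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared error of one sigmoid neuron on one example."""
    return (sigmoid(w * x + b) - y) ** 2


def backprop(w, b, x, y):
    """Chain rule, applied from the loss back to the weights."""
    # forward pass
    z = w * x + b
    a = sigmoid(z)
    # backward pass: dL/dw = dL/da * da/dz * dz/dw
    dL_da = 2 * (a - y)
    da_dz = a * (1 - a)               # derivative of the sigmoid
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz           # dz/dw = x and dz/db = 1


# Hypothetical starting point and training example
w, b, x, y = 0.5, -0.2, 1.5, 1.0
initial_loss = loss(w, b, x, y)

# Sanity check against a central finite-difference estimate
eps = 1e-6
grad_w, grad_b = backprop(w, b, x, y)
num_w = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
num_b = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
check_w = abs(grad_w - num_w) < 1e-6
check_b = abs(grad_b - num_b) < 1e-6

# Gradient descent driven by the backpropagated gradients
for _ in range(500):
    step_w, step_b = backprop(w, b, x, y)
    w -= 0.5 * step_w
    b -= 0.5 * step_b
final_loss = loss(w, b, x, y)
print(check_w, check_b, initial_loss, final_loss)
```

    In a multi-layer network the same pattern repeats layer by layer: each layer multiplies the gradient arriving from the layer after it by its own local derivative and passes the result backwards.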

    Scale this up

    Stack thousands of these neurons. Add an attention mechanism so that each token in the input dynamically picks which other tokens to pay attention to. Train on text from across the internet.

    You get GPT. We will dive deeper into that in my next post.
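
    As a preview, the core of attention can be sketched as scaled dot-product attention in plain Python. This is a deliberately simplified illustration under stated assumptions: the toy token vectors are used directly as queries, keys and values, whereas real transformers first pass them through learned projection matrices, and GPT-style models also mask out future positions. The function names here are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)                           # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention(queries, keys, values):
    """Scaled dot-product attention: each position mixes the values of all
    positions, weighted by how well its query matches their keys."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy token vectors, used directly as queries, keys and values
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

    Each printed row shows how strongly one token attends to every token in the sequence. The weights depend on the input itself, which is what makes the selection dynamic.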


    Full interactive guide with the live code and a training playground:

    Neural Networks: From Lines to Neurons

  • Is Software Solved?

    In 2011, Marc Andreessen declared that software is eating the world. Fifteen years later, the question has flipped: something is eating software itself, and that something is AI.

    Writing code has become very cheap; this is a fact. Code is a formal language, and LLMs, built on the transformer architecture introduced in the 2017 paper “Attention Is All You Need”, have been trained on a large share of the public internet. They can dream up text that sounds convincing and is right in most cases, and they can speak multiple languages.

    I am a PM who hadn’t written real code for years, and over the last few months I worked on four projects of varying complexity. Two of them were within my company, and one is running just fine in the production pipeline.

    The demand for code is clearly going up, and economics tells us why. When a good becomes cheaper, demand for it increases. As an analogy: cameras got radically cheaper and better; everyone has one; photos are infinite and near-free. People are still buying more cameras and taking more photos — and good photographers are more valuable than ever.

    Software ≠ Code. Software is a solution that runs on a computer; this solution needs to have a purpose. Identifying the purpose is hard; often, neither the customer nor the dev teams know what to build. Along the way, frameworks and best practices were developed to define clearly and, most importantly, align on what to build.

    The solution then needs to be coded in a way that the computer understands; it can be done in spaghetti style or using clean code. In both cases, the solution will serve its purpose, but in the former case, it will fail the test of time quickly, depending on how bad the code is. Smart engineers spent decades inventing ways to make solutions maintainable: testing, design patterns, static code analysis — you name it.

    All those activities and artifacts are complementary to the code; code cannot live in a vacuum. To have meaningful software, they are needed as well. They didn’t get cheaper, though. Copilot won’t help you build the right thing faster. On the contrary, it helps you build many wrong things faster. Call it a diversion, or call it fast prototyping.

    Those complements will become more expensive in my opinion, or at least economic theory tells us so. If there is a huge demand for cars in one city, tires become more expensive, not cheaper. Morgan Stanley analysts recently made a similar point: as code generation gets cheap, the scarce inputs shift elsewhere — to data, to judgment, to the specification of what is actually worth building.

    We are in the middle of the euphoria and shock, still wrapping our heads around how the computer can one-shot code that used to take weeks to write. But that’s not software development. The craft still matters, and will matter even more.