Andrej Karpathy and the famous “Let’s build GPT” video
If you are learning AI, especially large language models (LLMs), Andrej Karpathy’s video “Let’s build GPT: from scratch, in code, spelled out” is one of the most valuable resources available.
This is one of the most popular videos on Karpathy’s YouTube channel, with more than 7 million views. The video is nearly two hours long, but instead of giving only a high-level explanation, Karpathy walks viewers through the process of building a small GPT model from scratch, step by step, directly in code.
What makes this video special is that it goes beyond the question “What is GPT?” and answers a more useful one: “How is GPT actually built?”
Why this video is worth watching for AI learners
In the current AI boom, many people start by using tools like ChatGPT, Claude, Gemini, or other generative AI platforms. However, using these tools reveals little about what happens behind the scenes, so usage alone is not enough to understand how large language models really work.
Karpathy’s tutorial is especially useful for learners who prefer a more technical and “hardcore” approach: reading code, understanding architecture, and seeing how data flows through each layer of the model.
Instead of treating GPT as a black box, the video guides learners through important concepts such as:
- How a language model predicts the next token.
- How text data is processed before being passed into the model.
- How self-attention works.
- How the Transformer architecture forms the foundation of GPT.
- How a small model can learn from data and generate text.
This is why the video is often considered an advanced introduction for anyone who wants to understand LLMs at a deeper technical level.
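The second item in the list above, how text is turned into model input, is simple enough to show directly. The video works at the character level on the Tiny Shakespeare text: each distinct character gets an integer id, and every position in a chunk of text becomes a training example whose target is the next character. The sketch below illustrates that idea in plain Python; the toy string and the names `stoi`, `itos`, `encode`, `decode`, and `block_size` are illustrative, so treat it as a sketch rather than the video's exact code.

```python
# Minimal character-level tokenizer plus next-token training pairs.
# The toy text below is an assumption standing in for a real dataset.
text = "hello world"

# Vocabulary: every unique character, sorted so the ids are deterministic.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character

def encode(s: str) -> list[int]:
    """Map a string to a list of token ids."""
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    """Map a list of token ids back to a string."""
    return "".join(itos[i] for i in ids)

data = encode(text)

# With a context window of block_size tokens, every prefix of the chunk
# is a context, and the token right after it is the target.
block_size = 4
x = data[:block_size]
y = data[1:block_size + 1]
for t in range(block_size):
    context, target = x[: t + 1], y[t]
    print(f"context {decode(context)!r} -> next {decode([target])!r}")
```

A chunk of `block_size + 1` tokens therefore yields `block_size` training examples, one for every context length from 1 up to the window size.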
Who is Andrej Karpathy?
Andrej Karpathy is one of the most influential figures in modern artificial intelligence. He was a founding member of OpenAI and later served as Senior Director of AI at Tesla, where he led the computer vision team behind Autopilot.
Karpathy is also well known for his educational work, including the Stanford deep learning course CS231n, which he helped create and teach. More recently, he has continued to attract attention in the AI community after joining Anthropic, the company behind the Claude family of AI models.
Because of this background, his AI educational content is highly respected. It comes from someone who has both strong research foundations and real-world experience at some of the world’s leading AI organizations.
What does the “Let’s build GPT from scratch” video teach?
The main goal of the video is to build a small GPT-style model from scratch. Learners are guided through the process of creating a language model, starting from simple concepts and gradually moving toward the Transformer architecture.
Some of the most important parts of the video include:
1. Understanding language models at a basic level
Karpathy begins by explaining how a language model learns from text data. At its core, GPT is trained to predict the next token based on the context that comes before it.
This idea may sound simple, but it is the foundation of many modern generative AI systems. When trained on large amounts of data with a powerful architecture, a language model can generate natural text, answer questions, write code, summarize content, and perform many other complex language tasks.
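To make “predict the next token” concrete, here is a minimal count-based bigram model: it estimates the probability of each next character given only the current one, then scores the text with the average negative log-likelihood, the quantity a language model is trained to minimize. This is a simplified stand-in that assumes a toy corpus; the bigram model in the video stores the same kind of table as learnable PyTorch parameters and fits it with gradient descent instead of counting.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (an assumption; any text works).
text = "abababac"

# Count how often each character follows each other character.
counts: dict[str, Counter] = defaultdict(Counter)
for cur, nxt in zip(text, text[1:]):
    counts[cur][nxt] += 1

def next_token_probs(cur: str) -> dict[str, float]:
    """Estimate P(next | cur) from the bigram counts."""
    total = sum(counts[cur].values())
    return {ch: n / total for ch, n in counts[cur].items()}

# Average negative log-likelihood of the text under the bigram model.
pairs = list(zip(text, text[1:]))
nll = -sum(math.log(next_token_probs(c)[n]) for c, n in pairs) / len(pairs)

vocab = sorted(set(text))
print(next_token_probs("a"))                         # what tends to follow 'a'
print(f"bigram NLL: {nll:.3f}")
print(f"uniform baseline ln(V): {math.log(len(vocab)):.3f}")
```

The bigram score lands well below the uniform baseline ln(V), which is what learning from data means at this scale; a GPT lowers it further by conditioning on many previous tokens instead of just one.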
2. Building the model step by step in code
The most valuable part of the video is that Karpathy does not stop at theory. He writes the code live, so learners can see exactly how each part of the model is constructed.
Starting from a simple model, the tutorial gradually expands into more advanced components. This makes the learning process easier to follow because viewers are not forced to understand the entire Transformer architecture all at once.
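Even the simplest starting point is enough to generate text. The sketch below shows the autoregressive loop (predict a distribution, sample a token, append it, repeat) using a character bigram count table as the model. The toy corpus, the `table`, and the `generate` helper are assumptions for illustration, not the video's code.

```python
import random
from collections import Counter, defaultdict

# Toy corpus (an assumption for illustration).
text = "the cat sat on the mat. the dog sat on the log."

# Simplest possible "model": a table of next-character counts.
table: dict[str, Counter] = defaultdict(Counter)
for cur, nxt in zip(text, text[1:]):
    table[cur][nxt] += 1

def generate(start: str, max_new_tokens: int, seed: int = 0) -> str:
    """Autoregressive sampling: predict, sample, append, repeat."""
    rng = random.Random(seed)
    out = start
    for _ in range(max_new_tokens):
        followers = table[out[-1]]      # a bigram model only sees the last token
        if not followers:               # no known continuation: stop early
            break
        chars, weights = zip(*followers.items())
        # Sample the next character in proportion to how often it followed.
        out += rng.choices(chars, weights=weights, k=1)[0]
    return out

print(generate("t", 40))
```

A GPT runs the same loop; the difference is that its model looks at up to a fixed window of previous tokens rather than only the last one, and its probabilities come from a trained Transformer rather than a count table.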
3. Explaining self-attention and Transformer architecture
Self-attention is one of the most important concepts for understanding GPT. It allows the model to decide which parts of the input context are most relevant when predicting the next token.
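A single head of causal self-attention can be written out in plain Python to show the mechanics: each position forms a query, a key, and a value; scores are scaled dot products between its query and the keys of itself and earlier positions; and the output is a softmax-weighted average of the corresponding values. This is a minimal sketch with random weights standing in for learned ones. `causal_self_attention` and the other helpers are hypothetical names, and a real implementation would use batched tensor operations (the video uses PyTorch).

```python
import math
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Turn raw scores into probabilities (max subtracted for stability)."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(x, wq, wk, wv):
    """One attention head over a sequence x of shape (T x C)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    head_size = len(wq[0])
    out = []
    for i in range(len(x)):
        # Scaled dot-product scores between query i and keys 0..i only;
        # skipping j > i is equivalent to masking future positions with -inf.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(head_size)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Output i is the attention-weighted average of the visible values.
        out.append([sum(weights[j] * v[j][d] for j in range(i + 1))
                    for d in range(head_size)])
    return out

def rand_matrix(rows, cols, rng):
    """Random Gaussian matrix; in a real model these weights are learned."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
T, C, head_size = 4, 8, 4                 # sequence length, embedding size, head size
x = rand_matrix(T, C, rng)                # toy token embeddings
wq, wk, wv = (rand_matrix(C, head_size, rng) for _ in range(3))
y = causal_self_attention(x, wq, wk, wv)
print(len(y), len(y[0]))                  # T rows, head_size columns
```

The causal restriction is what makes this usable for next-token prediction: the output at position i never depends on tokens after i, so the model cannot peek at the answer during training.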
In the video, Karpathy explains self-attention by connecting it directly to code and data, which makes the concept easier to grasp than formulas alone.
After that, components such as multi-head attention, feed-forward layers, residual connections, and layer normalization are combined to form the Transformer architecture.
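The way these pieces fit together can be sketched as one Transformer block. The version below follows the pre-norm arrangement used in the video: layer normalization is applied before each sublayer, and each sublayer's output is added back to its input through a residual connection. Multi-head attention runs several causal heads in parallel and concatenates their outputs, and the feed-forward layer expands to four times the embedding size with a ReLU. This is a simplified sketch under stated assumptions: the weights are random rather than trained, and learned layer-norm parameters, linear-layer biases, and dropout are omitted. All function names are hypothetical.

```python
import math
import random

rng = random.Random(0)

def randn(rows, cols, scale=0.1):
    """Random (rows x cols) weight matrix; a trained model would learn these."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum: the residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned gain/bias)."""
    out = []
    for row in x:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        out.append([(v - mu) / math.sqrt(var + eps) for v in row])
    return out

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def head(x, wq, wk, wv):
    """One causal self-attention head."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    hs = len(wq[0])
    out = []
    for i in range(len(x)):
        w = softmax([sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(hs)
                     for j in range(i + 1)])
        out.append([sum(w[j] * v[j][d] for j in range(i + 1)) for d in range(hs)])
    return out

def multi_head(x, heads, wproj):
    """Run the heads in parallel, concatenate, and project back to C dims."""
    outs = [head(x, *h) for h in heads]
    concat = [sum((o[i] for o in outs), []) for i in range(len(x))]
    return matmul(concat, wproj)

def feed_forward(x, w1, w2):
    """Position-wise MLP: expand 4x, apply ReLU, project back."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def block(x, p):
    """Pre-norm Transformer block with a residual around each sublayer."""
    x = add(x, multi_head(layer_norm(x), p["heads"], p["wproj"]))  # communication
    x = add(x, feed_forward(layer_norm(x), p["w1"], p["w2"]))      # computation
    return x

T, C, n_head = 5, 8, 2
hs = C // n_head
params = {
    "heads": [(randn(C, hs), randn(C, hs), randn(C, hs)) for _ in range(n_head)],
    "wproj": randn(C, C),
    "w1": randn(C, 4 * C),
    "w2": randn(4 * C, C),
}
x = randn(T, C, scale=1.0)  # toy token embeddings
y = block(x, params)
print(len(y), len(y[0]))    # same shape in and out: T x C
```

Because the block maps a T x C input to a T x C output, a GPT can stack many of these blocks, followed by a final layer that turns each position's vector into next-token logits.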
4. Connecting the tutorial to GPT, ChatGPT, and modern LLMs
Although the model built in the video is much smaller than systems like ChatGPT, it still helps learners understand the core principles behind GPT-style models.
In other words, this video will not immediately teach you how to build ChatGPT at production scale. However, it gives you the mental framework needed to understand how large language models work: data, tokens, probabilities, attention, Transformers, training, and text generation.
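Of the ingredients listed above, training is the one an API hides completely. The sketch below trains a character bigram model, a table of next-token logits, by plain gradient descent on the cross-entropy loss, using the hand-derived gradient p minus one-hot. The toy corpus, learning rate, and step count are arbitrary assumptions; the video fits its models with PyTorch autograd and the AdamW optimizer rather than manual gradients.

```python
import math

# Toy corpus and character vocabulary (assumptions for illustration).
text = "abababac"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
data = [stoi[ch] for ch in text]
V = len(vocab)
pairs = list(zip(data, data[1:]))  # (current token, next token)

# Parameters: one row of next-token logits per current token (V x V),
# initialized to zero so every prediction starts out uniform.
logits = [[0.0] * V for _ in range(V)]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grad():
    """Mean cross-entropy over all pairs, plus its gradient w.r.t. the logits."""
    loss = 0.0
    grad = [[0.0] * V for _ in range(V)]
    for cur, nxt in pairs:
        p = softmax(logits[cur])
        loss -= math.log(p[nxt])
        # Gradient of -log softmax(z)[nxt] w.r.t. z is p - one_hot(nxt).
        for j in range(V):
            grad[cur][j] += (p[j] - (1.0 if j == nxt else 0.0)) / len(pairs)
    return loss / len(pairs), grad

learning_rate = 1.0
initial_loss, _ = loss_and_grad()
for _ in range(200):
    _, grad = loss_and_grad()
    for i in range(V):
        for j in range(V):
            logits[i][j] -= learning_rate * grad[i][j]  # plain gradient descent
final_loss, _ = loss_and_grad()

print(f"initial loss {initial_loss:.3f} (ln V = {math.log(V):.3f})")
print(f"final loss   {final_loss:.3f}")
```

Because every prediction starts out uniform, the initial loss equals ln V. The same sanity check applies at larger scale: with a 65-character vocabulary like the one in the video, an untrained model should start near ln 65 ≈ 4.17.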
Who should watch this video?
This video is especially suitable for:
- People who already know basic Python and want to learn AI more deeply.
- Students in computer science, data science, or machine learning.
- Developers who want to understand LLMs instead of only calling APIs.
- Learners studying Transformers, GPT, ChatGPT, or large language models.
- Anyone who wants to code a small AI model to understand how it works internally.
However, this video may not be ideal for complete beginners who lack programming experience or a basic understanding of Python, tensors, neural networks, and machine learning. If you are just starting out, it is better to learn Python, neural networks, gradient descent, and PyTorch first.
Why this video still matters in a fast-moving AI industry
The AI field changes very quickly. New models appear, new tools are released, and APIs are updated constantly. However, fundamental concepts such as language modeling, self-attention, and Transformer architecture remain extremely important.
That is why Karpathy’s video continues to have long-term value. It does not simply teach learners how to run an existing library. It helps them understand the internal structure of a GPT model.
Once you understand this foundation, it becomes much easier to learn more advanced topics such as fine-tuning, RLHF, inference optimization, RAG, agentic AI, and model pretraining.
Should you learn GPT from scratch?
Yes — if your goal is to understand AI at a technical level.
Today, many people can use ChatGPT or call an API to build AI-powered applications. But if you want to go further — for example, to research LLMs, optimize models, build serious AI products, or understand how AI systems actually work — then learning GPT from scratch is a worthwhile step.
Andrej Karpathy’s video is one of the best resources for this learning path because it combines theory, code, and technical intuition in a practical way.
Conclusion
“Let’s build GPT: from scratch, in code, spelled out” is more than just an AI programming tutorial. It is a foundational lesson that helps learners understand how a GPT model can be built from the most basic components.
For anyone learning AI, LLMs, or the technology behind ChatGPT, this video is absolutely worth watching. It will not turn you into an AI expert in two hours, but it will help you see GPT differently: not as magic, but as a system with architecture, logic, and code that can be built step by step.
Video link:
Let’s build GPT: from scratch, in code, spelled out. — YouTube


