I've been watching my nephew learn and grow. He's six now, lives in the Middle East while I'm in California, so I only see him a few times a year. But every visit, I'm struck by how much he's grown. Not just physically, but in how he thinks, communicates, and understands the world. The accumulation is remarkable, especially considering how little kids seem to remember day-to-day.
This shouldn't work. In machine learning terms, children have limited working memory, inconsistent training data, and what looks like catastrophic forgetting. Yet they learn language, social norms, abstract concepts, and motor skills faster and more deeply than any AI system we've built.
What is it about the way children learn that we don't fully understand?
Here's what strikes me as strange: children's working memory is limited. They can't recall specific training examples the way our models can access their parameters. Yet they learn to imitate language, movement, expressions, and social norms with remarkable speed.
When I think about how transformers work, it's different. They compress vast amounts of training data into billions of parameters. They have context windows that span thousands of tokens. But they're also probabilistic rather than deterministic: outputs are sampled from a distribution, so the same input produces similar but not identical responses. Still, there's a kind of stability there. The knowledge is encoded, accessible, reliable.
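To make that concrete, here's a tiny sketch in plain NumPy. The vocabulary and the logit values are invented for illustration; the point is only that the forward pass defines a fixed distribution over next tokens, and the run-to-run variation comes from sampling it.

```python
import numpy as np

# Toy vocabulary and next-token logits for some prompt. The numbers are
# invented for illustration; a real model produces tens of thousands of logits.
vocab = ["hot", "warm", "on", "broken", "purple"]
logits = np.array([3.1, 2.4, 1.9, 0.2, -2.0])

def sample_next_token(logits, temperature=0.8, rng=None):
    """Softmax over the logits, then sample. The distribution is fixed by the
    forward pass; the variation between runs comes entirely from this draw."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return vocab[rng.choice(len(vocab), p=probs)]

# Same "input", five samples: plausible, similar, not identical.
rng = np.random.default_rng(0)
print([sample_next_token(logits, rng=rng) for _ in range(5)])
```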
But they also never really learn. Not in the way a child does.
Here's what I mean: you can have hundreds of conversations with ChatGPT about your life, your preferences, your way of thinking. In each conversation, it has whatever cheat sheet about you fits in the context window. But it doesn't really know you deeply. It hasn't learned you over time the way a friend would. Tomorrow's conversation starts fresh. The accumulated understanding doesn't persist.
A child learning "hot" from touching a stove once doesn't just memorize that instance. He builds a concept that generalizes to candles, ovens, steam, anything that gives off heat. One example becomes a principle.
What's the difference? I'm not entirely sure, but I have a hypothesis: maybe memory and learning are inversely related in some fundamental way.
I used to think learning was just sophisticated memorization. Store enough patterns, retrieve the right one at the right time, done. That's essentially how transformers work: compress vast amounts of text into parameters, then predict what comes next based on stored patterns.
But watching how children learn has made me question this. They don't store and retrieve. They adapt and generalize. When a child learns "dog," he doesn't just memorize specific instances of dogs. He builds an abstract concept that lets him recognize dogs he's never seen, in contexts he's never encountered.
Can LLMs do this? In a sense, yes. They can recognize novel dogs from their compressed understanding of "dogness" across millions of training examples. But there's a difference in how that understanding forms. The LLM needs vast data to compress into statistical patterns. The child needs a handful of examples to extract an abstract concept.
Maybe the key is in the forgetting. Maybe forgetting isn't a bug; it's a feature. It forces the brain to extract what matters and discard what doesn't, to compress experience into the fundamentals underlying a concept rather than storing every example.
LLMs do compression too, but it's a different kind. They compress millions of documents into a fixed parameter space. Children compress lived experience into concepts, relationships, and rules. The former is statistical abstraction. The latter is conceptual abstraction. They might not be the same thing.
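Here's a toy analogy for what I mean by compressing into fundamentals (an illustration, not a claim about how brains actually do it): you can either store every noisy observation, or throw the observations away and keep only the two parameters of the rule that generated them. The rule generalizes to unseen inputs; the pile of examples doesn't.

```python
import numpy as np

# Noisy observations of a simple underlying rule (y = 2x + 1, made up for illustration).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=200)
y = 2 * x + 1 + rng.normal(0, 0.5, size=200)

# "Memorization": keep every example. 200 (x, y) pairs, no abstraction.
stored_examples = list(zip(x, y))

# "Abstraction": discard the examples, keep only two parameters that
# summarize the rule and generalize beyond anything observed.
slope, intercept = np.polyfit(x, y, deg=1)

print(f"stored {len(stored_examples)} examples vs. 2 parameters")
print(f"prediction for unseen x=25: {slope * 25 + intercept:.1f}")
```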
Before going further, maybe it's worth defining what I mean by "learning."
I mean: the ability to integrate new information into your worldview in a way that changes how you understand and interact with the world. Not just adding facts to memory, but updating your mental models. Building new connections. Seeing patterns you couldn't see before.
By this definition, memorization isn't learning. Reciting a poem doesn't mean you've learned about poetry. But understanding how metaphor works, and being able to create your own, does.
True learning is generative. It lets you do things you couldn't do before, think thoughts you couldn't think before. It's fundamentally transformative.
Current AI systems are incredible at the memorization side. They can store and retrieve vast amounts of information. But the transformative, generative aspect? That's harder to see. They can combine things in novel ways, sure. But can they genuinely update their understanding based on new evidence? Not really. Not yet.
I've been reading about how different species learn. Many animals can learn associations: if I press this lever, I get food. Some can learn sequences: do A, then B, then C to achieve a goal. A few can learn through observation: watch another do it, then replicate.
Humans do all of this, but we also do something else. We learn meta-strategies. We learn how to learn. We figure out that trying different approaches works better than repeating the same failed strategy. We develop curiosity as a learning tool. We ask "why" and "what if."
This feels connected to what Daniel Kahneman talks about in "Thinking, Fast and Slow": the difference between System 1 (fast, intuitive, automatic) and System 2 (slow, deliberate, logical) thinking. Current AI is mostly System 1. It pattern-matches incredibly well. But it doesn't have the reflective, metacognitive layer that lets you step back and say, "Wait, my approach isn't working. Let me try thinking about this differently."
I wonder if this is related to the memory question. Maybe true learning requires being able to forget details while retaining structures. And maybe that's only possible when you have limited memory that forces you to be selective about what you keep.
Here's what I find frustrating about current architectures: they're limited by their strengths. ChatGPT, Claude, Gemini are incredibly capable because they've compressed huge amounts of knowledge into their parameters. But that's also why they can't learn new things after training.
They know what they know. You can give them new information in context, and they'll use it. But they don't integrate it into their understanding the way humans do. Tomorrow, when you start a new conversation, they've "forgotten" everything you told them yesterday (unless it's explicitly added to context).
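Here's a rough sketch of what I mean. The `chat()` function below is a stand-in, not any particular vendor's API: everything the model "knows" about you lives in the message list you send, and a fresh conversation starts with an empty list.

```python
# Hypothetical chat interface; a stand-in, not any real vendor's API.
def chat(messages: list[dict]) -> str:
    """Pretend this calls a hosted LLM. It sees only what's in `messages`."""
    seen = " | ".join(m["content"] for m in messages)
    return f"(a reply conditioned only on: {seen})"

# Monday: the model "knows" my preference only because it's in the context.
monday = [
    {"role": "user", "content": "I prefer terse answers. What's a transformer?"},
]
print(chat(monday))

# Tuesday: a new conversation starts with a fresh history. Unless I re-send
# Monday's messages (or a summary of them), the preference is simply gone;
# nothing was integrated into the model itself.
tuesday = [
    {"role": "user", "content": "What's attention?"},
]
print(chat(tuesday))
```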
This is the opposite of the child problem. Children have poor short-term memory but excellent long-term learning. LLMs have vast compressed knowledge but no individual learning. Is there a middle ground? Or is there a fundamental trade-off we haven't figured out how to navigate?
Some researchers are working on continual learning, trying to build systems that can update their knowledge without catastrophic forgetting. Recent work is promising. Methods like C-Flat (2024) create flatter loss landscapes that make models more stable during continual learning. VERSE (Banerjee et al., 2024) processes each training example only once while preserving past knowledge through virtual gradients. There's even research on corticohippocampal-inspired hybrid neural networks (Nature Communications, 2025) that emulate dual representations similar to how the brain separates short-term and long-term memory.
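To get a feel for the shape of the problem, here's a heavily simplified rehearsal-style sketch in PyTorch: when training on a new task, mix in replayed examples from old tasks so the update doesn't simply overwrite them. This is generic experience replay, not C-Flat or VERSE; real continual-learning methods are considerably more sophisticated, and the model, shapes, and data here are toy stand-ins.

```python
import random
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

replay_buffer: list[tuple[torch.Tensor, torch.Tensor]] = []  # examples from earlier tasks

def train_on_new_task(new_data, replay_ratio=0.5, buffer_cap=1000):
    """Naive rehearsal: interleave new examples with replayed old ones so
    learning the new task doesn't simply erase the old."""
    for x, y in new_data:
        batch = [(x, y)]
        # Mix in stored examples from previous tasks, if we have any.
        if replay_buffer:
            k = max(1, int(replay_ratio * len(batch)))
            batch += random.sample(replay_buffer, k=min(k, len(replay_buffer)))
        xs = torch.stack([b[0] for b in batch])
        ys = torch.stack([b[1] for b in batch])
        loss = loss_fn(model(xs), ys)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Remember some of the new task for future rehearsal.
        if len(replay_buffer) < buffer_cap:
            replay_buffer.append((x, y))

# Toy usage: two "tasks" of random data (purely illustrative).
task_a = [(torch.randn(10), torch.tensor(0)) for _ in range(50)]
task_b = [(torch.randn(10), torch.tensor(1)) for _ in range(50)]
train_on_new_task(task_a)
train_on_new_task(task_b)  # rehearsal mixes in examples remembered from task A
```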
But from what I've seen, it's still brittle. The models either forget old things when learning new ones, or they become increasingly rigid and resist updates. Recent surveys on continual learning in the era of foundation models (2025) suggest we're making progress, but we haven't solved the fundamental problem.
I don't think we've found the right formulation yet. We're trying to bolt learning onto architectures designed for memorization. Maybe we need a fundamentally different approach.
What would it mean to build an AI system that truly learns? Not just updates parameters or expands context, but actually evolves its understanding over time the way humans do?
I can imagine a few possibilities, though I'm uncertain about any of them:
Hybrid architectures: Separate systems for long-term knowledge (transformer-like, stable) and short-term adaptation (something else, dynamic). The stable component provides foundational understanding. The adaptive component learns from recent experience and gradually influences the stable component through some kind of consolidation process. Similar to how human memory works, with working memory, short-term memory, and long-term memory as distinct systems. (A toy sketch of this idea, combined with the forgetting idea below, follows these possibilities.)
Embodied learning: Maybe the key is that children learn through interaction with a physical world that has consistent rules. They get immediate feedback. They can run experiments. Current LLMs learn from static text, which is just descriptions of the world, not the world itself. Perhaps true learning requires grounding in consistent, physical reality.
Meta-learning architectures: Systems that don't just learn patterns in data, but learn strategies for learning. They'd need some way to evaluate their own learning process and adapt it. This feels closer to the human metacognitive ability, but I have no idea how to implement it.
Embracing forgetting: What if instead of trying to prevent catastrophic forgetting, we designed systems that strategically forget? Keep only compressed abstractions, discard specifics. Force the system to build hierarchical representations because it literally can't store everything. This is hand-wavy, but the intuition is that forgetting creates pressure to extract principles.
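To make the first and last of these possibilities slightly more concrete, here's a toy sketch in plain Python. Everything about it is invented for illustration: a fast store that holds raw recent experiences, and a slow store that only ever receives compressed abstractions while the specifics are deliberately discarded.

```python
from collections import Counter

class HybridMemory:
    """Toy hybrid memory: the fast store keeps raw recent experiences; the
    slow store keeps only compressed abstractions, and consolidation forgets
    the specifics. Purely illustrative, not a proposal for a real system."""

    def __init__(self, fast_capacity=5):
        self.fast = []          # raw recent episodes
        self.slow = Counter()   # compressed "concepts" and their weights
        self.fast_capacity = fast_capacity

    def experience(self, episode: str, concepts: list[str]):
        self.fast.append((episode, concepts))
        if len(self.fast) >= self.fast_capacity:
            self.consolidate()

    def consolidate(self):
        # Keep only the abstractions; strategically forget the episodes.
        for _, concepts in self.fast:
            self.slow.update(concepts)
        self.fast.clear()

mem = HybridMemory()
mem.experience("touched the stove, it hurt", ["hot things burn"])
mem.experience("steam from the kettle stung my hand", ["hot things burn"])
mem.experience("the candle flame felt dangerous", ["hot things burn", "fire is hot"])
mem.experience("ice cube was cold", ["cold things chill"])
mem.experience("oven door was warm", ["hot things burn"])
print(mem.fast)   # []: the specific episodes are gone after consolidation
print(mem.slow)   # what remains is the compressed, general lesson
```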
None of these feel quite right to me. They're educated guesses, not solutions. I suspect the answer involves something we haven't thought of yet.
Here's a question that makes me uncertain about my entire framing: don't we already have systems that can continuously learn, evolve, and grow?
Deployed LLMs do get updated. ChatGPT today isn't the same as ChatGPT at launch. The models are retrained on new data, fine-tuned based on user feedback, improved through RLHF. Isn't that learning?
Maybe. But it feels different from what I mean. It's learning at the species level, not the individual level. ChatGPT as a product evolves, but my particular instance of ChatGPT doesn't learn from my conversations. It's more like evolution than learning: new generations incorporate adaptations, but individuals stay fixed.
Human learning is individual and continuous. I learn from every conversation, every experience, every mistake. The learning happens in real-time, not through population-level updates.
Is this distinction meaningful? I think so, but I'm not entirely sure why. There's something about individual, continuous adaptation that feels essential to what I mean by "learning," even if I can't precisely articulate what that something is.
This raises a broader question about what we're actually building.
If individual learning is what separates biological intelligence from our current AI systems, then we exist in a strange moment. We have systems that can pass many tests of intelligence. They can write, reason, code, and converse. But they can't grow from those experiences. Each interaction is isolated, forgotten, lost.
We exist in a state of not-knowing. We're surrounded by mystery, complexity, and uncertainty. Our response to this is to learn, to grow, to evolve our understanding.
If you knew everything, would you need to learn? The question feels almost paradoxical. You could imagine knowing everything in principle, but it would mean living in a static, fully understood universe. Nothing would surprise you. Nothing would require adaptation.
That universe doesn't match our reality. The world is dynamic, complex, and bigger than any individual's understanding. Learning isn't a nice-to-have capability. It's the fundamental response to living in a universe you don't fully comprehend.
This might be what makes humans special. Not that we're smarter than other species (though we are, by most measures), but that we have this profound capacity to learn, to change, to grow in response to the unknown. We can fundamentally alter our understanding, update our beliefs, and evolve our capabilities in ways that go beyond instinct or conditioning.
Current AI systems don't have this. They're frozen snapshots of knowledge. Incredibly useful snapshots, but snapshots nonetheless. They can help us learn, but they can't learn alongside us. Not yet.
I try to imagine what an AI system that truly learns would look like. Not just incremental improvements to current architectures, but something fundamentally different.
It would start knowing very little. Unlike current LLMs, which emerge from training with vast knowledge, this system would begin almost blank. But it would learn quickly from experience, building understanding through interaction.
You could teach it something new, and it would integrate that knowledge into its worldview. Not just add it to context, but actually update its understanding. Tomorrow, it would remember what you taught it yesterday and build on it.
It would make mistakes, notice them, and correct itself. Not through retraining, but through reflection and adaptation in real-time.
It would develop genuine expertise in specific domains through deep engagement, rather than shallow knowledge across everything.
Most intriguingly, it would keep getting better over time. Not because humans updated it, but because it genuinely learned from its experiences.
Is this possible? I don't know. It requires solving problems we don't fully understand: continual learning without catastrophic forgetting, online adaptation without instability, knowledge integration without loss of capabilities, metacognitive awareness of the learning process itself.
These might be tractable engineering challenges. Or they might require fundamental breakthroughs in how we think about intelligence and learning.
I started this piece thinking about children and ended up questioning the entire foundation of how we build AI systems. I'm left with more questions than answers:
Is the lack of true learning in LLMs an architectural limitation or a deeper conceptual problem? Can we patch continual learning onto transformers, or do we need entirely new paradigms?
Why exactly does forgetting seem important for learning? Is it just about computational efficiency, or is there something deeper about being forced to abstract and generalize?
Do we actually want AI systems that continuously learn from interactions? The safety implications are terrifying. A system that updates its values and understanding based on experience could drift in unpredictable directions.
What would it mean for an AI to "understand" that it's learning, the way humans have metacognitive awareness of our own learning process? Is self-awareness of learning a prerequisite for effective learning, or just a side effect?
And the question I can't shake: have we been thinking about AI learning in fundamentally the wrong way because we've focused on memory and recall when we should have been focusing on abstraction and generalization?
I don't have answers. But I think these are the right questions to be asking. Because if we can't build systems that truly learn, that genuinely evolve and grow from experience, then we're stuck with increasingly sophisticated but fundamentally static pattern-matchers. Useful, yes. But not intelligent in the way humans are intelligent.
The children learning around us every day embody something profound that we've failed to capture in our models. Until we figure out what that is, I'm not sure we're building toward intelligence. We're building toward something else. Something impressive, certainly. But perhaps not quite what we think.