I've been watching my nephew learn and grow. He's six now, lives in the Middle East while I'm in California, so I only see him a few times a year. But every visit, I'm struck by how much he's grown. Not just physically, but in how he thinks, communicates, and understands the world. The accumulation is remarkable, especially considering how little kids seem to remember day-to-day.
This shouldn't work. In machine learning terms, children have limited working memory, inconsistent training data, and what looks like catastrophic forgetting. Yet they learn language, social norms, abstract concepts, and motor skills faster and more deeply than any AI system we've built.
What is it about the way children learn that we don't fully understand?
Here's what strikes me as strange: children's working memory is limited. They can't recall specific training examples the way our models can access their parameters. Yet they learn to imitate language, movement, expressions, and social norms with remarkable speed.
When I think about how transformers work, it's different. They compress vast amounts of training data into billions of parameters. They have context windows that span thousands of tokens. But they're also probabilistic rather than deterministic: outputs are sampled from a distribution, so the same input produces similar but not identical responses. Still, there's a kind of stability there. The knowledge is encoded, accessible, reliable.
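To make that concrete, here's a tiny sketch in plain NumPy. The vocabulary and the logit values are invented for illustration; the point is only that the forward pass defines a fixed distribution over next tokens, and the run-to-run variation comes from sampling it.

```python
import numpy as np

# Toy vocabulary and next-token logits for some prompt. The numbers are
# invented for illustration; a real model produces tens of thousands of logits.
vocab = ["hot", "warm", "on", "broken", "purple"]
logits = np.array([3.1, 2.4, 1.9, 0.2, -2.0])

def sample_next_token(logits, temperature=0.8, rng=None):
    """Softmax over the logits, then sample. The distribution is fixed by the
    forward pass; the variation between runs comes entirely from this draw."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return vocab[rng.choice(len(vocab), p=probs)]

# Same "input", five samples: plausible, similar, not identical.
rng = np.random.default_rng(0)
print([sample_next_token(logits, rng=rng) for _ in range(5)])
```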
But they also never really learn. Not in the way a child does.
Here's what I mean: you can have hundreds of conversations with ChatGPT about your life, your preferences, your way of thinking. In each conversation, it has whatever cheat sheet about you fits in the context window. But it doesn't really know you deeply. It hasn't learned you over time the way a friend would. Tomorrow's conversation starts fresh. The accumulated understanding doesn't persist.
A child learning "hot" from touching a stove once doesn't just memorize that instance. He builds a concept that generalizes to candles, ovens, steam, anything that gives off heat. One example becomes a principle.
What's the difference? I'm not entirely sure, but I have a hypothesis: maybe memory and learning are inversely related in some fundamental way.
I used to think learning was just sophisticated memorization. Store enough patterns, retrieve the right one at the right time, done. That's essentially how transformers work: compress vast amounts of text into parameters, then predict what comes next based on stored patterns.
But watching how children learn has made me question this. They don't store and retrieve. They adapt and generalize. When a child learns "dog," he doesn't just memorize specific instances of dogs. He builds an abstract concept that lets him recognize dogs he's never seen, in contexts he's never encountered.
Can LLMs do this? In a sense, yes. They can recognize novel dogs from their compressed understanding of "dogness" across millions of training examples. But there's a difference in how that understanding forms. The LLM needs vast data to compress into statistical patterns. The child needs a handful of examples to extract an abstract concept.
Maybe the key is in the forgetting. Maybe forgetting isn't a bug; it's a feature. It forces the brain to extract what matters and discard what doesn't, to compress experience into the fundamentals underlying a concept rather than storing every example.
LLMs do compression too, but it's a different kind. They compress millions of documents into a fixed parameter space. Children compress lived experience into concepts, relationships, and rules. The former is statistical abstraction. The latter is conceptual abstraction. They might not be the same thing.
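Here's a toy analogy for what I mean by compressing into fundamentals (an illustration, not a claim about how brains actually do it): you can either store every noisy observation, or throw the observations away and keep only the two parameters of the rule that generated them. The rule generalizes to unseen inputs; the pile of examples doesn't.

```python
import numpy as np

# Noisy observations of a simple underlying rule (y = 2x + 1, made up for illustration).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=200)
y = 2 * x + 1 + rng.normal(0, 0.5, size=200)

# "Memorization": keep every example. 200 (x, y) pairs, no abstraction.
stored_examples = list(zip(x, y))

# "Abstraction": discard the examples, keep only two parameters that
# summarize the rule and generalize beyond anything observed.
slope, intercept = np.polyfit(x, y, deg=1)

print(f"stored {len(stored_examples)} examples vs. 2 parameters")
print(f"prediction for unseen x=25: {slope * 25 + intercept:.1f}")
```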
Before going further, maybe it's worth defining what I mean by "learning."
I mean: the ability to integrate new information into your worldview in a way that changes how you understand and interact with the world. Not just adding facts to memory, but updating your mental models. Building new connections. Seeing patterns you couldn't see before.
By this definition, memorization isn't learning. Reciting a poem doesn't mean you've learned about poetry. But understanding how metaphor works, and being able to create your own, does.
True learning is generative. It lets you do things you couldn't do before, think thoughts you couldn't think before. It's fundamentally transformative.
Current AI systems are incredible at the memorization side. They can store and retrieve vast amounts of information. But the transformative, generative aspect? That's harder to see. They can combine things in novel ways, sure. But can they genuinely update their understanding based on new evidence? Not really. Not yet.
I've been reading about how different species learn. Many animals can learn associations: if I press this lever, I get food. Some can learn sequences: do A, then B, then C to achieve a goal. A few can learn through observation: watch another do it, then replicate.
Humans do all of this, but we also do something else. We learn meta-strategies. We learn how to learn. We figure out that trying different approaches works better than repeating the same failed strategy. We develop curiosity as a learning tool. We ask "why" and "what if."
This feels connected to what Daniel Kahneman talks about in "Thinking, Fast and Slow": the difference between System 1 (fast, intuitive, automatic) and System 2 (slow, deliberate, logical) thinking. Current AI is mostly System 1. It pattern-matches incredibly well. But it doesn't have the reflective, metacognitive layer that lets you step back and say, "Wait, my approach isn't working. Let me try thinking about this differently."
I wonder if this is related to the memory question. Maybe true learning requires being able to forget details while retaining structures. And maybe that's only possible when you have limited memory that forces you to be selective about what you keep.
Here's what I find frustrating about current architectures: they're limited by their strengths. ChatGPT, Claude, Gemini are incredibly capable because they've compressed huge amounts of knowledge into their parameters. But that's also why they can't learn new things after training.
They know what they know. You can give them new information in context, and they'll use it. But they don't integrate it into their understanding the way humans do. Tomorrow, when you start a new conversation, they've "forgotten" everything you told them yesterday (unless it's explicitly added to context).
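Here's a rough sketch of what I mean. The `chat()` function below is a stand-in, not any particular vendor's API: everything the model "knows" about you lives in the message list you send, and a fresh conversation starts with an empty list.

```python
# Hypothetical chat interface; a stand-in, not any real vendor's API.
def chat(messages: list[dict]) -> str:
    """Pretend this calls a hosted LLM. It sees only what's in `messages`."""
    seen = " | ".join(m["content"] for m in messages)
    return f"(a reply conditioned only on: {seen})"

# Monday: the model "knows" my preference only because it's in the context.
monday = [
    {"role": "user", "content": "I prefer terse answers. What's a transformer?"},
]
print(chat(monday))

# Tuesday: a new conversation starts with a fresh history. Unless I re-send
# Monday's messages (or a summary of them), the preference is simply gone;
# nothing was integrated into the model itself.
tuesday = [
    {"role": "user", "content": "What's attention?"},
]
print(chat(tuesday))
```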
This is the opposite of the child problem. Children have poor short-term memory but excellent long-term learning. LLMs have vast compressed knowledge but no individual learning. Is there a middle ground? Or is there a fundamental trade-off we haven't figured out how to navigate?
Some researchers are working on continual learning, trying to build systems that can update their knowledge without catastrophic forgetting. Recent work is promising. Methods like C-Flat (2024) create flatter loss landscapes that make models more stable during continual learning. VERSE (Banerjee et al., 2024) processes each training example only once while preserving past knowledge through virtual gradients. There's even research on corticohippocampal-inspired hybrid neural networks (Nature Communications, 2025) that emulate dual representations similar to how the brain separates short-term and long-term memory.
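To get a feel for the shape of the problem, here's a heavily simplified rehearsal-style sketch in PyTorch: when training on a new task, mix in replayed examples from old tasks so the update doesn't simply overwrite them. This is generic experience replay, not C-Flat or VERSE; real continual-learning methods are considerably more sophisticated, and the model, shapes, and data here are toy stand-ins.

```python
import random
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

replay_buffer: list[tuple[torch.Tensor, torch.Tensor]] = []  # examples from earlier tasks

def train_on_new_task(new_data, replay_ratio=0.5, buffer_cap=1000):
    """Naive rehearsal: interleave new examples with replayed old ones so
    learning the new task doesn't simply erase the old."""
    for x, y in new_data:
        batch = [(x, y)]
        # Mix in stored examples from previous tasks, if we have any.
        if replay_buffer:
            k = max(1, int(replay_ratio * len(batch)))
            batch += random.sample(replay_buffer, k=min(k, len(replay_buffer)))
        xs = torch.stack([b[0] for b in batch])
        ys = torch.stack([b[1] for b in batch])
        loss = loss_fn(model(xs), ys)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Remember some of the new task for future rehearsal.
        if len(replay_buffer) < buffer_cap:
            replay_buffer.append((x, y))

# Toy usage: two "tasks" of random data (purely illustrative).
task_a = [(torch.randn(10), torch.tensor(0)) for _ in range(50)]
task_b = [(torch.randn(10), torch.tensor(1)) for _ in range(50)]
train_on_new_task(task_a)
train_on_new_task(task_b)  # rehearsal mixes in examples remembered from task A
```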
But from what I've seen, it's still brittle. The models either forget old things when learning new ones, or they become increasingly rigid and resist updates. Recent surveys on continual learning in the era of foundation models (2025) suggest we're making progress, but we haven't solved the fundamental problem.
I don't think we've found the right formulation yet. We're trying to bolt learning onto architectures designed for memorization. Maybe we need a fundamentally different approach.
What would it mean to build an AI system that truly learns? Not just updates parameters or expands context, but actually evolves its understanding over time the way humans do?
I can imagine a few possibilities, though I'm uncertain about any of them:
Hybrid architectures: Separate systems for long-term knowledge (transformer-like, stable) and short-term adaptation (something else, dynamic). The stable component provides foundational understanding. The adaptive component learns from recent experience and gradually influences the stable component through some kind of consolidation process. Similar to how human memory works, with working memory, short-term memory, and long-term memory as distinct systems. (A toy sketch of this idea, combined with the forgetting idea below, follows these possibilities.)
Embodied learning: Maybe the key is that children learn through interaction with a physical world that has consistent rules. They get immediate feedback. They can run experiments. Current LLMs learn from static text, which is just descriptions of the world, not the world itself. Perhaps true learning requires grounding in consistent, physical reality.
Meta-learning architectures: Systems that don't just learn patterns in data, but learn strategies for learning. They'd need some way to evaluate their own learning process and adapt it. This feels closer to the human metacognitive ability, but I have no idea how to implement it.
Embracing forgetting: What if instead of trying to prevent catastrophic forgetting, we designed systems that strategically forget? Keep only compressed abstractions, discard specifics. Force the system to build hierarchical representations because it literally can't store everything. This is hand-wavy, but the intuition is that forgetting creates pressure to extract principles.
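To make the first and last of these possibilities slightly more concrete, here's a toy sketch in plain Python. Everything about it is invented for illustration: a fast store that holds raw recent experiences, and a slow store that only ever receives compressed abstractions while the specifics are deliberately discarded.

```python
from collections import Counter

class HybridMemory:
    """Toy hybrid memory: the fast store keeps raw recent experiences; the
    slow store keeps only compressed abstractions, and consolidation forgets
    the specifics. Purely illustrative, not a proposal for a real system."""

    def __init__(self, fast_capacity=5):
        self.fast = []          # raw recent episodes
        self.slow = Counter()   # compressed "concepts" and their weights
        self.fast_capacity = fast_capacity

    def experience(self, episode: str, concepts: list[str]):
        self.fast.append((episode, concepts))
        if len(self.fast) >= self.fast_capacity:
            self.consolidate()

    def consolidate(self):
        # Keep only the abstractions; strategically forget the episodes.
        for _, concepts in self.fast:
            self.slow.update(concepts)
        self.fast.clear()

mem = HybridMemory()
mem.experience("touched the stove, it hurt", ["hot things burn"])
mem.experience("steam from the kettle stung my hand", ["hot things burn"])
mem.experience("the candle flame felt dangerous", ["hot things burn", "fire is hot"])
mem.experience("ice cube was cold", ["cold things chill"])
mem.experience("oven door was warm", ["hot things burn"])
print(mem.fast)   # []: the specific episodes are gone after consolidation
print(mem.slow)   # what remains is the compressed, general lesson
```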
None of these feel quite right to me. They're educated guesses, not solutions. I suspect the answer involves something we haven't thought of yet.
Here's a question that makes me uncertain about my entire framing: don't we already have systems that can continuously learn, evolve, and grow?
Deployed LLMs do get updated. ChatGPT today isn't the same as ChatGPT at launch. The models are retrained on new data, fine-tuned based on user feedback, improved through RLHF. Isn't that learning?
Maybe. But it feels different from what I mean. It's learning at the species level, not the individual level. ChatGPT as a product evolves, but my particular instance of ChatGPT doesn't learn from my conversations. It's more like evolution than learning: new generations incorporate adaptations, but individuals stay fixed.
Human learning is individual and continuous. I learn from every conversation, every experience, every mistake. The learning happens in real-time, not through population-level updates.
Is this distinction meaningful? I think so, but I'm not entirely sure why. There's something about individual, continuous adaptation that feels essential to what I mean by "learning," even if I can't precisely articulate what that something is.
This raises a broader question about what we're actually building.
If individual learning is what separates biological intelligence from our current AI systems, then we exist in a strange moment. We have systems that can pass many tests of intelligence. They can write, reason, code, and converse. But they can't grow from those experiences. Each interaction is isolated, forgotten, lost.
We exist in a state of not-knowing. We're surrounded by mystery, complexity, and uncertainty. Our response to this is to learn, to grow, to evolve our understanding.
If you knew everything, would you need to learn? The question feels almost paradoxical. You could imagine knowing everything in principle, but it would mean living in a static, fully understood universe. Nothing would surprise you. Nothing would require adaptation.
That universe doesn't match our reality. The world is dynamic, complex, and bigger than any individual's understanding. Learning isn't a nice-to-have capability. It's the fundamental response to living in a universe you don't fully comprehend.
This might be what makes humans special. Not that we're smarter than other species (though we are, by most measures), but that we have this profound capacity to learn, to change, to grow in response to the unknown. We can fundamentally alter our understanding, update our beliefs, and evolve our capabilities in ways that go beyond instinct or conditioning.
Current AI systems don't have this. They're frozen snapshots of knowledge. Incredibly useful snapshots, but snapshots nonetheless. They can help us learn, but they can't learn alongside us. Not yet.
I try to imagine what an AI system that truly learns would look like. Not just incremental improvements to current architectures, but something fundamentally different.
It would start knowing very little. Unlike current LLMs, which emerge from training with vast knowledge, this system would begin almost blank. But it would learn quickly from experience, building understanding through interaction.
You could teach it something new, and it would integrate that knowledge into its worldview. Not just add it to context, but actually update its understanding. Tomorrow, it would remember what you taught it yesterday and build on it.
It would make mistakes, notice them, and correct itself. Not through retraining, but through reflection and adaptation in real-time.
It would develop genuine expertise in specific domains through deep engagement, rather than shallow knowledge across everything.
Most intriguingly, it would keep getting better over time. Not because humans updated it, but because it genuinely learned from its experiences.
Is this possible? I don't know. It requires solving problems we don't fully understand: continual learning without catastrophic forgetting, online adaptation without instability, knowledge integration without loss of capabilities, metacognitive awareness of the learning process itself.
These might be tractable engineering challenges. Or they might require fundamental breakthroughs in how we think about intelligence and learning.
I started this piece thinking about children and ended up questioning the entire foundation of how we build AI systems. I'm left with more questions than answers:
Is the lack of true learning in LLMs an architectural limitation or a deeper conceptual problem? Can we patch continual learning onto transformers, or do we need entirely new paradigms?
Why exactly does forgetting seem important for learning? Is it just about computational efficiency, or is there something deeper about being forced to abstract and generalize?
Do we actually want AI systems that continuously learn from interactions? The safety implications are terrifying. A system that updates its values and understanding based on experience could drift in unpredictable directions.
What would it mean for an AI to "understand" that it's learning, the way humans have metacognitive awareness of our own learning process? Is self-awareness of learning a prerequisite for effective learning, or just a side effect?
And the question I can't shake: have we been thinking about AI learning in fundamentally the wrong way because we've focused on memory and recall when we should have been focusing on abstraction and generalization?
I don't have answers. But I think these are the right questions to be asking. Because if we can't build systems that truly learn, that genuinely evolve and grow from experience, then we're stuck with increasingly sophisticated but fundamentally static pattern-matchers. Useful, yes. But not intelligent in the way humans are intelligent.
The children learning around us every day embody something profound that we've failed to capture in our models. Until we figure out what that is, I'm not sure we're building toward intelligence. We're building toward something else. Something impressive, certainly. But perhaps not quite what we think.