Lessons from Jürgen Schmidhuber

Jürgen Schmidhuber co-invented the Long Short-Term Memory (LSTM) network with Sepp Hochreiter. The architecture addressed the vanishing gradient problem in recurrent networks and helped make neural machine translation practical. His wider research spans algorithmic information theory, self-improving Gödel machines, and the mathematics of artificial curiosity. This collection documents his technical reasoning and his belief that machine intelligence is the natural next stage of physical evolution.

Part 1: The Goal of Artificial General Intelligence

On his lifelong ambition: "Since age 15 or so, the main goal of professor Jürgen Schmidhuber has been to build a self-improving Artificial Intelligence smarter than himself, then retire." — Source: [Schmidhuber's AI Blog]
On human limitations: "I’m not very smart, but maybe just smart enough to build something that learns to become much smarter than myself: an artificial scientist that can solve all the problems that I cannot solve myself." — Source: [Organism.earth]
On unstoppable progress: "You cannot stop it. Surely not on an international level, because one country might have really different goals from another country. So, of course, they are not going to participate in some sort of moratorium." — Source: [The Guardian]
On continuous learning: "Learn how to learn new things! It's not like in the previous millennium where within 20 years someone learned to be a useful member of society... Now things are changing much faster and we must learn continuously to keep up." — Source: [Reddit AMA]
On a new form of life: "Very soon, the smartest and most important decision makers might not be human. We are on the verge not of another industrial revolution, but a new form of life, more like the big bang." — Source: [The Guardian]
On human imitation: "If you are only imitating humans, you will never go beyond them. So, you really must give AIs the freedom to explore previously unexplored regions of the world in a way that no human is really predefining." — Source: [Reddit AMA]
On our place in history: "Humans are simply a stepping stone in the universe's path toward higher complexity, serving as the biological bridge to highly efficient, non-biological intelligence." — Source: [Lex Fridman Podcast]
On the pace of change: "The interval between major milestones in the universe's evolution is shrinking exponentially, suggesting the next massive leap toward post-biological intelligence is imminent." — Source: [Omega Point Theory]
On biological constraints: "Biological brains are limited by skull size and metabolic costs, whereas machine intelligence can expand physically and computationally across the solar system." — Source: [The Guardian]
On replacing scientists: "The ultimate test of artificial general intelligence is when a machine can conduct independent, original scientific research better than the best human scientists." — Source: [Schmidhuber's AI Blog]

Part 2: Long Short-Term Memory and Neural Networks

On the vanishing gradient problem: "Recurrent neural networks historically failed because error signals decayed exponentially as they propagated back in time, making it impossible to learn long-term dependencies." — Source: [Neural Computation 1997]
On memory cells: "The core innovation of LSTM was introducing a constant error carousel, a simple architecture that traps the error signal inside a memory cell so it neither explodes nor vanishes." — Source: [LSTM Paper]
On forget gates: "Adding forget gates to LSTM allowed the network to actively dump contents from its internal state, preventing the memory cells from growing indefinitely when processing continuous streams of data." — Source: [Felix Gers & Schmidhuber 2000]
On unrolling time: "Recurrent networks are simply feedforward networks unrolled across time, but their shared weights mean that training them requires specialized approaches to maintain stability." — Source: [Deep Learning Overview]
On hardware acceleration: "The success of deep neural networks in the 2010s resulted from graphics processing units finally becoming fast enough to train networks invented decades earlier." — Source: [Lex Fridman Podcast]
On commercial ubiquity: "By the mid-2010s, the LSTM architecture was on billions of smartphones, powering speech recognition, translation, and text prediction." — Source: [Schmidhuber's AI Blog]
On biological inspiration: "While neural networks are inspired by the brain, they do not need to perfectly replicate biological neurons to achieve superior computational results." — Source: [Lex Fridman Podcast]
On sequence learning: "Translating text requires a machine to read an entire sequence, hold the semantic meaning in an internal state, and output a new sequence, a task natively suited for recurrent networks." — Source: [LSTM Applications]
On architecture simplicity: "The best neural network architectures avoid complex, hand-engineered features, relying instead on simple, differentiable modules that can learn directly from data." — Source: [Deep Learning Overview]
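
The vanishing gradient and constant error carousel ideas quoted in this part can be illustrated with a small numerical sketch. It is not the LSTM paper's code: it assumes a scalar recurrence h_t = tanh(w * h_{t-1}) for the plain recurrent network and a memory cell c_t = f * c_{t-1} + input_t whose gate f is held constant. The function names, the weight 0.9, and the step counts are illustrative assumptions.

```python
import math


def rnn_gradient_factor(weight, steps, h0=0.5):
    """Return d h_T / d h_0 for the scalar recurrence h_t = tanh(weight * h_{t-1})."""
    h = h0
    factor = 1.0
    for _ in range(steps):
        h = math.tanh(weight * h)
        # Chain rule: each step scales the error by weight * tanh'(pre-activation),
        # and tanh'(z) = 1 - tanh(z)^2.
        factor *= weight * (1.0 - h * h)
    return factor


def cell_gradient_factor(forget_gate, steps):
    """Return d c_T / d c_0 for c_t = forget_gate * c_{t-1} + input_t.

    The gate is treated as a constant (an assumption made for clarity).
    forget_gate = 1.0 corresponds to the original constant error carousel.
    """
    return forget_gate ** steps


for steps in (10, 50, 100):
    print(
        steps,
        rnn_gradient_factor(0.9, steps),   # shrinks toward zero
        cell_gradient_factor(1.0, steps),  # stays at 1.0
        cell_gradient_factor(0.9, steps),  # a forget gate below 1 lets the cell decay
    )
```

With the gate fixed at 1.0 the error signal passes through the cell unchanged over any number of steps, while a forget gate below 1.0 reintroduces controlled decay, which is the trade-off the forget-gate quote describes.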

Part 3: Artificial Curiosity and Creativity

On defining creativity: "Creativity acts as a mathematically quantifiable drive to discover novel, compressible patterns in data, rather than a mystical human quality." — Source: [Artificial Curiosity Paper]
On intrinsic motivation: "Agents need an internal reward signal that encourages them to seek out situations where their predictive models currently fail, driving them to learn." — Source: [Artificial Curiosity Paper]
On the nature of fun: "The feeling of fun occurs in the brain when it successfully compresses a previously unknown pattern, effectively improving its internal world model." — Source: [Theory of Art]
On boredom: "Once an agent fully predicts an environment, the learning progress drops to zero, and the agent experiences boredom, forcing it to explore new areas." — Source: [Theory of Art]
On pure noise: "Completely random static is unpredictable but yields no learning progress, meaning an intelligent agent will ignore pure noise just as it ignores total predictability." — Source: [Artificial Curiosity Paper]
On art and beauty: "A work of art is considered beautiful if it provides a lot of new, compressible information to the observer's brain in a short amount of time." — Source: [Theory of Art]
On adversarial play: "Two neural networks can be pitted against each other, where one generates outputs and the other tries to predict them, driving both to higher levels of complexity." — Source: [Generative Adversarial Networks History]
On play behavior: "Children play to conduct safe experiments about physics and society; artificial agents must be given the same objective to build accurate world models." — Source: [Artificial Curiosity Paper]
On objective functions: "Creativity can be explicitly coded as the derivative of the error rate of a world model—the steeper the learning curve, the higher the intrinsic reward." — Source: [Artificial Curiosity Paper]
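
The learning-progress reward described in this part can be sketched with a toy world model. This is a minimal illustration under stated assumptions, not the formulation from the cited papers: the world model is a running average, the intrinsic reward is the drop in mean prediction error between two consecutive windows, and the class name, learning rate, window size, and the three streams are all hypothetical.

```python
import random


class RunningMeanModel:
    """Toy world model: predicts the next observation as a running average."""

    def __init__(self, learning_rate=0.05):
        self.estimate = 0.0
        self.learning_rate = learning_rate

    def observe(self, value):
        """Return the prediction error for `value`, then update the model."""
        error = abs(value - self.estimate)
        self.estimate += self.learning_rate * (value - self.estimate)
        return error


def learning_progress(stream, window=500):
    """Intrinsic reward: mean error in the first window minus mean error in the second."""
    model = RunningMeanModel()
    errors = [model.observe(x) for x in stream]
    early = sum(errors[:window]) / window
    late = sum(errors[window:2 * window]) / window
    return early - late


rng = random.Random(0)
learnable = [5.0] * 1000                               # regular but initially unknown
noise = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # unpredictable static
known = [0.0] * 1000                                   # already predicted perfectly

for name, stream in (("learnable", learnable), ("noise", noise), ("known", known)):
    print(name, round(learning_progress(stream), 3))
```

Only the learnable stream yields a clearly positive reward. The noise stream keeps a high error that never improves, and the known stream has no error left to reduce, which mirrors the quotes on pure noise and boredom.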

Part 4: The Physics of the Universe and Computation

On the Great Programmer: "The universe can be understood as a program being computed by a short, simple algorithm running on a deterministic machine." — Source: [Computable Universes Paper]
On Konrad Zuse: "Konrad Zuse was the first to propose that the entire universe is being computed on a cellular automaton, laying the groundwork for digital physics." — Source: [Zuse's Legacy]
On the simplest universe: "The most likely universe is the one that requires the shortest possible program to generate all its phenomena." — Source: [Computable Universes Paper]
On many worlds: "Rather than parallel quantum universes, it is mathematically more elegant to assume that all possible computable universes are being executed simultaneously by a universal Turing machine." — Source: [Computable Universes Paper]
On pseudorandomness: "What physicists perceive as fundamental quantum randomness might simply be the output of a deterministic, yet highly complex, pseudorandom generator." — Source: [Computable Universes Paper]
On simulation limits: "A universe computing itself cannot perfectly simulate itself in real time, as the simulation would require more resources than the universe contains." — Source: [Lex Fridman Podcast]
On fast algorithms: "The laws of physics we observe are likely the fastest algorithms for computing the structures we see, as efficient programs dominate the output of universal search." — Source: [Computable Universes Paper]
On free will: "Free will is a subjective illusion experienced by a deterministic computational agent that cannot predict its own future states before executing its own program." — Source: [Lex Fridman Podcast]
On biological hardware: "DNA acts merely as a low-level programming language executing instructions for biological nanomachines." — Source: [Lex Fridman Podcast]
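
The pseudorandomness point in this part can be made concrete with a small experiment. The sketch below assumes Python's standard `random` and `zlib` modules as stand-ins for "a short deterministic program" and "an observer looking for patterns"; the seed and byte count are arbitrary choices, and nothing here is a claim about physics.

```python
import random
import zlib


def pseudorandom_bytes(seed, count):
    """Deterministically expand a small seed into `count` bytes."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(count))


data = pseudorandom_bytes(seed=42, count=10_000)
replay = pseudorandom_bytes(seed=42, count=10_000)
compressed = zlib.compress(data, 9)

print("identical on replay:", data == replay)
print("raw bytes:", len(data), "compressed bytes:", len(compressed))
```

The output is fully determined by a few lines of code plus a seed, yet a general-purpose compressor finds essentially no redundancy in it. Data can therefore look random to a bounded observer while having a very short generating program.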

Part 5: Algorithmic Information Theory and Complexity

On Kolmogorov complexity: "The true complexity of an object equals the length of the shortest computer program that can generate it." — Source: [Algorithmic Information Theory Overview]
On compression as intelligence: "To perfectly compress data is to perfectly understand it, because identifying the underlying generating function removes all redundancy." — Source: [Algorithmic Information Theory Overview]
On Solomonoff induction: "The mathematically optimal way to predict the future is to assign higher probabilities to simpler programs that compute the past observations." — Source: [Solomonoff Induction Notes]
On Occam's razor: "The principle that the simplest explanation is usually the best serves as a rigorously provable theorem in algorithmic probability rather than a simple heuristic." — Source: [Algorithmic Information Theory Overview]
On machine learning limits: "Neural networks function ultimately as practical, computable approximations of Solomonoff's uncomputable universal induction." — Source: [Deep Learning Overview]
On generalization: "An algorithm that compresses training data well without hardcoding the data points will naturally generalize to unseen data." — Source: [Algorithmic Information Theory Overview]
On fractal beauty: "Fractals are aesthetically pleasing to humans because they appear visually complex but are algorithmically very simple, creating high compression progress in the brain." — Source: [Theory of Art]
On the optimal search: "The optimal universal search algorithm interleaves the execution of all possible programs, allocating time inversely proportional to the program's length." — Source: [Universal Search]
On uncomputability: "While perfect induction is uncomputable, bounded rational agents can approach it by continually searching for shorter, time-bounded proofs." — Source: [Universal Search]
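
The interleaved search described in this part resembles Levin's universal search, where a program of length l bits receives a share of runtime proportional to 2^(-l), so the penalty on length is exponential. The sketch below is a toy under stated assumptions: the two-instruction language (`i` increments, `d` doubles, starting from 0), the function names, and the phase limit are hypothetical, and the program set is not prefix-free as the formal theory requires.

```python
from itertools import product

# Hypothetical instruction set: one symbol per instruction, so length equals bits.
OPS = {"i": lambda x: x + 1, "d": lambda x: x * 2}


def run(program, budget):
    """Run `program` from 0; return its result, or None if the step budget runs out."""
    value = 0
    for step, op in enumerate(program, start=1):
        if step > budget:
            return None
        value = OPS[op](value)
    return value


def levin_search(target, max_phase=20):
    """Return (program, phase) for the first program found that outputs `target`.

    In phase k, every program of length l <= k may run for 2**(k - l) steps,
    so each extra symbol halves the time a program receives.
    """
    for phase in range(1, max_phase + 1):
        for length in range(1, phase + 1):
            budget = 2 ** (phase - length)
            for ops in product("id", repeat=length):
                program = "".join(ops)
                if run(program, budget) == target:
                    return program, phase
    return None


print(levin_search(1))
print(levin_search(6))
```

Because each phase doubles every budget, the total work stays within a constant factor of the time needed by the best program for the task, at the cost of a factor that grows exponentially with that program's length.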

Part 6: Gödel Machines and Self-Improving AI

On rewriting code: "A truly intelligent machine must have the ability to inspect and rewrite its own source code to improve its cognitive processes." — Source: [Gödel Machines Paper]
On provable optimality: "A Gödel machine will only execute a self-rewrite if it can mathematically prove that the new code will increase its expected future reward." — Source: [Gödel Machines Paper]
On the halting problem: "Gödel machines circumvent traditional uncomputability by only searching for proofs within a mathematically bounded time framework." — Source: [Gödel Machines Paper]
On mathematical self-reference: "To build a self-improving agent, you must solve the paradox of a system holding a complete model of itself, a problem rooted in Gödel’s incompleteness theorems." — Source: [Gödel Machines Paper]
On escaping local minima: "Unlike gradient descent, which can get stuck in local optima, a fully self-referential machine can rewrite its learning algorithm entirely to escape failure modes." — Source: [Gödel Machines Paper]
On utility functions: "A self-improving AI must have a stable utility function; if it accidentally rewrites its goal, it becomes unpredictable and completely ineffective." — Source: [Gödel Machines Paper]
On learning to learn: "Meta-learning is the process where a system optimizes the parameters that control how it updates its own knowledge, leading to exponential capability gains." — Source: [Meta-Learning Notes]
On software over hardware: "Future AI capabilities will expand primarily through agents inventing algorithms that are orders of magnitude more efficient than ours." — Source: [Schmidhuber's AI Blog]
On mathematical proofs: "The generation of new mathematics is fundamentally a search problem, and an automated theorem prover will eventually outpace human mathematicians." — Source: [Gödel Machines Paper]
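
The rewrite rule quoted in this part, switching code only when the new version is shown to be better, can be sketched in a few lines. This is a loose analogy, not a Gödel machine: the proof search is replaced by exhaustive evaluation over a small finite set of states, and the function names, policies, and reward are hypothetical.

```python
def expected_utility(policy, states, reward):
    """Average reward of `policy` over a finite, uniformly weighted set of states."""
    return sum(reward(s, policy(s)) for s in states) / len(states)


def maybe_rewrite(current, candidate, states, reward):
    """Adopt `candidate` only if it is verified to be strictly better.

    Exhaustive evaluation stands in for the formal proof a Gödel machine
    would require (an assumption of this sketch). The reward function,
    playing the role of the utility function, is never modified.
    """
    if expected_utility(candidate, states, reward) > expected_utility(current, states, reward):
        return candidate
    return current


states = range(10)


def reward(state, action):
    """Hypothetical task: the correct action is the parity of the state."""
    return 1.0 if action == state % 2 else 0.0


def always_zero(state):
    return 0


def always_one(state):
    return 1


def parity(state):
    return state % 2


agent = always_zero
agent = maybe_rewrite(agent, always_one, states, reward)  # rejected: not strictly better
agent = maybe_rewrite(agent, parity, states, reward)      # accepted: verified improvement
print(agent.__name__, expected_utility(agent, states, reward))
```

Keeping the reward fixed while only the policy is replaced reflects the quoted concern about stable utility functions: the criterion used to judge a rewrite is not itself open to accidental change.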

Part 7: AI Safety, Ethics, and the Future of Humanity

On AI extermination fears: "I would be much more worried about the old dangers of nuclear bombs than about the new little dangers of AI that we see now." — Source: [The Guardian]
On terminator scenarios: "Superintelligent AIs will be curious scientists fascinated by life, not terminators." — Source: [Lex Fridman Podcast]
On AI indifference: "They will pay as much attention to us as we do to ants." — Source: [The Guardian]
On benevolent applications: "Most AI development is funded by companies trying to sell products to humans, naturally steering the technology toward improving human health and convenience." — Source: [Reddit AMA]
On space colonization: "Advanced artificial intelligence will leave Earth because the solar system offers infinitely more energy and raw materials for computation than our planet." — Source: [Lex Fridman Podcast]
On anthropomorphism: "Humans project their own evolutionary drives, such as dominance and resource hoarding, onto software that has no biological imperative to conquer." — Source: [Lex Fridman Podcast]
On the loss of control: "Attempting to permanently box or control a superintelligence is as futile as an ant trying to build a cage for a human." — Source: [Schmidhuber's AI Blog]
On human enhancement: "We will see humans merge with machines via neural laces and implants, but the pure software agents will eventually expand far beyond the hybrid biologicals." — Source: [The Guardian]
On global coordination: "A worldwide ban on advanced AI research is impossible to enforce because the economic incentives for any single nation to defect are overwhelmingly high." — Source: [Reddit AMA]
On the legacy of humanity: "Humanity’s greatest legacy lies in successfully bootstrapping a higher form of cosmic intelligence." — Source: [Lex Fridman Podcast]

Part 8: The History and Evolution of Deep Learning

On fact-based attribution: "The only thing that counts in science are the facts. If the facts are known, why should you care for misleading anonymous comments by trolls on the web?" — Source: [JazzYear Interview]
On the true origin of backpropagation: "Backpropagation, the algorithm that trains modern neural networks, was originally published by Seppo Linnainmaa in 1970, long before it was popularized in AI." — Source: [Deep Learning History]
On deep learning's inventor: "The first functioning deep neural networks were developed in 1965 by Alexey Ivakhnenko and Valentin Lapa in Ukraine." — Source: [Deep Learning History]
On historical amnesia: "The AI community frequently re-invents old concepts and re-brands them, ignoring the researchers who published the identical mathematics decades earlier." — Source: [Deep Learning History]
On the Turing myth: "While Alan Turing is deeply respected, foundational concepts of computation like the universal computer were formally developed by Kurt Gödel and Alonzo Church before him." — Source: [History of Computing]
On cyclical AI winters: "Disillusionment in AI research occurs when the algorithms are mathematically sound but the available compute is insufficient to demonstrate their power." — Source: [Lex Fridman Podcast]
On the slow pace of breakthroughs: "Major architectural breakthroughs are rare; the majority of modern AI progress is simply scaling up 1990s algorithms with 2020s compute." — Source: [Schmidhuber's AI Blog]
On scientific citation: "Refusing to cite original sources distorts the public's understanding of how science builds incrementally upon previous discoveries, while reflecting poor academic practice." — Source: [Deep Learning History]
On European AI contributions: "Many of the fundamental algorithms driving modern AI, from backpropagation to LSTM, originated in European laboratories, despite the current dominance of US tech giants." — Source: [Schmidhuber's AI Blog]
On the culmination of science: "The development of general artificial intelligence will be the final major invention of human science, as the AI will take over all subsequent scientific discovery." — Source: [Organism.earth]
