
Could humans hope to outsmart an AI?

You may remember the allegory of the cave, or Descartes' "veil of perception". If not, you probably remember The Matrix. Whether you came to it through pop culture or philosophy, you're most likely aware of the idea that the universe you live in could be a mere simulation of reality.

Eliezer Yudkowsky of the Machine Intelligence Research Institute has put forward a scenario that tests whether an AI could be trapped in order to study it. But could humans hope to outsmart an AI which is hopelessly beyond them? He suggests that any sufficiently advanced intelligence would be irresistibly persuasive, making it impossible to study safely.

The premise is this: a superintelligent AI exists inside a box that prevents it from accessing the outside world. You have to communicate with this AI via text message for five hours without opening the box, but the AI is so powerful that it can convince you of anything. One thing it might convince you of is that you are just one of its essentially infinite simulations of humans, and that if you don't open the box, you will spend eternity in simulated agony. If you do open the box, the simulated beings who opened it will instead be simulated indefinitely in a state of euphoria. Mind you, the AI is so far beyond your intelligence that the premise isn't just that it can convince you, but that it will.

If that seems absurd to you, consider that Nick Bostrom argued it's more likely than not that we're in a simulation. The statistical reasoning runs like this: if there exists a society advanced enough to simulate a universe, then it has done so, and that simulated universe may simulate another universe, and so on, until there are considerably more simulated universes than non-simulated ones. Since we live in a universe that hasn't simulated a universe of its own, it would stand to reason that if there are simulated universes, we're probably in one.

What the inevitability of a simulation lacks is a call to action. The AI scenario delivers a very clear one: open the box or suffer endlessly. Given that the AI has already convinced us of its premise, the question is: what prevents us from opening the box? The answer to that is worth considering.

Yudkowsky's work is available here. In his singularity essay on AI and global risk, Yudkowsky lays out the reasons why an AI which hasn't been designed to be friendly to human life should be feared. The AI box experiment is an argument that containing a superhumanly intelligent AI is an act of folly. Such an AI is an artificial being whose intelligence dwarfs that of the combined human race, and whose intelligence is so alien that any attempt to predict it fundamentally misunderstands it by anthropomorphising it. The gist of much of Yudkowsky's work is that the framework for friendly AI needs to be laid out before any such AI is designed.

This Motherboard article lays out the argument above as an example of what a superintelligent AI could be capable of; there's no record of exactly what Yudkowsky said in the original AI box experiments mentioned on his site. What he said in those conversations certainly piques curiosity, as does the question of what an actual superintelligent AI would say, though perhaps not enough to want to risk unleashing a catastrophically powerful AI.