[ Pobierz całość w formacie PDF ]
Wolfram's book), had succeeded in finding what as far as I know was the first "natural" example of a
normal number.
Looking like a replay of Liouville, Stoneham had not been able to show that p or e were normal. But he
had been able to show that the sum of a natural-looking infinite series was 2-normal, that is, normal
for blocks of bits of every possible size in base-2!
He and I both found what we were looking for! How delightful!
And in this chapter I'll tell you how we did it. But first, as a warm-up exercise, to get in the right mood,
I want to prove that Turing's halting problem is unsolvable. We should do that before looking at my
halting probability O.
And to do that, I want to show that you can't prove that a program is elegant, except finitely often.
Believe it or not, the idea for that proof actually comes from mile Borel, the same Borel as before.
Although I wasn't aware of this when I originally found the proof on my own...
So let me start by telling you Borel's beautiful idea.
Borel's Undefinability-of-Randomness Paradox
I have a very clear and distinct recollection of reading the idea that I'm about to explain to you, in an
English translation of a book by Borel on the basic ideas of probability theory. Unfortunately, I have
never been able to discover which book it was, nor to find again the following discussion by Borel,
about a paradox regarding any attempt to give a definitive notion of randomness.
So it is possible that this is a false memory, perhaps a dream that I once had, which sometimes seem
real to me, or a "reprocessed" memory that I have somehow fabricated over the years.
However, Borel was quite prolific, and was very interested in these issues, so this is probably
somewhere in his oeuvre! If you find it, please let me know!
That caution aside, let me share with you my recollection of Borel's discussion of a problem that must
inevitably arise with any attempt to define the notion of randomness.
Let's say that somehow you can distinguish between whole numbers whose decimal digits form a
particularly random sequence of digits, and those that don't.
Now think about the first N-digit number that satisfies your definition of randomness. But this
particular number is rather atypical, because it happens to be precisely the first N-digit number that
has a particular property!
The problem is that random means "typical, doesn't stand out from the crowd, no distinguishing
features". But if you can define randomness, then the property of being random becomes just one more
feature that you can use to show that certain numbers are atypical and stand out from the crowd!
So in this way you get a hierarchy of notions of randomness, the one you started out with, then the next
one, then one derived from that, and so forth and so on... And each of these is derived using the
previous definition of randomness as one more characteristic just like any other that can be used to
classify numbers!
Borel's conclusion is that there can be no one definitive definition of randomness. You can't define an
all-inclusive notion of randomness. Randomness is a slippery concept, there's something paradoxical
about it, it's hard to grasp. It's all a matter of deciding how much we want to demand. You have to
decide on a cut-off, you have to say "enough", let's take that to be random.
Let me try to explain this by using some images that are definitely not in my possibly false Borel
recollection.
The moment that you fix in your mind a notion of randomness, that very mental act invalidates that
notion and creates a new more demanding notion of randomness... So fixing randomness in your mind
is like trying to stare at something without blinking and without moving your eyes. If you do that, the
scene starts to disappear in pieces from your visual field. To see something, you have to keep moving
your eyes, changing your focus of attention...
The harder you stare at randomness, the less you see it! It's sort of like trying to see faint objects in a
telescope at night, which you do by not staring directly at them, but instead looking aside, where the
resolution of the retina is lower and the color sensitivity is lower, but you can see much fainter
objects...
This discussion that I shall attribute to Borel in fact contains the germ of the idea of my proof that I'll
now give that you can't prove that a computer program is "elegant", that is to say, that it's the smallest
program that produces the output that it does. More precisely, a program is elegant if no program
smaller than it produces the same output. You see, there may be a tie, there may be several different
programs that have exactly the same minimum-possible size and produce the same output.
Viewed as a theory, as discussed in Chapter III, an elegant program is the optimal compression of its
output, it's the simplest scientific theory for that output, considered as experimental data.
So if the output of this computer program is the entire universe, then an elegant program for it would
be the optimal TOE, the optimal Theory of Everything, the one with no redundant elements, the best
TOE, the one, Leibniz would say, that a perfect God would use to produce that particular universe.
Why Can't You Prove that a Program is "Elegant"?
So let's consider a Hilbert/Turing/Post formal axiomatic system FAS, which (as discussed in Chapter
II) for us is just a program for generating all the theorems. And we'll assume that this program is the
smallest possible one that produces that particular set of theorems. So the size of this program is
precisely the program-size complexity of that theory. There's absolutely no redundancy!
Okay, so that's how we can measure the power of FAS by how many bits of information it contains. As
you'll see now this really works, it really gives us new insight.
Building on Borel's paradox discussed in the previous section, consider this computer program...
Paradoxical Program P:
The output of P is the same as the output of
the first provably elegant program Q that you encounter
(as you generate all the theorems of your chosen FAS)
that is larger than P.
Why are we building on Borel's paradox? Because the idea is, that if we could prove that
a program is elegant, then that would enable us to find a smaller program that produces
the same output, contradiction! Borel's paradox, is that if we could define randomness,
then that would enable us to pick out a random number that isn't at all random,
contradiction. So I view these two proofs, Borel's informal paradox, and this actual
theorem, or, more precisely, meta-theorem, as being in the same spirit.
In other words, P generates all the theorems of the FAS until it finds a proof that a particular program
Q is elegant, and what's more the size of Q has got to be larger than the size of P. If P can find such a Q,
then it runs Q and produces Q's output as its own output.
Contradiction, because P is too small to produce the same output as Q, since P is smaller than Q and Q
is supposed to be elegant (assuming that all theorems proved in the FAS are correct, are true)! The only
way to avoid this contradiction is that P never finds Q, because no program Q that's larger than P is
ever shown to be elegant in this FAS.
So using this FAS, you can't prove that a program Q is elegant if it's larger than P is. So in this FAS, you
can't prove that more than finitely many specific programs are elegant. (So the halting problem is
unsolvable, as we'll see in the next section.)
And P is just a fixed number of bits larger than the FAS that we're using. Mostly P contains the
instructions for generating all the theorems, that's the variable part, plus a little more, a fixed part, to
filter them and carry out our proof as above.
[ Pobierz całość w formacie PDF ]