As kids we used to play that game: “Infinity!” “Infinity plus one!” “…Infinity plus infinity!” Then we grew up and people told us that game was silly, that there’s only one infinity and you can’t go any higher. Finally, we went to math grad school and learned that the former people didn’t know what they were talking about. In yet another case of children outsmarting adults, we were right all along, and the hierarchy of infinitudes is as beautiful as it is vast. It is filled with structure and mystery and even now many open questions remain.

## Omega: The First Infinity

Mathematicians write ω (“omega”) for the smallest infinity. When they’re trying to impress you with how smart they are, they might also write ℵ₀ (“aleph-null”), but when you actually work with this thing, it gets really tiring writing that symbol over and over. This infinity is the size of the counting numbers. If you were to start counting: 1, 2, 3, 4, … and somehow you transcended time and counted into infinity, the first thing after all the finite numbers would be ω.

Here is how the set theorist might define ω. We start with the empty set, ∅, and call this “zero”. Then we union it with the singleton containing it, getting ∅ ∪ {∅} = {∅}, and call that “one”. Repeat the operation to get {∅,{∅}} and call it “two”, and so on. This process lets us define the individual finite numbers. To go beyond the finite, we assume there exists a set I with the following two properties: first of all, I contains ∅; second of all, whenever I contains an element x, then I also contains x ∪ {x}. This assumption is called the Axiom of Infinity. The smallest such I is ω. It is the set of natural numbers.
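The construction above is concrete enough to run on a computer. Here is a minimal sketch (my own illustration, not part of the original article) of the von Neumann encoding, where each natural number is the set of all smaller naturals:

```python
# Sketch of the von Neumann construction: each natural number n is
# modeled as the set of all smaller naturals, built by repeatedly
# applying the successor operation x -> x ∪ {x}.
def von_neumann(n):
    """Return the von Neumann encoding of n as a frozenset."""
    x = frozenset()              # zero is the empty set ∅
    for _ in range(n):
        x = x | frozenset({x})   # successor: x union {x}
    return x

zero = von_neumann(0)   # frozenset()  i.e. ∅
one = von_neumann(1)    # {∅}
two = von_neumann(2)    # {∅, {∅}}
```

Two pleasant facts fall out of this encoding: the number n has exactly n elements, and m < n exactly when the set for m is an element of the set for n.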

## Computable ordinals and ω_{1}^{ck}

The first infinity, ω, which is the set of natural numbers, comes with an obvious *ordering*: 0 is less than 1, which is less than 2, and so on. This is a well-ordering: every set of naturals has a smallest element with respect to this order, and there’s a natural way of drawing the naturals on a number line (the order on ω is *linear*).

There are other ways to order the set of naturals. As an example, we can declare that 1 is less than 2, 2 is less than 3, and so on, but as for 0, we promote 0 to the very top, and declare that every other counting number is smaller than 0. This re-shuffling of the naturals is still a well-order. We say that it has *order type* ω+1, because if we ignore the names of the elements and look at the ordering from a bird’s-eye view, it looks like a copy (1, 2, 3, 4, …) of ω, followed by **1** new thing (0) at the top. This order type ω+1 is the second-smallest infinite *ordinal number* (the smallest is ω itself). This is what children mean when they say “infinity plus one”.

Again, we can take 0 and 1, promote them above all the other naturals, and get a new order on the naturals, which looks like (2,3,4,…,0,1). This looks like a shifted copy of ω, followed by exactly two new things on top. Its order type: ω+2. Similarly we can get ω+3, ω+4, and so on.

To get to the next level, take just the odd numbers and give them the usual ordering, and take just the even numbers and give them the usual ordering, but stipulate that every odd number is lower than every even number. Then we end up with a shuffling of the naturals which looks like (1,3,5,…,2,4,6,…). It’s a copy of ω followed by another copy of ω, and thus its order type is ω+ω. “Infinity plus infinity!”

But enough specific examples. Let’s go a bit more abstract. The above shufflings of the counting numbers can be programmed into a computer. For example, to “program” ω+1, we use the following algorithm:

- Function Compare(natural number x, natural number y)
- If y = 0, then output “x < y” and halt.
- If x = 0, then output “y < x” and halt.
- Otherwise, if x < y, output “x < y”; else output “y < x”.

To “program” ω+ω, use this algorithm:

- Function Compare(natural number x, natural number y)
- If x is odd and y is even, then output “x < y” and halt.
- If x is even and y is odd, then output “y < x” and halt.
- Otherwise, x and y have the same parity. If x < y, output “x < y”; else output “y < x”.
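The two comparison routines above can be sketched as actual Python functions (hypothetical names, my own rendering; I’ve also added an explicit equality case the pseudocode glosses over):

```python
# Hypothetical Python sketches of the two comparators above.

def compare_omega_plus_1(x, y):
    """Order of type ω+1: usual order on 1,2,3,…, with 0 promoted to the top."""
    if x == y:
        return "x = y"
    if y == 0:                       # everything else sits below 0
        return "x < y"
    if x == 0:
        return "y < x"
    return "x < y" if x < y else "y < x"

def compare_omega_plus_omega(x, y):
    """Order of type ω+ω: odds in usual order, then evens in usual order."""
    if x == y:
        return "x = y"
    if x % 2 == 1 and y % 2 == 0:    # every odd is below every even
        return "x < y"
    if x % 2 == 0 and y % 2 == 1:
        return "y < x"
    return "x < y" if x < y else "y < x"
```

For example, `compare_omega_plus_1(5, 0)` reports “x < y”, because 0 has been promoted above everything, and `compare_omega_plus_omega(7, 2)` reports “x < y”, because every odd number sits below every even number.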

To generalize, suppose you have *any* computer program at all for comparing naturals. It determines an order on the naturals, and if that order is a well-ordering (it is linear, and every nonempty set of naturals has a least element under it), then you’ve got yourself what’s called a *recursive ordinal*.

Imagine ourselves getting creative with the game we played back then. I don’t just stop at “infinity plus infinity”, I up the ante and say: “infinity times infinity, times infinity, times infinity, and so on infinitely many times”. That’s a recursive ordinal (ω^{ω} to be precise). Then you come back: “infinity to the infinity power, to the infinity power, to the infinity power, and so on infinitely many times!” That’s also a recursive ordinal (it’s called ε_{0} and is of deep mathematical importance). We continue in this manner, constantly one-upping each other with greater and greater creativity in the infinities we describe. But without one of us making a quantum leap, we’ll only ever name recursive ordinals in this way, and in the bigger scheme of things, recursive ordinals are relatively tiny.

To make the quantum leap above recursive ordinals, we have to throw a wrench in the children’s game, interrupt it, and short-circuit it. The recursive ordinals can *themselves* be described as natural numbers (using the fact that a computer program is just a string of 1s and 0s), and this leads to yet another reshuffling of the naturals, but one far too complicated to program, lest the universe implode in a Halting-Problem-Solving poof of logic. This reshuffling is a well-order and its order type is called ω_{1}^{ck} (“omega-one-C.K.”). It is the first non-computable infinity. The C.K. stands for Church-Kleene, two big names from early computability theory.

After ω_{1}^{ck}, you can ask about the ordinals you could get if you could magically solve the Halting Problem, and so on. It is admittedly subjective where the next big “quantum leap” lies, but I believe most people would agree it comes from Georg Cantor.

## The Cardinality of the Continuum

Georg Cantor was an insane (literally) German mathematician who managed to troll all his contemporaries. They raged and insisted his mathematics couldn’t possibly be true, but today it is accepted without controversy. What Cantor discovered were the *uncountable infinities*. Suppose you take a finite set, say with 5 elements, and consider the set of all its subsets. The number of those subsets is 2^{5}=32. In general, if you start with a set with n elements, n a finite number, then its power set has 2^{n} elements. Cantor asked: what happens if you start with ω, and take all of its subsets? He proved the resulting set, the *power set of ω*, was so much larger than ω that there was no way to reshuffle ω to get anything that looked like ω’s power set. The subsets of ω transcend ω itself in a very powerful way.
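Before Cantor takes the power set infinite, the finite 2^{n} growth is easy to watch directly. Here is a small sketch (an illustration of mine, with an assumed helper name, not from the original) that iterates the power-set operation on a finite set:

```python
# Sketch: iterating the power set of a finite set shows the 2^n blow-up
# that Cantor then pushed into the infinite.
from itertools import chain, combinations

def power_set(s):
    """Return the set of all subsets of s, each as a frozenset."""
    elems = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(elems, r) for r in range(len(elems) + 1))}

a = frozenset({1, 2, 3})
p1 = power_set(a)        # 2^3 = 8 subsets
p2 = power_set(p1)       # 2^8 = 256 subsets of subsets
```

One more iteration would give 2^256 sets, which is already beyond anything a computer could enumerate; Cantor’s move was to see that at ω the growth becomes a genuine jump in size, not just a bigger finite number.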

If X is a set, let P(X) denote its power set, the set of all subsets of X. Cantor proved that there is never a perfect one-to-one mapping between X and P(X), no matter what set X is. You can perfectly map ω with ω+1 because ω+1 is just a re-shuffling of ω; ω+1 is *countable* and has the same *cardinality* as ω. But not P(ω). If you try to “cover” P(ω) with a blanket made out of elements of ω, you can only ever cover an infinitesimally tiny sliver of it.
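Cantor’s proof is the famous diagonal argument, which can be sketched in Python (a finite-horizon toy of my own, not from the original): given any attempted enumeration f of subsets of ω, the “diagonal” set D = {n : n ∉ f(n)} disagrees with f(n) about the number n itself, so D is missed by the enumeration.

```python
# Sketch of Cantor's diagonal argument. Given an attempted enumeration
# f : naturals -> subsets of naturals, the diagonal set D = {n : n not in f(n)}
# differs from f(n) at n for every n, so D is not in f's range.
def diagonal(f, horizon):
    """Compute the diagonal set, restricted to a finite horizon for the demo."""
    return {n for n in range(horizon) if n not in f(n)}

attempt = {0: {0, 1, 2}, 1: set(), 2: {2, 5}}   # a hypothetical enumeration
D = diagonal(attempt.get, 3)
# 0 ∈ attempt[0] so 0 ∉ D; 1 ∉ attempt[1] so 1 ∈ D; 2 ∈ attempt[2] so 2 ∉ D.
```

No matter how the `attempt` dictionary is filled in, `D` is guaranteed to differ from every listed set, which is exactly why no blanket of ω-many patches can cover P(ω).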

P(ω) turns out to have the same size as the set of real numbers. For that reason it is called the cardinality of the continuum (since the real number line is sometimes called the “continuum” of real numbers). People wondered: is there a cardinality between ω and P(ω)? (Of course, ω+1 is between ω and P(ω), but ω+1 is not a new cardinality, since it is just a reshuffling of ω.) It was conjectured that there is no cardinality between ω and P(ω). This conjecture is now known as the Continuum Hypothesis (CH), but logicians in the 20th century proved that it is a statement independent of set theory. To be precise: based on the classic assumptions of set theory, ZFC, you can neither prove nor disprove the Continuum Hypothesis. To establish this, specific set-theoretic universes were exhibited (satisfying ZFC) where the Hypothesis holds (so ZFC cannot disprove the hypothesis), and others were exhibited (also satisfying ZFC) where the Hypothesis fails (so ZFC cannot prove the hypothesis).

We can iterate Cantor’s theorem, taking P(P(ω)) and P(P(P(ω))), and so on. We can even iterate the process infinitely often (to be more precise, iterate it x times, where x is any infinity we’ve already built). But now we’re just back in the child’s game: “Power set of ω!” “Power set of *that*!” “Power set of *that*!” The next quantum leap requires even more introspection and short-circuiting.

## Large Cardinals and Consistency of Set Theories

Kurt Gödel out-trolled Cantor by proving that if any set of assumptions is strong enough to allow basic math, then it cannot prove its own consistency, unless it is inconsistent. In particular, ZFC, the basic axioms upon which all of classical mathematics is built, cannot prove the consistency of ZFC, unless ZFC isn’t consistent in the first place (and in that case we’re really screwed).

What would it take for ZFC to prove the consistency of ZFC? One way would be to use ZFC to build a set X and demonstrate that ZFC holds of X: that the elements of X are a model of ZFC, satisfying all its requirements. For example, for any element x inside X, the power set P(x) of x would also have to be in X (X would have to be “closed under power set”), among other requirements.

Call a set X “large” if its elements satisfy the assumptions of ZFC, that is, if X is a model of ZFC. If we could prove, in ZFC, that a large set X exists, then that would prove, in ZFC, that ZFC is consistent. “Nuh uh uh,” Gödel wags his finger at us, “you can’t do that!”

It follows that if any such cardinal X does exist, then it is an infinity which totally transcends anything Cantor was able to come up with. Set theorists studying large cardinals look at infinitely-iterated power-sets and laugh at their diminutive tininess.

Actually, set theorists work in greater generality. They ask about different axioms you can add to ZFC to gain enough strength to prove CON(ZFC). I just mentioned the most obvious such new axiom, namely, “There exists a set X such that ZFC holds of X.” To the large cardinal theorists, this is a rather boring new assumption. Even as we speak, they’re experimenting with all manner of different new assumptions (some quite exotic-looking!) which imply CON(ZFC) less directly. The upshot is, different “large cardinal axioms” yield different infinities. There is a hierarchy of these assumptions (and a corresponding hierarchy of ludicrously enormous infinities), and, tongue-in-cheek, the top of the hierarchy is labeled “1=0”, for that is the strongest possible assumption you could ever make, and, save only for the minor detail that it’s *false*, it would yield the largest possible sets.

## But All These Infinities Are Tiny…

Even the large cardinals which transcend classic set theory are tiny in at least one sense.

The Compactness Theorem says that if you have a set of assumptions in any first-order language, and every finite subset of those assumptions is consistent, then the entire set of assumptions is consistent.

Peano Arithmetic is a small set of assumptions which is often taken as a kind of axiomatic definition of the natural numbers. It includes axioms such as “0≠x+1 for any natural number x”, “x+(y+1)=(x+y)+1”, and “x(y+1)=xy+x”. The most complicated assumption made by Peano Arithmetic is the assumption that Mathematical Induction works.

Let X be any set at all, and let L be a language where we’ve added a new constant symbol C_{x} for every element x of X. To Peano Arithmetic, add new assumptions which say C_{x}≠C_{y} whenever x≠y. Depending on the size of X, this might be an ungodly enormous new set of assumptions! But the Compactness Theorem still holds. It says that if every finite subset of these assumptions is consistent, then the entire set of assumptions is. But a finite subset of the assumptions contains only finitely many of the assumptions C_{x}≠C_{y}, and thus can be realized by any model of Peano Arithmetic where we just interpret those finitely many C_{x} by distinct natural numbers. By Compactness, the entire new set of assumptions, in all its infinite hugeness, is consistent.
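The finite-satisfiability step in that argument is easy to make concrete. Here is a sketch (a toy of my own, with hypothetical constant names, not from the original): any finite batch of distinctness axioms C_{x}≠C_{y} can be realized in the standard naturals just by handing out distinct numbers to the finitely many constants that actually appear.

```python
# Sketch: realizing a finite subset of the axioms {C_x ≠ C_y : x ≠ y}
# inside the standard model, by assigning distinct naturals to the
# finitely many constant symbols that appear.
def interpret(constants):
    """Map each constant symbol to a distinct natural number."""
    return {c: n for n, c in enumerate(sorted(constants))}

axioms = [("C_a", "C_b"), ("C_b", "C_c"), ("C_a", "C_c")]  # finitely many C_x ≠ C_y
symbols = {s for pair in axioms for s in pair}
model = interpret(symbols)
# every axiom holds: each pair of constants gets different numbers
all_hold = all(model[u] != model[v] for u, v in axioms)
```

The Compactness Theorem is the magic step that parlays this trivial finite observation into consistency of the whole uncountably infinite axiom set, where no such assignment into the standard naturals could possibly exist.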

What this means is that, no matter how large X is, it is consistent that there is a model of Peano Arithmetic with as many “natural numbers” as X has elements. Even if X is a ludicrously large cardinal, there is a model of Peano Arithmetic containing that many natural numbers. Humans living in such a world would believe that X was the smallest infinity of all, and that smaller things you and I know to be infinite are actually finite.

But even people living in such bizarre alternate universes would agree with the comprehensible largeness of…

## Proper Classes

There is no such thing as the “set of all sets”, because if there were such a thing, Russell’s Paradox would cause the universe to explode in a shower of fire and lightning. “Let S be the set,” Russell said, “of all sets which do not contain themselves. Does S contain itself?” If it does, then it doesn’t, and if it doesn’t, then it does: a mathematical paradox. ZFC would contain enough machinery to build S, if there were a set of all sets. So there cannot be a set of all sets. The collection of all sets is the prototypical example of a *proper class*.
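The self-reference at the heart of the paradox can even be sketched in Python (a toy model of mine, not from the original), with “sets” played by predicates and membership s ∈ t rendered as t(s). Asking whether the Russell predicate contains itself demands an answer that equals its own negation, so the computation never settles:

```python
# Toy sketch of Russell's paradox: model a "set" as a predicate on "sets".
# The Russell "set" contains exactly the predicates that do not contain themselves.
def russell(s):
    return not s(s)

# "Does russell contain itself?" would have to be the negation of itself,
# so the recursion never bottoms out and Python raises RecursionError.
try:
    russell(russell)
    verdict = "settled"          # unreachable: the paradox always bites
except RecursionError:
    verdict = "paradox: no consistent answer"
```

The RecursionError is the computational shadow of the logical contradiction: there is simply no truth value the question could consistently have.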

More generally, a class is a collection of sets, and a proper class is a class which is not itself a set. Usually, the reason a proper class is not a set is because it is “too big” to be a set.

Once children know about proper classes, the game can continue as normal. “Proper class of all sets!” you say. “Proper conglomerate of all classes!” I answer. “Proper category of all conglomerates!” you return, and so it continues. The game has no end, no matter how much mathematics we learn. Perhaps if there really is some ultimate infinity, it is the game itself.

**Further Reading**

- The Higher Infinite
- Infinitely Large vs. Arbitrarily Large
- Infinite Time Turing Machines
- Guessability and the Turing Test
- Why is 0.99999…=1?