Abstract: I analyze the algorithm which students are typically taught for evaluating summation notation, and show that it is inefficient. Then I give an alternate algorithm, eliminating the inefficiency. However, the natural question is: how do we know the two algorithms give the same answer? Although it seems intuitively obvious that they do, it is actually a very subtle question, justifying the title of this article, “Is Summation Notation Ambiguous?”

The Usual Algorithm

Let’s look at how we evaluate summation notation. Define expressions inductively: for any number n, n is an expression; the variables x,y,z,… are expressions; if A and B are expressions, then (A)+(B) and (A)(B) are expressions; and if A and B are expressions and x is a variable, then Sum_(x=0)^(B)A is an expression.

A closed expression is an expression with no free variables (this can be rigorously defined by induction but I’ll omit that). For example, Sum_(x=0)^(5)x is a closed expression, but Sum_(x=0)^(y)x is not, because it contains the free variable y; you can’t calculate the latter sum down to a number if you don’t know what y is.

From basic algebra, we know how to evaluate a closed expression. We do it inductively (of course, it usually isn’t explained to us in such precise detail, but this is equivalent to what we are usually taught):

  • The value of a number n, is n itself.
  • The value of (A)+(B) is the sum of the values of A and B.
  • The value of (A)(B) is the product of the values of A and B.
  • The value of Sum_(x=0)^(B)A is the sum of the values of A(0),…,A(b), where b is the value of B and each A(i) is the closed expression obtained by replacing all instances of x in A by i.

The above definition is an algorithm. How do we know it converges to an answer? By induction on expression complexity. Why isn’t there a case listed for evaluating variables? Because the algorithm is only intended to evaluate *closed* expressions, and a variable expression is not closed. In fact, one very eccentric definition of “closed expression” could be “does not cause the above algorithm to enter an undefined case”.

Now let’s zero in on part of the above algorithm. What does it mean to “replace all instances of x in A by i”? Well, it has a very obvious meaning which we typically take for granted. You copy A down but write “i” wherever “x” was. But of course, a computer has neither eyes nor pencil, so we have to make this precise. So we define:

  • For a number n, if you replace x by i in n, you get n itself.
  • For a variable y, if you replace x by i in y, you get: i, if y=x; y, if y≠x.
  • If you replace x by i in (A)+(B), you get (A(i))+(B(i)), where A(i) and B(i) are the results of replacing x by i in A and B.
  • If you replace x by i in (A)(B), you get (A(i))(B(i)).
  • If you replace x by i in Sum_(y=0)^(B)A, where y≠x, you get Sum_(y=0)^(B)A(i).
  • If you replace x by i in Sum_(x=0)^(B)A, you get Sum_(x=0)^(B)A.

Now, the above operations are not without a price. In order to replace x by i in an expression, a computer must run through every sub-expression, checking which case it is and either copying it or copying an altered form of it. For example, consider the expression Sum_(x=0)^(9)1 (which should evaluate to 10). The evaluation algorithm instructs us to evaluate 1(0)+1(1)+1(2)+…+1(9), where 1(i) means “the expression obtained by replacing x by i in 1″. Each time we compute 1(i), we have to run the “replace” subroutine above. The subroutine tells us to check which type of expression 1 is; we see it’s a number, so we copy it verbatim. We do this 10 times, and store 10 separate copies of “1″ in memory. What a waste! Of course, in practice we would skip all this work and immediately see Sum_(x=0)^(9)1=10, but the means by which we do this are ad hoc. How would you program a computer to do something so clever? You could shoehorn this one special case into the algorithm, but I could list ten new special cases for each one you add.

The More Efficient Algorithm

The problem with the algorithm we’re taught for evaluating summation notation is it involves writing down lots of modified copies of subexpressions. If we follow the algorithm to the very letter (as a computer must), we end up doing more substitutions than actual computations, and all those copies-of-subexpressions take up a ton of space (on paper or in RAM).

A better way is to leave the expression and its sub-expressions unchanged, and instead keep track of the values of variables. In fact, we have one of those situations– so beautiful whenever they occur– where the problem is actually easier if you generalize it. In the previous section, we limited ourselves to evaluating *closed* expressions. Let us now turn to the more general problem of evaluating arbitrary expressions (which may include free variables), given the values of the variables. The algorithm writes itself:

  • Input: an expression S and a list L of variable values, including values of all variables occurring free in S.
  • Desired Output: the value of S relative to L.
  • If S is n, a natural number, then output n.
  • If S is x, a variable, then output whatever value L says x has.
  • If S is (A)+(B), then evaluate A and B using L, and output their sum.
  • If S is (A)(B), then evaluate A and B using L, and output their product.
  • Suppose S is Sum_(x=0)^(B)A. Let b be the value of B using L. Let SUM=0. For i=0 to b:
    • Let L(x|i) be the list obtained by adding “x=i” to L (if L already had a value for x, do not include the old value in L(x|i))
    • Increment SUM by A(L(x|i)), the value of A relative to L(x|i).
  • Output SUM.

As for closed terms, we have a quick little “wrapper” algorithm:

  • Input: a *closed* expression S. Desired output: its value.
  • Output the value (using the previous algorithm) of S relative to the empty list.

Note that on a computer, this method does not need to involve any copying at all of subexpressions, since instead of passing the (unmodified) subexpressions themselves, we can just pass pointers to them. We don’t even need to make copies of L, provided we’re sufficiently clever with singly-linked lists.

The Ambiguity

If the two algorithms aren’t equivalent, then we have something of a crisis for mathematics, because both methods seem equally valid as potential definitions for the value of summation notation; which one is the “correct” one? Mathematicians abhor nothing more than having to make an arbitrary choice. So the question is: are the two algorithms equivalent?

I love this question because it seems so obvious the answer is yes! But how the heck do you prove it? It’s actually a special case of a more general result in my paper, The First-Order Syntax of Variadic Functions (PDF), which I’m infinitely happy to say will appear in the very excellent Notre Dame Journal of Formal Logic. We can view “expressions”, as defined here, as terms in first-order logic in an appropriate language, except for one little catch: first-order logic does not come with machinery for summation notation! No problem, we can just add that machinery. And that’s precisely what I do in the paper.

In first-order logic, an assignment is a map which takes variables in the language into elements of the universe. This corresponds to the “lists” in the efficient algorithm above. If s is an assignment and t is a term, we usually write t^s for the interpretation (value) of t in a structure assuming variable values given by s. But, due to the above sections, there is an ambiguity in how this should be defined when the language contains something like summation notation. For simplicity, work in the language with + and *, constant symbols for the naturals, and summation notation, and let the universe be the natural numbers. At least until we resolve the ambiguity question, there are potentially two term interpretations, which I’ll write t^s and t_s:

  • If n is (a constant symbol of) a natural then n^s=n_s=n.
  • If x is a variable then x^s=x_s=s(x).
  • If A and B are terms then ((A)+(B))^s = A^s + B^s and ((A)+(B))_s = A_s + B_s.
  • If A and B are terms then ((A)(B))^s = (A^s)(B^s) and ((A)(B))_s = (A_s)(B_s).
  • If A and B are terms and x is a variable, then (Sum_(x=0)^(B)A)^s = A(x|0)^s +…+ A(x|B^s)^s, where A(x|i) is the result of substituting (the constant symbol) i for x in A.
  • If A and B are terms and x is a variable, then (Sum_(x=0)^(B)A)_s = A_(s(x|0)) +…+ A_(s(x|B_s)), where s(x|i) is the assignment which agrees with s except that it maps x to i.

Now, in order to show that the two algorithms in this article are equivalent, it is enough to show that for every term A and assignment s, A^s=A_s. This seems like it should be a trivial induction on the structure of A. All the cases except for summation are trivial. As for the summation case, we start computing:

  • (Sum_(x=0)^(B)A)^s = A(x|0)^s +…+ A(x|B^s)^s (by definition)
  • (Sum_(x=0)^(B)A)^s = A(x|0)_s +…+ A(x|B_s)_s (by induction)

Now, what we need to make the final step, is that for each i, A(x|i)_s = A_s(x|i). So we’re done by that basic result of first-order logic, the Substitution Lemma, which says precisely this! …right?

Well, not quite. The Substitution Lemma is a theorem of standard first-order logic, and we’re not working in standard first-order logic, we’re working in first-order logic with summation notation added. So, the final step is to re-prove the Substitution Lemma in this nonstandard logic. And I do that in my paper. The details become substantially more complicated than they are in ordinary first-order logic. But boy is it worth it, since now we can use whichever version of summation notation evaluation interchangeably, with no fear of ambiguity.

Acknowledgments

Thanks to the anonymous referees from NDJFL, without whose suggestions I never would’ve been led to notice this ambiguity. Thanks to Mike Fenwick for reading a rough draft of this article and for much feedback and discussion on it.

FURTHER READING

The First-Order Syntax of Variadic Functions (PDF)
Goodstein Sequences
The Illogician
Ambiguities in Mathematics
Obtaining Strong Recursion from Weak Recursion