🔗 Understanding, at last, the Y Combinator - a programmer-friendly perspective

This post is aimed towards comp sci students who studied lambda calculus but never really “got” the Y Combinator, the best-known fixed point combinator. Lambda calculus does not feature recursion, but using fixed point combinators, we can easily produce recursive functions, making it able to describe all computable functions (in other words, equivalent to a universal Turing machine). Here’s our friend Y:

Y = λf· (λx· f (x x)) (λx· f (x x))

With it, you can write a non-recursive version of a function, apply Y to it, and you get the equivalent recursive functionality, even though lambda calculus does not provide recursive functions. It’s not hard to understand what it does, but the mystery for me was always how it does it.

In my experience, a major barrier to understanding it was having to deal with several unfamiliar things at the same time, each of which makes a different part of my brain uncomfortable: the semantics of lambda calculus, the syntax of lambda calculus, and the problem at hand.

So, to focus on the problem at hand, let’s get rid of the rest.

Semantics of lambda calculus: untyped lambda calculus amounts (roughly speaking) to a very simple functional language with lazy evaluation. All we have is functions, but through smart encodings we can represent numbers and booleans with nicely crafted chains of functions. To simplify things, then, let’s just assume we already have numbers, arithmetic, and if/then/else. Next, let’s get rid of lazy evaluation and switch to eager evaluation, which is what we’re used to in typical imperative programming languages. To do this, we’ll have to use this variant of Y, which is called Z:

Z = λf· (λx· f (λy· x x y)) (λx· f (λy· x x y))

Yes, it is a bit longer (each x x is now wrapped as λy· x x y), but believe me, it’s easier to understand, and once you get this, it’s just a matter of recalling lazy evaluation from those functional programming classes and applying the same logic to Y.

Next, the syntax: lambda calculus terms are built from variables, abstractions and applications, which in programming language parlance are just variables, function declarations and function calls. Therefore, we can simply rewrite it using some familiar notation and get rid of the weird syntax and precedence rules for now. I won’t even use a functional language: let’s take a restricted form of Lua, using only features provided by lambda calculus to make the syntax clearer:

Z = function(f)
       return (function(x)
                  return f(function (y)
                              return (x(x))(y)
                           end)
               end)(function(x)
                       return f(function (y)
                                   return (x(x))(y)
                                end)
                    end)
    end

All right, it’s not exactly readable. But we’ll get there. Assuming you’re comfortable with functions as first-class values that can be returned and passed as arguments, the weirdest thing here is that in the first return we have a function being declared and called “inline”, receiving as a parameter another function identical to itself. Let’s make this flow a bit clearer by using local variables, but strictly as “aliases”. In other words, it won’t run any differently, so we don’t use any extra features that are not part of lambda calculus. Note that our use of local variables here is equivalent to macro substitution:

Z = function(f)
       local f1 = function(x)
                     return f(function (y)
                                 return (x(x))(y)
                              end)
                  end
       local f2 = function(x)
                     return f(function (y)
                                 return (x(x))(y)
                              end)
                  end
       return f1(f2)
    end

Ok, so this is the function we want to understand. It still makes no sense, but we can see that Z calls one function (f1) passing to it an identical function (f2). These identical functions return the result of calling f (the thing we want to make recursive), but giving as its argument another function that uses something that calls itself (x(x)). A strong scent of self-reference in all this — no wonder, as we know the key to recursion is in there somewhere. But let’s stop trying to reverse-engineer it. Let’s build it instead.

First, let’s recall what this function is good for. What we’d like to have is recursive functions, like this one that calculates the factorial:

fact = function(n)
          if n == 0 then
             return 1
          else
             return n * fact(n-1)
          end
       end

But we can’t have this in lambda calculus. Actually, all of it is fine, except for the recursive call to fact(). We don’t have recursion yet.

The best we can do is to write a function that does one step of the recursion and leaves the next step to be done by someone else. Like this:

stepper = function(next_step, n)
             if n == 0 then
                return 1
             else
                return n * next_step(n-1)
             end
          end

However, I said we would only use features provided by lambda calculus, which means we don’t have multiple arguments. Fortunately, through currying, we can get the equivalent of multiple arguments. This is one aspect of lambda calculus which I won’t hide in Lua syntax because it will come in handy in our understanding. We’ll need to make stepper like this:

stepper = function(next_step)
             return function(n)
                       if n == 0 then
                          return 1
                       else
                          return n * next_step(n-1)
                       end
                    end
          end

The difference is that now, instead of doing stepper(some_next_step, 5), we have to do stepper(some_next_step)(5).

What this stepper construction makes clear is that we now have a “factory” function that returns our would-be-recursive function, except it only runs one step and uses another function that knows how to run the next step. You can think of stepper(next_step) as returning a “step”, which is then run by calling it with (5).
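To make the factory idea concrete, here is a minimal sketch in which the next step is faked by a lookup table. The names known_factorials and lookup_step are hypothetical, introduced only for this illustration:

```lua
-- The curried stepper from above, repeated so the snippet runs on its own.
stepper = function(next_step)
   return function(n)
      if n == 0 then
         return 1
      else
         return n * next_step(n - 1)
      end
   end
end

-- Hypothetical stand-in for "someone else": it just looks answers up.
known_factorials = { [0] = 1, [1] = 1, [2] = 2, [3] = 6, [4] = 24 }
lookup_step = function(n)
   return known_factorials[n]
end

step = stepper(lookup_step)  -- the factory hands us one step
result = step(5)             -- computes 5 * lookup_step(4)
print(result)                -- 120
```

The step itself never recurses: it does one multiplication and delegates the rest to whatever it was given.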

But we need a function to run the next step to pass to stepper. How to obtain it? We call the factory to get the next step. Yes, like stepper(stepper).

Indeed, note that if we run stepper(stepper(stepper(stepper(stepper(stepper())))))(5), we get the same result as fact(5). (You can try it in the Lua interpreter!)
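The nested call can be tried as a standalone script. This is a minimal sketch; the name six_steps is introduced here only for illustration:

```lua
-- The curried stepper from above, repeated so the snippet runs on its own.
stepper = function(next_step)
   return function(n)
      if n == 0 then
         return 1
      else
         return n * next_step(n - 1)
      end
   end
end

-- Six nested steps: one each for n = 5, 4, 3, 2, 1 and 0.
-- The innermost stepper() receives no next step (nil), which is harmless
-- as long as it is only ever asked about n == 0.
six_steps = stepper(stepper(stepper(stepper(stepper(stepper())))))

print(six_steps(5))         -- 120
print(pcall(six_steps, 6))  -- false plus an error message: we ran out of steps
```

Six levels are exactly enough for 5. One more level of input and the innermost step tries to call its missing next_step, which is why hand-nesting does not scale.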

Of course, we don’t want to write it like that. The plan, therefore, is to get a function that combines all these calls. It’s time for the combinator:

combinator = function(stepper)
                -- ?
             end

This function is meant to return something to replace all these nested stepper()s, so we want it to return a function that accepts our input argument:

combinator = function(stepper)
                return function(arg)
                          -- ?
                       end
             end

And it needs to run stepper() on the argument, of course, so it computes our desired function. But what will we pass as the next step?

combinator = function(stepper)
                return function(arg)
                          return stepper( --[[ ? ]] )(arg)
                       end
             end

Well, if what we want to produce is stepper(stepper(stepper(stepper(... then maybe what we need is to pass it stepper() again?

combinator = function(stepper)
                return function(arg)
                          return stepper(stepper)(arg) -- no!
                       end
             end

This is wrong because it will only run stepper twice (and of course we know the combinator doesn’t look like this). Think of the stepper(stepper) function as it runs. If we pass 5 as the argument we get stepper(stepper)(5). It will get 5 as n and stepper as the next_step argument. It will eventually call next_step(n-1), that is stepper(4). But stepper expects a function as the argument, and not a number. In fact, if we run this in the Lua interpreter we get a type error. (In untyped lambda calculus we have no notion of “type error”, of course: what we get is just a lambda expression that makes no sense to us.)
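To watch this fail without aborting the script, the call can be wrapped in Lua’s standard pcall. A minimal sketch (the exact wording of the error message depends on the Lua version, so it is printed but not relied upon):

```lua
-- The curried stepper from above, repeated so the snippet runs on its own.
stepper = function(next_step)
   return function(n)
      if n == 0 then
         return 1
      else
         return n * next_step(n - 1)
      end
   end
end

-- stepper(stepper)(5) evaluates 5 * stepper(4), and stepper(4) is a
-- function, not a number, so the multiplication raises an error.
ok, err = pcall(function()
   return stepper(stepper)(5)
end)

print(ok)   -- false
print(err)  -- the interpreter's complaint about arithmetic on a function
```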

We need to give stepper a function, so it looks like we’ll have to make one.

combinator = function(stepper)
                local some_function = function()
                                         -- ?
                                      end
                return function(arg)
                          return stepper(some_function)(arg)
                       end
             end

(Again, I’m using a local variable here strictly as a “macro substitution”, just to make things more readable.)

To find out what to put in some_function, let’s think of what it will be used for. It will be passed to stepper, and therefore be used as the next_step. Does it mean it needs to take the numeric argument, then?

combinator = function(stepper)
                local some_function = function(arg)
                                         -- ?
                                      end
                return function(arg)
                          return stepper(some_function)(arg)
                       end
             end

If we’re receiving a parameter, it’s fair to assume we’ll have to do something with it. What we want to do with our input arguments is to run steps on them, so let’s get a step with stepper and do it.

combinator = function(stepper)
                local some_function = function(arg)
                                         return stepper( --[[ ? ]] )(arg)
                                      end
                return function(arg)
                          return stepper(some_function)(arg)
                       end
             end

Uh-oh, we’re back to the problem of what to give stepper! Does this mean we got into a loop? Well, note that if we just kept on doing the same, we would get the equivalent of stepper(stepper(stepper(...))):

combinator = function(stepper)
                local and_yet_another_step = -- ... !!!
                local yet_another_step = function(arg)
                                            return stepper(and_yet_another_step)(arg)
                                         end
                local another_step = function(arg)
                                        return stepper(yet_another_step)(arg)
                                     end
                return function(arg)
                          return stepper(another_step)(arg)
                       end
             end

But no, it does not mean we’re stuck in a loop. It means we’re getting close.

What we missed is that when we considered what to pass to the first stepper, we simply said it had to be a function taking the numeric argument. But what we want is a function that returns the result of stepper’s computation, but also produces the function for the next step! What we need is another factory-style function, like when we transformed fact into stepper. Instead of going down one step, we’ll make a factory function that generates another function that goes down one step. While we’re at it, and that’s a key point, we’ll remove from it the responsibility of figuring out how to descend to the next level. We’ll do it just like we did when we got rid of the recursion: just assume you get what you need as a parameter.

combinator = function(stepper)
                local make_step = function(next_step)
                                     return function(arg)
                                               return stepper(next_step)(arg)
                                            end
                                  end
                return function(arg)
                          return stepper(make_step)(arg)
                       end
             end

With a function like make_step, we can produce as many steps as we want. If you are mathematically inclined, by now you may be thinking of induction: yes, we have just constructed a way of solving one step of our problem assuming we have the solution for the next. That’s the spirit.

So, are we there yet? Does this work?

Not yet. Remember when we tried to do stepper(stepper)? It didn’t work because stepper doesn’t want a factory as an argument. It wants an actual step (remember that in our example it will use it to get the factorial of n-1). In other words, we don’t want to pass it a factory like make_step. We want make_step to make us a step. So let’s call it!

combinator = function(stepper)
                local make_step = function(next_step)
                                     return function(arg)
                                               return stepper(next_step)(arg)
                                            end
                                  end
                return function(arg)
                          return stepper(make_step( --[[ ? ]] ))(arg)
                       end
             end

Ok, we’ll call it, but what will we give it? Let’s look at what the argument of make_step is used for.

Wait a second, it is used to make a step as well! So, do we just give it… itself?

combinator = function(stepper)
                local make_step = function(next_step)
                                     return function(arg)
                                               return stepper(next_step(next_step))(arg)
                                            end
                                  end
                return function(arg)
                          return stepper(make_step(make_step))(arg)
                       end
             end

Yes, we pass it to itself. And we do the same with next_step too, since we’re doing the exact same thing. You see, here is where everything comes together: to have recursion, we need to have self-reference; and in lambda calculus, while a lambda term cannot reference itself in an abstraction (function definition), we have that a lambda term, when bound to a variable, can be applied to itself. The function above is indeed a fixed point combinator.

You may find it strange that make_step passes itself as a parameter, then calls itself passing itself as a parameter… is this an infinite loop? Not necessarily, because in between each step, we don’t run stepper(next_step(next_step)) right away. It is “packed” inside a function(arg)...end. (These are the extra lambdas that the Z combinator has compared to the Y combinator.)
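For contrast, here is a sketch of what happens in Lua without that packing: a direct translation of Y, with no function(arg)...end around x(x). I am assuming the standard interpreter’s behavior here, which is to raise a catchable error once the call stack is exhausted:

```lua
-- Direct translation of Y = λf· (λx· f (x x)) (λx· f (x x)).
Y = function(f)
   return (function(x)
              return f(x(x))
           end)(function(x)
                   return f(x(x))
                end)
end

factorial_step = function(f)
   return function(n)
      if n == 0 then
         return 1
      else
         return n * f(n - 1)
      end
   end
end

-- With eager evaluation, x(x) must be computed before f is ever called,
-- and computing x(x) requires computing x(x) again, and so on.
ok = pcall(Y, factorial_step)
print(ok)  -- false: the self-application never bottoms out
```

Note that the failure happens while building the function, before any number is passed in. The wrapper in Z postpones x(x) until a step is actually needed.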

Speaking of that, is this code the same as the straight Lua translation of the Z combinator I presented above? Let’s recap:

Z = function(f)
       return (function(x)
                  return f(function (y)
                              return (x(x))(y)
                           end)
               end)(function(x)
                       return f(function (y)
                                   return (x(x))(y)
                                end)
                    end)
    end

Doesn’t look much like it, does it? But let’s take a closer look at our combinator. Turns out that in that final return, instead of building that step function explicitly, we can simply call make_step and get the same results:

combinator = function(stepper)
                local make_step = function(next_step)
                                     return function(arg)
                                               return stepper(next_step(next_step))(arg)
                                            end
                                  end
                return make_step(make_step)
             end

(In compiler terms, it is as if the first iteration had been “unrolled” before.)

Remember when I said the local variables were just macro substitutions? If we expand the two instances of make_step, we get this:

combinator = function(stepper)
                return (function(next_step)
                           return function(arg)
                                     return stepper(next_step(next_step))(arg)
                                  end
                        end)(function(next_step)
                                return function(arg)
                                          return stepper(next_step(next_step))(arg)
                                       end
                             end)
             end

Now, the only difference is that stepper is being called inside function(arg)...end, while in the other one it’s being called outside. It works the same, because the purpose of function(arg)...end there is just to “hold” the expansion of arguments between each call of stepper. We need this because Lua does not use lazy evaluation. To make our function just like Z, we do this:

combinator = function(stepper)
                return (function(next_step)
                           return stepper(function(arg)
                                             return (next_step(next_step))(arg)
                                          end)
                        end)(function(next_step)
                                return stepper(function(arg)
                                                  return (next_step(next_step))(arg)
                                               end)
                             end)
             end

There we are! The exact same function, only with different variable names.

Finally, to test it:

local factorial_step = function(f)
                          return function (n)
                             if n == 0 then
                                return 1
                             else
                                return n * f(n - 1)
                             end
                          end
                       end
local factorial = combinator(factorial_step)

print(factorial(5)) -- we get 120, yet no function in our code ever calls itself
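Nothing in combinator is specific to factorials. As a further check, here is a sketch that applies the same combinator to a different step function; fibonacci_step is my own example, not part of the derivation above:

```lua
-- The final combinator from above, repeated so the snippet runs on its own.
combinator = function(stepper)
   return (function(next_step)
              return stepper(function(arg)
                                return (next_step(next_step))(arg)
                             end)
           end)(function(next_step)
                   return stepper(function(arg)
                                     return (next_step(next_step))(arg)
                                  end)
                end)
end

-- One step of Fibonacci: f stands for "whoever computes the smaller cases".
fibonacci_step = function(f)
   return function(n)
      if n < 2 then
         return n
      else
         return f(n - 1) + f(n - 2)
      end
   end
end

fibonacci = combinator(fibonacci_step)
print(fibonacci(10))  -- 55
```

The step function may call f more than once; each call simply triggers another next_step(next_step) to manufacture the step it needs.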

As we can see, there’s nothing magical about the Y combinator: it produces recursion by using the only form of self-reference provided by the lambda calculus, which is self-application of variables. To get the loop started, since there’s no way to bind the outermost abstraction (function) to a variable so it can apply to (call) itself, it is simply written down twice.

I tried to explain this as clearly as I could (given the assumptions I made about prior knowledge of programming languages and basic lambda calculus). This was actually the flow of thought I followed when understanding the combinator. It clearly follows a “programmer” mindset, which may not be the best method for everyone, but I hope it may be as useful for others as it was for me. Corrections, comments and suggestions are welcome.

Posted by hisham on Monday, April 4, 2011 12:04:14 in en_US, Computing, Language
