# Understanding, at last, the Y Combinator - a programmer-friendly perspective

This post is aimed at computer science students who studied lambda calculus but never really “got” the Y Combinator, the best-known fixed point combinator. Lambda calculus has no built-in recursion, but fixed point combinators let us produce recursive functions, which makes the calculus able to describe all computable functions (in other words, equivalent in power to a universal Turing machine). Here’s our friend Y:

Y = λf· (λx· f (x x)) (λx· f (x x))

With it, you can write a non-recursive version of a function, apply Y to it, and you get the equivalent recursive functionality, even though lambda calculus does not provide recursive functions. It’s not hard to understand what it does, but the mystery for me was always how it does it.

In my experience, a major barrier to understanding it was having to deal with several unfamiliar things at the same time, each of which makes a different part of my brain uncomfortable: the semantics of lambda calculus, the syntax of lambda calculus, and the problem at hand.

So, to focus on the problem at hand, let’s get rid of the rest.

Semantics of lambda calculus: untyped lambda calculus amounts (roughly speaking) to a very simple functional language with lazy evaluation. All we have is functions, but through smart encodings we can represent numbers and booleans with nicely crafted chains of functions. To simplify things, then, let’s just assume we already have numbers, arithmetic, if/then/else. Next, let’s get rid of lazy evaluation and switch to eager evaluation, which is what we’re used to in typical imperative programming languages. To do this, we’ll have to use this variant of Y, which is called Z:

Z = λf· (λx· f (λy· x x y)) (λx· f (λy· x x y))

Yes, it is a bit longer (we wrapped each `x x` in a `λy·` and added a trailing `y`), but believe me, it’s easier to understand, and once you get this, it’s just a matter of recalling lazy evaluation from those functional programming classes and applying the same logic to Y.

Next, the syntax: lambda calculus terms are made of variables, abstractions and applications, which in programming language parlance are just variables, function declarations and function calls. Therefore, we can simply rewrite it using some familiar notation and get rid of the weird syntax and precedence rules for now. I won’t even use a functional language: let’s take a restricted form of Lua, using only features provided by lambda calculus to make the syntax clearer:

```lua
Z = function(f)
   return (function(x)
      return f(function (y)
         return (x(x))(y)
      end)
   end)(function(x)
      return f(function (y)
         return (x(x))(y)
      end)
   end)
end
```

All right, it’s not exactly readable. But we’ll get there. Assuming you’re comfortable with functions as first-class values that can be returned and passed as arguments, the weirdest thing here is that in the first return we have a function being declared and called “inline”, receiving as a parameter another function identical to itself. Let’s make this flow a bit clearer by using local variables, but strictly as “aliases”. In other words, it won’t run any differently, so we don’t use any extra features that are not part of lambda calculus. Note that our use of local variables here is equivalent to macro substitution:

```lua
Z = function(f)
   local f1 = function(x)
      return f(function (y)
         return (x(x))(y)
      end)
   end
   local f2 = function(x)
      return f(function (y)
         return (x(x))(y)
      end)
   end
   return f1(f2)
end
```

Ok, so this is the function we want to understand. It still makes no sense, but we can see that `Z` calls one function (`f1`) passing to it an identical function (`f2`). These identical functions return the result of calling `f` (the thing we want to make recursive), but giving as its argument another function that uses something that calls itself (`x(x)`). A strong scent of self-reference in all this — no wonder, as we know the key to recursion is in there somewhere. But let’s stop trying to reverse-engineer it. Let’s build it instead.

First, let’s recall what this function is good for. What we’d like to have is recursive functions, like this one that calculates the factorial:

```lua
fact = function(n)
   if n == 0 then
      return 1
   else
      return n * fact(n-1)
   end
end
```

But we can’t have this in lambda calculus. Actually, all of it is fine, except for the recursive call to fact(). We don’t have recursion yet.
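Lua happens to offer a rough analogy for this restriction (an aside, not part of lambda calculus itself): in a `local name = function ... end` statement, the new local is not yet in scope inside the function body, so the function value has no name by which to reach itself. In the sketch below, `nameless_fact` is a hypothetical name chosen for illustration, and `pcall` is used only to catch the error.

```lua
-- Inside the body, `nameless_fact` does not refer to the local being defined
-- (it is not in scope yet); Lua looks up a global of that name, which is nil.
local nameless_fact = function(n)
   if n == 0 then
      return 1
   else
      return n * nameless_fact(n - 1)
   end
end

print(nameless_fact(0))             -- 1: the base case needs no self-reference
local ok = pcall(nameless_fact, 5)  -- the recursive case cannot find "itself"
print(ok)                           -- false
```

This is the same situation as an anonymous lambda term: the body is fine, but there is nothing for the recursive call to point at.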

The best we can do is to write a function that does one step of the recursion and leaves the next step to be done by someone else. Like this:

```lua
stepper = function(next_step, n)
   if n == 0 then
      return 1
   else
      return n * next_step(n-1)
   end
end
```

However, I said we would only use features provided by lambda calculus, which means we don’t have multiple arguments. Fortunately, through currying, we can get the equivalent of multiple arguments. This is one aspect of lambda calculus which I won’t hide in Lua syntax because it will come in handy in our understanding. We’ll need to make `stepper` like this:

```lua
stepper = function(next_step)
   return function(n)
      if n == 0 then
         return 1
      else
         return n * next_step(n-1)
      end
   end
end
```

The difference is that now, instead of doing `stepper(some_next_step, 5)`, we have to do `stepper(some_next_step)(5)`.

What this `stepper` construction makes clear is that we now have a “factory” function that returns our would-be-recursive function, except it only runs one step and uses another function that knows how to run the next step. You can think that `stepper(next_step)` actually returns a “step”, which is then run by calling it with `(5)`.
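To make the “one step, then someone else” idea concrete, here is a minimal sketch in which the someone else is an ordinary loop-based factorial. The helper `loop_fact` is a hypothetical stand-in that uses a `for` loop, so it steps outside our lambda-calculus-only subset; it is there only to show what `stepper` expects from its argument.

```lua
local stepper = function(next_step)
   return function(n)
      if n == 0 then
         return 1
      else
         return n * next_step(n - 1)
      end
   end
end

-- Hypothetical stand-in for "someone else": a plain iterative factorial.
local loop_fact = function(n)
   local acc = 1
   for i = 2, n do
      acc = acc * i
   end
   return acc
end

local step = stepper(loop_fact)  -- the factory hands us a single step
print(step(5))                   -- 5 * loop_fact(4) = 5 * 24 = 120
```

The step itself only performs one multiplication; everything below 5 is delegated to whatever function was passed in.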

But we need a function to run the next step to pass to `stepper`. How to obtain it? We call the factory to get the next step. Yes, like `stepper(stepper)`.

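Indeed, note that if we run `stepper(stepper(stepper(stepper(stepper(stepper())))))(5)`, we get the same result as `fact(5)`. The innermost `stepper()` receives no next step at all, which is fine here because the step that handles 0 never calls it. (You can try it in the Lua interpreter!)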
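As a quick check of the nesting idea, the sketch below builds the six-deep chain and runs it. The name `six_deep` is made up for illustration, and `pcall` is used only to show what happens when the chain is one step too short.

```lua
local stepper = function(next_step)
   return function(n)
      if n == 0 then
         return 1
      else
         return n * next_step(n - 1)
      end
   end
end

-- Six nested steps: enough to handle 5, 4, 3, 2, 1 and 0.
local six_deep = stepper(stepper(stepper(stepper(stepper(stepper())))))

print(six_deep(5))               -- 120
local ok = pcall(six_deep, 6)    -- needs a seventh step that is not there
print(ok)                        -- false
```

A hand-written chain only works up to the depth we typed, which is why we need something that produces the next step on demand.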

Of course, we don’t want to write it like that. The plan, therefore, is to get a function that combines all these calls. It’s time for the combinator:

```lua
combinator = function(stepper)
   -- ?
end
```

This function is meant to return something to replace all these nested `stepper()`s, so we want it to return a function that accepts our input argument:

```lua
combinator = function(stepper)
   return function(arg)
      -- ?
   end
end
```

And it needs to run `stepper()` on the argument, of course, so it computes our desired function. But what will we pass as the next step?

```lua
combinator = function(stepper)
   return function(arg)
      return stepper( --[[ ? ]] )(arg)
   end
end
```

Well, if what we want to produce is `stepper(stepper(stepper(stepper(...` then maybe what we need is to pass it `stepper()` again?

```lua
combinator = function(stepper)
   return function(arg)
      return stepper(stepper)(arg) -- no!
   end
end
```

This is wrong because it will only run `stepper` twice (and of course we know the combinator doesn’t look like this). Think of the `stepper(stepper)` function as it runs. If we pass `5` as the argument we get `stepper(stepper)(5)`. It will get `5` as `n` and `stepper` as the `next_step` argument. It will eventually call `next_step(n-1)`, that is `stepper(4)`. But `stepper` expects a function as the argument, and not a number. In fact, if we run this in the Lua interpreter we get a type error. (In untyped lambda calculus we have no notion of “type error”, of course: what we get is just a lambda expression that makes no sense to us.)
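The failure is easy to reproduce. In the sketch below, `broken` is a hypothetical name for the `stepper(stepper)` attempt, and `pcall` is used only to capture the error instead of aborting the script.

```lua
local stepper = function(next_step)
   return function(n)
      if n == 0 then
         return 1
      else
         return n * next_step(n - 1)
      end
   end
end

local broken = function(arg)
   return stepper(stepper)(arg)
end

print(broken(0))                  -- 1: the base case never touches next_step
local ok, err = pcall(broken, 5)  -- 5 * stepper(4) multiplies 5 by a function
print(ok, err)                    -- false, followed by Lua's error message
```

The base case happens to work, which makes the bug easy to miss: it only shows up once `next_step` is actually called.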

We need to give `stepper` a function, so it looks like we’ll have to make one.

```lua
combinator = function(stepper)
   local some_function = function()
      -- ?
   end
   return function(arg)
      return stepper(some_function)(arg)
   end
end
```

(Again, I’m using a local variable here strictly as a “macro substitution”, just to make things more readable.)

To find out what to put in `some_function`, let’s think of what it will be used for. It will be passed to `stepper`, and therefore be used as the next_step. Does it mean it needs to take the numeric argument, then?

```lua
combinator = function(stepper)
   local some_function = function(arg)
      -- ?
   end
   return function(arg)
      return stepper(some_function)(arg)
   end
end
```

If we’re receiving a parameter, it’s fair to assume we’ll have to do something with it. What we want to do with our input arguments is to run steps on them, so let’s get a step with `stepper` and do it.

```lua
combinator = function(stepper)
   local some_function = function(arg)
      return stepper( --[[ ? ]] )(arg)
   end
   return function(arg)
      return stepper(some_function)(arg)
   end
end
```

Uh-oh, we’re back to the problem of what to give `stepper`! Does this mean we got into a loop? Well, note that if we just kept on doing the same, we would get the equivalent of `stepper(stepper(stepper(...)))`:

```lua
combinator = function(stepper)
   local and_yet_another_step = -- ... !!!
   local yet_another_step = function(arg)
      return stepper(and_yet_another_step)(arg)
   end
   local another_step = function(arg)
      return stepper(yet_another_step)(arg)
   end
   return function(arg)
      return stepper(another_step)(arg)
   end
end
```

But no, it does not mean we’re stuck in a loop. It means we’re getting close.

What we missed is that when we considered what to pass to the first `stepper`, we simply said it had to be a function taking the numeric argument. But what we want is a function that returns the result of `stepper`’s computation, but also produces the function for the next step! What we need is another factory-style function, like when we transformed `fact` into `stepper`. Instead of going down one step, we’ll make a factory function that generates another function that goes down one step. While we’re at it, and that’s a key point, we’ll remove from it the responsibility of figuring out how to descend to the next level. We’ll do it just like we did when we got rid of the recursion: just assume you get what you need as a parameter.

```lua
combinator = function(stepper)
   local make_step = function(next_step)
      return function(arg)
         return stepper(next_step)(arg)
      end
   end
   return function(arg)
      return stepper(make_step)(arg)
   end
end
```

With a function like `make_step`, we can produce as many steps as we want. If you are mathematically inclined, by now you may be thinking of induction: yes, we have just constructed a way of solving one step of our problem assuming we have the solution for the next. That’s the spirit.

So, are we there yet? Does this work?

Not yet. Remember when we tried to do `stepper(stepper)`? It didn’t work because `stepper` doesn’t want a factory as an argument. It wants an actual step (remember that in our example it will use it to get the factorial of n-1). In other words, we don’t want to pass it a factory like `make_step`. We want `make_step` to make us a step. So let’s call it!

```lua
combinator = function(stepper)
   local make_step = function(next_step)
      return function(arg)
         return stepper(next_step)(arg)
      end
   end
   return function(arg)
      return stepper(make_step( --[[ ? ]] ))(arg)
   end
end
```

Ok, we’ll call it, but what will we give it? Let’s look at what the argument of `make_step` is used for.

Wait a second, it is used to make a step as well! So, do we just give it… itself?

```lua
combinator = function(stepper)
   local make_step = function(next_step)
      return function(arg)
         return stepper(next_step(next_step))(arg)
      end
   end
   return function(arg)
      return stepper(make_step(make_step))(arg)
   end
end
```

Yes, we pass it to itself. And we do the same with `next_step` too, since we’re doing the exact same thing. You see, here is where everything comes together: to have recursion, we need to have self-reference; and in lambda calculus, while a lambda term cannot reference itself in an abstraction (function definition), we have that a lambda term, when bound to a variable, can be applied to itself. The function above is indeed a fixed point combinator.

You may find it strange that `make_step` passes itself as a parameter, then calls itself passing itself as a parameter… is this an infinite loop? Not necessarily, because in between each step, we don’t run `stepper(next_step(next_step))` right away. It is “packed” inside a `function(arg)...end`. (These are the extra lambdas that the Z combinator has compared to the Y combinator.)
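One way to watch this delayed unpacking is to count how often the stepper actually runs. The sketch below adds a mutable counter, `calls`, purely as instrumentation (mutation is not part of our lambda-calculus subset), and `counting_stepper` is a hypothetical variant of `stepper` that bumps it.

```lua
local calls = 0  -- instrumentation only, not part of the construction

local counting_stepper = function(next_step)
   calls = calls + 1
   return function(n)
      if n == 0 then
         return 1
      else
         return n * next_step(n - 1)
      end
   end
end

local combinator = function(stepper)
   local make_step = function(next_step)
      return function(arg)
         return stepper(next_step(next_step))(arg)
      end
   end
   return function(arg)
      return stepper(make_step(make_step))(arg)
   end
end

local fact = combinator(counting_stepper)
local calls_before = calls       -- nothing has been unpacked yet
local result = fact(3)
print(calls_before, result, calls)  -- 0   6   4
```

Building the function runs no steps at all, and computing the factorial of 3 unpacks exactly four of them (for 3, 2, 1 and 0). Each `function(arg)...end` wrapper releases one more step only when it is called, which is the job of those extra lambdas in Z.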

Speaking of that, is this code the same as the straight Lua translation of the Z combinator I presented above? Let’s recap:

```lua
Z = function(f)
   return (function(x)
      return f(function (y)
         return (x(x))(y)
      end)
   end)(function(x)
      return f(function (y)
         return (x(x))(y)
      end)
   end)
end
```

Doesn’t look much like it, does it? But let’s take a closer look at our combinator. Turns out that in that final return, instead of building that step function explicitly, we can simply call `make_step` and get the same results:

```lua
combinator = function(stepper)
   local make_step = function(next_step)
      return function(arg)
         return stepper(next_step(next_step))(arg)
      end
   end
   return make_step(make_step)
end
```

(In compiler terms, it is as if the first iteration had been “unrolled” before.)
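If you want to convince yourself that the two versions agree, a small side-by-side check is enough. The names `unrolled` and `compact` below are hypothetical labels for the earlier and the shorter combinator, and the `for` loop is only test scaffolding.

```lua
local stepper = function(next_step)
   return function(n)
      if n == 0 then
         return 1
      else
         return n * next_step(n - 1)
      end
   end
end

-- Earlier version: the first step is written out explicitly.
local unrolled = function(stepper)
   local make_step = function(next_step)
      return function(arg)
         return stepper(next_step(next_step))(arg)
      end
   end
   return function(arg)
      return stepper(make_step(make_step))(arg)
   end
end

-- Shorter version: the first step also comes from make_step.
local compact = function(stepper)
   local make_step = function(next_step)
      return function(arg)
         return stepper(next_step(next_step))(arg)
      end
   end
   return make_step(make_step)
end

for n = 0, 5 do
   print(n, unrolled(stepper)(n), compact(stepper)(n))
end
```

Both columns print the same factorials, since `make_step(make_step)` returns exactly the function that the earlier version spelled out by hand.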

Remember when I said the local variables were just macro substitutions? If we expand the two instances of `make_step`, we get this:

```lua
combinator = function(stepper)
   return (function(next_step)
      return function(arg)
         return stepper(next_step(next_step))(arg)
      end
   end)(function(next_step)
      return function(arg)
         return stepper(next_step(next_step))(arg)
      end
   end)
end
```

Now, the only difference is that `stepper` is being called inside `function(arg)...end`, while in the other one it’s being called outside. It works the same, because the purpose of `function(arg)...end` there is just to “hold” the expansion of arguments between each call of `stepper`. We need this because Lua does not use lazy evaluation. To make our function just like Z, we do this:

```lua
combinator = function(stepper)
   return (function(next_step)
      return stepper(function(arg)
         return (next_step(next_step))(arg)
      end)
   end)(function(next_step)
      return stepper(function(arg)
         return (next_step(next_step))(arg)
      end)
   end)
end
```

There we are! The exact same function, only with different variable names.
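To see why the `function(arg)...end` wrapper is not optional in a language with eager evaluation, here is a sketch that transliterates Y itself into Lua, next to Z. The expectation is that Lua evaluates the argument `x(x)` before calling `f`, so Y keeps expanding until the interpreter reports a stack overflow; `pcall` is used to catch that error.

```lua
local factorial_step = function(f)
   return function(n)
      if n == 0 then
         return 1
      else
         return n * f(n - 1)
      end
   end
end

-- Direct transliteration of Y: x(x) is evaluated eagerly, before f is called.
local Y = function(f)
   return (function(x)
      return f(x(x))
   end)(function(x)
      return f(x(x))
   end)
end

-- Z: the same shape, with x(x) packed inside function(y)...end.
local Z = function(f)
   return (function(x)
      return f(function(y)
         return (x(x))(y)
      end)
   end)(function(x)
      return f(function(y)
         return (x(x))(y)
      end)
   end)
end

local y_ok = pcall(Y, factorial_step)
print(y_ok)                   -- false: Y never gets around to calling f
print(Z(factorial_step)(5))   -- 120
```

In a lazy language the plain Y would do, because `x(x)` would only be expanded when `f` asks for it. Under eager evaluation, only the wrapped form, which is what our `combinator` turned into, returns normally.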

Finally, to test it:

```lua
local factorial_step = function(f)
   return function (n)
      if n == 0 then
         return 1
      else
         return n * f(n - 1)
      end
   end
end
local factorial = combinator(factorial_step)

print(factorial(5)) -- we get 120, yet no function in our code ever calls itself
```
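Nothing in the combinator is specific to factorials. As a further sketch, the same construction can drive a step function with two “recursive” calls; `fib_step` is a hypothetical example, and it assumes we also have the `<` comparison alongside the arithmetic we already took for granted.

```lua
local combinator = function(stepper)
   return (function(next_step)
      return stepper(function(arg)
         return (next_step(next_step))(arg)
      end)
   end)(function(next_step)
      return stepper(function(arg)
         return (next_step(next_step))(arg)
      end)
   end)
end

-- One step of Fibonacci: both smaller cases are delegated to f.
local fib_step = function(f)
   return function(n)
      if n < 2 then
         return n
      else
         return f(n - 1) + f(n - 2)
      end
   end
end

local fib = combinator(fib_step)
print(fib(10)) -- 55
```

The step function decides how many times to ask for the next step; the combinator just keeps supplying it.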

As we can see, there’s nothing magical about the Y combinator: it produces recursion by using the only form of self-reference provided by the lambda calculus, which is self-application of variables. To get the loop started, since there’s no way to bind the outermost abstraction (function) to a variable so it can apply to (call) itself, it is simply written down twice.

I tried to explain this as clearly as I could (given the assumptions I made about prior knowledge of programming languages and basic lambda calculus). This was actually the flow of thought I followed when understanding the combinator. It clearly follows a “programmer” mindset, which may not be the best method for everyone, but I hope it may be as useful for others as it was for me. Corrections, comments and suggestions are welcome.