🔗 Conway’s Law applied to the industry as a whole

Melvin Conway famously said that organizations design systems that mirror their own communication structure. But what about Conway’s Law applied to the entire industry rather than a single company?

The tech industry, and open source (OSS) in particular, is now mostly shaped around the dominant communication structure: GitHub. Nadia Eghbal’s book “Working in Public” does a great job of explaining how the centralization of OSS around a big platform mirrors what happened everywhere else on the internet, as we went from personal websites to social networks.

Another huge shift in organizational and communication structure, especially in Open Source, has been the increasing concentration of maintainership: we still talk about “a loosely-knit group of contributors”, but most of the code in major OSS projects nowadays is written by employees of big companies.

The commit stats of big projects like the Linux kernel show this, as do GitHub stats and the like. There’s a long tail of small independent contributors, of course, but by volume of contributions, major projects are dominated by people hired full-time to work on them.

One thing I haven’t seen discussed a lot is how much this reality changes the way projects are run and developed. Sometimes we see it coming up in particular cases, such as the relationship between Amazon and Rust, but this is a general phenomenon.

When Canonical came onto the scene back in 2004-2005, I distinctly remember noticing their impact on OSS. It wasn’t just “more getting done” (yay?), but also a change in what got done and how: various projects shifted direction around that time (GNOME comes to mind), and it didn’t feel like a coincidence.

I don’t mean to imply it’s all bad, just that we don’t discuss enough how Big Co development styles affect, in a “Conway’s-law way”, the development of OSS, and even of tech in general, since open and closed development are so intertwined nowadays.

OSS has a big impact on how tech in general works (through the reliance of every company on OSS dependencies), and Big Cos have an impact on how OSS works (through their huge presence in the OSS developer community), so in this way they affect everybody. People bring in the experiences they know and the ways they’re used to working, from coding styles to architecture and deployment patterns to decision processes.

One example where this is especially evident is the “monorepo” discussion, which comes up in projects of all sizes nowadays, and where the experiences of Google and Facebook are often brought up.

“help our codebase is too big” no, your company is too big. try sharding into microservice entities operating as a cluster in the same management substrate rather than staying as a monolith

— @myrrlyn on Twitter

The tweet above is such a great insight: we often see conversations about how to deal with huge codebases (using the likes of Google and FB as examples) AND we often see conversations about Big Tech monopolies — and how they’ve grown way beyond the status at which other monopolies were broken up in the past — but those two topics are hardly ever linked.

If we agree that some aspects of Big Tech as organizations are negative, how many of those aspects do they bring into tech as technology practices, via Conway’s Law? OSS seems to act as a filter that makes this relationship less evident, because contributions come from individuals, even though they work for these companies.

These individuals will often, even if unknowingly, replicate practices from these companies. This is, after all, a process of cultures spreading and influencing each other. It just seems to me that we as an industry are not aware enough of this phenomenon, and we probably should be more attuned to it.

Posted by hisham on Wednesday, March 23, 2022 14:41:28 in en_US, Coding, Computing, Culture

🔗 Data Oriented Design, a.k.a. Lower Level Programming?

I’m not sure if this title is clickbaity, but it certainly summarizes some of the impressions I wanted to write about.

Yesterday I watched Andrew Kelley’s fun talk on Practical Data Oriented Design — do check it out! — and this post will contain some “spoilers” (as in, I will discuss his takeaways). I was drawn to the talk for two reasons: first, because I wanted to check if I was up-to-date on my programming TLAs, but also because he starts by talking about how he felt he had been stuck in a plateau as a programmer for the past decade — a feeling I’m sure many of us have felt at times! — and how this new knowledge got him out of it.

The bulk of the talk, and of his takeaways from refactoring his Zig compiler to use Data Oriented Design, is about getting better runtime performance by making data structures smaller, so that they are easier on the cache.

DOD techniques

Lots of the examples involved understanding struct alignment, to raise awareness of how much space gets wasted if you don’t take it into account. One technique is replacing 64-bit pointers with 32-bit array indices. He points out the assumption this implies (that we can then have at most 4G items, which is often fair) and, most importantly, that type safety is lost once you no longer have a `MyStruct*` but just a `u32`. This goes along with moving from arrays of structures to structures of arrays, so you can pack data more tightly.
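To make the padding argument concrete, here is a small sketch using Python's `ctypes`, which lays out structures the way the platform's C compiler would. The node types and field names are hypothetical, and the sizes in the comments assume a typical 64-bit platform.

```python
import ctypes

# Pointer-based node. On a typical 64-bit platform the 8-byte pointer forces
# 8-byte alignment, so the 1-byte tag is followed by 7 bytes of padding.
class NodePtr(ctypes.Structure):
    pass

NodePtr._fields_ = [
    ("next", ctypes.POINTER(NodePtr)),  # typed: can only refer to a NodePtr
    ("tag", ctypes.c_uint8),
]

# Index-based node. "next" is a position in some nodes[] array instead.
# A bare uint32 no longer says which array (or type) it refers to,
# and it caps us at 2**32 items.
class NodeIdx(ctypes.Structure):
    _fields_ = [
        ("next", ctypes.c_uint32),
        ("tag", ctypes.c_uint8),  # followed by 3 bytes of padding
    ]

print("pointer version:", ctypes.sizeof(NodePtr), "bytes")  # 16 on 64-bit
print("index version:  ", ctypes.sizeof(NodeIdx), "bytes")  # 8
```

On such a platform, each node takes half the space, at the price of the type safety discussed above.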

Another method is to apply “encodings” to data to avoid additional booleans in structs. Instead of an `enum Creature { Elf, Orc }` and a boolean `isAlive`, you use an `enum Creature { AliveElf, DeadElf, AliveOrc, DeadOrc }`, effectively moving that bit of data into the byte already used by the enum. This is not unlike packing structures using bitfields. Combining this with the switch to arrays, you can possibly avoid storing that bit altogether, by keeping two arrays, `dead_creatures` and `living_creatures`.
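Here is a minimal sketch of the encoding trick, written in Python for illustration. The numbering (low bit as the dead flag, higher bits as the kind) is my own assumption and not necessarily what the talk used. In Zig or C, the encoded value would fit in a single byte.

```python
from enum import IntEnum

class Kind(IntEnum):
    ELF = 0
    ORC = 1

# Naive layout: two fields per creature, e.g. (Kind.ORC, False).
# Encoded layout: the alive/dead bit folded into the enum value itself.
# Assumed numbering: bit 0 = dead flag, higher bits = kind.
class Creature(IntEnum):
    ALIVE_ELF = 0
    DEAD_ELF = 1
    ALIVE_ORC = 2
    DEAD_ORC = 3

def encode(kind: Kind, alive: bool) -> Creature:
    return Creature((kind << 1) | (0 if alive else 1))

def kind_of(c: Creature) -> Kind:
    return Kind(c >> 1)

def is_alive(c: Creature) -> bool:
    return (c & 1) == 0

# Going further: store no bit at all, because membership in an array carries it.
living_creatures = [Kind.ELF, Kind.ORC]
dead_creatures = [Kind.ORC]
```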

As he went through the various examples of refactors to reach this goal, one by one I kept getting this sense of deja vu: “hey, this is how we used to program in the olden days!”

8-bit coding

If you look at assembly for the 6502, the 8-bit processor used in the NES (my first game console) and the Apple II (my first computer!), you’ll see some of those tricks embedded in the processor design itself.

The 6502 is an 8-bit processor with a 16-bit address space: each instruction features a 1-byte opcode optionally followed by up to two bytes. Since the address space is 16 bits, addresses can go from 0 ($0000) to 65535 ($FFFF). So, to load a byte from memory position $1234 into the A register, you use `LDA $1234`, which takes three bytes: `AD 34 12` (yes, the 6502 is little-endian!). However, to allow for more compact code, the first 256 bytes of memory have special processor support: addresses $0000 to $00FF, the “Zero Page”. So, just like in the enum trick for `AliveElf` and `DeadElf`, the “enum of opcodes” in the 6502 processor uses a separate number for loading from the Zero Page, so `LDA $0012` encodes into two bytes only: `A5 12`. This also reminds me of switching from pointers to integers, since that one-byte offset into the Zero Page is also a half-sized index that can be used given a set of assumptions.
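The opcode choice described above can be sketched as a tiny encoder. The function name is hypothetical. The opcode values `AD` (absolute) and `A5` (zero page) are the ones quoted above for `LDA`.

```python
LDA_ABSOLUTE = 0xAD   # LDA $hhll -> 3 bytes
LDA_ZERO_PAGE = 0xA5  # LDA $ll   -> 2 bytes

def encode_lda(addr: int) -> bytes:
    """Encode LDA from a 16-bit address, choosing the shorter form when possible."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("the 6502 has a 16-bit address space")
    if addr <= 0xFF:
        # Zero Page: the one-byte operand works like a half-sized index.
        return bytes([LDA_ZERO_PAGE, addr])
    # Absolute: the address follows the opcode, low byte first (little-endian).
    return bytes([LDA_ABSOLUTE, addr & 0xFF, addr >> 8])

print(encode_lda(0x1234).hex(" "))  # ad 34 12
print(encode_lda(0x0012).hex(" "))  # a5 12
```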

Going from arrays of structs to structs of arrays is also a very old trick. In fact, I recall my earliest days of BASIC programming, when we didn’t have structs, only arrays. Storing each “attribute” in its own array was essentially the only way: if I wanted to store x/y coordinates and a name for a bunch of characters, I’d have three arrays, `XS`, `YS` and `NS$`. I also remember how, over time, using parallel arrays like this started to get frowned upon as “poor technique”, since arguably, code using arrays of structs is easier to read and maintain than code using structs of arrays, where you need to manually keep more things in sync.
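The maintenance cost of parallel arrays shows up quickly in a sketch: removing one character means touching every array. The names and data here are made up for illustration, and Python's `array` module stands in for tightly packed columns.

```python
from array import array
from dataclasses import dataclass

# Array of structs: one record per character.
@dataclass
class Character:
    x: int
    y: int
    name: str

characters = [Character(10, 20, "elf"), Character(30, 5, "orc")]

# Struct of arrays, BASIC style (XS, YS, NS$): one array per attribute.
# array("h") stores 16-bit signed integers contiguously.
xs = array("h", [10, 30])
ys = array("h", [20, 5])
names = ["elf", "orc"]

def remove_aos(i: int) -> None:
    del characters[i]  # one place to update

def remove_soa(i: int) -> None:
    # Every parallel array must be updated together, or the columns drift apart.
    del xs[i]
    del ys[i]
    del names[i]

remove_aos(0)
remove_soa(0)
```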

Refactoring for performance

And this is a common theme: all those old-school techniques being reframed in the talk as Data Oriented Design were once the norm, and they started to be phased out in the name of ease of development and maintenance. Yes, they do result in faster code (sometimes much faster code!) if you restructure your code to count each byte and optimize for cache usage. But a key word there is restructure. Writing code this way makes sense when you know how the data is used, and how it will continue to be used. I was happy to see Andrew doing real-world measurements in his talk, and he correctly points out the assumptions involved, with comments such as “if we assume that most monsters are alive”, etc.

It’s very difficult to do this from the get-go, while you’re still iterating around your problem space. But once you know the typical behavior of the program, you can rework the data to match it. And yes, that will most likely give you a performance boost, but most often not without a cost in maintainability: how does that change in the structure affect the client code that uses it?

Further, how hard would it be to change it over again if the underlying assumptions change: for example, if the usage patterns change, if we port it and the architecture changes, or if we need to add another bit of data to that structure? Sometimes those are important concerns, for example in projects that change often and fast (think of a startup evolving its product as market targets move), but sometimes projects reach a stage of maturity where you can step back, look at them and say: “Well, I think I have a good understanding of how this behaves now. What is the most memory-efficient representation for the data?”

Andrew’s case looks like a prime example of that. Once you get the tokenizer for a compiler done, you don’t really expect big seismic changes to its codebase (in fact, I think I could benefit from making some similar changes to my own Teal compiler!). A compiler is a perfect project for these kinds of techniques: it’s fairly low-level and performance-critical code. If I recall correctly, Andrew used to work for a web company before Zig, so it makes sense that the style of code he gravitated towards before was higher-level than the one he’s excited about now.

What about maintenance

Optimizing code for performance always feels like a fun puzzle, but the maintenance cost is always in the back of my mind. Even in something like a compiler, making the code “as tight as possible” can backfire, if your implementation language does not allow for proper abstractions. The difficulties in adapting LuaJIT’s C codebase to the changes in newer versions of the Lua language come to mind. One such low-level trick in that codebase hinged on the fact that 32-bit address spaces were limited to 4GB, which allowed for some neat packing of data; that assumption, which was perfectly fair in the early 2000s, became central to the implementation. Of course, 64-bit systems arrived and assumptions changed. Getting rid of that limitation in a codebase full of smart data packing turned out to be a multi-year process.
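As a generic illustration of that kind of assumption, here is a sketch that packs a 32-bit reference and a type tag into a single 64-bit word. This is not LuaJIT's actual value representation, and the names are hypothetical. Everything works until a reference no longer fits in 32 bits.

```python
MASK32 = 0xFFFF_FFFF

def pack(ref: int, tag: int) -> int:
    """Pack a 32-bit reference and a 32-bit type tag into one 64-bit word."""
    if not 0 <= tag <= MASK32:
        raise ValueError("tag does not fit in 32 bits")
    if not 0 <= ref <= MASK32:
        # The baked-in assumption: every reference fits in 32 bits (under 4 GB).
        raise OverflowError("reference does not fit in 32 bits")
    return (tag << 32) | ref

def unpack(word: int):
    """Return (ref, tag) from a packed word."""
    return word & MASK32, word >> 32
```

In a real codebase, every place that inlines shifts and masks like these depends on the assumption. That is part of why removing it later can be so costly.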

Of course, if you can get a memory-efficient representation without paying a maintenance cost, that’s the ideal situation. Some languages are better for this than others. I was impressed that Zig implements structs-of-arrays as MultiArrayList, apparently with the same client interface as a regular ArrayList, such that changing from one to the other seems to be a “5-character change”. In languages that offer no such abstraction, that’s a much more impactful change throughout a codebase (think of all the places where you’d have to change a `monsters[i]->health` into `monster_healths[i]`, and how the memory management of those arrays and their contents would change). I’ve also seen Edward Kmett pull some very cool tricks in Haskell, combining super-efficient internal representations with very clean high-level abstractions.
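Here is a rough sketch of keeping one client interface over two layouts, loosely in the spirit of ArrayList vs. MultiArrayList. The classes and method names are hypothetical and are not Zig's API. The point is only that the client function `damage_all` does not change when the layout does.

```python
from array import array
from dataclasses import dataclass

@dataclass
class Monster:
    health: int
    kind: int

class MonsterListAoS:
    """Array of structs: a plain list of Monster records."""
    def __init__(self):
        self._items = []
    def append(self, m: Monster) -> None:
        self._items.append(m)
    def health(self, i: int) -> int:
        return self._items[i].health
    def set_health(self, i: int, value: int) -> None:
        self._items[i].health = value
    def __len__(self) -> int:
        return len(self._items)

class MonsterListSoA:
    """Struct of arrays: one packed column per field, same interface."""
    def __init__(self):
        self._health = array("i")
        self._kind = array("B")
    def append(self, m: Monster) -> None:
        self._health.append(m.health)
        self._kind.append(m.kind)
    def health(self, i: int) -> int:
        return self._health[i]
    def set_health(self, i: int, value: int) -> None:
        self._health[i] = value
    def __len__(self) -> int:
        return len(self._health)

def damage_all(monsters, amount: int) -> None:
    # Client code: identical for both layouts.
    for i in range(len(monsters)):
        monsters.set_health(i, max(0, monsters.health(i) - amount))

for monsters in (MonsterListAoS(), MonsterListSoA()):
    monsters.append(Monster(health=10, kind=0))
    monsters.append(Monster(health=3, kind=1))
    damage_all(monsters, 5)
    print([monsters.health(i) for i in range(len(monsters))])  # [5, 0]
```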

In conclusion…

Still, I think it’s nice that some “old-school” techniques are getting a fresh coat of paint and are being revisited. We all benefit from being more performance-conscious, and thinking about performance also means thinking about when to optimize.

There’s something to be said about bringing back “old-school” programming techniques, though, especially for those of us old enough to remember them: the trade-offs on modern architectures are definitely different. Andrew raises a good point about memoization vs. recomputation: the kinds of things you should choose to memoize when coding for the 6502 processor on an NES are very different from those for a modern x86-64. So it’s actually good that those things are being rethought rather than just rehashed; there’s too much outdated advice out there, especially regarding performance.
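To illustrate the memoization vs. recomputation trade-off, here is a sketch of two ways to count the set bits in a byte. One uses a precomputed 256-entry table, the kind of lookup table commonly used on 8-bit machines. The other is a small loop that recomputes the answer each time. Which one wins depends on the machine and the language, so the sketch only shows how you might measure, not which will be faster.

```python
import timeit

# Memoized: a 256-entry table, computed once up front.
POPCOUNT_TABLE = bytes(bin(i).count("1") for i in range(256))

def popcount_table(byte: int) -> int:
    return POPCOUNT_TABLE[byte]

# Recomputed: clear the lowest set bit until none are left.
def popcount_loop(byte: int) -> int:
    count = 0
    while byte:
        byte &= byte - 1
        count += 1
    return count

# Measure instead of guessing; results vary by machine and interpreter.
for fn in (popcount_table, popcount_loop):
    seconds = timeit.timeit(lambda: fn(0xAB), number=100_000)
    print(f"{fn.__name__}: {seconds:.4f}s")
```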

The one piece of advice regarding performance that never gets old is: measure. And keep measuring, to see if the tricks you’re keen on using still make sense as the years go by! Another conclusion we get from this is that optimization and abstraction are not at odds with each other; in fact, combining them, across language and application levels, is the right way to do it, so that we can keep both the performance and the high-level code. But that’s probably a subject for another time!

Posted by hisham on Saturday, February 19, 2022 15:08:36 in en_US, Coding, Computing, Language

🔗 The algorithm did it!

Earlier today, statistician Kareem Carr posted this interesting tweet, about what people out there mean when they say “algorithm”, which I found to be a good summary:

When people say “algorithms”, they mean at least four different things:

1. the assumptions and description of the model

2. the process of fitting the model to the data

3. the software that implements fitting the model to the data

4. The output of running that software
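To make the distinction concrete, here is a toy example of my own, not something from Carr's thread. It fits a straight line to made-up data, and the comments mark each of the four meanings. Only the fitting procedure is an “algorithm” in the strict CS sense. Choosing the model and the data are human decisions.

```python
# 1. The model: the assumption that y is roughly a*x + b.
def model(a: float, b: float, x: float) -> float:
    return a * x + b

# 2. The process of fitting the model to the data: ordinary least squares.
# 3. The software: this particular function implementing that process.
def fit(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# 4. The output: what running that software on some (made-up) data produces.
a, b = fit([1, 2, 3, 4], [3, 5, 7, 9])
print(f"a = {a}, b = {b}")  # a = 2.0, b = 1.0
```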

Unsurprisingly, this elicited a lot of responses from computer scientists, raising the point that this is not what the word algorithm is supposed to mean (you know, a well-defined sequence of steps transforming inputs into outputs, the usual CS definition), including a response from Grady Booch, a key figure in the history of software engineering.

I could see where both of them were coming from. I responded that Carr’s original tweet was not about what programmers mean when we say “algorithms”, but about what laypeople mean when they say it or read it in the media. And understanding this distinction is especially important because variations of “the algorithm did it!” are the new favorite excuse of policymakers in companies and governments alike.

Booch responded to me, clarifying that his point was that “even most laypeople don’t think any of those things”, which I agree with. People have a fuzzy definition of what an algorithm is, at best, and I think Carr’s list covers rather well the various things that are actually responsible for the effects that people attribute to a vague notion of “algorithm” when they use that term.

Booch also added that “it’s appropriate to establish and socialize the correct meaning of words”, which simultaneously extends the discussion to a wider scope and also focuses it to the heart of the matter about the use of “algorithm” in our current society.

You see, it’s not about holding on to the original meaning of a word. I’m sure a few responses to Carr were of the pedantic variety, “that’s not what the dictionary says!” kind of thing. But that’s short-sighted, taking a prescriptivist rather than descriptivist view of language. Most of us who care about language are past that debate now, and those of us who adhere to the sociolinguistic view of language even celebrate the fact that language shifts, adapts and evolves to suit the use of its speakers.

Shriram Krishnamurthi, CS professor at Brown, joined in on the conversation, observing this shift in the language as a fait accompli:

I’ve been told by a public figure in France (who is herself a world-class computer scientist) — who is sometimes called upon by shows, government, etc. — that those people DO very much use the word this way. As an algorithms researcher it irks her, but that’s how it is.

Basically, we’ve lost control of the word “algorithm”. It has its narrow meaning but it also has a very broad meaning for which we might instead use “software”, “system”, “model”, etc.

Still, I agreed with Booch that this is a fight worth fighting. Not to preserve our cherished technical meaning of the term, to the dismay of the pedants among our ranks, but because of the very circumstances that led to this linguistic shift.

The use of “algorithm” as a vague term to mean “computers deciding things” has a clear political intent: shifting blame. Social networks boosting hate speech? Sorry, the recommendation algorithm did it. Racist bias in criminal systems? Sorry, it was the algorithm.

When you think about it, from a linguistic point of view, it is as nonsensical as saying that “my hammer assembled the shelf in my living room”. No, I did, using the hammer. Yet, people are trained to use such constructs all the time: “the pedestrian was hit by a car”. Note the use of passive voice to shift the focus away from the active subject: “a car hit a pedestrian” has a different ring to it, and, while still giving agency to a lifeless object, is one step closer to making you realize that it was the driver who hit the pedestrian, using the car, just like it was I who built the shelf, using the hammer.

This of course leads to the “guns don’t kill people, people kill people” response. Yes, it does, and the exact same questions regarding guns also apply regarding “algorithms” — and here I use the term in the “broader” sense as put forward by Carr and observed by Krishnamurthi. Those “algorithms” — those models, systems, collections of data, programs manipulating this data — wield immense power in our society, even, like guns, resulting in violence, and like guns, deserving scrutiny. And when those in possession of those “algorithms” go under scrutiny, they really don’t like it. One only needs to look at the fallout resulting from the work by Bender, Gebru, McMillan-Major and Mitchell, about the dangers of extremely large language models in machine learning. Some people don’t like hearing the suggestion that maybe overpowered weapons are not a good idea.

By hiding all those issues behind the word “algorithm”, policymakers will always find a friendly computer scientist available to say that yes, an algorithm is a neutral thing, after all, it’s just a sequence of instructions, and they will no doubt profit from this confusion of meanings. And I must clarify that by policymakers I mean those in both the public and private spheres, since the policies put forward by the private tech giants on their platforms, where we spend so much of our lives, affect our society as much as public policies do nowadays.

So what do we do? I don’t think it is productive to start well-actually-ing anyone who uses “algorithm” in the broader sense, with a pedantic “Let me interject for a moment — what you mean by algorithm is in reality a…”. But it is productive to spot when this broad term is being used to hide something else. “The algorithm is biased” — What do you mean, the outputs are biased? Why, is the input data biased? The people manipulating that data created a biased process? Who are they? Why did they choose this process and not another? These are better interjections to make.

These broad systems described by Carr above ultimately run on code. There are algorithms inside them, processing those inputs, generating those outputs. The use of “algorithm” to describe the whole may have started as a harmless metonymy (like saying “White House” to refer to the entire US government), but it has since proven very useful as a deflection tactic. By using a word that people don’t understand, the message is “computers doing something you don’t understand and shouldn’t worry about”, using “algorithm” handwavily to steer people’s minds away from the policy issues around computation, the same way “cloud” is used with data: “your data? don’t worry, it’s in the cloud”.

Carr is right: these are all things that people refer to as “algorithms” nowadays. Krishnamurthi is right: this broad meaning is a reality in modern language. And Booch is right when he says that “words matter; facts matter”.

Holding words to their stricter meanings merely due to our love for the language-as-we-were-taught is a fool’s errand; language changes whether we want it or not. But our duty as technologists is to identify the interplay of the language, our field, and society, how and why they are being used (both the language and our field!). We need to clarify to people what the pieces at play really are when they say “algorithm”. We need to constantly emphasize to the public that there’s no magic behind the curtain, and, most importantly, that all policies are due to human choices.

Posted by hisham on Wednesday, March 31, 2021 17:39:33 in en_US, Coding, Philosophy, Computing, Culture, Freedom, Language, Politics

🔗 A love letter to bands, in music and code

This starts with the Beatles, but it’s not about them. It’s about us.

So, a friend tweeted earlier today the question “what is the final Beatles song”, and as usual, these topics lead to fun conversations. You see, the “end” of the Beatles is a fuzzy matter, because they recorded their last two albums in 1969 and 1970, but released them in reverse order: Abbey Road was recorded last and released first, and Let it Be was the other way around. And then, do the Anthology sessions from 1994 with Paul, George and Ringo playing over an old recording of John count? (I feel they do!)

As the conversation went on, she shared with me this little story (on everything2! they’re still around!!), which only drives its point home really deep if you’re a hardcore Beatles fan, but one element in it is that it imagines a Beatles album made from songs that the individual members released immediately post-breakup. That in itself is not very remarkable: building those imaginary what-if-they-stayed-together album setlists is a favorite pastime for Beatles fans. Invariably, the semi-obvious conclusion is that if you took the best tracks from each of the four solo albums from a given year, you’d make an album that’s better than the individual works. And as you can imagine, those playlists are widely available. Still, those always sound like compilations to my ears (and, without giving away too much, the story linked above addresses that beautifully).

To me, the fact that such fan-fictional albums sound like compilations and not like true albums has to do with why I think collective works (be it in music or not) tend to be better in general than solo efforts: work done in a collective yields results of a different nature.

“People change like colors bleed,
As I sense a shade of you in me”
— “Color Bleed pt. 2”, from the album Color Bleed (2011)

Work done by two or more people always reflects that multitude; it’s always unlike any single person’s work. When I’m in a collective environment (and by that I mean any setting where my work is presented to and discussed by others as it is developed), even when I’m doing work completely on my own, even before I’ve had my first piece of feedback, I feel a sort of mind game playing in my head where I “play the part” of my peers and imagine what their feedback would be, be it consciously or subconsciously. I’m doing the work not only for myself, but for others too, whose opinions I care about. That affects the results of what I do in subtle and not-so-subtle ways. Even the work I do alone, when in such an environment, is different and better than the work I do alone-alone.

When I recorded Color Bleed, I invited my friends who had played in a Pink Floyd cover band with me to join in. Even though I wrote all the songs (some written years prior, some written as the project took shape), all of them ended up influenced by the fact that I was playing the songs with them, and even though I tried to venture away from the cover stuff we were doing, there are clearly some floydian moments here and there, which maybe don’t sound so much like Pink Floyd to my ears, but certainly allude to our favorite moments on stage together (and let me tell you, playing keyboards along with another keyboardist is really cool!). By the time we were recording the original songs, the goal was never to sound like the cover band, but for it to sound like us. So even if that wasn’t truly a “band effort” in the idealized “four people in a garage” sense, it was never a lonely project: from the very beginning it was me and Coutinho, who was the other keyboard player in our band, who also owned the studio and took the role of producer/engineer, and then the other folks joined in contributing parts as we went. The end result was much better than anything I could achieve on my own, and most importantly, even the things that I did were better than if I had done them on my own.

This lineup ended up playing only one gig! I swear that if this pandemic is ever over, I’ll get the band together for a Color Bleed 10th Anniversary concert.

That feeling of being “in a band” is not exclusive to music. I definitely felt it as a software developer as well. At my last job, I made the interesting observation that it was the third time in my career that I was part of a team called “Core Team”. The first one was back in college, and it was the most special one — maybe because it was the first, maybe because it was the experience that shaped the rest of my career: the Core Team for GoboLinux, my first successful open source project.

Looking back, it’s funny how it started very much like Color Bleed. Back in 2002, I was at the university and I had this idea for a crazy Linux distribution which would require recompiling the entire system from scratch. A friend joined in and we did it together over the course of a weekend. One by one, more friends joined in, switched over their systems, created bootable CDs, made kernel patches adding cool features, we were just having lots of fun tinkering with the OS. Then we were slashdotted (think “HN frontpage x10”), then we were on a magazine cover, photo shoots and all, then we were invited for internships in Silicon Valley, all the way from Brazil. I never took music seriously, but at least in tech I had the “indie band having its one-hit-wonder moment” experience. And yes, it was as cool as it sounds.

Cover story! And a cool band pic!

There we are, next to KDE and Gentoo! We’re “for real”!

The second Core Team was at a local startup, where I worked with Guilherme, who was also in GoboLinux. Since we already had the chemistry from that project, this really felt more like a side project than a new band — think Petrucci and Portnoy doing LTE away from Dream Theater (yes, I just blatantly made that comparison haha).

In between the first and second Core Teams I got my Master’s degree, and between the second and third, I got my PhD. I loved it at PUC-Rio (or else I wouldn’t have gone there twice!), and made great friends each time, but it never felt like a band. On both occasions we had a research group, but each person was running their own project, with little or no overlap. Opportunities for collaboration were limited, everyone was on a different schedule, and while we created a great environment which I’ve called home for many years, it just wasn’t “a band”.

The third Core Team was at my last gig, at Kong. Again, that felt like a band: a scattered group of hackers from all over the world (China, Finland, Spain, Brazil, Peru, Canada, US), brought together because of their Lua knowledge to maintain this open source project. Each one had very different skills and backgrounds, and they were complementary: it felt like each one of us played a different instrument. And doing creative work as part of that group felt like doing it in a band context: even when I did stuff on my own, I had it in my mind that it was being done for that particular team to review and maintain (even if each of us would still put our personal flavor into the code). I had a great 3½ years with that team, where I learned a lot and played different instruments (I mean, roles), and then I put into practice something that I learned from the many bands I played in since I was a teenager: leave on a high note. Looking back, the only regret I have is that… apparently we never took a picture of our team? (To be honest, I’m not sure we ever got the last lineup of the team together in the same room; it was supposed to happen in 2020, but then the pandemic hit. Maybe a pic of some previous lineup at least?)

Not all of my coding projects were “bands”. Even though I got tons of pull requests with contributions over the years, the process of developing htop was always a solo endeavor. I liked it this way; for a good while it was my chill-on-my-own thing, away from everything and everyone. But then I drifted away from it, and free/open source software (FOSS) projects need maintenance. From a distance, I feel like the new team who picked it up works like a band. I’m happy for them!

Maybe it’s better if FOSS projects work more like bands than solo projects; bands often outlast their members, after all. But then sometimes you just want to pick up an acoustic guitar and do stuff on your own. There’s got to be a place for that too. Now that I think of it, I’ve never been part of a really huge FOSS project (I have a tendency to start projects rather than join established ones!), and I don’t know if this “band mentality” of mine has prevented any of my projects from growing (whenever I read about the structure of the Rust project, even before the Foundation, it seemed super sprawling!), but I know that a team can only feel like a team when it is about the size of a band, and I know that a team feels best when it feels like a band.

Not all teams feel like a band, and to be honest, not even all bands. But when it happens, it’s somewhat magical. It’s something that builds memories that you take with you forever, and which change you in some way or another. Whenever I listen to the solo works from my favorite artists who left my favorite bands, I can always tell that the influence of their old bandmates is obviously there, whether they want it or not, whether they’re Paul McCartney or John Lennon, David Gilmour or Roger Waters. I’m sure the influences from all the great people from my past are there whenever I play, and whenever I code.

Posted by hisham on Friday, March 26, 2021 01:55:27 in en_US, Coding, Music, Culture

🔗 Compiler versus Transpiler: what is a compiler, anyway?

Teal was featured on HN today, and one of the comments was questioning the fact that the documentation states that it “compiles Teal into Lua”:

We need better and more rigorous terms in computing science. This use of the compiler word blurs the meaning of interpreted vs compiled languages.

I was under the assumption that it would generate executable machine code, not Lua source code.

I thought that was worth replying to, because it allowed me to dispel two misconceptions at once.

First, if we want to be rigorous about computer science terms, calling it “interpreted vs compiled languages” is a misnomer, because being interpreted or compiled is not a property of the language, but of the implementation. There have been things such as a C interpreter and an ahead-of-time compiler for PHP which generates machine code.

But then, we get to the main course, the use of “compiler”.

The definition of compiler has never assumed generating executable machine code. Already in the 1970s, Pascal compilers generated P-code (a form of “bytecode” in Java parlance), which was then interpreted. In the 1980s, Turbo Pascal produced machine code directly.
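To illustrate that being “interpreted” or “compiled” belongs to the implementation, here is a sketch of two implementations of the same tiny expression language. The first is a tree-walking interpreter. The second is a compiler to a stack-based bytecode plus a small virtual machine that runs it. The bytecode is P-code only in spirit, not actual P-code, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Num:
    value: int

@dataclass
class Add:
    left: "Expr"
    right: "Expr"

@dataclass
class Mul:
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Add, Mul]

# Implementation 1: a tree-walking interpreter.
def interpret(e: Expr) -> int:
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Add):
        return interpret(e.left) + interpret(e.right)
    return interpret(e.left) * interpret(e.right)

# Implementation 2a: a compiler to a tiny stack bytecode...
def compile_expr(e: Expr, code: list = None) -> list:
    code = [] if code is None else code
    if isinstance(e, Num):
        code.append(("PUSH", e.value))
    else:
        compile_expr(e.left, code)
        compile_expr(e.right, code)
        code.append(("ADD",) if isinstance(e, Add) else ("MUL",))
    return code

# Implementation 2b: ...plus a virtual machine that interprets the bytecode.
def run(code: list) -> int:
    stack = []
    for instr in code:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        else:
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs if op == "ADD" else lhs * rhs)
    return stack.pop()

program = Add(Num(2), Mul(Num(3), Num(4)))
print(compile_expr(program))
# [('PUSH', 2), ('PUSH', 3), ('PUSH', 4), ('MUL',), ('ADD',)]
print(interpret(program), run(compile_expr(program)))  # 14 14
```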

I’ve seen the neologism “transpiler” being very frowned upon by the academic programming language community precisely because a compiler is a compiler, no matter the output language — my use of “compiler” there was precisely because of my academic background.

I remember people on Academic PL Twitter jokingly calling it the “t-word”, even. I just did a quick Twitter search to see if I could find it, and I found a bunch of references dating from 2014 (though I won’t go linking people’s tweets here). But that shows how out-of-date this blog post is! Academia has pretty much settled on not using the “compiler” vs. “transpiler” distinction at all by now.

I don’t mind the term “transpiler” myself if it helps non-academics understand it’s a source-to-source compiler, but then, you don’t see people calling the Nim compiler, which generates C code then compiles it into machine code, a “transpiler”, even though it is a source-to-source compiler.

In the end, “compiler” is the all-encompassing term for a program that takes code in one language and produces code in another, be it high-level or machine language. And yes, that means that, pedantically, an assembler is a compiler as well (but we don’t want to be pedantic, right? RIGHT?). And since we’re talking assemblers, most C compilers do not generate executable machine code either: gcc produces assembly, which is then turned into machine code by gas. So is gcc a source-to-source compiler? Is Turbo Pascal more of a compiler than gcc? I could just as well add an output step to the Teal compiler to produce an executable, using the same techniques as the Pascal compilers of the 70s. I don’t think that would make it more or less of a compiler.
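In the same spirit, a single front end can feed back ends with very different outputs. This sketch compiles the same expression tree into Lua source text and into assembly-like text for a hypothetical stack machine. Nothing in the structure of the two functions marks one as a “transpiler” and the other as a “compiler”. The expression format and instruction names are made up for illustration.

```python
# Expressions as nested tuples: ("num", n) or (op, left, right), op in "+", "*".

def compile_to_lua(e) -> str:
    """Back end A: emit Lua source text."""
    if e[0] == "num":
        return str(e[1])
    return "(" + compile_to_lua(e[1]) + " " + e[0] + " " + compile_to_lua(e[2]) + ")"

def compile_to_asm(e) -> str:
    """Back end B: emit assembly-like text for a hypothetical stack machine."""
    lines = []

    def emit(node):
        if node[0] == "num":
            lines.append(f"push {node[1]}")
        else:
            emit(node[1])
            emit(node[2])
            lines.append("add" if node[0] == "+" else "mul")

    emit(e)
    return "\n".join(lines)

program = ("+", ("num", 2), ("*", ("num", 3), ("num", 4)))
print("local result = " + compile_to_lua(program))  # local result = (2 + (3 * 4))
print(compile_to_asm(program))
```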

As you can see, the distinction of “what is a transpiler” reduces to “what is source code” or “what is a high-level language”, the latter especially having a very fuzzy definition, so in the end my sociological observation on the uses of “transpiler” vs. “compiler” tends to boil down to people’s prejudices of “what makes it a Real, Hardcore Compiler”. But being a “transpiler” or not doesn’t say anything about the project’s “hardcoreness” either — I’m sure the TypeScript compiler which generates JavaScript is a lot more complex than a lot of compilers out there which generate machine code.

Posted by hisham on Thursday, February 25, 2021 16:27:01 in en_US, Coding, Computing, Language
