Assemblers are not compilers!

For some reason, people (read: newbies) often talk about ‘compiling assembly code’, or using an ‘assembly compiler’. This is WRONG, people! And I will try to explain why, by offering a historical perspective, as usual.

Let's go back to the early days of programming. The first programmable computers took their input in the form of machine code: each instruction was encoded as a set of bits, consisting of an opcode (the operation itself, such as ‘add’ or ‘subtract’) and its operands (the data to operate on, such as a constant, a register, or a memory location).

When writing a program, the programmer would first outline it in pseudocode and then convert it to machine code by hand. To make this easier, each opcode was given a ‘mnemonic’: a short name describing the instruction. The programmer could then write the program down as a listing of mnemonics, one per instruction, and convert each instruction to its machine code representation.

Here is an example of some code written by Steve Wozniak (the co-founder of Apple, and designer of the early Apple computers):

[Image: Steve Wozniak's floating point routines]

As you can see, he’s written out the code in a number of columns. The first column is the memory address of each instruction, the second column contains the machine code bytes for that instruction, then follows the mnemonic representation of the code (the ‘human readable form’), and finally some comments.

Initially you had to do all of this by hand. The process of converting mnemonics to machine code became known as ‘assembling’ the code, and the automated tools developed for it became known as ‘assemblers’. The first assembler was written for the EDSAC computer in 1949, and was called “Initial Orders”.
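To make the idea of ‘assembling’ concrete, here is a minimal sketch of a toy assembler that does exactly what the hand process did: look up each mnemonic and addressing mode, emit the bytes, and track the address. It knows only a handful of real 6502 opcodes (operands are written as `$nn` for zero page and `#$nn` for immediate); the origin address $0300 and the printed column layout are assumptions chosen to mimic the address/bytes/mnemonic columns described above, not Wozniak's actual tooling.

```python
# Real 6502 opcode bytes for a few (mnemonic, addressing mode) pairs.
OPCODES = {
    ("CLC", "imp"): 0x18,   # implied: no operand
    ("LDA", "imm"): 0xA9,   # immediate: #$nn
    ("LDA", "zp"): 0xA5,    # zero page: $nn
    ("ADC", "imm"): 0x69,
    ("ADC", "zp"): 0x65,
    ("STA", "zp"): 0x85,
    ("RTS", "imp"): 0x60,
}

def parse_operand(text):
    """Return (mode, value) for an empty, '#$nn' or '$nn' operand."""
    if not text:
        return "imp", None
    if text.startswith("#$"):
        return "imm", int(text[2:], 16)
    if text.startswith("$"):
        return "zp", int(text[1:], 16)
    raise ValueError(f"unsupported operand: {text!r}")

def assemble(lines, origin=0x0300):
    """Translate each mnemonic line to bytes (a plain table lookup).
    Returns listing rows of (address, machine code bytes, source text).
    Unsupported mnemonic/mode combinations raise KeyError."""
    listing = []
    address = origin
    for line in lines:
        mnemonic, _, operand = line.strip().partition(" ")
        mode, value = parse_operand(operand.strip())
        code = [OPCODES[(mnemonic, mode)]]
        if value is not None:
            code.append(value)  # every mode used here takes one operand byte
        listing.append((address, code, line.strip()))
        address += len(code)
    return listing

program = ["CLC", "LDA $10", "ADC #$05", "STA $11", "RTS"]
for addr, code, src in assemble(program):
    print(f"{addr:04X}-  {' '.join(f'{b:02X}' for b in code):<9} {src}")
```

Note that the assembler makes no decisions beyond the table lookup: the programmer has already chosen every instruction, operand and memory location.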

The first compiler, however, was written by Grace Hopper in 1952, for the A-0 programming language. That in itself already shows that compiling and assembling were not seen as the same thing: if assemblers already existed at the time, why would Grace Hopper bother to coin the term ‘compiler’ rather than simply reuse the term ‘assembler’ for this new tool? Apparently there is a fundamental difference between the two types of tools.

The main difference between assembly language and other programming languages such as the A-0 language is that assembly languages are always machine-dependent (after all, the mnemonics are merely a more human-readable form of the instructions that the machine supports, so different machines have different mnemonics), whereas other programming languages abstract away the physical machine and work at a higher level. Many early compilers also translated the higher-level source listings into a machine-specific assembly listing, which was then passed on to an assembler to generate the actual machine code.
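The compile-then-assemble arrangement can be sketched with a deliberately tiny ‘compiler’. It accepts statements of the form `target = term + term + ...` (terms are variable names or small decimal constants) and emits 6502 assembly text. The point is that the source never mentions addresses or instructions: the compiler decides that each variable lives at some zero-page address. The allocation policy (handing out addresses from $10 upwards) is a hypothetical choice for illustration, and 8-bit overflow is ignored.

```python
def compile_statement(stmt, symbols):
    """Compile 'target = term + term + ...' into 6502 assembly text.
    `symbols` maps variable names to zero-page addresses and is filled in
    by the compiler as new variables are encountered."""
    target, expr = (s.strip() for s in stmt.split("="))
    terms = [t.strip() for t in expr.split("+")]

    def operand(term):
        if term.isdigit():
            return f"#${int(term):02X}"          # immediate constant
        if term not in symbols:
            symbols[term] = 0x10 + len(symbols)  # hypothetical allocation policy
        return f"${symbols[term]:02X}"           # zero-page variable

    asm = [f"LDA {operand(terms[0])}"]
    for term in terms[1:]:
        # ADC adds the carry flag too, so clear it before every addition.
        asm += ["CLC", f"ADC {operand(term)}"]
    asm.append(f"STA {operand(target)}")
    return asm

symbols = {}
for line in compile_statement("x = a + b + 5", symbols):
    print(line)
print(symbols)
```

The output is plain assembly text; turning it into bytes is left to a separate assembler, just as in the early compiler pipelines.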

Since a compiler works from a higher-level source listing, it also needs to perform a more complex translation than an assembler. For an assembler there is generally a 1:1 mapping from mnemonics to machine code. There are some exceptions where a single instruction can be encoded in multiple ways, but whatever decisions an assembler does make during translation are trivial and unambiguous (for example, picking the shortest encoding for a given instruction). For this reason, writing assembly code is as good as writing machine code by hand, as far as performance and size optimizations go.
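As an illustration of the kind of trivial, unambiguous choice meant here, consider the 6502 `LDA` instruction, which has a 2-byte zero-page form (opcode $A5) and a 3-byte absolute form (opcode $AD). The sketch below picks the shorter form whenever the address fits in one byte, and also shows the reverse direction: a byte sequence decodes back to exactly one instruction. Only these two real opcodes are covered; the function names are hypothetical.

```python
LDA_ZERO_PAGE = 0xA5  # 2 bytes: opcode, 8-bit address
LDA_ABSOLUTE = 0xAD   # 3 bytes: opcode, 16-bit address (little-endian)

def encode_lda(address):
    """Assemble 'LDA address', choosing the shortest valid encoding.
    This fixed rule is the only 'decision' the assembler makes."""
    if address < 0x100:
        return bytes([LDA_ZERO_PAGE, address])
    return bytes([LDA_ABSOLUTE, address & 0xFF, address >> 8])

def decode_lda(code):
    """Disassemble: each byte sequence maps back to exactly one instruction."""
    if code[0] == LDA_ZERO_PAGE:
        return f"LDA ${code[1]:02X}"
    if code[0] == LDA_ABSOLUTE:
        return f"LDA ${code[1] | (code[2] << 8):04X}"
    raise ValueError("not an LDA opcode")

for addr in (0x10, 0x1234):
    code = encode_lda(addr)
    print(code.hex(" "), "->", decode_lda(code))
```

Compare this with the register allocation problem in the next paragraph, where there is no single obvious answer.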

Compilers, however, need to map the variables used in the source code to machine-specific registers and memory locations, and try to find the shortest and/or fastest sequence of instructions that implements the code. This raises complicated problems such as register allocation and reuse. There is no 1:1 translation from high-level keywords and expressions to machine code: there are many possible alternatives, and a compiler needs to do a lot of analysis and use clever heuristics to come up with fast code.
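One classic textbook answer to a small piece of this problem is Sethi-Ullman numbering: for an expression tree, compute how many registers each subtree needs, and evaluate the more demanding subtree first so the whole expression fits in as few registers as possible. The sketch below is a simplified version (every leaf needs one register, no spilling, enough registers assumed) that emits code for a hypothetical three-address machine with registers named R0, R1, ...; it is not how any particular production compiler works.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str        # e.g. "ADD", "SUB", "MUL"
    left: "Node"
    right: "Node"

Node = Union[Var, BinOp]

def need(node):
    """Sethi-Ullman number: registers needed to evaluate node without spilling."""
    if isinstance(node, Var):
        return 1
    l, r = need(node.left), need(node.right)
    return max(l, r) if l != r else l + 1

def gen(node, regs, out):
    """Append instructions to `out`; the result ends up in regs[0].
    `regs` lists the registers this subtree may use."""
    if isinstance(node, Var):
        out.append(f"LOAD {regs[0]}, {node.name}")
        return
    if need(node.left) >= need(node.right):
        gen(node.left, regs, out)        # left result in regs[0]
        gen(node.right, regs[1:], out)   # right result in regs[1]
        out.append(f"{node.op} {regs[0]}, {regs[0]}, {regs[1]}")
    else:
        gen(node.right, regs, out)       # heavier right side first, into regs[0]
        gen(node.left, regs[1:], out)    # left result in regs[1]
        out.append(f"{node.op} {regs[0]}, {regs[1]}, {regs[0]}")  # keep operand order

# a + b * c: evaluating b * c first needs 2 registers instead of 3.
code = []
gen(BinOp("ADD", Var("a"), BinOp("MUL", Var("b"), Var("c"))), ["R0", "R1", "R2"], code)
print("\n".join(code))
```

The source expression says nothing about registers or evaluation order; both are the compiler's choice, and a different strategy would produce different but equally valid machine code.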

Especially in the early days, compilers were rather naïve, and their output came nowhere near hand-optimized assembly code. These days, however, compilers have come a long way, and, perhaps just as important, so have computers. Early computers were designed with handwritten programs in mind, but as more and more people started using compiled languages, computers were increasingly designed to make the job of compilers easier. Compilers generate code algorithmically, so they tend to use only a subset of the available instructions. RISC CPUs reduced the instruction set to the most-used instructions and made those instructions run as fast as possible, which in turn made the job of compilers easier.

This entry was posted in Software development.

20 Responses to Assemblers are not compilers!

  1. snemarch says:

    I’m not entirely sure what the point of this blog entry is?

    And I’d say the premise is wrong. A compiler, comp.sci. wise, is a program that transforms one computer language into another – this covers a wide range of uses, e.g.: the original C++ compilers that had C source code as output… compilers that have assembly as output… compilers that have machine code or bytecode as output… and even “assembly compilers” or, simply, assemblers.

    It does irk me a bit when people use the term “assembly compiler” or ” compile the assembly code” (especially since they’ll often say “assembler code” rather than “assembly” – come on, people, grammar ain’t that hard: an assembler assembles assembly code), but it’s not technically wrong.

    If you insist on a sharp separation, where would you draw the line, anyway? Branch-size optimizations? Macros? Optimizations that don’t ensure a predictable 1:1 mapping from input to output? If you define it as 1:1 mapping, then you’d kinda have to lump some of the early and naïve non-assembly compilers into your assembly category, wouldn’t you?

    • Scali says:

      The point of this blog is that people should stop calling assemblers compilers. As I point out: the term ‘compiler’ would never have existed if they were the same thing. Therefore compilers cannot be assemblers, so assemblers are not compilers.

      I believe I already made the separation: assemblers do not do any analysis or optimization of the code, merely very trivial, hardcoded choices (as I say, picking the shortest encoding for an instruction… which is an exceptional case anyway, since x86 is one of the few CPUs that even allow multiple encodings, and only in certain variations of assembly language, since some require you to specify the size of each operand explicitly, bringing it back to a 1:1 translation).

      Macros are part of the pre-processor, and are not assembled any more than templates in C++ are compiled. They are pre-processed, and the resulting code is assembled/compiled. This pre-processing might as well have been done by a separate program, and is not necessarily a function of an assembler or compiler.

      And no, I don’t think other compilers would ever qualify as a 1:1 mapping, since you simply do not specify each instruction and operand specifically (as I said: assembly language is machine-specific by definition, compiled languages are not). At the very least you could compile the same source code with any distribution of general purpose registers.

      And just as with the chiptune blog earlier: a lot of people may disagree, but that doesn’t make them right, historically. And like with chiptunes, I am not interested in trying to find some generic definition that catches all corner-cases anyone can think of. The point is to understand different levels of source code and different approaches to translating them from one form to another.
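The x86 multiple-encoding quirk mentioned in the reply above can be made concrete. A 32-bit register-to-register `add eax, ebx` can be encoded with opcode 01 (ADD r/m32, r32) or opcode 03 (ADD r32, r/m32), with the two registers swapping places in the ModRM byte. The sketch covers only this register-to-register form; the helper names are hypothetical, while the register numbers and ModRM layout follow the standard x86 encoding.

```python
# Standard x86 numbering of the 32-bit general purpose registers.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def modrm(reg, rm):
    """ModRM byte: mod=11 (register operand), then 3-bit reg, then 3-bit r/m."""
    return 0b11_000_000 | (REG32[reg] << 3) | REG32[rm]

def add_r32_r32(dst, src, form="01"):
    """Encode 'add dst, src' in one of its two register-register forms.
    form "01": ADD r/m32, r32 -> dst in the r/m field, src in the reg field.
    form "03": ADD r32, r/m32 -> dst in the reg field, src in the r/m field."""
    if form == "01":
        return bytes([0x01, modrm(src, dst)])
    if form == "03":
        return bytes([0x03, modrm(dst, src)])
    raise ValueError(f"unknown form: {form}")

print(add_r32_r32("eax", "ebx", "01").hex(" "))  # 01 d8
print(add_r32_r32("eax", "ebx", "03").hex(" "))  # 03 c3
```

Both byte sequences perform the same operation; an assembler simply picks one by a fixed convention.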

  2. snemarch says:

    Well, I don’t agree. The process of assembling is a form of compilation, albeit a very limited and simple one. Assembling is a specific (and narrow) form of compilation, as osmosis is a specific form of diffusion.

    I’m actually surprised you don’t agree, given that you’ve taken a comp.sci. degree.

    Anyway, I wouldn’t use the term “assembly compiler” myself, as it sounds silly – and there’s valuable semantic information in the specific word “assembler” compared to the generic word “compiler”.

    “assembly language is machine-specific by definition, compiled languages are not” – really? Java targets a pretty specific virtual machine 🙂 – and one could argue that C targets a (more loosely defined) machine (perhaps C++11 is a better example since it finally introduced a standardized memory model). Write a compiler where you take away optimization and use an übersimple register allocator, and you’ll get something that’s pretty close to a 1:1 mapping (and at least would be reversible from machine code back to source).

    But whatever, I’ll leave it at that – I know I’m right, but I also know you won’t budge 🙂

    • Scali says:

      Well, the thing is that historically “assembling is a form of compilation” does not really make sense. I think the term “compiling” just hasn’t been defined clearly enough to make the distinction more obvious in recent literature (assembly being dead for many years and all that).
      Yes, I did comp.sci, but recall that I did it more than a decade ago. Things change (comp.sci has turned into somewhat of a joke in recent years, where things like compiler design and assembly programming are being replaced with things like HTML, XML, JavaScript and other useless web-related technologies, math content has been watered down etc). Again, look at my chiptunes blog. I was there when the first chiptunes were released on Amiga. Most people talking about ‘chiptunes’ today have no clue about Amiga anyway, and think it has to do with 8-bit machines and their synthesizer chips.

      Where you fail with Java is that Java itself is not machine-specific (as Dalvik and J# prove: same language, different VMs). There is JASMIN, which is a Java assembly language, and THAT one is specific to the JVM (and has a 1:1 mapping, unlike Java). In fact, obfuscators take advantage of the fact that there is no 1:1 mapping by reordering the bytecode or inserting extra instructions to fool decompilers.

      Likewise you fail to grasp the real point of me referencing register allocation: If you cannot specify the registers in the source code, there is no way you can map them 1:1 either. The point about 1:1 mapping goes both ways: All possible assembly code can be translated 1:1 to machine code, and all possible machine code can be translated 1:1 to assembly code. If you were to translate machine code back to a compiled language such as C, you’d get a many-to-one mapping: many different machine code programs could map to the same C code. And then there are machine code programs that cannot be mapped at all (except with extreme kludges involving implementing an entire interpreter for the actual machine, which doesn’t really count anyway, because if you’d compile that, it’d map to a completely different machine code program than the one you started from).

  3. Rob says:

    When I was growing up, compilers compiled high level languages and eventually handed off their output to an assembler that translated it to machine code. If nothing else, that’s one reason for having two different terms but, in any case, I would never call an assembler a compiler cause I’m an old assembly language programmer from way back who wishes he still was doing that.

  4. Timo says:

    The 1:1 translation from assembly to machine code isn’t compilation, agreed. That’s not the end of it when it comes to “compiling assembly code”, though. When the machine code contained in an executable gets executed on an Intel CPU, one hardware architecture-dependent instruction can be turned into one or multiple hardware model-dependent microcode operations. These can then be buffered to be executed asynchronously, reordered to better utilize the available ALUs and combined into fewer, more optimized micro-operations. This is far removed from the 1:1 translation assemblers do and much more like a JIT compilation step.

    As an extreme example of there not being a 1:1 correspondence between assembly code and what the CPU does, two identical Intel Core 2 Duo processors manufactured in the same batch can execute different microcode if one of them was used to install Windows Update KB936357, a microcode update for Intel CPUs.

    • Scali says:

      True, but that isn’t relevant as far as compilers and assemblers go, since they only target the x86 instruction set, and the 1:1 translation that assemblers do is only from x86 mnemonics to x86 machine code.

      I think it’s quite obvious that there have been many different implementations of the x86 instructionset in more than 3 decades of x86-compatible CPUs, from various vendors.
      This earlier article goes into more detail about that:
      The same can be said for pretty much every architecture around these days. ARM, PowerPC, MIPS, SPARC, you name it. All of them have been around for many generations, and exist in many different forms.

      • Timo says:

        I don’t know if a firm distinction between assembling and compiling is even possible when assemblers can have bugs so widespread they become de-facto standardized, like this:

        “The IA-32 Assembler generates the wrong object code for some of the floating-point opcodes fsub, fsubr, fdiv, and fdivr when there are two floating register operands, and the second op destination is not the zeroth floating-point register. This error has been made in many IA-32 assemblers and would probably cause problems if it were fixed. “

      • Scali says:

        Rules and exceptions…
        In this case the exception is the extremely quirky x86 architecture (of which I had already mentioned another quirk in the article itself).
        Unless of course you’re one of those newbies who thinks x86 is the only CPU architecture in the world. I invite you to read some of my ‘Just keeping it real’ articles.

  5. dbgarf says:

    I’m not sure historical usage is a good justification for applying arbitrary dividing lines onto contemporary concepts. A lot of words change their meaning over time, and a lot of ideas are arrived at in a non-linear fashion. Sometimes a more specific case of something is discovered independently before a general version is discovered, and then retroactively we can see clearly that the earlier thing was just a specific case of it. I think that’s the phenomenon we’re dealing with here. Assemblers were invented first, but they are just specific and narrow instances of compilers.

    • Scali says:

      I don’t think this is a strong argument, and it also does not argue against anything I said.
      Yes, it is obvious that compilers as we know them today are not quite what the A-0 compiler was. We’d have to look at later iterations of the A-language and ultimately FORTRAN to start seeing compilers in the form that we know them today.

      However, during all this time, assemblers have always been called assemblers, which indicates that to compiler/assembler developers (usually the same teams, since most compiler suites also come with an assembler, or even have one built in), there is enough of a distinction to hold on to the assembler name specifically, rather than naming them both a compiler.

      Besides, even if you were to see assemblers as specific and narrow instances of compilers (which I don’t quite agree with, as they are performing distinctive tasks, not similar tasks in more specific or narrow ways), then it still goes that assemblers aren’t compilers, just as much as cows are animals, but animals are not cows.

      • snemarch says:

        “there is enough of a distinction to hold on to the assembler name specifically, rather than naming them both a compiler.”
        Again: diffusion/osmosis. Osmosis is diffusion, but the specific term has semantic value.

        “then it still goes that assemblers aren’t compilers, just as much as cows are animals, but animals are not cows.”
        Argument flaw – you’ve got things backwards. Assembler is-a compiler:
        Animal: compiler
        Cow (sheep?): assembler.

        Heck, even preprocessing can be thought of as compilation – cfront was a preprocessor for C++->C, and GCC is a preprocessor for {languages}->GAS 🙂

      • epsy says:

        >Besides, even if you were to see assemblers as specific and narrow instances of compilers ([…]), then it still goes that assemblers aren’t compilers, just as much as cows are animals, but animals are not cows.

        That doesn’t make any sense. If assemblers are instances of compilers, then they are compilers. Nobody said anything about compilers being (instances of) assemblers.

      • Scali says:

        Sure it makes sense. In natural language it depends on the context whether or not “assemblers are compilers” and “compilers are assemblers” are the same statement or not. Since my title is “assemblers are not compilers”, I referenced it directly, rather than turning it around, then added a context to explain how it is meant (“cows are animals” illustrates that cows are a proper subset of animals, therefore animals and cows are not the same set, so extra restrictions must apply to elements belonging to the set of cows).
        Claiming it doesn’t make sense only shows a lack of comprehension of natural language.

        Anyway, my point is this:
        1) Even if you are to claim that assemblers are compilers, you have to acknowledge that they are a special case.
        2) If you only see things in terms of compilers and assemblers, you are missing the bigger picture: The real superset you are looking for here is ‘translators’, not ‘compilers’. Compilers and assemblers are both special cases of translators. But as I have been saying all along: assemblers are not a subset of compilers.
        See also the PDF I linked elsewhere.

  6. MacOS9 says:

    Here’s some quoted material from a 2003 posting on antionline[ dot] c o m. The author tries to distinguish between assembling and compiling. Scali can of course offer more input on this (stimulating post by the way):

    “[…] like i said, assemblers are not compilers. a real compiler takes a high-level language like C and converts it to assembly, then optimizes the assembly, and then goes through the rest of the process of assembling and linking and whatnot. as for assemblers, they are generally not that hard to find. of course you need to specify which ISA you’re trying to find an assembler for. […] without a basic concept of computer architecture, assembly won’t make much sense.”

    • Scali says:

      Well, here is another reference that makes a clear distinction:
      It says that compilers and assemblers are both translators, but different types of translators.
      This I can agree with, unlike the people who just define the term ‘compiler’ vaguely enough so that any type of translator will fit the definition, including assemblers.
      It also makes a distinction between pre-processing and compiling, as I also did in one of the comments.

      So this document from 2002 is more or less the formal definition that I was taught as well, when I was at university.
      As I already said, comp.sci changed a lot over the years, so perhaps what is taught at university today is slightly different. This document however proves that what I have said was taught at at least some universities, at least up to 2007 (the last time the document was edited, apparently).
      So people who went all personal on me, about how this was just my opinion, and I was just being stubborn, and they knew they were right: hah!

  7. Pingback: Just keeping it real, part 10 | Scali's OpenBlog™

  8. chibesakunda says:

    Compilers take source code as input and output an assembly program, while assemblers take assembly code as input and output relocatable machine code, which then goes to the load/link editor to be turned into absolute machine code.

    Therefore assembly and compilation are two different things. I think people mix up the two because an assembler is actively involved during the “compilation process”. However, the two programs perform different tasks.
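The relocatable-versus-absolute distinction in the comment above can be sketched with a greatly simplified model: the assembler writes addresses relative to the start of its code and records where those address fields are, and the link/load step patches them once the load address is known. The two-instruction 6502 loop uses real opcodes (INX = $E8, JMP absolute = $4C), but the relocation list format and function names are hypothetical; real object formats are far richer.

```python
def assemble_relocatable():
    """Hypothetical assembler output for a tiny 6502 loop:
           start: INX          ; E8
                  JMP start    ; 4C lo hi
    The absolute address of 'start' is unknown at assembly time, so the JMP
    operand holds its offset from the start of the code (0), and the byte
    offset of that 16-bit field is recorded as a relocation."""
    code = bytearray([0xE8, 0x4C, 0x00, 0x00])
    relocations = [2]  # byte offsets of 16-bit address fields to patch
    return code, relocations

def link(code, relocations, load_address):
    """Produce absolute machine code by adding the load address to every
    recorded 16-bit little-endian address field."""
    out = bytearray(code)
    for offset in relocations:
        value = out[offset] | (out[offset + 1] << 8)
        value = (value + load_address) & 0xFFFF
        out[offset], out[offset + 1] = value & 0xFF, value >> 8
    return bytes(out)

code, relocs = assemble_relocatable()
print(link(code, relocs, 0x0800).hex(" "))  # e8 4c 00 08
```

Loading the same relocatable output at a different address only changes the patched bytes, which is why this step can be left until link or load time.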

    • Scali says:

      Yup… also, they seem similar because they both convert source code to binary code. Also, many modern compilers no longer output assembly code: they compile and optimize using an internal intermediate representation, then generate machine code directly from that (they can also generate assembly code from that representation, but it is optional).

  9. Pingback: SCALIBQ: the mnemonics are merely… | Yudi Supriyadi
