For some reason, people (read: newbies) often talk about ‘compiling assembly code’, or using an ‘assembly compiler’. This is WRONG, people! And I will try to explain why, by offering a historical perspective, as usual.
Namely, if we go back to the early days of programming… The first programmable computers would take their input in the form of machine code: each instruction would be encoded as a set of bits, consisting of an opcode (the operation itself, such as ‘add’ or ‘subtract’) and its operands (the data to operate on, which might be, for example, a constant, a register, or a memory location).
When writing a program, the programmer would first have to sketch out the program in pseudocode, and then convert it to machine code by hand. To make things easier, each opcode was given a ‘mnemonic’: a short name describing the instruction. The programmer could then write the program down as a listing of mnemonics, one per instruction, and then convert each instruction to its machine code representation.
Here is an example of some code written by Steve Wozniak (the co-founder of Apple, and designer of the early Apple computers):
As you can see, he has written the code out in a number of columns: the first column is the memory address of each instruction, the second column contains the machine code bytes for that instruction, then follows the mnemonic representation of the code (the ‘human-readable’ form), and finally some comments.
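To make the process concrete, here is a minimal sketch in Python of what hand-assembly (or a trivial assembler) boils down to: look each mnemonic up in an opcode table and emit the corresponding bytes. The three instructions and their encodings are real 6502 ones (the CPU used in the early Apple machines), but the program itself is just an illustration of the listing format, not Wozniak’s actual code.

```python
# Toy assembler for three real 6502 instructions, printing a listing
# in the address / machine-code-bytes / mnemonic format described above.

# Opcode table: (mnemonic, addressing mode) -> opcode byte (real 6502 values).
OPCODES = {
    ("LDA", "imm"): 0xA9,  # load the accumulator with an immediate constant
    ("STA", "abs"): 0x8D,  # store the accumulator to an absolute address
    ("RTS", None):  0x60,  # return from subroutine
}

def assemble(program, origin=0x0300):
    """Translate (mnemonic, mode, operand) tuples into machine code bytes."""
    listing, addr = [], origin
    for mnem, mode, operand in program:
        code = [OPCODES[(mnem, mode)]]
        if mode == "imm":                      # one operand byte follows
            code.append(operand & 0xFF)
        elif mode == "abs":                    # two operand bytes, little-endian
            code += [operand & 0xFF, operand >> 8]
        listing.append((addr, code, mnem))
        addr += len(code)                      # next instruction starts here
    return listing

program = [
    ("LDA", "imm", 0x01),    # LDA #$01
    ("STA", "abs", 0x0200),  # STA $0200
    ("RTS", None,  None),    # RTS
]

for addr, code, mnem in assemble(program):
    bytes_col = " ".join(f"{b:02X}" for b in code)
    print(f"{addr:04X}: {bytes_col:<9} {mnem}")
```

The opcode table is the whole trick: assembling is essentially a lookup, which is exactly why it could be done reliably by hand before the tools existed.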
Initially, you would have to do all of this by hand. The process of converting mnemonics to machine code became known as ‘assembling’ the code, and automated tools were developed for it, which became known as ‘assemblers’. The first assembler was written by David Wheeler for the EDSAC computer in 1949, and was called “Initial Orders”.
The first compiler, however, was written by Grace Hopper in 1952, for the A-0 programming language. That in itself already shows that compiling and assembling were not seen as the same thing: since assemblers already existed at that time, why would Grace Hopper have bothered to coin the term ‘compiler’, rather than just reusing the term ‘assembler’ to describe this new tool? Apparently there is a fundamental difference between the two types of tools.
The main difference between assembly language and other programming languages, such as the A-0 language, is that assembly languages are always machine-dependent (after all, the mnemonics are merely a more human-readable form of the instructions that the machine supports, so different machines have different mnemonics), whereas other programming languages abstract away the physical machine and work at a higher level. Many early compilers would also first compile the higher-level source listing into a machine-specific assembly listing, which was then passed on to an assembler to generate the actual machine code.
Since a compiler works from a source listing at a higher level, it also needs to perform a more complex translation than an assembler does. For an assembler there is generally a 1:1 mapping from mnemonics to machine instructions. There are some exceptions, where a single instruction may be encoded in multiple ways, but in general any decisions an assembler needs to make during translation are trivial and unambiguous (for example, picking the shortest encoding for a given instruction). For this reason, writing assembly code is as good as writing machine code by hand, as far as performance and size optimizations go.
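A classic example of such a trivial decision is the unconditional relative jump on x86: a displacement that fits in a signed byte can use the 2-byte short form (opcode 0xEB followed by an 8-bit displacement), while larger displacements need the 5-byte near form (opcode 0xE9 followed by a 32-bit displacement). A sketch of that decision in Python:

```python
import struct

def encode_jmp(displacement):
    """Pick the shortest x86 encoding for an unconditional relative jump.

    0xEB + rel8  (2 bytes) if the displacement fits in a signed byte,
    0xE9 + rel32 (5 bytes) otherwise.
    """
    if -128 <= displacement <= 127:
        return bytes([0xEB]) + struct.pack("<b", displacement)  # short form
    return bytes([0xE9]) + struct.pack("<i", displacement)      # near form

print(encode_jmp(5).hex())     # short form: eb05
print(encode_jmp(1000).hex())  # near form:  e9e8030000
```

Note that the decision is mechanical: given a displacement, there is exactly one shortest valid encoding. (The only genuinely subtle part in a real assembler is that choosing a shorter encoding shifts the addresses of later instructions, which can in turn change other displacements, so the choices may need a second pass to settle.)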
Compilers, however, need to map the variables used in the source code to machine-specific registers and memory locations, and try to find the shortest and/or fastest possible sequence of instructions to implement the code. This leads to complicated problems, such as register allocation and re-use. There is no 1:1 translation of high-level keywords and expressions to machine code: there are many possible alternatives, and a compiler needs to do a lot of analysis and use clever heuristics to come up with fast code.
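To give a feel for why this is harder than assembling, here is a hypothetical sketch of one of the simplest register-allocation strategies, linear scan: each variable gets a live interval (roughly, from its first use to its last), the intervals are walked in order of their start, and when the machine runs out of registers a variable is ‘spilled’ to memory. Real allocators are far more sophisticated (graph coloring, interval splitting, and so on), but the core problem is the same; the variable names and intervals below are made up for illustration.

```python
def linear_scan(intervals, num_regs):
    """Assign registers to variables given their live intervals.

    intervals: dict of variable -> (start, end), the first and last
    point where the variable is live.  Variables that don't fit in
    num_regs registers are marked 'spill' (kept in memory) instead.
    """
    allocation = {}
    active = []                                # (end, var) pairs currently live
    free = [f"r{i}" for i in range(num_regs)]  # registers not in use
    for var, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts,
        # returning their registers to the free pool for re-use.
        for e, v in list(active):
            if e < start:
                active.remove((e, v))
                free.append(allocation[v])
        if free:
            allocation[var] = free.pop()
            active.append((end, var))
        else:
            allocation[var] = "spill"          # no register left
    return allocation

# Four variables competing for two registers; 'c' overlaps both 'a' and 'b',
# so it gets spilled, while 'd' can re-use the register 'a' has released.
intervals = {"a": (0, 3), "b": (1, 4), "c": (2, 5), "d": (4, 6)}
print(linear_scan(intervals, num_regs=2))
```

Even this toy version has to make a judgment call (which variable to spill), and a different choice produces different, possibly slower, code. That is exactly the kind of decision an assembler never has to make.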
Especially in the early days, compilers were rather naïve, and their translations would not come anywhere near hand-optimized assembly code. These days, however, compilers have come a long way, and, perhaps just as importantly, so have computers. Where early computers were still designed specifically for handwritten programs, over time more and more people started using compiled languages, and computers were increasingly designed to make the job of compilers easier. Compilers generate code in an algorithmic way, and so they tend to use only a subset of the available instructions. RISC CPUs reduced the instruction set to the most-used instructions, and made those instructions run as fast as possible, which in turn made the compiler’s job easier.