ASSEMBLER PROGRAMMING USING DEBUG

Assembler Programming - tutorial 1 (understanding machine code and opcodes)
Assembler Programming - tutorial 2 (understanding interrupts, RAM and some DOS functions)
Assembler Programming - Assignment
Assembler Programming and PC Architecture - Links
Hexadecimal Summary
ASCII Reference Chart

Introduction

These tutorials were written to introduce information technology students to some of the features and constraints of assembler programming. They cover many of the objectives laid down for the SACE Information Technology and Information Technology Studies courses by tackling the concepts of compiled programs from the bottom up. I reasoned that if students can see a program in RAM, see what each instruction is doing and distinguish data from code, they might, with a bit of prodding, think about the process that turns a Visual Basic program from source code into machine code. In essence, the tutorials are not so much about assembler programming as about what goes into generating machine code.

Assembler programming is made easier by using a true assembler such as MASM, which can take a text file and convert it into machine code. MASM allows some sophistication to be introduced: labels, pseudo-operations and a modular approach. Instead of these little luxuries I have stuck to a raw, no-frills approach, starting with machine code values before assembling code using the ubiquitous debug program. It is not difficult; in fact it is really rote learning and following instructions to the letter! The difficulty comes in understanding what has been done. I have tried to start off with full explanations and then expect the student to at least remember some of the steps.

Overview of Generating Executable Files

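These tutorials in no way qualify a user as an assembler programmer; they are merely an introduction to show what complexities lie in the process of compiling, assembling, linking and running programs. I have found that these concepts are not well understood. As an overview: source code is compiled and assembled into an object module, which is then linked to produce an executable file that can be loaded into RAM and run.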


So where does assembler fit into all of this? Assembler is really a less sophisticated programming language and therefore requires less from the compiler. In fact, it requires so little work that instead of compiling the code, we talk of merely assembling it. A fully fledged compiler must decide how to change high level constructs like do while...loop, or for(n=0; n<10; n++), into instructions the computer understands, like CMP (compare) and JNE (jump if not equal). Different languages lend themselves to being compiled more or less easily, and different compilers produce more or less efficient object modules. If we write in assembler we remove this requirement and literally tell the computer that these are the actual instructions we want to use. The compiler is removed and the assembler is used alone.
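
As a rough sketch of what that translation involves, the Python below runs a hand-lowered version of the loop for(n=0; n<10; n++) on a toy instruction list. Everything here is invented for illustration: the tuple encoding, the register names n and count, and the run function. It is not 8088 machine code, and a real compiler may well choose a different jump instruction. The point is only that the neat one-line loop becomes an explicit compare followed by a conditional jump back to the loop body.

```python
# Toy model only: instructions are tuples, not real 8088 machine code.
# The loop  for(n=0; n<10; n++) { count++ }  lowered by hand.
LOOP = [
    ("MOV", "n", 0),        # 0: n = 0
    ("MOV", "count", 0),    # 1: count = 0
    ("INC", "count"),       # 2: loop body
    ("INC", "n"),           # 3: n++
    ("CMP", "n", 10),       # 4: compare n with 10, set the zero flag if equal
    ("JNE", 2),             # 5: not equal yet, so jump back to the body
]

def run(program):
    """Execute the toy program and return the final register values."""
    regs = {}
    zero_flag = False
    pc = 0                  # index of the next instruction to execute
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "MOV":
            regs[args[0]] = args[1]
        elif op == "INC":
            regs[args[0]] += 1      # simplification: INC leaves the flag alone here
        elif op == "CMP":
            zero_flag = regs[args[0]] == args[1]
        elif op == "JNE":
            if not zero_flag:
                pc = args[0]
        else:
            raise ValueError("unknown instruction: " + op)
    return regs

print(run(LOOP))
```

When the loop finishes, both n and count hold 10. Writing in assembler means writing the six-line list directly, rather than the one-line for statement.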

Interpreters

There are a few complications to point out along the way. The first is that students doing this course will undoubtedly be using an interpreter. Pushing the "play" button in Visual Basic invokes the interpreter, which works in two steps: the code is compiled and assembled line by line, and it is then run on the fly. Certain bookkeeping tasks are performed before the code is interpreted, such as identifying procedures and functions, so the code is in essence linked. Since the student never sees the object module or the machine code (executable) file, the process appears simpler than it is. However, if the student wants an executable version of the program, which can be run without the Visual Basic environment, they must generate the executable file using the steps described above. This is usually hidden to a large degree as well.

Editors

The second point is that very few programs are written using a standard text editor. Software companies who make compilers usually develop "Integrated Development Environments" (IDE, Borland's term for it), which are powerful editors that check the syntax and let the user specify a whole host of options before the program is run. These editors also enable the programmer to invoke the interpreter so the program can be checked before an executable file is generated. Visual Basic is one example, and since it generates programs for the Windows operating system, it is called "visual".

Executable Files

The third point is that there are different types of executable files. Files which can be read straight into RAM and fed to the CPU come in different flavours. In the DOS world *.com and *.exe files are both executable. They differ in how they address memory and in how large they can be. In the Windows world the situation is more complex. As well as *.exe files (Windows-specific executables, and DOS executables which are run through Windows as legacy programs), there can be a raft of other executable files, including *.dll (dynamic link library, a collection of procedures which can be called from any other Windows program), and there are differences between files which run on different versions of Windows. In other words, the format of executable files varies greatly.
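
One concrete difference between the two DOS formats can be checked in a few lines. A DOS *.exe file begins with a header whose first two bytes are the letters MZ, whereas a *.com file has no header at all: it is a raw image of code and data, under 64 KB, which DOS loads at offset 100h. The sketch below is only an illustration; the function name and the sample bytes are my own, and because Windows executables also begin with MZ, it can only tell "has a header" from "has no header".

```python
def dos_executable_kind(data):
    """Guess the DOS executable format from the first two bytes of a file."""
    if data[:2] == b"MZ":
        return "exe"    # header present: MZ signature
    return "com"        # no header: treated as a raw memory image

# Sample inputs, made up for illustration.
com_image = bytes.fromhex("B402B22ACD21CD20")   # raw machine code, no header
exe_image = b"MZ" + bytes(26)                   # MZ signature plus zero padding

print(dos_executable_kind(com_image))
print(dos_executable_kind(exe_image))
```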
 

To machine code and back again?

A common question that arises is "why can't an executable file be converted back into some source code, say in VB, and then altered?". In other words, why can't programs be reverse engineered? The simple answer is that they can. The bigger answer is that it takes a lot of effort. Programs are available that take the machine code, make a few assumptions (or are told) about where the code actually starts and produce assembler code as output. These programs are called "unassemblers" (or, more commonly, disassemblers). The debug program we are using in the tutorials can do this using the "u" command. The problem starts when a hacker tries to work out what it all means. There are no nice variable names, any labels the unassembler allocates are meaningless, and the data section often gets confused with the code section, so the data gets unassembled as if it were code, and so on. A hacker therefore has to rely upon recognising sequences of assembler instructions and making educated guesses about what is happening. Given that even simple programs can run to thousands of instructions, the task is a formidable one.
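
The confusion between data and code is easy to demonstrate. The Python below is a deliberately tiny unassembler that knows only a handful of 8088 opcodes (MOV into an 8-bit register, INT, and the one-byte INC and DEC of 16-bit registers); anything else is printed as a DB byte. The function name and the sample program are my own choices for illustration, and this is not how debug itself is implemented. The sample is four instructions followed by the two data bytes "HI".

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

def unassemble(code):
    """Decode bytes into mnemonics, starting at offset 0 and never looking back."""
    lines = []
    i = 0
    while i < len(code):
        op = code[i]
        has_operand = i + 1 < len(code)
        if 0xB0 <= op <= 0xB7 and has_operand:      # MOV r8, immediate byte
            lines.append("MOV %s,%02X" % (REG8[op - 0xB0], code[i + 1]))
            i += 2
        elif op == 0xCD and has_operand:            # INT immediate byte
            lines.append("INT %02X" % code[i + 1])
            i += 2
        elif 0x40 <= op <= 0x47:                    # INC r16
            lines.append("INC %s" % REG16[op - 0x40])
            i += 1
        elif 0x48 <= op <= 0x4F:                    # DEC r16
            lines.append("DEC %s" % REG16[op - 0x48])
            i += 1
        else:                                       # outside this toy table
            lines.append("DB %02X" % op)
            i += 1
    return lines

# Four real instructions, then the ASCII data "HI" (48h, 49h).
PROGRAM = bytes.fromhex("B402B22ACD21CD20") + b"HI"
for line in unassemble(PROGRAM):
    print(line)
```

The first four lines of output are the genuine instructions. The last two are DEC AX and DEC CX, which were never in the program: they are simply the letters H and I read as opcodes. Nothing in the bytes themselves says where the code stops and the data starts.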

References

The tutorial has been written using the venerable book by Peter Norton called "Peter Norton's Assembly Language Book for the PC". I have followed several of his examples very closely. The style of writing and the standard of the code make it both readable and an excellent reference. The fact that it dates back to 1986 and is specific to the 8088 instruction set in no way detracts from its usefulness. It is a great starting point: if you understand the basics, the rest falls into place.

So that has got a little of the background out of the way and it is time to start the hard work.
 
