ASSEMBLER PROGRAMMING USING DEBUG -
INTS, MOV, Dumps, Unassemble and Assemble

This tutorial will take us a few steps further. We will still be writing in machine code, but we will investigate running a small, two line program and exiting from it gracefully instead of tracing through it. Along the way we will discuss the wonderful world of DOS/BIOS interrupts and learn about moving data into the registers. We will also perform the miracle of writing to the display. Finally, we will learn how to write in assembler instead of machine code.
 

The Registers Hi and Lo Bytes

There are four general purpose registers in an Intel CPU: AX, BX, CX and DX. Each register is 2 bytes long and is divided up into a high byte and a low byte. AX is made up from AH and AL, BX is BH and BL and so on.
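If you want to see the high byte / low byte split without firing up debug, here is a minimal sketch in Python (not something debug itself provides). The function names split_word and join_bytes are made up for this illustration.

```python
def split_word(value):
    """Return (high byte, low byte) of a 16-bit register value."""
    value &= 0xFFFF              # keep only 16 bits, like a real register
    return value >> 8, value & 0xFF


def join_bytes(high, low):
    """Build a 16-bit value from a high byte and a low byte."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


ah, al = split_word(0x1234)
print(f"AX=1234h -> AH={ah:02X}h AL={al:02X}h")   # AX=1234h -> AH=12h AL=34h
print(f"{join_bytes(0x02, 0x00):04X}")            # 0200
```

The same split applies to BX (BH and BL), CX (CH and CL) and DX (DH and DL).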

Processors from the 386 and up extend these four registers to 32 bits. The extended registers are called EAX, EBX, ECX and EDX, and each can hold a double word, or 4 bytes, with AX being the low half of EAX and so on. This makes the processor a 32 bit processor. Unfortunately debug is quite old and can't access the extended registers.
 

Terminating Normally- The Technical stuff

In procedural languages such as VB or C, the "end" statement must appear somewhere so that the code can stop running. If we want to run a program using assembler we must also tell the computer when to stop executing the code, otherwise the IP will keep clicking over into instructions we never intended to execute.

What actually happens during a termination is that the current process (our assembler program) is booted out of the CPU and control is returned to the operating system. This is hard to visualise, even with a DOS program, since our code is only running for a fraction of the time the CPU is ticking over. The CPU has other important things to do, like updating the video display, running the clock, checking the keyboard and so on. With Windows 95/98/NT and other time sharing, multitasking operating systems such as Unix, our humble assembler program is only one of many programs sharing the CPU. In any case, we are running the program from within another program, the debug program. It is our mother and we are aptly called the daughter. Debug is looking after us and we really want to terminate and let debug carry on as before. To do this the operating system needs to be told something fundamental, rather than just adding something up using registers. The way this is handled is through interrupts.

Interrupts

There are many hundreds of interrupts. A complete list of interrupts can be downloaded from the internet and consumes a small forest if printed. Each interrupt has many variations. If we want to tell a video card to change modes we use an interrupt. If we want to write to a floppy, we use an interrupt. If we want to make a sound we use an interrupt. An interrupt is a signal to the computer to stop doing what it is currently doing and do something with the hardware (or software). The CPU actually spends part of its time watching for interrupts. It watches the mouse, keyboard, clock and many other devices.

It is important we use the correct interrupt and it is here that the hardware to operating system relationship is strained. How does the operating system know that the particular video card installed uses interrupt #10 with AH=0h and AL=29h to set a graphics screen with 256 colours 800*600 ? What if it sets AL=30h instead and the video card thinks this is the instruction to start a small fire in the power supply unit ? As you can see, using the wrong interrupt is bad.

This is where the drivers come in. Drivers let the operating system know what interrupts the particular piece of hardware uses. Some drivers are standard; the mouse driver, for example, is pretty straightforward. Others, such as the latest driver for the voodooIII 3Dfx card, use interrupts designed to give programmers nightmares.

Well, this was a bit of a digression, after all, I really just needed to say that int20 stops program execution.
 

Int 20 - The easy way to stop a program

Something to do...

set the IP to 0100h (use r IP <enter>100<enter>)
place the code CDh 20h at offset address 0100 (use e 0100 <enter> CD <space> 20 <enter>)
do a register dump to check ("r"),

you should see the instruction decoded as INT 20

To run this we can't really use the t command, since the actual interrupt is itself a small section of code and we would need to trace through several hundred lines of code to get the thing to work. This is further complicated by the fact that the interrupt is located in a different segment of RAM and doing a trace means we must reset the CS register afterwards. Instead we need to actually run the code. Whoa !

To run the program (all one line of it, which says to stop !) we use the "g" command. We can tell debug where to stop executing by placing an address after "g". We will stop execution at (which means before) 102 (and after 100). Execution starts at the current IP, so make sure it is set to 0100h.

g 102

Why did we tell debug to stop at 102 ? For safety, my dear hacker, for safety !

If we had the wrong interrupt and had inadvertently told the computer to start trying to spin the hard drive off its spindle, the process would at least stop after 0100 before going into limbo with the garbage instructions after that point. It stops before instruction 102 and we have a chance to stop our runaway train. In this case (hopefully), you got a message saying "program terminated normally". Nice message to get, that one.

Now we are confident it works, give it full throttle, reset the IP to 0100 (I'll have to keep reminding you, otherwise it will be bad) and let the interrupt stop the program by itself...

g

So now we know how to stop. Use Int20.

Writing to the Display

DOS is a program that is sitting in a part of RAM. DOS provides a whole pile of routines with which you can do useful things. DOS routines all use the same interrupt, int 21h. You tell the computer which particular DOS routine you actually want by placing values in the registers. If we set AH to 02h this tells DOS to print the character whose ASCII code is in the low byte of the DX register (DL).

Let's do it...
 

set the IP to 0100
Place CDh 21h at address 0100h (using the e command remember ?)
Place CDh 20h at address 0102h

Do a register dump (use the r command dummy !)
You should see INT 21 as the command

Now we need to tell DOS what to do when it gets called....
place 02h into AH...(the print command).

Whoa ! not so straightforward. Debug's r command only changes whole 16 bit registers, so to get 02h into AH we have to set all of AX. AX is written high byte first, AH then AL, so with 02h in AH and 00h in AL, AX looks like 0200h. The actual command is:

r AX    <enter>
0200    <enter>

This places 02h into AH and 00h into AL. Don't confuse this with the way words are stored in RAM. In memory the low byte comes first, so the word 0200h would sit in RAM as the bytes 00h 02h. This is worth remembering, bytes are stored back to front in memory, but a register is always written high byte first. The same goes for the double word register pair DX:AX, where DX holds the high word.

finally place the character we want to print in DL....again we have to set the whole register, and the low byte comes second.

r DX
004C        'DH = 00h, DL = 4Ch, the low byte is second....
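The difference between how a register value is written and how the same word is laid out in memory is easy to trip over, so here is a small sketch of it in Python. It only models the byte ordering. The helper names register_halves and word_in_memory are assumptions made for this example.

```python
import struct


def register_halves(value):
    """Return (high byte, low byte) as they appear in a register like AX."""
    return (value >> 8) & 0xFF, value & 0xFF


def word_in_memory(value):
    """Return the two bytes of a 16-bit word in the order they sit in RAM."""
    return struct.pack("<H", value)   # "<H" = little-endian unsigned 16-bit


ah, al = register_halves(0x0200)
print(f"AX=0200h -> AH={ah:02X}h AL={al:02X}h")   # AX=0200h -> AH=02h AL=00h
print(word_in_memory(0x0200).hex(" "))             # 00 02
```

So typing 0200 for AX gives AH=02h, and the back to front ordering only shows up once the word is written out to memory.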
 

Now we are ready to go, (do a register dump to make sure)

g

You should see the letter "L" followed by the words "program terminated normally"
The whole thing is shown below

You can see here where I entered the code, changed the AX and DX registers, set the IP and then, on a wing and a prayer, typed g. Fun isn't it ?

The code we placed in DL was the ASCII code for the letter L. Try placing different ASCII codes in DL to see that it works. Always remember to reset the IP before typing g. If you forget and bad things happen, the simplest way out of the mess is to stop debug and start again. Windows 95/98 is quite good at stopping programs that are heading off to woop woop...usually. Remember, quit debug using the q command.
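If you don't have an ASCII table handy, a couple of lines of Python will give you the hex codes to type in. This is just a convenience sketch, and ascii_hex is a name invented for it.

```python
def ascii_hex(text):
    """Return the two-digit hex ASCII code of each character in text."""
    return [f"{ord(ch):02X}" for ch in text]


print(ascii_hex("L"))     # ['4C']
print(ascii_hex("Hi!"))   # ['48', '69', '21']
```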

Unassemble

So far we have used the register dump to examine a single line of code and the values of the registers. Debug has a command available that will take the contents of RAM and unassemble the machine code into assembler instructions. The command is "u". Let's try it on our code....

u 100        'Unassemble the code beginning at offset 0100h

As you should see, the first two instructions are

int 21
int 20

the rest are garbage instructions. Debug just unassembles machine code, it does not try and pretend the code makes sense. The instructions after our two lines are what debug deciphers into assembler from the random junk that was sitting in RAM when debug was started.
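To get a feel for what unassembling involves, here is a toy unassembler sketched in Python. It is not how debug is written and it knows only three opcodes (CDh for INT, B4h for MOV AH and B2h for MOV DL). Anything else is listed as a raw byte with DB, which is a simplification, because the real debug would decode those bytes as instructions too.

```python
def unassemble(code, offset=0x100):
    """Decode bytes into (offset, text) pairs. Only three opcodes are known."""
    known = {0xCD: "INT {:02X}", 0xB4: "MOV AH,{:02X}", 0xB2: "MOV DL,{:02X}"}
    lines = []
    i = 0
    while i < len(code):
        opcode = code[i]
        if opcode in known and i + 1 < len(code):
            # two-byte instruction: opcode followed by one operand byte
            lines.append((offset + i, known[opcode].format(code[i + 1])))
            i += 2
        else:
            # unknown to this sketch: show the raw byte
            lines.append((offset + i, f"DB {opcode:02X}"))
            i += 1
    return lines


for address, text in unassemble(bytes.fromhex("CD21CD20")):
    print(f"{address:04X}  {text}")
# 0100  INT 21
# 0102  INT 20
```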

Assemble

Now we are ready to use the power of Debug to convert assembler into machine code. This way we don't have to mess around with hex machine codes, we can write in somewhat meaningful mnemonics.

Let's assemble some code starting at address 0100

a 100                           'start assembly at offset address 0100h
mov AH,02
mov DL, 4C
int 21
mov DL, 55
int 21
mov DL, 43
int 21
mov DL, 49
int 21
mov DL, 4E
int 21
mov DL, 44
int 21
mov DL, 41
int 21
mov DL, 4C
int 21
mov DL, 45
int 21
int 20
<enter>       'enter on a blank line causes debug to stop assembling

do an unassemble to make sure you have it right

u 100

it should show the same instructions you typed in, starting with MOV AH,02 at offset 0100 and ending with INT 20.

Now, set the IP to 100 and type g

Interesting huh ?

We have used a new instruction, MOV. The machine code for MOV AH is B4h and the second byte specifies the value to move, so at offset 0100 the bytes are

B402, move 02.

The code for MOV DL is B2h.

Much easier to remember MOV DL than B2h.

This code moves a succession of ASCII characters into DL, each time calling interrupt 21 to print them on the display. Finally it calls int 20 to terminate.
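The whole program can be followed on paper, or with a rough model like the Python sketch below. It assembles only the three instructions used here (MOV AH, MOV DL and INT) and then steps through the bytes, collecting what int 21 with AH=02h would print. The names assemble and run are inventions for this sketch, and it ignores everything else a real CPU and DOS would do.

```python
def assemble(lines):
    """Turn a few mnemonics into machine code (operands are hex, as in debug)."""
    out = bytearray()
    for line in lines:
        op, _, arg = line.strip().upper().partition(" ")
        arg = arg.replace(" ", "")
        if op == "INT":
            out += bytes([0xCD, int(arg, 16)])
        elif op == "MOV" and arg.startswith("AH,"):
            out += bytes([0xB4, int(arg[3:], 16)])
        elif op == "MOV" and arg.startswith("DL,"):
            out += bytes([0xB2, int(arg[3:], 16)])
        else:
            raise ValueError(f"unsupported instruction: {line}")
    return bytes(out)


def run(code):
    """Step through the bytes and return what INT 21 / AH=02h would display."""
    ah = dl = 0
    ip = 0
    shown = []
    while ip < len(code):
        opcode, operand = code[ip], code[ip + 1]
        ip += 2
        if opcode == 0xB4:                      # MOV AH,imm8
            ah = operand
        elif opcode == 0xB2:                    # MOV DL,imm8
            dl = operand
        elif opcode == 0xCD and operand == 0x21 and ah == 0x02:
            shown.append(chr(dl))               # DOS prints the character in DL
        elif opcode == 0xCD and operand == 0x20:
            break                               # INT 20: terminate
        else:
            raise ValueError(f"not modelled: {opcode:02X} {operand:02X}")
    return "".join(shown)


program = ["mov AH,02"]
for char_code in "4C 55 43 49 4E 44 41 4C 45".split():
    program += [f"mov DL, {char_code}", "int 21"]
program.append("int 20")

machine_code = assemble(program)
print(machine_code[:4].hex(" ").upper())   # B4 02 B2 4C
print(run(machine_code))                   # LUCINDALE
```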

Writing Text

This was quite cumbersome. Why can't we write a whole string without having to move the characters into DL each time ? DOS, in its generosity, provides us with another interrupt routine which lets us write out a string to the display. Int21 AH=09h does the job. How does DOS know the string is finished ? We must place a special character at the end of the string. In this next example we are going to write the string into memory starting at offset 200 (so that it doesn't interfere with our code which starts at 100). Yippee, we are going to have a data section !

e 200
48 65 6C 6C 6F 2C 20 44 4F 53 20 68 65 72 65 2E 24 <enter>

Use spaces between the bytes.

The last number (24h) is the ASCII code for $, which is the end of string character recognised by DOS. DOS will print out the characters until it gets to 24h.
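The idea of a terminator character can be sketched in a few lines of Python. This only models the rule "keep going until you hit 24h". The 300h byte array standing in for RAM and the name dos_string are assumptions for the illustration.

```python
def dos_string(memory, offset):
    """Return the text starting at offset, up to (not including) the '$'."""
    end = memory.index(0x24, offset)     # 24h is the ASCII code for '$'
    return memory[offset:end].decode("ascii")


memory = bytearray(0x300)                # pretend RAM, zero filled for this sketch
data = bytes.fromhex("48 65 6C 6C 6F 2C 20 44 4F 53 20 68 65 72 65 2E 24")
memory[0x200:0x200 + len(data)] = data   # same bytes as typed with e 200

print(dos_string(memory, 0x200))         # Hello, DOS here.
```

If the $ were missing, the search would carry on into whatever bytes follow, which is why the terminator matters.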

We can now assemble the code to print out this marvellous string

a 100
MOV AH,09                'the DOS interrupt routine for printing a string
MOV DX, 0200           'this is where the string is located in memory that we wish to print
INT 21                         'call the interrupt
INT 20                         'terminate
<enter>

make sure the IP is set to 100h, then type g.

DUMP

When we unassemble we ask Debug to take machine code and translate into assembler.
When we assemble, we ask debug to make up the machine code for the instructions we give it.

When we dump, we ask debug to give us a raw dump of memory. Debug doesn't do anything to it except give us a display of the bytes (in rows of 16) and the ASCII character for each byte, since it might be useful.

The command for dump is...you guessed it..."d"

if you worked through the example above then...

Try dumping 0200

There, in the mess of characters on the right, is our message. On the left are the bytes we typed in.
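The layout of a dump, hex bytes on the left and characters on the right, can be imitated with a short Python sketch. It is a simplified look-alike, not debug's exact format (it leaves out the segment part of the address, for one). The pretend RAM here is zero filled, unlike the real thing, and the name dump is made up for the example.

```python
def dump(memory, start, length=32):
    """Return rows of 16 bytes: offset, hex values, then printable characters."""
    rows = []
    for base in range(start, start + length, 16):
        chunk = memory[base:base + 16]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # bytes that are not printable ASCII are shown as a dot
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        rows.append(f"{base:04X}  {hex_part:<47}  {text}")
    return rows


memory = bytearray(0x300)                # pretend RAM, zero filled for this sketch
data = bytes.fromhex("48 65 6C 6C 6F 2C 20 44 4F 53 20 68 65 72 65 2E 24")
memory[0x200:0x200 + len(data)] = data

for row in dump(memory, 0x200):
    print(row)
# 0200  48 65 6C 6C 6F 2C 20 44 4F 53 20 68 65 72 65 2E  Hello, DOS here.
# 0210  24 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  $...............
```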

The rest is garbage. There should be a lesson in the garbage though. It tells us that although we haven't played with RAM at offset 0230, there is in fact something there apart from 00h. Never assume a zero value is in memory, always initialise your data if you want to use it.

Well Done

If you have got this far and have understood what is going on, and have worked through the examples, you have done very well. As you might have gathered, assembler programming is a bit of a black art. There are lots of little secrets, lots of reading and a large potential for stuffing up. Nothing is easy in assembler.

We now have a few debug commands up our sleeve, you should write them down and what they do:
 

e
r
u
d
a
r XX
t
g
q


We also have a few assembler commands that are useful:
 

ADD
SUB
MUL
DIV
MOV
INT        (20 and 21)


We have also learnt heaps about bytes, words, hex and ASCII. If you are game, you might investigate how we write a program in assembler and actually save it to a file so that we can run it as a standalone program independently of debug. The assignment will give you a good test of your understanding of this topic.

Assembler and Computer Architecture Assignment