ASSEMBLER PROGRAMMING USING DEBUG - ADD, SUB, MUL and DIV

So, what's going to happen ?

First we will have a look at the debug command and see what it can do for us. Debug comes free with DOS and Windows 95/98 and is used to access the registers and RAM directly. It is a hacker's dream! (Most hackers use more advanced relatives of debug called hex editors and disassemblers, but debug is still very powerful.) Debug is a command line interface, so we only use the keyboard, and it is important to understand some of the basic commands. We will investigate how to look at registers, do a memory dump, do some hexadecimal arithmetic and write some data into the registers so that we can add up and play with two numbers. Once we have the basics in place we will worry about how to actually write a program.

Instructions that you need to do are written in red

DEBUG

First, open a DOS window in Windows 95/98/NT (Start/Programs/MS-DOS Prompt).
You should be in the Windows directory (c:\windows). If you are not, change to that folder now.
Start the debug program by typing debug <enter>.
The prompt should change to a simple "-"

You can switch back and forth between this tutorial and the DOS window from the taskbar or by using <alt> + <tab>. That way you'll be able to follow the steps as we go along.

Okay, let's have a look at the registers. Type "r" <enter>
 
 

You should see a register dump: the contents of each register, followed by a line showing the next instruction. What does it all mean? Well....
 


The code 0F70:0100 is the address of the next instruction in full and the actual data at that address currently says 03F1, which debug has thoughtfully translated for us into semi-english and told us it means to add together the SI and CX registers. This means nothing to us at the moment though. In fact it is just garbage.

Your display will undoubtedly have different values for CS and the instruction at offset 0100h.

All of the other symbols we can safely ignore at the moment.

Hexadecimal notation

The numbers being used here are all in hexadecimal, or hex for short. Each hex digit counts from 0 up to 15 (written 0 to 9 and then A to F) before carrying over into the next column. We are used to counting to 9 before adding a tens column; hex gets to 15 before adding a sixteens column. It is a very logical way of counting.

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1A, 1B, 1C, 1D, 1E, 1F, 20 etc.

Hex is beautiful because

one hex digit represents 4 bits
two hex digits represent one byte or 8 bits
four hex digits represent two bytes or one word

So FF is the hex number for 255 decimal (15 * 16 + 15). To avoid confusion, hex numbers are sometimes written with an "h" after them, so 3A7h is really the hexadecimal for our familiar decimal number 935 (3 * 256 + 10 * 16 + 7). Further, the registers contain 4 hex digits, so they obviously hold 2 bytes, or one word of data.
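
If you want to check the place-value arithmetic yourself, here is a small sketch in Python (used here purely as a calculator, it has nothing to do with debug). The function name hex_to_decimal is made up for this illustration.

```python
def hex_to_decimal(hex_text):
    """Convert a hex string such as 'FF' or '3A7h' to a decimal integer."""
    digits = "0123456789ABCDEF"
    text = hex_text.upper().rstrip("H")  # allow the optional trailing "h" marker
    value = 0
    for ch in text:
        # Move everything one hex column to the left, then add the new digit.
        value = value * 16 + digits.index(ch)
    return value

print(hex_to_decimal("FF"))    # 255
print(hex_to_decimal("3A7h"))  # 935
```

Each pass through the loop is one "column" of the number, exactly like reading a decimal number digit by digit but with 16 in place of 10.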

Hexadecimal Summary page

The addresses you see refer to the position of the data from the start of memory in hex. Due to a historical decision, Intel requires that you specify both the segment and the offset when specifying an address. When you start the debug program, it sets up a segment that is empty for you to use. When I was working it turned out to be 0F70, but your segment will be different. In this first tutorial we will only have to worry about the offset (0100h) and not the segment.
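
For the curious, here is a rough sketch of how a segment and an offset combine into one real address. It assumes the 8086 real-mode rule (segment times 16, plus offset), which this tutorial does not rely on anywhere else. The function name physical_address is invented for the example, and 0F70 is just the segment from my own session.

```python
def physical_address(segment, offset):
    """Assumed 8086 real-mode rule: segment * 16 + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

# 0F70:0100 from the register dump in this tutorial (your segment will differ).
print(format(physical_address(0x0F70, 0x0100), "05X"))  # 0F800
```

Shifting left by 4 bits is the same as adding one hex zero on the end, so 0F70 becomes 0F700 before the offset is added.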
 

Registers

Let's put a bit of data into the registers....

type:
r AX <enter>
8 <enter>
r <enter>

You have just told debug to replace the contents of the AX register with 8h.
Typing r the second time did another register dump so you could see the change.

Now place 2h in the BX register using some logical skills to work out the command (r BX).
Do a register dump to check it works.
We have written data directly to the registers in the CPU.

Adding the AX and BX registers

This is a bit trickier, we need to get the command for adding the numbers into memory, tell the IP (Instruction pointer) to point to that instruction, load the instruction into the control unit of the CPU and have the ALU do the right instruction from its set of instructions. We can then have a look at the registers to see if it has worked.

The add instruction will be placed at offset 0100h in whatever segment you are using. Debug's command for entering data into memory is "e". The text after each ' mark is my comment to tell you what is happening; do not type it.

Type in the following:
e 100 <enter>        'enter address 0100h
01 <enter>          'write 01h at that address
e 101 <enter>        'enter address 0101h
D8 <enter>         'write D8h at that address

The extra hex numbers coming up after the prompt tell us the existing contents of those addresses.

Since we are using the current segment, we don't have to tell debug about the CS and can use the offset (0100h) alone. We can also leave out leading zeros, and debug doesn't mind if we write in upper or lower case for the a, b, c, d, e, f parts of hex numbers. We edited the bytes with two edit commands. We could have typed a space after the first byte and done a sort of continuous edit; we will use this method later.

That was the machine code for the instruction ADD AX, BX.
Do a register dump to confirm it.
You should see:

0F70:0100 01D8        ADD AX,BX

Congratulations, you have just written your first line of code and you have even written it in machine code. You have just set the 16 bits of data at the addresses 0100h and 0101h to 0000000111011000b or 01D8h.
 

Machine Code

On ALL Intel x86 machines, from the earliest 8088 to the fastest Pentium IIIs, the machine code 01D8 means add the BX register to the AX register (when the processor is running 16-bit code, as it does under debug). On a Motorola processor used by Macs, the machine code would be very different. So when we build a program to run on an Intel-based IBM PC, the assembler codes up the instructions knowing what the machine code for each command is. It is easy to remember ADD AX,BX, but a bit harder to remember that 01D8h is the machine code instruction for it. The assembler does the conversion for us.

Each processor has a certain number of instructions it can use. A simple set of instructions is called a RISC (Reduced Instruction Set Computing) while a larger, more complex one is called a CISC (Complex Instruction Set Computing). As a rough rule, a RISC runs each instruction faster but each instruction does less, while a CISC instruction can be slower but does much more. An example might be that a RISC has no instruction for multiplication, since multiplying is really just repeated adding, while the CISC can do it in one go. This is an extreme example, since most RISCs can do multiplication. The difference between a Pentium and a Pentium MMX is that the MMX has a few more instructions so that the processor can do a few more things with data, particularly concerning graphics for games, so the CISC is even more CISC.
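
To get a feel for what an assembler or disassembler has to know, here is a minimal sketch in Python that turns two bytes back into an ADD instruction. It only handles the two register-to-register ADD opcodes seen so far (01h and 03h), and the names REG16 and decode_add are made up for the illustration. The bit layout of the second byte (two bits, three bits, three bits) is the standard 8086 encoding, but treat the whole thing as a toy, not as how debug is really written.

```python
# 16-bit register names in the order the 8086 numbers them (0 to 7).
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

def decode_add(opcode, second_byte):
    """Decode a register-to-register 16-bit ADD (opcode 01h or 03h only)."""
    mod = second_byte >> 6         # top two bits; 3 means both operands are registers
    reg = (second_byte >> 3) & 7   # middle three bits
    rm = second_byte & 7           # bottom three bits
    if mod != 3:
        raise ValueError("this sketch only handles register-to-register forms")
    if opcode == 0x01:             # destination is rm, source is reg
        return "ADD %s,%s" % (REG16[rm], REG16[reg])
    if opcode == 0x03:             # destination is reg, source is rm
        return "ADD %s,%s" % (REG16[reg], REG16[rm])
    raise ValueError("opcode not handled in this sketch")

print(decode_add(0x01, 0xD8))  # ADD AX,BX
print(decode_add(0x03, 0xF1))  # ADD SI,CX
```

The second line is the "garbage" instruction 03F1 from the first register dump, which is why debug reported it as adding SI and CX.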

Trace

So now the instruction is in place, let's execute it. Since the IP currently points to offset 0100h, the code is ready to run (that's why we selected address offset 0100 when we entered the data eh ?).

The command for running a single line of code is t, which stands for trace. After each line is executed, debug gives us a register dump to show us what has happened. Tracing a program is to execute it line by line and examine all of the variables.

Type t once to execute the command

You should see that the AX register now has Ah in it (debug displays it as 000A). You might want to check that 8h + 2h = Ah...it does. (You probably expected to see 10, but 10 decimal is written as A in hexadecimal.) Note that the result of the addition is stored in the AX register. Sort of logical really.
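
Here is a tiny Python sketch of what that trace did, in case you want to experiment away from debug. The function name add16 is invented for the example, and it models only the result left in AX, not the flags.

```python
def add16(ax, bx):
    """Return the new AX after ADD AX,BX, keeping only the low 16 bits."""
    return (ax + bx) & 0xFFFF

ax = add16(0x8, 0x2)
print(format(ax, "04X"))  # 000A
```

The "& 0xFFFF" is there because a register only has room for four hex digits; anything bigger falls off the top.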

Note also that the IP now points to 0102h....since we haven't put any instructions in that location in RAM it would be unwise to execute it, the CPU might head off into limbo. That would be a bad thing.

We should set the IP back to 0100h....

try using "r IP", this will let us place 0100h in the IP....

now type "t" again. The instruction to ADD AX,BX is done again. That time it added AX (Ah) to BX (2h) to get....

How about checking some other numbers for AX and BX...try placing 1 in AX, 1 in BX, resetting the IP to 100h and doing a trace.

SUBTRACTION

The machine code for SUB AX,BX is 29D8h.

Using the e command, edit the instruction at 0100 to subtract the BX from the AX register.
As a tip, you can enter it in one go by using a space after the first byte has been entered and continuing.
Note that the old bytes for ADD AX,BX (01 D8 ) come up as we enter the new code.

try running the code, remember to set the IP to 0100h each time, otherwise...it could be bad.
Try placing 0 in AX and 1h in BX.

Mmmm, Intel processors have a funny way of showing negative numbers: AX now holds FFFFh, which is how -1 is written in 16-bit two's complement. It all works out in the end though.
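
A short Python sketch of why 0 minus 1 looks so odd in a 16-bit register. The names sub16 and as_signed16 are made up for this illustration, and the sketch assumes the usual two's complement reading of a word (top bit set means negative).

```python
def sub16(ax, bx):
    """Return the new AX after SUB AX,BX, wrapped to 16 bits."""
    return (ax - bx) & 0xFFFF

def as_signed16(value):
    """Read a 16-bit word as a two's complement signed number."""
    return value - 0x10000 if value & 0x8000 else value

result = sub16(0x0, 0x1)
print(format(result, "04X"))  # FFFF
print(as_signed16(result))    # -1
```

So the register is not wrong, it is just showing the same 16 bits as an unsigned number.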

MULTIPLICATION...a trick

The machine code for MUL AX,BX is F7E3h. Try it, (make sure about the IP, otherwise...)

put 5h in AX and 3h in BX and edit the instruction at 0100 to F7E3h

Before you do the trace, do a register dump, the instruction is MUL BX. No reference is made to AX. The processor assumes you are going to multiply the AX register....now do the trace.

You should see the value F stored in AX, which is the hex value for 15 (5 * 3 = F)

put 3A7h in AX and 92Ah in BX, reset the IP to 0100.
 

We are multiplying together 935 and 2346 decimal, which should give an answer 2193510 decimal, or 217866h

do a trace.

The DX register is now not 0000, it should be 21h. This is because multiplying together registers, which are 2 byte words, gives some big answers. To prevent overflow, the answer is stored in a pair of registers, the AX and the DX. The high word is stored in DX, the low word in AX. AX should contain 7866h
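
The split into a high word and a low word can be sketched in Python like this. The function name mul16 is invented for the example; it only mimics where the two halves of the answer end up.

```python
def mul16(ax, bx):
    """Mimic MUL BX: multiply AX by BX, return (DX, AX) = (high word, low word)."""
    product = ax * bx
    high_word = (product >> 16) & 0xFFFF  # what lands in DX
    low_word = product & 0xFFFF           # what is left in AX
    return high_word, low_word

dx, ax = mul16(0x3A7, 0x92A)
print(format(dx, "04X"), format(ax, "04X"))  # 0021 7866
```

Put the two words side by side (0021 then 7866) and you get 217866h, the full answer.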

Set AX to FFFF and BX to 0010, then try the MUL AX,BX instruction again. Can you explain what has happened?

AX is full, so it carries into DX when multiplied by BX. BX is 10h, so it shifts the AX one place to the left (like multiplying by 10 eh ?) F is the carry, FFF0 is left in AX. Easy peasy. Multiplying by 16 in the hex world is a breeze.
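
If you would like to convince yourself that multiplying by 10h really is just a shift, this Python sketch compares the two. The variable names are only for illustration.

```python
ax, bx = 0xFFFF, 0x0010

product = ax * bx   # what MUL BX computes
shifted = ax << 4   # AX moved one hex place (4 bits) to the left

print(product == shifted)  # True
print(format(product >> 16, "04X"), format(product & 0xFFFF, "04X"))  # 000F FFF0
```

The F that got pushed off the top of AX is the one that turns up in DX.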

When the registers are combined like this we write it as a double word (4 bytes) DX:AX, with the high word written first. When a double word is stored in memory, tradition states that the low word always comes first. This is an important lesson for novice hackers.

DIVISION

The machine code for DIV AX,BX is F7F3h.  Set this at address 0100h

The machine expects to find a double word in DX:AX (high word in DX, low word in AX), which it will divide by BX.

Set
DX = 007Ch
AX = 4B12h
BX = 0100h.
We are dividing 007C4B12h by 0100h.

now set the IP back to 0100. (or it could be bad)

Register dump to check all is ready and correct.
..trace.

The answer should be 7C4Bh with a remainder 12h. Where did you see the answers ?

The quotient is in AX, the remainder in DX, neat eh ?
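
Here is the same division sketched in Python. The function name div16 is made up for the example. The two error checks are my own stand-ins for the cases where a real processor refuses to finish the DIV (dividing by zero, or an answer too big for AX); the sketch does not try to imitate what the processor actually does in those cases.

```python
def div16(dx, ax, bx):
    """Mimic DIV BX: divide the double word DX:AX by BX.

    Returns (quotient, remainder), which the CPU leaves in AX and DX.
    """
    if bx == 0:
        raise ZeroDivisionError("BX is zero")
    dividend = (dx << 16) | ax            # glue high and low words together
    quotient, remainder = divmod(dividend, bx)
    if quotient > 0xFFFF:
        raise OverflowError("quotient does not fit in AX")
    return quotient, remainder

new_ax, new_dx = div16(0x007C, 0x4B12, 0x0100)
print(format(new_ax, "04X"), format(new_dx, "04X"))  # 7C4B 0012
```

Dividing by 100h is the mirror image of the multiply-by-10h trick: the number shifts two hex places to the right and the two digits that fall off (12) become the remainder.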

How to Quit

To exit from debug you type the command "q":

This is an example of useful information buried at the bottom of the page.
 

Summary

Congratulations, you have learnt the machine code and assembler instructions for addition, subtraction, multiplication and division. Along the way we learnt about hex numbers and the debug commands:

r - register
e - enter
t - trace
q - quit

As you have gathered, we are not writing in assembler, we are actually writing in machine code. We are placing into the registers and RAM the machine code for simple arithmetic. Later, we will use ADD AX,BX directly instead of 01D8h, but for now we will continue to use machine code for some more simple examples. The next step is to write a two line program that does something. Wooh Hooo...

Assembler Programming - tutorial 2 (understanding interrupts, RAM and some DOS functions)