Assembly

10 February 2018 · Filed in Tutorial

Assebly

Register

To speed up the processor operations, the processor includes some internal memory storage locations, called registers.

There are ten 32-bit and six 16-bit processor registers in IA-32 architecture.

The registers are grouped into three categories −

General registers, Control registers, and Segment registers. The general registers are further divided into the following groups −

Data registers, Pointer registers, and Index registers.

Data Registers

System calls are APIs for the interface between the user space and the kernel space.

Nibbles

A nibble is a collection of four bits

Bytes

Bits 0..3 comprise the low order nibble, bits 4..7 form the high order nibble. Since a byte contains exactly two nibbles, byte values require two hexadecimal digits.

Since a byte contains eight bits, it can represent 28, or 256, different values. Generally, we’ll use a byte to represent numeric values in the range 0..255, signed numbers in the range -128..+127

Words

A word is a group of 16 bits.

Double Words

A double word is exactly what its name implies, a pair of words.

Signed and Unsigned Numbers

with n bits we can represent the signed values in the range -2n-1 to +2n-1-1.

00000000	+0	0
00000001	1	1
⋮	⋮	⋮
01111101	125	125
01111110	126	126
01111111	127	127
10000000	−127	128
10000001	−126	129
10000010	−125	130
⋮	⋮	⋮
11111101	−2	253
11111110	−1	254
11111111	−0	255

1) Invert all the bits in the number 2) Add one to the inverted result

0000 0101 Five (in binary). 1111 1010 Invert all the bits. 1111 1011 Add one to obtain result.

Shifts and Rotates

Another set of logical operations which apply to bit strings are the shift and rotate operations. These two categories can be further broken down into left shifts, left rotates, right shifts, and right rotates.

shifting it left multiplies it by two.

If you shift a binary value to the left three times, you multiply it by eight (222) If you perform n right shifts, you will divide that number by 2n

The ASCII Character Set

Von Neumann architecture (VNA)

Buses

The system bus

The system bus connects the various components of a VNA machine

The Data Bus

The Address Bus

The Control Bus

The CPU sends data to memory and receives data from memory on the data bus. This prompts the question, “Is it sending or receiving?” There are two lines on the control bus, read and write, which specify the direction of data flow

The 80x86 family deals with this problem by storing the L.O. byte of a word at the address specified and the H.O. byte at the next location

CPU Registers

AX –The accumulator register BX –The base address register CX –The count register DX –The data register

Besides the above registers, which are visible to the programmer, the x86 processors also have an instruction pointer register which contains the address of the next instruction to execute. There is also a flags register that holds the result of a comparison. The flags register remembers if one value was less than, equal to, or greater than another value.

Because registers are on-chip and handled specially by the CPU, they are much faster than memory

Accessing a memory location requires one or more clock cycles. Accessing data in a register usually takes zero clock cycles

The Arithmetic & Logical Unit

For example, if you want to add the value five to the AX register, the CPU: • Copies the value from AX into the ALU, • Sends the value five to the ALU, • Instructs the ALU to add these two values together, • Moves the result back into the AX register.

The Bus Interface Unit

The bus interface unit (BIU) is responsible for controlling the address and data busses when accessing main memory. If a cache is present on the CPU chip then the BIU is also responsible for accessing data in the cache.

The Control Unit and Instruction Sets

CBA Instruction 000 move 001 add 010 subtract 011 multiply 100 divide 101 and 110 or 111 xor DD SS DD -or- SS Register 00 AX 01 BX 10 CX 11 DX

log2(n)

The move instruction would move data from the source register to the destination register the add instruction would add the value of the source register to the destination register, etc.

The x86 Instruction Set

The instructions are mov (two forms), add, sub, cmp, and, or, not, je, jne, jb, jbe, ja, jae, jmp, brk, iret, halt, get, and put.

mov reg, reg/memory/constant mov memory, reg

The arithmetic and logical instructions

add reg, reg/memory/constant
sub reg, reg/memory/constant
cmp reg, reg/memory/constant
and reg, reg/memory/constant
or reg, reg/memory/constant
not reg/memory

The add instruction adds the value of the second operand to the first (register) operand, leaving the sum in the first operand. The sub instruction subtracts the value of the second operand from the first, leaving the difference in the first operand. The cmp instruction compares the first operand against the second and saves the result of this comparison for use with one of the conditional jump instructions (described in a moment). The and and or instructions compute the corresponding bitwise logical operation on the two operands and store the result into the first operand. The not instruction inverts the bits in the single memory or register operand.

The control transfer instructions

ja dest – Jump if above jae dest – Jump if above or equal jb dest – Jump if below jbe dest – Jump if below or equal je dest – Jump if equal jne dest – Jump if not equal jmp dest – Unconditional jump iret – Return from an interrupt

The first six instructions in this class let you check the result of the previous cmp instruction for greater than, greater or equal, less than, less or equal, equality, or inequality

Addressing Modes on the x86

The x86 processors support the register addressing mode10, the immediate addressing mode, the indirect addressing mode, the indexed addressing mode, and the direct addressing mode.

mov ax, [1000] mov ax, [bx] mov ax, [1000+bx]

The first instruction above uses the direct addressing mode to load ax with the 16 bit value stored in memory starting at location 1000 hex. The mov ax, [bx] instruction loads ax from the memory location specified by the contents of the bx register. This is an indirect addressing mode

mov bx, 1000 mov ax, [bx]

are equivalent to the single instruction:

mov ax, [1000]

Of course, the second sequence is preferable

The last memory addressing mode is the indexed addressing mode. An example of this memory addressing mode is mov ax, [1000+bx]

3.3.7 Encoding x86 Instructions

i i i r r m m m
i i i
= special
= or
= and
= cmp
= sub
= add
= mov reg, mem/reg/const
= mov mem, reg

r r
= AX
= BX
= CX
= DX

mmm
0 0 = AX
0 1 = BX
1 0 = CX
1 1 = DX
0 0 = [BX]
0 1 = [xxxx+BX]
1 0 = [xxxx]
1 1 = constant

0 0 i i m m m

i i
= zero operand instructions
= jump instructions
= not
= illegal (reserved)

mmm (if ii = 10)
= AX
= BX
= CX
= DX
= [BX]
= [ xxxx+BX]
= [ xxxx]
= constant

For example, to encode the mov ax, bx instruction you would select iii=110 (mov reg, reg), rr=00 (ax), and mmm=001 (bx). This produces the one-byte instruction 11000001 or 0C0h.

0 0 0 0 i i i
i i i
= illegal
= illegal
= illegal
= brk
= iret
= halt
= get
= put

0 0 0 1 i i i
mmm (if ii = 10)
= j e
= jne
= j b
= jbe
= j a
= jae
= jmp
= illegal

jxx address The jmp instruction copies the 16-bit immediate value (address) following the opcode into the IP register. Therefore, the CPU will fetch the next instruction from this target address; effectively, the program “jumps” from the point of the jmp instruction to the instruction at the target address.

Three of these instructions are illegal instruction opcodes. The brk (break) instruction pauses the CPU until the user manually restarts it. This is useful for pausing a program during execution to observe results. The iret (interrupt return) instruction returns control from an interrupt service routine. We will discuss interrupt service routines later. The halt program terminates program execution. The get instruction reads a hexadecimal value from the user and returns this value in the ax register; the put instruction outputs the value in the ax register.

Step-by-Step Instruction Execution

the mov reg, reg/memory/constant instruction: • Fetch the instruction byte from memory. • Update the ip register to point at the next byte. • Decode the instruction to see what it does. • If required, fetch a 16-bit instruction operand from memory. • If required, update ip to point beyond the operand. • Compute the address of the operand, if required (i.e., bx+xxxx) . • Fetch the operand. • Store the fetched value into the destination register

section.data text db “Hello, World!”, 10

section .text global _start _start: mov rax, 1 mov rdi, 1 mov rsi, text mov rdx, 14 syscall

mov rax, 60
mov rdi, 0
syscall

text db “Hello, World!”, 10

db define bite 10 is new line \n

text name of memmory address

• A bit is a single binary digit, 0 or 1. • A byte is 8 bits side by side. • A word is 2 bytes side by side. • A double word is 2 words side by side. • A quad word is 2 double words side by side.

The Linux kernel was entirely separate from the user interface, and it was protected from damage due to malfunctioning programs elsewhere in the system. System memory was tagged as either kernel space or user space, and nothing running in user space could write to (nor generally read from) anything stored in kernel space. Communication between kernel space and user space was handled through strictly controlled system calls

Direct access to physical hardware, including memory, video, and periph- erals, was limited to software running in kernel space. Programs wishing to make use of system peripherals could only get access through kernel-mode device drivers.

Memory model

The oldest and now ancient memory model is called the real mode flat model

The newest memory model is called protected mode flat model, and it’s the memory model behind modern operating systems such as Windows 2000/XP/Vista/7 and Linux.

Real mode flat model is very much like protected mode flat model in miniature.

NAME VALUE IN DECIMAL VALUE IN HEX Byte 1 01H Word 2 02H Double word 4 04 H Quad word 8 08H Ten byte 10 OAH Paragraph 16 10H Page 256 100H Segment 65,536 10000H

The segment registers exist only to hold segment addresses.

The segment registers have names that reflect their general functions: CS, DS, SS, ES, FS, and GS. FS and GS exist only in the 386 and later Intel x86 CPUs—but are still 16 bits in size. All segment registers are 16 bits in size, irrespective of the CPU. This is true even of the 32-bit CPUs.

• CS stands for code segment. Machine instructions exist at some offset into a code segment. The segment address of the code segment of the currently executing instruction is contained in CS. DS stands for data segment. Variables and other data exist at some offset into a data segment. There may be many data segments, but the CPU may only use one at a time, by placing the segment address of that segment in register DS. SS stands for stack segment. The stack is a very important component of the CPU used for temporary storage of data and addresses. I explain how the stack works a little later; for now simply understand that, like everything else within real mode’s megabyte of memory, the stack has a segment address, which is contained in SS. ES stands for extra segment. The extra segment is exactly that: a spare segment that may be used for specifying a location in memory. FS and GS are clones of ES. They are both additional segments with no specific job or specialty. Their names come from the fact that they were created after ES (think, E, F, G). Don’t forget that they exist only in the 386 and later x86 CPUs!

these general-purpose registers are used to hold the offset addresses that must be paired with segment addresses to pin down a single location in memory.

The ‘‘bitness’’ of the world is almost entirely defined by the width of the x86 CPU registers.

Adding a room to your house doesn’t make it two houses—just one bigger house.

There are eight 16-bit general-purpose registers: AX, BX, CX, DX, BP, SI, DI, and SP. They are all 16 bits in size, and you can place any value in them that may be expressed in 16 bits or fewer.

When Intel expanded the x86 architecture to 32 bits in 1986, it doubled the size of all eight registers and gave them new names by prefixing an E in front of each register name, resulting in EAX, EBX, ECX, EDX, EBP, ESI, EDI, and ESP.

EAX, EBX, ECX, and EDX, but there’s an additional twist: the low 16 bits are themselves divided into two 8-bit halves, so what we have are register names on not two but three levels

The 16-bit registers AX, BX, CX, and DX are present as the lower 16-bit portions of EAX, EBX, ECX, and EDX; but AX, BX, CX, and DX are themselves divided into 8-bit halves, and assemblers recognize special names for the two halves. The A, B, C, and D are retained, but instead of the X, a half is specified with an H (for high half) or an L (for low half). Each register half is one byte (8 bits) in size.

BX there is BH and BL

The Instruction Pointers

usually called IP or, in 32-bit protected mode, EIP

It can do only one thing: it contains the offset address of the next machine instruction to be executed in the current code segment.

A code segment is an area of memory where machine instructions are stored.

The current code segment is that code segment whose segment address is currently stored in code segment register CS.

Together, CS and IP contain the full address of the next machine instruction to be executed.

The nature of this address depends on what CPU you’re using, and the programming model for which you’re using it

The Flags Register

The Three Major Assembly Programming Models

Real Mode Flat Model

CS always points to the current code segment, and the next instruction to be executed is pointed to by the IP register

You can destroy portions of the operating system by careless use of segment registers, which will cause the operating system to crash and take your program with it. This is the danger that prompted Intel to build new features into its 80386 and later CPUs to support a ‘‘protected’’ mode. In protected mode, application programs (that is, the programs that you write, as opposed to the operating system or device drivers) cannot destroy the operating system or other application programs that happen to be running elsewhere in memory via multitasking. That’s what the protected means.

The 64-bit versions of the registers are renamed beginning with an R: EAX becomes RAX, EBX becomes RBX, and so on.

Linux comes with its own linker, called ld.

Everything is a file.

When you read from a file, you are accepting data from an endpoint. The path that the data takes between files may be entirely within a single computer, or it may be between computers along a network of some kind. Data may be processed and changed along the path, or it may simply move from one endpoint to another without modification. No matter. Everything is a file, and all files are treated more or less identically by Unix’s internal file machinery

Your keyboard is a file: it’s an endpoint that generates data and sends it somewhere. Your display is a file: it’s an endpoint that receives data from somewhere and puts it where you can see it. Unix files do not have to be text files. Binary files (like the executables created by NASM and the linker) are handled the same way.

				FILE C IDENTIFIER	FILE DESCRIPTOR 	DEFAULTS TO Standard Input 		stdin 				0 					Keyboard Standard Output 	stdout 				1 					Display Standard Error 		stderr 				2  					Display

but it’s good to know that the power is there as your programming skills improve and you take on more ambitious projects.

By convention in assembly language, the first (leftmost) operand belonging to a machine instruction is the destination operand. The second operand from the left is the source operand.

Three different flavors of data may be used as operands: memory data, register data, and immediate data.

you can move an immediate value to memory, or a memory value to a register, or some other similar combination, but you can’t move a memory value directly to another memory value. This is just an inherent limitation of the current generation of x86 CPUs, and we have to live with it, inconvenient as it is at times.

to move the word in memory at the address contained in EBX into register EAX, you would use the following instruction: mov eax,[ebx]

EatMsg: db “Eat at Joe’s!” mov ecx,EatMsg

If you’ve had any exposure to high-level languages like Pascal, your first instinct might be to assume that whatever data is stored in EatMsg will be copied into ECX. Assembly doesn’t work that way. That MOV instruction actually copies the address of EatMsg, not what’s stored in (actually, at) EatMsg. In assembly language, variable names represent addresses, not data!

So how do you actually ‘‘get at’’ the data represented by a variable like EatMsg? Again, it’s done with square brackets: mov edx,[EatMsg]

EFlags as a whole is a single 32-bit register buried inside the CPU. It’s the 32-bit extended descendent of the 16-bit Flags register present in the 8086/8088 CPUs

• OF: The Overflow flag is set when the result of an arithmetic operation on a signed integer quantity becomes too large to fit in the operand it originally occupied. OF is generally used as the ‘‘carry flag’’ in signed arithmetic.

Alt text

Several x86 machine instructions come in pairs. Simplest among those are INC and DEC,

Every executable program for Linux has to have a label _ s t a r t in it somewhere, irrespective of the language it’s written in: C, Pascal, assembly, no matter. If the Linux loader can’t find the label, it can’t load the program correctly. The g l o b a l specifier tells the linker to make the _ s t a r t label visible from outside the program’s borders.

MyByte db 07h ; 8 bits in size MyWord dw 0FFFFh ; 16 bits in size MyDouble dd 0B8000000h ; 32 bits in size

Think of the DB directive as ‘‘Define Byte.’’ DB sets aside one byte of memory for data storage. Think of the DW directive as ‘‘Define Word.’’ DW sets aside one word (16 bits, or 2 bytes) of memory for data storage. Think of the DD directive as ‘‘Define Double.’’ DD sets aside a double word in memory for storage, typically for full 32-bit memory addresses.

TwoLineMsg: db “Eat at Joe’s…“,10,”…Ten million flies can’t ALL be wrong!”,10

What’s with the numeric literal 10 tucked into the previous example strings? In Linux text work, the end-of-line (EOL) character has the numeric value of 10. It indicates to the operating system where a line submitted for display to the Linux console ends.

You can concatenate such individual numbers within a string, but you must remember that, as with EOL, they will not appear as numbers. A string is a string of characters. A number appended to a string will be interpreted by most operating system routines as an ASCII character.

You can define string variables using DW or DD, but they’re handled a little differently than those defined using DB. Consider these variables: WordString: dw CQ 1

DoubleString: dd ‘Stop’ The DW directive defines a word-length variable, and a word (16 bits) may hold two 8-bit characters. Similarly, the DD directive defines a double word (32-bit) variable, which may hold four 8-bit characters.

A statement containing the directive EQU is called an equate. An equate is a way of associating a value with a label. Such a label is then treated very much like a named constant in Pascal. Any time the assembler encounters an equate during an assembly, it will swap in the equate’s value for its name. For example:

FieldWidth equ

The preceding tells the assembler that the label F i e l d w i d t h stands for the numeric value 10. Once that equate is defined, the following two machine instructions are exactly the same:

mov eax,10 mov eax,FieldWidth

EatLen: equ $-EatMsg

The assembly-time calculation is to take the location represented by the $ token (which, when the calculation is done, contains the location just past the end of the E a t M s g string) and subtract from it location of the beginning of the E a t M s g string. End - Beginning = Length

In the x86 architecture, the top of the stack is marked by a register called the stack pointer, with the formal name ESP. It’s a 32-bit register, and it holds the memory address of the last item pushed onto the stack.

the stack begins up at the ceiling, and as items are pushed onto the stack, the stack grows downward, toward low memory.

Alt text

Push-y Instructions You can place data onto the stack in several ways, but the most straightforward way involves a group of five related machine instructions: PUSH, PUSHF, PUSHFD, PUSHA, and PUSHAD. All work similarly, and differ mostly in what they push onto the stack: PUSH pushes a 16-bit or 32-bit register or memory value that is specified by you in your source code.

• PUSHF pushes the 16-bit Flags register onto the stack. • PUSHFD pushes the full 32-bit EFlags register onto the stack. • PUSHA pushes all eight of the 16-bit general-purpose registers onto the stack. • PUSHAD pushes all eight of the 32-bit general-purpose registers onto the stack.

Alt text

One excellent use of the stack allows the all-too-few registers to do multiple duty. If you need a register to temporarily hold some value to be operated on by the CPU and all the registers are in use, push one of the busy registers onto the stack. Its value will remain safe on the stack while you use the register for other things. When you’re finished using the register, pop its old value off the stack—and you’ve gained the advantages of an additional register without really having one

kernel services call gate

pooling

INT 00h - 0FFh 256 interrupts

The physical address is calculated from 2 parts. i) segment address. ii) offset address. The CS(code segment register) is used to address the code segment of the memory i.e a location in the memory where the code is stored. The IP(Instruction pointer) contains the offset within the code segment of the memory. Hence CS:IP is used to point to the location (i.e to calculate the physical address)of the code in the memory.

Since IP is 16 bit it means you can only have 64k instructions (2^16), which wasn’t much even in the 80s. So to expand the address space you have a second register which addresses 64k blocks. You could consider cs:ip together as one 32 bit register which is then capable of addressing 2^32 bytes…ie 4G which is what you get on a processor which uses 32 bit addresses. The 8086 was using 20 bits of addresses, so you could access 1M of memory.

INT 00h - 0FFh 256 interrupts

Executes the INT instuction
Interprets the INT instruction during the assembly time
Moves the INT instruction to the Vector Table
1. Vector Table occupies location 00 to 3FF of the program memory
2. It contains the Code Segment(CS) and Instruction Pointer(IP) for each kind of interrupt

At the very start of x86 memory, down at segment 0, offset 0, is a special lookup table with 256 entries. Each entry is a complete memory address including segment and offset portions, for a total of 4 bytes per entry. The first 1,024 bytes of memory in any x86 machine are reserved for this table, and no other code or data may be placed there.

Each of the addresses in the table is called an interrupt vector. The table as a whole is called the interrupt vector table.

Alt text

mov ecx,EatMsg mov edx,EatLen int 80H

This sequence of instructions requests that Linux display a text string on the console. The first line sets up a vital piece of information: the number of the service that we’re requesting. In this case, it’s to sys_write, service number 4, which writes data to a Linux file. Remember that in Linux, just about everything is a file, and that includes the console. The second line tells Linux which file to write to: standard output. Every file must have a numeric file descriptor, and the first three (0,1, and 2) are standard and never change. The file descriptor for standard output is 1. The third line places the address of the string to be displayed in ECX. That’s how Linux knows what it is that you want to display. The dispatcher expects the address to be in ECX, but the address is simply where the string begins. Linux also needs to know the string’s length, and we place that value in register EDX.

mov eax,1 Specify Exit syscall mov ebx,0 Return a code of zero int 80H Make syscall the to terminate the program

Exiting this way is not just a nicety. Every program you write must exit by making a call to s y s _ e x i t through the kernel services dispatcher. If a program just ‘‘runs off the edge’’ it will in fact end, but Linux will hand up a segmentation fault and you’ll be none the wiser as to what happened.

Software Interrupts versus Hardware Interrupts

hardware interrupts are numbered, and for each interrupt number there is a slot reserved in the interrupt vector table. In this slot is the address of an interrupt service routine (ISR) that performs some action relevant to the device that tapped the CPU on the shoulder

The only real difference between hardware and software interrupts lies in the event that triggers the trip through the interrupt vector table. With a software interrupt, the triggering event is part of the software—that is, an INT instruction. With a hardware interrupt, the triggering event is an electrical signal applied to the CPU chip itself without any INT instruction taking a hand in the process. The CPU itself pushes the return address onto the stack when it recognizes the electrical pulse that triggers the interrupt; however, when the ISR is done, an IRET instruction sends execution home, just as it does for a software interrupt.

Write:	mov eax,4 		;Specify sys_write call
		mov ebx,1 		;Specify File Description 1 standad output
		mov ecx,Buff	;Pass address of the charecter to write
		mov edx,1		;Pass number of chars to write
		int 80H 		;Call sys_write...

EOF.

When you count bits, start with the bit on the right, and number them from 0.

SHL SHifts its operand Left, whereas SHR SHifts its operand Right.

0B76FH is as follows: 1011011101101111 SHL AX,1

0110111011011110

A 0 has been inserted at the right-hand end of the number, and the whole shebang has been bumped toward the left by one digit. The last bit shifted out of the left end of the binary number is bumped into a temporary bucket for bits called the Carry flag, generally abbreviated as CF.

ROL shifts all bits left and moves bit 7 to bit 0. What was 10110010 is now 01100101

nybble??

“Greater Than” Versus “Above”

To tell the signed jumps apart from the unsigned jumps, the mnemonics use two different expressions for the relationship between two values: • Signed values are thought of as being greater than or less than. For example, to test whether one signed operand is greater than another, you would use the JG (Jump if Greater) mnemonic after a CMP instruction. • Unsigned values are thought of as being above or below. For example, to tell whether one unsigned operand is greater than (above) another, you would use the JA (Jump if Above) mnemonic after a CMP instruction.

Alt text

The three basic modes of addressing are −

Register Addressing

In this addressing mode, a register contains the operand. Depending upon the instruction, the register may be the first operand, the second operand or both.

For example,

MOV DX, TAX_RATE   ; Register in first operand
MOV COUNT, CX	   ; Register in second operand
MOV EAX, EBX	   ; Both the operands are in registers

Immediate Addressing

An immediate operand has a constant value or an expression. When an instruction with two operands uses immediate addressing, the first operand may be a register or memory location, and the second operand is an immediate constant. The first operand defines the length of the data.

For example,

BYTE_VALUE  DB  150    ; A byte value is defined
WORD_VALUE  DW  300    ; A word value is defined
ADD  BYTE_VALUE, 65    ; An immediate operand 65 is added
MOV  AX, 45H           ; Immediate constant 45H is transferred to AX

BYTE_TABLE DB  14, 15, 22, 45      ; Tables of bytes
WORD_TABLE DW  134, 345, 564, 123  ; Tables of words

The following operations access data from the tables in the memory into registers −

MOV CL, BYTE_TABLE[2]	; Gets the 3rd element of the BYTE_TABLE
MOV CL, BYTE_TABLE + 2	; Gets the 3rd element of the BYTE_TABLE
MOV CX, WORD_TABLE[3]	; Gets the 4th element of the WORD_TABLE
MOV CX, WORD_TABLE + 3	; Gets the 4th element of the WORD_TABLE

Indirect Memory Addressing

MY_TABLE TIMES 10 DW 0  ; Allocates 10 words (2 bytes) each initialized to 0
MOV EBX, [MY_TABLE]     ; Effective Address of MY_TABLE in EBX
MOV [EBX], 110          ; MY_TABLE[0] = 110
ADD EBX, 2              ; EBX = EBX +2
MOV [EBX], 123          ; MY_TABLE[1] = 123

Alt text

choice		DB	'y'
number		DW	12345
neg_number	DW	-12345
big_number	DQ	123456789
real_number1	DD	1.234
real_number2	DQ	123.456

Each byte of character is stored as its ASCII value in hexadecimal. Each decimal value is automatically converted to its 16-bit binary equivalent and stored as a hexadecimal number. Processor uses the little-endian byte ordering. Negative numbers are converted to its 2’s complement representation. Short and long floating-point numbers are represented using 32 or 64 bits, respectively.

Multiple Initializations The TIMES directive allows multiple initializations to the same value. For example, an array named marks of size 9 can be defined and initialized to zero using the following statement −

marks  TIMES  9  DW  0

Immediate addressing mode

mov A,#23H ; #23H real value

accumilator ??

Direct addressing

mov A,23H ; 23 addess value moves to A

mov 23H,33H ; 33 addess value moves to adress 23

mov 26H,#33H ; 33 value moves to address 26

Register Addessing

you can assign those variables with binary or hexadecimal values. Binary values would need to be appended by the letter ‘b’ at the end of the number. Likewise, hexadecimals with ‘h’ at the end. If the hexadecimal numbers start with letters (i.e. A, B, C, D, E, or F), you need to add a zero in front of that number and add an ‘h’ after that number.

bits  db 101001b
var2  dw 4567h
var3  dw 0BABEh

Similarly, you can load the variables to a register or store them back. You can even transfer values between registers.

mov bx, [our_var]
mov cx, bx
mov [our_var], cx

Note the square brackets when you deal with variables. these square brackets are good to distinguish the variable from its address

There are caveats in using mov command. You CANNOT use mov [var1], [var2]. In other words, mov command cannot transfer values between two variables directly

So, how can we get around with this? Use the register.

Suppose both var1 and var2 are word variables. We can use any word registers (AX, BX, CX, DX, and so on) to do the transfer. Suppose we use AX. Thus, mov [var1], [var2] must be transformed into:

mov ax, [var2] mov [var1],ax

The assembler actually treat variables as a label that has an address in the memory (RAM in most cases) associated to it.

mov will read and write 2 bytes instead of 1. Likewise, using byte register to move a word variable will only transfer one byte instead of two.

 var2 db 2
   var3 dw 305h

start:
   mov ax, [var1]  ; ax now equals to 0201h (i.e. 2*256+1)
   mov ax, [var3]  ; ax now equals to 0305h
   mov ax, [var2]  ; ax now equals to 0502h
   mov al, [var3]  ; al now equals to 05h

MOV EAX,myvar2

The EAX register is now a pointer to myvar2. It does not contain the contents of myvar2 (which may not even be a double word), but it contains the address of myvar2.

Variables as Pointers

Once we have moved an address into a 32 bit register, we are then free to move it into a double word variable for storage.

For example, suppose that EAX has been loaded with the address of some memory location storing a byte of data and suppose that we have a double word variable mypoint, say, that we want to store this address in. We simply write

MOV [mypoint],EAX

Now here comes the tricky part. Let’s suppose we want to load the contents of the memory location now pointed to by mypoint, into the CH register. Firstly we have to retrieve the address from storage:

MOV EBX,[mypoint]

Now EBX points to the location in question. Now to retrieve the byte of data at that location, we write

MOV CH,[EBX]

Here the square brackets do not denote the contents of EBX itself (for which we would just write EBX), but rather, they denote the contents of the location pointed to by EBX.

Although this usage of the square brackets may seem different to the earlier usage, it is in reality the same thing, since the thing inside the brackets is basically a pointer in both cases.

EAX used to be called the accumulator since it was used by a number of arithmetic operations, and ECX was known as the counter since it was used to hold a loop index. Whereas most of the registers have lost their special purposes in the modern instruction set, by convention, two are reserved for special purposes — the stack pointer (ESP) and the base pointer (EBP).

.DATA			
var	DB 64  		; Declare a byte, referred to as location var, containing the value 64.
var2	DB ?	; Declare an uninitialized byte, referred to as location var2.
DB 10			; Declare a byte with no label, containing the value 10. Its location is var2 + 1.
X	DW ?		; Declare a 2-byte uninitialized value, referred to as location X.
Y	DD 30000    ; Declare a 4-byte value, referred to as location Y, initialized to 30000.

C pointer

#include <stdio.h>	
int main () {

   int  var = 20;   /* actual variable declaration */
   int  *ip;        /* pointer variable declaration */

   ip = &var;  /* store address of var in pointer variable*/

   printf("Address of var variable: %x\n", &var  );

   /* address stored in pointer variable */
   printf("Address stored in ip variable: %x\n", ip );

   /* access the value using the pointer */
   printf("Value of *ip variable: %d\n", *ip );

   return 0;
}

assebmly        | c language
----------------|------------
mov eax,ebx 	| eax  = ebx
mov eax,[ebx] 	| eax  = *ebx
mov [eax],ebx 	| *eax = ebx
mov eax,[myvar] | eax  = myvar
mov eax,myvar 	| eax  = &myvar

assembly number differnces decimal or hex or ansi??

Numerical data is generally represented in binary system. Arithmetic instructions operate on binary data. When numbers are displayed on screen or entered from keyboard, they are in ASCII form.

proc_name: procedure body … ret The procedure is called from another function by using the CALL instruction. The CALL instruction should have the name of the called procedure as an argument as shown below −

CALL proc_name

Alt text

The C, C++, C#, Java, and other C-derivative programming languages use the prefix “0x” to denote a hexadecimal value. Therefore, you’d use the character sequence “0xdead” for the hexadecimal value DEAD16

So for hexadecimal values that don’t already begin with a numeric digit, you would add “0” to the beginning of the value (adding a zero to the beginning of any numeric representation does not alter the value of that representation). For example, use “0deadh” to unambiguously represent the hexadecimal value DEAD

101.01 The conversion to decimal yields: 1 × 2^2 + 1 × 2^0 + 1 × 2^-2 = 4 + 1 + 0.25 = 5.25

Tags: assembly Previous Post: JavaScript basic Next Post: Arithmetic Series

Teymur Valiyev Weblog

Assembly

Assebly

Register

Data Registers

Nibbles

Bytes

Words

Double Words

Signed and Unsigned Numbers

Shifts and Rotates

The ASCII Character Set

Buses

The system bus

The Data Bus

The Address Bus

The Control Bus

CPU Registers

The Arithmetic & Logical Unit

The Bus Interface Unit

The Control Unit and Instruction Sets

The x86 Instruction Set

The arithmetic and logical instructions

The control transfer instructions

Addressing Modes on the x86

3.3.7 Encoding x86 Instructions

Step-by-Step Instruction Execution

Memory model

The Instruction Pointers

The Flags Register

The Three Major Assembly Programming Models

Real Mode Flat Model

Software Interrupts versus Hardware Interrupts

Register Addressing

Immediate Addressing

Indirect Memory Addressing

Immediate addressing mode

Direct addressing

Register Addessing

Variables as Pointers

C pointer

Recent Posts

Java Gradle 26 Jun 2018

Java Classloader 18 Jun 2018

Java Swing 22 Apr 2018