x86 Assembly


Category | Instructions
Data transfer | mov, cmovcc, push, pop, pushad, popad, xchg, xadd, movsx, movzx
Data comparison | cmp, cmpxchg, cmpxchg8b
Data conversion | cbw, cwde, cwd, cdq, bswap, movbe, xlatb
Binary arithmetic | add, adc, sub, sbb, imul, mul, idiv, div, inc, dec, neg, daa, das, aaa, aas, aam, aad
Logical | and, or, xor, not, test
Rotate and shift | rcl, rcr, rol, ror, sal/shl, sar, shr, shld, shrd
Byte set and bit strings | setcc, bt, bts, btr, btc, bsf, bsr
String | cmpsb, cmpsw, cmpsd, lodsb, lodsw, lodsd, movsb, movsw, movsd, scasb, scasw, scasd, stosb, stosw, stosd, rep, repe, repz, repne, repnz
Flag manipulation | clc, stc, cmc, std, cld, lahf, sahf, pushfd, popfd
Control transfer | jmp, jcc, call, ret, enter, leave, jecxz, loop, loope, loopz, loopne, loopnz
Miscellaneous | bound, lea, nop, cpuid

Data Types

Data types supported by the x86 platform:

Data Type | C++ Type | Bits
Byte | char | 8
Word | short | 16
Doubleword | int | 32
Quadword | long long | 64
Quintword | | 80
Double Quadword | | 128
Quad Quadword | | 256
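
The C++ column above can be checked at compile time. The following is a minimal sketch, assuming a mainstream x86 or x86-64 compiler (MSVC, GCC, or Clang); the helper name BitsOf is made up for illustration, and the C++ standard itself only guarantees minimum widths for these types.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// Width of a type in bits. BitsOf is an illustrative helper, not a standard facility.
template <typename T>
constexpr std::size_t BitsOf() { return sizeof(T) * CHAR_BIT; }

// Assumption: a mainstream x86/x86-64 compiler. Other platforms may differ,
// in which case these checks fail at compile time instead of silently passing.
static_assert(BitsOf<char>() == 8, "byte");
static_assert(BitsOf<short>() == 16, "word");
static_assert(BitsOf<int>() == 32, "doubleword");
static_assert(BitsOf<long long>() == 64, "quadword");
```

The 80-, 128-, and 256-bit types have no fundamental C++ counterpart, so they are left out of the checks.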


Calculate Sum

; Produce code for the flat memory model.
; Use C-style names for public symbols.
    .model flat,c
; Memory block that contains executable code.
    .code
; The beginning of the procedure.
AsmCalcSum PROC
; Function Prolog:
    push ebp                    ; save the EBP register on the stack
    mov ebp,esp                 ; copy ESP to EBP i.e., set EBP as the stack frame pointer
                                ; this enables access to the function's arguments
; ESP always points to the stack's top-most item
; EBP is used as a base pointer to access data on the stack
; Stack:
; High Memory ...
;             [ c ] = [ebp + 16]
;             [ b ] = [ebp + 12]
;             [ a ] = [ebp + 8]
;             [ret] = [ebp + 4] contains the return address
; Low Memory  [ebp] contains the old value of EBP, pushed on the stack by push ebp
;                   this is the stack's top-most item, pointed to by both ESP and EBP (because of mov ebp,esp)
; Load the argument values.
    mov eax,[ebp + 8]           ; a --> EAX
    mov ecx,[ebp + 12]          ; b --> ECX
    mov edx,[ebp + 16]          ; c --> EDX
; Calculate the sum.
    add eax,ecx                 ; eax += ecx
    add eax,edx                 ; eax += edx
; An x86-32 assembly function must use EAX to return a 32-bit
; integer to its calling function.
; Function Epilog:
    pop ebp                     ; restore the caller's stack frame pointer
    ret                         ; return to the caller; the sum is in EAX
; The end of the procedure.
AsmCalcSum ENDP
; The end of statements in the file.
    END

#include <stdio.h>
extern "C" int AsmCalcSum(int a, int b, int c);
void CalcSum()
{
    int a = 17, b = 11, c = 14;
    int sum = AsmCalcSum(a, b, c);
    printf("a = %d, b = %d, c = %d\n", a, b, c);
    printf("sum = %d\n\n", sum);
}
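
The stack layout described in the assembly comments can be traced without an assembler. The sketch below is a toy model, not real machine state: Stack32 and SimulateCalcSum are hypothetical names, the return address and saved EBP are placeholder values, and the arguments are assumed to be pushed right to left as in the C calling convention.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a small 32-bit stack; addresses are byte offsets.
struct Stack32 {
    std::vector<std::uint32_t> mem = std::vector<std::uint32_t>(64); // 256 bytes
    std::uint32_t esp = 256;  // empty stack: ESP sits just past the highest slot

    void push(std::uint32_t value) {
        assert(esp >= 4);     // guard against overflowing the model
        esp -= 4;             // the stack grows toward low memory
        mem[esp / 4] = value;
    }
    std::uint32_t load(std::uint32_t address) const { return mem[address / 4]; }
};

// Mirrors AsmCalcSum step by step on the model stack.
inline std::uint32_t SimulateCalcSum(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    Stack32 s;
    s.push(c);                     // caller pushes the arguments right to left
    s.push(b);
    s.push(a);
    s.push(0x00401000u);           // placeholder return address pushed by call
    s.push(0x0012FF00u);           // push ebp (placeholder caller frame pointer)
    std::uint32_t ebp = s.esp;     // mov ebp,esp

    std::uint32_t eax = s.load(ebp + 8);   // a
    std::uint32_t ecx = s.load(ebp + 12);  // b
    std::uint32_t edx = s.load(ebp + 16);  // c
    eax += ecx;                    // add eax,ecx
    eax += edx;                    // add eax,edx
    return eax;                    // the result is returned in EAX
}
```

With the values used in CalcSum, SimulateCalcSum(17, 11, 14) returns 42, the same sum the assembly routine leaves in EAX.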


Year | CPU | Microarchitecture | Features
1985 | 80386 | | 32-bit registers and data types; flat memory model option; 4 GB logical address space; paged virtual memory; separate 80387 FPU
1989 | 80486 | | on-chip memory caches; optimized instructions; integrated x87 FPU
1993 | Pentium | P5 | dual-instruction execution pipeline; 64-bit external data bus; separate on-chip code and data caches; MMX (added to later P5 models): SIMD on packed integers using 64-bit registers
1995 | Pentium Pro | P6 | three-way superscalar design; support for out-of-order instruction execution; improved branch-prediction algorithms; speculative instruction execution
1997 | Pentium II | P6 |
1999 | Pentium III | P6 | SSE SIMD extensions: 128-bit registers; packed single-precision floating-point arithmetic
2000 | Pentium 4 | Netburst | SSE2 SIMD extensions: packed double-precision (64-bit) values; 128-bit SSE registers can be used for packed integer calculations and scalar floating-point operations
2004 | Pentium 4 (90 nm and smaller) | Netburst | hyper-threading technology; SSE3 SIMD extensions: packed integer and packed floating-point instructions
2006 | Core 2 Duo, Core 2 Quad, Xeon 3000/5000 | Core | improved performance; reduced power consumption; no hyper-threading; SSSE3 and SSE4.1: new packed integer and floating-point instructions
2008 | Core i3, i5, i7 (1st generation), Xeon 7000 | Nehalem | reintroduction of hyper-threading; new accelerator instructions; instructions for text-string processing
2011 | Core i3, i5, i7 (2nd and 3rd generations), Xeon E3/E5/E7 | Sandy Bridge | new SIMD technology AVX: 256-bit registers for packed floating-point operations (single- and double-precision); three-operand instruction syntax
2013 | Core i3, i5, i7 (4th generation), Xeon E3 (v3) | Haswell | FMA (fused multiply-add) operations; AVX2: 256-bit registers for packed integer operations; enhanced data transfer capabilities: broadcast, gather, permute instructions
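
The SSE2 entry above mentions packed integer calculations in 128-bit registers. A minimal sketch of that idea follows, assuming an x86-64 compiler (or a 32-bit one with SSE2 enabled) that provides the emmintrin.h header; the function name AddPacked4 is made up for illustration.

```cpp
#include <array>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Adds four 32-bit integers with one packed addition.
// AddPacked4 is an illustrative name, not part of any library.
inline std::array<std::int32_t, 4> AddPacked4(const std::array<std::int32_t, 4>& x,
                                              const std::array<std::int32_t, 4>& y) {
    // Unaligned loads: std::array<int32_t, 4> is not guaranteed 16-byte aligned.
    __m128i vx = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x.data()));
    __m128i vy = _mm_loadu_si128(reinterpret_cast<const __m128i*>(y.data()));
    __m128i vsum = _mm_add_epi32(vx, vy);  // four lane-wise 32-bit additions
    std::array<std::int32_t, 4> out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), vsum);
    return out;
}
```

Adding {1, 2, 3, 4} and {10, 20, 30, 40} this way yields {11, 22, 33, 44}; each lane is computed independently by the same instruction.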


  • Microarchitecture defines the organization of a processor's internal components, including registers, execution units, instruction pipelines, data buses, and memory caches.
  • In a three-way superscalar design, the processor can decode, dispatch, and execute up to three distinct instructions during each clock cycle.
  • The address of a properly aligned data item is evenly divisible by its size in bytes.
  • SIMD stands for Single Instruction, Multiple Data.
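
The alignment rule in the list above reduces to a single remainder test. The following is a small sketch, with IsAligned as a hypothetical helper name; it treats the pointer value as an unsigned integer, which works on flat-memory x86 targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when the address is a multiple of size (in bytes).
// IsAligned is an illustrative helper, not a standard function.
inline bool IsAligned(const void* ptr, std::size_t size) {
    assert(size != 0);  // a zero size has no meaningful alignment
    return reinterpret_cast<std::uintptr_t>(ptr) % size == 0;
}
```

For example, given a buffer declared with alignas(16), the buffer's start passes IsAligned(buf, 16), while buf + 1 fails a 4-byte check because its address leaves a remainder of 1.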
notes/assembly.txt · Last modified: 2020/08/26 (external edit)