Useful Links
- x86 instruction reference: Link
- [PDF] Intel reference manual: Link to full manual
- intel_volume_1.pdf: Describes processors’ architecture and programming environment supporting IA-32 and Intel® 64 architectures.
- intel_volume_2.pdf: This document contains the full instruction set reference, A-Z, in one volume. It describes the format of the instructions and provides reference pages for them. A functional cross-volume table of contents, references, and index allow for easy navigation of the instruction set reference.
- Volume 3 and Volume 4 are not needed for this exam.
- NASM documentation: Link
- SystemV ABI: sysv_abi.pdf
- C Convention (Spanish, from a teacher): convencion_c.pdf
- GDB cheatsheet (Spanish, from a teacher): Link
- Manejo de la pila en Intel (Stack handling in Intel) (Spanish, from a teacher): manejo_pila_intel.pdf
Arguments and register volatility in 64 bits
Non-volatile registers (callee-saved: if you use them, restore them before returning)
RBX, RBP, R12, R13, R14, R15
Return values
RAX ; integers, pointers
XMM0 ; floats
Arguments
RDI, RSI, RDX, RCX, R8, R9 ; integers and pointers, in order
XMM0, XMM1, ..., XMM7 ; floats, in order
PUSH <...> ; remaining arguments, pushed right to left, when no more registers are available
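A small C sketch of where each integer/pointer argument ends up, as seen from inside the callee. It assumes every argument is an integer or pointer (floats go to XMM0..XMM7 separately) and that the callee has already run the PUSH rbp / MOV rbp, rsp prologue; `int_arg_location` is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Integer/pointer argument registers, in order (System V AMD64). */
static const char *const INT_ARG_REGS[6] = {"RDI", "RSI", "RDX", "RCX", "R8", "R9"};

/* Hypothetical helper: writes where the i-th (0-based) integer/pointer
 * argument is found. Assumes PUSH rbp / MOV rbp, rsp already ran, so
 * [rbp+8] holds the return address and the first stack argument is at
 * [rbp+16], with 8 bytes per stack argument. */
void int_arg_location(int i, char *buf, size_t len) {
    if (i < 6)
        snprintf(buf, len, "%s", INT_ARG_REGS[i]);
    else
        snprintf(buf, len, "[rbp+%d]", 16 + 8 * (i - 6));
}
```

So the 7th integer argument is read with `MOV rax, [rbp+16]`, the 8th with `[rbp+24]`, and so on.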
Stack
Every PUSH/SUB needs its matching POP/ADD.
When calling a function, make sure the stack is aligned to 16 bytes (RSP % 16 == 0 right before the CALL).
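The alignment rule comes down to a bit of arithmetic. A sketch in C, assuming the usual SysV entry state (RSP % 16 == 8 right after the CALL, because of the 8-byte return address) and that every PUSH moves RSP by 8; `sub_amount` is a hypothetical helper. If you also PUSH stack arguments before a CALL, count them in `pushes` too.

```c
#include <assert.h>

/* Hypothetical helper: how many bytes to SUB from RSP so that RSP is a
 * multiple of 16 right before the next CALL.
 * pushes: 8-byte PUSHes done since entry (PUSH rbp included).
 * locals: bytes actually needed for local variables. */
unsigned sub_amount(unsigned pushes, unsigned locals) {
    unsigned used = 8 + 8 * pushes + locals;  /* 8 = return address */
    unsigned pad = (16 - used % 16) % 16;     /* round up to a multiple of 16 */
    return locals + pad;
}
```

For example, with only PUSH rbp and no locals it returns 0, and with PUSH rbp plus 8 bytes of locals it returns 16.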
Prologue and epilogue
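function_definition:
PUSH rbp ; save the caller's frame pointer (with an aligned caller, RSP is now a multiple of 16)
MOV rbp, rsp ; RBP now marks the base of this function's stack frame
; CODE
POP rbp ; restore the caller's frame pointer
RET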
Register names
Common functions
strcmp
int strcmp (const char* str1, const char* str2);
Parameters
- str1 - a string
- str2 - a string
Returns
- 0 if equal
- >0 if the first non-matching character in str1 is greater than that of str2
- <0 if the first non-matching character in str1 is lower than that of str2
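A possible C model of strcmp, handy as a reference before writing it in assembly. `my_strcmp` is a hypothetical name; like the standard function, it compares bytes as unsigned char.

```c
#include <assert.h>

/* Hypothetical re-implementation of strcmp. */
int my_strcmp(const char *str1, const char *str2) {
    const unsigned char *a = (const unsigned char *)str1;
    const unsigned char *b = (const unsigned char *)str2;
    /* Advance while both match and str1 has not ended. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* First mismatch (or both terminators): its sign is the result. */
    return (int)*a - (int)*b;
}
```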
strcpy
char *strcpy(char *dest, const char *src)
Parameters
- dest − This is the pointer to the destination array where the content is to be copied.
- src − This is the string to be copied.
Returns
- This returns a pointer to the destination string dest.
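The same idea for strcpy, as a C sketch (hypothetical name `my_strcpy`): copy bytes until the \0 has been copied too, then return the original dest.

```c
#include <assert.h>

/* Hypothetical re-implementation of strcpy. */
char *my_strcpy(char *dest, const char *src) {
    char *d = dest;
    /* Copy each byte; the loop stops after copying the '\0'. */
    while ((*d++ = *src++) != '\0') {
    }
    return dest;  /* strcpy returns its first argument */
}
```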
strlen
size_t strlen(const char *str)
Parameters
- str − This is the string whose length is to be found.
Returns
- Length of the string: the number of characters before the terminating \0 (which is not counted)
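A C sketch of strlen (hypothetical name `my_strlen`), which is just a counter that stops at the terminator:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical re-implementation of strlen. */
size_t my_strlen(const char *str) {
    size_t n = 0;
    while (str[n] != '\0')  /* the '\0' itself is not counted */
        n++;
    return n;
}
```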
malloc
void *malloc(size_t size)
Parameters
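- size − This is the size of the memory block, in bytes.
Returns
- This function returns a pointer to the allocated (uninitialized) memory, or NULL if the request fails.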
free
void free(void *ptr)
Parameters
- ptr − This is the pointer to a memory block previously allocated with malloc, calloc or realloc to be deallocated. If a null pointer is passed as argument, no action occurs.
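malloc and free often appear together with strlen and a copy when a string has to be duplicated. A sketch of a hypothetical `dup_string` helper, similar in spirit to POSIX strdup:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap copy of src. The caller must free() it. */
char *dup_string(const char *src) {
    size_t n = strlen(src) + 1;  /* +1 for the terminating '\0' */
    char *copy = malloc(n);
    if (copy == NULL)            /* malloc returns NULL on failure */
        return NULL;
    memcpy(copy, src, n);
    return copy;
}
```

Every successful call needs exactly one matching free(copy).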
calloc
void *calloc(size_t nitems, size_t size)
Parameters
- nitems − This is the number of elements to be allocated.
- size − This is the size of each element, in bytes.
Returns
- This function returns a pointer to the allocated memory, with every byte set to zero, or NULL if the request fails.
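A sketch of the typical calloc use (hypothetical helper `new_counters`): an array that must start zeroed, which malloc alone would not guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: n 32-bit counters, all starting at 0. */
int32_t *new_counters(size_t n) {
    /* calloc takes the element count and element size separately
     * and zero-fills the whole block. */
    int32_t *c = calloc(n, sizeof *c);
    return c;  /* NULL if the request fails */
}
```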
memcpy
void *memcpy(void *dest, const void * src, size_t n)
Parameters
- dest − This is a pointer to the destination array where the content is to be copied, type-cast to a pointer of type void*.
- src − This is a pointer to the source of data to be copied, type-cast to a pointer of type void*.
- n − This is the number of bytes to be copied.
Returns
- A pointer to dest. The two regions must not overlap (use memmove if they might).
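A byte-by-byte C model of memcpy (hypothetical name `my_memcpy`). Like memcpy, it assumes the regions do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical re-implementation of memcpy. */
void *my_memcpy(void *dest, const void *src, size_t n) {
    unsigned char *d = dest;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)  /* front to back, one byte at a time */
        d[i] = s[i];
    return dest;                    /* memcpy returns dest */
}
```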
Struct packing and padding
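A sketch of the padding rules, assuming x86-64 System V (4-byte int, 8-byte pointers) and GCC/Clang for `__attribute__((packed))`, which is a compiler extension. Each field goes at an offset that is a multiple of its alignment, and the total size is rounded up to the largest alignment in the struct. The struct names are made up.

```c
#include <assert.h>
#include <stddef.h>

struct padded {
    char a;      /* offset 0, then 3 bytes of padding */
    int  b;      /* offset 4 */
    char c;      /* offset 8, then 3 bytes of tail padding */
};               /* sizeof == 12 */

struct with_ptr {
    char  tag;   /* offset 0, then 7 bytes of padding */
    char *name;  /* offset 8 */
    short len;   /* offset 16, then 6 bytes of tail padding */
};               /* sizeof == 24 */

/* GCC/Clang extension: no padding at all. */
struct __attribute__((packed)) packed_s {
    char a;      /* offset 0 */
    int  b;      /* offset 1 (unaligned) */
    char c;      /* offset 5 */
};               /* sizeof == 6 */
```

In assembly these offsets are what you add to the struct pointer, e.g. `MOV eax, [rdi + 4]` to read `b` from a `struct padded *`.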
Shuffle mask
Given an xmm* register holding these example bytes (byte 0 is the rightmost one; bits go from 127 on the left to 0 on the right):
| 0xF0 | 0xE0 | 0xD0 | 0xC0 | 0xB0 | 0xA0 | 0x90 | 0x80 | 0x70 | 0x60 | 0x50 | 0x40 | 0x30 | 0x20 | 0x10 | 0x00 |
Mask example for pshufb that zeroes every byte:
; reads in this order
; ------>
; so the 0x00 position is zeroed by this ─┐
;                ┌────────────────────────┘
;                |
;                v
mask_example: db 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
;                            ∧
;                            │
;                            └───────────────┐
; and the third one zeroes the 0x20 position ┘
ASCII diagram as Image: mask_example.png
Otherwise (bit 7 of the mask byte is clear), the result byte gets the source byte whose index is given by the low 4 bits (the lower half) of the mask byte:
; reads in this order
; ------>
mask_example: db 0x00, 0x80, 0x01, 0x80, 0x02, 0x80, 0x03, 0x80, 0x04, 0x80, 0x05, 0x80, 0x06, 0x80, 0x07, 0x80
With that mask, the register changes to:
| 0xF0 | 0xE0 | 0xD0 | 0xC0 | 0xB0 | 0xA0 | 0x90 | 0x80 | 0x70 | 0x60 | 0x50 | 0x40 | 0x30 | 0x20 | 0x10 | 0x00 |
   │                                                │      │      │      │      │      │      │      │      │
   └───────────────These are not used───────────────┘      │      │      │      │      │      │      │      │
          ┌────────────────────────────────────────────────┘      │      │      │      │      │      │      │
          │             ┌─────────────────────────────────────────┘      │      │      │      │      │      │
          │             │             ┌──────────────────────────────────┘      │      │      │      │      │
          │             │             │             ┌───────────────────────────┘      │      │      │      │
          │             │             │             │             ┌────────────────────┘      │      │      │
          │             │             │             │             │             ┌─────────────┘      │      │
          │             │             │             │             │             │             ┌──────┘      │
          ▼             ▼             ▼             ▼             ▼             ▼             ▼             ▼
| 0..0 | 0x07 | 0..0 | 0x06 | 0..0 | 0x05 | 0..0 | 0x04 | 0..0 | 0x03 | 0..0 | 0x02 | 0..0 | 0x01 | 0..0 | 0x00 |
   │             │             │             │             │             │             │             │
   └─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘
                           These are zero because in the mask they were 0x80
Where the 0..0 cells are literal zeros, and each 0x0N cell holds the value that was in byte N of the original register (e.g. 0x07 holds 0x70).
ASCII diagram as Image: register_after_mask_pshufb.png
Reminders
- Use 0x80 to put a 0 (any mask byte with bit[7], the most significant bit, set works).
- Use 0x0N (N from 0x0 to 0xF) to put the value of source byte N into that position of the result.
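A scalar C model of what pshufb does with the mask, assuming the 128-bit SSE form (16 bytes, with only the low 4 bits of each mask byte used as an index). `pshufb_model` is a hypothetical name; in real code you use the instruction itself.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PSHUFB. Index 0 is the lowest byte, i.e. the first
 * byte written with db. For each result byte i:
 *   bit 7 of mask[i] set -> 0
 *   otherwise            -> src[mask[i] & 0x0F] */
void pshufb_model(uint8_t dst[16], const uint8_t src[16], const uint8_t mask[16]) {
    uint8_t tmp[16];  /* dst may be the same buffer as src */
    for (int i = 0; i < 16; i++)
        tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    for (int i = 0; i < 16; i++)
        dst[i] = tmp[i];
}
```

With the second mask (0x00, 0x80, 0x01, 0x80, ...), result byte 2k gets source byte k and every odd byte becomes 0, which is the layout in the diagram.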
Blend mask
Similar to shuffle: if the bit for a position in the mask is 1, the value in dest gets replaced by the one in that position in src; otherwise it stays unchanged. The pseudocode below is for PBLENDW (8 words, one imm8 bit per word).
// Intel manual pseudocode
IF (imm8[0] = 1) THEN DEST[15:0] := SRC[15:0]
ELSE DEST[15:0] := DEST[15:0]
IF (imm8[1] = 1) THEN DEST[31:16] := SRC[31:16]
ELSE DEST[31:16] := DEST[31:16]
IF (imm8[2] = 1) THEN DEST[47:32] := SRC[47:32]
ELSE DEST[47:32] := DEST[47:32]
IF (imm8[3] = 1) THEN DEST[63:48] := SRC[63:48]
ELSE DEST[63:48] := DEST[63:48]
IF (imm8[4] = 1) THEN DEST[79:64] := SRC[79:64]
ELSE DEST[79:64] := DEST[79:64]
IF (imm8[5] = 1) THEN DEST[95:80] := SRC[95:80]
ELSE DEST[95:80] := DEST[95:80]
IF (imm8[6] = 1) THEN DEST[111:96] := SRC[111:96]
ELSE DEST[111:96] := DEST[111:96]
IF (imm8[7] = 1) THEN DEST[127:112] := SRC[127:112]
ELSE DEST[127:112] := DEST[127:112]
The mask in blend is imm8, so for example 01010101b (read from right to left: bit 0 controls word 0) keeps the odd-numbered words from dest and replaces the even-numbered words (0, 2, 4, 6) with those from src.
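The Intel pseudocode above, written as a scalar C model (hypothetical name `pblendw_model`; 8 words per register, word 0 is the lowest):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PBLENDW: bit i of imm8 set -> dest word i = src word i,
 * bit i clear -> dest word i is unchanged. */
void pblendw_model(uint16_t dest[8], const uint16_t src[8], uint8_t imm8) {
    for (int i = 0; i < 8; i++)
        if ((imm8 >> i) & 1)
            dest[i] = src[i];
}
```

With imm8 = 0x55 (01010101b), words 0, 2, 4, 6 come from src and words 1, 3, 5, 7 keep their dest value.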
NASM Size specifiers
- BYTE (1 byte) | WORD (2) | DWORD (4) | QWORD (8) | TWORD (10) | OWORD (16) | YWORD (32) | ZWORD (64) (Source)
Solved problems
- “Mirá que coincidencia” (“What a coincidence”; greyscale pixels, blend, float to int, int to float, pack dword to word, word to byte): https://godbolt.org/z/Exaznxb78
Compare pixels
(Tags: pixels, pcmpgtb)
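Because this problem is tagged pcmpgtb, here is a scalar C model of that instruction. It compares bytes as signed values, which matters for pixels in 0..255. The XOR 0x80 bias below is one common workaround, not necessarily the solution the problem intends; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PCMPGTB: signed byte compare, each result byte is
 * 0xFF (a > b) or 0x00 (otherwise). */
void pcmpgtb_model(uint8_t dst[16], const uint8_t a[16], const uint8_t b[16]) {
    for (int i = 0; i < 16; i++)
        dst[i] = ((int8_t)a[i] > (int8_t)b[i]) ? 0xFF : 0x00;
}

/* Unsigned compare built on the signed one: flipping bit 7 of both
 * operands (PXOR with 0x80 bytes) maps 0..255 onto -128..127 in the
 * same order, so the signed compare gives the unsigned answer. */
void unsigned_gt_model(uint8_t dst[16], const uint8_t a[16], const uint8_t b[16]) {
    uint8_t ta[16], tb[16];
    for (int i = 0; i < 16; i++) {
        ta[i] = a[i] ^ 0x80;
        tb[i] = b[i] ^ 0x80;
    }
    pcmpgtb_model(dst, ta, tb);
}
```

For example, pixel 200 vs 100: the plain signed compare sees -56 > 100 and answers 0x00, while the biased version answers 0xFF.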