1 gdb float
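#include <iostream>

int main()
{
    float f = 8.5;
    std::cout << f << '\n';
    return 0;
}
This is a little-endian example: gdb prints the four bytes of f starting at the lowest address, so the least significant byte comes first.
(gdb) p f
$1 = 8.5
(gdb) x/4tb &f
0x7fffffffc22c: 00000000 00000000 00001000 01000001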
Python
# split by space and reverse
''.join('00000000 00000000 00001000 01000001'.split()[::-1])
1.1 float 8.5 IEEE
8.5 (https://resources.nerdfirst.net/float)
Final Bit String: 0 10000010 0001000 00000000 00000000
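A minimal sketch to cross-check the bit string above against the gdb dump, using only the standard struct module. The helper names float32_bits and gdb_bytes_to_bits are made up for these notes; they are not from gdb or any library.

```python
import struct

def float32_bits(value: float) -> str:
    """Return the IEEE 754 single-precision pattern as 'sign exponent mantissa'.

    Hypothetical helper name, for illustration only.
    """
    # '>f' packs as a big-endian float32, so the most significant byte is first.
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    bits = f"{raw:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"

def gdb_bytes_to_bits(dump: str) -> str:
    """Reverse the bytes of a little-endian `x/4tb` dump and join them."""
    return "".join(dump.split()[::-1])

print(float32_bits(8.5))
print(gdb_bytes_to_bits("00000000 00000000 00001000 01000001"))
```

Both lines print the same 32 bits; the first one just has the sign, exponent, and mantissa fields separated by spaces.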
1.2 little or big endian
Intel x86 processors are little-endian machines.
int x = 0x76543210;
char *c = (char*) &x;
If you declare a pointer c of type char * and assign it x's address by casting &x to a char pointer, then on a little-endian architecture printing *c gives 0x10, and on a big-endian architecture it gives 0x76.
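The same pointer-cast experiment can be tried from Python with ctypes, as a rough sketch. The function name first_byte_of is made up here; the result depends on the machine running it, which is the point of the test.

```python
import ctypes
import sys

def first_byte_of(value: int) -> int:
    """Return the byte stored at the lowest address of a C int.

    Hypothetical helper, mirroring `char *c = (char*) &x; *c`.
    """
    x = ctypes.c_int(value)
    # View the same memory through a byte pointer, like the C cast.
    c = ctypes.cast(ctypes.pointer(x), ctypes.POINTER(ctypes.c_ubyte))
    return c[0]

first_byte = first_byte_of(0x76543210)
print(sys.byteorder, hex(first_byte))
```

On a little-endian host this should report 0x10, and on a big-endian host 0x76; sys.byteorder tells you which one you are on.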
1.3 verilog
input [7:0] address;
Note that [7:0] means we're using the little-endian convention:
bit 0 is the rightmost bit of the vector, and the indices increase to the left.
http://www.asic-world.com/verilog/verilog_one_day1.html
—
Is there also a big-endian / little-endian distinction inside a single byte stored by the CPU?
For example, 0xB4 (10110100):
lsb                           msb
-------------------------------->
+---+---+---+---+---+---+---+---+
| 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
+---+---+---+---+---+---+---+---+
msb                           lsb
-------------------------------->
+---+---+---+---+---+---+---+---+
| 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
+---+---+---+---+---+---+---+---+
But the smallest unit the CPU operates on is one byte, so the bit order inside a byte is a black box as far as application programs are concerned.
For example, 0x12345678:
Little Endian
low address -----------> high address
+----+----+----+----+
| 78 | 56 | 34 | 12 |
+----+----+----+----+
Big Endian
low address -----------> high address
+----+----+----+----+
| 12 | 34 | 56 | 78 |
+----+----+----+----+
Big endian means the lowest address holds the most significant byte (MSB).
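The two layouts of 0x12345678 can be reproduced with int.to_bytes, as a small sketch. The dump helper is a made-up name for these notes. Index 0 of each bytes object plays the role of the lowest address.

```python
value = 0x12345678

# Index 0 corresponds to the lowest address in the diagrams.
little = value.to_bytes(4, byteorder="little")
big = value.to_bytes(4, byteorder="big")

def dump(layout: bytes) -> str:
    """Format bytes in address order (hypothetical helper)."""
    return " ".join(f"{b:02x}" for b in layout)

print("little:", dump(little))
print("big:   ", dump(big))

# Reading the bytes back with the matching byte order recovers the value.
assert int.from_bytes(little, "little") == value
assert int.from_bytes(big, "big") == value
```

Each individual byte (0x78, 0x56, ...) has the same value in both layouts; only the order of the bytes changes, which matches the "black box" remark about bit order.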
1.4 float IEEE 754
equation https://youtu.be/gc1Nl3mmCuY?t=315
exponent: unsigned number (offset 127) https://youtu.be/gc1Nl3mmCuY?t=450
https://www.youtube.com/watch?v=gc1Nl3mmCuY
https://www.youtube.com/watch?v=b2FgF2sUoS8
https://resources.nerdfirst.net/float simulator
#self-note
denormal float->bit https://youtu.be/b2FgF2sUoS8?t=507
denormal float->bit https://youtu.be/b2FgF2sUoS8?t=540
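A sketch of the decoding direction (bits to value) for single precision, written from the standard IEEE 754 formula rather than taken from the videos: value = (-1)^sign x 1.fraction x 2^(exponent - 127), with the denormal case (exponent field all zeros) using no implicit leading 1 and a fixed exponent of -126. The function name decode_float32 is made up for these notes.

```python
def decode_float32(bits: str) -> float:
    """Decode a 32-bit IEEE 754 pattern given as a string of 0/1 (spaces allowed).

    Hypothetical helper, for illustration only.
    """
    bits = bits.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    sign = -1.0 if bits[0] == "1" else 1.0
    exponent = int(bits[1:9], 2)          # unsigned, offset (bias) 127
    fraction = int(bits[9:], 2) / 2**23   # the 23 mantissa bits as 0.xxx
    if exponent == 255:                   # all ones: infinity or NaN
        return sign * float("inf") if fraction == 0 else float("nan")
    if exponent == 0:                     # denormal (also covers zero)
        return sign * fraction * 2.0**-126
    return sign * (1.0 + fraction) * 2.0 ** (exponent - 127)

print(decode_float32("0 10000010 0001000 00000000 00000000"))
```

For the 8.5 bit string: the exponent field is 130, so the scale is 2^3, and the fraction is 0.0625, giving 1.0625 x 8.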
1.5 Two's complement
https://www.cnblogs.com/zhangziqiu/archive/2011/03/30/computercode.html
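In a two's-complement system, negating an integer x is the same as applying NOT to it and then adding 1. (-x ≡ ~x + 1)
Sign-magnitude (sm), ones' complement (1c), two's complement (2c)
Sign-magnitude
Sign-magnitude is the sign bit plus the absolute value of the true value: the first bit represents the sign and the remaining bits represent the magnitude. For example, with 8-bit binary:
[+1]sm = 0000 0001
[-1]sm = 1000 0001
The first bit is the sign bit. Because of that, the range of an 8-bit binary number is:
[1111 1111 , 0111 1111]
that is,
[-127 , 127]
Ones' complement
The ones' complement of a positive number is the number itself. The ones' complement of a negative number is its sign-magnitude form with the sign bit unchanged and every other bit inverted.
[+1] = [00000001]sm = [00000001]1c
[-1] = [10000001]sm = [11111110]1c
When a ones'-complement pattern represents a negative number, its value cannot be read off intuitively. It is usually converted back to sign-magnitude first.
Two's complement
The two's complement of a positive number is the number itself.
The two's complement of a negative number is its sign-magnitude form with the sign bit unchanged, every other bit inverted, and then 1 added (that is, the ones' complement plus 1).
[+1] = [00000001]sm = [00000001]1c = [00000001]2c
[-1] = [10000001]sm = [11111110]1c = [11111111]2c
For negative numbers, a two's-complement pattern cannot be read off intuitively either. It usually also has to be converted back to sign-magnitude to work out its value.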
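The three encodings can be sketched in a few lines for 8-bit values. The function names are made up for these notes, and the sketch assumes x is in the range [-127, 127] so that it fits all three representations.

```python
def sign_magnitude(x: int, width: int = 8) -> str:
    """Sign bit followed by the absolute value (assumes |x| < 2**(width-1))."""
    sign = "1" if x < 0 else "0"
    return sign + format(abs(x), f"0{width - 1}b")

def ones_complement(x: int, width: int = 8) -> str:
    """Positive: unchanged. Negative: invert every bit of |x|."""
    mask = (1 << width) - 1
    if x >= 0:
        return format(x, f"0{width}b")
    return format(~abs(x) & mask, f"0{width}b")

def twos_complement(x: int, width: int = 8) -> str:
    """Positive: unchanged. Negative: ones' complement plus 1."""
    mask = (1 << width) - 1
    return format(x & mask, f"0{width}b")

for n in (1, -1):
    print(n, sign_magnitude(n), ones_complement(n), twos_complement(n))

# -x == ~x + 1, checked on 8-bit patterns
assert all(((~x + 1) & 0xFF) == (-x & 0xFF) for x in range(-127, 128))
```

Inverting all bits of |x| gives the same pattern as keeping the sign bit at 1 and inverting the rest, because |x| always has a 0 in the top bit within this range.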
1.6 stack
Journey to the Stack, Part I
https://www.zcfy.cc/article/journey-to-the-stack-part-i
https://manybutfinite.com/post/journey-to-the-stack/
2 disassemble
The instruction pointer is a register that stores the memory address of the next instruction.
On x86-64, that register is %rip.
We can access the instruction pointer using the $rip variable, or alternatively we can use the architecture-independent $pc.
(gdb) x/i $pc
0x100000e94 <natural_generator+4>: movl $0x1,-0x4(%rbp)
The instruction pointer always contains the address of the next instruction to be run.
In GDB 7.0 or later, you can just run set disassemble-next-line on, which shows all the instructions that make up the next line of source
set disassemble-next-line on
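Setting a breakpoint on the first assembly instruction of a function https://wizardforcel.gitbooks.io/100-gdb-tips/break-on-first-assembly-code.html
The usual command for putting a breakpoint on a function, "b func" (b is short for break), does not place the breakpoint at the function's first instruction at the assembly level; it lands after the function prologue.
(gdb) disassemble
To place the breakpoint at the function's first assembly instruction, use "b *func":
(gdb) b *func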
—
The command "disas /m fun" (disas is short for disassemble) maps the function's source code to its assembly instructions.
To see only the address range that corresponds to a particular line:
(gdb) i line 13
Line 13 of "foo.c" starts at address 0x4004e9 <main+37> and ends at 0x40050c <main+72>.
disassemble [Start],[End]
(gdb) disassemble 0x4004e9, 0x40050c
—
Display the assembly instruction(s) about to be executed
display /i $pc
display /3i $pc
(gdb) i registers
(gdb) info registers
The output above does not include the floating-point and vector registers.
(gdb) i all-registers
C++ disassembly notes https://bot-man-jl.github.io/articles/?post=2019/Cpp-Disassembly-Notes
3 cpp weekly assembly
3.1 C++ Weekly - Ep 34 - Reading Assembly Language - Part 1 (intel syntax, AT&T syntax)
https://www.youtube.com/watch?v=my39Gpt6bvY
Compiler Explorer https://godbolt.org/
int main() { }
No optimization, Intel assembly syntax: the destination is on the left (rbp) and the source is on the right-hand side (rsp).
main:
    push rbp
    mov rbp, rsp
    mov eax, 0
    pop rbp
    ret
4:00 return 0; return 5;
xor eax, eax
There is also AT&T syntax. My understanding is that AT&T syntax is the older syntax, and it is more or less readable depending on who you talk to.
main:
    xorl %eax, %eax
    ret
int main() { return 5; }
AT&T syntax
main:
    movl $5, %eax
    ret
Intel syntax
main:
    mov eax, 5
    ret
In a way I prefer AT&T syntax. I think it is a little more readable, since left-to-right movement makes more sense to me, but a lot of the world is moving towards Intel syntax.
3.2 C++ Weekly - Ep 35 - Reading Assembly Language - Part 2
https://www.youtube.com/watch?v=R3HZJ1h2BVY
I have been focusing specifically on 64-bit Intel-compatible assembly, but these concepts should apply to most other things.
int main(int argc, const char* []) { return 5; }
main:
    push rbp
    mov rbp, rsp
    mov DWORD PTR [rbp-4], edi
    mov QWORD PTR [rbp-16], rsi
    mov eax, 5
    pop rbp
    ret
return 5 + 3*argc
-O1 3:06
main:
    lea eax, [rdi+5+rdi*2]
    ret
lea normally indexes a memory location (address arithmetic); here it is used just to do the math
rdi - argc, 4 bytes to do the math
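A quick sanity check that the lea expression really equals the source expression, modelled in Python. Both function names are made up for these notes, and the model assumes the result is simply truncated to the 32 bits of eax.

```python
MASK32 = 0xFFFFFFFF

def lea_result(rdi: int) -> int:
    """Model of `lea eax, [rdi+5+rdi*2]`: base + displacement + index*scale,
    truncated to 32 bits (assumption of this sketch)."""
    return (rdi + 5 + rdi * 2) & MASK32

def source_result(argc: int) -> int:
    """The C++ expression `5 + 3*argc`, truncated the same way."""
    return (5 + 3 * argc) & MASK32

for argc in (0, 1, 2, 10):
    print(argc, lea_result(argc), source_result(argc))
```

rdi + rdi*2 is just 3*rdi, so one lea replaces a multiply and an add.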