[pwnable.kr] memcpy - RDTSC, instruction alignment - malloc chunk
Published: 2019-05-26


memcpy - 10 pt

Are you tired of hacking?, take some rest here.
Just help me out with my small experiment regarding memcpy performance.
after that, flag is yours.

http://pwnable.kr/bin/memcpy.c

ssh memcpy@pwnable.kr -p2222 (pw:guest)

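Judging from the description, this level is about memcpy performance. No binary is provided; after logging in, only the readme and the source code are visible.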

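The readme is shown below. It says the binary can only be read with memcpy_pwn privilege. Looking up the process behind port 9022 with netstat or lsof does not reveal it either.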

the compiled binary of "memcpy.c" source code (with real flag) will be executed under memcpy_pwn privilege if you connect to port 9022.
execute the binary by connecting to daemon(nc 0 9022).

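So let's go straight to the source, shown below. It allocates memory with mmap, calls slow_memcpy to copy byte by byte, and calls fast_memcpy to copy in 64-byte blocks. It measures the time each function takes, and prints the flag once it runs to the end.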

// compiled with : gcc -o memcpy memcpy.c -m32 -lm
// (-lm links libm, which provides pow [1])
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/mman.h>
#include <math.h>

unsigned long long rdtsc(){
	asm("rdtsc");
}

char* slow_memcpy(char* dest, const char* src, size_t len){
	int i;
	for (i=0; i<len; i++) {
		dest[i] = src[i];
	}
	return dest;
}

char* fast_memcpy(char* dest, const char* src, size_t len){
	size_t i;
	// 64-byte block fast copy
	if(len >= 64){
		i = len / 64;
		len &= (64-1);
		while(i-- > 0){
			__asm__ __volatile__ (
			"movdqa (%0), %%xmm0\n"
			"movdqa 16(%0), %%xmm1\n"
			"movdqa 32(%0), %%xmm2\n"
			"movdqa 48(%0), %%xmm3\n"
			"movntps %%xmm0, (%1)\n"
			"movntps %%xmm1, 16(%1)\n"
			"movntps %%xmm2, 32(%1)\n"
			"movntps %%xmm3, 48(%1)\n"
			::"r"(src),"r"(dest):"memory");
			dest += 64;
			src += 64;
		}
	}

	// byte-to-byte slow copy
	if(len) slow_memcpy(dest, src, len);
	return dest;
}

int main(void){

	setvbuf(stdout, 0, _IONBF, 0);
	setvbuf(stdin, 0, _IOLBF, 0);

	printf("Hey, I have a boring assignment for CS class.. :(\n");
	printf("The assignment is simple.\n");

	printf("-----------------------------------------------------\n");
	printf("- What is the best implementation of memcpy?        -\n");
	printf("- 1. implement your own slow/fast version of memcpy -\n");
	printf("- 2. compare them with various size of data         -\n");
	printf("- 3. conclude your experiment and submit report     -\n");
	printf("-----------------------------------------------------\n");

	printf("This time, just help me out with my experiment and get flag\n");
	printf("No fancy hacking, I promise :D\n");

	unsigned long long t1, t2;
	int e;
	char* src;
	char* dest;
	unsigned int low, high;
	unsigned int size;

	// allocate memory
	char* cache1 = mmap(0, 0x4000, 7, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	char* cache2 = mmap(0, 0x4000, 7, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	src = mmap(0, 0x2000, 7, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

	size_t sizes[10];
	int i=0;

	// setup experiment parameters
	for(e=4; e<14; e++){	// 2^13 = 8K
		low = pow(2,e-1);
		high = pow(2,e);
		printf("specify the memcpy amount between %d ~ %d : ", low, high);
		scanf("%d", &size);
		if( size < low || size > high ){
			printf("don't mess with the experiment.\n");
			exit(0);
		}
		sizes[i++] = size;
	}

	sleep(1);
	printf("ok, lets run the experiment with your configuration\n");
	sleep(1);

	// run experiment
	for(i=0; i<10; i++){
		size = sizes[i];
		printf("experiment %d : memcpy with buffer size %d\n", i+1, size);
		dest = malloc( size );

		memcpy(cache1, cache2, 0x4000);		// to eliminate cache effect ???
		t1 = rdtsc();
		slow_memcpy(dest, src, size);		// byte-to-byte memcpy
		t2 = rdtsc();
		printf("ellapsed CPU cycles for slow_memcpy : %llu\n", t2-t1);

		memcpy(cache1, cache2, 0x4000);		// to eliminate cache effect ???
		t1 = rdtsc();
		fast_memcpy(dest, src, size);		// block-to-block memcpy
		t2 = rdtsc();
		printf("ellapsed CPU cycles for fast_memcpy : %llu\n", t2-t1);
		printf("\n");
	}

	printf("thanks for helping my experiment!\n");
	printf("flag : ----- erased in this source code -----\n");
	return 0;
}
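The way fast_memcpy divides its work can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the C source above (`len / 64` blocks, then `len & (64-1)` leftover bytes); the helper name `split_copy` is made up for this sketch and does not appear in the challenge.

```python
def split_copy(length):
    """Return (blocks, tail) the way fast_memcpy splits a copy.

    blocks: number of 64-byte blocks moved with movdqa/movntps
    tail:   leftover bytes handed to slow_memcpy
    """
    if length >= 64:
        return length // 64, length & (64 - 1)
    # Below 64 bytes the SSE path is skipped entirely.
    return 0, length


for n in (8, 32, 64, 72, 4104):
    blocks, tail = split_copy(n)
    print(n, "->", blocks, "block(s) +", tail, "byte(s)")
```

Sizes below 64 never reach the SSE instructions, which is why only the experiments from the fourth one onward depend on where the destination buffer sits.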

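Connecting to the service with nc and entering the default sizes shows that something goes wrong in the fifth experiment, during fast_memcpy.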

memcpy@ubuntu:~$ nc 0 9022
Hey, I have a boring assignment for CS class.. :(
The assignment is simple.
-----------------------------------------------------
- What is the best implementation of memcpy?        -
- 1. implement your own slow/fast version of memcpy -
- 2. compare them with various size of data         -
- 3. conclude your experiment and submit report     -
-----------------------------------------------------
This time, just help me out with my experiment and get flag
No fancy hacking, I promise :D
specify the memcpy amount between 8 ~ 16 : 8
specify the memcpy amount between 16 ~ 32 : 16
specify the memcpy amount between 32 ~ 64 : 32
specify the memcpy amount between 64 ~ 128 : 64
specify the memcpy amount between 128 ~ 256 : 128
specify the memcpy amount between 256 ~ 512 : 256
specify the memcpy amount between 512 ~ 1024 : 512
specify the memcpy amount between 1024 ~ 2048 : 1024
specify the memcpy amount between 2048 ~ 4096 : 2048
specify the memcpy amount between 4096 ~ 8192 : 4096
ok, lets run the experiment with your configuration
experiment 1 : memcpy with buffer size 8
ellapsed CPU cycles for slow_memcpy : 1551
ellapsed CPU cycles for fast_memcpy : 540

experiment 2 : memcpy with buffer size 16
ellapsed CPU cycles for slow_memcpy : 342
ellapsed CPU cycles for fast_memcpy : 441

experiment 3 : memcpy with buffer size 32
ellapsed CPU cycles for slow_memcpy : 534
ellapsed CPU cycles for fast_memcpy : 654

experiment 4 : memcpy with buffer size 64
ellapsed CPU cycles for slow_memcpy : 960
ellapsed CPU cycles for fast_memcpy : 135

experiment 5 : memcpy with buffer size 128
ellapsed CPU cycles for slow_memcpy : 1812
memcpy@ubuntu:~$

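Compiling the source locally and attaching gdb shows that the crash happens at the movntps [3] instruction. In fast_memcpy, movntps copies the source operand (an XMM register) to the destination operand (the memory edx points to). According to the instruction's description [3], its memory operand must be 16-byte aligned (IDA also shows that the operand is 128 bits, i.e. 16 bytes). That explains the crash: the destination 0x804c4a8 is not 16-byte aligned.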

The memory operand must be aligned on a 16-byte (128-bit version), 32-byte (VEX.256 encoded version) or 64-byte (EVEX.512 encoded version) boundary otherwise a general-protection exception (#GP) will be generated.
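As a quick sanity check of that rule, the low four bits of an address tell how far it is from a 16-byte boundary. The snippet below is a minimal sketch; `misalignment` is a name chosen for illustration only.

```python
def misalignment(addr, boundary=16):
    """Bytes past the previous `boundary`-aligned address (0 means aligned)."""
    return addr & (boundary - 1)


crash_dest = 0x804c4a8  # destination address seen at the crash
print(hex(crash_dest), "is off by", misalignment(crash_dest), "bytes")
```

Any non-zero result means movntps will raise #GP on that destination.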

The memory edx points to lives on the heap. glibc manages heap blocks with a data structure called malloc_chunk [4], which looks like this:

struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

/*
  An allocated chunk looks like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk, if allocated            | |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk, in bytes                       |M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             User data starts here...                          .
            .                                                               .
            .             (malloc_usable_size() bytes)                      .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk                                     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*/

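fast_memcpy only takes the movntps path when the size passed to malloc is >= 64. With size=64, malloc returns 0x804c460; with size=128, malloc returns 0x804c4a8, which is not a multiple of 16. So the size entered for the 64 step has to change, for example to 64+8=72, so that the following malloc(128) returns an address that is a multiple of 16, such as 0x804c4b0.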

pwndbg> x/600wx 0x804c400
0x804c400:	0x00000000	0x00000000	0x00000000	0x00000011 // 8
0x804c410:	0x00000000	0x00000000	0x00000000	0x00000019 // 16
0x804c420:	0x00000000	0x00000000	0x00000000	0x00000000
0x804c430:	0x00000000	0x00000029	0x00000000	0x00000000 // 32
0x804c440:	0x00000000	0x00000000	0x00000000	0x00000000
0x804c450:	0x00000000	0x00000000	0x00000000	0x00000049 // 64
0x804c460:	0x00000000	0x00000000	0x00000000	0x00000000
0x804c470:	0x00000000	0x00000000	0x00000000	0x00000000
0x804c480:	0x00000000	0x00000000	0x00000000	0x00000000
0x804c490:	0x00000000	0x00000000	0x00000000	0x00000000
0x804c4a0:	0x00000000	0x00000089	0x00000000	0x00000000 // 128
0x804c4b0:	0x00000000	0x00000000	0x00000000	0x00000000
0x804c4c0:	0x00000000	0x00000000	0x00000000	0x00000000
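The size words in this dump (0x11, 0x19, 0x29, 0x49, 0x89) can be reproduced with a small model of how glibc rounds a request up to a chunk size. The sketch below assumes a 32-bit glibc with a 4-byte size field, 8-byte malloc alignment and a 16-byte minimum chunk, which is what the dump suggests; newer builds may use different constants. The function names mirror glibc's `request2size` macro, but this is an illustrative re-implementation, not glibc code.

```python
SIZE_SZ = 4                          # assumed: sizeof(size_t) on i386
MALLOC_ALIGN_MASK = 2 * SIZE_SZ - 1  # assumed: 8-byte chunk alignment
MINSIZE = 16                         # assumed: smallest chunk on i386
PREV_INUSE = 0x1                     # low flag bit stored in the size word


def request2size(req):
    """Chunk size (header included) used for malloc(req)."""
    padded = req + SIZE_SZ + MALLOC_ALIGN_MASK
    if padded < MINSIZE:
        return MINSIZE
    return padded & ~MALLOC_ALIGN_MASK


def size_field(req):
    """Value found in the chunk's size word while the previous chunk is in use."""
    return request2size(req) | PREV_INUSE


for req in (8, 16, 32, 64, 128):
    print(req, hex(request2size(req)), hex(size_field(req)))
```

Under these assumptions a request of 64 occupies a 0x48-byte chunk, so the next user pointer lands 0x48 bytes after 0x804c460, at 0x804c4a8. A request of 72 occupies 0x50 bytes instead, which moves the next pointer to 0x804c4b0.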

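In essence, each chunk only needs to start at an address ending in 0x8: the pointer malloc returns is the chunk address plus the 8-byte chunk header, which then lands on a multiple of 0x10. Assuming the chunk for the 64~128 step starts at an address ending in 8, the following script computes the sizes to enter: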

#!/usr/bin/env python3
prev_size = 4                  # size of one header field on 32-bit
align_length = 2 * prev_size   # chunks are 8-byte aligned


def get_malloc_chunk_size(size):
    """Return the full chunk size (header + data) for a malloc(size) request."""
    data_size = 0
    if size < align_length + prev_size:  # < 2*4+4
        data_size = align_length
    else:
        if (size - (size // align_length) * align_length) > prev_size:
            data_size = align_length * (1 + size // align_length)
        else:
            data_size = align_length * (size // align_length)
    return 2 * prev_size + data_size  # chunk_header + data_size


def check_addr(addr):
    """True if the chunk address ends in 0x8, so the user pointer is 16-byte aligned."""
    return addr & 0x8 == 0x8


if __name__ == '__main__':
    addr = 8  # relative address of the chunk used for the 64~128 step
    for i in range(6, 13):  # 6 7 8 9 10 11 12
        print(pow(2, i), ' <= size < ', pow(2, i + 1))
        chunk_size = get_malloc_chunk_size(pow(2, i))
        addr = chunk_size + addr  # address of the next chunk
        if check_addr(addr):
            print('  Correct, input :', pow(2, i))
        else:
            # the default size misaligns the next chunk: request 8 more bytes
            print('  failed, input :', pow(2, i) + 8)
            addr += 8

Output:

$ python malloc_size.py
64  <= size <  128
  failed, input : 72
128  <= size <  256
  failed, input : 136
256  <= size <  512
  failed, input : 264
512  <= size <  1024
  failed, input : 520
1024  <= size <  2048
  failed, input : 1032
2048  <= size <  4096
  failed, input : 2056
4096  <= size <  8192
  failed, input : 4104

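After writing this script I noticed a trick: malloc chunks are 8-byte aligned, so if the chunk produced by the current size does not satisfy the condition, the chunk produced by size+8 certainly will. XD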
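The trick can be checked by replaying all ten allocations in a small model of the heap. This is a sketch under stated assumptions: the first user pointer 0x804c410 is read off the local pwndbg dump (the remote heap base may differ, but only the low bits matter), chunk sizes follow the 32-bit rounding with 8-byte alignment, and nothing is ever freed, as in the challenge source. The helper names are made up for this illustration.

```python
def request2size(req, size_sz=4, align=8, minsize=16):
    """Chunk size for malloc(req) on the assumed 32-bit, 8-byte-aligned glibc."""
    padded = req + size_sz + align - 1
    return max(minsize, padded & ~(align - 1))


def malloc_addresses(requests, first=0x804c410):
    """User pointers returned by consecutive mallocs with no frees in between."""
    addrs = []
    addr = first  # assumed: first pointer taken from the local dump
    for req in requests:
        addrs.append(addr)
        addr += request2size(req)
    return addrs


def misaligned(requests):
    """Requests that reach the movntps path (>= 64) with a non-16-byte-aligned pointer."""
    pairs = zip(requests, malloc_addresses(requests))
    return [req for req, addr in pairs if req >= 64 and addr % 16 != 0]


defaults = [2 ** e for e in range(3, 13)]               # 8, 16, ..., 4096
adjusted = [n if n < 64 else n + 8 for n in defaults]   # 72, 136, ..., 4104

print("defaults crash at:", misaligned(defaults))
print("adjusted crash at:", misaligned(adjusted))
```

In this model the default inputs first go wrong at the 128-byte experiment, matching the crash seen on the server, while the sizes with 8 added keep every destination from the fourth experiment onward on a 16-byte boundary.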

 

flag

➜  ~ nc pwnable.kr 9022
Hey, I have a boring assignment for CS class.. :(
The assignment is simple.
-----------------------------------------------------
- What is the best implementation of memcpy?        -
- 1. implement your own slow/fast version of memcpy -
- 2. compare them with various size of data         -
- 3. conclude your experiment and submit report     -
-----------------------------------------------------
This time, just help me out with my experiment and get flag
No fancy hacking, I promise :D
specify the memcpy amount between 8 ~ 16 : 8
specify the memcpy amount between 16 ~ 32 : 16
specify the memcpy amount between 32 ~ 64 : 32
specify the memcpy amount between 64 ~ 128 : 72
specify the memcpy amount between 128 ~ 256 : 136
specify the memcpy amount between 256 ~ 512 : 264
specify the memcpy amount between 512 ~ 1024 : 520
specify the memcpy amount between 1024 ~ 2048 : 1032
specify the memcpy amount between 2048 ~ 4096 : 2056
specify the memcpy amount between 4096 ~ 8192 : 4104
ok, lets run the experiment with your configuration
experiment 1 : memcpy with buffer size 8
ellapsed CPU cycles for slow_memcpy : 2271
ellapsed CPU cycles for fast_memcpy : 357

experiment 2 : memcpy with buffer size 16
ellapsed CPU cycles for slow_memcpy : 309
ellapsed CPU cycles for fast_memcpy : 303

experiment 3 : memcpy with buffer size 32
ellapsed CPU cycles for slow_memcpy : 519
ellapsed CPU cycles for fast_memcpy : 504

experiment 4 : memcpy with buffer size 72
ellapsed CPU cycles for slow_memcpy : 1056
ellapsed CPU cycles for fast_memcpy : 240

experiment 5 : memcpy with buffer size 136
ellapsed CPU cycles for slow_memcpy : 1917
ellapsed CPU cycles for fast_memcpy : 267

experiment 6 : memcpy with buffer size 264
ellapsed CPU cycles for slow_memcpy : 3621
ellapsed CPU cycles for fast_memcpy : 270

experiment 7 : memcpy with buffer size 520
ellapsed CPU cycles for slow_memcpy : 7194
ellapsed CPU cycles for fast_memcpy : 336

experiment 8 : memcpy with buffer size 1032
ellapsed CPU cycles for slow_memcpy : 13878
ellapsed CPU cycles for fast_memcpy : 474

experiment 9 : memcpy with buffer size 2056
ellapsed CPU cycles for slow_memcpy : 27573
ellapsed CPU cycles for fast_memcpy : 846

experiment 10 : memcpy with buffer size 4104
ellapsed CPU cycles for slow_memcpy : 59736
ellapsed CPU cycles for fast_memcpy : 1638

thanks for helping my experiment!
flag : 1_w4nn4_br34K_th3_m3m0ry_4lignm3nt

 

References

  1. The -lm option
  2. RDTSC
  3. movntps
  4. glibc chunk
  5. pwnable.kr memcpy