MalwareTech’s VM1
Context
“vm1.exe implements a simple 8-bit virtual machine (VM) to try and stop reverse engineers from retrieving the flag. The VM’s RAM contains the encrypted flag and some bytecode to decrypt it. Can you figure out how the VM works and write your own to decrypt the flag? A copy of the VM’s RAM has been provided in ram.bin (this data is identical to the ram content of the malware’s VM before execution and contains both the custom assembly code and encrypted flag).”
Analyzing the code
Once IDA’s initial autoanalysis has finished, only two functions remain unrecognized: 0x4022E0 and 0x402270. I’ve renamed the former VM_Fetch() and the latter VM_DecodeAndExecute().
VM_Fetch()
void VM_Fetch();
This function loops over a 507-byte array located in the .data segment at address 0x404040, which holds the virtualized code and data. On each iteration, the function fetches 3 bytes from the array (starting at offset 0xFF) and passes them to VM_DecodeAndExecute(). The snippet below shows how the first byte is fetched:
; pc = program counter (virtual eip)
; get 1st byte:
0x004022F3 movzx ecx, [ebp+pc]
0x004022F7 mov edx, virtualized
0x004022FD movzx eax, byte ptr [edx+ecx+0FFh]
0x00402305 mov [ebp+insn_p1], eax
[...]
Bytes 2 and 3 are fetched in the same way; it’s just a matter of incrementing the variable pc. The snippet below shows the call to VM_DecodeAndExecute():
0x0040234B mov ecx, [ebp+insn_p3]
0x0040234E push ecx
0x0040234F mov edx, [ebp+insn_p2]
0x00402352 push edx
0x00402353 mov eax, [ebp+insn_p1]
0x00402356 push eax
0x00402357 call VM_DecodeAndExecute
Finally, if VM_DecodeAndExecute() returns 0, the loop ends; otherwise it continues and fetches the next 3 bytes:
0x00402357 call VM_DecodeAndExecute
0x0040235C movzx ecx, al
0x0040235F test ecx, ecx
0x00402361 jnz short continue
0x00402363 jmp short exit
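To make the loop easier to picture, here is a minimal Python sketch of what VM_Fetch() appears to do. The names fetch, run and CODE_BASE are mine, the callback stands in for VM_DecodeAndExecute(), and the bounds check is an addition of this sketch (the native loop only stops when the callee returns 0).

```python
CODE_BASE = 0xFF  # offset of the first virtual instruction (taken from the disassembly)


def fetch(ram, code_base=CODE_BASE):
    """Yield (byte1, byte2, byte3) triples, starting at ram[code_base]."""
    pc = 0  # virtual program counter, relative to code_base
    # Bounds guard added for safety in this sketch; it is not in the binary.
    while code_base + pc + 3 <= len(ram):
        yield ram[code_base + pc], ram[code_base + pc + 1], ram[code_base + pc + 2]
        pc += 3


def run(ram, decode_and_execute):
    """Feed triples to the callback until it returns a falsy value (AL == 0)."""
    executed = 0
    for byte1, byte2, byte3 in fetch(ram):
        if not decode_and_execute(byte1, byte2, byte3):
            break
        executed += 1
    return executed
```

Because fetch() is a generator, each triple is read from ram at the moment it is needed, so writes performed by earlier instructions are visible to later fetches.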
To understand the significance of these 3 bytes we need to dig into the function VM_DecodeAndExecute().
VM_DecodeAndExecute()
bool VM_DecodeAndExecute(BYTE, BYTE, BYTE);
This function starts by checking the value of the first byte:
0x00402274 mov eax, [ebp+insn_p1]
0x00402277 mov [ebp+mnem], eax
0x0040227A cmp [ebp+mnem], 1
0x0040227E jz short mnem_type1
0x00402280 cmp [ebp+mnem], 2
0x00402284 jz short mnem_type2
0x00402286 cmp [ebp+mnem], 3
0x0040228A jz short mnem_type3
0x0040228C jmp short vm_stop
Each value of this first byte selects a specific code path. If the byte is not 1, 2, or 3, the function returns to VM_Fetch() with AL = 0; otherwise it returns with AL = 1:
0x004022D1 vm_stop:
0x004022D1 xor al, al
0x004022D3 jmp short exit_fn
0x004022D5 vm_continue:
0x004022D5 mov al, 1
0x004022D7 exit_fn:
0x004022D7 mov esp, ebp
0x004022D9 pop ebp
0x004022DA retn 0Ch
Case 1
If the first byte is 1, the code below is executed:
0x0040228E mnem_type1:
0x0040228E mov ecx, virtualized
0x00402294 add ecx, [ebp+insn_p2]
0x00402297 mov dl, [ebp+insn_p3]
0x0040229A mov [ecx], dl
0x0040229C jmp short vm_continue
Abstracted as virtualized[byte2] = byte3 (“write to virtualized”).
Case 2
If the first byte is 2, the code below is executed:
0x0040229E mnem_type2:
0x0040229E mov eax, virtualized
0x004022A3 add eax, [ebp+insn_p2]
0x004022A6 mov cl, [eax]
0x004022A8 mov vreg, cl
0x004022AE jmp short vm_continue
Abstracted as virtual_register = virtualized[byte2] (“read from virtualized”).
Case 3
If the first byte is 3, the code below is executed:
0x004022B0 mnem_type3:
0x004022B0 movzx edx, vreg
0x004022B7 mov eax, virtualized
0x004022BC add eax, [ebp+insn_p2]
0x004022BF movzx ecx, byte ptr [eax]
0x004022C2 xor ecx, edx
0x004022C4 mov edx, virtualized
0x004022CA add edx, [ebp+insn_p2]
0x004022CD mov [edx], cl
0x004022CF jmp short vm_continue
Abstracted as virtualized[byte2] ^= virtual_register (“xor and write to virtualized”).
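Putting the three cases together, the handler logic can be sketched in Python as follows. This is my own restatement of the disassembly, not code lifted from the binary: the VM class and its attribute names are hypothetical, with ram playing the role of virtualized and reg the role of vreg.

```python
class VM:
    """Tiny model of the VM state: one shared RAM buffer and one 8-bit register."""

    def __init__(self, ram):
        self.ram = bytearray(ram)  # "virtualized": code and data share this buffer
        self.reg = 0               # "vreg": the single virtual register

    def decode_and_execute(self, op, a, b):
        """Return True to keep running, False to stop (mirrors AL = 1 / AL = 0)."""
        if op == 1:
            self.ram[a] = b             # case 1: write an immediate to RAM
        elif op == 2:
            self.reg = self.ram[a]      # case 2: load RAM into the register
        elif op == 3:
            self.ram[a] ^= self.reg     # case 3: xor RAM with the register
        else:
            return False                # any other opcode stops the VM
        return True
```

Note that only case 1 uses the third byte; cases 2 and 3 ignore it.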
Catching the flag
To solve this challenge without a debugger, we’ll have to automate things a little bit. I’m not a dev, but a few lines of Python did the job. From the analysis above, we can deduce that every virtual instruction is 3 bytes long, even though the third byte is not always used:
virtual instruction:
+-------+-------+-------+
| byte1 | byte2 | byte3 |
+-------+-------+-------+
    ^       ^       ^
    |       |       |
 mnemonic   |       |
       1st operand  |
               2nd operand
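With this layout in hand, a small disassembler is a handy way to read the bytecode before emulating it. The sketch below is only an illustration: the mnemonic names (write, read, xor, stop) are invented here, and it does a plain linear sweep over the initial RAM snapshot, so it would not reflect any instruction the program rewrites at run time.

```python
def disassemble(ram, code_base=0xFF):
    """Return one text line per 3-byte instruction, stopping at the first unknown opcode."""
    lines = []
    for off in range(code_base, len(ram) - 2, 3):
        op, a, b = ram[off], ram[off + 1], ram[off + 2]
        if op == 1:
            lines.append(f"{off:#06x}: write ram[{a:#04x}], {b:#04x}")
        elif op == 2:
            lines.append(f"{off:#06x}: read ram[{a:#04x}]")
        elif op == 3:
            lines.append(f"{off:#06x}: xor ram[{a:#04x}]")
        else:
            # Anything other than 1, 2 or 3 makes the VM return AL = 0.
            lines.append(f"{off:#06x}: stop")
            break
    return lines
```

Something like `print("\n".join(disassemble(open("ram.bin", "rb").read())))` should then list the virtual program.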
From there, it’s just a matter of using Python to fetch the instructions and perform the reads, writes, and XORs with the same side effects as the virtual instructions. The full script is available here.
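For readers who just want the overall shape, here is a compact end-to-end sketch written for this walkthrough; it is not necessarily identical to the linked script. The function names are hypothetical, and extract_string() rests on an assumption I have not shown in the disassembly above: that the decrypted result is a NUL-terminated ASCII string at the very start of RAM.

```python
def emulate(ram, code_base=0xFF):
    """Run the 3-opcode VM over a copy of ram and return the resulting RAM."""
    ram = bytearray(ram)
    reg = 0   # the single virtual register
    pc = 0    # program counter, relative to code_base
    # Bounds guard added by this sketch; the binary relies on the stop opcode.
    while code_base + pc + 3 <= len(ram):
        op, a, b = ram[code_base + pc:code_base + pc + 3]
        pc += 3
        if op == 1:
            ram[a] = b          # write immediate
        elif op == 2:
            reg = ram[a]        # read into register
        elif op == 3:
            ram[a] ^= reg       # xor with register
        else:
            break               # unknown opcode: VM stops
    return ram


def extract_string(ram):
    """ASSUMPTION: the output is a NUL-terminated ASCII string at offset 0."""
    end = ram.find(b"\x00")
    if end == -1:
        end = len(ram)
    return bytes(ram[:end]).decode("ascii", errors="replace")


def solve(path="ram.bin"):
    """Load a RAM dump from disk, emulate it and return the recovered string."""
    with open(path, "rb") as f:
        return extract_string(emulate(f.read()))
```

Running `print(solve("ram.bin"))` next to the provided dump should print whatever string the VM leaves at the start of its RAM.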
EOF