MalwareTech’s VM1

Context

“vm1.exe implements a simple 8-bit virtual machine (VM) to try and stop reverse engineers from retrieving the flag. The VM’s RAM contains the encrypted flag and some bytecode to decrypt it. Can you figure out how the VM works and write your own to decrypt the flag? A copy of the VM’s RAM has been provided in ram.bin (this data is identical to the ram content of the malware’s VM before execution and contains both the custom assembly code and encrypted flag).”

Analyzing the code

Once IDA's initial autoanalysis has finished, only two functions remain unidentified: 0x4022E0 and 0x402270. I've renamed the former VM_Fetch() and the latter VM_DecodeAndExecute().

VM_Fetch()

void VM_Fetch();

This function loops over a 507-byte array stored in the .data segment at address 0x404040; this array holds the virtualized code and data. On each iteration, the function fetches 3 bytes from the array (starting at offset 0xFF) and passes them to VM_DecodeAndExecute(). The snippet below shows how the first byte is fetched:

; pc = program counter (virtual eip)
; get 1st byte:
0x004022F3    movzx   ecx, [ebp+pc]
0x004022F7    mov     edx, virtualized
0x004022FD    movzx   eax, byte ptr [edx+ecx+0FFh]
0x00402305    mov     [ebp+insn_p1], eax
[...]

Bytes 2 and 3 are fetched the same way, after incrementing the variable pc each time. The snippet below shows the call to VM_DecodeAndExecute():

0x0040234B    mov     ecx, [ebp+insn_p3]
0x0040234E    push    ecx
0x0040234F    mov     edx, [ebp+insn_p2]
0x00402352    push    edx
0x00402353    mov     eax, [ebp+insn_p1]
0x00402356    push    eax
0x00402357    call    VM_DecodeAndExecute

Finally, if VM_DecodeAndExecute() returns 0, the loop ends; otherwise it continues and fetches the next 3 bytes:

0x00402357    call    VM_DecodeAndExecute
0x0040235C    movzx   ecx, al
0x0040235F    test    ecx, ecx
0x00402361    jnz     short continue
0x00402363    jmp     short exit
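The fetch loop described above can be sketched in Python. This is only an illustration of the control flow, not the binary's exact logic: the names vm_fetch, execute and CODE_BASE are mine, and the bounds check is an addition that the disassembly does not show.

```python
CODE_BASE = 0xFF  # bytecode starts at virtualized + 0xFF, per the disassembly


def vm_fetch(ram, execute):
    """Feed 3-byte instructions to `execute` until it returns a falsy value.

    `execute` stands in for VM_DecodeAndExecute(); returns the number of
    instructions that were fetched.
    """
    pc = 0
    steps = 0
    # Bounds guard added for safety; it is an assumption, not taken from vm1.exe.
    while CODE_BASE + pc + 3 <= len(ram):
        b1, b2, b3 = ram[CODE_BASE + pc:CODE_BASE + pc + 3]
        pc += 3
        steps += 1
        if not execute(b1, b2, b3):
            break  # mirrors the "jmp short exit" path when AL == 0
    return steps
```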

To understand the significance of these 3 bytes we need to dig into the function VM_DecodeAndExecute().

VM_DecodeAndExecute()

bool VM_DecodeAndExecute(BYTE, BYTE, BYTE);

This function starts by checking the value of the first byte:

0x00402274    mov     eax, [ebp+insn_p1]
0x00402277    mov     [ebp+mnem], eax
0x0040227A    cmp     [ebp+mnem], 1
0x0040227E    jz      short mnem_type1
0x00402280    cmp     [ebp+mnem], 2
0x00402284    jz      short mnem_type2
0x00402286    cmp     [ebp+mnem], 3
0x0040228A    jz      short mnem_type3
0x0040228C    jmp     short vm_stop

Depending on the value of this first byte, a specific code path is executed. If the value is not 1, 2, or 3, the function returns to VM_Fetch() with AL = 0; otherwise it returns with AL = 1:

0x004022D1 vm_stop: 
0x004022D1    xor     al, al
0x004022D3    jmp     short exit_fn
0x004022D5 vm_continue:
0x004022D5    mov     al, 1
0x004022D7 exit_fn: 
0x004022D7    mov     esp, ebp
0x004022D9    pop     ebp
0x004022DA    retn    0Ch

Case 1

If the first byte is 1, the code below is executed:

0x0040228E mnem_type1:
0x0040228E    mov     ecx, virtualized
0x00402294    add     ecx, [ebp+insn_p2]
0x00402297    mov     dl, [ebp+insn_p3]
0x0040229A    mov     [ecx], dl
0x0040229C    jmp     short vm_continue

Abstracted as virtualized[byte2] = byte3 (“write to virtualized”).

Case 2

If the first byte is 2, the code below is executed:

0x0040229E mnem_type2:
0x0040229E    mov     eax, virtualized
0x004022A3    add     eax, [ebp+insn_p2]
0x004022A6    mov     cl, [eax]
0x004022A8    mov     vreg, cl
0x004022AE    jmp     short vm_continue

Abstracted as virtual_register = virtualized[byte2] (“read from virtualized”).

Case 3

If the first byte is 3, the code below is executed:

0x004022B0 mnem_type3:
0x004022B0    movzx   edx, vreg
0x004022B7    mov     eax, virtualized
0x004022BC    add     eax, [ebp+insn_p2]
0x004022BF    movzx   ecx, byte ptr [eax]
0x004022C2    xor     ecx, edx
0x004022C4    mov     edx, virtualized
0x004022CA    add     edx, [ebp+insn_p2]
0x004022CD    mov     [edx], cl
0x004022CF    jmp     short vm_continue

Abstracted as virtualized[byte2] ^= virtual_register (“xor and write to virtualized”).
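Put together, the three cases and the stop condition can be modelled with a small Python class. This is a minimal sketch of the behaviour described above; the class and method names are hypothetical, and the register's initial value of 0 is an assumption.

```python
class VM:
    """Toy model of the three opcodes handled by VM_DecodeAndExecute()."""

    def __init__(self, ram):
        self.ram = bytearray(ram)  # private, mutable copy of the VM's RAM
        self.reg = 0               # single 8-bit register; initial value assumed

    def decode_and_execute(self, op, arg1, arg2):
        """Run one instruction; return False when the VM should stop."""
        if op == 1:      # case 1: virtualized[byte2] = byte3
            self.ram[arg1] = arg2
        elif op == 2:    # case 2: virtual_register = virtualized[byte2]
            self.reg = self.ram[arg1]
        elif op == 3:    # case 3: virtualized[byte2] ^= virtual_register
            self.ram[arg1] ^= self.reg
        else:            # any other opcode: AL = 0, the fetch loop exits
            return False
        return True
```

For example, writing 0x5A to address 0, loading a key of 0x0F from address 1 and XOR-ing address 0 with it leaves 0x55 at address 0.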

Catching the flag

To solve this challenge without a debugger, we have to automate things a little. I'm not a dev, but a few lines of Python did the job. From the analysis above, we can deduce that every virtual instruction is 3 bytes long, even though the third byte is not always used:

virtual instruction:
+-------+-------+-------+
| byte1 | byte2 | byte3 |
+-------+-------+-------+
    ^       ^       ^
    |       |       |
mnemonic    |       |
      1st operand   |
              2nd operand
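With this layout in mind, a tiny disassembler helps to read the bytecode before emulating it. The sketch below is only an illustration: the mnemonic names WRITE, LOAD, XOR and HALT are made up for readability, and the code is assumed to start at offset 0xFF as seen in VM_Fetch().

```python
MNEMONICS = {1: "WRITE", 2: "LOAD", 3: "XOR"}  # hypothetical names


def disassemble(ram, code_base=0xFF):
    """Return a textual listing of the 3-byte instructions in `ram`."""
    lines = []
    for off in range(code_base, len(ram) - 2, 3):
        op, arg1, arg2 = ram[off], ram[off + 1], ram[off + 2]
        name = MNEMONICS.get(op)
        if name is None:
            # Unknown opcode: the real VM stops here, so the listing does too.
            lines.append(f"{off:#06x}: HALT")
            break
        if op == 1:
            # Only the write instruction uses its second operand.
            lines.append(f"{off:#06x}: {name} [{arg1:#04x}], {arg2:#04x}")
        else:
            lines.append(f"{off:#06x}: {name} [{arg1:#04x}]")
    return lines
```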

From there, it's just a matter of using Python to fetch the data and perform reads, writes, and XORs that have the same side effects as the virtual instructions. The full script is available here.
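For readers who want something to start from, here is a compact emulator written from the analysis above. It is not the script linked above, just a sketch under a few assumptions: the register starts at 0, the code begins at offset 0xFF, and the decrypted flag is a NUL-terminated string at the start of RAM. The function names run_vm and extract_flag are mine.

```python
def run_vm(ram, code_base=0xFF):
    """Emulate the VM on a copy of `ram` and return the resulting RAM."""
    ram = bytearray(ram)
    reg = 0          # assumed initial register value
    pc = code_base
    while pc + 3 <= len(ram):
        op, arg1, arg2 = ram[pc:pc + 3]
        pc += 3
        if op == 1:      # write an immediate to RAM
            ram[arg1] = arg2
        elif op == 2:    # load a RAM byte into the register
            reg = ram[arg1]
        elif op == 3:    # XOR a RAM byte with the register, in place
            ram[arg1] ^= reg
        else:            # unknown opcode stops the VM
            break
    return ram


def extract_flag(ram):
    """Assumes the flag is a NUL-terminated string at offset 0 of RAM."""
    return bytes(ram).split(b"\x00", 1)[0].decode("ascii", errors="replace")


# Usage (ram.bin is the file provided with the challenge):
# with open("ram.bin", "rb") as f:
#     print(extract_flag(run_vm(f.read())))
```

Working on a copy of the RAM keeps the input untouched, which makes it easy to diff the memory before and after execution if the flag turns out to live somewhere else.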


EOF

This post is licensed under CC BY 4.0 by the author.