PyGhidra - Get Data Referenced in a Code Unit

Tested with Ghidra 12.0.3

Introduction

Let’s look at the following disassembly:

140001529 LEA  RAX,[opcode_table]; opcode_table is at 0x140005080
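As a sanity check on the address in the comment, here is a minimal sketch in plain Python (no Ghidra needed) that recomputes the LEA target from the instruction bytes `48 8d 05 50 3b 00 00`, which are listed in the script's docstring further down. It assumes the usual x86-64 RIP-relative encoding, where the last four bytes are a signed little-endian displacement added to the address of the next instruction. The function name `rip_relative_target` is mine, not a Ghidra API.

```python
import struct

# Address and bytes of the LEA instruction, copied from the listing.
INSN_ADDR = 0x140001529
INSN_BYTES = bytes.fromhex("488d05503b0000")


def rip_relative_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Return the address referenced by a RIP-relative instruction whose
    last four bytes are the 32-bit displacement (true for this LEA)."""
    # Signed little-endian 32-bit displacement.
    (disp,) = struct.unpack("<i", insn_bytes[-4:])
    # RIP holds the address of the *next* instruction when disp is applied.
    return insn_addr + len(insn_bytes) + disp


target = rip_relative_target(INSN_ADDR, INSN_BYTES)
print(hex(target))  # 0x140005080
```

Ghidra does this computation for us, which is why the script only has to ask the code unit for its reference.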

I want to retrieve the data at 0x140005080 so that I can process it in Python. To do so, I also need the length of the data. Here it is 0xB00 bytes; I won’t explain how I got this length, as it doesn’t matter for the purpose of this post. Just remember that we need to know the length of the data to retrieve in advance.
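To give an idea of the "Python things", here is a small sketch of what can be done once the bytes are in hand. It uses the same numbers as the script further down (88 entries of 0x20 bytes, i.e. 0xB00 bytes) and splits the buffer into fixed-size entries. The zero-filled buffer is only a placeholder for the real data, and `split_entries` is a hypothetical helper name.

```python
ENTRY_COUNT = 88
ENTRY_SIZE = 0x20
TARGET_DATA_LEN = ENTRY_COUNT * ENTRY_SIZE  # 0xB00


def split_entries(data: bytes, entry_size: int) -> list:
    """Split a flat buffer into consecutive entries of entry_size bytes."""
    if len(data) % entry_size != 0:
        raise ValueError("buffer length is not a multiple of the entry size")
    return [data[i:i + entry_size] for i in range(0, len(data), entry_size)]


# Placeholder buffer; in the real script this is the data read from Ghidra.
data = bytes(TARGET_DATA_LEN)
entries = split_entries(data, ENTRY_SIZE)
print(len(entries), hex(len(data)))
```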

Keywords to dig in the doc:

  • Reference
  • CodeUnit
  • MemoryBlock

The code shown below also uses jpype.JByte to create the output byte array, which the getBytes() method requires. There may be other ways, I don’t really know; here I’m using what I saw in one of the example scripts (./ghidra_12.0.3_PUBLIC/Ghidra/Features/PyGhidra/ghidra_scripts/PyGhidraBasics.py).
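One thing to keep in mind with Java byte arrays: a Java `byte` is signed (-128 to 127). The script below converts the whole array with `bytes(byte_array)`, so this is not an issue there. If you instead end up with individual signed values (my assumption is that this can happen when iterating over the array element by element), masking with 0xFF gives the unsigned value back. A plain-Python sketch, with a hypothetical helper name:

```python
def signed_to_bytes(values) -> bytes:
    """Convert Java-style signed bytes (-128..127) to a Python bytes object."""
    return bytes(v & 0xFF for v in values)


# The LEA instruction bytes as Java would show them: 0x8d becomes -115.
signed = [72, -115, 5, 80, 59, 0, 0]
raw = signed_to_bytes(signed)
print(raw.hex())  # 488d05503b0000
```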

Code

# Get data referenced in a code unit
#@author silma
#@category _MyScripts
#@keybinding 
#@menupath 
#@toolbar 


"""
Target asm block:
14000151c 55                         PUSH   RBP
14000151d 48 89 e5                   MOV    RBP,RSP
140001520 48 83 ec 10                SUB    RSP,0x10
140001524 89 c8                      MOV    EAX,ECX
140001526 88 45 10                   MOV    byte ptr [RBP + inByte],AL
140001529 48 8d 05 50 3b 00 00       LEA    RAX,[opcode_table] ; <-- here

target insn address: 0x140001529
target bytes: 48 8d 05 50 3b 00 00
target instruction: LEA RAX,[0x140005080]
"""

# Set these variables according to your needs
TARGET_ADDRESS = 0x140001529
ENTRY_COUNT = 88
ENTRY_SIZE = 0x20
TARGET_DATA_LEN = ENTRY_COUNT * ENTRY_SIZE

import binascii
import jpype

from ghidra.program.model.listing import CodeUnit
from ghidra.program.model.symbol import Reference


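def get_data_from_ref(ref: Reference) -> bytes:
    """
    Read TARGET_DATA_LEN bytes at the address the reference points to.
    Adapted from PyGhidraBasics.py.
    """
    byte_array = jpype.JByte[TARGET_DATA_LEN]  # <java class 'byte[]'>
    dest = ref.getToAddress()  # Address
    # getBlock(Address) already returns the MemoryBlock containing 'dest',
    # so there is no need to look it up a second time by name.
    block = currentProgram.memory.getBlock(dest)  # MemoryBlock
    count = block.getBytes(dest, byte_array)  # int, number of bytes actually read
    # getBytes() may read fewer bytes if the request runs past the end of the block
    assert count == TARGET_DATA_LEN, f"short read: got {count} bytes"
    return bytes(byte_array)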


def get_target_code_unit(a: int) -> CodeUnit:
    # Create an 'Address' object
    target_addr = address_factory.getDefaultAddressSpace().getAddress(a)
    target_cu = listing.getCodeUnitContaining(target_addr)
    assert(target_cu.getAddress() == target_addr)
    target_func = f_mgr.getFunctionContaining(target_addr)  # Function
    print(f"target code unit '{target_cu}' found in function '{target_func}'")
    return target_cu


address_factory = currentProgram.getAddressFactory()
listing = currentProgram.getListing()
f_mgr = currentProgram.getFunctionManager()  

target_cu = get_target_code_unit(TARGET_ADDRESS)
    
# https://github.com/NationalSecurityAgency/ghidra/discussions/3655
# Either cu.getPrimaryReference(i) || cu.getAddress(i), where i is operand index.
# Here, instruction is 'LEA RAX,[0x140005080]':
#   operand[0] is RAX
#   operand[1] is the reference we want
# And if cu.getAddress() is called with no parameter, we get 'TARGET_ADDRESS'.
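ref = target_cu.getPrimaryReference(1)
# getPrimaryReference() returns None when the operand carries no reference
assert ref is not None, "no primary reference on operand 1"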
data = get_data_from_ref(ref)
print(data)

EOF

This post is licensed under CC BY 4.0 by the author.