Using Ghidra to Reverse Engineer x86 Binaries

Ghidra came out a few months ago, and I felt a nagging urge to get back into reverse engineering, both to try it out and to see whether I still remembered how to do it. Luckily I did, and in the process I discovered that Ghidra is amazing!

The target

When I was a kid in 2009 I played on a roleplaying server for San Andreas Multiplayer (a multiplayer mod for Grand Theft Auto San Andreas on the PC). It was just a roleplaying community where you could make a character, write a backstory, and interact with other people in a large multiplayer open-world sandbox. It was truly the first of its kind and a marvel of technological innovation. A lot of reverse engineering work went into making the original GTA SA game client compatible with multiplayer, since it’s a closed-source game released by Rockstar Games. I wanted to figure out how I could reverse engineer the client to solve the text anagrams that were displayed on the screen in multiplayer.

Solving anagrams

On the server I played on, there was a gamemode where you had 20 seconds to solve as many anagrams as possible. Since I type really fast this was always extremely easy, but I thought it would be funny if I could solve them with a hack of some sort. In practice this would involve reading the anagram text that pops up on the screen, solving the anagram, typing the solution into the chat, and then repeating for the next anagram.

I ended up figuring out how to solve 80% of this in one afternoon with some trial and error using CheatEngine and Ghidra.

Reading the text on screen

The first thing I did was start up my own SAMP server and call the TextDrawShowForPlayer function so that I could emulate a testbed environment to test my cheat on. My goal was to see if I could read all the text that’s present on the player’s screen.

In CheatEngine, I searched for the string that I had drawn on the screen (in this case "unscramble anagrams" on my testbed), attached a debugger, and found the render call that accessed it. It was at samp.dll + 0xB31E0, so I went to this address in Ghidra and started reverse engineering.

The first thing I noticed was that the function walks the string to make sure that it isn’t empty. Ghidra doesn’t collapse this loop into a strlen call, and it can’t figure out on its own what kind of buffer the argument is, so the types are somewhat ambiguous. Since my debugger’s read breakpoint landed here, it’s clear that the argument to the function is a pointer to the character buffer holding the text to be rendered. In theory we could just hook the render logic here, but implementing detour logic is a pain in the ass and there’s probably an easier way to access the character buffers of rendered text.

void __fastcall FUNC_0491e(char *text_render_struct) {
  char str_text_ptr;
  float fVar1;
  float fVar2;
  float fVar3;
  float fVar4;
  char *ptr_to_text;
  char *ptr_to_str_text_ptr;
  int iVar5;
  int iVar6;
  undefined4 local_24;
  
  if (*(int *)(text_render_struct + 0x9a3) != -1) {
    FUN_100b30a0();
    return;
  }
  if (text_render_struct == (char *)0x0) {
    return;
  }
  ptr_to_text = text_render_struct;
  do {
    str_text_ptr = *ptr_to_text; // <-- debugger read breakpoint
    ptr_to_text = ptr_to_text + 1;
  } while (str_text_ptr != '\0');
  if (ptr_to_text == text_render_struct + 1) {
    return;
  }
  ptr_to_text = text_render_struct + 0x321;
  ptr_to_str_text_ptr = text_render_struct;
  ...
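
The do/while loop in the listing above has the shape of an inlined strlen: walk to the terminating NUL, then compare the end pointer against start + 1 to detect an empty string. Here is a minimal sketch of the same check in plain C. The names scanned_length and is_empty_text are hypothetical and do not come from samp.dll.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the decompiled loop: advance past the NUL terminator, then
 * measure how far the cursor moved. Hypothetical helper, not SA-MP code. */
size_t scanned_length(const char *text)
{
    const char *cursor = text;
    char current;

    assert(text != NULL);
    do {
        current = *cursor;    /* the read that the debugger breakpoint hit */
        cursor = cursor + 1;
    } while (current != '\0');

    /* cursor now points one past the NUL, so subtract 1 for the length. */
    return (size_t)(cursor - text) - 1;
}

/* `ptr_to_text == text_render_struct + 1` is just "length == 0". */
int is_empty_text(const char *text)
{
    return scanned_length(text) == 0;
}
```

Read this way, the early return in the decompiled function simply skips rendering when the textdraw string is empty.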

I looked up all the functions that refer to this function, and this is where it got interesting.

void __fastcall FUNC_54701(void *string_struct) {
  int counter;
  
  counter = 0;
  if ((DAT_1026e9c4 == (int *)0x0) || (*DAT_1026e9c4 == 0)) {
    do {
      if (*(int *)((int)string_struct + counter * 4) != 0) {
        FUNC_0491e(*(char **)((int)string_struct + counter * 4 + 0x2400));
      }
      counter = counter + 1;
    } while (counter != 0x900);
  }
  return;
}

Note that FUNC_0491e is the function which takes in the char* buffer of text to be rendered. The thing that really interested me here was the magic 0x900 in the loop, and the constant 0x2400 offset from the struct that is passed in to FUNC_54701. I never fully figured out what the 0x2400 bytes before the pointer array hold (0x900 entries of 4 bytes each is exactly 0x2400 bytes, so it looks like a parallel array of the same length). It seems to be some sort of data structure associated with rendering, and the text is probably only a small part of the data used during rendering, since drawing text takes a number of parameters like font, color, thickness, position, etc.

0x900 is 2304 in decimal, which is 2048 + 256: the max number of textdraws that can be displayed on a player’s screen! I knew I was on the right track when I found this, but it took me googling the limits to put two and two together. As an aside, text_render_struct is a struct rather than a plain string because its fields are accessed past 0x900 bytes (the string character limit). Anyway, it’s pretty clear that we now have a way of accessing all the text that can be drawn on the screen. How do we get access to this data structure without hooking?
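
To make the arithmetic concrete, here is a small sketch of the offsets the decompiled loop computes for each slot. The constant and function names are hypothetical. Only the numbers 0x900, 4, and 0x2400 come from the Ghidra output.

```c
#include <assert.h>
#include <stdint.h>

/* Constants read off the decompiled loop. The names are made up here. */
enum {
    SLOT_COUNT      = 0x900,  /* loop bound: 2304 slots */
    SLOT_SIZE       = 4,      /* 32-bit process, so each entry is 4 bytes */
    TEXT_PTR_OFFSET = 0x2400  /* where the char* entries start */
};

/* Byte offset of the entry tested with `!= 0` for a given slot. */
uint32_t flag_offset(uint32_t index)
{
    assert(index < (uint32_t)SLOT_COUNT);
    return index * SLOT_SIZE;
}

/* Byte offset of the text pointer that is passed on for the same slot. */
uint32_t text_pointer_offset(uint32_t index)
{
    assert(index < (uint32_t)SLOT_COUNT);
    return TEXT_PTR_OFFSET + index * SLOT_SIZE;
}
```

The last flag entry sits at offset 0x23fc, so the first array ends exactly where the text pointers begin at 0x2400.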

Static pointer fun

I got pretty lucky here: if you look up all the references to FUNC_54701, you’ll see that it only gets called once, like this. For the sake of clarity I renamed FUNC_54701 to draw_bunch_of_strings.

if ((((DAT_1026ea24 != 0) &&
   (iVar1 = FUN_100a0950(0,unaff_ESI,unaff_EBP,&stack0xfffffff4,unaff_EBX,uVar4,uVar3),
   iVar1 == 0)) && (DAT_1026ea0c != (void *)0x0)) &&
(string_struct = *(void **)(*(int *)((int)DAT_1026ea0c + 0x3de) + 0x20),
 string_struct != (void *)0x0)) {
    draw_bunch_of_strings(string_struct);
}

Since we want to get access to string_struct, we can see it’s simply a constant offset from DAT_1026ea0c with some pointer dereferencing, specifically:

x = DEREF samp.dll + 0x26ea0c
y = DEREF x + 0x3de
z = DEREF y + 0x20

z = string_struct
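
The three DEREF steps above form a plain multi-level pointer walk. The sketch below shows one way to express it, run against a byte array that stands in for the 32-bit process memory so it can be tried outside the game. The names read_u32 and follow_chain are hypothetical, and stopping on any null hop is my own defensive assumption (the decompiled code only checks the first and last values).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one 32-bit value from a byte image at the given offset. */
uint32_t read_u32(const uint8_t *image, size_t image_size, uint32_t address)
{
    uint32_t value;

    assert((size_t)address + sizeof value <= image_size);
    memcpy(&value, image + address, sizeof value);
    return value;
}

/* value = DEREF first; then value = DEREF (value + offsets[i]) for each
 * offset in turn. Returns 0 as soon as any hop yields a null pointer. */
uint32_t follow_chain(const uint8_t *image, size_t image_size,
                      uint32_t first, const uint32_t *offsets, size_t count)
{
    uint32_t value = read_u32(image, image_size, first);

    for (size_t i = 0; i < count && value != 0; i++) {
        value = read_u32(image, image_size, value + offsets[i]);
    }
    return value;
}
```

For the chain above, first would be the address of samp.dll + 0x26ea0c and offsets would be {0x3de, 0x20}.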

So in our code we just have to look up samp.dll + 0x26ea0c, follow the chain to retrieve string_struct, then replicate the calling convention and iterate over the data structure to get all the strings. In practice, GTA SA was written in 2004 with a lot of global variables, which makes it a pretty easy game to reverse engineer overall.

Conclusion

I won’t bore you with the details, but it ends up looking like this (code injected as a DLL):

void __fastcall iterate_rendered_text(void* string_struct, int* DAT_1026e9c4) {
	int counter;

	counter = 0;
	if ((DAT_1026e9c4 == (int*)0x0) || (*DAT_1026e9c4 == 0)) {
		do {
			if (*(int*)((int)string_struct + counter * 4) != 0) {
				char* rendered_text = *(char**)((int)string_struct + counter * 4 + 0x2400);
				// rendered_text points at the string for this one textdraw slot :)
			}
			counter = counter + 1;
		} while (counter != 0x900);
	}
	return;
}

void init() {
	uintptr_t base = GetModuleBaseAddress(GetCurrentProcessId(), L"samp.dll");

	int DAT_1026ea24 = read_4_bytes((LPCVOID)(base + 0x26ea24));
	int DAT_1026ea0c = read_4_bytes((LPCVOID)(base + 0x26ea0c));

	if (DAT_1026ea24 != 0 && DAT_1026ea0c != 0x0) {
		void* string_struct = (void*)(read_4_bytes((LPCVOID)(read_4_bytes((LPCVOID)(DAT_1026ea0c + 0x3de)) + 0x20)));
		if (string_struct != 0x0) {	
			int* DAT_1026e9c4 = (int*)read_4_bytes((LPCVOID)(base + 0x26e9c4));
			iterate_rendered_text(string_struct, DAT_1026e9c4);
		}
	}
}
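
The listing above calls read_4_bytes without showing it. Because the code runs inside the game process, one plausible shape for it is a plain 4-byte copy from the given address. The version below is an assumption about that helper, not the original implementation, and it returns uint32_t rather than int. Using memcpy matters because offsets such as 0x3de are not multiples of 4, so the read may be unaligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed helper: copy 4 bytes out of our own address space.
 * It does not validate that the address is mapped, so a stale pointer
 * would still crash the process. */
uint32_t read_4_bytes(const void *address)
{
    uint32_t value;

    assert(address != NULL);
    memcpy(&value, address, sizeof value);  /* fine for unaligned addresses */
    return value;
}
```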

string_struct + 0x2400 means we’re offsetting into the second array, which holds 2304 (0x900) pointers of 4 bytes each. Whether we use the entry at counter depends on the value at string_struct + counter * 4 in the first array. That value is a pointer to something (we can’t be sure what), but it acts as a boolean: if it’s null there is no rendered_text for the element at counter, and if it’s set there is.
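
The two parallel arrays can also be written down as a struct, which makes the iteration easier to read than raw offset arithmetic. This is a hypothetical model: the field names are made up, the first array is treated as opaque 32-bit values, and the extra null check on the text pointer is a defensive assumption that the decompiled code does not make. The element stride of text[] only matches the game when this is compiled as 32-bit code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_COUNT 0x900

/* Hypothetical model of string_struct: two parallel arrays of 0x900 slots. */
struct textdraw_pool {
    uint32_t    in_use[SLOT_COUNT];  /* non-zero means the slot is active */
    const char *text[SLOT_COUNT];    /* begins at byte offset 0x2400 */
};

/* Copy the text pointers of all active slots into `out`, up to `capacity`.
 * Returns how many pointers were written. */
size_t collect_texts(const struct textdraw_pool *pool,
                     const char **out, size_t capacity)
{
    size_t found = 0;

    assert(pool != NULL && out != NULL);
    for (size_t i = 0; i < SLOT_COUNT && found < capacity; i++) {
        if (pool->in_use[i] != 0 && pool->text[i] != NULL) {
            out[found++] = pool->text[i];
        }
    }
    return found;
}
```

A slot whose flag is zero is skipped even if its text pointer still holds stale data, which matches the check in the decompiled loop.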

Making something that automatically solves anagrams is easy. Just use this code in another dedicated thread to loop over all the rendered text, check if the text is a solvable anagram, solve it, send a chat message, and then enjoy the fruits of your labor.
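
For the "solve it" step, one common approach is to compare sorted-letter signatures against a word list: two strings are anagrams exactly when their sorted letters match. The sketch below assumes you already have a list of candidate words. The list contents, the 64-character cap, and the function names are all hypothetical, and the comparison is case sensitive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 64

static int compare_chars(const void *a, const void *b)
{
    return (int)*(const unsigned char *)a - (int)*(const unsigned char *)b;
}

/* Write the sorted letters of `word` into `out`.
 * Returns 0 if the word does not fit, 1 otherwise. */
int letter_signature(const char *word, char out[MAX_WORD])
{
    size_t length = strlen(word);

    if (length >= MAX_WORD) {
        return 0;
    }
    memcpy(out, word, length + 1);
    qsort(out, length, 1, compare_chars);
    return 1;
}

/* Return the first word whose letters match `scrambled`, or NULL. */
const char *solve_anagram(const char *scrambled,
                          const char *const *words, size_t word_count)
{
    char want[MAX_WORD];
    char have[MAX_WORD];

    assert(scrambled != NULL && words != NULL);
    if (!letter_signature(scrambled, want)) {
        return NULL;
    }
    for (size_t i = 0; i < word_count; i++) {
        if (letter_signature(words[i], have) && strcmp(want, have) == 0) {
            return words[i];
        }
    }
    return NULL;
}
```

A linear scan is fine for a short word list. For a large dictionary you could precompute the signatures once and look them up in a hash table instead.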

I think part of the reason all the font data structures are statically allocated is that the game was programmed with hardware limitations in mind: when text draw calls are made, they just update a fixed-size array in memory containing all the possible textdraws. You wouldn’t want an infinitely long linked list of these text draws, because if the person writing the Pawno script decided to draw an indefinite number of strings, that could cause performance issues or crash the client.

Here’s proof it worked. The full source code is here.
