Dreaming of Dragons: Compiling

Sunday, December 22, 2024

Wherein We Look At An ELF: Executable and Linkable Format

So, this is an ELF file, huh?

Elves

For the past few days, I've been delving into ELF (Executable and Linkable Format) binaries—specifically focusing on their structure, behavior, and ways to manipulate them. If the subject matter is of any interest to you, then check this out.

What are ELF files?

They are a "common standard file format for executable files, object code, shared libraries, and core dumps", at least according to Wikipedia.

The magic number for ELF files, the first four bytes of every one of them, is 0x7F 0x45 0x4C 0x46. Care to guess what 0x45, 0x4C and 0x46 are in ASCII?

Let's look at (yet another) terribly simple C program and then look under the covers:

Not terribly impressive, but we're not trying to be terrible or to impress anyone.

Remember when we talked a bit about the 4-step compilation process here? Instead of compiling this program all the way, let's stop right after the assembling step (that's what gcc -c does) and look at the resulting object file:

So, as we can see, we have a 64-bit ELF object file... relocatable. What does it mean for this to be relocatable?
It means that its code and data haven't been assigned their final memory addresses yet. Instead, the file carries relocation entries that tell the linker which spots need to be patched once everything is laid out, so it can be placed anywhere without breaking. Our code isn't an executable yet: we still have to go through the linking phase, which may combine it with other object files or libraries and then, yes, produce our executable.

Remember that, for the most part, programmers never even think about these steps. The compilation process takes care of all of this in the background, and only if something goes wrong will the programmer be told that one of these 4 steps went awry.

And notice as well that this file is 'not stripped'. What's this, you ask? It's telling us that the file still contains its symbol table (and, had we compiled with -g, debugging information as well). This information is useful for debugging, making it easier to analyze and understand what's happening with tools like gdb or objdump. Stripping our binary, on the other hand, removes both the symbol table and the debugging info. The symbol table contains the names of functions, files, variables and other metadata useful for debugging or reverse engineering, while the extra debugging info describes things like variable types and line numbers.

So, stripping will reduce the file size and hide implementation details, but it will also make debugging a bit harder.

Under the hood

Let's do it. It's pretty simple, actually. After finally creating our binary, we can strip it with:

strip --strip-all simple_adder

Here we can see the difference between the two files using readelf:

It's a bit rough around the edges, but if you look carefully, you can see two files, one stripped and one not stripped, and the difference is telling, even for such a small binary.

Obviously, stripping is also used as a countermeasure against reverse engineering, and as an obfuscation technique.

The readelf command that you see up there is a tool for analyzing and displaying information about ELF files. With it, we can inspect the internal structure of ELF files, such as executables, shared libraries, or object files. Yes, it's what that ELF lady is doing at the beginning of this blog post. I know. Genius.

readelf comes in handy when debugging linking issues or when trying to understand how an executable or a library is laid out.

As per usual, the man pages are your friend here.

.text, .data, .bss and .rodata

ELF files have several important sections: .text, which contains the executable instructions; .data, which stores initialized global (and static) variables; .bss, which holds uninitialized global (and static) variables; and .rodata, which holds read-only data, such as constant strings.

We can inspect the .text section, which could be considered the heart of the program (it holds the executable code, really), with a tool like radare2 or objdump, for example:

As you can see, this gives us Assembly code (in AT&T syntax, no less... yuck; objdump's -M intel option fixes that) revealing function prologues, loops, and system calls.

Of course, recognizing these patterns is a vital skill for the Reverse Engineer.

Try this for yourself. Also remember to check .data and .bss with:

readelf -x .data your_file

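readelf -x .bss your_file

Don't be surprised if readelf tells you there's nothing to dump for .bss: the section takes up no space in the file, since it's simply zero-filled when the program is loaded.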

But there are more ways to inspect your ELF files. Let's look at ldd, which prints shared object dependencies, and use readelf with '-r' to check the relocation tables:

What's all of this, you say?

With ldd we can check which shared libraries our binary depends on; each listed library represents a dependency that gets loaded at runtime. Relocation tables, in turn, are essential for adjusting addresses in a dynamically linked binary. Entries like R_X86_64_JUMP_SLOT (for functions called through the PLT) or R_X86_64_GLOB_DAT (for entries in the GOT, such as global variables) help resolve function calls or global variables at runtime.

There's more, of course. But this stuff is much more fun to experiment with than just to talk or write about.

Go at it. Experiment with creating your own binary files and examine them with any or all of these tools (or others). Change things, check again.

No elves were harmed in the making of this blogpost. Nor any DWARF, of course.

Sunday, December 1, 2024

Wherein We Pause to Reflect: Simple ASM Review and OS Security Mechanisms

Hey! Corny matrix-styled dojo. Why not?

Let’s kick off today’s blog post by writing a very, very simple ASM program. Since I’m reviewing ASM stuff through pwn.college’s course, why not showcase how simple ASM can be? (In a way... math is simple too, but many would argue otherwise).

Our source file will be called one-oh-one.s, and our program will be called pausing, since that's what we'll want at the end.

Let's start with a program that starts and ends. Simple enough, right? But for that, we need to hand control back to the OS through the exit syscall (remember those?). To do this, we move the value 60, the exit syscall's number, into the rax register and then execute the syscall instruction. Here's how:

Simple enough! But if we try to assemble and link this, we’ll hit an issue:

Ah, much better. Now our previously failing program is a success... of sorts. We made sure to use Intel syntax, which is much cleaner (at least to my eyes). We also made the _start label globally visible so that the linker knows where our program begins.

Now no more errors, and we’re ready to tackle the rest of the program. Easy peasy.

Right! While I was trying to set up a pause in my program, I ran into an issue. The syscall I thought was the right one for pausing was actually wrong. So, since I’m working on Linux with x86_64 architecture, I checked /usr/include/asm/unistd_64.h to see which syscalls I should use.

There, I found the exit syscall (60), which needs to be loaded into rax (which we're already doing). If we want to set an exit code, we need to load it into the rdi register. Fair enough, we can do that. As for the pause syscall, we need syscall 34. I also discovered the alarm syscall (37), which has the kernel send a SIGALRM after a given number of seconds, ending the pause (by default, SIGALRM also terminates the process). Without it, we'd have to ctrl+c our way out of the program. alarm takes its argument, the number of seconds, in rdi as well. Easy peasy.

Now that we have the tools, let’s create our pausing program, assemble it, link it, and run it while checking for our exit code:

Don't know about you, but I love this!

Now, let's review other concepts. For instance, in pwn.college you can learn the difference between using a memory address itself and using the contents stored at that address.

Let's say there’s a memory position 12345 holding the value 42.

If we do:

mov rdi, 12345

We're making it so that rdi will hold that number 12345.

But if we do:

mov rdi, [12345]

Now rdi will hold the value stored at memory address 12345, which is 42 (strictly speaking, the 8 bytes starting there, since rdi is a 64-bit register).

I won’t dive too much into these basics because I’m sure you either already know them or want to experiment with them yourself. Just remember that the OS will have some defenses in place that may make it difficult, if not impossible, to access specific memory locations in your binary.

You can check which defenses have been enabled with a tool called checksec:

Please take your time and check what each of these items does. Here's a quick rundown:

RELRO: Relocation Read-Only; a security feature that makes parts of the program, like its GOT (Global Offset Table), read-only once the dynamic linker is done with them (only Full RELRO covers the entire GOT), preventing attacks that overwrite function pointers.
STACK CANARY: A protective value placed on the stack and checked before a function returns, to detect buffer overflows before the overwritten data can be used.
NX: No-Execute; prevents code from running in certain areas of memory, like the stack, to stop exploits that inject and execute malicious code.
PIE: Position Independent Executable; allows the program to be loaded at a random base address (together with ASLR), making it harder for attackers to predict the location of code.
RPATH: A setting embedded in the executable that specifies directories to search for shared libraries before the default system paths.
RUNPATH: Similar to RPATH, but it is searched after LD_LIBRARY_PATH when locating shared libraries at runtime.
Symbols: Whether the binary still carries symbols (function names, variable names, and other references in the program's code), helpful for debugging or exploitation.
FORTIFY: Whether the binary was built with FORTIFY_SOURCE, a set of compiler and C library protections that add safety checks to certain library functions, like checking buffer sizes to prevent overflows.
Fortified: The number of functions that were replaced with checked variants (like __memcpy_chk instead of memcpy) thanks to FORTIFY_SOURCE.
Fortifiable: The number of functions that could potentially be fortified this way.
FILE: The path of the file being checked.

Like I said, a quick rundown, but it's worth spending time learning about these defenses: what they do, and how to enable or disable them so you can properly inspect a binary. Also, note that checksec works on ELF files.

Alrighty, another short one! Hope you had fun.

Sunday, September 8, 2024

Wherein We Get Lost And Compare Object Dumps: C vs. Assembly

That's a rabbit hole, Alice. And those are books on shelves, all the way down.

Hi again!

I created a simple "Hello, World!" program in C, so that we could have a quick talk about function prologues and epilogues in Assembly, but we're in for a detour, as happens with all rabbit holes.

And the truth is that it's just rabbit holes as we're going down (until we reach elephants, and then it's turtles all the way down, of course).

Here's the culprit:

Ok, nothing impressive, but it does its job.

After compiling this program through the usual steps, the program runs and prints "Hello, World!" to the standard output.

Next, I wanted to create an Assembly program that would print the exact same line, and although I can read some Assembly and am making progress on that front, I can't (yet) write my own Assembly programs. So I asked our LLM friend to do it for us. And so it did:

Pretty neat.

And we can assemble this into an object file with:

nasm -f elf32 print_hello_ASM.asm -o print_hello_ASM.o

And then link it into an actual program with:

ld -m elf_i386 print_hello_ASM.o -o print_hello_ASM

And voila! We can run this program just like with our C program...

But...

"wait, wait, wait, wait!"
You say.

"What's with the turning-the-code-into-binary-and-then-into-a-program-magic?
We don't need to compile stuff in Assembly, like we do with C?"

Well, those are great questions!

The thing is that we take compilation for granted. In fact, compilation is done in 4 steps:

- Preprocessing

- Compilation

- Assembling

- Linking

Let's ask an LLM to give us a little more information on these steps, and ask it to keep the explanation simple:

Confused? Remember that you can always ask it to explain again from a different angle, in simpler terms, through analogy, etc:

We can also check more trustworthy sources, such as documentation, forums, etc. For example:
https://unstop.com/blog/compilation-in-c

(I told you, it's rabbit holes most of the way down)

I'm not going to give you an in-depth explanation of these concepts (that's your job, really). But let's just say for the sake of simplicity, that when we compile our C code, we're in fact going through these four steps, and that when turning our Assembly code into a program, we just take the two last steps: Assembling and Linking (also, fyi: note that these steps can be combined or optimized in modern compilers).

To showcase the difference between these two processes and the baggage that comes along with C, let's look at an objdump of both our C and our Assembly programs.

What's an objdump? Here:

So... it's basically a tool that takes a binary file and disassembles it back into Assembly code (+ extra info).

Then let's jump into that Assembly objdump of ours, right? Here:

And, for comparison, here's a gif with the C objdump:

Notice any difference? The C objdump is a tad longer.
And note that I haven't included all the possible information in these dumps (check out the man page for objdump, in particular the -s option, which dumps the full contents of the sections).

Notice, though, that there is something we haven't seen before in our little Assembly forays. In that ASM objdump, we see these "int 0x80" lines. What are these?
Seems important enough.

These are system call interrupts: on 32-bit x86 Linux, int 0x80 is how a program requests services from the operating system's kernel. Namely, we want to print our Hello World message on screen and then exit our program, and that's what those two syscalls are doing there.

When we're using C, this all happens behind the scenes: functions like printf live in the C standard library, which makes these system calls on our behalf, so it's not all that obvious to us.

More info from our friendly LLMs:

Ah, but I just recalled that we were meant to discuss function prologues and epilogues in Assembly.

I went to https://godbolt.org/ and placed my original C code in there, and immediately got an Assembly representation of that code as well.

And lo and behold, it's even color-coded, matching each line of C to the Assembly it produced, which makes it easy to spot the prologue and the epilogue.

Here:

But I'm leaving function prologues and epilogues for an upcoming blog post.

In the meanwhile, you can always check that yourself if you're curious. Or anything, really. See something you don't understand? Leave no stone unturned! Jump into that hole, satiate your curiosity and keep learning.
