Showing posts with label MagicNumbers. Show all posts

Sunday, December 22, 2024

Wherein We Look At An ELF: Executable and Linkable Format

 


                So, this is an ELF file, huh?


Elves

For the past few days, I've been delving into ELF (Executable and Linkable Format) binaries—specifically focusing on their structure, behavior, and ways to manipulate them. If the subject matter is of any interest to you, then check this out.

What are ELF files?

They are a "common standard file format for executable files, object code, shared libraries, and core dumps", at least according to Wikipedia.
The magic number for ELF files is 0x7F 0x45 0x4C 0x46. Care to guess what 45, 4C and 46 are in ASCII?

Let's look at (yet another) terribly simple C program and then look under the covers:



Not terribly impressive, but we're not trying to be terrible or to impress anyone.

Remember when we talked a bit about the 4-step compilation process here? Instead of directly compiling this program, let's jump to the Assembly phase and look at the object file:


So, as we can see, we have a 64-bit ELF object file... relocatable. What does it mean for this to be relocatable?
It means that its code and data haven't been assigned final memory addresses yet, so the linker is free to place them wherever it needs to without breaking anything. Our code isn't an executable yet. We're still short of that objective since we've not yet passed through the linking phase, which combines it with any other object files or libraries it needs and only then produces our executable.

Remember that, for the most part, programmers skip and don't even think about these steps. The compilation process takes care of all of this in the background, and only if something is untoward will the programmer be warned that one of these 4 steps went awry.
And notice as well that this file is 'not stripped'. What's this, you ask? It's informing us that the file still contains its symbol table, along with any debugging information the compiler was asked to emit (with -g, for example). This is useful for debugging purposes, making it easier to analyze and understand what's happening with tools like gdb or objdump. On the other hand, stripping our binary means that both the symbol table and the debugging info will be removed. The symbol table contains the names of functions, files, variables and other metadata useful for debugging or reverse engineering. And the extra debugging info records variable types and line numbers, for example.
So, stripping will reduce the file size and hide implementation details, but also make debugging a bit harder.

Under the hood

Let's do it. It's pretty simple, actually. After finally creating our binary, we can strip it with:
strip --strip-all simple_adder

Here we can see the difference between the two files, through the use of the command readelf:



It's a bit rough around the edges, but if you look carefully, you can see two files, one stripped and one not stripped, and the difference is telling, even for such a small binary.
Obviously, stripping is also used as a countermeasure and obfuscation technique.

The readelf command that you see up there is a tool for analyzing and displaying information about ELF files. With it, we can inspect the internal structure of ELF files, such as executables, shared libraries, or object files. Yes, it's what that ELF lady is doing at the beginning of this blog post. I know. Genius.

Readelf comes in handy to debug linking issues or to understand how an executable or a library is laid out.
As per usual, man pages are your friend here.


.text, .data, .bss and .rodata

ELF files have key sections, like .text, which contains executable instructions; .data, which stores initialized global and static variables; .bss, which reserves space for uninitialized (zero-filled) global and static variables; and .rodata, which holds read-only data, such as constant strings.
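If you're curious how a tool finds those sections in the first place, here's a rough Python sketch that walks the section header table and prints the section names. It's only an illustration under some loud assumptions: it handles 64-bit little-endian ELF files only, it ignores the extended numbering used by files with huge section counts, and section_names is a name made up for this post. For real work, readelf -S does this properly.

```python
import struct

# Layouts of the ELF64 file header and section header (little-endian).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
ELF64_SECTION = struct.Struct("<IIQQQQIIQQ")


def section_names(data: bytes) -> list[str]:
    """Return the section names of a 64-bit little-endian ELF image.

    Hypothetical helper: assumes a well-formed file and does no
    handling of extended section numbering.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")

    header = ELF64_HEADER.unpack_from(data, 0)
    e_shoff = header[6]       # file offset of the section header table
    e_shentsize = header[11]  # size of one section header entry
    e_shnum = header[12]      # number of section headers
    e_shstrndx = header[13]   # index of the section-name string table

    sections = [
        ELF64_SECTION.unpack_from(data, e_shoff + i * e_shentsize)
        for i in range(e_shnum)
    ]
    if not sections:
        return []

    # Fields 4 and 5 of a section header are sh_offset and sh_size.
    strtab_offset, strtab_size = sections[e_shstrndx][4], sections[e_shstrndx][5]
    strtab = data[strtab_offset:strtab_offset + strtab_size]

    names = []
    for section in sections:
        start = section[0]                # sh_name: offset into the string table
        end = strtab.index(b"\0", start)  # names are NUL-terminated
        names.append(strtab[start:end].decode("ascii"))
    return names
```

Feeding it the bytes of an object file (read with open(path, "rb")) should give back a list that includes names like .text, .data and .bss, with an empty string for the mandatory null section at index 0.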

We can inspect the .text section, which could be considered as the heart of the program (holding the executable code, really), with a tool like radare2 or objdump, for example:




As you can see, this is giving us Assembly code (in AT&T, no less... yuck) revealing function prologues, loops, and system calls.
Of course, recognizing these patterns is a vital skill for the Reverse Engineer.

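Try this for yourself. Also remember to check .data and .bss with:
readelf -x .data your_file
readelf -x .bss your_file
(Don't be surprised if readelf reports that .bss has no data to dump: that section only reserves space at load time and takes up no room in the file itself.)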

But there are more ways to inspect your ELF files. Let's look at ldd, which prints shared object dependencies, and use readelf with '-r' to check relocation tables:



What's all of this, you say?
With ldd we can check the shared libraries our binary depends on; each listed library is a dependency that has to be loaded at runtime. And relocation tables are essential for adjusting addresses in a dynamically linked binary. Entries like R_X86_64_JUMP_SLOT or R_X86_64_GLOB_DAT help resolve function calls or global variables at runtime.

There's more, of course. But this stuff is much more fun to experiment with than just to talk or write about.

Go at it. Experiment with creating your own binary files and examine them with any or all of these tools (or others). Change things, check again.


No elves were harmed in the making of this blogpost. Nor any DWARF, of course.










Monday, November 18, 2024

Wherein We Do Some Magic!: File Headers

 

All the world's a stage, and all the men and women merely players


Today, we'll be talking about File Headers, also known as Magic Numbers.


These are specific sequences of bytes at the beginning of files that identify the type and format (e.g., PNG: 89 50 4E 47). They facilitate programming by allowing quick identification of the type of file being used, precluding the need to search within the file for specific functions or structures.


GZIP: 1F 8B 08


As you can see here, the GZIP signature is right at the beginning of the file. Throughout this blog post, I'll be using tools like xxd (which we've seen before) to actually check these headers.



In Reverse Engineering, these file signatures allow us to quickly identify files, detect tampering with them, and determine the appropriate tools or parsers to use.

Remember that file signatures can be modified to disguise file types or to bypass detection. Malware often uses such tactics, obfuscating payloads to evade analysis.

It's a good idea to practice identifying these numbers. If, for some reason, expected signatures aren't detected, it might be a good idea to whip out a hex dumper like xxd or use tools like file or binwalk to analyze headers. The latter two rely on databases of known file signatures to identify file types and structures quickly.

- The file command, in particular, relies on a magic database (commonly /usr/share/misc/magic), which contains predefined patterns for file headers.

- binwalk goes beyond headers to scan the entire binary for embedded file types or compressed data. It also uses signature databases but is more specialized for firmware analysis, detecting compressed archives, or images embedded in binaries.
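To make the idea concrete, here's a rough Python sketch of what a signature lookup can look like. The table only holds the signatures that show up in this post, and identify is a name made up for illustration; the real file command uses a far richer database and matching rules, so treat this as a toy.

```python
# Toy signature table: leading bytes -> format name.
# Only the formats mentioned in this post; not a real magic database.
SIGNATURES = {
    bytes.fromhex("89504E47"): "PNG",
    bytes.fromhex("1F8B08"): "GZIP",
    # The fourth JPEG byte varies (E0 for JFIF, E1 for Exif, ...),
    # so only the first three bytes are matched here.
    bytes.fromhex("FFD8FF"): "JPEG",
    bytes.fromhex("7F454C46"): "ELF",
}


def identify(data: bytes) -> str:
    """Return the name of the first signature that matches the start of data."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"
```

Reading the first few bytes of a file (open(path, "rb").read(8), say) and passing them to identify is enough for this kind of check, which is exactly why magic numbers are so convenient.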


JPEG: FF D8 FF E0


Speaking of JPEG files, I found an interesting challenge in a CTF: I was presented with a data file which was hard to interpret. This was its header:



If we know nothing about headers, then this is meaningless. 

But if we recognize the JPEG signature, then we can see that the header is there, but reversed in 4-byte chunks (due to endianness). So I wrote a Python script to process the whole file, reversing the byte order, 4 bytes at a time.
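The script itself isn't reproduced here, so what follows is only a minimal sketch of that approach, not the exact code used in the challenge. The function name swap_words is made up, and leaving a short trailing chunk simply reversed as-is is an assumption of this sketch.

```python
def swap_words(data: bytes, width: int = 4) -> bytes:
    """Reverse the byte order inside each consecutive chunk of `width` bytes.

    A trailing chunk shorter than `width` is reversed on its own
    (an assumption of this sketch).
    """
    out = bytearray()
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        out += chunk[::-1]
    return bytes(out)
```

With that in place, reading the challenge file in binary mode, passing its contents through swap_words and writing the result to a new file is all it takes. Running the function twice on input whose length is a multiple of 4 gives back the original bytes, which is a handy sanity check.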



When that was done, the weird file was shown to be a well-behaved JPEG file (containing a flag). CTFs are fun!

ELF: 7F 45 4C 46


But there's more to headers than just the initial signature. For instance, in this ELF file, if we look beyond the initial bytes, we can see:
  • 02 -> 64-bit (0x01 for 32-bit)
  • 01 -> Little-endian (0x02 for big-endian)
  • 01 -> Current version
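Those three bytes sit right after the magic number, so decoding them by hand is simple. Here's a small Python sketch of that; parse_e_ident and the dictionary keys are names chosen for this post, and anything beyond those first seven bytes is deliberately ignored.

```python
def parse_e_ident(data: bytes) -> dict:
    """Decode the class, data encoding and version bytes of an ELF header."""
    if len(data) < 7 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")

    classes = {1: "32-bit", 2: "64-bit"}
    encodings = {1: "little-endian", 2: "big-endian"}

    return {
        "class": classes.get(data[4], "unknown"),   # byte 4: 01 or 02
        "data": encodings.get(data[5], "unknown"),  # byte 5: 01 or 02
        "version": data[6],                         # byte 6: 01 for current
    }
```

For the header bytes shown above (7F 45 4C 46 02 01 01), this returns a 64-bit, little-endian, version 1 result.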

You can extract this information with tools like readelf (as shown above). For images, tools like exiftool are handy for extracting metadata embedded in files.

There are tables and references available for identifying these headers. Take some time and explore this stuff.
Whether you're debugging a binary, hunting for a flag, or analyzing malware, knowing these magic numbers can make all the difference.

Lift the curtain and have some fun!

PS: do tarballs work as expected? 