Monday, November 18, 2024

Wherein We Do Some Magic!: File Headers

 

All the world's a stage, and all the men and women merely players


Today, we'll be talking about File Headers, also known as Magic Numbers.


These are specific byte sequences at the beginning of a file that identify its type and format (e.g., PNG: 89 50 4E 47). They let a program quickly identify the kind of file it is handling, without having to search through the file's contents for specific structures.
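To make this concrete, here is a minimal Python sketch of signature-based identification. The SIGNATURES table and the identify/identify_file names are my own illustrative assumptions, and the table covers only a handful of formats; real tools know far more.

```python
# Illustrative signature table (an assumption for this sketch, not a complete list).
SIGNATURES = {
    b"\x89PNG": "PNG",        # 89 50 4E 47
    b"\x1f\x8b\x08": "GZIP",  # 1F 8B 08 (the 08 means the deflate method)
    b"\xff\xd8\xff": "JPEG",  # FF D8 FF
    b"\x7fELF": "ELF",        # 7F 45 4C 46
}


def identify(data: bytes) -> str:
    """Return a format name if data starts with a known signature."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first few bytes of a file and identify it."""
    with open(path, "rb") as f:
        return identify(f.read(8))
```

Calling identify_file on a path reads just eight bytes, which is the whole point: the header alone is enough for a first guess.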


GZIP: 1F 8B 08


As you can see here, the GZIP signature is right at the beginning of the file. Throughout this blog post, I'll be using tools like xxd (which we've seen before) to actually check these headers.



In reverse engineering, file signatures help you quickly identify files, detect tampering, and choose the appropriate tools or parsers.

Remember that file signatures can be modified to disguise file types or to bypass detection. Malware often uses such tactics, obfuscating payloads to evade analysis.

It's a good idea to practice identifying these numbers. If, for some reason, the expected signature isn't detected, it might be a good idea to whip out a hex viewer like xxd or use tools like file or binwalk to analyze the headers. The latter two rely on databases of known file signatures to identify file types and structures quickly.

- The file command, in particular, relies on a magic database (commonly /usr/share/misc/magic), which contains predefined patterns for file headers.

- binwalk goes beyond headers to scan the entire binary for embedded file types or compressed data. It also uses signature databases but is more specialized for firmware analysis, detecting compressed archives, or images embedded in binaries.
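The idea of scanning a whole binary for embedded signatures can be sketched in a few lines of Python. This is a naive illustration only, not how binwalk is actually implemented; the scan function and the signature dictionary passed to it are assumptions of this sketch, and short signatures will produce false positives on real data.

```python
def scan(data: bytes, signatures: dict[bytes, str]) -> list[tuple[int, str]]:
    """Return (offset, name) for every occurrence of each signature in data."""
    hits = []
    for magic, name in signatures.items():
        start = 0
        # bytes.find returns -1 when there are no more matches.
        while (pos := data.find(magic, start)) != -1:
            hits.append((pos, name))
            start = pos + 1
    return sorted(hits)


# Example: a fake "firmware" blob with two embedded signatures.
blob = b"\x00" * 16 + b"\x1f\x8b\x08" + b"junk" + b"\xff\xd8\xff\xe0"
found = scan(blob, {b"\x1f\x8b\x08": "GZIP", b"\xff\xd8\xff": "JPEG"})
```

Here found lists the GZIP signature at offset 16 and the JPEG signature at offset 23.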


JPEG: FF D8 FF E0


Speaking of JPEG files, I found an interesting challenge in a CTF: I was presented with a data file that was hard to interpret. This was its header:



If we know nothing about headers, then this is meaningless. 

But if we recognize the JPEG signature, then we can see that the header is there, but reversed in 4-byte chunks (due to endianness). So I wrote a Python script to process the whole file, reversing the byte order, 4 bytes at a time.
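A minimal sketch of that kind of script looks like this (not necessarily the exact one used for the challenge). The function names are illustrative, and reversing a trailing chunk shorter than 4 bytes is an assumption about how a file of odd length should be handled.

```python
def swap_chunks(data: bytes, size: int = 4) -> bytes:
    """Reverse the byte order inside each consecutive size-byte chunk.

    A shorter trailing chunk, if any, is reversed as well (an assumption).
    """
    return b"".join(data[i:i + size][::-1] for i in range(0, len(data), size))


def fix_file(src: str, dst: str) -> None:
    """Read src, undo the per-chunk byte reversal, and write the result to dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(swap_chunks(data))


# A scrambled JPEG header: E0 FF D8 FF becomes FF D8 FF E0.
scrambled = bytes.fromhex("e0ffd8ff464a1000")
restored = swap_chunks(scrambled)
```

With hypothetical file names, fix_file("mystery.dat", "recovered.jpg") would process the whole file. Applying swap_chunks twice gives back the original bytes, which is a handy sanity check.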



When that was done, the weird file was shown to be a well-behaved JPEG file (containing a flag). CTFs are fun!

ELF: 7F 45 4C 46


But there's more to headers than just the initial signature. For instance, in this ELF file, if we look beyond the initial bytes, we can see:
  • 02 -> 64-bit (0x01 for 32-bit)
  • 01 -> Little-endian (0x02 for big-endian)
  • 01 -> Current version
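Those three bytes sit right after the signature, at offsets 4, 5 and 6 of the file. As a rough sketch, they can be decoded in Python like this; the parse_elf_ident function and the two lookup tables are illustrative names, not part of any library.

```python
# Meanings of the bytes that follow the 4-byte ELF signature.
ELF_CLASS = {1: "32-bit", 2: "64-bit"}
ELF_DATA = {1: "little-endian", 2: "big-endian"}


def parse_elf_ident(header: bytes) -> dict:
    """Decode the class, data encoding and version bytes of an ELF header."""
    if len(header) < 7 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {
        "class": ELF_CLASS.get(header[4], "unknown"),  # offset 4
        "data": ELF_DATA.get(header[5], "unknown"),    # offset 5
        "version": header[6],                          # offset 6
    }


# First 16 bytes of a typical 64-bit little-endian ELF file.
sample = bytes.fromhex("7f454c46020101000000000000000000")
info = parse_elf_ident(sample)
```

For the sample bytes, info reports a 64-bit, little-endian file with version 1, matching the breakdown above.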

You can extract this information with tools like readelf (as shown above). For images, tools like exiftool are handy for extracting metadata embedded in files.

There are tables and references available for identifying these headers. Take some time and explore this stuff.
Whether you're debugging a binary, hunting for a flag, or analyzing malware, knowing these magic numbers can make all the difference.

Lift the curtain and have some fun!

PS: do tarballs work as expected? 
