Dreaming of Dragons: ReverseEngineering

Sunday, June 22, 2025

Playing With Bits: Of Malware Labs, Steganography and Narnia

Where's mah Gibson, punk?

Hi y'all!

As mentioned here, I played around with some steganography. The idea was simple and unfancy: just a .ppm file and some Python.
Why .ppm? Because it’s stupidly simple: uncompressed RGB values, no PNG-style compression, no arcane metadata. Just bytes (although .ppm headers can have comments, so keep that in mind).

The Method
A .ppm header

Looking at the image above, the header is simple to read:

(0x50 0x36) → P6: binary PPM file

(0x0a) → newline

(0x36 0x34 0x30) → 640: image width

(0x20) → space

(0x34 0x32 0x36) → 426: image height

And bam—our header info.
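If you'd rather let Python read that header for you, here is a minimal sketch. The function name parse_ppm_header is made up for illustration and is not taken from the scripts linked in this post. It also reads the maximum color value (usually 255) that a P6 header carries after the height, and it skips '#' comment lines:

```python
def parse_ppm_header(data: bytes):
    """Return (width, height, maxval, pixel_offset) for a binary P6 PPM."""
    if data[:2] != b"P6":
        raise ValueError("not a binary PPM (P6) file")
    tokens = []
    pos = 2
    while len(tokens) < 3:
        # Skip whitespace between header fields.
        while data[pos:pos + 1].isspace():
            pos += 1
        # A '#' starts a comment that runs to the end of the line.
        if data[pos:pos + 1] == b"#":
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while data[pos:pos + 1].isdigit():
            pos += 1
        if start == pos:
            raise ValueError("malformed PPM header")
        tokens.append(int(data[start:pos]))
    width, height, maxval = tokens
    # Exactly one whitespace byte separates maxval from the pixel data.
    return width, height, maxval, pos + 1
```

Everything from pixel_offset onward is raw pixel data, which is the part the steganography cares about.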

After the header comes the pixel data: 3 bytes per pixel (RGB: Red, Green, Blue), defining the color of that precise pixel. Nothing else.

To hide data, I overwrote the least significant bit (LSB) of each byte with one bit of the payload (zero it, then set it to the bit you want), which doesn’t really alter the image in a visible way. This gives you 3 bits per pixel to encode data. Stack those bits together and slice them in blocks of 8, and now you have bytes. Bytes mean ASCII. ASCII means text.

That’s the gist. You can write an entire hidden message (or image, or audio) inside another image by tampering with just the LSBs.
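The gist above can be sketched in a few lines of Python. This is not a copy of the linked scripts, just a minimal illustration under two assumptions: the pixel data has already been separated from the header, and the receiver knows the message length. The names embed and extract are hypothetical:

```python
def embed(pixels: bytes, message: bytes) -> bytes:
    """Hide the message bits in the LSB of each pixel byte (high bit of each char first)."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for this image")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # zero the LSB, then set it to the payload bit
    return bytes(out)


def extract(pixels: bytes, length: int) -> bytes:
    """Read `length` bytes back out of the LSBs, 8 pixel bytes per message byte."""
    result = bytearray()
    for start in range(0, length * 8, 8):
        value = 0
        for b in pixels[start:start + 8]:
            value = (value << 1) | (b & 1)
        result.append(value)
    return bytes(result)
```

Each pixel byte changes by at most 1, which is why the image looks untouched to the eye.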

To make this practical, I wrote a couple of Python scripts. You can find them here. Mess with that as you see fit.

Limitations?

Sure. Quite a few:

.ppm headers with comments will throw things off. But you can easily code around that. Heck, take that as homework if you'd like.
No encryption. Anyone with a hex editor and some free time can sniff it out.
No error detection or correction. More homework.
Every LSB is predictably overwritten—pattern detection is trivial.
Any lossy compression, re-encoding, or encryption of the container image kills the message.

OpSec Level: Meh

You’d probably want to:

Only use one LSB per pixel (maybe just Blue).
Randomize altered pixels with a PRNG.
Encrypt the payload beforehand.
...
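As a rough sketch of the first two ideas combined (Blue channel only, positions scattered by a seeded PRNG), something like this could pick where the bits go. The function name and the seed-as-shared-secret scheme are assumptions for illustration, and Python's random module is not cryptographically secure:

```python
import random


def blue_positions(pixel_count: int, needed: int, seed: int) -> list:
    """Pick `needed` distinct Blue-byte offsets, in an order that depends on the seed."""
    if needed > pixel_count:
        raise ValueError("not enough pixels for that many bits")
    rng = random.Random(seed)  # NOT a secure PRNG; illustration only
    chosen = rng.sample(range(pixel_count), needed)
    # In an RGB triple the Blue byte is the third one: offset 3 * pixel + 2.
    return [3 * p + 2 for p in chosen]
```

Sender and receiver would both call it with the same seed and write or read one payload bit at each returned offset.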

But let’s be real—Chi-squared tests, histograms, stego‑detectors—if someone’s looking for it, they’ll find it.

Still, security through obscurity has its place (despite all the hate). Think authoritarian regimes where strong crypto might be illegal or where you might simply be forced to decrypt everything you own.
Just saying... it can be part of your 'onion'.

Malware Analysis Lab - The Barebones Setup

I’ve said before I’d blog about my lab setup. Never got around to it because I kept tweaking things or getting distracted by something shinier.

Goal here: no fancy stuff, just a focus on working securely with malware.

My current setup:
- a dedicated Mini-PC.
- a managed switch which has a port isolated for the Mini-PC
- a VLAN dedicated to that host
- a Raspberry Pi as Gateway/firewall/logger for that VLAN

A true minimal setup:

- Your laptop

- Two VMs:

Flare VM (Windows + analysis tools)
REMnux (Linux + reverse engineering toolkit)

Essentials:

Keep malware VM networks disabled unless strictly needed
Take frequent snapshots
Log and document everything
Don’t aim for perfect—just safe and functional

That's it. I won't go further into this for now. If I find the need, I'll do it later. What you want is something that will let you experiment with some safe malware samples, learn the basic tools and avoid having your home network invaded by the baddies.

🧙‍♂️ Back to Narnia (OverTheWire)

Yup, I’m back at it. I paused these CTFs months ago—wasn’t getting much out of them. Realized I needed to read more about shellcode and memory handling before diving deeper.

So now I’m back. And yeah, I’m also changing my mind about avoiding walkthroughs. Everyone does them, so I might as well add my spin: fewer spoilers, more insights. You’ll find the step-by-step mess here on GitHub (don’t expect polish).

narnia0

What’s this one about? Buffer overflows and UTF-8 stuffsesses. Simple code:

There's the Gibson <3

Buffer overflow by 4 bytes. The goal is to overwrite the 4 bytes past the buffer with 0xdeadbeef, which means sending the bytes \xef\xbe\xad\xde (little-endian order).

First instinct was to strings, gdb, and objdump my way through. But you don’t need that. Just feed the program 20 garbage bytes (aaaaaaaa...) followed by those 4 crafted bytes.

Small snag: my terminal treats input as UTF-8, so I couldn't simply type a raw \xde byte. I did try, digging through UTF-8/hex tables, but it was a no-go. So I decided to build the payload as a binary file with printf instead:

printf "aaaaaaaaaaaaaaaaaaaa\xef\xbe\xad\xde" > /tmp/key
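For the Python-inclined, the same payload can be put together with struct, which takes care of the little-endian ordering. A minimal sketch (build_payload is a made-up helper name):

```python
import struct


def build_payload(padding_len: int = 20, value: int = 0xDEADBEEF) -> bytes:
    """Garbage bytes to fill the buffer, then the target value in little-endian order."""
    # "<I" packs an unsigned 32-bit integer, least significant byte first.
    return b"a" * padding_len + struct.pack("<I", value)
```

Writing the result of build_payload() to /tmp/key in binary mode ("wb") should produce the same 24 bytes as the printf line.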

This solved the problem, but didn't elevate me to narnia1. A picture can explain more than a thousand words, and so a picture filled to the brim with words might be a wonder. So, here:

See at the top? The redirection is not returning an error message, but I am still user narnia0, and thus haven't really solved the level. But why?

The issue is Linux related, to be fair. When we run the program with our payload redirected into it, it does spawn a shell, but that shell inherits the same stdin, which is already at end-of-file, so it dies instantly. I needed to keep the shell open, so I got the cat out of the bag: chaining a bare cat after the payload keeps stdin open, so the shell session stays alive and we keep our foothold.
The shell is ugly, but we get elevated privileges (you can always load a nicer shell, if you really want to), and now we can search for the password. That's easy enough, so we can leave it out of this explanation.

That’s it for now. These blogposts are mostly memory aids and curiosity igniters. Narnia stuff goes to GitHub as I solve them, and I’ll post here only when there’s something worth rambling about.

Get your tools ready and keep exploring!

Tuesday, December 31, 2024

Wherein We Face A Lindwyrm: don't do LeetCode

The Lindwyrm. Let's face it sooner. Not later.

Before diving into LeetCode, let me tell you this post was initially going to be about my journey completing a series of 30 Assembly CTFs on pwn.college. It still is, in a way, and I highly recommend that site to anyone interested in hacking and low-level programming (check it out, it's totally worth it).

But somewhere along the way, I decided to shift the focus. So, bear with me while we explore LeetCode.

Let’s start with the disclaimers:

I’m happily employed.
My knowledge of the hiring process is limited to my own experiences and what I’ve observed.
Take everything I say here with a grain of salt.

What Are LeetCode Problems?

LeetCode-style problems are algorithmic and data structure challenges, often used in technical interviews to (supposedly) assess problem-solving skills, logic, and coding efficiency.

I’ve tried LeetCode. I even did a fair bit of it back in university while studying Python. And I’m here to tell you: it’s not for me. It's probably not for you, either.

Here are 10 reasons why I’ve left LeetCode by the wayside:

1. I’m not a programmer, nor do I want to be one.

I work in IT and am aiming to become a Cybersecurity professional, specifically a Reverse Engineer/Malware Analyst. In this realm, LeetCode is of very limited relevance, if any. My focus is on low-level code, systems, and security, not on cranking out optimal algorithms for abstract, byte-sized problems.

2. Time is sacred, and LeetCode doesn’t fit my priorities.

Mastering LeetCode takes time—lots of it. As someone who obsesses over how I spend my time, I refuse to pour hours into a skill I find doubtful in utility for my goals. Instead, I could be:

Diving deeper into malware analysis.
Learning more about CPU internals.
Experimenting with Reverse Engineering tools.
Or even doing other wholesome hobbies like: fishing, watching paint dry, or fine-tuning push-ups.

3. It consumes mental bandwidth I’d rather use elsewhere.

Focusing on LeetCode takes up space in my brain that I could dedicate to something more relevant or enjoyable. Cybersecurity is vast, and every moment I spend on coding puzzles is a moment I’m not spending on fun puzzles.

4. There are other ways to demonstrate my skills.

I maintain a public GitHub with projects I’ve built. If someone needs proof of my abilities, I can show my work or create something on demand. Writing scripts or automation tools in a real-world context is more relevant to my career than solving arbitrary LeetCode problems.

5. Secure, readable code > Clever one-liners.

LeetCode often rewards speed and brevity, which can lead to unreadable, messy solutions. Writing secure, stable, maintainable code that other humans can understand is far more valuable in real-world applications.

6. In cybersecurity, CTFs are the way to go.

Capture The Flag challenges (CTFs) are like games. They’re fun, align with the hacker spirit, and teach you practical skills. I’d rather do CTFs all day than grind through LeetCode puzzles.

7. Better ways to assess pressure and skills exist.

In addition to several other assessments, my current company included a 1.5-hour test during the hiring process that challenged me to work under pressure, adapt to new situations, conduct research, and document my process. It was hard, fun, and incredibly insightful. If they had instead asked me to solve 10 LeetCode problems, they wouldn’t have learned anything about my actual skills. No joke—this test was fantastic.

8. I’m not here to compete with kids who have all the time in the world.

If I have two free hours, I’ll use them to play in my malware lab—not grind Python or C snippets on Codewars. I have nothing to prove to anyone, and I’m not interested in chasing someone else’s benchmarks.

Years ago, I participated in a BJJ tournament and won (blue belt +40 years category). At the end of the tournament, all the blue belt winners were allowed to face each other in a 'free-for-all styled' match. I said no. I had nothing to prove. I knew the outcome, too: there was no point in fighting guys who had trained as long as I had but were 20 years younger and weighed 20 kilos more.

Here's what I looked like after winning in my category:

Cute, huh? Third place didn't even get a medal. He got a broken rib and a trip to the hospital.

9. I’m not missing meaningful opportunities.

Sure, some companies prioritize LeetCode skills, but those are likely not the places I want to work at. I’d rather focus on preparing for roles that value my expertise in cybersecurity and low-level systems.

On the flip side, if I do receive a job offer from a company looking to assess my "cyber skills," I’ll be in a much stronger position if I’ve invested my time in honing those skills, rather than spending it on LeetCode.

10. Burnout is real.

Burnout is pervasive in IT. I’ve seen people so drained by their work that all they want is to clock out and forget anything technical. They crave time for hobbies, family, and friends, leaving the techie stuff for when it’s absolutely necessary. Not me. I dive into reverse engineering because I genuinely enjoy it, but I make sure to balance it with other hobbies, family time, and necessary downtime. Chasing someone else’s dream isn’t worth sacrificing my mental health.

Final Thoughts

For those who insist we must suffer through things we hate to secure a “dream job,” I leave you with this quote from Game of Thrones:

Sunday, December 22, 2024

Wherein We Look At An ELF: Executable and Linkable Format

So, this is an ELF file, huh?

Elves

For the past few days, I've been delving into ELF (Executable and Linkable Format) binaries—specifically focusing on their structure, behavior, and ways to manipulate them. If the subject matter is of any interest to you, then check this out.

What are ELF files?

They are a "common standard file format for executable files, object code, shared libraries, and core dumps", at least according to Wikipedia.

The magic number for ELF files is 0x7f 45 4C 46. Care to guess what 45, 4C and 46 are in ASCII?

Let's look at (yet another) terribly simple C script and then look under the covers:

Not terribly impressive, but we're not trying to be terrible or to impress anyone.

Remember when we talked a bit about the 4-step compilation process here? Instead of directly compiling this program, let's jump to the Assembly phase and look at the object file:

So, as we can see, we have a 64-bit ELF object file... relocatable. What does it mean for this to be relocatable?
It means that its code isn't yet tied to final memory addresses, so the linker is free to place it wherever it needs to. Our code isn't an executable yet: it still has to go through the linking phase, which may combine it with other object files or libraries and only then produce our executable.

Remember that, for the most part, programmers skip and don't even think about these steps. The compilation process takes care of all of this in the background, and only if something is untoward will the programmer be warned that one of these 4 steps went awry.

And notice as well that this file is 'not stripped'. What's this, you ask? It's informing us that the file still contains its symbol table (and any debugging information it was compiled with). This is useful for debugging, making it easier to analyze and understand what's happening with tools like gdb or objdump. Stripping our binary, on the other hand, removes both the symbol table and the debugging info. The symbol table contains the names of functions, files, variables and other metadata useful for debugging or reverse engineering. The extra debugging info records things like variable types and line numbers.

So, stripping will reduce the file size, hide implementation details but also make debugging a bit harder.

Under the hood

Let's do it. It's pretty simple, actually. After finally creating our binary, we can strip it with:

strip --strip-all simple_adder

Here we can see the difference between the two files, through the use of the command readelf:

It's a bit rough around the edges, but if you look carefully, you can see two files, one stripped and one not stripped, and the difference is telling, even for such a small binary.

Obviously, stripping is also used as a countermeasure and obfuscation technique.

The readelf command that you see up there is a tool for analyzing and displaying information about ELF files. With it, we can inspect the internal structure of ELF files, such as executables, shared libraries, or object files. Yes, it's what that ELF lady is doing at the beginning of this blog post. I know. Genius.

Readelf comes in handy to debug linking issues or to understand how an executable or a library is laid out.

As per usual, man files are your friend here.

.text, .data, .bss and .rodata

ELF files have a few key sections: .text, which contains executable instructions; .data, which stores initialized global variables; .bss, which holds uninitialized global variables; and .rodata, which holds read-only data, such as constant strings.

We can inspect the .text section, which could be considered as the heart of the program (holding the executable code, really), with a tool like radare2 or objdump, for example:

As you can see, this is giving us Assembly code (in AT&T syntax, no less... yuck) revealing function prologues, loops, and system calls.

Of course, recognizing these patterns is a vital skill for the Reverse Engineer.

Try this for yourself. Also remember to check .data and .bss (the latter takes up no space in the file, so expect readelf to tell you there's nothing to dump) with:

readelf -x .data your_file

readelf -x .bss your_file

But there are more ways to inspect your ELF files. Let's look at ldd, which prints shared object dependencies, and use readelf with '-r' to check relocation tables:

What's all of this, you say?

With ldd we can check the shared libraries our binary depends on, each listed library representing a dependency. And relocation tables are essential for adjusting addresses in a dynamically loaded binary. Entries like R_X86_64_JUMP_SLOT or R_X86_64_GLOB_DAT help resolve function calls or global variables at runtime.

There's more, of course. But this stuff is much more fun to experiment with than just to talk or write about.

Go at it. Experiment with creating your own binary files and examine them with any or all of these tools (or others). Change things, check again.

No elves were harmed in the making of this blogpost. Nor any DWARF, of course.

Sunday, December 1, 2024

Wherein We Pause to Reflect: Simple ASM Review and OS Security Mechanisms

Hey! Corny matrix-styled dojo. Why not?

Let’s kick off today’s blog post by writing a very, very simple ASM program. Since I’m reviewing ASM stuff through pwn.college’s course, why not showcase how simple ASM can be? (In a way... math is simple too, but many would argue otherwise).

Our script will be called one-oh-one.s, and our program will be called pausing—since that’s what we’ll want at the end.

Let's start with a program that starts and ends. Simple enough, right? But for that, we need to pass execution to the OS via a syscall (remember those?). To do this, we can move the value 60 into the rax register. Here's how:

Simple enough! But if we try to assemble and link this, we’ll hit an issue:

Ah, much better. Now, our previously failing program is a success... of sorts. We made sure to run with Intel syntax, which is much cleaner (at least to my eyes). We also made the _start label globally visible so that we can indicate where our program begins.

Now no more errors, and we’re ready to tackle the rest of the program. Easy peasy.

Right! While I was trying to set up a pause in my program, I ran into an issue. The syscall I thought was the right one for pausing was actually wrong. So, since I’m working on Linux with x86_64 architecture, I checked /usr/include/asm/unistd_64.h to see which syscalls I should use.

There, I found the exit syscall (60), which needs to be loaded into rax (which we’re already doing). If we want to set an exit code, we need to load it into the rdi register. Fair enough, we can do that. As for the pause syscall, we need syscall 34. I also discovered the alarm syscall (37), which schedules a SIGALRM that ends the pause syscall. Without this, we'd have to ctrl+c our way out of the program. The alarm syscall takes its one parameter, the number of seconds to wait, in rdi. Easy peasy.

Now that we have the tools, let’s create our pausing program, assemble it, link it, and run it while checking for our exit code:

Don't know about you, but I love this!

Now, let's review other concepts. For instance, in pwn.college, you can learn how to point to a specific memory value or the contents at that memory location.

Let's say there’s a memory position 12345 holding the value 42.

If we do:

mov rdi, 12345

We're making it so that rdi will hold that number 12345.

But if we do:

mov rdi, [12345]

Now rdi will hold the value stored at memory position 12345, which is 42.
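If it helps, here is the same distinction as a toy Python model. It is purely illustrative: the dictionaries only stand in for memory and registers, and nothing here touches real hardware.

```python
memory = {12345: 42}   # toy memory: address -> stored value
registers = {}

# mov rdi, 12345    (immediate: rdi gets the number itself)
registers["rdi"] = 12345
immediate = registers["rdi"]

# mov rdi, [12345]  (dereference: rdi gets what lives at that address)
registers["rdi"] = memory[12345]
dereferenced = registers["rdi"]
```

The square brackets in the assembly play the same role as the memory[...] lookup here.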

I won’t dive too much into these basics because I’m sure you either already know them or want to experiment with them yourself. Just remember that the OS will have some defenses in place that may make it difficult, if not impossible, to access specific memory locations in your binary.

You can check which defenses have been enabled with a tool called checksec:

Please take your time and check what each of these items does. Here's a quick rundown:

RELRO: A security feature that makes it harder to modify certain parts of a program, like its GOT (Global Offset Table), preventing attacks on function pointers.
STACK CANARY: A protective value placed on the stack to detect buffer overflow attacks before they overwrite important data.
NX: No-Execute; prevents code from running in certain areas of memory, like the stack, to stop exploits that execute malicious code.
PIE: Position Independent Executable; allows a program to run at random memory addresses, making it harder for attackers to predict the location of code.
RPATH: A runtime setting in executables that specifies directories to search for dynamic libraries before default system paths.
RUNPATH: Similar to RPATH, but it is used after LD_LIBRARY_PATH to specify directories for locating shared libraries at runtime.
Symbols: These represent function names, variable names, and other references in the program’s code, helpful for debugging or exploitation.
FORTIFY: A set of compiler protections that enhance the safety of certain library functions, like checking buffer sizes to prevent overflows.
Fortified: The number of functions in the binary that were actually built with the additional checks (from FORTIFY_SOURCE) to improve security against buffer overflows.
Fortifiable: The number of functions that could potentially be fortified with those additional checks.
FILE: The path of the binary that checksec inspected.

Like I said, a quick rundown, but it’s worth spending time learning about these defenses—what they do and how to set them up to adequately inspect a binary. Also, note that this applies to ELF files.

Alrighty, another short one! Hope you had fun.

Monday, November 18, 2024

Wherein We Do Some Magic!: File Headers

All the world's a stage, and all the men and women merely players

Today, we'll be talking about File Headers, also known as Magic Numbers.

These are specific sequences of bytes at the beginning of files that identify the type and format (e.g., PNG: 89 50 4E 47). They facilitate programming by allowing quick identification of the type of file being used, precluding the need to search within the file for specific functions or structures.

GZIP: 1F 8B 08

As you can see here, the GZIP signature is right at the beginning of the file. Throughout this blog post, I'll be using tools like xxd (which we've seen before) to actually check these headers.

In Reverse Engineering, these file signatures allow for quick identification of files, detect tampering with said files and determine the appropriate tools or parsers to use.

Remember that file signatures can be modified to disguise file types or to bypass detection. Malware often uses such tactics, obfuscating payloads to evade analysis.

It's a good idea to practice identifying these numbers. If, for some reason, the expected signatures aren't detected, it might be a good idea to whip out a hex dumper like xxd or use tools like file or binwalk to analyze headers. These commands and tools rely on databases of known file signatures to identify file types and structures quickly.

- The file command, in particular, relies on a magic database (commonly /usr/share/misc/magic), which contains predefined patterns for file headers.

- binwalk goes beyond headers to scan the entire binary for embedded file types or compressed data. It also uses signature databases but is more specialized for firmware analysis, detecting compressed archives, or images embedded in binaries.
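To get a feel for what these tools do at their simplest, here is a tiny signature lookup in Python. It is a sketch with a hand-picked table of four well-known signatures, nowhere near the real magic database, and the names SIGNATURES and identify are made up for this example:

```python
# Leading bytes of a few well-known formats.
SIGNATURES = {
    b"\x89PNG": "PNG",
    b"\x1f\x8b\x08": "GZIP",
    b"\xff\xd8\xff": "JPEG",
    b"\x7fELF": "ELF",
}


def identify(header: bytes) -> str:
    """Return the format whose signature the header starts with, or 'unknown'."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"
```

Feed it the first few bytes of a file (opened in binary mode) and compare its answer with what the file command says.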

JPEG: FF D8 FF E0

Speaking of JPEG files, I found an interesting challenge on a CTF: I was presented with a data file which was hard to interpret. This was its header:

If we know nothing about headers, then this is meaningless.

But if we recognize the JPEG signature, then we can see that the header is there, but reversed in 4-byte chunks (due to endianness). So I wrote a Python script to process the whole file, reversing the byte order, 4 bytes at a time.
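The idea fits in one function. This is a minimal sketch rather than the exact script used for the challenge, and swap_chunks is a hypothetical name:

```python
def swap_chunks(data: bytes, size: int = 4) -> bytes:
    """Reverse the byte order inside every `size`-byte chunk of data."""
    # A trailing chunk shorter than `size` is simply reversed as-is.
    return b"".join(data[i:i + size][::-1] for i in range(0, len(data), size))
```

For example, a header starting with e0 ff d8 ff comes out as ff d8 ff e0, the familiar JPEG signature. Running the function twice gives back the original data when the length is a multiple of 4.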

When that was done, the weird file was shown to be a well-behaved JPEG file (containing a flag). CTFs are fun!

ELF: 7F 45 4C 46

But there's more to headers than just the initial signature. For instance, in this ELF file, if we look beyond the initial bytes, we can see:

02 -> 64-bit (0x01 for 32-bit)
01 -> Little-endian (0x02 for big-endian)
01 -> Current version
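Those three bytes sit right after the magic number, so decoding them by hand is easy. A minimal sketch (parse_elf_ident is a made-up name, and it only looks at the first seven bytes of the file):

```python
def parse_elf_ident(header: bytes) -> dict:
    """Decode class, endianness and version from the start of an ELF file."""
    if len(header) < 7 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data, ei_version = header[4], header[5], header[6]
    return {
        "class": {1: "32-bit", 2: "64-bit"}.get(ei_class, "invalid"),
        "data": {1: "little-endian", 2: "big-endian"}.get(ei_data, "invalid"),
        "version": ei_version,
    }
```

Compare its answer with the first lines of readelf -h on the same file.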

You can extract this information with tools like readelf (as shown above). For images, tools like exiftool are handy for extracting metadata embedded in files.

There are tables and references available for identifying these headers. Take some time and explore this stuff.

Whether you're debugging a binary, hunting for a flag, or analyzing malware, knowing these magic numbers can make all the difference.

Lift the curtain and have some fun!

PS: do tarballs work as expected?

Saturday, November 16, 2024

Wherein We Crack Yet Another Program And Learn Something In the Process: part three (or something)

So, let's fast-forward through this first part. While it was revealing, it wasn’t all that great. Informative? Sure. Exciting? Nah.

So we can skip the fluff.

There I was, creating yet another C program to crack—asking an LLM (Large Language Model) to be rough with me. I told it to place whatever protections it found amusing, especially ones that might put a damper on my usual GDB shenanigans.

I whipped up a simple C program with some XOR gimmicks and handed it over to the LLM, telling it, “Go nuts. Protect this binary as if your life depends on it.” (I might be paraphrasing here).

The LLM's Attempt at a Challenge

Well, the LLM tried, but it failed pretty hard. Not because I’m some kind of binary-reversing wizard (I’m not), but because its defenses mostly relied on surface-level userspace tricks. These are the kinds of protections that look flashy but crumble under the weight of a determined debugger wielding carefully placed breakpoints.

Let’s cut to the chase: here’s a snippet of the original code it generated:

Breaking the "Protections"

Most of these defenses—fake functions, misleading execution flows, or basic obfuscation (not all seen here)—can be easily defeated with a debugger. When you examine the binary at runtime, these kinds of tricks are more like a speed bump than a roadblock.

GDB alone was enough to find the two main weaknesses, the key and the encrypted password:

And voilà, a quick peek into those memory locations reveals the key and the encrypted password. Nothing we haven’t seen before:

The logic here is straightforward. By reading the ASM, we can tell there’s an XOR operation happening, and the key is being repeated (via a modulo 4 operation) to match the encrypted password’s length (10 characters).

Great! From here, undoing the operation is trivial. A simple Python script does the trick:
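A minimal sketch of what such a script might look like. The key and password in the usage note are invented for illustration; the real values are whatever GDB shows in memory:

```python
def xor_repeat(data: bytes, key: bytes) -> bytes:
    """XOR data against the key, repeating the key with a modulo on its length."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
```

XOR undoes itself, so the same function encrypts and decrypts. With a hypothetical 4-byte key b"K3y!", xor_repeat(xor_repeat(b"dragonfire", b"K3y!"), b"K3y!") gives back b"dragonfire".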

And that’s it. We have the password, the binary is cracked, and we move on.

Lessons Learned

What’s the moral of this part? Don’t store your bloody password and key inside your binary. Ever. Seriously, it’s like leaving your house key under the mat and hoping no one checks.

This reminds me of that guy who stored his password inside his binary while working on a GitHub project with full version control. He was surprised to find others knew the pass, regardless.

What's Next?

I could create more complex C programs where the password lives elsewhere (maybe a server, maybe environment variables), but honestly, that defeats the purpose of this kind of exercise. Plus, it opens up a whole other can of worms I don’t feel like opening just yet.

Instead, we’ll dive into Binary Security: NX, ASLR, RELRO, Stack Canaries, and how these mitigations shape the reverse-engineering landscape.

It’ll be fun (or your money back—promise).

Sunday, October 20, 2024

Wherein We Study A Buffer Overflow And Ready Our Aim: testing the waters

Initial disclaimer: please check the link below, as it will be necessary when following along with the PDF.

Hi, again! Ready for some more low-level code goodness?
Today we'll take a look at some very simple, yet purposefully flawed C programs in order to learn a bit more about buffer overflows, seizing control of the return address and disrupting a program's flow.

In the first example, we'll see that the program will result in a segmentation fault, and understand why that's happening exactly, by looking at the disassembled code under the GDB debugger. We'll then talk a little about registers, the return address, its importance, and how to grab control of it.

In the second example, we'll take a look at the basics, which will let us finally take advantage of the return address through a buffer overflow.
But, without further ado, let's jump to our first example. Remember that these first two programs are directly taken from Smashing The Stack For Fun And Profit, which you can find here (revised edition! - please follow this link to disable defenses. Alternatively check this. If you don't do this, you won't be able to take advantage of these methods).

This program is creating a char array named large_string which can take up to 256 characters, and then it's filling large_string with A's. After that's done, the program is calling a function named function, using large_string as the argument. The problem becomes immediately obvious since, as we can see, inside that function, we're filling a char array entitled buffer with our A's. But our buffer can only take in 16 characters. Ergo, we have our buffer overflow.

Our function was created with a local variable, buffer, that can only hold so many of our A's. But since we're using strcpy, which does no bounds checking, we'll just keep on writing A's until we reach the null character (at the end of large_string).

But let's open our debugger and actually see what's happening here.

So, we're looking at main(). Can you see our loop? We're moving 'A' into the byte that eax points to and advancing the counter at ebp-12. 'A' is 0x41 in ASCII (65 in decimal):

0x000011ea <+50>: mov BYTE PTR [eax],0x41

Adding 1 to our counter at ebp-0xc:

0x000011ed <+53>: add DWORD PTR [ebp-0xc],0x1

And comparing that value with 254 (so as to know when to end the loop):

0x000011f1 <+57>: cmp DWORD PTR [ebp-0xc],0xfe

When this is finally done, as I'm sure you can see, we're jumping right into our function, and this is where the fun starts. Let's disassemble it:

Please take a moment to learn what's happening here, and compare it side by side with the C code. But let us move on and actually see what's happening with our memory, as we set a breakpoint in main:

With x/32x $esp, we're dumping 32 words (128 bytes) of memory, starting at 0xffffcfa0, where ESP is pointing, and moving up to higher addresses. I won't go into much detail. You'll see soon enough what will happen when we step forward and finally reach our function breakpoint:

Looking at the memory addresses, it's obvious what happened: these locations were filled with 0x41414141 or, in plain text, AAAA.

It's important to note that the A's are being written from lower memory addresses toward higher ones, starting at the beginning of buffer. This is crucial because the saved EBP and the return address sit just above the buffer, so they end up overwritten too, which will cause our segmentation fault.

As we move along, at a certain point, the return address will also be filled with A's, and at that moment we won't have a valid return address any longer. As a result, our program will suffer a segmentation fault.
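Here is a toy Python model of that overwrite. It is a simplification: the frame is just buffer, saved EBP and return address back to back, with no compiler padding, and the two saved values are placeholders borrowed from the memory dump further down. The helper name simulate_strcpy is made up.

```python
import struct

BUFFER_SIZE = 16


def simulate_strcpy(source: bytes) -> dict:
    """Toy 32-bit frame: [buffer][saved EBP][return address], lowest address first."""
    frame = bytearray(BUFFER_SIZE) + struct.pack("<II", 0xFFFFD0E8, 0x565561E5)
    # A real stack has more memory above the frame; pad so the copy has room to land.
    frame += bytearray(256)
    # strcpy copies upward from the start of buffer until it hits the NUL terminator.
    data = source.split(b"\x00", 1)[0] + b"\x00"
    frame[0:len(data)] = data
    saved_ebp, ret = struct.unpack_from("<II", frame, BUFFER_SIZE)
    return {"saved_ebp": hex(saved_ebp), "return_address": hex(ret)}
```

With 15 A's everything survives. With the 255 A's from large_string, both saved values read back as 0x41414141, which is not an address the program can return to.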

And that's it. We're going nowhere fast. This program has just died on our hands. If we were trying to crack this program, we would have wanted, instead, to take control of the return address stored next to EBP. We'd use that value and point towards some other function we wanted, for example, thus altering the program's flow in our favor.

Yes, we're slowly creeping towards true shellcode. We'll get there eventually, don't worry.

But before we do that, we might as well talk about other interesting tidbits that can prove helpful when using a debugger and watching our shellcode or buffer overflow in action.

I've already shown quite a few pics, so I won't give you another one, but here's the very first function that appears in our "Smashing the Stack" doc. It's pretty simple, but it hopefully shines a light on the function we've been analyzing so far:

void function(int a, int b, int c) {

char buffer1[5];

char buffer2[10];

}

void main() {

function(1,2,3);

}

If we look at the memory locations, as we did before with the other program, we'll see:

0xffffd09c: 0x00000000 0xf7ffcff4 0x0000002c 0x00000000

0xffffd0ac: 0xffffdfc0 0xf7fc7550 0x00000000 0xf7da2a4f

0xffffd0bc: 0xffffd0e8 0x565561e5 0x00000001 0x00000002

0xffffd0cc: 0x00000003 0xffffd110 0xf7fc1688 0xf7fc1b60

0xffffd0dc: 0x00000000 0xffffd100 0xf7fa2ff4 0x00000000

0xffffd0ec: 0xf7da92d5 0x00000000 0x00000070 0xf7ffcff4

0xffffd0fc: 0xf7da92d5 0x00000001 0xffffd1b4 0xffffd1bc

0xffffd10c: 0xffffd120 0xf7fa2ff4 0x565561b6 0x00000001

I have put in bold important addresses and their contents. In order, from left to right, going down...

ESP

Points to the current top of the stack.
At 0xffffd09c (the lowest memory address in this snapshot)
ESP is showing the exact spot in memory where new data are pushed onto the stack

Saved EBP

At 0xffffd0bc
The base pointer of the caller function
This marks the base of the previous stack frame before the current function was called

Return Address

At 0xffffd0c0
This is the address the program will jump back to after the current function completes
Overwriting this location with a malicious address can cause the program to "return" to an arbitrary memory location

Variables

Right after the saved EBP and return address (at higher addresses) come the arguments passed to the function
At 0xffffd0c4 (0x00000001), 0xffffd0c8 (0x00000002) and 0xffffd0cc (0x00000003)

Buffer

At lower addresses, between ESP and the saved EBP, we can see the space that has been assigned to our buffer1 and buffer2 local variables
It's not 5+10 bytes in size. Instead, because of padding and alignment, it will be 8+12 bytes in size, for a total of 20 bytes.
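The arithmetic behind those numbers is just rounding each buffer up to the next multiple of the 4-byte word size. A quick sketch (padded_size is a hypothetical helper, and a modern compiler may well lay things out differently):

```python
def padded_size(size: int, word: int = 4) -> int:
    """Round size up to the next multiple of `word` bytes."""
    return (size + word - 1) // word * word


buffer1 = padded_size(5)    # char buffer1[5]
buffer2 = padded_size(10)   # char buffer2[10]
total = buffer1 + buffer2
```

That gives 8 for buffer1 and 12 for buffer2, 20 bytes in total.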

---

Next up: We'll learn how to control the return address and force the program to do our bidding. This will set the stage for mastering the art of shellcoding!