
Saturday, July 26, 2025

"INTs Aren't Integers and FLOATs aren't Real"

I was told this is a cat-submarine. Tail = Periscope. I believe it.

Over the past few weeks, I’ve been juggling two realities. On one side work: networks, and the daily exploration of security tasks in a banking environment. On the other, low-level code: free time, NASM, one instruction at a time.

I’ve been reviewing the basics—x86 syntax, memory layout, data definitions, etc etc etc. I'm following a series of videos, from a YouTuber that I appreciate, and although I found some information lacking or imprecise—particularly in the episode devoted to division—it's still a good set of videos and I'd advise anyone to watch them here. Lately I've taken a gander at integers. How they’re stored, manipulated, compared, and what all those flags mean when you're moving bits around and trying to make sense of a program in GDB.

If you’ve spent more than a few hours in GDB (as I have, unreasonably so at times), you’ve probably done a CMP eax, ebx and then wondered what exactly happened to the flags. What’s the deal with CF, ZF, SF, and the rest of the alphabet soup? Why do certain jump instructions follow CMP, and not others?

So, quick refresher for the two family members that like me and still read this blog:

CMP just subtracts the second operand from the first, sets flags, and discards the result. The flags tell you the outcome, and then you pick the jump based on what you want to test.

Instruction	Meaning	Flags checked
JE / JZ	Equal (zero result)	ZF = 1
JNE / JNZ	Not equal	ZF = 0
JL / JNGE	Less than (signed)	SF != OF
JLE / JNG	Less or equal (signed)	ZF = 1 or SF != OF
JB / JC	Below (unsigned)	CF = 1
JA	Above (unsigned)	CF = 0 && ZF = 0

Notice those signed vs unsigned differences? JB isn’t "jump if smaller", it’s "jump if below"—unsigned. If you’re comparing signed ints, you should be using JL, JG, and so on.
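To make the table a bit more tangible, here is a rough Python model of what CMP does to the flags for 32-bit operands. The helper names (cmp_flags, taken) are mine, invented for illustration, and this is only a sketch of the arithmetic, not how the CPU implements it:

```python
MASK32 = 0xFFFFFFFF

def sign_bit(value: int) -> int:
    """Top bit of a 32-bit value (1 means negative when read as signed)."""
    return (value >> 31) & 1

def cmp_flags(a: int, b: int) -> dict:
    """Model `cmp a, b`: compute a - b on 32 bits and keep only the flags."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "ZF": int(result == 0),
        "SF": sign_bit(result),
        "CF": int(a < b),  # unsigned borrow
        # signed overflow: operand signs differ and the result's sign flipped away from a's
        "OF": int(sign_bit(a) != sign_bit(b) and sign_bit(result) != sign_bit(a)),
    }

def taken(jump: str, f: dict) -> bool:
    """Would this conditional jump be taken with these flags?"""
    conditions = {
        "je": f["ZF"] == 1,
        "jne": f["ZF"] == 0,
        "jl": f["SF"] != f["OF"],
        "jle": f["ZF"] == 1 or f["SF"] != f["OF"],
        "jb": f["CF"] == 1,
        "ja": f["CF"] == 0 and f["ZF"] == 0,
    }
    return conditions[jump]

flags = cmp_flags(-1, 1)  # eax = 0xFFFFFFFF, ebx = 1
print(flags)
print("jl taken:", taken("jl", flags))  # signed view: -1 < 1
print("jb taken:", taken("jb", flags))  # unsigned view: 0xFFFFFFFF is not below 1
```

Same bits, same subtraction, two different answers depending on which jump you pick. That's the whole signed vs unsigned story in one comparison.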

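Floats make it even more nuanced. UCOMISS xmm0, xmm1, for example, is how you compare scalar floats. It works with IEEE 754 single-precision values, not integers, and it reports its result through ZF, PF and CF only (OF and SF are cleared). That means you follow it with the unsigned-style jumps (JA, JB, JE), and PF = 1 flags an "unordered" result, which is what you get when at least one operand is a NaN. And yes, it’s aware of signs, NaNs, Infs, and the rest (of floating point hell).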

Anyway, all that to say: this is kinda subtle, or at least it requires some study and care. IMO, totally worth learning. Most would disagree profoundly. I’ve been pushing myself to remember it, slowly but deliberately. You can check out some of the tiny experiments here. It’s not a project, more like a scratchpad that runs on opcodes and (lovely, quality) coffee.

...When in doubt, explain it to me. I'm anathema too.

And you know what's fun? Mathematics! No, really.
I've seen it again and again: people treating floats as if they are basically the real numbers (ℝ). They aren't!

Just take a look under the hood and you'll understand why.

"But OPQAM, I use Python/Java/Whatever. I don’t care about Assembly or floating-point registers!"

And that’s fair, until you try 0.1 + 0.2 and get 0.30000000000000004 or, in single precision, something like 0.30000001192092895508. Then it might matter.

Can you think of a situation where such a small discrepancy could be a problem? I sure can. I can think of several, and some of them imply falling bridges.

Here’s the core issue: ints aren’t integers, and floats aren’t reals. They can’t even cover all the rationals.

An example

The decimal 0.1 becomes the binary: 0.0001100110011001100110011001100110011..., repeating infinitely.
But computers can’t store infinity. They cut off after some number of bits: 23 stored mantissa bits for floats (24 if you count the implicit leading 1), 52 for doubles (53). It’s like trying to store 1/3 in decimal: you can’t write 0.3333... forever. You round. You approximate. So do computers.

So, this means that you cannot represent 0.1 precisely.
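If you'd rather see it than take my word for it, here's a small sketch using only the Python standard library. Decimal(float) converts the stored binary value exactly, and the struct round-trip is just my way of squeezing the value through 4 bytes to mimic a single-precision float:

```python
import struct
from decimal import Decimal

# Decimal(float) shows the exact value the double actually holds.
as_double = Decimal(0.1)

# Pack into 4 bytes and back to see what single precision keeps.
as_single = struct.unpack("<f", struct.pack("<f", 0.1))[0]

print("double:", as_double)
print("single:", Decimal(as_single))
print("0.1 + 0.2 == 0.3 ?", 0.1 + 0.2 == 0.3)
```

Neither printed value is exactly 0.1. They're just the closest neighbours each format can offer.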

The problem isn’t the mathematics — it’s representation.

How do systems and programmers deal with this?

Use decimal representations (decimal.Decimal) when exactness matters (e.g. money).
Use rational types that store fractions exactly (Fraction(1, 10)).
Use symbolic math when precision must be preserved throughout (SymPy, CAS software).
Or use fixed-point arithmetic — store cents instead of euros.
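Three of those four options fit in a few lines of Python (symbolic math needs an external library, so I'm leaving it out). This is a minimal sketch; the price_cents name is just something I made up for the fixed-point case:

```python
from decimal import Decimal
from fractions import Fraction

# Plain binary floating point: the approximation leaks out.
print(0.1 + 0.2)

# Decimal arithmetic: build from strings so no binary rounding sneaks in.
print(Decimal("0.1") + Decimal("0.2"))

# Exact rationals: numerator and denominator are stored as integers.
print(Fraction(1, 10) + Fraction(2, 10))

# Fixed point: keep money in cents and only format it at the very end.
price_cents = 10 + 20
print(f"{price_cents // 100}.{price_cents % 100:02d}")
```

Note the quotes in Decimal("0.1"). Writing Decimal(0.1) would faithfully preserve the float's error instead of avoiding it.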

These are workarounds. The real solution? Know that floats are approximations, not truth.

Accurate depiction of my two family members' faces as they read through this.

Meanwhile, back in the Real World, there was a CTF going on in the team. A colleague of mine created it and invited us all to some friendly competition for the next few weeks. Honestly? I will probably only do a couple CTFs before I turn my attention elsewhere. Having a family + hobbies + a day job does limit one's time. Still, I got to dip, and I managed to solve a challenge involving a simple—but satisfying—privilege escalation. I'm not going to give a lot of details, since it goes against the point of the 'contest' and it's actually requested by the site that we don't do it. But here's the 'trick': Classic PATH hijack. I dropped a fake ls binary into a writable dir (/tmp), positioned it early in $PATH, and executed the vulnerable binary which, instead of ls executed cat. Bang. Root shell. Dump flag. Walk away smiling.
This was, of course, allowed by purposefully using SETUID in the binary. A no-no. But there you have it. It was fun.

And yeah, sure—it’s not the most sophisticated vector ever, but the fact is, simple stuff works. You don't have to be fancy-schmancy to make something give you a 'win'. It is thus in Jiu Jitsu, and in illusionism. And so it is in hacking.

Also worth noting: a few days ago I got into a discussion with a colleague about TPM (Trusted Platform Module). I’ve blogged about the TPM issue here, but here’s the gist of it: I had to disable TPM on an old laptop (a very respectable ThinkPad x260) just to make the thing actually power off correctly. I tried everything—kernel parameters, ACPI tweaks, prayer (not really, but I totally could have!)—but nothing worked. Full shutdown always left the machine 'hot'.

Disabling TPM did the trick.

Why? Long story short: the TPM implementation on that hardware was tailored for Windows, and Linux support is... charitable at best. As for the discussion: that colleague warned me that in certain scenarios, like full power drains, I could end up with an unbootable machine if I lost TPM state. So I did what any sane person would do: I tested it again and again, simulating different scenarios.

Unplugged, drained, every scenario short of desoldering the CMOS battery. Result? Nothing broke. LUKS doesn’t need TPM. At least, not for the way I have this set up. In Linux, with this setting, TPM is optional unless you're deliberately tying encryption keys to it—and even then, tread carefully.

These tests were fun. They reminded me of the joy of breaking things on purpose, and the calm that comes with understanding exactly why something behaves the way it does.

So yeah, life’s been busy. I haven’t had much time to write (sentence never written by any amateur blogger ever). But I’m still here, still learning, and still hacking away.

Next up? I'll probably hit you up with some more ASM stuff as I keep on watching those videos and experimenting with stuff. Maybe some CTFs... who knows? Not me! And I'm right here.

We’ll see.

Sunday, December 22, 2024

Wherein We Look At An ELF: Executable and Linkable Format

So, this is an ELF file, huh?

Elves

For the past few days, I've been delving into ELF (Executable and Linkable Format) binaries—specifically focusing on their structure, behavior, and ways to manipulate them. If the subject matter is of any interest to you, then check this out.

What are ELF files?

They are a "common standard file format for executable files, object code, shared libraries, and core dumps", at least according to Wikipedia.

The magic number for ELF files is 0x7f 45 4C 46. Care to guess what's 45 4C and 46 in ASCII?
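Checking for that magic number yourself takes very little code. A minimal sketch, where is_elf and is_elf_file are names I made up for the occasion:

```python
ELF_MAGIC = bytes([0x7F, 0x45, 0x4C, 0x46])

def is_elf(data: bytes) -> bool:
    """True if the buffer starts with the four ELF magic bytes."""
    return data[:4] == ELF_MAGIC

def is_elf_file(path: str) -> bool:
    """Read only the first four bytes of a file and test them."""
    with open(path, "rb") as handle:
        return is_elf(handle.read(4))

print(is_elf(ELF_MAGIC + b"\x02\x01\x01"))  # the start of a 64-bit ELF header
print(is_elf(b"#!/bin/sh\n"))               # a shell script is not an ELF
```

On a Linux box you could point is_elf_file at something like /bin/ls (assuming it exists on your system) and compare it with any text file.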

Let's look at (yet another) terribly simple C script and then look under the covers:

Not terribly impressive, but we're not trying to be terrible or to impress anyone.

Remember when we talked a bit about the 4-step compilation process here? Instead of directly compiling this program, let's jump to the Assembly phase and look at the object file:

So, as we can see, we have a 64-bit ELF object file... relocatable. What does it mean for this to be relocatable?
It means that it's not dependent on specific memory addresses. So this file can be moved around without breaking its code. Our code isn't yet an executable. We're still short of that objective since we've not yet passed through the linking phase, which will or might add to it other object files or libraries, and then yes, produce our executable.
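The "relocatable" label comes straight out of the ELF header: the 16-bit e_type field sits right after the 16 identification bytes. Here's a hedged sketch that decodes it. The describe_elf function and the hand-built header are my own illustration (a real header is longer), so treat this as a toy:

```python
import struct

ELF_MAGIC = bytes([0x7F, 0x45, 0x4C, 0x46])
E_TYPES = {
    1: "REL (relocatable)",
    2: "EXEC (executable)",
    3: "DYN (shared object or PIE)",
    4: "CORE (core dump)",
}

def describe_elf(header: bytes) -> str:
    """Decode class, byte order and e_type from the start of an ELF header."""
    if header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")
    byte_order = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little endian
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return f"ELF {bits} {E_TYPES.get(e_type, 'unknown type')}"

# Synthetic header: magic, 64-bit, little endian, version 1, padding, e_type = 1
fake_object_header = ELF_MAGIC + bytes([2, 1, 1]) + bytes(9) + struct.pack("<H", 1)
print(describe_elf(fake_object_header))
```

Feed it the first 18 bytes of a real .o file and then of the linked binary, and you should see the type change once the linker has done its job.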

Remember that, for the most part, programmers skip and don't even think about these steps. The compilation process takes care of all of this in the background, and only if something is untoward will the programmer be warned that one of these 4 steps went awry.

And notice as well that this file is 'not stripped'. What's this, you ask? It's informing us that the file still contains its symbol table and whatever debugging information it was compiled with. This keeps information that is useful for debugging purposes, making it easier to analyze and understand what's happening, with tools like gdb or objdump. On the other hand, stripping our binary means that both symbol table and debugging info will be removed. The symbol table contains the names of functions, files, variables and other metadata useful for debugging or reverse engineering. And that extra debugging info shows the variable types and line numbers, for example.

So, stripping will reduce the file size, hide implementation details but also make debugging a bit harder.

Under the hood

Let's do it. It's pretty simple, actually. After finally creating our binary, we can strip it with:

strip --strip-all simple_adder

Here we can see the difference between the two files, through the use of the command readelf:

It's a bit rough around the edges, but if you look carefully, you can see two files, one stripped and one not stripped, and the difference is telling, even for such a small binary.

Obviously, stripping is also used as a countermeasure and obfuscation technique.

The readelf command that you see up there is a tool for analyzing and displaying information about ELF files. With it, we can inspect the internal structure of ELF files, such as executables, shared libraries, or object files. Yes, it's what that ELF lady is doing at the beginning of this blog post. I know. Genius.

Readelf comes in handy to debug linking issues or to understand how an executable or a library is laid out.

As per usual, man files are your friend here.

.text, .data, .bss and .rodata

ELF files have critical areas, like .text which contains executable instructions, .data which stores initialized global and static variables, .bss which reserves space for uninitialized (zero-filled) global and static variables, and .rodata, which holds read-only data, such as constant strings.

We can inspect the .text section, which could be considered as the heart of the program (holding the executable code, really), with a tool like radare2 or objdump, for example:

As you can see, this is giving us Assembly code (in AT&T, no less... yuck) revealing function prologues, loops, and system calls.

Of course, recognizing these patterns is a vital skill for the Reverse Engineer.

Try this for yourself. Also remember to check .data and .bss with:

readelf -x .data your_file

readelf -x .bss your_file

But there are more ways to inspect your ELF files. Let's look at ldd, which prints shared object dependencies, and use readelf with '-r' to check relocation tables:

What's all of this, you say?

With ldd we can check which shared libraries our binary depends on, each listed library representing a dependency. And relocation tables are essential for adjusting addresses in a dynamically linked binary. Entries like R_X86_64_JUMP_SLOT or R_X86_64_GLOB_DAT help resolve function calls or global variables at runtime.

There's more, of course. But this stuff is much more fun to experiment with than just to talk or write about.

Go at it. Experiment with creating your own binary files and examine them with any or all of these tools (or others). Change things, check again.

No elves were harmed in the making of this blogpost. Nor any DWARF, of course.

Saturday, November 16, 2024

Wherein We Crack Yet Another Program And Learn Something In the Process: part three (or something)

So, let's fast-forward through this first part. While it was revealing, it wasn’t all that great. Informative? Sure. Exciting? Nah.

So we can skip the fluff.

There I was, creating yet another C program to crack—asking an LLM (Large Language Model) to be rough with me. I told it to place whatever protections it found amusing, especially ones that might put a damper on my usual GDB shenanigans.

I whipped up a simple C program with some XOR gimmicks and handed it over to the LLM, telling it, “Go nuts. Protect this binary as if your life depends on it.” (I might be paraphrasing here).

The LLM's Attempt at a Challenge

Well, the LLM tried, but it failed pretty hard. Not because I’m some kind of binary-reversing wizard (I’m not), but because its defenses mostly relied on surface-level userspace tricks. These are the kinds of protections that look flashy but crumble under the weight of a determined debugger wielding carefully placed breakpoints.

Let’s cut to the chase: here’s a snippet of the original code it generated:

Breaking the "Protections"

Most of these defenses—fake functions, misleading execution flows, or basic obfuscation (not all seen here)—can be easily defeated with a debugger. When you examine the binary at runtime, these kinds of tricks are more like a speed bump than a roadblock.

GDB was enough by itself to detect the two main weaknesses—key+encrypted password:

And voilà, a quick peek into those memory locations reveals the key and the encrypted password. Nothing we haven’t seen before:

The logic here is straightforward. By reading the ASM, we can tell there’s a xor operation happening, and the key is being repeated (via a modulo 4 operation) to match the encrypted password’s length (10 characters).

Great! From here, undoing the operation is trivial. A simple Python script does the trick:
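A minimal sketch of that kind of script follows. Careful: the key and the password below are placeholders I invented so the example runs on its own; they are not the values pulled out of the binary.

```python
from itertools import cycle

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte with the key, repeating the key as needed (index % len(key))."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"\x13\x37\xc0\xde"      # hypothetical 4-byte key
secret = b"hunter2!!!"         # hypothetical 10-character password

encrypted = xor_with_key(secret, key)     # stands in for the bytes dumped from memory
recovered = xor_with_key(encrypted, key)  # XOR is its own inverse

print(encrypted.hex())
print(recovered.decode())
```

Because XOR undoes itself, the "decryption" routine is the same function as the encryption one. With the real key and the real encrypted bytes from GDB, that single call is all it takes.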

And that’s it. We have the password, the binary is cracked, and we move on.

Lessons Learned

What’s the moral of this part? Don’t store your bloody password and key inside your binary. Ever. Seriously, it’s like leaving your house key under the mat and hoping no one checks.

This reminds me of that guy who stored his password inside his binary while working on a GitHub project with full version control. He was surprised to find others knew the pass, regardless.

What's Next?

I could create more complex C programs where the password lives elsewhere (maybe a server, maybe environment variables), but honestly, that defeats the purpose of this kind of exercise. Plus, it opens up a whole other can of worms I don’t feel like opening just yet.

Instead, we’ll dive into Binary Security: NX, ASLR, RELRO, Stack Canaries, and how these mitigations shape the reverse-engineering landscape.

It’ll be fun (or your money back—promise).

Tuesday, October 22, 2024

Wherein We Share Some Useful GDB Commands

Expectations were like fine pottery. The harder you held them, the more likely they were to crack.

New to GDB, the GNU Debugger, or just looking for a quick reference guide? Then I've got you covered.

Here are some useful commands and tips that will help you navigate and debug your programs efficiently:

GDB Debugger Quick Reference Guide

Essential GDB Commands

Program Control

`break [breakpoint]` - Set a breakpoint

Example: `break main`, `break *0x4004a0`

Tip: Use `break file.c:42` to break at specific source lines

`run [args]` - Start program with optional arguments

`continue (c)` - Continue execution

`next (n)` - Step over function calls

`step (s)` - Step into function calls

`stepi` - Step one assembly instruction

`finish` - Run until current function returns

Inspection

`print [expression]` - Print value

Example: `print x`, `print *ptr`, `print $eax`

`display [expression]` - Auto-print at each stop

`x/[n][f][u] [address]` - Examine memory

n: Number of units to display

f: Format (x=hex, d=decimal, s=string)

u: Unit size (b=byte, h=halfword, w=word, g=giant)

Example: `x/32xb $esp` - Show 32 bytes at stack pointer

`info registers` - Show register values

`bt [full]` - Show backtrace (call stack)

Interface

`layout asm` - Show assembly view

`layout src` - Show source code view

`layout regs` - Show registers view

`layout split` - Split view (source/assembly)

`focus cmd/src/asm/regs` - Switch between views

`refresh` - Refresh screen

Data & Variables

`info locals` - Show local variables

`info args` - Show function arguments

`watch [expression]` - Break on value change

`set variable [name]=[value]` - Modify variable

`whatis [variable]` - Show variable type

Compilation for Debugging

gcc -g -O0 program.c -o program

Key flags:

-g - Include debug symbols
-O0 - Disable optimization
-fno-stack-protector - Disable stack protection
-no-pie - Don't produce a position-independent executable (PIE)
-m32 - Force 32-bit compilation

Advanced Features

Core Dumps

# Enable core dumps
ulimit -c unlimited

# Load core dump
gdb ./program core

ASLR Control

# Disable ASLR for debugging
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
# Or temporarily:
setarch `uname -m` -R ./program

Remote Debugging

# On target machine
gdbserver :2345 ./program

# On host machine
gdb
(gdb) target remote target_ip:2345

Tips for Effective Debugging

Use conditional breakpoints:
```
break main if argc > 1
```

Save common commands in .gdbinit:

set disassembly-flavor intel
set history save on
set print pretty on

Create command aliases:
```
define reg
  info registers
end
```

Use Python scripting for complex debugging:

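python
class MyCommand(gdb.Command):
    def __init__(self):
        super(MyCommand, self).__init__("mycommand", gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        # Runs when you type `mycommand [arg]` at the (gdb) prompt
        gdb.write("mycommand called with: %s\n" % arg)

MyCommand()
end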

I think that these commands will serve you well in your journey with a debugger.

Whether you're stepping through code, inspecting memory, or trying to exploit vulnerabilities, remember to keep experimenting with this stuff! It's all about hands-on practice.
If you have any questions, doubts or ideas to improve this list, just send them my way.

Enjoy!

Sunday, October 20, 2024

Wherein We Study A Buffer Overflow And Ready Our Aim: testing the waters

Initial disclaimer: please check the link below, as it will be necessary when following along the pdf.

Hi, again! Ready for some more low-level code goodness?
Today we'll take a look at some very simple, yet purposefully flawed C programs in order to learn a bit more about buffer overflows, grasping control of the return address and disrupting a program's flow.

In the first example, we'll see that the program will result in a segmentation fault, and understand why that's happening exactly, by looking at the disassembled code under the GDB debugger. We'll then talk a little about registers, the return address, its importance, and how to grab control of it.

In the second example, we'll take a look at the basics, which will let us finally take advantage of the return address through a buffer overflow.
But, without much ado, let's jump to our first example. Remember that these two first programs are directly taken from Smashing The Stack For Fun And Profit, which you can find here (revised edition! - please follow this link to disable defenses. Alternatively check this. If you don't do this, you won't be able to take advantage of these methods).

This program is creating a char array named large_string which can take up to 256 characters, and then it's filling large_string with A's. After that's done, the program is calling a function named function, using large_string as the argument. The problem becomes immediately obvious since, as we can see, inside that function, we're filling a char array entitled buffer with our A's. But our buffer can only take in 16 characters. Ergo, we have our buffer overflow.

The function was created, and its local variable buffer can only hold 16 of our A's. But since we're using strcpy, which never checks the size of the destination, we'll just keep on writing A's until we reach the null character (at the end of large_string).

But let's open our debugger and actually see what's happening here.

So, we're looking at main(). Can you see our loop? We're writing 'A' to the byte that eax points to and advancing the counter at ebp-12. 0x41 (65 in decimal) is the ASCII code for 'A':

0x000011ea <+50>: mov BYTE PTR [eax],0x41

Adding 1 to our counter at ebp-0xc:

0x000011ed <+53>: add DWORD PTR [ebp-0xc],0x1

And comparing that value with 254 (so as to know when to end the loop):

0x000011f1 <+57>: cmp DWORD PTR [ebp-0xc],0xfe

When this is finally done, as I'm sure you can see, we're jumping right into our function, and this is where the fun starts. Let's disassemble it:

Please take a moment to learn what's happening here, and compare it side by side with the C code. But let us move on and actually see what's happening with our memory, as we set a breakpoint in main:

With x/32x $esp, we're checking memory position 0xffffcfa0, to which ESP is pointing, plus the words that follow it: 32 four-byte words in total (GDB's default unit), so 128 bytes. I won't go into much detail. You'll see soon enough what will happen when we step forward and finally reach our function breakpoint:

Looking at the memory addresses, it's obvious what happened: these locations were filled with 0x41414141 or, in plain text, AAAA.

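It's important to note that the A's are being written from lower memory addresses to higher ones. The stack grows downward, so buffer sits at a lower address than the saved EBP and the return address, and strcpy climbs straight up into them. This is crucial because overwriting the return address is what will cause our segmentation fault.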

As we move along, at a certain point, the return address will also be filled with A's, and at that moment we won't have a valid return address any longer. As a result, our program will suffer a segmentation fault.
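Here is a toy model of that overwrite, written in Python so nothing actually crashes. Everything about it is a simplifying assumption on my part: the frame is just buffer, saved EBP and return address back to back (a real compiler may add padding), and the two pointer values are only there for show.

```python
BUFFER_SIZE = 16
EBP_OFFSET = BUFFER_SIZE        # saved EBP sits right after the buffer
RET_OFFSET = BUFFER_SIZE + 4    # return address sits right after saved EBP

def make_frame() -> bytearray:
    """Toy stack frame, lowest address first: buffer | saved EBP | return address."""
    frame = bytearray(BUFFER_SIZE)
    frame += (0xFFFFD0E8).to_bytes(4, "little")  # saved EBP (illustrative value)
    frame += (0x565561E5).to_bytes(4, "little")  # return address (illustrative value)
    return frame

def unchecked_copy(frame: bytearray, src: bytes) -> None:
    """Copy like strcpy would: stop at the NUL byte, never look at BUFFER_SIZE."""
    for offset, byte in enumerate(src):
        if byte == 0 or offset >= len(frame):  # the toy frame ends; real memory would not
            break
        frame[offset] = byte

def return_address(frame: bytearray) -> int:
    return int.from_bytes(frame[RET_OFFSET:RET_OFFSET + 4], "little")

frame = make_frame()
print("before:", hex(return_address(frame)))
unchecked_copy(frame, b"A" * 255 + b"\x00")
print("after: ", hex(return_address(frame)))
```

The copy starts at the lowest address of the buffer and walks upward, so the saved EBP goes first and the return address right after it. Once that reads 0x41414141, the function has nowhere sensible to return to.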

And that's it. We're going nowhere fast. This program has just died on our hands. If we were trying to crack this program, we would have wanted, instead, to take control of the return address stored next to EBP. We'd use that value and point towards some other function we wanted, for example, thus altering the program's flow in our favor.

Yes, we're slowly creeping towards true shellcode. We'll get there eventually, don't worry.

But before we do that, we might as well talk about other interesting tidbits that can prove helpful when using a debugger and watching our shellcode or buffer overflow in action.

I've already shown quite a few pics, so I won't give you another one, but here's the very first function that appears in our "Smashing the Stack" doc. It's pretty simple, but it hopefully shines a light on the function we've been analyzing so far:

void function(int a, int b, int c) {
    char buffer1[5];
    char buffer2[10];
}

void main() {
    function(1,2,3);
}

If we look at the memory locations, as we did before with the other program, we'll see:

0xffffd09c: 0x00000000 0xf7ffcff4 0x0000002c 0x00000000

0xffffd0ac: 0xffffdfc0 0xf7fc7550 0x00000000 0xf7da2a4f

0xffffd0bc: 0xffffd0e8 0x565561e5 0x00000001 0x00000002

0xffffd0cc: 0x00000003 0xffffd110 0xf7fc1688 0xf7fc1b60

0xffffd0dc: 0x00000000 0xffffd100 0xf7fa2ff4 0x00000000

0xffffd0ec: 0xf7da92d5 0x00000000 0x00000070 0xf7ffcff4

0xffffd0fc: 0xf7da92d5 0x00000001 0xffffd1b4 0xffffd1bc

0xffffd10c: 0xffffd120 0xf7fa2ff4 0x565561b6 0x00000001

I have put in bold important addresses and their contents. In order, from left to right, going down...

ESP

Points to the current top of the stack.
At 0xffffd09c (the lowest memory address in this snapshot)
ESP is showing the exact spot in memory where new data are pushed onto the stack

Saved EBP

At 0xffffd0bc
The base pointer of the caller function
This marks the base of the previous stack frame before the current function was called

Return Address

At 0xffffd0c0
This is the address the program will jump back to after the current function completes
Overwriting this location with a malicious address can cause the program to "return" to an arbitrary memory location

Parameters

Right after the return address, at the next higher addresses, are the parameters passed to the function
At 0xffffd0c4 (0x00000001), 0xffffd0c8 (0x00000002) and 0xffffd0cc (0x00000003)

Buffer

On the other side of the saved EBP, at lower addresses, we can see the space that has been assigned to our buffer1 and buffer2 local variables
It's not 5+10 bytes in size. Instead, because of padding and alignment, it will be 8+12 bytes in size, for a total of 20 bytes.
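Working out which word lives at which address is easy to get wrong by hand, so here's a small sketch that does the arithmetic. It parses two lines copied from the dump above; parse_dump and the labels are my own naming, based on the layout just described:

```python
DUMP = """\
0xffffd0bc: 0xffffd0e8 0x565561e5 0x00000001 0x00000002
0xffffd0cc: 0x00000003 0xffffd110 0xf7fc1688 0xf7fc1b60
"""

def parse_dump(text: str) -> dict:
    """Map each 4-byte-aligned address in an x/x style dump to the word stored there."""
    words = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        address_text, values = line.split(":")
        base = int(address_text, 16)
        for index, value in enumerate(values.split()):
            words[base + 4 * index] = int(value, 16)
    return words

words = parse_dump(DUMP)
labels = {
    0xFFFFD0BC: "saved EBP",
    0xFFFFD0C0: "return address",
    0xFFFFD0C4: "parameter a",
    0xFFFFD0C8: "parameter b",
    0xFFFFD0CC: "parameter c",
}
for address, name in labels.items():
    print(f"{address:#010x}  {words[address]:#010x}  {name}")
```

Each column in the dump is 4 bytes further along than the one before it, which is all the base + 4 * index line is saying.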

---

Next up: We'll learn how to control the return address and force the program to do our bidding. This will set the stage for mastering the art of shellcoding!

Thursday, October 10, 2024

Wherein We Crack A Simple Program: level Leviathan

...what in the gibson?

As I was about to publish my second entry in this reverse engineering series with a slightly harder program, I felt somewhat disappointed. I had promised some basic obfuscation and environment variable techniques to make the challenge more interesting, but I wanted to push it further.

While contemplating extra layers of fun and despair, I was reminded of an entertaining debugging experience from the final level of OverTheWire's Leviathan game. The level featured an executable that required a 4-digit numeric parameter which, when correct, granted access to a shell with elevated privileges - specifically, becoming the next level's user and accessing a restricted file.

Now, I promised no walkthroughs for OTW challenges, true! But let me offer this caveat:

While what I'll explain in this blog post is (indeed) a potential solution, it's far from the most straightforward or obvious approach.

If this was your first solution - well... you're my kind of crazy. Give me a call; my borderline-insane friends would love to meet you. No, really.

Fair warning: if you don't want a potential Leviathan solution, stop reading. But my advice? Read on, consider 'my' approach, then devise your own. Remember: the flag isn't the objective. Learning is.

Back to our executable: After solving the level, curiosity drove me to examine it with GDB. I wondered if I could spot the password in the assembly code.

And there it was, in all its hexadecimal glory:

Did you spot it? Here is our baby in all its hexadecimal glory:

0x080491da <+20>: mov DWORD PTR [ebp-0xc],0x1bd3

The flow...

Convert user input to an integer via atoi():

0x08049212 <+76>: call 0x80490a0 <atoi@plt>

Compare it with the stored password:

0x0804921a <+84>: cmp DWORD PTR [ebp-0xc],eax

0x0804921d <+87>: jne 0x804924a <main+132>

If not equal, then we have a bad password, and that's the end of that. But if we get a correct comparison, we escalate privileges:

0x080491f9 <+51>: call 0x8049050 <geteuid@plt>

0x0804922f <+105>: call 0x8049090 <setreuid@plt>

Fun? Yes. But here's where it gets interesting:

When using GDB to bypass the program, even with the correct password, privilege escalation fails. Let's compare that with the direct execution of the program, using the correct password:

...but why?

The Hidden Guardian: setuid and Debugging

This behavior stems from a crucial security feature in Unix-like operating systems.

When a setuid program (one that runs with the privileges of its owner rather than the executing user) is started under a debugger, the kernel ignores the setuid bit, so the program simply runs with the privileges of the user doing the debugging (unless that user is already privileged).

This protection mechanism prevents malicious users from exploiting debuggers to manipulate privileged programs. Even if you can see the password and execute the code, the debugging context itself prevents the privilege escalation from succeeding.
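A program can observe this for itself by comparing its real and effective user IDs: when setuid is in effect they differ, and under a debugger they end up the same. Below is a rough Unix-only sketch of that check. The privilege_report helper is my own invention, and since Linux normally ignores the setuid bit on interpreted scripts, take it as an illustration of the idea (the C equivalent would call getuid() and geteuid()):

```python
import os

def privilege_report() -> dict:
    """Compare the real and effective user IDs of the current process."""
    real = os.getuid()        # who launched the process
    effective = os.geteuid()  # whose privileges it currently runs with
    return {
        "real_uid": real,
        "effective_uid": effective,
        "setuid_in_effect": real != effective,
    }

report = privilege_report()
print(report)
```

Run normally, a setuid binary doing this check would see two different IDs. Started from inside GDB, both would be yours, which is exactly why the shell that pops up there isn't the one you wanted.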

So, it's not just about the code we're writing and the programs we're running, but also about the environment we're in. The operating system itself provides layers of protection that even debuggers can't (easily) circumvent.

I hope you had fun. I learned a ton while playing these games, so I can only highly recommend them.

Or as they used to say in the good old days: two thumbs up. Way up!