Sunday, October 19, 2025

Hashing Isn't Encryption — Explained Simply

 

                    I'm sure this image will be a total viewer-magnet, because Math = Sexeh

 

Everyone has heard of encryption, and most understand the basic idea. 
You have a message, apply an algorithm with a key, and you get an encrypted version of that message.

To go back to the original message, you use a key again (in the simplest, symmetric case, the same key used to encrypt it).

In other words, encryption allows you to move back and forth between the clear message and its encrypted form.

This is a one-to-one relation. 

 

                    Left: original message; Right: encrypted message (high-quality visuals)

---

Now, Let's Talk About Hashing, Shall We? 

Seems to be the same idea, really:

Put something in, hash it, get something out. But it behaves very differently.

Imagine: to your left, a world of infinite possible cleartext messages. In the middle, the hashing function. And to your right, the resulting hashes.


Here's the key difference:

The group to the right is smaller than the one of original messages.

Different messages can end up producing the same hash — this is called a collision.

  

In other words, there is information loss during hashing:

Your final group of hashed messages is "poorer" in terms of information than your original group.

 

                                         Left: original message; Right: hashed message (gorgeous)

 

---

A Simple Example

 

Let's clarify this with a toy example.

Imagine a small world where every message is made of 4 different digits, each one between 0 and 3.

Here's our original group of possible "cleartext" messages:

(0123)    (1023)    (2013)    (3012)

(0132)    (1032)    (2031)    (3021)

(0213)    (1203)    (2103)    (3102)

(0231)    (1230)    (2130)    (3120)

(0312)    (1302)    (2301)    (3201)

(0321)    (1320)    (2310)    (3210)

 

 And here's our toy hashing rule:

    Take the first two positions (the leftmost digits).

    Look at the indices (positions, counted from 0) that those digits represent.

    Swap the values at those two positions.

Example: 

(0213)

 First two digits are '0' and '2'.

(0213)

Now we swap the values at positions '0' and '2':

(1203)   

This final number is our hashed value.
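
If you'd rather let a computer do the swapping, here is a minimal Python sketch of the rule above. The name toy_hash is made up for this illustration, and positions are counted from 0, as in the example.

```python
def toy_hash(message: str) -> str:
    """Toy rule: read the first two digits as positions, then swap
    the values stored at those two positions."""
    digits = list(message)
    # Read both positions BEFORE swapping anything.
    i, j = int(digits[0]), int(digits[1])
    digits[i], digits[j] = digits[j], digits[i]
    return "".join(digits)


print(toy_hash("0213"))  # 1203
```

Feed it any of the 24 messages from the table and compare the results with your own pen-and-paper version.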

---

What Happens When We Apply This To All Messages? 

- Some original values won't be found in the hashed group (information loss).

- Many of the original values will map to the same hashed result (collisions).

- Ergo, we can't always reverse the process and retrieve the original.

---

Let's Showcase This With a Particular Number

 

Suppose you pick this final (already hashed) value:

(2301)

There is only one way to get to this value, so we can actually reverse the hash and get back to the original:

(2310) -----> (2301) 

 (check the top image or, even better, try to make the table yourself)

But now take this hashed value:

(3201) 

 

This one can actually be produced by three different original messages:

(0231) ----> (3201) 

(3102) ----> (3201) 

(3210) ----> (3201) 

Which one was the original one? Can we be sure?

 ---

Key Takeaway

 

Hashing means: 
- Information Loss

- Many-to-one relationship

- Reversibility not guaranteed

 

Even in this toy example, some hashes have several origin messages, leading to the same output. In fact, the original group has 24 unique values and the end group has only 12 unique values.
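
You can check those numbers yourself. Here is a rough Python sketch (the names toy_hash and preimages are invented for this illustration) that hashes every message in our small world and groups the originals by the hash they produce:

```python
from collections import defaultdict
from itertools import permutations


def toy_hash(message: str) -> str:
    """Swap the values at the positions named by the first two digits."""
    digits = list(message)
    i, j = int(digits[0]), int(digits[1])
    digits[i], digits[j] = digits[j], digits[i]
    return "".join(digits)


# Every message made of the digits 0-3, each used exactly once.
messages = ["".join(p) for p in permutations("0123")]

# Map each hash to the list of original messages that produce it.
preimages = defaultdict(list)
for message in messages:
    preimages[toy_hash(message)].append(message)

print(len(messages))      # 24
print(len(preimages))     # 12
print(preimages["2301"])  # ['2310']
print(preimages["3201"])  # ['0231', '3102', '3210']
```

A hash with a single entry in its list can be reversed; one with several entries cannot, at least not with certainty.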

And real-life hash functions are much, much more complex than this, to the point where even finding collisions is computationally impractical. But the principle remains.

 

---

 

And remember:

To teach is to lie (a little). 

Hashing (and encryption) can be way more complicated and interesting than this. 

For more information, check this link and this link

Also, this simple algorithm can be found in the fine (and free) "Reverse Engineering for Beginners", by Dennis Yurichev. 

Check that out too.

Have Fun!

Saturday, October 11, 2025

How Computers Think: A Liar's Walk Through x86 Assembly

 

                                                                                    So, yeah. The artist found out about digital art.
 
 

"To teach is to lie."

 

A few days ago I promised to talk a bit about a simple Assembly program.

And talk I shall!

First off, let's just get out of the way what that code is doing:

It takes any number of positive integer values as parameters and adds them.

So, if you run the program with ./<program_name> 40 2 (which we shall in a bit) it will print out the answer:



If this were a Python program, it would be simple as can be. But Python is withholding from you a lot of what is happening behind the scenes. And we want none of that. We want to see stuff as it is happening.
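
Just for comparison, here is roughly what such a Python version might look like. It's a sketch, not a translation of the Assembly code, and add_args is a made-up name:

```python
def add_args(args):
    """Convert each argument from text to an integer and add them up."""
    return sum(int(arg) for arg in args)


# On the command line you would pass sys.argv[1:] here.
# We pass the example arguments directly instead.
print(add_args(["40", "2"]))  # 42
```

Notice how much is hidden inside int() and sum(): turning characters into numbers, looping over the arguments, keeping a running total. Those are exactly the things the Assembly version has to spell out.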

 

First of all, a couple of disclaimers:

- This code isn't as simple as it could be. It's in fact part of a series of lessons I'm going through in order to review and solidify my ASM and debugging knowledge. They are part of a NASM tutorial (ASMtutor), which I highly recommend: it's quite carefully made, free of major errors (which are very common in basic ASM material), and it guides you smoothly towards greater knowledge. You can find it here.
- The code I'm using can be found here.

- To create a compiled binary of this code, in particular of the files functions_v1.asm and 13_atoi.asm, you need to download those two files and run: 

nasm -f elf 13_atoi.asm

ld -m elf_i386 13_atoi.o -o atoi

 This will create an executable named atoi, which you can then run as explained before.

- You should know a bit about binary and hexadecimal. Please take a quick gander at this. Knowing how to quickly translate between them will really help when you inspect the debugger output.
- And speaking of which, I'll be using GDB with GEF and I advise you to do so as well. If you already rock on GDB you probably don't need it. But then again, I highly doubt you'll be reading this if you already have mastered GDB. 

As for GEF, it will make the debugger much more pleasing to the eyes, and your code easier to follow.
- To finally start the debugger (GEF) with our program, we'll run:



- My focus isn't a 'deep dive' into Assembly. I want to give you the very basic and essential tools to be able to go through this code, instruction by instruction and be able to understand how the computer is 'thinking'. In order to do that, I'll give you some basic tools.

- Speaking of which, here is a pdf with the few commands and concepts you'll refer to in order to be able to follow through the code.
- Functions in Assembly are interesting. You can call a function or jump into a function (check the PDF), but you can also just enter a function because you're following the code flow and there is no call and no jump (or there isn't a reason to jump). So, for example, in this case:

You have here a comparison and you are checking if eax is equal to 0, when you do cmp eax, 0. Then you have a jump not zero. If eax is NOT zero, then you jump elsewhere to the function divideLoop. But if eax IS in fact zero, then you just move along that grocery shopping list to the next item on the list. Therefore, you enter the function printLoop and decrease ecx by 1. 

 

On the other hand, if printLoop actually had a dot in front of it (.printLoop), it would be a local label: a function inside of whatever function we are in, not meant to be called from outside of that function. If in doubt, dig a little deeper into Assembly.

- You might also need a quick contextual/theoretical rundown, so here it goes:

Assembly is the lowest level code you can have that is still human-readable. Below it is the world of ones and zeroes, and above it the world of different programming languages and their way to read and write programs.

In assembly, we're working in the so-called User Space, but sometimes we need to give control to the Operating System through system calls. To do that, we use interrupts (Here is a link to a nice Linux system call table. If you use Windows we can't be friends, sorry). You won't really need that table in order to follow along and read the code, but it sure helps. We only use two syscalls in this code: one for writing to the terminal, and another to exit cleanly.

When the syscall happens, the OS takes control of the program. After that, you shouldn't assume that every register still holds what you left in it (at the very least, eax comes back holding the syscall's return value). So if a value matters to you, as in other situations, make sure you save that register onto the stack and get it back after the OS lets go of control and returns it to User Space.

 

There are a couple of bitwise operations - basically XOR ('exclusive or'). To learn more about bitwise operators, check this page. And if you like computer games and are even slightly interested in electronics or logic, then take a gander at this game. It's great fun.

But, again, you don't have to jump into bitwise operations to read this code. Just do a mental substitution. Whenever you see, for example, xor eax, eax, it's basically the same as taking zero and placing it in the register eax or, in other words, doing mov eax, 0.
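
If you want to convince yourself of that substitution, here is a tiny Python sketch. The function name xor_self and the sample values are made up for illustration, and the mask is only there to mimic a 32-bit register:

```python
MASK_32 = 0xFFFFFFFF  # eax is a 32-bit register


def xor_self(value: int) -> int:
    """Mimic 'xor reg, reg': XOR a 32-bit value with itself."""
    return (value ^ value) & MASK_32


# Whatever was in the register before, the result is always zero.
for eax in (0, 1, 42, 0xDEADBEEF, MASK_32):
    assert xor_self(eax) == 0

print(xor_self(0xDEADBEEF))  # 0
```

Every bit is compared with itself, the two are never different, so every bit of the result ends up as 0.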

Learn boolean logic, bitwise operations, and everything else because you want to and because it's fun. :) 

 

What is a register? What is the Stack? You probably heard about those two terms before. Here's a very quick and dirty analogy:

Imagine you're sitting at a desk and I'm telling you to remember things, like a name, then a number, then an address, then another name. Sure, you can do it for a while. Your short term memory will be able to memorize some stuff before it's too much stuff to handle. Those places where you're storing this easy-to-remember, short-term memory stuff are the registers. Registers are temporary containers for fast memory access. Incredibly fast memory access. A bit like short-term variables on steroids.

But what if I start giving you too much stuff to remember? Then you are allowed to write whatever information I tell you on small pieces of paper, as long as you store them properly on a receipt-spike (also known as an order-spike). You've seen them before in restaurants. Something like this:
 

                          Analogies R Us: each strip of paper a memory location, an integer, etc
 
This stack, or this order-spike, uses a LIFO system, meaning Last In First Out. Whatever piece of paper you pushed onto that pile last is the first one you can pop out and examine.

Neat, no? So, you can store x amount of things for very quick access inside those registers, and you can push whatever you can't hold in quick access memory onto that pile, and then you can take it out and again place it in a register if you so wish (actually, you push copies of those values; the originals remain in the registers unless you clear them).
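
Here is the same analogy as a small Python sketch. It's only a model: the dictionary stands in for the registers, the list for the order-spike, and the values are arbitrary.

```python
# A dictionary as our handful of registers, a list as the order-spike.
registers = {"eax": 40, "ebx": 2, "ecx": 0, "edx": 0}
stack = []

# push eax, then push ebx: COPIES go onto the pile.
stack.append(registers["eax"])
stack.append(registers["ebx"])
print(registers["eax"], registers["ebx"])  # 40 2 (originals untouched)

# pop ecx, then pop edx: the last paper pushed comes off first.
registers["ecx"] = stack.pop()
registers["edx"] = stack.pop()
print(registers["ecx"], registers["edx"])  # 2 40
print(stack)  # []
```

The value from ebx went in last and came out first. That's LIFO in action.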

All that I've said so far is to be taken with a pinch of salt. Not only because these last bits are an analogy, but because, like I said in the beginning, to teach is to lie. I have to omit stuff in order to let you make some progress.

Imagine us teaching Math in class and instead of telling kids all over the world that you can't divide by zero, we instead said, there are several instances in mathematics where dividing by zero is perfectly sensible, and then described said instances. Can you imagine the faces of those 7th graders? Nothing would be retained.

By lying to them, they learn a useful rule and an important aspect of mathematics, and when they advance in their mathematical careers, they get to learn of at least one way to actually divide by zero.

It's the same here. I'm skimming a lot of fat in order to draw an understandable picture, and to let you (if you so wish) take your first steps in Assembly, which I truly love.

Got any questions? Stuff you don't understand? Stuff you'd like to see? Shoot them in my general direction. Search on Google, use an LLM ("gulp, LLMs?! But aren't they the devil?". Look, I was here when Wikipedia first came up and academia was having a fit over it. I also survived using AltaVista back in the 90s. You'll do fine with LLMs and live to tell the tale).

As I've shown in one of the pictures above, we'll be using our program to add 40 to 2 and to find out the Answer to the Ultimate Question of Life, the Universe, and Everything.

I won't go through the whole code. It would serve little purpose and, besides, you need to walk the walk.


                                                     Ah, Jayne... you look young.
 

 So, back to business!

 

You need to:

- download the two .asm files;

- compile them into a binary file;

- install GEF (GDB is probably already installed);

- download the PDF with some information;

- Run GDB and check what happens to the Registers and to the Stack as you step into the program.

 

And that's it! We got a party going.

 

Ok, let's pretend anyone is actually reading this and is actually inspecting the program with the debugger. You run the command in the terminal to open the debugger, then when it opens, you write start and press enter.
What might you be seeing at this point in time? Something like this:


                                        Already regretting your very recent life decisions? Don't! This stuff is fascinating.


What a bloody mess! I know. Don't worry. The first time I started looking at this I was pretty confused as well. But let's first try to get some 'unhelpful' information out of the way (ahem, teaching is lying) and keep only the meat of this, the parts we'll be focusing on exclusively. So, clean-up time!

 

                                       Some breathing room, at last!
 

Aha, so what do we have here?

We have 3 different screens:

- Registers

- Stack

- Code

 

Lucky us! Because that's exactly what we want to follow as we're moving along our code. Let's start with the registers. If you've looked at your PDF, you'll see the stars of our show there. 

Register Screen:

esp is the stack pointer and it is always pointing to the top of the stack. It's not holding the value at the top of the stack, which is the value 3, but it's holding the value 0xffffd0f0.

Huh.. interesting. If you look at the menu below, you'll see that that's the exact stack address of its topmost value. Memory position 0xffffd0f0 is the topmost piece of paper (like its identifier) and in it we have written the value 3 (more on why 3 later). Everything else we care about is (apparently) 0 for now, so let's move on.

Stack Screen:

Remember that pile of papers stuck at the order-spike? This is it.
 Here's an example of one of these lines:

0xffffd0f8│+0x0008: 0xffffd2e0  →  0x32003034 ("40"?)

Let me explain it to you. 

0xffffd0f8  memory location in the Stack. This is represented in 4 byte intervals, since we're in a 32 bit program

+0x0008  Memory offset from the top of the stack. This means that we're in a position 8 bytes above the position of the top of the stack (0xffffd0f0). As we add stuff to the stack the memory values go down. This is an important quirk that you would do well to remember. Also note that while this has the stack facing upwards, sometimes you'll see it upside down (which makes the memory values 'growing' down appear to be a bit less strange).

0xffffd2e0  What is stored in that stack position (our piece of paper). In this case it is a pointer to a string. That string is '40'. So, in other words, there is a location in memory that is storing the string '40' and in this stack there is a pointer pointing to the first character of that string, or '4'. Read on, it will be made somewhat clearer. You don't need to know in depth what pointers are, etc., to be able to read this program, but knowing these things de-mystifies them plenty and makes your life easier and more enjoyable. 

(bonus cookie points for you if you can explain why, as you go up in the stack you go from  0xffffd2e3 to 0xffffd2e0, a small 3 byte jump, and then follow with a big big jump from 0xffffd2e0 to 0xffffd2a5 a 59 byte-sized jump)

0x32003034  Well, I see a 40 and the beginning of a 2. :) Let me explain:

As 40 and 2 were used as arguments in our program, they were added to the stack. But they weren't added as integers. In fact, they were added as ASCII characters. 


 We're working with hexadecimal characters and we're looking for the value 40, right?

So, 0x32003034. You'd possibly expect to read from left to right, but we're working in little-endian here, so in fact our number will be presented right to left. And if we look at that table, we would be reading (now reversed) 4, 0, NUL, 2. That NUL is the null byte which marks the end of that string. And I'd bet you my breeches that after that 32 there will be immediately to its left another 00, since these strings are being stored next to each other. If you're paying attention, you'll know I was cheating and you won't want to bet against me as you read the next value in the stack 0x48530032 and see, in black and white that '00 32', which translates to the number 2 (in decimal). You might be tempted to continue translating that line and read 'SH..'. Interesting, no? But I'll leave it to you to find out more.
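
If flipping bytes in your head feels awkward, Python can do it for you. This is just a quick sketch to check the reading above; the variable names are made up, and the two values are the ones from the stack screen.

```python
# The 32-bit value GEF showed next to the pointer to our first argument.
word = 0x32003034

# Little-endian: the lowest byte of the number comes first in memory.
raw = word.to_bytes(4, byteorder="little")
print(raw)  # b'40\x002'

# The next value in the stack, decoded the same way.
next_word = 0x48530032
next_raw = next_word.to_bytes(4, byteorder="little")
print(next_raw)  # b'2\x00SH'
```

Each \x00 is one of those NUL bytes that mark the end of a string.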

But you might be asking, why is the Stack in this initial state? And that is a perfectly valid question. As we load up our program and enter its two parameters 40 and 2, we have automatically added 4 things to the stack, in this order (from bottom to top):

- a pointer to the last parameter (our third argument, or argv[2]);

- a pointer to the first parameter (our second argument, or argv[1]);

- a pointer to the program itself, its full path, in fact (our first argument, or argv[0]);

- the number of arguments  (argc).

As you start reading the program and debugging it, you'll see the stack add more items and remove them. 

Code Screen:

This is where you follow your code. The green arrow and green letters show you where you're at in the code, and the red dot tells you that there's a breakpoint here. If you're into learning more about GDB (please do) then you can try this link as well. This line of code is the next to be executed. And as you press si ('step into') you'll move along the code, Assembly line by Assembly line.

 

So, let's do just that and run si some 4 times, until we get to that compare (cmp). What will that code look like? That Stack, will it change? And the Registers?  Let's see:


 So, what happened? We started by popping the value on the top of the stack and placing it in ecx. So, in that next step you'll see that ecx = 3. Then we do the same and place the next value, a pointer to the program name, in edx, removing it from the top of the stack. Finally, we decrease ecx, basically subtracting 1 from the value within, and we xor edx, edx, which is basically the same as doing mov edx, 0. ecx is now holding the number of values used in our Math and edx is emptied for future use. See how that works? If we push, we take the value in the register and copy it onto the stack, and if we pop, we remove the value from the stack and place it in the designated register.
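
Here is a hedged Python model of those four steps, assuming the instructions are pop ecx, pop edx, dec ecx and xor edx, edx, as described above. The pointer values are the ones seen in the stack screen; the list and dictionary are just stand-ins for the real stack and registers.

```python
# Initial stack, bottom first; the END of the list is the top of the stack.
stack = [
    0xFFFFD2E3,  # pointer to "2"   (argv[2])
    0xFFFFD2E0,  # pointer to "40"  (argv[1])
    0xFFFFD2A5,  # pointer to the program path (argv[0])
    3,           # argc
]
regs = {"ecx": 0, "edx": 0}

regs["ecx"] = stack.pop()                # pop ecx      -> 3 (argc)
regs["edx"] = stack.pop()                # pop edx      -> pointer to program path
regs["ecx"] -= 1                         # dec ecx      -> 2 values left to add
regs["edx"] = regs["edx"] ^ regs["edx"]  # xor edx, edx -> 0

print(regs)                     # {'ecx': 2, 'edx': 0}
print([hex(v) for v in stack])  # ['0xffffd2e3', '0xffffd2e0']
```

What is left on the stack is exactly what the rest of the program needs: the pointers to our two numbers.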

I'm stopping here with the guided code lesson. I'm leaving it up to you. You do have all the tools that you need to do so.

Remember tidbits like: if you call a function, you jump to that function, instead of continuing down the lines of code, but you place the line of code right after the call on top of the stack (the return address is pushed onto the stack). When you hit the instruction ret, on the other hand, you pop whatever value is on top of the stack and go directly to the line that is referenced in that value (a pointer to a memory value). 
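
That call/ret dance can be modelled in a few lines of Python. This is only a sketch: the addresses are hypothetical, and call and ret here are plain Python functions standing in for the real instructions.

```python
stack = []


def call(return_address: int, target: int) -> int:
    """Model 'call': push the address of the next line, jump to target."""
    stack.append(return_address)
    return target  # this becomes the new instruction pointer


def ret() -> int:
    """Model 'ret': pop the top of the stack and jump there."""
    return stack.pop()


# Hypothetical addresses, for illustration only.
eip = call(return_address=0x08049005, target=0x08049100)
print(hex(eip))  # 0x8049100 (we are now inside the function)

eip = ret()
print(hex(eip))  # 0x8049005 (back on the line right after the call)
```

This is also why a function has to leave the stack as it found it before reaching ret: whatever sits on top at that moment is treated as the return address.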

GEF will be kind enough to tell you when a jump instruction isn't fulfilled (which means that instead of jumping into a function you continue reading the code below that jump).

Also pay attention to the name of the function and to how far you are from its start. For example, in this case, I'm inside the inner function .multiplyLoop, inside the function atoi:

 


 

Here's some more stuff you might want to get into:
 

- what is the register eip doing?

-  What's with bl, BYTE PTR [esi+ecx*1]? Can you guess what's happening there?

- Pay attention to where you're going and what's happening when you run a ret instruction.

- Perhaps write down all the commands on a piece of paper and try to 'guess' how the flow should work until the very last operation. Then compare to what you see on GDB. Does it match?

And that's it! You're now ready to start discovering more and more about this wonderful world, or you're thinking that anyone that likes this must be a bit crazy.



Either way, you cannot unsee it now. 
Have fun and hack away!
