Saturday, September 21, 2024

Wherein We Create An Assembly Program: Butting Heads With The OS



When I first started learning Assembly, I wanted to write simple code, but the apparent complexity of the code I’d seen up to that point threw me off—and that might happen to you as well.


Thing is, you've probably seen ASM code that is basically a disassembly of some other program: the assembly-level code that results from partially or fully compiling it. That code carries a lot of overhead from function prologues, epilogues, and the groundwork needed to work smoothly with different libraries.

But you don't need all that to write simple ASM code. In fact you only need this:


That's right:

- you need a data section and a text section (and some detail in between).

So, having found out that that was the case, I started creating simple programs. One of the first ones was this one, a simple "Hello, World!" program (ah, the good old tradition):

See? I told you. Not too complicated.

We have a .data section, a .text section and _start, a label that marks where program execution begins.
 
Yeah, ok... besides that, there's probably a lot here you've never seen before. But remember that this is not a race, it's a marathon (whatever this is). We're going to search, ask and prod for everything we don't understand. And then we're going to get some practice under our belts with these new things.

But let's clear the waters and define some of these terms and words:

db - 'define byte'. This allocates storage for a string of bytes and initializes it with a given value.

0xa - the hexadecimal value 10, which is the ASCII code for a newline character (go check 'line feed' and also look up this ASCII table, *wink, wink, nudge, nudge*)

$ is the current address in the assembler, so $ - msg gives us the difference between that current address and the address where msg starts. We then let len be that difference. Smart, huh? That gives us the length of our msg string, no matter what it contains.
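If you're coming from C, the same length trick can be sketched with sizeof. This is only an analogy, not the NASM source from this post: the names msg, MSG_LEN and msg_length are hypothetical stand-ins, and the string is assumed to be 'Hello, World!' followed by the 0xa byte.

```c
#include <stddef.h>

/* Hypothetical C stand-in for the .data section: the text plus a line feed (0x0a). */
static const char msg[] = "Hello, World!\x0a";

/* sizeof also counts the trailing NUL that C appends, so subtract 1.
   NASM's `$ - msg` has no such terminator to discount. */
#define MSG_LEN (sizeof(msg) - 1)

/* Returns the number of bytes that would be handed to sys_write. */
size_t msg_length(void) {
    return MSG_LEN;
}
```

Just like $ - msg, the length follows the string: change the text and MSG_LEN changes with it, no manual counting needed.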

As for int 0x80, it's a software interrupt that asks the kernel to run a syscall. We've mentioned that before. But most of today's blog post will revolve around syscalls and registers, so we'll get right back to that after we stop babbling and create our first Assembly program.

Remember, we save our ASM script, go back to the command line and enter the following:

nasm -f elf32 -o hello_world.o hello_world.asm
ld -m elf_i386 -o hello_world hello_world.o
./hello_world

And voilà! Our first Assembly 'Hello, World!' program. Newline included, free of charge.

Let's look at our code proper, inside of _start. What are those lines doing? One by one:

mov edx, len -> the sys_write syscall, which will print "Hello, World!" to stdout, expects the length of the data to be in edx. So we move it into that register.
mov ecx, msg -> that same syscall expects the address of msg to be in ecx.
mov ebx, 1 -> 1 is the file descriptor of stdout.
mov eax, 4 -> 4 is the call number for sys_write.
Next, with everything loaded up and ready, we do our interrupt.

And finally we do:
mov eax, 1 -> 1 is the number for the sys_exit syscall.
We do our final interrupt, and the program exits.
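For comparison, here is a rough C sketch of that same sys_write step, going through libc's syscall() wrapper instead of int 0x80. It is not the program from this post: hello_via_syscall is a hypothetical name, and the exit step is left out so the function can return. Note that 4 and 1 are the call numbers on 32-bit x86 only; the SYS_write macro resolves to whatever number the target platform uses.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_write: the call number for this platform */
#include <unistd.h>       /* syscall() */

/* Issue the write syscall directly, mirroring the register setup above:
   call number (eax), file descriptor (ebx), buffer (ecx), length (edx).
   Returns 0 if every byte was written, -1 otherwise. */
int hello_via_syscall(int fd) {
    static const char msg[] = "Hello, World!\n";
    const long len = (long)(sizeof(msg) - 1);   /* drop C's trailing NUL */

    long written = syscall(SYS_write, fd, msg, len);
    return written == len ? 0 : -1;
}
```

Calling hello_via_syscall(1) sends the greeting to stdout, since 1 is the same file descriptor we loaded into ebx.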


Let's go back to our 0x80 interrupts
Why are we doing them within our Assembly code, and why aren't we doing them when we create and compile our C programs, for example?

Well, we do, or rather it happens quietly in the background. These syscalls are hidden in the libraries we link against. The libraries are packaged with more than just the functions we regularly use; they also contain the code that issues these syscalls, which tell the OS to act in particular ways.

The function printf, for example, doesn't directly issue a system call. Instead, it goes through the C runtime, which handles formatting, buffering, and then calls a lower-level function like write() (from the C Standard Library) to send the output to stdout.
There is, in fact, a system call to the kernel, but it's hidden behind several layers of abstraction.
Next time you check the Assembly code of a C program, be on the lookout for stuff like call 0x1030 <printf>. This is an ordinary function call into the C Standard Library, not a syscall; the syscall itself happens further down, inside the library.
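To make those layers a bit more tangible, here is a toy sketch of the idea: format first in user space, then hand the finished bytes to write(). It is nowhere near what a real printf does (no buffering, one fixed format), and tiny_greet is a hypothetical name made up for this illustration.

```c
#include <stdio.h>   /* snprintf */
#include <unistd.h>  /* write */

/* A toy, unbuffered stand-in for printf("Hello, %s!\n", name).
   Step 1: format into a user-space buffer (no kernel involved yet).
   Step 2: hand the finished bytes to write(), which makes the actual syscall.
   Returns the number of bytes written, or -1 on error. */
long tiny_greet(int fd, const char *name) {
    char buf[64];
    int n = snprintf(buf, sizeof buf, "Hello, %s!\n", name);
    if (n < 0 || (size_t)n >= sizeof buf) {
        return -1;   /* formatting failed or the output would not fit */
    }
    return (long)write(fd, buf, (size_t)n);
}
```

Only the last line of the function ever reaches the kernel; everything before it is plain library work.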

Now for something that happened to me a while ago, when I was creating a simple Assembly program to add two predefined numbers. Here’s part of the code—this version works:



Ok, like I said, this worked just fine, added '1' to '2' and returned '3'. Amazing.

Then I started thinking: 'That’s a lot of repetition. I’m constantly refreshing the register values after each interrupt. What if I just skip some of that?'

And so I did. This was one such version:


See? I just deleted register instructions in between interrupts.
Long story short: this version doesn't work.

That surprised me at the time, and got me wondering why the program wasn't working. I could tell that my register values weren't all surviving between syscalls. But why?

After a little research I found out why: the kernel takes control of our program upon a syscall, and when it hands control back, eax no longer holds the call number we put there. It holds the syscall's return value instead. Hence, at the very least eax has to be re-set before each int 0x80, and it's safest to load every register a syscall needs right before making it rather than trusting what was left over.

In fact, when the program is running normally, it’s in 'user mode,' and after a syscall is made, control is given to the Operating System. At that point, we enter 'kernel mode' until the OS finishes and returns control to the user. Here's a bad sketch to view this in action:


Good to know, right? Or at least interesting enough!
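One concrete piece of this can be seen from C as well: a syscall hands its result back in the same place the call number went in (eax, in our 32-bit assembly). The sketch below just returns that result so you can look at it; value_left_after_write is a hypothetical helper, and it uses libc's syscall() wrapper rather than a raw int 0x80.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall() */

/* Returns whatever the kernel hands back from a write syscall.
   In the assembly version this is the value sitting in eax right after
   int 0x80, where the call number used to be. */
long value_left_after_write(int fd, const char *buf, long len) {
    return syscall(SYS_write, fd, buf, len);
}
```

Writing 3 bytes leaves 3 behind, and on 32-bit x86 call number 3 is sys_read. So firing int 0x80 again without reloading eax would ask the kernel for a different syscall than the one we meant.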

So I decided to create a (cheaty) ASM program to detect the values of the registers before and after a syscall. I wanted to actually visualize this change in two moments in time, much like the much maligned print debugging.
Here it is:



If you try to compile this program normally, it won’t work. And yeah, I’m a dirty cheater for this, but explaining all the reasons why is beyond the scope of this post. But here's a hint: see that call printf? That looks a lot like a C thing, but not so much an ASM thing.

I leave it to you to prod as to how I've cheated and what these compilation steps (which make the program work) are actually doing:

nasm -f elf32 -o register_catcher.o register_catcher.asm 
gcc -m32 -o register_catcher register_catcher.o -nostartfiles -no-pie




See? You cannot expect the registers to remain the same after an interrupt.
The OS will use and change them, often in ways you don’t expect.

Care to remake this catcher in pure ASM? Perhaps you'd like to learn more about Assembly? Then check this link. How about a Syscall Table?

The purpose of this blog is not to teach Assembly step by step, but there's a ton of resources out there for you to explore.

I hope you have learned something new. I learned a lot while playing with Assembly and when writing these blog posts.

Enjoy!




