In the unlikely event that you have read the first 7 chapters of this book, I am going to assume you are a pretty hardcore computer user. What I can say for sure is that you are the type of person who reads books or blog posts about technical details. DOS is an operating system that tends to be used only by nerds who love reading text and working efficiently at the command line.
Sad to say, our kind is dying out. At the time of writing I am 38 years old, and there are few people left who remember the old way computers were used. DOS is mostly seen as a dead platform, and it is rarely used except by programmers and hardcore gamers who still run their favorite games in a DOS emulator. I cannot fail to mention, though, that FreeDOS is available as a real DOS system.
But most people know nothing about DOS because the popular operating systems today are Windows, macOS, and Linux.
If you have enjoyed programming in Assembly, I do have some helpful tips on how you can apply most of the same information to get started with Assembly on Linux.
As far as Windows or macOS go, I cannot help you much because I don’t use proprietary operating systems if I have a choice. As far as I know, these operating systems also do not make it as simple to load registers and call an interrupt to print things on the screen, because programs there are expected to go through the system libraries.
Linux, however, works very much like DOS does. If you know how to load the registers correctly and use a system call, you can print strings of text just like in DOS except MUCH faster because you will be running natively instead of in an emulator as in the DOS examples from the rest of this book.
I cannot cover the details of installing a Linux operating system because there are many distributions to choose from. However, I recommend Debian because it has been my main distro for years. The programs I will show you in this chapter have all been tested on my 64 bit Intel PC running Debian 12 (bookworm).
Remember, although DOS was a 16 bit system, modern processors and Linux distros usually support 32 or 64 bit code. I will start with a small 32 bit program for the FASM assembler that prints text using a Linux version of the putstring function. It behaves the same as the DOS version from chapter 2.
main.asm (32 bit)
format ELF executable
entry main
main:
mov eax,main_string
call putstring
mov eax, 1 ; invoke SYS_EXIT (kernel opcode 1)
mov ebx, 0 ; return 0 status on exit - 'No Errors'
int 80h
;A string to test if output works
main_string db 'This program runs in Linux!',0Ah,0
putstring:
push eax
push ebx
push ecx
push edx
mov ebx,eax ; copy eax to ebx. ebx will be used as index to the string
putstring_strlen_start: ; this loop finds the length of the string as part of the putstring function
cmp [ebx],byte 0 ; compare byte at address ebx with 0
jz putstring_strlen_end ; if comparison was zero, jump to loop end because we have found the length
inc ebx
jmp putstring_strlen_start
putstring_strlen_end:
sub ebx,eax ;Subtracting the start address of the string (eax) from the current address (ebx) gives the length of the string.
; Write string using Linux Write system call. Reference for 32 bit x86 syscalls is below.
; https://www.chromium.org/chromium-os/developer-library/reference/linux-constants/syscalls/#x86-32-bit
mov edx,ebx ;number of bytes to write
mov ecx,eax ;pointer/address of string to write
mov ebx,1 ;write to the STDOUT file
mov eax,4 ;invoke SYS_WRITE (kernel opcode 4 on 32 bit systems)
int 80h ;system call to write the message
pop edx
pop ecx
pop ebx
pop eax
ret ; this is the end of the putstring function return to calling location
; This Assembly source file has been formatted for the FASM assembler.
; The following 3 commands assemble, give executable permissions, and run the program
;
; fasm main.asm
; chmod +x main
; ./main
The program above uses only two system calls. One is the call to exit the program. The other is the write call, which does the same job as DOS function 0x40 of interrupt 0x21. The registers are not used in the same order, though: DOS takes the handle in bx, the byte count in cx, and the address in dx, while 32 bit Linux takes the handle in ebx, the address in ecx, and the byte count in edx. These registers (eax, ebx, ecx, edx) are the same registers you already know, except that they are extended to 32 bits. That is why they have an e in their name.
But if you take the time to study it, you will see that it follows the exact same process: it finds the length of the string by looking for the terminating zero, and then it loads the registers so that the operating system knows which function we are calling, which handle we are writing to, how many bytes to write, and where in memory the data to be written is.
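For readers who know some C, the same two steps can be sketched in C. This is only an illustration and not part of the FASM program: the names string_length and putstring_c are made up for this example, and it assumes a Linux system whose C library provides the syscall() wrapper and the SYS_write constant.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* makes unistd.h declare syscall() on glibc */
#endif
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper: count bytes up to the terminating zero,
   the same job as the putstring_strlen loop in the assembly. */
size_t string_length(const char *s)
{
    const char *p = s;         /* p plays the role of ebx */
    while (*p != 0)            /* cmp [ebx],byte 0 / jz ... */
        p++;                   /* inc ebx */
    return (size_t)(p - s);    /* sub ebx,eax */
}

/* Hypothetical C counterpart of putstring.
   Returns the number of bytes written, or -1 on error. */
long putstring_c(const char *s)
{
    size_t length = string_length(s);
    /* Arguments in order: which function (write), which handle
       (1 is stdout), where the data is, how many bytes to write. */
    return syscall(SYS_write, 1, s, length);
}
```

Calling putstring_c("This program runs in Linux!\n") from a C main function should print the same line as the assembly program. The C library picks the correct system call number for the machine it is compiled on, which is why the sketch uses SYS_write instead of a hard-coded number.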
Next I will show you the 64-bit equivalent that works the same way but uses different numbers for the system calls.
main.asm (64 bit)
format ELF64 executable
entry main
main: ; the main function of our assembly program, just as if I were writing C.
mov rax,main_string ; move the address of main_string into rax register
call putstring
mov rax, 60 ; invoke SYS_EXIT (kernel opcode 60 on 64 bit systems)
mov rdi,0 ; return 0 status on exit - 'No Errors'
syscall
;A string to test if output works
main_string db 'This program runs in Linux!',0Ah,0
putstring:
push rax
push rbx
push rcx
push rdx
mov rbx,rax ; copy rax to rbx as well. Now both registers have the address of the main_string
putstring_strlen_start: ; this loop finds the length of the string as part of the putstring function
cmp [rbx],byte 0 ; compare byte at address rbx with 0
jz putstring_strlen_end ; if comparison was zero, jump to loop end because we have found the length
inc rbx
jmp putstring_strlen_start
putstring_strlen_end:
sub rbx,rax ;rbx will now have correct number of bytes
;write string using Linux Write system call
;https://www.chromium.org/chromium-os/developer-library/reference/linux-constants/syscalls/#x86_64-64-bit
mov rdx,rbx ;number of bytes to write
mov rsi,rax ;pointer/address of string to write
mov rdi,1 ;write to the STDOUT file
mov rax,1 ;invoke SYS_WRITE (kernel opcode 1 on 64 bit systems)
syscall ;system call to write the message (note: syscall overwrites rcx and r11, and this function does not restore rsi or rdi)
pop rdx
pop rcx
pop rbx
pop rax
ret ; this is the end of the putstring function return to calling location
; This Assembly source file has been formatted for the FASM assembler.
; The following 3 commands assemble, give executable permissions, and run the program
;
; fasm main.asm
; chmod +x main
; ./main
You may notice that the 64-bit program also uses the syscall instruction rather than interrupt 0x80, and that it passes its arguments in rdi, rsi, and rdx instead of ebx, ecx, and edx. On my machine both programs behave identically because the 64 bit kernel accepts both calling conventions. Still, some executables run in 32 bit mode and others run in 64 bit mode. The two kinds of code are not usually compatible, so the FASM assembler has to be told which format is being assembled: format ELF executable for 32 bit or format ELF64 executable for 64 bit.
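The differences between the 32 bit and 64 bit programs fit in a small lookup table. The C sketch below is only a summary aid: the names syscall_row, syscall_number, and arg_register are invented for this example, and the table lists only the two system calls and the three argument registers that the programs in this chapter actually use.

```c
#include <stddef.h>
#include <string.h>

/* One row per system call used in this chapter. */
struct syscall_row {
    const char *name;
    int number_32;   /* loaded into eax before int 80h */
    int number_64;   /* loaded into rax before syscall */
};

static const struct syscall_row chapter_syscalls[] = {
    { "exit",  1, 60 },
    { "write", 4,  1 },
};

/* Hypothetical lookup: returns the system call number for the given
   mode (32 or 64), or -1 if the name is not in the table. */
int syscall_number(const char *name, int bits)
{
    size_t count = sizeof chapter_syscalls / sizeof chapter_syscalls[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(chapter_syscalls[i].name, name) == 0)
            return bits == 64 ? chapter_syscalls[i].number_64
                              : chapter_syscalls[i].number_32;
    }
    return -1;
}

/* Hypothetical lookup: register that carries argument number `index`
   (0, 1, or 2) of a system call, or NULL for any other index. */
const char *arg_register(int bits, size_t index)
{
    static const char *const regs_32[] = { "ebx", "ecx", "edx" };
    static const char *const regs_64[] = { "rdi", "rsi", "rdx" };
    if (index >= 3)
        return NULL;
    return bits == 64 ? regs_64[index] : regs_32[index];
}
```

For write, argument 0 is the handle, argument 1 is the address of the string, and argument 2 is the number of bytes, so arg_register(64, 1) names the register that holds the string address in the 64 bit program.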
FASM has been my preferred assembler for a long time because unlike NASM, it has everything it needs to create executables without depending on a linker.
“What is a linker?” you might be asking. A linker is the program that combines assembled object files and libraries into a finished executable. You see, the developers of Linux never really expected people to write applications entirely in assembly. Usually applications are written in C, and GCC compiles the C code to assembly in the syntax of the GNU Assembler (informally called GAS), has it assembled, and then links the result with the standard library. The linker program is called “ld”, and GCC runs it automatically.
However, through some research and experimentation, I have converted the previous 64 bit FASM program into GAS syntax. As you read it, remember that the AT&T phone company made this weird alternative syntax. The source and destination are flipped, so you will see the register receiving data on the right side instead of the left. Register names are also prefixed with % and constants with $.
main.s (GNU Assembler 64 bit)
# Using Linux System calls for 64-bit
# Tested with GNU Assembler on Debian 12 (bookworm)
# It uses Chastity's putstring function for output
.global _start
.text
_start:
mov $main_string,%rax # move address of string into rax register
call putstring # call the putstring function Chastity wrote
mov $0x3c,%eax # system call 60 is exit
mov $0x0,%edi # we want to return code 0
syscall # end program with system call
main_string:
.string "This program runs in Linux!\n"
putstring: # the start of the putstring function
push %rax
push %rbx
push %rcx
push %rdx
mov %rax,%rbx
putstring_strlen_start:
cmpb $0x0,(%rbx)
je putstring_strlen_end
inc %rbx
jmp putstring_strlen_start
putstring_strlen_end:
sub %rax,%rbx # subtract rax from rbx for number of bytes to write
mov %rbx,%rdx # copy number of bytes from rbx to rdx
mov %rax,%rsi # address of string to output
mov $0x1,%edi # file handle 1 is stdout
mov $0x1,%rax # system call 1 is write
syscall
pop %rdx
pop %rcx
pop %rbx
pop %rax
ret
# This Assembly source file has been formatted for the GNU assembler.
# The following makefile rule has commands to assemble, link, and run the program
#
#main-gas:
# gcc -nostdlib -nostartfiles -nodefaultlibs -static main.s -o main
# strip main
# ./main
Although I find the GNU Assembler syntax hard to read, this assembler is part of GNU Binutils, which GCC depends on, so it is usually available even on systems that don’t have FASM or NASM installed.
It is also possible to use NASM, but in normal use it produces object files rather than executables and requires linking with “ld” anyway. It is better to just write directly for the GNU Assembler, or stick with FASM if you prefer Intel syntax.
However, the beauty is that the machine code bytes from both types of assembly are identical! In fact, that is how I got the GAS version: I assembled the FASM version and then disassembled it with objdump to get the equivalent syntax.
The programs you saw in this chapter only work on Linux, but Linux is Free both in terms of Software Freedom and in price. Anyone with an internet connection can download the ISO of a new operating system and install it on their computer, as long as they take the time to read the directions from the makers of that distribution. In fact Debian, Arch, Gentoo, and FreeBSD (not Linux but very similar) all have great instruction manuals. If you have managed to read this book, then you will have no problem following their stuff.