This post is a chapter from my recently published Linux edition of Assembly Arithmetic Algorithms. The story behind why I wrote the program featured in chapter 15 goes back to when I discovered how to cheat at video games. This story is worth sharing to inspire the next generation of gamers to learn computer math and programming just as I did.
Chapter 15: chastecmp
In this chapter, I will show you the source code of a file comparison program. This program is meant to find which bytes are different between two files that are similar but contain a few differences.
I will use text files for my examples in this chapter, but the program actually does a binary file comparison and displays the different bytes in hexadecimal because it is a universally understood shorthand for binary that most C and Assembly programmers are already familiar with.
First, here is the source code of chastecmp, which is the short name for “Chastity’s Comparison tool”. The name also refers to the “cmp” instruction, which appears throughout this program because comparison is essential to it.
FASM chastecmp source
;Linux 32-bit Assembly Source for chastecmp
format ELF executable
main:
;radix will be 16 because this whole program is about hexadecimal
mov dword[radix],16 ; can choose radix for integer input/output!
mov dword[int_width],1
pop eax ;get the number of arguments
dec eax ;subtract 1 because we will ignore the name of the program
pop ebx ;pop program name into a register to delete it from stack
cmp eax,2 ;do we have two arguments to be used as filenames?
jb help
mov dword[offset],0 ;assume the offset is 0,beginning of file
jmp arg_open_file_1
help:
mov eax,help_message
call putstring
jmp main_end
arg_open_file_1:
pop eax
mov [filename1],eax ; save the name of the file we will open to read
call putstring ;print the name of the file we will try opening
mov ecx,0 ;open file in read mode
mov ebx,eax ;move filename for system call
mov eax,5 ;invoke SYS_OPEN (kernel opcode 5)
int 80h ;call the kernel
cmp eax,0
js file_error_display ;end program if the file can't be opened
mov [fd1],eax ; save the file descriptor number for later use
mov eax,file_open
call putstr_and_line
arg_open_file_2:
pop eax
mov [filename2],eax ; save the name of the file we will open to read
call putstring ;print the name of the file we will try opening
mov ecx,0 ;open file in read mode
mov ebx,eax ;move filename for system call
mov eax,5 ;invoke SYS_OPEN (kernel opcode 5)
int 80h ;call the kernel
cmp eax,0
js file_error_display ;end program if the file can't be opened
mov [fd2],eax ; save the file descriptor number for later use
mov eax,file_open
call putstr_and_line
files_compare:
file_1_read_one_byte:
mov edx,1 ;number of bytes to read
mov ecx,buf1 ;address to store the bytes
mov ebx,[fd1] ;move the opened file descriptor into EBX
mov eax,3 ;invoke SYS_READ (kernel opcode 3)
int 80h ;call the kernel
;eax will have the number of bytes read after system call
mov [count1],eax ;we save the number of bytes read for later
cmp eax,0
jnz file_2_read_one_byte ;unless zero bytes were read, proceed to read from next file
mov eax,[filename1]
call putstring
mov eax,end_of_file_string
call putstr_and_line
;Even if we have reached the end of the first file,
;we still proceed to read a byte from the second file
;to see if it also ends at the same address
file_2_read_one_byte:
mov edx,1 ;number of bytes to read
mov ecx,buf2 ;address to store the bytes
mov ebx,[fd2] ;move the opened file descriptor into EBX
mov eax,3 ;invoke SYS_READ (kernel opcode 3)
int 80h ;call the kernel
;eax will have the number of bytes read after system call
mov [count2],eax ;we save the number of bytes read for later
cmp eax,0
jnz check_both_bytes ;unless zero bytes were read, proceed to compare bytes from both files
mov eax,[filename2]
call putstring
mov eax,end_of_file_string
call putstr_and_line
jmp main_end ;we have reached the end of one file and should end the program
check_both_bytes:
;we add the number of bytes read from both files
mov eax,[count1]
add eax,[count2]
cmp eax,2
jnz main_end
compare_bytes:
mov al,[buf1]
mov bl,[buf2]
;compare the two bytes and skip printing them if they are the same
cmp al,bl
jz bytes_are_same
;print the address and the bytes at that address
mov eax,[offset]
mov dword[int_width],8
call putint_and_space
mov dword[int_width],2
mov eax,0
mov al,[buf1]
call putint_and_space
mov al,[buf2]
call putint_and_line
bytes_are_same:
inc dword[offset]
jmp files_compare
file_error_display:
mov eax,file_error
call putstr_and_line
main_end:
;this is the end of the program
;we close the open files and then use the exit call
mov ebx,[fd1] ;file number to close
mov eax,6 ;invoke SYS_CLOSE (kernel opcode 6)
int 80h ;call the kernel
mov ebx,[fd2] ;file number to close
mov eax,6 ;invoke SYS_CLOSE (kernel opcode 6)
int 80h ;call the kernel
mov eax, 1 ; invoke SYS_EXIT (kernel opcode 1)
mov ebx, 0 ; return 0 status on exit - 'No Errors'
int 80h
include 'chastelib32.asm'
;variables for displaying information
help_message db 'chastecmp by Chastity White Rose',0Ah,0Ah
db 9,'chastecmp file1 file2',0Ah,0Ah
db 'Differing bytes are shown in hexadecimal',0Ah
db 'until the EOF has been reached.',0Ah,0
file_open db ' opened',0
file_error db ' error',0
end_of_file_string db ' EOF',0
db 23 dup 0 ;fill with extra space to match 1024 executable size
;variables for managing files
filename1 dd ? ;name of the file to be opened
filename2 dd ? ;name of the file to be opened
fd1 dd ? ;file descriptor 1
fd2 dd ? ;file descriptor 2
buf1 db ? ;store byte from file 1 here
buf2 db ? ;store byte from file 2 here
count1 dd ?
count2 dd ?
offset dd ?
How to use chastecmp
Using the chastecmp program requires two filenames to be passed as command-line arguments. Although you can use any files you have, it makes sense to use a simple example with text files because they are so easy to create with the echo command.
Run these commands to create the two files.
echo "chandler is my birth name" > file1.txt
echo "chastity is my trans name" > file2.txt
Now that the files exist, run the program on them:
./main file1.txt file2.txt
If you have created these files and run the chastecmp program on them, you will see this result:
file1.txt opened
file2.txt opened
00000003 6E 73
00000004 64 74
00000005 6C 69
00000006 65 74
00000007 72 79
0000000F 62 74
00000010 69 72
00000011 72 61
00000012 74 6E
00000013 68 73
file1.txt EOF
file2.txt EOF
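Each report line is the byte offset followed by the byte from the first file and the byte from the second file, all in hexadecimal. Offset 3 is the fourth character, where file1.txt has "n" (6E) and file2.txt has "s" (73). Here is a minimal C sketch of that line format. The function name format_diff_line is hypothetical, and the chapter's program prints through the routines in chastelib32.asm rather than snprintf.

```c
#include <assert.h>
#include <stdio.h>

/* Format one report line the way the sample output looks:
   8 hex digits of offset, then the byte from each file as 2 hex digits.
   Returns the number of characters written, not counting the terminator. */
int format_diff_line(char *out, size_t out_size,
                     unsigned long offset,
                     unsigned char b1, unsigned char b2)
{
    assert(out != NULL && out_size >= 15); /* 14 characters plus '\0' */
    return snprintf(out, out_size, "%08lX %02X %02X",
                    offset, (unsigned)b1, (unsigned)b2);
}
```

Calling format_diff_line(line, sizeof line, 3, 'n', 's') with a 16-byte buffer fills it with the first difference line shown above.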
How does chastecmp work?
This program is much simpler than chastack or chastext, but it is close to 180 lines and still has some logic to follow. The first thing it does is check how many command-line arguments were passed to the program. Since the name of the program always counts as 1, we subtract 1 from this number and pop the program name into ebx just to get rid of it. The actual register used doesn’t matter in this case as long as it is not eax, which holds the number of arguments.
The eax register is compared with 2. If this number is below 2, there are not enough arguments to continue, so the program prints its help message and ends. Otherwise, it calls open on each filename in turn, printing the name first. If a file cannot be opened, the program prints error after the filename and ends.
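For readers who think in C, the argument check can be sketched as follows. This is only an illustration under assumptions: get_filenames is a hypothetical name, and C hands you argc and argv directly, whereas the assembly pops the same values off the stack at program entry.

```c
#include <assert.h>
#include <stddef.h>

/* Mirror of the check at the top of main: argc counts the program name,
   so two filenames means argc is at least 3.
   Returns 1 and stores both names on success, 0 if help should be shown. */
int get_filenames(int argc, char **argv,
                  const char **name1, const char **name2)
{
    assert(argv != NULL && name1 != NULL && name2 != NULL);
    if (argc - 1 < 2)
        return 0;          /* same idea as: dec eax / cmp eax,2 / jb help */
    *name1 = argv[1];      /* first pop after the program name */
    *name2 = argv[2];      /* second pop */
    return 1;
}
```

A caller would print the help text when this returns 0 and otherwise try to open both names.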
If both files are opened, it will keep reading 1 byte from each file descriptor and store each in its own buffer of 1 byte. If the two bytes are the same, they will be ignored. However, if they are different, the address and the values of both bytes at that address will be displayed.
The variable “offset” is used to keep track of which address we are at in both files, but it isn’t used to lseek in this program because we are going from beginning to end.
If at any time the read system call returns 0, a message is displayed with the filename and EOF to tell the user that the end of that file has been reached.
In the example I just used, both files are the same length of 26 bytes and will reach the end at the same time.
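The read-and-compare loop described above can be sketched in C like this. It is a rough equivalent, not a translation: compare_streams is a hypothetical name, fgetc returning EOF stands in for the read system call returning 0, and the labels "file 1" and "file 2" are placeholders for the real filenames.

```c
#include <assert.h>
#include <stdio.h>

/* Compare two streams one byte at a time. For each position where the
   bytes differ, print "offset byte1 byte2" in hexadecimal to out.
   Stops as soon as either stream ends.
   Returns the number of differing bytes that were reported. */
long compare_streams(FILE *f1, FILE *f2, FILE *out)
{
    unsigned long offset = 0;
    long differences = 0;

    assert(f1 != NULL && f2 != NULL && out != NULL);
    for (;;) {
        int c1 = fgetc(f1);
        int c2 = fgetc(f2); /* still read file 2 even if file 1 has ended */

        if (c1 == EOF) fputs("file 1 EOF\n", out);
        if (c2 == EOF) fputs("file 2 EOF\n", out);
        if (c1 == EOF || c2 == EOF)
            break;

        if (c1 != c2) {
            fprintf(out, "%08lX %02X %02X\n",
                    offset, (unsigned)c1, (unsigned)c2);
            differences++;
        }
        offset++; /* the counter advances whether or not the bytes matched */
    }
    return differences;
}
```

Run on the two example files, this sketch reports the same ten differing offsets as the output shown earlier.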
But why should I care?
The average person probably does not know why it matters to see the hexadecimal differences between two files. I know it seems silly, especially for small text files as I used in this chapter’s examples. However, I can give two examples of times I have used this information.
The first example is relevant to Chapter 2, where I presented the header file “chaste-elf-32.nasm” which can be included to make a loadable program using the NASM assembler.
I read the specification document for ELF files to describe what the fields were named and what the values meant. However, this information alone was not enough for me to successfully create the custom ELF header. I had to create ELF executable files with FASM because it has this feature built in. By assembling slightly different programs, I was able to compare the binary differences between the executables FASM produced from them. The chastecmp program was extremely helpful to me as I used it hundreds of times in reverse engineering the ELF format.
One of my discoveries was that when the size of a program increased, either by adding more code or adding more data statements, there was a number in the header that also increased. As it turns out, the memory size recorded in the header increased even when only data reservation keywords (such as rb, rw, rd, and rq) were used and the size of the file itself didn’t change.
The specification could tell me a lot, but without the example ELF headers FASM was already creating, I would not have been able to create dynamic headers to match programs written in FASM. I probably spent 12 hours on that project, but at least I can assemble any of my programs with NASM if I make the necessary syntax changes.
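The chapter does not name the header field that grew. Assuming it was the p_memsz member of the ELF32 program header (a guess, not something stated above), the effect can be seen by reading p_filesz and p_memsz side by side: the first counts bytes stored in the file, the second counts bytes the segment occupies once loaded, so reserved data raises only the second. The sketch below uses hypothetical function names and does no validation beyond size checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fetch p_filesz and p_memsz from the first program header of a
   32-bit little-endian ELF image held in memory.
   Offsets follow the ELF32 layout: e_phoff is at byte 28 of the file
   header; p_filesz and p_memsz are at bytes 16 and 20 of a program header.
   Returns 1 on success, 0 if the buffer is too small. */
int elf32_segment_sizes(const unsigned char *image, size_t size,
                        uint32_t *filesz, uint32_t *memsz)
{
    uint32_t phoff;

    assert(image != NULL && filesz != NULL && memsz != NULL);
    if (size < 52)                         /* ELF32 file header: 52 bytes */
        return 0;
    phoff = read_le32(image + 28);
    if (phoff > size || size - phoff < 32) /* program header: 32 bytes */
        return 0;
    *filesz = read_le32(image + phoff + 16);
    *memsz  = read_le32(image + phoff + 20);
    return 1;
}
```

Loading two builds of the same program, one with an extra rb reservation, and comparing the two pairs of values should show p_memsz differing while p_filesz stays the same.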
But perhaps a more fun example, and also the reason I got started with programming, was that I used a file comparison tool to cheat at a Norse mythology game years ago. The game was called Castle of the Winds, and it ran on Windows 3.1, 98, and even XP.
One of the features of that specific game was that it let you save the game at any time. I remember that I had 5 mana points. I saved the first file and then cast the magic arrow spell to spend one point. I then saved a second file and ran the Windows “fc” command to compare the two files in binary mode.
fc /b 1.cwg 2.cwg
It told me the address of the byte that had changed from 5 to 4. I then opened this in a hex editor named XVI32 and changed this byte to different values.
In time, I was able to not only change my mana points but also hit points and experience points to make myself invincible in that game.
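What the hex editor did in that story amounts to seeking to an offset and overwriting one byte. A minimal C sketch, assuming the file is already open for update and using the hypothetical name patch_byte:

```c
#include <assert.h>
#include <stdio.h>

/* Overwrite a single byte at the given offset of an already opened file.
   The stream must be opened for update, for example with "r+b".
   Returns 1 on success, 0 on failure. */
int patch_byte(FILE *f, long offset, unsigned char value)
{
    assert(f != NULL && offset >= 0);
    if (fseek(f, offset, SEEK_SET) != 0)
        return 0;
    if (fputc(value, f) == EOF)
        return 0;
    return fflush(f) == 0;
}
```

The offset would be the address reported by the comparison tool, and it is worth working on a copy of the file in case the change breaks it.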
I didn’t really know much about hexadecimal at this point, but by trial and error, I accidentally started understanding it. It was this experience of cheating in a video game that led me to learn about binary and hexadecimal number systems originally.
I had seen for the first time that an understanding of computer arithmetic could allow me to break the rules and do things in a video game that the developer could not predict or prevent me from doing. In those days, I learned to do the same with many video games and had many fun adventures.
In modern times, developers have gotten smarter and have put measures in place to prevent this form of cheating. Most notably, more games are multiplayer and read data from a server that stores the game data, where no user can hack it.
But you have to understand that back in the 90s, nearly every single-player game that stored its data locally and didn’t connect to the internet could be hacked. I have had people criticize my habit of cheating in single-player games and say that it ruins the experience of the game.
But what they don’t understand is that I didn’t care about the video game I was hacking, because Arithmetic had become my favorite game. My love of math was so great that I learned computer programming and had more fun writing programs in BASIC, C, and Assembly than I did playing video games in the first place.
I can’t hack most modern games with these tricks, but I have found the art of computer programming, which is much more satisfying than any video game I have played in my life.
In summary, the chastecmp program does the same thing that the “fc /b” command from DOS and Windows did. When I switched to Linux as my primary operating system, I wrote my own file comparison tool to always keep the fond memories of my childhood with me.