Smallest DOS .EXE File

I posted an answer over at StackOverflow about someone wanting to decompile a 16-bit DOS executable. A few days later, someone else (inadvertently) made a comment that .EXE files couldn't be 104 bytes. I knew this couldn't be the case, since a DOS .EXE file header is only 28 bytes in size.

Ya, you guessed it, I had to try.

So I hard coded--word for word--a small .EXE file to print Hello, World! to the screen.

The .EXE is 59 bytes total. This includes the 28-byte header, which the listing pads out to 32 bytes because the header size is given in 16-byte paragraphs. (Assembled with NBASM.)

If we were to just return to DOS, not doing anything, the minimum file size would be 33 bytes, assuming a 'retf' will return to DOS.

However, there are some assumptions here: 1) there is no relocation table, which may or may not be required depending on where you read the specification, and 2) the CRC is zero, which is considered okay in most cases.
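To make those two assumptions concrete, here is a minimal Python sketch that reads the 14 header words back and reports the relocation count, the CRC word, and the file size the header implies. The field names and the sample values are my own (worked out by hand from the listing that follows), so treat them as assumptions rather than anything NBASM produces.

```python
import struct

# The 14 little-endian words of the classic 28-byte MZ header, in file order.
# These field names are illustrative, not taken from any particular spec.
MZ_FIELDS = (
    "signature", "bytes_in_last_page", "pages", "relocations",
    "header_paragraphs", "min_alloc", "max_alloc", "ss", "sp",
    "checksum", "ip", "cs", "reloc_table_offset", "overlay",
)

def parse_mz_header(data):
    """Return the header words as a dict, or raise ValueError."""
    if len(data) < struct.calcsize("<14H"):
        raise ValueError("too short for an MZ header")
    header = dict(zip(MZ_FIELDS, struct.unpack_from("<14H", data)))
    if header["signature"] != 0x5A4D:
        raise ValueError("missing MZ signature")
    return header

def file_size(header):
    """File size implied by the page count and the last-page byte count."""
    size = header["pages"] * 512
    if header["bytes_in_last_page"]:
        size -= 512 - header["bytes_in_last_page"]
    return size

# Hypothetical sample header: no relocations, zero CRC, 59-byte file.
sample = struct.pack("<14H", 0x5A4D, 59, 1, 0, 2, 10, 10, 0, 156, 0, 0, 0, 0, 0)
header = parse_mz_header(sample)
print(header["relocations"], header["checksum"], file_size(header))
```

With a relocation count of zero, nothing ever needs to follow the header except the code itself, which is what the listing relies on.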

Here is the source:

STACK_SIZE  equ  128

.model tiny
.code

outfile 'small.exe'

    ; exe header
    org 00h

    dw   0x5A4D                       ; EXE signature
    dw   (end_code & 511)             ; bytes in last page
    dw   ((end_code + 511) / 512)     ; number of 512-byte pages
    dw   0                            ; number of relocation entries
    dw   ((start + 15) / 16)          ; header size (in paragraphs)
    dw   (((end - start) + 15) / 16)  ; memory required (in paragraphs)
    dw   (((end - start) + 15) / 16)  ; memory requested (in paragraphs)
    dw   0x0000                       ; ss (relocation)
    dw   ((stack - start) + STACK_SIZE) ; sp
    dw   0                            ; crc
    dw   (start - start)              ; initial ip
    dw   0x0000                       ; cs (relocation)
    dw   ?                            ; relocation table offset
    dw   ?                            ; overlay
    dw   ?                            ; overlay man
    
    ; this is where the relocation table would start.
    ;  however, since we don't have one, let's start 
    ;  the code here instead
    
    ; executable start
    org  32
start:
    push cs                      ; make ds = cs
    pop  ds                      ;
    
    mov  ah,09h                  ; DOS String Out service
    mov  dx,(string - start)     ; must subtract origin
    int  21h                     ; initiate the service call
    
    mov  ah,4Ch                  ; exit to DOS
    int  21h                     ;   (no error value)
    
string  db  'Hello, World!', '$'
    
end_code:
    
    orgnf (($ + 1) & ~1)    ; align on a word boundary without
stack   dup STACK_SIZE,?    ;           adding a byte to the file

    ; end of our code/data/file    
end:

.end
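As a cross-check on the byte count, the same file can be assembled by hand. The Python sketch below mirrors the listing above: the opcode bytes are the standard 8086 encodings, while the assumption that NBASM emits zero for each 'dw ?' is mine and not confirmed. It is a sketch for counting bytes, not a replacement for the assembler.

```python
import struct

STACK_SIZE = 128
HEADER_BYTES = 32  # 'org 32' in the listing: two 16-byte paragraphs

def build_code():
    """Machine code for the listing, followed by the '$'-terminated string."""
    before = bytes([0x0E,          # push cs
                    0x1F,          # pop  ds
                    0xB4, 0x09])   # mov  ah,09h
    after = bytes([0xCD, 0x21,     # int  21h
                   0xB4, 0x4C,     # mov  ah,4Ch
                   0xCD, 0x21])    # int  21h
    # The string follows the code; 'mov dx,imm16' itself is 3 bytes long.
    string_offset = len(before) + 3 + len(after)
    mov_dx = bytes([0xBA]) + struct.pack("<H", string_offset)
    return before + mov_dx + after + b"Hello, World!$"

def build_exe():
    code = build_code()
    end_code = HEADER_BYTES + len(code)
    stack = (end_code + 1) & ~1               # word-align, adds nothing to the file
    end = stack + STACK_SIZE
    paragraphs = ((end - HEADER_BYTES) + 15) // 16
    header = struct.pack(
        "<15H",
        0x5A4D,                               # EXE signature
        end_code & 511,                       # bytes in last page
        (end_code + 511) // 512,              # number of 512-byte pages
        0,                                    # number of relocation entries
        (HEADER_BYTES + 15) // 16,            # header size in paragraphs
        paragraphs,                           # memory required
        paragraphs,                           # memory requested
        0,                                    # ss
        (stack - HEADER_BYTES) + STACK_SIZE,  # sp
        0,                                    # crc
        0,                                    # initial ip
        0,                                    # cs
        0, 0, 0,                              # the three 'dw ?' words, assumed zero
    )
    return header.ljust(HEADER_BYTES, b"\0") + code

image = build_exe()
print(len(image))
```

The 27 bytes of code and string sit directly after the 32-byte header region, and the stack is only reserved in memory, never written to the file.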

At 59 bytes, I believe this is the smallest an .EXE can be, using the following rules.

1) Must be a standard DOS 16-bit .EXE
2) Must use DOS service 09h to print the string 'Hello, World!' (exactly as shown)
3) Must use DOS service 4Ch to exit

If you can do better, please let me know.

xHCI and QEMU (Update)

After some research and communication, I have found the reason for the spurious interrupt mentioned in yesterday's blog post. It actually isn't spurious after all. QEMU checks whether the Event List is empty after every Event TRB is completed. I don't know why, but it does. On a mismatched cycle bit, your code (the Consumer) is to update the xHC_INTERRUPTER_DEQUEUE register with the current TRB location *without* incrementing your internal Dequeue Pointer. This was my error: I was incrementing my internal Dequeue Pointer. A simple code modification fixed this. You see, I am not perfect. :-)
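The rule about the mismatched cycle bit can be illustrated with a toy model. The Python sketch below is not driver code and is not taken from my USB sources; the class name, the write_erdp() helper, and the ring layout are all hypothetical, and it assumes a single-segment ring of 16-byte TRBs with the consumer cycle state starting at 1.

```python
from dataclasses import dataclass, field

TRB_SIZE = 16  # bytes per TRB

@dataclass
class EventRingConsumer:
    base: int            # hypothetical physical address of the ring segment
    size: int            # number of TRBs in the single segment
    index: int = 0       # internal dequeue index
    cycle: int = 1       # consumer cycle state
    erdp_writes: list = field(default_factory=list)

    def write_erdp(self):
        # Stand-in for writing the interrupter's dequeue register.
        self.erdp_writes.append(self.base + self.index * TRB_SIZE)

    def on_interrupt(self, ring):
        """ring is a list of (cycle_bit, payload) tuples; returns the payloads handled."""
        handled = []
        while ring[self.index][0] == self.cycle:
            handled.append(ring[self.index][1])
            self.index += 1
            if self.index == self.size:   # wrap and flip the expected cycle bit
                self.index = 0
                self.cycle ^= 1
        # Mismatched cycle bit: report the current location, do NOT advance.
        self.write_erdp()
        return handled

ring = [(1, "event A"), (1, "event B"), (0, None), (0, None)]
consumer = EventRingConsumer(base=0x1000, size=4)
first = consumer.on_interrupt(ring)    # handles both events
second = consumer.on_interrupt(ring)   # the extra interrupt: nothing to handle
print(first, second, [hex(a) for a in consumer.erdp_writes])
```

On the second interrupt the loop body never runs, so the internal index stays where it was and the same address is written to the register again.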

As for the third note in yesterday's post, I still think this is in error. However, I seriously doubt that anything will be done to QEMU to fix it.

Anyway, I learned something today. I continue to learn something each day and that is what makes this hobby so enjoyable.

xHCI and QEMU

After some well-deserved and appreciated time off from this hobby of mine, I have recently started working on a few things again. Out of curiosity, I decided to see if my USB code worked on the latest version of QEMU. It seems that a lot of my readers prefer it over Bochs simply because of the speed advantage. Wait until they find out the vast debugging capabilities of Bochs. :-)

To start with, I tested UHCI. Well, why not? All went well. Same for OHCI and EHCI. However, when I started with the xHCI testing, QEMU didn't like my code at all.

I have been doing some tests and have found a few errors, either with QEMU or with my code. However, these same tests pass on real hardware, so I am leaning toward QEMU, though I have not come to that conclusion yet.

First, I believe that QEMU only allows a single segment within the Event Ring. The code in question is in xhci_write_event(), specifically starting at line 668. However, I have not investigated this further. It is a possibility that QEMU uses 'intr' as a virtual list and later writes this to a physical list. I don't know. Further investigation will need to be done. Update: Granted, if QEMU never changes the MAX SEGMENTs value in the HCSPARAMS2 register, then one is okay. However, after looking through the code, it would be quite simple to add multiple segments.

Second, I believe that QEMU is sending a spurious Event Ring Interrupt after every correct Event Ring Interrupt. This has to do with the code starting at line 3090. QEMU is firing an interrupt because the written ERDP value doesn't match what it thinks it should be. This is because QEMU has already incremented the 'intr->er_ep_idx' value to the next available location. When software writes to the ERDP.Event Ring Dequeue Pointer, it writes the last executed entry, not the next available entry. Update: I have confirmed this bug. I have reported it and hopefully someone will fix it.

Anyway, in case you are using QEMU as your test emulator and are having issues with your xHCI code, please be advised that you are not the only one.

For the record, please don't think that I am saying anything bad about QEMU. I think it is a wonderful emulator and use it quite often. I am just trying to find out why my tests fail on QEMU and want to fix the issue, whether it lies in my tests or in QEMU.

Update: I have found why my tests don't pass on QEMU. As you know, a *successful* Control Transfer includes a single SETUP TRB, zero or more DATA TRBs, and a STATUS TRB. It is proper to send the SETUP TRB and, if any, all of the DATA TRBs before you send the STATUS TRB. What if the SETUP TRB doesn't process correctly? The stream would not expect a STATUS TRB if there was not a successful SETUP TRB, correct? QEMU is checking that there is a STATUS TRB *before* it processes the SETUP TRB. (This is done on line 1701.) My code sends the SETUP TRB, zero or more DATA TRBs and then waits for a successful transfer of all of those TRBs before sending the STATUS TRB. QEMU was not happy that there was no STATUS TRB before the transfer started.
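The difference between the two queueing orders can be sketched in a few lines of Python. This is only a model of the ordering described above: the function names are hypothetical, a "batch" stands for whatever is queued before the doorbell is rung, and the final check imitates the behavior I observed rather than reproducing QEMU's actual code.

```python
def control_transfer_stages(data_trbs):
    """TRB types for one control transfer, in queue order."""
    if data_trbs < 0:
        raise ValueError("data_trbs must be zero or more")
    return ["SETUP"] + ["DATA"] * data_trbs + ["STATUS"]

def queue_all_at_once(data_trbs):
    """One batch: the whole transfer is queued before it starts."""
    return [control_transfer_stages(data_trbs)]

def queue_status_later(data_trbs):
    """Two batches: STATUS is queued only after SETUP and DATA complete."""
    stages = control_transfer_stages(data_trbs)
    return [stages[:-1], stages[-1:]]

def first_batch_has_status(batches):
    """The condition the emulator appeared to require before starting."""
    return "STATUS" in batches[0]

print(first_batch_has_status(queue_all_at_once(1)))   # whole transfer queued up front
print(first_batch_has_status(queue_status_later(1)))  # the order my code used
```

The second order is the one my tests used, and it is the one that fails the check.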

The specification clearly states that this is allowed (4.11.2.2); however, the controller (QEMU) *must* indicate with an Error Completion code. QEMU does not.

Happy New Year

I want to wish everyone a Happy New Year.

I have been working on the 64-bit version of my operating system and have made good progress. I still have more work to do to make it as functional as the 32-bit version is.