80x86 16-bit Compiling How-to
by Alexei A. Frounze

1 Introduction

The need for making 16-bit code is primarily due to the following facts:

So, one would want 16-bit real mode code on an 80x86 PC to take advantage of the BIOS and/or to prepare for the switch to the 32-bit protected mode of the CPU, as in bootloaders or OS loaders. For some purposes, pure 16-bit real mode code is enough as well. And you can compile your own ROM BIOS for an embedded x86-based system!

2 Revising Memory Addressing in Real Mode of 80x86 CPU

Let's review real-mode 80x86 memory addressing.

2.1 From 8080/8085 to 8086

The Intel 8086 CPU was derived from the Intel 8080/8085 CPU and inherited 16-bit ideas from it. Although it is 16-bit and somewhat compatible with the 8080/8085, the 8086 has an enhanced memory addressing mechanism that isn't limited to 16 address lines; instead, the 8086 has a 20-line-wide address bus. So, unlike the 8080/8085 (which could address up to 2^16 = 65536 bytes of memory, i.e. 64 KB), the 8086 can address up to 2^20 = 1048576 bytes of memory, i.e. 1 MB.

Now, let's see how Intel implemented memory addressing...

An 8080/8085 accesses its 64 KB of memory using direct and indirect forms of address specification in the CPU instructions.

For example:

Instruction   Action
LDA 2050H     Load A (8-bit accumulator register) with the byte from memory location 2050H.
LHLD 0A00H    Load HL (16-bit register) with the word from memory location 0A00H (the byte at 0A00H goes to L, the least significant half of HL, and the byte at 0A01H goes to H, the most significant half of HL).
MOV A, M      Load A (8-bit accumulator register) with the byte from the memory location specified in the 16-bit register HL (M designates accessing memory indirectly through the HL register).
LDAX B        Load A (8-bit accumulator register) with the byte from the memory location specified in the 16-bit register BC.

Hence, it's very simple with 8080/8085. Either the 16-bit address is a constant value encoded in the CPU instruction and the memory location is accessed directly by using the encoded address (this is direct addressing) or the 16-bit address is contained in a 16-bit register of the CPU (BC or HL in our examples) and this address is read from the register before accessing a memory location by this address (this is indirect addressing).

Now, the 8086 can do the same thing...

Instruction       Action
MOV AL, [2050H]   Load AL (least significant half of the 16-bit accumulator register AX) with the byte from memory location 2050H.
MOV BX, [0A00H]   Load BX (16-bit register) with the word from memory location 0A00H (the byte at 0A00H goes to BL, the least significant half of BX, and the byte at 0A01H goes to BH, the most significant half of BX).
MOV AL, [BX]      Load AL (least significant half of the 16-bit accumulator register AX) with the byte from the memory location specified in the 16-bit register BX.
LODSB             Load AL (least significant half of the 16-bit accumulator register AX) with the byte from the memory location specified in the 16-bit register SI (SI is then incremented or decremented by 1, depending on the direction flag).

Same thing.
Almost...

2.2 16 or 20 Bits? Meet the Segment:Offset Pair!

Do you remember that the 8086 was said to have a 20-bit-wide address bus?
You surely do, don't you?
Then how come the four 8086 instructions above specify only 16 bits of the address?
Where's the leftover, the 4 other bits to make it 20-bit? :)

The fun part is that there's one special address register involved, the DS register (data segment register). The DS register is also a 16-bit register.
The value of the DS register is combined with the 16-bit address specified in the instruction. The combination is a bit tricky. The DS value is shifted left by 4 binary positions (or, equivalently, multiplied by 16) and then added to the 16-bit address specified in the instruction.

Example:

BX=341BH
DS=123AH
MOV AL, [BX]

would load the AL register with a byte from memory location 123AH * 16 + 341BH = 123A0H + 341BH = 157BBH. 157BBH is the physical 20-bit address that is placed on the address bus so that the memory value at this address can be transferred to the CPU (or in the other direction, from the CPU to memory, with e.g. MOV [BX], AL).

Really simple.

An address of the form 123AH:341BH is referred to as a logical address.
The part specified before the colon is referred to as the segment part of the address (or often just segment, for short). The part specified after the colon is referred to as the offset part of the address (just offset for short, or sometimes displacement).

segment:offset pair is a logical address
segment * 16 + offset = physical address
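
The translation above fits in a one-line C function. This is only a sketch to experiment with on any modern compiler; the name phys_addr is made up for illustration and is not part of any toolchain discussed here.

```c
#include <stdint.h>

/* Real-mode translation: physical = segment * 16 + offset.
   The result is kept in 32 bits so the carry into bit 20 is not lost. */
uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

With the values from the example, phys_addr(0x123A, 0x341B) gives 0x157BB.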

So, with a constant value of segment (say, constant DS; there can be other segment registers used) but with different values of offset, we can address up to 2^16 = 65536 bytes = 64 KB of memory starting at the physical address equal to segment * 16. This 64 KB region of memory is also referred to as a segment. Right, the same word is often used to refer to different things, and smart guys are known to do it all the time. :) This is important to remember if you're new to this addressing stuff and its terminology. Hopefully, you'll be able to deduce from context what segment stands for.

By changing the segment value (say, the DS value) and the offset value we can generate all the physical addresses from 0 up to 2^20-1, but this is not the upper bound. Technically, if we take segment=0FFFFH and offset=0FFFFH, we end up with a physical address equal to 10FFEFH, which needs 21 bits to be represented. The 8086 CPU has only 20 address lines, so such an address loses its most significant bit and wraps around zero; in this example the 8086 CPU would access the byte at physical address 0FFEFH instead of 10FFEFH.
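
The wrap-around can be modelled by masking the sum down to 20 bits. The sketch below assumes nothing beyond the arithmetic just described; phys_addr_8086 is a hypothetical helper name.

```c
#include <stdint.h>

/* Address as seen on the 8086 bus: only address bits 0..19 exist,
   so anything carried into bit 20 is simply dropped. */
uint32_t phys_addr_8086(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;  /* up to 21 bits */
    return linear & 0xFFFFFUL;                             /* keep 20 bits  */
}
```

Here phys_addr_8086(0xFFFF, 0xFFFF) yields 0x0FFEF, and phys_addr_8086(0xFFFF, 0x0010) lands back on physical address 0.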

It is important to mention that many different logical addresses translate to the same physical address. This is a consequence of the way the segment:offset pair is transformed into the final, physical, address.

Just an example:

123AH * 16 + 341BH = 123A0H + 341BH = 157BBH
1239H * 16 + 342BH = 12390H + 342BH = 157BBH
143AH * 16 + 141BH = 143A0H + 141BH = 157BBH

...
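
To get a feeling for how many such aliases exist, one can simply try every segment value. The following is a brute-force sketch (count_aliases is an illustrative name) that ignores the wrap-around above 1 MB.

```c
#include <stdint.h>

/* Counts the segment:offset pairs that translate to `target`
   without relying on wrap-around above 1 MB. */
unsigned count_aliases(uint32_t target)
{
    unsigned count = 0;
    uint32_t segment;

    for (segment = 0; segment <= 0xFFFFUL; segment++) {
        uint32_t base = segment << 4;          /* start of this segment */
        if (base <= target && target - base <= 0xFFFFUL)
            count++;                           /* offset fits in 16 bits */
    }
    return count;
}
```

For 157BBH this finds 4096 different logical addresses, because every step of 1 in the segment can be compensated by a step of 16 in the offset. Addresses very close to 0 have fewer aliases, e.g. physical address 0 has only 0000H:0000H.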

2.3 More Than 1 MB?

With the introduction of the Intel 80286 CPU, the number of address lines was extended to 24, so on the 80286 you can access memory above the 1 MB mark by using the segment:offset pair. Only FFF0H = 65520 bytes (almost 64 KB) above 1 MB can be accessed this way, and only if you enable the A20 address line (the 8086 had only lines A0 through A19). For reasons of compatibility with 8086 PCs, the PC engineers added a programmable hardware mechanism to 80286+ based PCs to enable and disable the A20 address line, so that address wrap-around remains possible just like on the 8086. When A20 is disabled, both the 10FFEFH and 0FFEFH physical addresses generated by an 80286+ CPU appear to the memory as physical address 0FFEFH, i.e. address bit 20 is always 0.
We won't discuss the details of enabling and disabling A20 here because that's off-topic.
For now, let's just mention that in the protected mode of the Intel 80286+ and 80386+ CPUs, it's possible to access much more memory than 1 MB. The 80286 can access up to 16 MB of memory and the 80386 and 80486 can access up to 4 GB. Pentium class CPUs can access even more. That's it about protected mode for now.
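
The effect of the A20 gate on real-mode addresses (not the way it is switched, which is left out here) can be sketched as follows. The function names and the a20_enabled flag are hypothetical; they only model what the memory sees.

```c
#include <stdint.h>

/* Physical address produced by a real-mode segment:offset pair on an
   80286+ PC. With the A20 gate disabled, address bit 20 is forced to 0,
   which reproduces the 8086 wrap-around. */
uint32_t phys_addr_a20(uint16_t segment, uint16_t offset, int a20_enabled)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;

    if (!a20_enabled)
        linear &= ~((uint32_t)1 << 20);   /* clear address bit 20 only */
    return linear;
}

/* Number of bytes above the 1 MB mark reachable with segment 0FFFFH. */
uint32_t bytes_above_1mb(void)
{
    uint32_t highest = phys_addr_a20(0xFFFF, 0xFFFF, 1);  /* 10FFEFH */
    return highest - 0x100000UL + 1;                       /* FFF0H   */
}
```

bytes_above_1mb() returns 65520, matching the FFF0H figure above.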

2.4 Which Segment Register?

OK. Let's get back to the segment registers... In fact, the 8086 CPU always uses some segment register to read code/data from memory or write data to memory.

The instructions executed by the 8086 CPU are read sequentially from memory using the CS:IP pair of CPU registers (CS is the Code Segment register, IP is the Instruction Pointer register). After execution of an instruction has completed, IP advances so that the next instruction can be fetched and executed. IP can also be changed by the near jump, call and return instructions, i.e. control is transferred within the 64 KB segment starting at the physical address equal to CS * 16. The far jump, call and return instructions modify both IP and CS and make it possible to transfer control to any part of a program anywhere in the 1 MB of addressable memory. Interrupt and return from interrupt instructions always modify CS and IP, similarly to far call and return instructions.

The 8086 CPU stack is organized with the SS:SP pair of registers (SS is the Stack Segment register, SP is the Stack Pointer register). SP is decremented by 2 before a 16-bit word is stored on the stack, and conversely incremented by 2 after a 16-bit word is removed from the stack. All interrupt, call and return instructions affect SP but not SS.
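
The push/pop behaviour can be imitated in plain C with an array standing in for the 64 KB stack segment. Stack16, push16 and pop16 are made-up names for this sketch, which models only the SP arithmetic and the little-endian byte order.

```c
#include <stdint.h>

/* Hypothetical model of the 64 KB stack segment selected by SS. */
typedef struct {
    uint8_t  mem[0x10000];  /* the bytes of the stack segment      */
    uint16_t sp;            /* offset of the last pushed word (SP) */
} Stack16;

/* PUSH: SP is decremented by 2 first, then the word is stored at SS:SP,
   low byte at the lower address. */
void push16(Stack16 *s, uint16_t value)
{
    s->sp = (uint16_t)(s->sp - 2);
    s->mem[s->sp] = (uint8_t)(value & 0xFF);
    s->mem[(uint16_t)(s->sp + 1)] = (uint8_t)(value >> 8);
}

/* POP: the word is read from SS:SP, then SP is incremented by 2. */
uint16_t pop16(Stack16 *s)
{
    uint16_t value = (uint16_t)(s->mem[s->sp] |
                                (s->mem[(uint16_t)(s->sp + 1)] << 8));
    s->sp = (uint16_t)(s->sp + 2);
    return value;
}
```

Starting with sp = 0100H, pushing 1234H leaves sp at 00FEH with 34H stored at 00FEH and 12H at 00FFH.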

Leaving aside instruction fetch (with CS:IP) and stack manipulations (with SS:SP)... The interesting thing is how the 8086 CPU transfers data between itself and memory using direct and indirect addressing with registers other than IP and SP. It might look a bit complicated, but here's how it works...

The 8086 CPU registers are:

AX = AH:AL
BX = BH:BL
CX = CH:CL
DX = DH:DL
FLAGS
DI SI BP SP IP
ES DS SS CS

For completeness, here is a description of the 8086 CPU registers:

Register  Description
AX        16-bit Accumulator register, least and most significant halves (AL and AH respectively) are separately accessible. Most suited for/dedicated to the ALU operations and I/O.
BX        16-bit Base register, least and most significant halves (BL and BH respectively) are separately accessible. Can be used as indirect address register when accessing memory.
CX        16-bit Counter register, least and most significant halves (CL and CH respectively) are separately accessible. Can be used to organize loops and repeat string instructions.
DX        16-bit Data register, least and most significant halves (DL and DH respectively) are separately accessible. Used in some special ALU and I/O operations.
FLAGS     16-bit Flags register. Contains control/status flags.
IP        16-bit Instruction Pointer register. Points to an instruction to be executed.
SP        16-bit Stack Pointer register. Points to the last 16-bit word pushed to the stack.
BP        16-bit Base Pointer register. Can be used as indirect address register when accessing memory (handy for stack memory accesses).
SI        16-bit Source Index register. Can be used as indirect address register when accessing memory (used by string instructions).
DI        16-bit Destination Index register. Can be used as indirect address register when accessing memory (used by string instructions).
CS        16-bit Code Segment register. Selects the 64 KB region of memory, from which instructions are fetched and executed by the CPU.
SS        16-bit Stack Segment register. Selects the 64 KB region of memory, where the CPU stack is located.
DS        16-bit Data Segment register. Selects the 64 KB region of memory, with which most of memory reads and writes are done.
ES        16-bit Extra data Segment register. Selects an additional 64 KB region (additional to one selected by DS) of memory, with which more memory reads and writes can be done. Used by string instructions that work with DI.

Now, having introduced all of the 8086 CPU registers, let's see how we can access memory using them for indirect addressing. What if I want to use, say, register SI to indirectly address memory? Which segment register will be used by default in this case? The following table lists all possible addressing modes and the default data segment register used in each of them.

Addressing Mode               Address Operand Format                            Default Segment Register
Direct/Displacement           [displacement/offset/label/whatever you call it]  DS
Indirect                      [BX]                                              DS
Indirect                      [BP]                                              SS
Indirect                      [SI]                                              DS
Indirect                      [DI]                                              DS (ES for string instructions)
Indirect+Displacement         [BX+displacement]                                 DS
Indirect+Displacement         [BP+displacement]                                 SS
Indirect+Displacement         [SI+displacement]                                 DS
Indirect+Displacement         [DI+displacement]                                 DS
Double Indirect+Displacement  [BX][SI]+displacement                             DS
Double Indirect+Displacement  [BX][DI]+displacement                             DS
Double Indirect+Displacement  [BP][SI]+displacement                             SS
Double Indirect+Displacement  [BP][DI]+displacement                             SS
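
The pattern in the table boils down to one rule: if BP takes part in the address, the default is SS, otherwise it is DS. As a sketch in C (the enum and function names are invented for illustration, and string instructions with their ES:DI destination are deliberately left out):

```c
/* Base register used in the address operand, if any. */
typedef enum { BASE_NONE, BASE_BX, BASE_BP } BaseReg;

/* Default segment register for the access. */
typedef enum { SEG_DS, SEG_SS } SegReg;

/* The index register (SI or DI) and the displacement never change the
   default, so only the base register is needed to decide. */
SegReg default_segment(BaseReg base)
{
    return (base == BASE_BP) ? SEG_SS : SEG_DS;
}
```

So [BP][SI]+displacement maps to default_segment(BASE_BP), i.e. SS, while [SI+displacement] and plain [displacement] both map to DS.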

Notes:

To summarize:

If you need to override the use of the default segment register, you can explicitly specify the segment register to use, like so:
MOV AL, CS:MyTable[BP][SI] or
MOV AL, [CS:BP+SI+MyTable] whichever format is supported by your assembler (TASM/MASM/WASM/NASM/etc).
The prefix, consisting of segment name and colon, overrides the default segment register to the one specified before the colon.

3 Memory Models Employed by Realmode Compilers

The following table summarizes the most common memory models employed by 16-bit realmode 80x86 compilers.

Near pointers (in real mode) are 16-bit pointers, consisting only of a 16-bit offset. The default segment register (CS for code, DS/SS for data/stack) is assumed to be constant. Near pointers are small and quick, and need less code to handle.

Far pointers (in real mode) are 32-bit pointers, consisting of both 16-bit parts, segment and offset. Far pointer increment/decrement usually doesn't affect the segment part of the far pointer. Far pointers are big and slow, and need more code to handle.

It is problematic to access objects or arrays bigger than 64 KB with either near or far pointers in HLL (C/C++) compilers because this requires a manual implementation of far pointer arithmetic.
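
To illustrate what such manual arithmetic involves, here is a portable sketch that imitates a far pointer with a struct instead of using any compiler-specific far keyword. FarPtr, far_add and far_add_normalized are hypothetical names, and the second function shows only one possible way to carry an overflow into the segment part.

```c
#include <stdint.h>

/* Hypothetical stand-in for a real-mode far pointer. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} FarPtr;

/* Far-pointer style addition: only the offset changes and it wraps
   around at 64 KB, so the pointer never leaves its segment. */
FarPtr far_add(FarPtr p, uint16_t delta)
{
    p.offset = (uint16_t)(p.offset + delta);
    return p;
}

/* "Manual" addition across segments: convert to a physical address, add,
   and rebuild a normalized pointer whose offset is in the range 0..15. */
FarPtr far_add_normalized(FarPtr p, uint32_t delta)
{
    uint32_t linear = (((uint32_t)p.segment << 4) + p.offset + delta)
                      & 0xFFFFFUL;             /* stay within 1 MB */
    p.segment = (uint16_t)(linear >> 4);
    p.offset  = (uint16_t)(linear & 0xF);
    return p;
}
```

Adding 1 to 1000H:FFFFH with far_add gives 1000H:0000H (the start of the same segment), whereas far_add_normalized gives 2000H:0000H, the byte that actually follows in memory.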

Memory Model  Code Segment Size, Pointer Type  Data Segment Size, Pointer Type
(the description of each model follows its row)

Tiny          < 64 KB, near                    < 64 KB, near
    Use the tiny model for small size applications.
    All four segment registers (CS, DS, ES, SS) are set to the same address, so you have a total of 64 KB for all of your code, data, and stack. Near pointers are always used.
    Tiny model programs can be compiled to .COM format.
    SS=ES=DS=CS, always

Small         < 64 KB, near                    < 64 KB, near
    Use the small model for average size applications.
    The code and data segments are different and don't overlap, so you have 64 KB of code and 64 KB of data and stack. Near pointers are always used.
    SS=DS, usually

Medium        < 1 MB, far                      < 64 KB, near
    The medium model is best for large programs that don't keep much data in memory.
    Far pointers are used for code but not for data. As a result, data plus stack are limited to 64 KB, but code can occupy up to 1 MB.
    SS=DS, usually

Compact       < 64 KB, near                    < 1 MB, far
    Use the compact model if your code is small but you need to address a lot of data.
    The opposite of the medium model is true for the compact model: far pointers are used for data but not for code; code is then limited to 64 KB, while data has a 1 MB range.
    All functions are near by default and all data pointers are far by default.
    SS!=DS, usually

Large         < 1 MB, far                      < 1 MB, far
    Use the large model for very large applications only.
    Far pointers are used for both code and data, giving both a 1 MB range. All functions and data pointers are far by default.
    SS!=DS, usually

Huge          < 1 MB, far                      < 1 MB, far
    Use the huge model for very large applications only. Far pointers are used for both code and data. Turbo C++ normally limits the size of all data to 64 KB; the huge memory model sets aside that limit, allowing data to occupy more than 64 KB.
    The huge model allows multiple data segments (each 64 KB in size), up to 1 MB for code, and 64 KB for stack. All functions and data pointers are assumed to be far.
    SS!=DS, usually

4 The Two Memory Models and File Formats We'll Use

Let's limit ourselves to just the two simplest memory models, which, I believe, are sufficient for most of our 16-bit applications. These memory models are Tiny and Small, both with near pointers, never changing segment registers (except when we really need to, e.g. when calling BIOS functions and managing memory outside our program). This makes things easy.

4.1 Tiny Memory Model (.COM)

An application compiled with the Tiny memory model option is usually compiled to the .COM file format, well known from the DOS world. This format is basically a raw/flat executable binary without any relocation information. Since a program in this format is assumed to always have CS=DS=ES=SS and to occupy 64 KB at most (i.e. one full 64 KB segment), relocating such a program in memory is simply a matter of choosing a segment of memory for the program, loading it there as-is, and setting the segment registers to point to this segment before jumping to the entry point of the program.

The good side of the Tiny/.COM model/format is that it's the simplest one to relocate (basically, no relocation is needed) and that you can always run your program in DOS, in a DOS box under Windows, or in DOSemu under Linux.

4.2 Small Memory Model (.EXE)

The Small memory model allows making a program whose code and data/stack segments can both be as big as 64 KB because these segments are separate and CS!=DS, unlike Tiny/.COM. An application compiled with the Small memory model option is compiled to the .EXE format, also well known from the DOS world. Applications in this format can also be relocated in memory, and this format (unlike .COM) keeps relocation information inside, as well as the entry point address (CS:IP) and the initial stack configuration (SS:SP), which vary from program to program (unlike .COM, where the entry point and stack configuration are fixed).

DOS allocates segment(s) of memory to load an .EXE, loads the .EXE image (which follows the .EXE header portion of the file) and performs address fixups inside the loaded image using the relocation information from the .EXE file header, thereby completing relocation. After that, DOS sets up the stack and jumps to the entry point.

I won't discuss .EXE relocation with its address fixups here; an explanation of this process can be found elsewhere (e.g. http://www.wotsit.org/). The .EXE format is best for 16-bit applications, and with it you can do more than with .COM (all other memory models (Medium, Compact, Large, Huge) also naturally compile to .EXE).
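
Just to give a rough picture of what the fixup step amounts to, here is a small sketch. It is not taken from DOS or from any loader source; it assumes the usual DOS .EXE layout in which each relocation entry is a 16-bit offset followed by a 16-bit segment, both relative to the start of the loaded image. ExeReloc and apply_fixups are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One relocation table entry (assumed layout): the segment:offset,
   relative to the start of the loaded image, of a 16-bit word that
   holds a segment value. */
typedef struct {
    uint16_t offset;
    uint16_t segment;
} ExeReloc;

/* Adds the segment at which the image was loaded to every word listed
   in the relocation table. `image` is the loaded image without the
   .EXE header. */
void apply_fixups(uint8_t *image, const ExeReloc *relocs, size_t count,
                  uint16_t load_segment)
{
    size_t i;

    for (i = 0; i < count; i++) {
        uint32_t pos = ((uint32_t)relocs[i].segment << 4) + relocs[i].offset;
        uint16_t word = (uint16_t)(image[pos] | (image[pos + 1] << 8));

        word = (uint16_t)(word + load_segment);   /* the actual fixup */
        image[pos]     = (uint8_t)(word & 0xFF);
        image[pos + 1] = (uint8_t)(word >> 8);
    }
}
```

For example, if the word at image position 0001H:0004H holds 0002H and the image is loaded at segment 1234H, the word becomes 1236H after the fixup.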

5 The Two Compiler Sets We'll Use

As a full-time DOS/Windows user, I found the following popular (and now free!) compilers to be very well suited for compiling non-DOS 16-bit 80x86 realmode applications such as bootloaders and various tools:

Borland's Turbo C++ 1.01
  Free?        Free
  16/32 Bit    16-bit
  Assembler    TASM (not free); use NASM or other
  Linker       TLINK
  Librarian    TLIB
  Make         MAKE
  Debugger     TD (not free); use ZD86, WD or other
  IDE          TC
  Description  An old, but very good 16-bit C/C++ compiler by Borland. Has a very nice IDE with an integrated debugger. Unfortunately, the free distribution doesn't include the assembler (TASM) and as a result you can't have inline assembly with this compiler. But if you're not afraid of writing external subroutines in assembly and linking them with the high-level C/C++ code, you can use NASM with Turbo C++. Together they go very well. The free compiler distribution doesn't include a standalone 16-bit debugger (TD), but you may use ZD86, WD or some other.

Open Watcom 1.x C++ compiler
  Free?        Free
  16/32 Bit    16,32-bit
  Assembler    WASM
  Linker       WLINK
  Librarian    WLIB
  Make         WMAKE
  Debugger     WD
  IDE          IDE (for win32)
  Description  A really good 16/32 bit C/C++ compiler, free, comes with everything (all development tools, documentation, examples, source code), runs under DOS, Windows, OS/2, can be used to compile applications for those OSes. Open Watcom also includes a Fortran compiler.

6 Compiling with Borland/Turbo C/C++ and NASM

6.1 Important details on Borland/Turbo C/C++ compiler

6.2 Calling Conventions and Register Conventions

6.3 Turbo C++ Tools

6.3.1 TCC, C/C++ compiler

Command line:
TCC [options] file[s]

Useful options:
-K- Default char is signed (int and long are signed, likewise char is very often signed)
-1 Generate 80186/80286 instructions
-N- Don't check stack overflow
-k Standard stack frame (arguments referenced thru SS:BP+disp)
-ms Set memory model to small (for .COMs and discussed .EXEs)
-c Compile only (TCC can also link by calling linker and produce executable)
-S Produce assembly output (useful for studying assembler and finding bugs)
-wxxx Warning control
-v Include source level debug information into output object files (useful for TD only)
-y Include source file line number debug information into output object files (useful for TD only)

6.3.2 TLINK, linker

Command line:
TLINK [options] objfiles, executablefile, mapfile, libfiles
Note: you may omit names of executable and/or map file, but commas must remain if there are any file names specified after them (e.g. library files)

Useful options:
/m Generate map file with public symbols (map files are useful to find link stage bugs, e.g. wrong addresses)
/s Include detailed map of segments into the map file (map files are useful to find link stage bugs, e.g. wrong addresses)
/n No default libraries linked
/d Warn if duplicate symbols in libraries
/c Case sensitive processing of symbols, e.g. name!=NAME
/3 Enable 32-bit processing (link 32-bit code if any encountered)
/t Create .COM file (or raw binary; for raw binaries the extension in executablefile must not be .COM, or the linker will try to make a .COM and thus will fail)
/v Include full debug information into executable file (useful for TD only)

Note #1: to avoid any problems with linking, always specify the object file containing the entry point (e.g. c0s.obj or c0t.obj) first in the list of object files.

Note #2: default libraries are:

c?.lib Standard C library. ? denotes memory model symbol (s for tiny/small).
math?.lib Standard C library, mathematical functions. ? denotes memory model symbol (s for tiny/small).
emu.lib Emulation of 80x87 floating point unit.
fp87.lib Functions for 80x87 floating point unit.

6.3.3 TLIB, librarian

Command line:
TLIB [/C] [/E] libfile, commands, listfile
Note: commands and listfile are optional

Useful options and commands:
/C Case-sensitive library
+module Add module (object file) to library
-module Remove module (object file) from library
*module Extract module (object file) from library
-+module, +-module Replace module (object file) in library
-*module, *-module Extract module (object file) and remove it from library

6.3.4 MAKE

MAKE doesn't need tab characters in the makefile where Unix make would normally require them.
Note: make sure you don't have another make available through the PATH environment variable if you intend to use this particular MAKE.

6.4 NASM, assembler (not TASM, has nothing to do with Borland)

Command line:
NASM [-o outfilename] [-f format] [-l listfilename] [options] filename

Useful options and commands:
-f obj Will generate Intel/OMF .OBJ object outfile (compatible with Borland/Turbo C/C++/Pascal compilers) from the specified file.
-F obj Will generate Borland debug information (useful for TD only).
-Dmacro[=value] Predefines a macro.
-Umacro Undefines a macro.

7 Compiling with Open Watcom C/C++

7.1 Important details on Open Watcom C/C++ compiler

8 Downloads

Item URL
Turbo C++ 1.01 http://community.borland.com/museum/
You will have to register at the Borland/Inprise web site to download the compiler.
NASM 0.98+ http://nasm.sourceforge.net/
Open Watcom 1.2 or 1.5 http://www.openwatcom.org/
C/C++ & Fortran compilers.
You will need these files for DOS/realmode/DPMI development:
File Description
readme.txt
c_doswin.zip C compiler (DOS & Win16 hosts)
clib_a16.zip C runtime libraries (16-bit, All targets)
clib_d16.zip C runtime libraries (16-bit DOS target)
clib_samples.zip C runtime library sample programs
cm_clib_a16.zip C runtime libraries (16-bit, all targets)
cm_clib_a32.zip C runtime libraries (32-bit, all targets)
cm_clib_d16.zip C runtime libraries (16-bit DOS target)
cm_clib_d32.zip C runtime libraries (32-bit DOS target)
cm_clib_hdr.zip C runtime library header files
cm_core_all.zip Core binaries (All hosts)
cm_core_dos.zip Core binaries (DOS host)
cm_core_doswin.zip Core binaries (DOS & Win hosts)
cm_dbg_all.zip Debugger (All hosts)
cm_dbg_dos.zip Debugger, profiler & sampler (DOS host)
cm_dbg_dosos2.zip Debugger (DOS & OS/2 hosts)
cm_dbg_doswin.zip Debugger (DOS & Win16 hosts)
cm_dbg_misc1.zip Debugger (DOS host or target)
cm_hlp_dos.zip Help files (DOS host)
cm_hlp_win.zip Help files (Win16 host), may be easier to use
cm_ide_all.zip IDE (All hosts)
cm_ide_dos.zip IDE (DOS host)
cm_plib_a16.zip C++ runtime libraries (16-bit, all targets)
cm_plib_a32.zip C++ runtime libraries (32-bit, all targets)
cm_samples.zip Sample programs (all targets)
core_all.zip Core binaries (All hosts)
core_doswin.zip Core binaries (DOS & Win16 hosts)
ext_causeway.zip Causeway DOS extender / DPMI host
ext_dos32a.zip DOS32 CauseWay DOS extender / DPMI host
ext_dos4gw.zip DOS/4GW DOS extender / DPMI host
ext_pmodew.zip PMODE/W DOS extender / DPMI host
hlp_dos.zip Help files (DOS host)
hlp_win.zip Help files (Win16 host), may be easier to use
ide_samples.zip Sample IDE files
misc_src.zip Misc source files and sample programs, include application startup codes
plib_a16.zip C++ runtime libraries (16-bit, all targets)
plib_a32.zip C++ runtime libraries (32-bit, all targets)
plib_hdr.zip C++ runtime library header files
plib_samples.zip C++ runtime library sample programs
open_watcom_1.2.0-src.zip Open Watcom 1.2 source code - you may learn more from it
Borland C++ clib src BCpp31CLibSrc.zip
Borland C++ 3.1 standard C/C++ library source codes. Make your own using its API and helper functions.
SDK for Turbo C++ & NASM C16SDKTurboNASM.zip
SDK for Open Watcom C/C++ C16SDKWatcom.zip

9 Work In Progress

The work on this document is in progress. Meanwhile, try learning things from the compiler documentation and the source code provided here (already available).

If you want to contact me regarding this doc or anything else, please post a message on the usenet: news:alt.os.development.
To post, use http://groups.google.com/ or http://news.individual.de/.

Alexei A. Frounze
July the 4th, 2004


