The need for writing 16-bit code is primarily due to the following facts:
So, one would want 16-bit real mode code running on an 80x86 PC to take advantage of the BIOS and/or to prepare for switching the CPU to 32-bit protected mode, as in bootloaders or OS loaders. For some purposes, pure 16-bit real mode code is enough as well. And you can even compile your own ROM BIOS for an embedded x86-based system!
Let's review real mode 80x86 memory addressing.
The Intel 8086 CPU was derived from the Intel 8080/8085 CPU and inherited its 16-bit addressing ideas. Although it is 16-bit and somewhat compatible with the 8080/8085, the 8086 has an enhanced memory addressing mechanism that isn't limited to 16 address lines: the 8086 has a 20-line address bus. So, unlike the 8080/8085 (which could address up to 2^16 = 65536 bytes of memory, i.e. 64 KB), the 8086 can address up to 2^20 = 1048576 bytes of memory, i.e. 1 MB.
Now, let's see how Intel implemented memory addressing...
An 8080/8085 accesses its 64 KB of memory using direct and
indirect forms of address specification in its instructions.
For example:
Instruction | Action |
LDA 2050H | Load A (8-bit accumulator register) with the byte from memory location 2050H. |
LHLD 0A00H | Load HL (16-bit register pair) with the word from memory location 0A00H (the byte at 0A00H goes to L, the least significant half of HL, and the byte at 0A01H goes to H, the most significant half of HL). |
MOV A, M | Load A (8-bit accumulator register) with the byte from the memory location specified in the 16-bit register HL (M designates accessing memory indirectly through the HL register). |
LDAX B | Load A (8-bit accumulator register) with the byte from the memory location specified in the 16-bit register BC. |
Hence, it's very simple with the 8080/8085. Either the 16-bit address is a constant encoded in the instruction and the memory location is accessed directly at that address (this is direct addressing), or the 16-bit address is held in a 16-bit register of the CPU (BC or HL in our examples) and is read from that register before the memory access (this is indirect addressing).
Now, the 8086 can do the same thing...
Instruction | Action |
MOV AL, [2050H] | Load AL (least significant half of the 16-bit accumulator register AX) with the byte from memory location 2050H. |
MOV BX, [0A00H] | Load BX (16-bit register) with the word from memory location 0A00H (the byte at 0A00H goes to BL, the least significant half of BX, and the byte at 0A01H goes to BH, the most significant half of BX). |
MOV AL, [BX] | Load AL (least significant half of the 16-bit accumulator register AX) with the byte from the memory location specified in the 16-bit register BX. |
LODSB | Load AL (least significant half of the 16-bit accumulator register AX) with the byte from the memory location specified in the 16-bit register SI (SI is then incremented or decremented by 1, depending on the direction flag). |
Same thing.
Almost...
Do you remember that 8086 has been said to have 20-bit-wide address bus?
You surely do, don't you?
Then how come the four 8086 instructions above specify only 16 bits
of the address?
Where's the leftover, the 4 other bits to make it 20-bit? :)
The fun part is that one special register is involved,
the DS register (Data Segment register). The DS register is also a
16-bit register.
The value of the DS register is combined with the 16-bit address
specified in the instruction. The combination is a bit tricky, since it
is not a plain concatenation of bits.
The DS value is shifted left by 4 binary positions (or, equivalently,
multiplied by 16) and then added to the 16-bit address specified in
the instruction.
Example:
BX=341BH
DS=123AH
MOV AL, [BX]
would load the AL register with a byte from memory
location 123AH * 16 + 341BH = 123A0H + 341BH = 157BBH. 157BBH is the physical
20-bit address that is placed on the address bus so that the memory value
at this address can be transferred to the CPU
(or the other way, from the CPU to memory, with e.g. MOV [BX], AL).
Really simple.
An address of the form 123AH:341BH is referred to as a logical address.
The part before the colon is referred to as the segment
part of the address (or, for short, just segment).
The part after the colon is referred to as the offset
part of the address (for short, just offset, or sometimes
displacement).
segment:offset pair = logical address
segment * 16 + offset = physical address
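To make the formula concrete, here's a minimal C sketch (C being the language of the compilers discussed later). The helper name phys_addr is hypothetical, not part of any compiler library. The result is kept in 32 bits on purpose, so the 21-bit case discussed below isn't silently truncated yet.

```c
#include <stdint.h>

/* Physical address = segment * 16 + offset.
   Shifting the segment left by 4 bits is the same as multiplying it by 16.
   The sum is kept in 32 bits: no address-line limit is applied here. */
static uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For the example above, phys_addr(0x123A, 0x341B) yields 0x157BB.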
So, with a constant segment value (say, a constant DS; other segment registers can be used too) but different offset values, we can address up to 2^16 = 65536 bytes = 64 KB of memory starting at the physical address segment * 16. This 64 KB region of memory is also referred to as a segment. Right, the same word is often used for different things, and smart guys are known to do it all the time. :) This is important to remember if you're new to this addressing stuff and its terminology. Hopefully, you'll be able to deduce from context what segment stands for.
By changing the segment value (say, the DS value) and the offset value, we can generate all the physical addresses from 0 up to 2^20-1, but this is not the upper bound. Technically, if we take segment=0FFFFH and offset=0FFFFH, we end up with the physical address 10FFEFH, which needs 21 bits to be represented. The 8086 CPU has only 20 address lines, so such an address loses its most significant bit and wraps around past zero: in this example, the 8086 CPU would access the byte at physical address 0FFEFH instead of 10FFEFH.
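The wraparound can be modeled by keeping only the low 20 bits of the sum, since those are all the 8086 puts on its address bus. A minimal sketch; phys_addr_8086 and ADDR_MASK_8086 are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* The 8086 has address lines A0..A19 only, i.e. a 20-bit mask (0xFFFFF). */
#define ADDR_MASK_8086 0xFFFFFul

/* Physical address as seen on the 8086 bus: any carry into bit 20 is lost,
   so addresses past 0xFFFFF wrap around to the bottom of memory. */
static uint32_t phys_addr_8086(uint16_t segment, uint16_t offset)
{
    uint32_t raw = ((uint32_t)segment << 4) + offset; /* up to 0x10FFEF */
    return (uint32_t)(raw & ADDR_MASK_8086);
}
```

With this, phys_addr_8086(0xFFFF, 0xFFFF) gives 0x0FFEF, and FFFF:0010 lands exactly on physical address 0.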
It is important to mention that many different logical addresses
can map to the same physical address. This is a consequence of the
way the segment:offset pair is transformed into the final, physical,
address.
Just an example:
123AH * 16 + 341BH = 123A0H + 341BH = 157BBH
1239H * 16 + 342BH = 12390H + 342BH = 157BBH
143AH * 16 + 141BH = 143A0H + 141BH = 157BBH
...
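The aliasing can be checked mechanically. The sketch below compares two logical addresses, converts a physical address back to one canonical logical form (largest possible segment, offset 0..15, often called a "normalized" address), and counts aliases by brute force. All names are hypothetical, and wraparound above 1 MB is deliberately ignored.

```c
#include <stdint.h>

struct LogicalAddr {
    uint16_t segment;
    uint16_t offset;
};

static uint32_t to_physical(struct LogicalAddr a)
{
    return ((uint32_t)a.segment << 4) + a.offset;
}

/* Two logical addresses alias if they produce the same physical address. */
static int same_physical(struct LogicalAddr a, struct LogicalAddr b)
{
    return to_physical(a) == to_physical(b);
}

/* Canonical form: the largest segment with segment * 16 <= phys, so the
   offset is 0..15. Valid for phys in 0x00000..0xFFFFF. */
static struct LogicalAddr normalize(uint32_t phys)
{
    struct LogicalAddr a;
    a.segment = (uint16_t)(phys >> 4);
    a.offset = (uint16_t)(phys & 0xFu);
    return a;
}

/* Brute-force count of segment:offset pairs that map to phys
   (without wraparound). */
static unsigned count_aliases(uint32_t phys)
{
    unsigned count = 0;
    for (uint32_t seg = 0; seg <= 0xFFFFu; seg++) {
        uint32_t base = seg << 4;
        if (base <= phys && phys - base <= 0xFFFFu)
            count++;
    }
    return count;
}
```

For 157BBH, normalize gives 157BH:000BH, and count_aliases reports 4096 different logical addresses, one for each segment from 057CH to 157BH.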
With the introduction of the Intel 80286 CPU, the number of address lines
was extended to 24, so on the 80286 you can access memory above the 1 MB mark
even in real mode, by using the segment:offset pair. Only FFF0H = 65520 bytes
(almost 64 KB) above 1 MB can be accessed this way (physical addresses 100000H
thru 10FFEFH), and only if the A20 address line is enabled (the 8086 had only
lines A0 thru A19).
For compatibility with 8086-based PCs, the PC engineers added a
programmable hardware mechanism to 80286+ based PCs to enable and
disable the A20 address line, so that address wraparound remains possible
just like on the 8086. When A20 is disabled, both the 10FFEFH and 0FFEFH
physical addresses generated by an 80286+ CPU appear to the
memory as physical address 0FFEFH, i.e. address bit 20 (A20) is always 0.
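A small model of this behavior, assuming the CPU produces the full 21-bit real mode sum and the A20 gate simply forces bit 20 to 0 when disabled. It does not model how the gate is actually switched; phys_addr_a20 and bytes_above_1mb are hypothetical names.

```c
#include <stdint.h>

/* Real mode address on an 80286+ PC: the CPU can produce up to 0x10FFEF,
   and the A20 gate decides whether bit 20 reaches the memory. */
static uint32_t phys_addr_a20(uint16_t segment, uint16_t offset, int a20_enabled)
{
    uint32_t raw = ((uint32_t)segment << 4) + offset; /* 0x000000..0x10FFEF */
    if (!a20_enabled)
        raw &= ~(UINT32_C(1) << 20); /* A20 forced to 0: 8086-style wrap */
    return raw;
}

/* Bytes above the 1 MB mark reachable from real mode with A20 enabled:
   FFFF:0010 (100000H) thru FFFF:FFFF (10FFEFH). */
static uint32_t bytes_above_1mb(void)
{
    return phys_addr_a20(0xFFFF, 0xFFFF, 1) - 0x100000u + 1u;
}
```

bytes_above_1mb() comes out as 0xFFF0 = 65520, matching the figure given above.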
We won't discuss the details of enabling and disabling A20 here because
they're off-topic.
For now, let's just mention that in the protected mode of the
Intel 80286+ and 80386+ CPUs, it's possible to access much more memory than
1 MB. The 80286 can access up to 16 MB of memory, and the 80386 and 80486 can
access up to 4 GB. Later CPUs (starting with the Pentium Pro, using PAE) can
access even more. That's it about protected mode for now.
OK. Let's get back to the segment registers... In fact, the 8086 CPU always uses some segment register to read code/data from memory or write data to memory.
The instructions executed by the 8086 CPU are read sequentially from memory using the CS:IP pair of CPU registers (CS is the Code Segment register, IP is the Instruction Pointer register). As an instruction is executed, IP advances past it (by the length of the instruction), so the next instruction can be fetched and executed. IP can also be changed by the near jump, call and return instructions, i.e. control is transferred within the 64 KB segment starting at physical address CS * 16. The far jump, call and return instructions modify both IP and CS and make it possible to transfer control to any part of a program anywhere in the 1 MB of addressable memory. Interrupts and the return-from-interrupt instruction (IRET) always modify CS and IP, similarly to the far call and return instructions.
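The difference between sequential execution, near transfers and far transfers boils down to which of CS and IP change. Here's a minimal sketch that models only that address arithmetic (no instruction decoding); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* The CS:IP pair; only address arithmetic is modeled. */
struct CodePtr {
    uint16_t cs;
    uint16_t ip;
};

/* Physical address of the next instruction byte (20-bit address bus). */
static uint32_t fetch_addr(struct CodePtr p)
{
    return (((uint32_t)p.cs << 4) + p.ip) & 0xFFFFFu;
}

/* Sequential execution: IP moves past an instruction of insn_len bytes.
   IP is 16 bits wide, so it wraps around within the same 64 KB segment. */
static struct CodePtr advance(struct CodePtr p, uint16_t insn_len)
{
    p.ip = (uint16_t)(p.ip + insn_len);
    return p;
}

/* Near jump/call/return: only IP is replaced, CS stays the same. */
static struct CodePtr near_transfer(struct CodePtr p, uint16_t new_ip)
{
    p.ip = new_ip;
    return p;
}

/* Far jump/call/return, interrupts and IRET: both CS and IP are replaced. */
static struct CodePtr far_transfer(uint16_t new_cs, uint16_t new_ip)
{
    struct CodePtr p;
    p.cs = new_cs;
    p.ip = new_ip;
    return p;
}
```

Note how advance() never touches CS: running off the end of a segment wraps IP to 0 instead of moving on to the next 64 KB.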
The 8086 CPU stack is organized with the SS:SP pair of registers (SS is the Stack Segment register, SP is the Stack Pointer register). SP is decremented by 2 before a 16-bit word is stored on the stack, and conversely incremented by 2 after a 16-bit word is removed from the stack. All interrupt, call and return instructions affect SP but not SS.
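The push/pop rules above can be sketched in C with a simulated 1 MB memory array. This is an illustration only: stack_mem, struct Stack, push16 and pop16 are hypothetical names, and words are stored little-endian (low byte at the lower address), as the 8086 does.

```c
#include <stdint.h>

/* Simulated 1 MB real mode address space. */
static uint8_t stack_mem[1ul << 20];

struct Stack {
    uint16_t ss;
    uint16_t sp;
};

static uint32_t stack_phys(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

/* PUSH: SP is decremented by 2 first, then the word is stored at SS:SP. */
static void push16(struct Stack *s, uint16_t value)
{
    s->sp = (uint16_t)(s->sp - 2);
    stack_mem[stack_phys(s->ss, s->sp)] = (uint8_t)(value & 0xFFu);
    stack_mem[stack_phys(s->ss, (uint16_t)(s->sp + 1))] = (uint8_t)(value >> 8);
}

/* POP: the word is read from SS:SP, then SP is incremented by 2.
   SS is never changed. */
static uint16_t pop16(struct Stack *s)
{
    uint16_t lo = stack_mem[stack_phys(s->ss, s->sp)];
    uint16_t hi = stack_mem[stack_phys(s->ss, (uint16_t)(s->sp + 1))];
    s->sp = (uint16_t)(s->sp + 2);
    return (uint16_t)(lo | (hi << 8));
}
```

Starting with SS:SP = 2000H:0000H, a push wraps SP to FFFEH and writes the word at physical address 2FFFEH, i.e. at the very top of the stack segment.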
Setting aside instruction fetch (with CS:IP) and stack manipulation (with SS:SP)... The interesting thing is how the 8086 CPU transfers data between itself and memory using direct and indirect addressing with registers other than IP and SP. It might look a bit complicated, but here's how it works...
The 8086 CPU registers are:
AX = AH:AL | BX = BH:BL | CX = CH:CL | DX = DH:DL |
FLAGS |
DI | SI | BP | SP | IP |
ES | DS | SS | CS |
Just for completeness, here is a description of the 8086 CPU registers:
Register | Description |
AX | 16-bit Accumulator register, least and most significant halves (AL and AH respectively) are separately accessible. Most suited for/dedicated to the ALU operations and I/O. |
BX | 16-bit Base register, least and most significant halves (BL and BH respectively) are separately accessible. Can be used as indirect address register when accessing memory. |
CX | 16-bit Counter register, least and most significant halves (CL and CH respectively) are separately accessible. Can be used to organize loops and repeat string instructions. |
DX | 16-bit Data register, least and most significant halves (DL and DH respectively) are separately accessible. Used in some special ALU and I/O operations. |
FLAGS | 16-bit Flags register. Contains control/status flags. |
IP | 16-bit Instruction Pointer register. Points to an instruction to be executed. |
SP | 16-bit Stack Pointer register. Points to the last 16-bit word pushed to the stack. |
BP | 16-bit Base Pointer register. Can be used as indirect address register when accessing memory (handy for stack memory accesses). |
SI | 16-bit Source Index register. Can be used as indirect address register when accessing memory (used by string instructions). |
DI | 16-bit Destination Index register. Can be used as indirect address register when accessing memory (used by string instructions). |
CS | 16-bit Code Segment register. Selects the 64 KB region of memory, from which instructions are fetched and executed by the CPU. |
SS | 16-bit Stack Segment register. Selects the 64 KB region of memory, where the CPU stack is located. |
DS | 16-bit Data Segment register. Selects the 64 KB region of memory, with which most of memory reads and writes are done. |
ES | 16-bit Extra data Segment register. Selects an additional 64 KB region (additional to one selected by DS) of memory, with which more memory reads and writes can be done. Used by string instructions that work with DI. |
Now, having introduced all of the 8086 CPU registers, let's see how we can use them to access memory with indirect addressing. What if I want to use, say, register SI to indirectly address memory? Which segment register will be used by default in this case? The following table lists all possible addressing modes and the default data segment register used in each of them.
Addressing Mode | Address Operand Format | Default Segment Register |
Direct/Displacement | [displacement/offset/label/whatever you call it] | DS |
Indirect | [BX] | DS |
Indirect | [BP] | SS |
Indirect | [SI] | DS |
Indirect | [DI] | DS (ES for string instructions) |
Indirect+Displacement | [BX+displacement] | DS |
Indirect+Displacement | [BP+displacement] | SS |
Indirect+Displacement | [SI+displacement] | DS |
Indirect+Displacement | [DI+displacement] | DS |
Double Indirect+Displacement | [BX][SI]+displacement | DS |
Double Indirect+Displacement | [BX][DI]+displacement | DS |
Double Indirect+Displacement | [BP][SI]+displacement | SS |
Double Indirect+Displacement | [BP][DI]+displacement | SS |
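To summarize: any address operand that involves BP uses SS by default; all other memory operands use DS by default, except the destination of string instructions ([DI]), which always uses ES.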
If you need to override the use of the default segment register, you can
explicitly specify the segment register to use, like so:
MOV AL, CS:MyTable[BP][SI]
or
MOV AL, [CS:BP+SI+MyTable]
whichever format is
supported by your assembler (TASM/MASM/WASM/NASM/etc).
The prefix, consisting of a segment register name and a colon, overrides the default
segment register with the one specified before the colon.
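The default-segment rules from the table, plus the override prefix, fit into a few lines of C. This is a hypothetical helper for illustration: it takes three facts about an operand (does it involve BP, is it the ES:DI destination of a string instruction, is there an override prefix) and returns the segment register that is actually used. It also encodes one detail not shown in the table: the ES:DI destination of string instructions cannot be overridden.

```c
/* Segment registers; SEG_NONE means "no override prefix". */
enum SegReg { SEG_NONE = -1, SEG_ES, SEG_CS, SEG_SS, SEG_DS };

/* uses_bp:        the operand involves BP ([BP], [BP+disp], [BP][SI]+disp, ...)
   is_string_dest: the operand is the ES:DI destination of a string instruction
   override:       segment named in a prefix such as CS:, or SEG_NONE */
static enum SegReg effective_segment(int uses_bp, int is_string_dest,
                                     enum SegReg override)
{
    if (is_string_dest)
        return SEG_ES;          /* ES:DI destination ignores overrides */
    if (override != SEG_NONE)
        return override;        /* explicit prefix wins */
    return uses_bp ? SEG_SS : SEG_DS;
}
```

For example, MOV AL, CS:MyTable[BP][SI] corresponds to effective_segment(1, 0, SEG_CS), which yields CS instead of the default SS.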
The following table summarizes the most common memory models employed by 16-bit real mode 80x86 compilers.
Near pointers (in real mode) are 16-bit pointers consisting only of a 16-bit offset. The default segment register (CS for code, DS/SS for data/stack) is assumed to be constant. Near pointers are small and fast and need less code to handle.
Far pointers (in real mode) are 32-bit pointers consisting of both 16-bit parts, segment and offset. Incrementing or decrementing a far pointer usually changes only its offset, not its segment part. Far pointers are big and slow and need more code to handle.
It is problematic to access objects or arrays bigger than 64 KB with either near or far pointers in HLL (C/C++) compilers, because this requires implementing far pointer arithmetic manually.
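Here's a minimal sketch of the difference. far_add does what typical far pointer arithmetic does (only the offset changes, so it wraps inside one 64 KB segment), while huge_add shows one way to do the "manual" arithmetic: go through the physical address and re-split it into a normalized segment:offset pair. This is roughly the idea behind the huge pointers some compilers offer, but the struct, the function names and the exact normalization here are assumptions for illustration.

```c
#include <stdint.h>

struct FarPtr {
    uint16_t seg;
    uint16_t off;
};

/* Typical far pointer arithmetic: only the offset changes, so stepping
   past offset 0xFFFF wraps back to the start of the same segment. */
static struct FarPtr far_add(struct FarPtr p, uint16_t n)
{
    p.off = (uint16_t)(p.off + n);
    return p;
}

/* Manual arithmetic for objects larger than 64 KB: compute the physical
   address, add, then re-split it so the offset is 0..15.
   Assumes the result stays below 1 MB. */
static struct FarPtr huge_add(struct FarPtr p, uint32_t n)
{
    uint32_t linear = ((uint32_t)p.seg << 4) + p.off + n;
    struct FarPtr r;
    r.seg = (uint16_t)(linear >> 4);
    r.off = (uint16_t)(linear & 0xFu);
    return r;
}
```

Stepping 1000H:FFFFH by one byte gives 1000H:0000H with far_add (back at the start of the segment) but 2000H:0000H with huge_add (the next byte in physical memory).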
Memory Model | Code Segment Size, Pointer Type | Data Segment Size, Pointer Type | Description |
Tiny | < 64 KB, near | < 64 KB, near | Use the tiny model for small applications. All four segment registers (CS, DS, ES, SS) are set to the same address, so you have a total of 64 KB for all of your code, data, and stack. Near pointers are always used. Tiny model programs can be compiled to .COM format. SS=ES=DS=CS, always. |
Small | < 64 KB, near | < 64 KB, near | Use the small model for average-size applications. The code and data segments are different and don't overlap, so you have 64 KB of code and 64 KB of data and stack. Near pointers are always used. SS=DS, usually. |
Medium | < 1 MB, far | < 64 KB, near | The medium model is best for large programs that don't keep much data in memory. Far pointers are used for code but not for data. As a result, data plus stack are limited to 64 KB, but code can occupy up to 1 MB. SS=DS, usually. |
Compact | < 64 KB, near | < 1 MB, far | Use the compact model if your code is small but you need to address a lot of data. The compact model is the opposite of the medium model: far pointers are used for data but not for code; code is then limited to 64 KB, while data has a 1 MB range. All functions are near by default and all data pointers are far by default. SS!=DS, usually. |
Large | < 1 MB, far | < 1 MB, far | Use the large model only for very large applications. Far pointers are used for both code and data, giving both a 1 MB range. All functions and data pointers are far by default. SS!=DS, usually. |
Huge | < 1 MB, far | < 1 MB, far | Use the huge model only for very large applications. Far pointers are used for both code and data. Turbo C++ normally limits the size of all static data to 64 KB; the huge memory model sets aside that limit, allowing data to occupy more than 64 KB. The huge model allows multiple data segments (each up to 64 KB in size), up to 1 MB for code, and 64 KB for stack. All functions and data pointers are assumed to be far. SS!=DS, usually. |
Let's limit ourselves to the two simplest memory models, which, I believe, are sufficient for most of our 16-bit applications: Tiny and Small. Both use near pointers and never change segment registers (except perhaps when we need to, e.g. when calling BIOS functions or managing memory outside our program). This makes things easy.
An application compiled with the Tiny memory model option is usually linked into the .COM file format, well known from the DOS world. This format is basically a raw/flat executable binary without any relocation information. Since a program in this format is assumed to always have CS=DS=ES=SS and to occupy 64 KB at most (i.e. one full 64 KB segment), relocating such a program in memory is simply a matter of choosing a segment of memory for the program, loading it there as-is, and setting the segment registers to point to this segment before jumping to the entry point of the program.
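The loading steps just described can be simulated in C: the 1 MB address space is an array and the registers are plain struct fields. In this sketch, load_flat_image and struct Regs are hypothetical. The origin parameter is 100H for DOS .COM files (the first 256 bytes of the segment hold the PSP), while a bootloader-style flat binary might use 0. The initial SP of FFFEH is an assumption (stack at the top of the segment).

```c
#include <stdint.h>
#include <string.h>

/* Simulated 1 MB real mode address space. */
static uint8_t com_mem[1ul << 20];

struct Regs {
    uint16_t cs, ds, es, ss, ip, sp;
};

/* Copy the image as-is to seg:origin and point every segment register at seg.
   Assumes origin + size <= 0x10000 and that the segment lies below 1 MB. */
static struct Regs load_flat_image(const uint8_t *image, size_t size,
                                   uint16_t seg, uint16_t origin)
{
    struct Regs r;
    memcpy(&com_mem[((uint32_t)seg << 4) + origin], image, size);
    r.cs = r.ds = r.es = r.ss = seg; /* CS=DS=ES=SS */
    r.ip = origin;                   /* entry point: first byte of the image */
    r.sp = 0xFFFE;                   /* assumption: stack at top of segment */
    return r;
}
```

No fixups of any kind are applied to the image: that is exactly why .COM programs are so easy to relocate.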
The good side of the Tiny/.COM model/format is that it's the simplest possible to relocate (basically, no relocation is needed), and you can always run your program in DOS, in a Windows DOS box, or in DOSEMU under Linux.
The Small memory model allows making a program whose code and data/stack segments can each be as big as 64 KB, because these segments are separate and CS!=DS, unlike Tiny/.COM. An application compiled with the Small memory model option is linked into the .EXE format, also well known from the DOS world. Applications in this format can also be relocated in memory, and this format (unlike .COM) keeps relocation information inside, as well as the entry point address (CS:IP) and the initial stack configuration (SS:SP), which vary from program to program (unlike .COM, where the entry point and stack configuration are fixed).
DOS allocates segment(s) of memory to load an .EXE, loads the .EXE image (which follows the .EXE header portion of the file) and performs address fixups inside the loaded image using the relocation information from the .EXE file header, thereby completing relocation. After that, DOS sets up the stack and jumps to the entry point.
I won't discuss .EXE relocation and its address fixups here; an explanation of this process can be found elsewhere (e.g. http://www.wotsit.org/). The .EXE format is best for 16-bit applications, and with it you can do more than with .COM (all the other memory models, Medium, Compact, Large and Huge, also naturally compile to .EXE).
As a full-time DOS/Windows user, I found the following popular (and now free!) compilers to be very well suited for compiling non-DOS 16-bit 80x86 real mode applications such as bootloaders and various tools:
Compiler | Free? | 16/32 Bit | Assembler | Linker | Librarian | Make | Debugger | IDE | Description |
Borland's Turbo C++ 1.01 | Free | 16-bit | TASM (not free); use NASM or other | TLINK | TLIB | MAKE | TD (not free); use ZD86, WD or other | TC | An old but very good 16-bit C/C++ compiler by Borland. Has a very nice IDE with an integrated debugger. Unfortunately, the free distribution doesn't include the assembler (TASM), and as a result you can't use inline assembly with this compiler. But if you're not afraid of writing external subroutines in assembly and linking them with the high-level C/C++ code, you can use NASM with Turbo C++. Together they go very well. The free distribution doesn't include a standalone 16-bit debugger (TD), but you may use ZD86, WD or some other. |
Open Watcom 1.x C++ compiler | Free | 16/32-bit | WASM | WLINK | WLIB | WMAKE | WD | IDE (for Win32) | A really good 16/32-bit C/C++ compiler: free, comes with everything (all development tools, documentation, examples, source code), runs under DOS, Windows and OS/2, and can be used to compile applications for those OSes. Open Watcom also includes a Fortran compiler. |
const int cvar = 1;
int var = 2;
int table[5] = {1, 2, 3, 4, 5};
char* birds[3] = {"robin", "finch", "wren"};
All these variables "cvar", "var", "table" and "birds" and the strings "robin", "finch", "wren" are placed in the _DATA segment.
int var1; int array1[400];
Command line:
TCC [options] file[s]
Useful options:
-K- | Default char type is signed (like int and long; plain char is very often signed by default) |
-1 | Generate 80186/80286 instructions |
-N- | Don't check stack overflow |
-k | Standard stack frame (arguments referenced thru SS:BP+disp) |
-ms | Set memory model to small (for the .COMs and .EXEs discussed above) |
-c | Compile only (TCC can also link by calling linker and produce executable) |
-S | Produce assembly output (useful for studying assembler and finding bugs) |
-wxxx | Warning control |
-v | Include source level debug information into output object files (useful for TD only) |
-y | Include source file line number debug information into output object files (useful for TD only) |
/m | Generate map file with public symbols (map files are useful to find link stage bugs, e.g. wrong addresses) |
/s | Include detailed map of segments into the map file (map files are useful to find link stage bugs, e.g. wrong addresses) |
/n | No default libraries linked |
/d | Warn if duplicate symbols in libraries |
/c | Case sensitive processing of symbols, e.g. name!=NAME |
/3 | Enable 32-bit processing (link 32-bit code if any encountered) |
/t | Create .COM file (or raw binary; for raw binaries the extension of the executable name must not be .COM, or the linker will try to make a .COM and will fail) |
/v | Include full debug information into executable file (useful for TD only) |
Note #1: to avoid any problems with linking, always specify the object file containing the entry point (e.g. c0s.obj or c0t.obj) first among the files to be linked.
Note #2: default libraries are:
c?.lib | Standard C library. ? denotes memory model symbol (s for tiny/small). |
math?.lib | Standard C library, mathematical functions. ? denotes memory model symbol (s for tiny/small). |
emu.lib | Emulation of 80x87 floating point unit. |
fp87.lib | Functions for 80x87 floating point unit. |
/C | Case-sensitive library |
+module | Add module (object file) to library |
-module | Remove module (object file) from library |
*module | Extract module (object file) from library |
-+module or +-module | Replace module (object file) in library |
-*module or *-module | Extract module (object file) and remove it from library |
MAKE doesn't require tab characters in the makefile where Unix make
normally would.
Note: if you intend to use this particular MAKE, make sure you don't have
another make available through the PATH environment variable.
-f obj | Will generate Intel/OMF .OBJ object outfile (compatible with Borland/Turbo C/C++/Pascal compilers) from the specified file. |
-F obj | Will generate Borland debug information (useful for TD only). |
-D | Predefines a macro. |
-U | Undefines a macro. |
char* birds[3] = {"robin", "finch", "wren"};
printf ("Hello world\n");
In the above example, the strings "Hello world\n", "robin", "finch", etc. appear in the CONST segment.
const int cvar = 1;
int var = 2;
int table[5] = {1, 2, 3, 4, 5};
char* birds[3] = {"robin", "finch", "wren"};
In the above example, the constant variable "cvar" is placed in the CONST2 segment, while "var", "table" and "birds" are placed in the _DATA segment. Finally, the strings "robin", "finch", "wren" are placed in the CONST segment.
int var1; int array1[400];
int cdecl fxn (int x);
will compile for stack-based argument
passing, and an underscore will be prepended to the C name,
e.g. _fxn. This (cdecl) calling and naming convention is exactly
the same as the one adopted by the Turbo C++ compiler; see
6.2 Calling Conventions and Register Conventions.
Item | URL | Notes |
Turbo C++ 1.01 | http://community.borland.com/museum/ | You will have to register at the Borland/Inprise web site to download the compiler. |
NASM 0.98+ | http://nasm.sourceforge.net/ | |
Open Watcom 1.2 or 1.5 | http://www.openwatcom.org/ | C/C++ & Fortran compilers. You will need these files for DOS/real mode/DPMI development: |
Borland C++ clib src | BCpp31CLibSrc.zip | Borland C++ 3.1 standard C/C++ library source code. Make your own using its API and helper functions. |
SDK for Turbo C++ & NASM | C16SDKTurboNASM.zip | |
SDK for Open Watcom C/C++ | C16SDKWatcom.zip | |
The work on this document is in progress. Meanwhile, try learning things from the compiler documentation and the source code provided here (already available).
If you want to contact me regarding this doc or anything else,
please post a message on Usenet, in
news:alt.os.development.
To post, use http://groups.google.com/
or http://news.individual.de/.
Alexei A. Frounze
July the 4th, 2004