Basic SAB80C166 System Planning Considerations

Clock Speeds

The SAB80C166 has a divide by two prescaler so that a 40MHz XTAL or oscillator is required to yield the maximum possible 20MHz CPU clock. The basic unit of time in the C166 core is a single state time, corresponding to 50ns at 20MHz. Most SAB80C166 instructions execute in two state times, i.e. 100ns.

The 'W'-suffixed parts have no divide by two and thus can use a 20MHz clock source directly. Note that these parts must be used with a crystal as they must have a 50% duty cycle clock, which cannot be guaranteed with an oscillator module.

If an oscillator module is used, it must have rise and fall times of <5ns. Such devices are readily available at a few pounds each (Farnell catalogue, page 867!).

As with other high clockrate CPUs (80C186XL-16 etc.), 40 and 32MHz crystals must be of the parallel resonance type. These can be tricky to find, so a cheap series resonant crystal used with a 74HC04 inverter could be an alternative to a full oscillator module.

The C167CR and C167SR are both of the 'W' type in that they can use a 20MHz crystal. They can also use a 5MHz crystal with the on-chip PLL performing a x4 frequency multiplication up to the usual 20MHz. The PLL is enabled by a pull-down resistor on P0L.
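The three clocking arrangements above reduce to simple arithmetic, which the following C sketch works through. The enum and function names are invented for illustration and do not correspond to any register or vendor header; the figures come straight from the text (divide by two, direct clock, x4 PLL, two state times for most instructions).

```c
#include <stdint.h>

/* Clock-generation options described in the text. Names are illustrative. */
enum clock_mode {
    CLOCK_PRESCALER_DIV2, /* standard parts: XTAL / 2             */
    CLOCK_DIRECT,         /* 'W' parts: XTAL used directly        */
    CLOCK_PLL_X4          /* C167CR/SR with PLL enabled: XTAL * 4 */
};

/* CPU clock in Hz for a given crystal or oscillator frequency. */
static uint32_t cpu_clock_hz(uint32_t xtal_hz, enum clock_mode mode)
{
    switch (mode) {
    case CLOCK_PRESCALER_DIV2: return xtal_hz / 2u;
    case CLOCK_PLL_X4:         return xtal_hz * 4u;
    case CLOCK_DIRECT:
    default:                   return xtal_hz;
    }
}

/* One state time in nanoseconds, i.e. one CPU clock period. */
static double state_time_ns(uint32_t cpu_hz)
{
    return 1.0e9 / (double)cpu_hz;
}

/* Most instructions execute in two state times. */
static double typical_instruction_ns(uint32_t cpu_hz)
{
    return 2.0 * state_time_ns(cpu_hz);
}
```

With a 40MHz crystal and the prescaler, or a 5MHz crystal and the PLL, cpu_clock_hz() gives 20MHz in both cases, for a 50ns state time and a 100ns typical instruction.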

Bus Modes

The basic philosophy behind the SAB80C166 bus interface is simplicity: by providing 8 and 16 bit non-multiplexed modes, it is possible to dispense with an address latch and provide just a ROM and RAM to make a working SAB80C166 system. With the C165 and C167, the integral 5 chip selects can make all decoder logic redundant. Thus, despite its 20-fold improvement in performance, a C166 digital design can be simpler than an 8031!

One of the SAB80C166's most useful features is its ability to support two different bus configurations in a single hardware design. Thus whilst the main code and data areas can be 16 bit non-multiplexed with zero waitstates for best speed, slow (and low cost) peripherals such as RTCs can be addressed with, for example, an 8-bit bus with 3 waitstates.

This secondary bus mode is controlled by the BUSCON1 and ADDRSEL1 registers, which set the bus mode and the address range (base address) respectively. In the C165 and C167, a further 3 secondary bus regions can be defined, each with its own external chip select (CS) pin for direct connection to peripheral devices' chip enable inputs.

Hint: It is essential when setting up the ADDRSEL and BUSCON registers to make sure that you configure the ADDRSELx before the corresponding BUSCONx! If you do not, the CPU will enable the new bus configuration for an undefined address range and a crash will ensue! Also note that whilst you can initialise these registers from C, any variables located in a region controlled by them will not be zeroed during startup.a66, as the corresponding chip select will not be low. It is therefore better to put your BUSCON and ADDRSEL set-ups just after Keil's SYSCON and BUSCON0 initialisations.
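The ordering rule in the hint can be sketched in C as below. This is a host-side illustration only: ADDRSEL1 and BUSCON1 are modelled as ordinary variables standing in for the real registers, the write log exists purely so the order can be checked off-target, and the register values are left to the caller because their bit encodings are device-specific and not given here. All helper names are hypothetical.

```c
#include <stdint.h>

/* Host-side stand-ins for the real registers. On target, ADDRSEL1 and
 * BUSCON1 would come from the toolchain's register definitions. */
static volatile uint16_t ADDRSEL1;
static volatile uint16_t BUSCON1;

/* Records the order in which the two registers were written. */
enum reg_id { REG_NONE, REG_ADDRSEL1, REG_BUSCON1 };
static enum reg_id write_log[2];
static int write_count;

static void log_write(enum reg_id id)
{
    if (write_count < 2)
        write_log[write_count++] = id;
}

/* Set up the secondary bus region: address window first, bus mode second.
 * Both values are supplied by the caller; no encoding is assumed here. */
static void init_secondary_bus(uint16_t addrsel_value, uint16_t buscon_value)
{
    ADDRSEL1 = addrsel_value;   /* 1: define the address window      */
    log_write(REG_ADDRSEL1);
    BUSCON1 = buscon_value;     /* 2: only now apply the bus mode    */
    log_write(REG_BUSCON1);
}
```

Keeping both writes inside one function makes it harder to reorder them by accident. The note about startup.a66 still applies: on a real system the equivalent writes may need to live in the assembler startup file rather than in C.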

Setting The Bus Mode

C166

This uses two dedicated pins (EBC0/1) to determine the bus mode coming out of reset. The state of these two pins is effectively written into the BTYP field in the SYSCON register. This default bus mode can be overridden by the user writing into the BTYP field.

C165/7

When coming out of reset, the C165 and C167 read the pattern of user-defined pull-down resistors on the upper and lower halves of the data bus to set the default bus mode. As on the C166, the bus mode is transferred into SYSCON where it can be modified by the user. The number of chip selects and the overall address range of the processor are also set via PORT0 pull-down resistors.

Memory Map Recommendations

All C166 derivatives come out of reset at address zero. In the case of the C167 devices, the Chip select 0 (CS0) line goes low to enable the program store (usually EPROM) before the first address is emitted.

In the majority of C166 systems, the CPU uses the bus mode set either by the EBC pins or the data bus pull-down resistors and execution begins from an EPROM. Due to the internal architecture of the C166, the area from 0xc000 to 0xf9ff (0xf5ff on the C165 and C167, whose on-chip RAM is larger) is best used for memory-mapped IO devices. This is because the CPU always sets DPP3 to point at 0xc000 and by using the "SDATA" data type in C, a very fast access can be made to this region. The area above this, up to 0xffff, is occupied by the on-chip RAM and SFR block and hence any memory devices placed here will be ignored.
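To see why DPP3 makes the 0xc000 region cheap to reach, it helps to look at how a 16-bit data address is turned into a physical one. The sketch below models that translation on a host machine: the top two address bits select one of four data page pointers and the low 14 bits are the offset within a 16K page. Only the DPP3 value is taken from the text; the values shown for DPP0 to DPP2 are assumed for illustration, and the array and function names are invented.

```c
#include <stdint.h>

/* Data page pointers. DPP3 = 3 selects the 16K page starting at 0xC000,
 * as described in the text. DPP0-DPP2 are assumed values for this sketch. */
static uint16_t dpp[4] = { 0u, 1u, 2u, 3u };

/* Translate a 16-bit data address to a physical address:
 * bits 15-14 pick the DPP, bits 13-0 are the offset in the 16K page. */
static uint32_t physical_address(uint16_t addr16)
{
    uint16_t page = dpp[addr16 >> 14];
    return ((uint32_t)page << 14) | (uint32_t)(addr16 & 0x3FFFu);
}
```

Because DPP3 is left alone, any 16-bit address from 0xC000 upwards always resolves to the same physical location with no page-pointer reload, which is what makes accesses to IO placed there so fast.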

A typical memory map might be:

EPROM:  0x0000-0x7FFF, 16 bit non-multiplexed
RAM:    0x8000-0xBFFF, 16 bit non-multiplexed
IO:     0xC000-0xF9FF, 8 bit multiplexed
RAM:    0x10000-0x3FFFF, 16 bit non-multiplexed

Of course, such a complicated map is not strictly necessary and is only given as an example!
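For planning purposes, the example map can be captured as a small C table with a lookup helper, as sketched below. The structure layout and function name are illustrative only; the address ranges and bus modes are copied from the map above.

```c
#include <stddef.h>
#include <stdint.h>

struct region {
    const char *name;
    uint32_t    first;        /* first address in the region */
    uint32_t    last;         /* last address, inclusive     */
    uint8_t     bus_bits;     /* 8 or 16                     */
    uint8_t     multiplexed;  /* 1 = multiplexed bus         */
};

/* The example memory map from the text. */
static const struct region memory_map[] = {
    { "EPROM", 0x00000ul, 0x07FFFul, 16, 0 },
    { "RAM",   0x08000ul, 0x0BFFFul, 16, 0 },
    { "IO",    0x0C000ul, 0x0F9FFul,  8, 1 },
    { "RAM",   0x10000ul, 0x3FFFFul, 16, 0 },
};

/* Return the region containing addr, or NULL if none does
 * (for example the on-chip RAM/SFR area or unmapped space). */
static const struct region *find_region(uint32_t addr)
{
    size_t i;
    for (i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++) {
        if (addr >= memory_map[i].first && addr <= memory_map[i].last)
            return &memory_map[i];
    }
    return NULL;
}
```

A table like this is handy on a host PC for sanity-checking linker output against the intended hardware map before committing to a board layout.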

In some systems, the CPU can have RAM at zero: all variants have a bootstrap loader built-in which can receive an application program via the serial port. This is often used to program FLASH EPROM during field program updates.

The bus interface supports hardware waitstates via asynchronous and synchronous READY signals - in the latter case, the CLKOUT pin can provide the synchronisation. In addition, HOLD is provided for use with external DMA controllers.

In cases where waitstates are required, the SYSCON register can be programmed to make the CPU automatically insert the required number of waitstates. Thus unlike the 80C186, for example, no external hardware is required to generate waitstates.

It should be noted that as the PEC pointers can only operate in the bottom 64K, it might be a good idea to place some RAM at 0x8000 in case any sizeable arrays are to be accessed via the PEC.
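A quick host-side check along the following lines can confirm that a buffer intended for PEC transfers sits entirely in the bottom 64K. The function name is hypothetical, and the check covers only the 64K limit mentioned above; it does not know which parts of that range are actually populated with RAM.

```c
#include <stdint.h>

/* Returns 1 if the whole buffer lies in the bottom 64K (0x0000-0xFFFF),
 * 0 otherwise. An empty buffer is treated as not usable. */
static int pec_reachable(uint32_t start, uint32_t length)
{
    if (length == 0u)
        return 0;
    if (start > 0xFFFFul)
        return 0;
    return length <= 0x10000ul - start;
}
```

For instance, a 16K array placed in the RAM at 0x8000 passes the check, whereas the same array located at 0x10000 does not.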

CPU Throughput And Bus Mode

The overall throughput of the SAB80C166 is strongly dependent on the bus mode used. As this is a 16 bit machine, the 16 bit modes are obviously the most efficient: most instructions are two bytes long and so a complete instruction can be fetched with just one access across the bus. With the non-multiplexed mode, no latching of the address is required and so the CPU can run to best effect. Branch instructions are 4 bytes long and so require two accesses to fetch.

The FLASH SAB80C166 has a 32 bit internal bus so that even branches can be fetched in a single access. This configuration consequently has the highest throughput of all.

In the 8 bit mode, a minimum of two bus accesses are required to fetch any instruction. Thus CPU performance is considerably reduced. However, the fundamentally efficient design of the C166 core means that even in 8 bit modes, the CPU throughput is still considerably higher than comparable CPUs such as the 68000 and 80C186.

With such a high clock speed, the access time of memory devices is crucial. At 40MHz clock (20MHz CPU), 70ns devices are required for zero waitstate operation. A single waitstate relaxes the required access time to 120ns but can reduce CPU performance by 30%. To allow the use of low cost 100ns EPROMs, however, it is best to reduce the clock speed to 32MHz and lose 20% performance rather than run with waitstates at 40MHz.
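The waitstate figures for a 20MHz CPU clock follow a simple pattern: 70ns at zero waitstates and 120ns with one, i.e. one extra 50ns state time. The C sketch below extends that pattern to further waitstates, which is an assumption rather than something stated above, and it applies to the 20MHz case only. The macro and function names are invented for the example.

```c
/* Figures quoted in the text for a 20MHz CPU clock. */
#define STATE_TIME_NS        50u  /* one state time                      */
#define ZERO_WAIT_ACCESS_NS  70u  /* device speed needed at 0 waitstates */

/* Slowest device access time that can be tolerated with the given number
 * of waitstates, assuming each waitstate adds exactly one state time. */
static unsigned allowed_access_ns(unsigned waitstates)
{
    return ZERO_WAIT_ACCESS_NS + waitstates * STATE_TIME_NS;
}

/* Smallest waitstate count that accommodates a given device. */
static unsigned waitstates_needed(unsigned device_access_ns)
{
    unsigned ws = 0u;
    while (allowed_access_ns(ws) < device_access_ns)
        ws++;
    return ws;
}
```

On this model a 100ns EPROM needs one waitstate at 20MHz, which is exactly the case where dropping to a 32MHz crystal is suggested as the cheaper trade.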

The change in CPU performance with bus mode, as observed on an embedded C test program is summarised below:

Bus Mode          Run Time (ms)     Normalised
----------------------------------------------
16 bit nonmux     19.355            1.00
16 bit mux        24.424            1.26
8 bit nonmux      37.328            1.92
8 bit mux         46.545            2.40
80C52 12MHz       350.000           18.00

Notes: Taken at 20MHz, 0 wait-states on a CB-step SAB80C166. The test program did not include any long operations so the performance advantage over the 8052 is reduced. If your application uses int or long maths, then expect > 20 times advantage.
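The normalised run times in the table can be turned into a throughput figure, which is often easier to reason about when trading speed for cost. The C sketch below does the conversion; the function names are illustrative, and the input is simply the "Normalised" column from the table.

```c
/* Relative throughput for a normalised run time (1.00 = fastest mode). */
static double relative_throughput(double normalised_run_time)
{
    return 1.0 / normalised_run_time;
}

/* Throughput lost against the fastest mode, in percent. */
static double throughput_loss_percent(double normalised_run_time)
{
    return (1.0 - relative_throughput(normalised_run_time)) * 100.0;
}
```

Applied to the table, the 16 bit multiplexed mode gives up roughly 21% of the throughput, the 8 bit non-multiplexed mode roughly 48% and the 8 bit multiplexed mode roughly 58%.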

Implications Of Bus Mode/Trading Port Pins For IO

The fastest bus mode, 16 bit non-multiplexed, is also inevitably the greediest in terms of processor pins. In this configuration, both port 0 and 1 are solely concerned with the data and address bus, in that order. This allows a very simple memory system as the address pins of the EPROM are wired to P1 and the data pins to P0. As this is not multiplexed, no 74x573 latch is required, although the ALE pin will continue to operate.

If the number of IO pins is critical, it is possible to free up the 16 pins of P1 by going to a multiplexed bus. The 16 bit variant is to be preferred for the reasons given above. Now, P0 will emit the address, followed by ALE going low to latch it into the 74x573. The data is then emitted to complete the access. This will slow the CPU down somewhat but, by careful software design, the effects can be minimised. One step might be to place all frequently accessed data into the on-chip ("idata") RAM, where the multiplexing will have no effect.

The 16 bit modes do, of course, require 16 bit memory devices or at least HIGH and LOW 8 bit memories. In the latter case, the BHE (byte-high enable) and A0 pins can be used to select between high and low devices.
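The byte-lane selection can be expressed as a small decode function, sketched below in C. It assumes that BHE is active low and that the low device sits on the lower half of the data bus (even addresses) with the high device on the upper half (odd addresses); neither detail is stated above, so check them against the data sheet. The structure and function names are invented for the example.

```c
struct lane_enables {
    int low_device;   /* 1 = low-byte (even address) memory selected */
    int high_device;  /* 1 = high-byte (odd address) memory selected */
};

/* a0:    state of address line A0 (0 or 1)
 * bhe_n: state of the BHE pin, assumed active low (0 = asserted) */
static struct lane_enables decode_lanes(int a0, int bhe_n)
{
    struct lane_enables e;
    e.low_device  = (a0 == 0);     /* even address: low byte involved */
    e.high_device = (bhe_n == 0);  /* BHE asserted: high byte involved */
    return e;
}
```

Under these assumptions a word access (A0 low, BHE asserted) enables both devices, while a byte access enables only the one holding the addressed byte.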

If cost is important, the 8 bit non-multiplexed mode allows simple (and single) 8-bit devices to be used without an address latch. This mode is very popular - remember, the basic CPU throughput is so high on SAB80C166 family devices that some performance can be sacrificed to reduce cost.

Finally, the 8-bit multiplexed mode is really only provided for accessing 8051-type peripheral devices. Although run time in the test above is around 2.4 times that of the 16 bit non-multiplexed mode, a C166 running at 12MHz will still outperform an 8031 device by a factor of 5-10, especially if 16 and 32 bit operations are frequent.

