<html> <head> <title> Project BB - PI and IO Bus Specification </title> </head> <body bgcolor="#ffffff" text="#000000" link="#004868" vlink="#986424" alink="#00ffff"> <table width="100%" cellpadding=2 cellspacing=0 border=0> <tr> <td bgcolor="#e0e0e0"> Project BB - PI and IO Bus Specification </td> <td align=right bgcolor="#f0c0c0" width="20%"> <font color=red> <b>Broad<i>On</i> confidential</b> </font> </td> </tr> </table> <p> <b><u> Overview </u></b> <p> The N64 PI was the module that converted the CBUS/DBUS interface into the AD16 peripheral bus. The main function of the AD16 bus was to connect burst storage devices (BSD) and other external io devices located on a cartridge to the RCP chip. Two domains (address spaces) allowed different hardware timing parameters to accommodate a variety of devices and access speeds, from EEPROM, SRAM and FLASH to burst ROM. The PI dma controller allowed for fast bulk data movement. <p> In BB, the cartridge has been replaced by NAND flash devices on a removable memory module and other on-board io devices. The memory module contains one soldered-down NAND flash and two slots for user expansion. Conceptually, the NAND flash on the memory module is considered installed memory and not removable storage. Hardware has to implement logic that maps regions of these flash devices into what used to be the cartridge address space. A direct access mechanism is also needed for programming and filesystem support. <p> The new PI implements a new io bus that connects the flash memory module, the local button inputs and the ide devices to the PI module. The io bus protocol is based on the IDE protocol. <p> <b><u> PI Architecture </u></b> <p> The PI consists of system side controllers, data buffers and device side controllers. A pio controller and a dma controller control the system side of the data buffers and the PI registers. 
The device side comprises the io bus controller, a NAND flash controller, an AES decryption controller and an ATB (address translation) controller. The dma controller is backwards compatible with the N64 design. Config and timing registers in the device controllers can accommodate a variety of external devices and access speeds. The AES controller supports decryption only. The NAND flash controller implements on-the-fly ECC based on the smart media standard. Single-bit error detection and correction are done in hardware. An internal SRAM is the central data exchange point of all the units. The SRAM arbiter controls the individual units based on the PI register settings. The DMA Processor splits a traditional dma request into individual controls for each of the units. <p> <img src="pi-block.gif" width=640 height=450 border=0> <p> The PI responds to cbus requests falling into the 0x046x_xxxx register space and the io spaces 1 and 2, see <a href="addr-space.html"> BB Address Spaces </a>. Io spaces 1 and 2 are just different windows into the same io address space. They were kept for backwards compatibility with the N64 address map. <p> <b><u> IO Bus Interface </u></b> <p> The IO bus is a 16-bit wide multiplexed address/data bus. Two sets of control signals allow for a direct interface to NAND flash and to generic 16-bit IDE devices. The generic controls support up to four pio devices and one dma device. Programmable timing parameters support a wide range of access speeds. The table below lists all the io bus signals and their functions. Separate NAND flash controls are required for hot-plug support. 
<dl> <dd> <table cellspacing=2 cellpadding=2 border=1> <tr> <td> Signal </td> <td> Type </td> <td> Description </td> </tr> <th colspan=3> generic io bus controls </th> <tr> <td> IORST </td> <td> !O </td> <td> io system reset; <br> asserted automatically during chip reset (sysrst); <br> full control by software through PI_IDE_CONF register; <br> </td> </tr> <tr> <td> IOAD[15:0] </td> <td> IO </td> <td> shared address/data bus; <br> NAND flash devices connect to IOAD[15:8]; <br> button inputs connect to IOAD[15:0] through a driver; <br> ide devices connect to IOAD[15:0]; <br> </td> </tr> <tr> <td> IOALE </td> <td> O </td> <td> io address latch enable; <br> signals an address phase on IOAD[15:0]; <br> not active for flash and button accesses; <br> </td> </tr> <tr> <td> IOR </td> <td> !O </td> <td> io read pulse; <br> active-low signal controlling a pio read with IOCS[x]; <br> active-low signal controlling a dma read with IOACK; <br> setup, active and release times are programmable; <br> not active for flash and button accesses; <br> </td> </tr> <tr> <td> IOW </td> <td> !O </td> <td> io write pulse; <br> active-low signal controlling a pio write with IOCS[x]; <br> active-low signal controlling a dma write with IOACK; <br> setup, active and release times are programmable; <br> not active for flash and button accesses; <br> </td> </tr> <tr> <td> IOCS[3:0] </td> <td> !O </td> <td> io device selects; <br> signaling a pio read or write to an io device; <br> never active at the same time with IOACK; <br> IOCS[2] is reserved for button driver enable; <br> IOCS[3] is reserved for remote debug support; <br> not active for flash accesses; <br> </td> </tr> <tr> <td> IOREQ </td> <td> I </td> <td> io dma request; <br> io device requests a dma cycle to move data; <br> </td> </tr> <tr> <td> IOACK </td> <td> !O </td> <td> io dma acknowledge; <br> signaling a dma read or write to an io device; <br> </td> </tr> <tr> <td> IOINTR </td> <td> I </td> <td> io device interrupt; <br> external 
device signals interrupt; <br> </td> </tr> <tr> <td> GPIO[3:0] </td> <td> IO </td> <td> general purpose io; <br> inputs or outputs depending on software configuration; <br> </td> </tr> <th colspan=3> NAND flash controls </th> <tr> <td> FCE[3:0] </td> <td> !O </td> <td> flash chip selects </td> </tr> <tr> <td> FALE </td> <td> O </td> <td> flash address latch enable </td> </tr> <tr> <td> FCLE </td> <td> O </td> <td> flash command latch enable </td> </tr> <tr> <td> FRE </td> <td> !O </td> <td> flash read pulse </td> </tr> <tr> <td> FWE </td> <td> !O </td> <td> flash write pulse </td> </tr> <tr> <td> FWP </td> <td> !O </td> <td> flash write-protect signal </td> </tr> <tr> <td> FRYBY </td> <td> !I </td> <td> flash ready/busy; <br> needs external pullup; <br> </td> </tr> <tr> <td> MD </td> <td> !I </td> <td> flash module detect; <br> pulled high on chip side; <br> driven low by module; <br> </td> </tr> </table> </dl> <p> See the <a href="io-spec.html">IO Bus Specification</a> for details on the io bus physical and logical specifications. <p> <b><u> Example of External IO Bus Connections </u></b> <p> The IO bus has been designed so that the BB player can be implemented with a minimal number of external components. Below diagram shows the external connections for the BB player. The BB player connects three NAND flash devices on a memory module and the local button input port to the io bus. The GPIO pins are used to control LEDs. <p> <img src="pi-extdev.gif" width=640 height=240 border=0> <p> <b><u> N64 Compatibility Issues </u></b> <p> The BB hardware must implement the same io address spaces that the cartridge devices used and map them to NAND flash transparently. Regions of the NAND flash space must be remapped into a single contiguous space to give the appearance of game ROM. Additionally, the hardware has to deal with bad flash blocks, decryption and authentication on the fly. 
All the details of flash read access must be hidden, the increased initial latency of NAND flash in particular. Writing to the flash is considered a system operation and must go through a new API which deals with some sort of flash file system. The diagram below illustrates an example mapping of four NAND regions of different sizes into a contiguous 10MByte io space 1 region. <img src="pi-remap.gif" width=640 height=430 border=0> <p> <b><u> PI Buffer RAM </u></b> <p> The buffer sram is the central exchange point of all data flowing through the pi. The buffer must be at least the width of the dbus to satisfy dbus dma requirements. The actual width is more than the 64 dbus bits because the atb entries require a few more bits. The minimal write quantity is half the width of the buffer. This means that the minimal write width for pio accesses is 32 bits. <p> <table cellpadding=2 cellspacing=2 border=1> <tr> <td> CPU Address </td> <td> SRAM Index </td> <td> Size </td> <td> Function </td> </tr> <tr> <td> 0x0461_0000 ... 0x0461_01ff <br> 0x0461_0400 ... 0x0461_040f </td> <td> 0..63 <br> 128..129 </td> <td> 512+16 bytes </td> <td> Data Buffer 0 <br> flash controller uses all 528 bytes <br> ide controller uses the first 512 bytes <br> aes controller decrypts data in place <br> </td> </tr> <tr> <td> 0x0461_0200 ... 0x0461_03ff <br> 0x0461_0410 ... 0x0461_041f </td> <td> 64..127 <br> 130..131 </td> <td> 512+16 bytes </td> <td> Data Buffer 1 <br> flash controller uses all 528 bytes <br> ide controller uses the first 512 bytes <br> aes controller decrypts data in place <br> </td> </tr> <tr> <td> 0x0461_0420 ... 0x0461_04cf </td> <td> 132..153 </td> <td> 44 x32 bits </td> <td> AES expanded key <br> read by aes controller <br> </td> </tr> <tr> <td> 0x0461_04d0 ... 0x0461_04df </td> <td> 154..155 </td> <td> 4 x32 bits </td> <td> AES CBC init vector <br> read for references to device address 0 <br> </td> </tr> <tr> <td> 0x0461_0500 ... 
0x0461_07ff </td> <td> 160..255 </td> <td> 96x2 x40 bits </td> <td> ATB entries <br> read only by atb controller <br> two entries per word are looked up in parallel <br> </td> </tr> </table> <p> <b><u> Compatible PI Registers </u></b> <p> The PI can process only one request at a time, either a dma through the dma control registers, or a pio through the cartridge spaces. In N64, the cartridge spaces were divided into domain 1 and 2 to support different device timings. In BB, no such division is necessary. Hence, there is no timing difference and both domains behave the same. The traditional dma forces 2-byte alignment on PI_DRAM_ADDR and PI_DEV_ADDR. However, bit 0 is writable in both registers. <p> <table cellpadding=2 cellspacing=2 border=1> <tr> <td> Name </td> <td> Address </td> <td> Data </td> <td> Read/Write </td> <td> Reset </td> <td> N64 </td> <td> Description </td> </tr> <tr> <td> PI_DRAM_ADDR </td> <td> 0x0460_0000 </td> <td> [25:0] </td> <td> RW </td> <td> x </td> <td> y </td> <td> DRAM dma address; <br> limited to lower 16MB in x36 space; <br> limited to lower 32MB in x64 space; <br> dma ignores bit 0 for 2-byte alignment; <br> register aligns to first 0x80 boundary, <br> then increments by 0x80 for each burst; </td> </tr> <tr> <td> PI_DEV_ADDR </td> <td> 0x0460_0004 </td> <td> [29:0] </td> <td> RW </td> <td> x </td> <td> y </td> <td> flash device address; <br> supports up to 1GB of flash space; <br> used directly by flash controller; <br> dma ignores bit 0 for 2-byte alignment; <br> passed through address translation for dma; <br> </td> </tr> <tr> <td> PI_DMA_READ </td> <td> 0x0460_0008 </td> <td> [23:0] </td> <td> RW </td> <td> x </td> <td> y </td> <td> dma memory -> flash; <br> write sets dma length and traps; <br> register must be written with size - 1; <br> writes to the flash module with the pi dma are not supported; it is up to software to emulate the desired function; PI_DRAM_ADDR and PI_DEV_ADDR contain the respective addresses; PI_DMA_READ contains 
size + PI_DRAM_ADDR[2:0]; </td> </tr> <tr> <td> PI_DMA_WRITE </td> <td> 0x0460_000c </td> <td> [23:0] </td> <td> RW </td> <td> x </td> <td> y </td> <td> dma flash -> memory; <br> write sets dma length and starts dma; <br> register must be written with size - 1; <br> a length of 0 completes immediately without moving data; <br> PI_DMA_WRITE is adjusted to size + PI_DRAM_ADDR[2:0], then decrements by 8 for each doubleword moved to memory; </td> </tr> <tr> <td> PI_DMA_STATUS </td> <td> 0x0460_0010 </td> <td> [31:2] </td> <td> W </td> <td> - </td> <td> - </td> <td> write data are ignored; </td> </tr> <tr> <td> </td> <td> </td> <td> [1] </td> <td> W CLR_INTR </td> <td> - </td> <td> y </td> <td> writing 1 clears the pi dma interrupt; </td> </tr> <tr> <td> </td> <td> </td> <td> [0] </td> <td> W DMA_STOP </td> <td> - </td> <td> y </td> <td> writing 1 aborts the current operation and resets the dma; </td> </tr> <tr> <td> </td> <td> </td> <td> [31:4] </td> <td> R </td> <td> 0 </td> <td> y </td> <td> reads return 0; </td> </tr> <tr> <td> </td> <td> </td> <td> [3] </td> <td> R INTR </td> <td> 0 </td> <td> y </td> <td> 1 = interrupt; <br> set upon completion of pi dma; <br> not set on abort of dma operation; <br> cleared by writing bit 1; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [2] </td> <td> R ERROR </td> <td> 0 </td> <td> y </td> <td> 1 = error; <br> set upon new request to a busy pi dma; <br> cleared by writing bit 0; <br> this bit does not prevent another dma after the previously busy dma has finished; </td> </tr> <tr> <td> </td> <td> </td> <td> [1] </td> <td> R IO_BUSY </td> <td> 0 </td> <td> y </td> <td> 1 = io busy; <br> set by a pio access to cartridge space; <br> cleared when pi is done; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [0] </td> <td> R DMA_BUSY </td> <td> 0 </td> <td> y </td> <td> 1 = dma busy; <br> set by a write to PI_DMA_READ, PI_DMA_WRITE, PI_BREAD or PI_BWRITE that starts a dma; <br> cleared when dma is complete; </td> </tr> <tr> <td> PI_DOM1_LAT 
</td> <td> 0x0460_0014 </td> <td> [7:0] </td> <td> RW </td> <td> 0x00 </td> <td> y </td> <td> was domain 1 latency timing register; <br> reserved, writes ignored, reads return 0; </td> </tr> <tr> <td> PI_DOM1_PWD </td> <td> 0x0460_0018 </td> <td> [7:0] </td> <td> RW </td> <td> 0x00 </td> <td> y </td> <td> was domain 1 pulse width timing register; <br> now reserved, writes ignored, reads return 0; </td> </tr> <tr> <td> PI_DOM1_PGS </td> <td> 0x0460_001c </td> <td> [3:0] </td> <td> RW </td> <td> 0 </td> <td> y </td> <td> was domain 1 page size register; <br> now reserved, writes ignored, reads return 0; </td> </tr> <tr> <td> PI_DOM1_RLS </td> <td> 0x0460_0020 </td> <td> [1:0] </td> <td> RW </td> <td> 0x0 </td> <td> y </td> <td> was domain 1 release timing register; <br> reserved, writes ignored, reads return 0; </td> </tr> <tr> <td> PI_DOM2_LAT </td> <td> 0x0460_0024 </td> <td> [7:0] </td> <td> RW </td> <td> 0x00 </td> <td> y </td> <td> was domain 2 latency timing register; <br> reserved, writes ignored, reads return 0; </td> </tr> <tr> <td> PI_DOM2_PWD </td> <td> 0x0460_0028 </td> <td> [7:0] </td> <td> RW </td> <td> 0x00 </td> <td> y </td> <td> was domain 2 pulse width timing register; <br> reserved, writes ignored, reads return 0; </td> </tr> <tr> <td> PI_DOM2_PGS </td> <td> 0x0460_002c </td> <td> [3:0] </td> <td> RW </td> <td> 0 </td> <td> y </td> <td> was domain 2 page size register; <br> reserved, writes ignored, reads return 0; </td> </tr> <tr> <td> PI_DOM2_RLS </td> <td> 0x0460_0030 </td> <td> [1:0] </td> <td> RW </td> <td> 0x0 </td> <td> y </td> <td> was domain 2 release timing register; <br> reserved, writes ignored, reads return 0; </td> </tr> <tr> <td> PI_IO_READ </td> <td> 0x0460_0034 </td> <td> [31:0] </td> <td> RW </td> <td> x </td> <td> y </td> <td> XXX </td> </tr> <tr> <td> PI_IO_WRITE </td> <td> 0x0460_0038 </td> <td> [31:0] </td> <td> RW </td> <td> x </td> <td> y </td> <td> XXX </td> </tr> </table> <p> <b><u> AES Controller </u></b> <p> The AES 
decryption core strictly operates on data in the PI buffer. The key length is fixed at 128 bits. Software has to expand the 128-bit key into 44 32-bit words and store them in the buffer. The AES controller can be used independently of the other units to decrypt any data. Data for decryption can be stored anywhere in the buffer, except in the area of the expanded key or areas used by other active units. Either pio accesses or the system side dma can be used to get data in and out of the pi buffer. All data must be aligned to 16-byte (128-bit) boundaries. Completion of an operation can be determined by polling or by an interrupt. The aes controller has the highest arbitration priority after the system dma. The decryption of one 128-bit word takes 52 sysclk cycles, plus clocks stolen by the system dma. There is no hardware timeout mechanism and no errors. The dma processor automatically controls the AES controller for the traditional pi dma. The new PI registers below are selectively accessible. 
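<p> The key expansion that software has to perform before loading the buffer can be sketched in C as follows. This is a minimal sketch of the standard FIPS-197 AES-128 key schedule, not the project's own code. The function names, the big-endian packing of key bytes into 32-bit words and the word order are assumptions, because this document does not state the exact layout the decryption core expects in the expanded key area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo the AES polynomial 0x11b. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return p;
}

/* AES S-box entry computed on the fly: inverse in GF(2^8), then affine map. */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 1;
    for (int i = 0; i < 254; i++)      /* x^254 is the inverse; 0 maps to 0 */
        inv = gf_mul(inv, x);
    uint8_t s = inv;
    for (int r = 1; r <= 4; r++)       /* xor with four left rotations */
        s ^= (uint8_t)((inv << r) | (inv >> (8 - r)));
    return (uint8_t)(s ^ 0x63);
}

/* Apply the S-box to each of the four bytes of a word. */
static uint32_t sub_word(uint32_t w)
{
    return ((uint32_t)sbox((uint8_t)(w >> 24)) << 24) |
           ((uint32_t)sbox((uint8_t)(w >> 16)) << 16) |
           ((uint32_t)sbox((uint8_t)(w >> 8)) << 8) |
            (uint32_t)sbox((uint8_t)w);
}

/* Expand a 128-bit key into the 44 round-key words defined by FIPS-197.
 * Assumption: key bytes are packed big-endian into each 32-bit word. */
static void aes128_expand_key(const uint8_t key[16], uint32_t w[44])
{
    assert(key != NULL && w != NULL);
    for (int i = 0; i < 4; i++)
        w[i] = ((uint32_t)key[4 * i] << 24) | ((uint32_t)key[4 * i + 1] << 16) |
               ((uint32_t)key[4 * i + 2] << 8) | (uint32_t)key[4 * i + 3];
    uint8_t rcon = 0x01;
    for (int i = 4; i < 44; i++) {
        uint32_t t = w[i - 1];
        if (i % 4 == 0) {
            /* rotate left by one byte, substitute, then add round constant */
            t = sub_word((t << 8) | (t >> 24)) ^ ((uint32_t)rcon << 24);
            rcon = gf_mul(rcon, 0x02);
        }
        w[i] = w[i - 4] ^ t;
    }
}
```

<p> The 44 resulting words would then be stored in the expanded key area of the pi buffer by pio writes or by the system side dma. Whether the decryption core wants the round keys in this order or in a reversed or transformed form is not specified here and would have to be checked against the hardware.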
<p> <table cellpadding=2 cellspacing=2 border=1> <tr> <td> Name </td> <td> Address </td> <td> Data </td> <td> Read/Write </td> <td> Reset </td> <td> N64 </td> <td> Description </td> </tr> <tr> <td> PI_AES_CTRL </td> <td> 0x0460_0050 </td> <td> [31] </td> <td> W START </td> <td> 0 </td> <td> n </td> <td> AES start/stop bit; <br> writing 1 starts a new AES operation; <br> writing 0 stops the current operation; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [31] </td> <td> R BUSY </td> <td> 0 </td> <td> n </td> <td> AES busy bit; <br> 0=idle, 1=busy; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [30] </td> <td> RW INTR </td> <td> 0 </td> <td> n </td> <td> AES interrupt flag <br> writing 1 enables interrupt for this request; <br> read bit returns status of AES interrupt; <br> interrupt is cleared by START=0 and INTR=0; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [21:16] </td> <td> RW SIZE </td> <td> x </td> <td> n </td> <td> data size in 16-byte blocks - 1; <br> size of 0 means 16 bytes; <br> </td> </tr> </tr> <tr> <td> </td> <td> </td> <td> [15:9] </td> <td> RW DA </td> <td> x </td> <td> n </td> <td> 16-byte word offset of data in pi buffer; <br> aligned so that pi buffer index can be written; <br> data must be aligned on 16-byte boundaries; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [7:1] </td> <td> RW IA </td> <td> x </td> <td> n </td> <td> 16-byte word offset of CBC init vector; <br> aligned so that pi buffer index can be written; <br> vector must be aligned on 16-byte boundaries </td> </tr> <tr> <td> </td> <td> </td> <td> [0] </td> <td> RW HC </td> <td> x </td> <td> n </td> <td> cbc hardware chaining; </br> 0 use init vector for aes operation; <br> 1 use state left in aes core for block chaining; </td> </tr> <tr> <td> PI_AES_EKEY </td> <td> 0x0461_0420 <br> ... 
<br> 0x0461_04cf </td> <td> [31:0] </td> <td> RW </td> <td> x </td> <td> n </td> <td> expanded 128-bit AES key; <br> 44 words in length; <br> fixed at pi buffer index 132..153; <br> must be initialized by software; <br> </td> </tr> <tr> <td> PI_AES_INIT </td> <td> 0x0461_04d0 <br> ... <br> 0x0461_04df </td> <td> [31:0] </td> <td> RW </td> <td> x </td> <td> n </td> <td> AES CBC init vector; <br> only for pi dma references to device address 0; <br> fixed at pi buffer index 154..155; <br> </td> </tr> </table> <p> Before releasing the pi to a game application, software has to set up the following bits in the AES controller:
<pre>
    PI_AES_CTRL    START = 0; INTR = 0; SIZE, DA, IA, HC are don't care;
    PI_AES_EKEY    expanded key;
    PI_AES_INIT    cbc init vector;
</pre>
<p> <b><u> Flash Controller </u></b> <p> The flash controller moves data between the flash interfaces and the pi data buffers. Up to four flash devices are supported. The flash controller is triggered by writes to the flash control register PI_FLASH_CTRL or by the dma processor. Flash operations are limited to one block, i.e., there is no automatic crossing of pages for a continuation of the transfer. The interface has been kept generic for support of new NAND flash commands in future devices. <br> The new PI registers below are selectively accessible. 
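<p> As an illustration of how a control word for PI_FLASH_CTRL might be assembled, the following C sketch packs the fields at the bit positions given in the register table of this section. The struct and the helper name flash_ctrl_pack are hypothetical, and the ADPH value used in the usage note is only a placeholder, since the number of address phases depends on the flash device.

```c
#include <assert.h>
#include <stdint.h>

/* Field values for a write to PI_FLASH_CTRL; names follow the register table. */
typedef struct {
    unsigned start; /* bit 31: 1 starts a new flash operation */
    unsigned intr;  /* bit 30: 1 enables the interrupt for this request */
    unsigned wdph;  /* bit 29: write data phase */
    unsigned rdph;  /* bit 28: read data phase */
    unsigned adph;  /* bits [27:24]: address phase selects */
    unsigned cmd;   /* bits [23:16]: flash command byte */
    unsigned wrdy;  /* bit 15: wait for ready */
    unsigned buf;   /* bit 14: pi data buffer to use */
    unsigned dev;   /* bits [13:12]: flash device id */
    unsigned ecc;   /* bit 11 (write): enable ecc */
    unsigned mcmd;  /* bit 10 (write): multi-cycle command */
    unsigned size;  /* bits [9:0]: size of data phase in bytes */
} flash_ctrl_fields;

/* Pack the fields into the 32-bit value written to PI_FLASH_CTRL. */
static uint32_t flash_ctrl_pack(const flash_ctrl_fields *f)
{
    assert(f->start <= 1 && f->intr <= 1 && f->wdph <= 1 && f->rdph <= 1);
    assert(f->adph <= 0xf && f->cmd <= 0xff && f->wrdy <= 1 && f->buf <= 1);
    assert(f->dev <= 3 && f->ecc <= 1 && f->mcmd <= 1 && f->size <= 0x3ff);
    return ((uint32_t)f->start << 31) | ((uint32_t)f->intr << 30) |
           ((uint32_t)f->wdph << 29)  | ((uint32_t)f->rdph << 28) |
           ((uint32_t)f->adph << 24)  | ((uint32_t)f->cmd << 16)  |
           ((uint32_t)f->wrdy << 15)  | ((uint32_t)f->buf << 14)  |
           ((uint32_t)f->dev << 12)   | ((uint32_t)f->ecc << 11)  |
           ((uint32_t)f->mcmd << 10)  | (uint32_t)f->size;
}
```

<p> With RDPH = 1, WRDY = 1, ECC = 1, SIZE = 0x3ff, CMD = 0x00 and a placeholder ADPH of 0x7, the helper yields 0x17008bff, which matches the kind of idle read configuration this section asks software to leave in the register before handing the pi to a game.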
<p> <table cellpadding=2 cellspacing=2 border=1> <tr> <td> Name </td> <td> Address </td> <td> Data </td> <td> Read/Write </td> <td> Reset </td> <td> N64 </td> <td> Description </td> </tr> <tr> <td> PI_FLASH_ADDR </td> <td> 0x0460_0070 </td> <td> [29:0] </td> <td> RW ADDR </td> <td> x </td> <td> n </td> <td> address bits sent out in address phases; </td> </tr> <tr> <td> PI_FLASH_CTRL </td> <td> 0x0460_0048 </td> <td> [31] </td> <td> W START </td> <td> 0 </td> <td> n </td> <td> flash start/stop bit; <br> writing 1 starts a new flash operation; <br> writing 0 stops the current operation; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [31] </td> <td> R BUSY </td> <td> 0 </td> <td> n </td> <td> flash busy bit; <br> 0=idle, 1=busy; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [30] </td> <td> RW INTR </td> <td> 0 </td> <td> n </td> <td> flash interrupt flag; <br> writing 1 enables interrupt for this request; <br> read bit returns status of flash interrupt; <br> interrupt is cleared by START=0 and INTR=0; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [29] </td> <td> RW WDPH </td> <td> x </td> <td> n </td> <td> 1 write data phase; </td> </td> </tr> <tr> <td> </td> <td> </td> <td> [28] </td> <td> RW RDPH </td> <td> x </td> <td> n </td> <td> 1 read data phase; </td> </tr> <tr> <td> </td> <td> </td> <td> [27:24] </td> <td> RW ADPH </td> <td> x </td> <td> n </td> <td> address phase selects; <br> depends on flash device; <br> [27] enables the A25.. 
phase; <br> [26] enables the A17..24 phase; <br> [25] enables the A9..16 phase; <br> [24] enables the A0..7 phase; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [23:16] </td> <td> RW CMD </td> <td> x </td> <td> n </td> <td> flash command byte; <br> 0x00 read data/spare; <br> 0x10 program commit; <br> 0x50 read spare only; <br> 0x60 block erase; <br> 0x70 read status; <br> 0x80 page program; <br> 0x90 read ID; <br> 0xd0 erase commit; <br> 0xff reset; <br> PI_FLASH_ADDR[8] is ORed with cmd[0] to form the command that is sent to the flash; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [15] </td> <td> RW WRDY </td> <td> x </td> <td> n </td> <td> wait ready; <br> 1 require assertion and wait for deassertion of RY/BY; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [14] </td> <td> RW BUF </td> <td> x </td> <td> n </td> <td> buffer to use; <br> 0 = use buffer 0; <br> 1 = use buffer 1; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [13:12] </td> <td> RW DEV </td> <td> x </td> <td> n </td> <td> flash device id; <br> 0..2 on flash module; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [11] </td> <td> W ECC </td> <td> x </td> <td> n </td> <td> enable ecc and single-bit error correction; </td> </tr> <tr> <td> </td> <td> </td> <td> [11] </td> <td> R SBERR </td> <td> x </td> <td> n </td> <td> single-bit error detected; <br> cleared on start of new flash operation; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [10] </td> <td> W MCMD </td> <td> x </td> <td> n </td> <td> multi-cycle command; <br> set for first phase of multi-cycle commands, <br> such as block erase and program; <br> next command must be to same device; <br> must be 0 for last of the multi-cycle commands; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [10] </td> <td> R DBERR </td> <td> x </td> <td> n </td> <td> double-bit error detected; <br> cleared on start of new flash operation; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [9:0] </td> <td> RW SIZE </td> <td> x </td> <td> n </td> <td> 
size of data phase in bytes; <br> this depends on the command itself; <br> </td> </tr> <tr> <td> PI_FLASH_CONF </td> <td> 0x0460_004c </td> <td> [31] </td> <td> RW </td> <td> 1 </td> <td> n </td> <td> 1 flash write-protected; </td> </tr> <tr> <td> </td> <td> </td> <td> [30..28] </td> <td> RW </td> <td> 7 </td> <td> n </td> <td> end of cycle time - 2; </td> </tr> <tr> <td> </td> <td> </td> <td> [26..24] </td> <td> RW </td> <td> 5 </td> <td> n </td> <td> read data sample time - 1; </td> </tr> <tr> <td> </td> <td> </td> <td> [23..16] </td> <td> RW </td> <td> 0x3e </td> <td> n </td> <td> RE active times; </td> </tr> <tr> <td> </td> <td> </td> <td> [15..8] </td> <td> RW </td> <td> 0x3e </td> <td> n </td> <td> WE active times; </td> </tr> <tr> <td> </td> <td> </td> <td> [7..0] </td> <td> RW </td> <td> 0xff </td> <td> n </td> <td> CLE/ALE active times; </td> </tr> </table> <p> The address loaded into the flash for each command is a byte address. The number of address phases depends on the device and the command issued. The ADPH bits have to be programmed accordingly. An operation can start anywhere within a page. The lower 9 bits of the page offset are also used as the buffer index. All commands stop at the page boundary or when the requested amount of data has been moved, whichever comes first. For example, if a read is issued to flash address 0x1f0 with size of 100, bytes 0x1f0..0x21f will be read from the flash page and put into the specified pi buffer at offset 0x1f0. <p> The PI_FLASH_CONF register controls the timing of flash control signals. Each flash command results in a number of cycles, depending on the command. For example, a read command (0x00) results in cycles to: latch the command; latch the address (up to 4 cycles); input data (as many cycles as bytes of data to read). PI_FLASH_CONF controls the intra-cycle timing relative to the time the flash controller is granted access to the io bus. 
The duration of the grant will be PI_FLASH_CONF[30:28] + 2 sysclk periods, and the minimum time between grants is 1 sysclk. At the time bus access is granted, the flash control signals RE, WE, CLE, and ALE will all be deasserted. One sysclk later, these signals will be controlled by their respective bits in PI_FLASH_CONF (i.e., [23:16] for RE, [15:8] for WE, and [7:0] for ALE/CLE). The bit patterns determine whether the signal is asserted (bit value 1) or not (bit value 0). The bits are applied from LSB to MSB, and each bit determines the output during one sysclk period. If the grant time is less than the time covered by these bit patterns, these signals will also be deasserted 1 sysclk after the bus access grant signal is deasserted. For cycles where data is read, sampling occurs at the sysclk rising edge PI_FLASH_CONF[26:24] + 2 sysclks after the grant. An example for the flash command 0x90, manufacturer and device ID read, is depicted below with PI_FLASH_CONF = 0x430f0f3f. <p><IMG src="flash_conf.png" align=center> <br> <p> Before releasing the pi to a game application, software has to set up the following bits in the flash controller:
<pre>
    PI_FLASH_CTRL  START = 0; INTR = 0; WDPH = 0; RDPH = 1;
                   ADPH for read, depending on device;
                   CMD = 0x00; WRDY = 1; ECC = 1; MCMD = 0; SIZE = 0x3ff;
    PI_FLASH_CONF  for slowest device on module;
</pre>
<p> <b><u> Address Translation Block </u></b> <p> The atb is used for the traditional pi dma. It maps virtual device addresses into physical addresses before any accesses by the flash controller are started. The atb controller implements a binary search which requires at most eight lookups to find the corresponding mapping. Two lookups are done in parallel for each word read from the pi buffer. The atb is located in a fixed area of the pi buffer. The cpu cannot do pio accesses of more than 32 bits. For this reason, the upper bits are taken from the PI_ATBU register on writes and show up in the PI_BUF_ATBU address space for reads. 
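<p> Because an atb entry is wider than a pio access, software presumably stages the upper bits in PI_ATBU and then writes the lower 32 bits into the atb area of the pi buffer. The C sketch below splits an entry that way, using the field positions from the ATB entry table in this section. The helper names, the order of the two writes and the use of plain pointers in place of the real register and buffer addresses are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* An atb entry split into the two pieces a 32-bit pio access can carry. */
typedef struct {
    uint16_t upper; /* entry bits [40:32], destined for PI_ATBU[8:0] */
    uint32_t lower; /* entry bits [31:0], destined for the atb buffer word */
} atb_words;

/* Pack the fields of one atb entry: IV[40], DEV[39:38], PERM[37:36],
 * SIZE[35:32], PADDR[31:16], VADDR[15:0]. */
static atb_words atb_pack(unsigned iv, unsigned dev, unsigned perm,
                          unsigned size, uint16_t paddr, uint16_t vaddr)
{
    assert(iv <= 1 && dev <= 3 && perm <= 3 && size <= 15);
    /* VADDR must be aligned to the size of the entry (in 16 KB blocks). */
    assert((vaddr & ((1u << size) - 1u)) == 0);
    atb_words w;
    w.upper = (uint16_t)((iv << 8) | (dev << 6) | (perm << 4) | size);
    w.lower = ((uint32_t)paddr << 16) | vaddr;
    return w;
}

/* Store one entry. pi_atbu and buf_word stand in for the PI_ATBU register
 * and one 32-bit slot of the atb area; the write order is an assumption. */
static void atb_store(volatile uint32_t *pi_atbu, volatile uint32_t *buf_word,
                      atb_words w)
{
    *pi_atbu = w.upper;   /* upper bits first, so they accompany the next write */
    *buf_word = w.lower;  /* lower 32 bits go to the buffer location */
}
```

<p> On real hardware the two pointers would refer to PI_ATBU at 0x0460_0040 and to a location inside 0x0461_0500 ... 0x0461_07ff. How entry numbers map to cpu addresses within that window is not spelled out here, so the caller has to supply the slot address.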
<p> <table cellpadding=2 cellspacing=2 border=1> <tr> <td> Name </td> <td> Address </td> <td> Data </td> <td> Read/Write </td> <td> Reset </td> <td> N64 </td> <td> Description </td> </tr> <tr> <td> PI_ATBU </td> <td> 0x0460_0040 </td> <td> [8:0] </td> <td> WO ATBU </td> <td> x </td> <td> n </td> <td> upper bits of atb entry <br> written to upper bits of pi buffer <br> </td> </tr> <tr> <td> PI_BUF_ATB </td> <td> 0x0461_0500 <br> ... <br> 0x0461_07ff </td> <td> [31:0] </td> <td> RW </td> <td> x </td> <td> n </td> <td> access to lower 32 bits of atb entries </td> </tr> <tr> <td> PI_BUF_ATBU </td> <td> 0x0461_0800 <br> ... <br> 0x0461_0fff </td> <td> [24:16] </td> <td> RO </td> <td> x </td> <td> n </td> <td> returns upper atb bits of entry 0 </td> </tr> <tr> <td> </td> <td> </td> <td> [8:0] </td> <td> RO </td> <td> x </td> <td> n </td> <td> returns upper atb bits of entry 1 </td> </tr> </table> <p> The binary search requires that the entries in the atb buffer are sorted by increasing virtual addresses and that the entire atb buffer contains valid entries. The virtual address of each atb entry must be unique. Behavior is undefined if multiple entries contain the same virtual address. Atb searches are only done when the virtual block address changes. 
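<p> The lookup over the sorted entries can be modelled in software as an ordinary binary search. The C sketch below is such a model, not a description of the hardware: the struct, the function names and the translation rule (physical block = PADDR plus the offset of the virtual block inside the entry) are assumptions, and the pairing of two entries per buffer word is not modelled. It does show why a fully populated table of 96x2 entries needs no more than eight probes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t vaddr; /* VADDR: virtual block address, device address bits [29:14] */
    uint16_t paddr; /* PADDR: physical block address in 16 KB units */
    uint8_t  size;  /* SIZE: entry covers (1 << size) blocks of 16 KB */
} atb_entry;

/* Return the index of the entry covering vblock, or -1 if none does.
 * tab must be sorted by increasing vaddr with unique values.
 * *probes receives the number of entries inspected. */
static int atb_lookup(const atb_entry *tab, size_t n, uint32_t vblock,
                      int *probes)
{
    assert(tab != NULL && probes != NULL);
    size_t lo = 0, hi = n;      /* search window is [lo, hi) */
    int found = -1;             /* last entry seen with vaddr <= vblock */
    *probes = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        ++*probes;
        if (tab[mid].vaddr <= vblock) {
            found = (int)mid;
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (found < 0)
        return -1;
    if (vblock - tab[found].vaddr >= (1u << tab[found].size))
        return -1;              /* vblock lies past the end of the entry */
    return found;
}

/* Translate a 30-bit virtual device address; returns 0 and sets *phys on a hit. */
static int atb_translate(const atb_entry *tab, size_t n, uint32_t vaddr,
                         uint32_t *phys)
{
    int probes;
    uint32_t vblock = (vaddr >> 14) & 0xffffu;
    int i = atb_lookup(tab, n, vblock, &probes);
    if (i < 0)
        return -1;
    *phys = (((uint32_t)tab[i].paddr + (vblock - tab[i].vaddr)) << 14) |
            (vaddr & 0x3fffu);
    return 0;
}
```

<p> Permission checks (PERM), the device id and the special IV entries are left out of this model on purpose; they would be applied to the entry returned by the search.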
<p> <center> <table cellpadding=2 cellspacing=2 border=1> <th colspan=3> ATB entry </th> <tr> <td> Bits </td> <td> Mnemo </td> <td> Function </td> </tr> <tr> <td> [40] </td> <td> IV </td> <td> pseudo atb entry for iv; <br> 1 use iv from pi buffer; <br> PERM must be set to 11; <br> SIZE must be set to 16k; <br> atb errors will occur if data other than the iv is referenced; <br> </td> </tr> <tr> <td> [39:38] </td> <td> DEV </td> <td> flash device id <br> 0..2 on memory module <br> </td> </tr> <tr> <td> [37:36] </td> <td> PERM </td> <td> permission bits <br> 10 allow pio read <br> 01 allow dma read <br> 00 access error <br> </td> </tr> <tr> <td> [35:32] </td> <td> SIZE </td> <td> size of block <br> 0 16 kBytes <br> 1 32 kBytes <br> 2 64 kBytes <br> : <br> 15 512 MBytes <br> </td> </tr> <tr> <td> [31:16] </td> <td> PADDR </td> <td> physical block address; </td> </tr> <tr> <td> [15:0] </td> <td> VADDR </td> <td> virtual block address; <br> bits [29:14] of the device address; </td> </tr> </table> </center> <p> There is only an alignment restriction on the virtual address VADDR. It must be aligned to the SIZE of the atb entry. The physical address can have any 16k offset. Atb entries with the IV bit set are special, in that they tell the aes engine to use the cbc init vector from the pi buffer instead of reading it from flash. <p> <b><u> IO Bus Controller </u></b> <p> The IO Bus Controller moves data between the io bus interface and the pi data buffers. The external io bus protocol is based on IDE and allows for direct connection of ide devices, such as CD or hard drives. The io bus supports one ide dma channel and up to four pio channels. When enabled in the si, one of the pio channels is reserved for button sampling. Another port is reserved for a debug and development kit. When accessing any of the four ide pio spaces (PI_IDE0 ... PI_IDE3), bits [16:1] of the cpu address are sent out on the io data bus in an address phase to be latched by external hardware. 
Bus timing is programmable globally, not on a port-by-port basis. <p> <table cellpadding=2 cellspacing=2 border=1> <tr> <td> Name </td> <td> Address </td> <td> Data </td> <td> Read/Write </td> <td> Reset </td> <td> N64 </td> <td> Description </td> </tr> <tr> <td> PI_IDE_CONF </td> <td> 0x0460_0064 </td> <td> [31] </td> <td> RW </td> <td> 1 </td> <td> n </td> <td> 1 ide bus reset; <br> bit directly controls IO_RST; <br> software is responsible for reset timing; <br> bus accesses are legal during bus reset; </td> </tr> <tr> <td> </td> <td> </td> <td> [30:16] </td> <td> RW </td> <td> - </td> <td> n </td> <td> write data are ignored; <br> reads return 0; </td> </tr> <tr> <td> </td> <td> </td> <td> [30:26] </td> <td> RW DEND </td> <td> 8 </td> <td> n </td> <td> IOR/IOW dma cycle end time; <br> defaults to dma mode 2 at 62.5MHz </td> </tr> <tr> <td> </td> <td> </td> <td> [25:21] </td> <td> RW DRWD </td> <td> 7 </td> <td> n </td> <td> IOR/IOW dma deassertion time; <br> defaults to dma mode 2 at 62.5MHz </td> </tr> <tr> <td> </td> <td> </td> <td> [20:16] </td> <td> RW DRWA </td> <td> 2 </td> <td> n </td> <td> IOR/IOW dma assertion time; <br> defaults to dma mode 2 at 62.5MHz </td> </tr> <tr> <td> </td> <td> </td> <td> [15:10] </td> <td> RW PEND </td> <td> 9 </td> <td> n </td> <td> pio cycle end time; <br> defaults to pio mode 2 at 62.5MHz </td> </tr> <tr> <td> </td> <td> </td> <td> [9:5] </td> <td> RW PRWD </td> <td> 8 </td> <td> n </td> <td> IOR/IOW pio deassertion time; <br> defaults to pio mode 2 at 62.5MHz </td> </tr> <tr> <td> </td> <td> </td> <td> [4:0] </td> <td> RW PRWA </td> <td> 1 </td> <td> n </td> <td> IOR/IOW pio assertion time; <br> defaults to pio mode 2 at 62.5MHz </td> </tr> <tr> <td> PI_IDE_CTRL </td> <td> 0x0460_0068 </td> <td> [31:0] </td> <td> RW </td> <td> - </td> <td> n </td> <td> XXX </td> </tr> <tr> <td> PI_IDE0 </td> <td> 0x0468_0000 <br> ... 
<br> 0x0469_ffff </td> <td> [31:16] <br> [15:0] </td> <td> RW </td> <td> - </td> <td> n </td> <td> ide IO_CS[0] space; <br> writes start an ide write cycle; <br> reads return data in both 16-bit words; <br> </td> </tr> <tr> <td> PI_IDE1 </td> <td> 0x046a_0000 <br> ... <br> 0x046b_ffff </td> <td> [31:16] <br> [15:0] </td> <td> RW </td> <td> - </td> <td> n </td> <td> ide IO_CS[1] space; <br> writes start an ide write cycle; <br> reads return data in both 16-bit words; <br> </td> </tr> <tr> <td> PI_IDE2 </td> <td> 0x046c_0000 <br> ... <br> 0x046d_ffff </td> <td> [31:16] <br> [15:0] </td> <td> RW </td> <td> - </td> <td> n </td> <td> ide IO_CS[2] space; <br> writes start an ide write cycle; <br> reads return data in both 16-bit words; <br> if enabled in the SI, this is the button port; </td> </tr> <tr> <td> PI_IDE3 </td> <td> 0x046e_0000 <br> ... <br> 0x046f_ffff </td> <td> [31:16] <br> [15:0] </td> <td> RW </td> <td> - </td> <td> n </td> <td> ide IO_CS[3] space; <br> writes start an ide write cycle; <br> reads return data in both 16-bit words; <br> a development kit may use this port; </td> </tr> <tr> <td> PI_IDE_FC </td> <td> 0x0462_0000 </td> <td> [31:0] </td> <td> RW </td> <td> - </td> <td> n </td> <td> ide flow-control space; <br> mirror of pi register space; <br> reads stall until completion of ide pio cycle; </td> </tr> </table> <p> The cpu must issue 16-bit reads and writes to the ide bus spaces, i.e. LH or SH instructions. The hardware will pick the appropriate write data based on cpu address bit 1. IDE cycle timing is slow compared to io bus or cpu timing, e.g. a write in pio mode 0 may take up to 600ns. Reads from the io bus do not have a flow control problem, because the read stalls the cpu until the data is available. From the cpu's point of view, ide writes are fire and forget. Write flow control can be done in software by spacing back-to-back ide write requests based on the timing configuration. The ide controller also has a hardware flow control feature. 
Reading from PI_IDE_FC space will stall the cpu until the ide controller has finished the requested ide pio cycle. Since this is an uncached read, it will also flush the cpu store buffer to commit the write to the sysad interface. <p> <b><u> General Purpose IO Signals </u></b> <p> The PI supports four general purpose pins GPIO[3:0]. Each pin can be programmed as input or output. The io data bus is tri-stated while the chip reset input is asserted. During this time, weak pull resistors can be used to drive a board id onto the io data bus, which is latched into the PI_GPIO register on the rising edge of the pin reset signal. <p> <table cellpadding=2 cellspacing=2 border=1> <tr> <td> Name </td> <td> Address </td> <td> Data </td> <td> Read/Write </td> <td> Reset </td> <td> N64 </td> <td> Description </td> </tr> <tr> <td> PI_GPIO </td> <td> 0x0460_0060 </td> <td> [31:16] </td> <td> W </td> <td> - </td> <td> n </td> <td> write data are ignored </td> </tr> <tr> <td> </td> <td> </td> <td> [31:16] </td> <td> R ID </td> <td> - </td> <td> n </td> <td> io id bits; <br> io bus is tristated during pin reset; <br> id is latched from io bus at rising edge of chip reset; <br> external resistors supply a weak pull to desired levels; </td> </tr> <tr> <td> </td> <td> </td> <td> [15:8] </td> <td> RW </td> <td> 0 </td> <td> n </td> <td> write data are ignored; <br> reads return 0; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [7:4] </td> <td> RW GPE </td> <td> - </td> <td> n </td> <td> GPIO output enables; <br> writes set the GPIO output enables; <br> reads return the current setting; <br> writing 1 enables the appropriate GPIO pin as output; <br> GPIO[1:0] default to output 0 at falling edge of pin reset; <br> GPIO[3:2] default to inputs at falling edge of pin reset; <br> GPIO[3:2] need external pull resistors; <br> software can reconfigure all GPIO after reset; </td> </tr> <tr> <td> </td> <td> </td> <td> [3:0] </td> <td> W GPO </td> <td> - </td> <td> n </td> <td> GPIO output values; <br> 
writes set logic levels of GPIO outputs; <br> levels are driven out when GPIO has been configured for output; </td> </tr> <tr> <td> </td> <td> </td> <td> [3:0] </td> <td> R GPI </td> <td> - </td> <td> n </td> <td> GPIO input values; <br> reads return the value of the GPIO pins; </td> </tr> </table> <p> <b><u> PI Buffer DMA </u></b> <p> The dma controller can be used to move data between the pi sram buffer and main memory. The PI_DRAM_ADDR register must point to main memory. Bits [9:1] of the PI_DEV_ADDR register now address the data in the pi buffer. The hardware does not look at the other address bits. If a dma length is specified to go beyond the end of the buffers (1k), then the address will wrap around. Dma to the pi buffer must have 8-byte aligned start addresses, and the length will be rounded up to 8-byte alignment. Dma from the buffer to memory can use the same 2-byte alignment that the traditional dma allows. The PI_DMA_READ and PI_DMA_WRITE registers reappear at a new address as PI_DMA_BREAD and PI_DMA_BWRITE. Like the traditional dma length registers, they trigger the dma controller to move the data. The separate register window is necessary for backwards compatibility. Interrupts, aborts and the PI_DMA_STATUS register are identical to the traditional dma. 
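<p> The addressing and length rules above can be sketched in C as follows. This is only an illustrative model, not driver code: the helper names (pi_buf_offset, pi_bread_effective_len, pi_bdma_len_reg, pi_buf_wrap) are hypothetical, and the "size - 1" encoding is taken from the PI_DMA_BREAD / PI_DMA_BWRITE register table.

```c
#include <stdint.h>

#define PI_BUF_SIZE 1024u  /* pi buffer size (1k), as stated in the text */

/* Buffer offset selected by PI_DEV_ADDR: only bits [9:1] are decoded,
 * all other address bits are ignored by the hardware. */
static uint32_t pi_buf_offset(uint32_t dev_addr)
{
    return dev_addr & 0x3feu;
}

/* Number of bytes moved by a memory -> pi buffer dma:
 * the requested length is rounded up to 8-byte alignment. */
static uint32_t pi_bread_effective_len(uint32_t nbytes)
{
    return (nbytes + 7u) & ~7u;
}

/* Value written to PI_DMA_BREAD or PI_DMA_BWRITE: size - 1 in bits [23:0].
 * Hypothetical helper; callers are assumed to pass nbytes >= 1. */
static uint32_t pi_bdma_len_reg(uint32_t nbytes)
{
    return (nbytes - 1u) & 0x00ffffffu;
}

/* Buffer offset after advancing by `advance` bytes, modelling the
 * wrap-around at the end of the 1k buffer. */
static uint32_t pi_buf_wrap(uint32_t offset, uint32_t advance)
{
    return (offset + advance) % PI_BUF_SIZE;
}
```

<p> For example, under these assumptions a 13-byte transfer into the buffer moves 16 bytes, and a 16-byte transfer starting at buffer offset 0x3f8 ends at offset 8 after wrapping.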
<p> <table cellpadding=2 cellspacing=2 border=1> <tr> <td> Name </td> <td> Address </td> <td> Data </td> <td> Read/Write </td> <td> Reset </td> <td> N64 </td> <td> Description </td> </tr> <tr> <td> PI_DMA_BREAD </td> <td> 0x0460_0058 </td> <td> [23:0] </td> <td> RW </td> <td> x </td> <td> n </td> <td> dma memory -> pi buffer; <br> write sets dma length and starts dma; <br> register must be written with size - 1; <br> addresses must be 8-byte aligned; <br> length will be rounded up to 8-byte alignment; <br> a length of 0 completes immediately without moving data; </td> </tr> <tr> <td> PI_DMA_BWRITE </td> <td> 0x0460_005c </td> <td> [23:0] </td> <td> RW </td> <td> x </td> <td> n </td> <td> dma pi buffer -> memory; <br> write sets dma length and starts dma; <br> register must be written with size - 1; <br> addresses can be 2-byte aligned; <br> a length of 0 completes immediately without moving data; </td> </tr> </table> <p> <b><u> PI Error Reporting </u></b> <p> The new pi must deal with both correctable and fatal errors. The PI_ERROR register configures error handling and captures error information. The only correctable error is a single-bit ecc error on flash reads. It is corrected in the pi buffer on the fly. The ADDR and COR_ECC fields are set, but no interrupt is raised. Only the first such error is captured. The purpose is to report the address of the first block that may need relocation. The PI_ERROR register is not involved in handling ecc errors when the flash controller is used directly. 
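<p> As a rough illustration of how software might interpret a value read from PI_ERROR, the following C sketch decodes the fields using the bit positions from the PI_ERROR register table. The type and function names (pi_error_t, pi_error_decode, pi_error_only_correctable) are hypothetical. The ADDR field is returned as the raw 22-bit value, since its scaling to a device address is not spelled out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the PI_ERROR register table. */
#define PI_ERROR_SK_ERR     (UINT32_C(1) << 31)
#define PI_ERROR_INT_ERR    (UINT32_C(1) << 30)
#define PI_ERROR_ADDR_SHIFT 8
#define PI_ERROR_ADDR_MASK  UINT32_C(0x3fffff)   /* 22 bits, [29:8] */
#define PI_ERROR_WR_TRAP    (UINT32_C(1) << 4)
#define PI_ERROR_COR_ECC    (UINT32_C(1) << 3)
#define PI_ERROR_UNC_ECC    (UINT32_C(1) << 2)
#define PI_ERROR_ATB_DMA    (UINT32_C(1) << 1)
#define PI_ERROR_ATB_PIO    (UINT32_C(1) << 0)

/* Hypothetical decoded view of the register. */
typedef struct {
    bool sk_err, int_err;                 /* reporting configuration */
    bool wr_trap, cor_ecc, unc_ecc;       /* status bits */
    bool atb_dma, atb_pio;
    uint32_t addr;                        /* raw ADDR field, unscaled */
} pi_error_t;

static pi_error_t pi_error_decode(uint32_t reg)
{
    pi_error_t e;
    e.sk_err  = (reg & PI_ERROR_SK_ERR)  != 0;
    e.int_err = (reg & PI_ERROR_INT_ERR) != 0;
    e.wr_trap = (reg & PI_ERROR_WR_TRAP) != 0;
    e.cor_ecc = (reg & PI_ERROR_COR_ECC) != 0;
    e.unc_ecc = (reg & PI_ERROR_UNC_ECC) != 0;
    e.atb_dma = (reg & PI_ERROR_ATB_DMA) != 0;
    e.atb_pio = (reg & PI_ERROR_ATB_PIO) != 0;
    e.addr    = (reg >> PI_ERROR_ADDR_SHIFT) & PI_ERROR_ADDR_MASK;
    return e;
}

/* True when the only captured status is a corrected single-bit ecc error,
 * i.e. ADDR names a block that may need relocation. */
static bool pi_error_only_correctable(const pi_error_t *e)
{
    return e->cor_ecc && !e->wr_trap && !e->unc_ecc &&
           !e->atb_dma && !e->atb_pio;
}
```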
<p> <table cellpadding=2 cellspacing=2 border=1> <tr> <td> Name </td> <td> Address </td> <td> Data </td> <td> Read/Write </td> <td> Reset </td> <td> N64 </td> <td> Description </td> </tr> <tr> <td> PI_ERROR </td> <td> 0x0460_0044 </td> <td> [31] </td> <td> RW SK_ERR </td> <td> 0 </td> <td> n </td> <td> 1 pi errors cause secure-kernel trap; </td> </tr> <tr> <td> </td> <td> </td> <td> [30] </td> <td> RW INT_ERR </td> <td> 0 </td> <td> n </td> <td> 1 pi errors cause pi error interrupt; </td> </tr> <tr> <td> </td> <td> </td> <td> [29:8] </td> <td> RO ADDR </td> <td> x </td> <td> n </td> <td> virtual device address of error; </td> </tr> <tr> <td> </td> <td> </td> <td> [7:5] </td> <td> RO </td> <td> 0 </td> <td> n </td> <td> write data are ignored, reads return 0; </td> </tr> <tr> <td> </td> <td> </td> <td> [4] </td> <td> RW WR_TRAP </td> <td> 0 </td> <td> n </td> <td> 1 flash write trap; <br> set by attempts to write to the flash by starting a read dma or by pio writes to any cartridge space; <br> PI_DEV_ADDR contains virtual address; <br> ADDR field is not loaded; <br> cleared by writing 0; <br> writing 1 causes error trap or interrupt; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [3] </td> <td> RW COR_ECC </td> <td> 0 </td> <td> n </td> <td> 1 correctable flash ecc error during pi dma; <br> ADDR contains virtual address; <br> cleared by writing 0; <br> writing 1 does not cause error trap or interrupt; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [2] </td> <td> RW UNC_ECC </td> <td> 0 </td> <td> n </td> <td> 1 uncorrectable flash ecc error during pi dma; <br> ADDR contains virtual address; <br> cleared by writing 0; <br> writing 1 causes error trap or interrupt; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [1] </td> <td> RW ATB_DMA </td> <td> 0 </td> <td> n </td> <td> 1 atb error caused by dma; <br> ADDR contains virtual address; <br> cleared by writing 0; <br> writing 1 causes error trap or interrupt; <br> </td> </tr> <tr> <td> </td> <td> </td> <td> [0] 
</td> <td> RW ATB_PIO </td> <td> 0 </td> <td> n </td> <td> 1 atb error caused by pio; <br> ADDR contains virtual address; <br> cleared by writing 0; <br> writing 1 causes error trap or interrupt; <br> </td> </tr> <tr> <td> PI_EDATA </td> <td> 0x0460_006c </td> <td> [31] </td> <td> RO </td> <td> x </td> <td> n </td> <td> pio write data for emulation; </td> </tr> </table> <p> Fatal errors are atb access/mapping errors, uncorrectable flash ecc errors, or the removal of the memory card. The first fatal error overwrites the ADDR field, sets the appropriate error bit and raises a secure kernel trap or error interrupt, if enabled. A previously captured correctable ecc error is lost. No further secure kernel trap or error interrupt is raised as long as bits [5:0] are not 0. <p> <b><u> PI Access Control </u></b> <p> The new PI_ACCESS register controls the access rights of all new registers and the pi buffer in non-secure mode. In secure mode, all bits are writable with any value. In non-secure mode, access can only be taken away by writing 0. A 1 must be written to keep enabled rights enabled. All the pi hardware features are accessible in secure mode, independent of the settings of PI_ACCESS. 
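<p> The write behaviour described above can be modelled in a few lines of C. This is a software model of the stated rule only, not a description of the hardware implementation: the function names are hypothetical, the bit names follow the PI_ACCESS register table, and the model assumes that bits [7:0] are the only implemented bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit names as listed in the PI_ACCESS register table. */
enum {
    PI_ACCESS_BUF_ACC   = 1 << 0,
    PI_ACCESS_FLASH_ACC = 1 << 1,
    PI_ACCESS_ATB_ACC   = 1 << 2,
    PI_ACCESS_AES_ACC   = 1 << 3,
    PI_ACCESS_BDMA_ENA  = 1 << 4,
    PI_ACCESS_GPIO_ACC  = 1 << 5,
    PI_ACCESS_IDE_ACC   = 1 << 6,
    PI_ACCESS_ERROR_ACC = 1 << 7,
    PI_ACCESS_MASK      = 0xff   /* assumption: only bits [7:0] exist */
};

/* Model of a write to PI_ACCESS.
 * Secure mode: any value can be written.
 * Non-secure mode: a 0 removes a right, a 1 keeps an enabled right,
 * and a 1 cannot re-enable a right that is already disabled. */
static uint32_t pi_access_write(uint32_t current, uint32_t written, bool secure)
{
    written &= PI_ACCESS_MASK;
    return secure ? written : (current & written);
}

/* Value to write in order to drop the given rights and keep all others. */
static uint32_t pi_access_drop(uint32_t rights_to_drop)
{
    return PI_ACCESS_MASK & ~rights_to_drop;
}
```

<p> In this model, non-secure code that wants to give up ide access while keeping everything else would write pi_access_drop(PI_ACCESS_IDE_ACC); a later non-secure write of all ones leaves the ide right disabled.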
<p> <table cellpadding=2 cellspacing=2 border=1> <tr> <td> Name </td> <td> Address </td> <td> Data </td> <td> Read/Write </td> <td> Reset </td> <td> N64 </td> <td> Description </td> </tr> <tr> <td> PI_ACCESS </td> <td> 0x0460_0054 </td> <td> [31:8] </td> <td> RW </td> <td> - </td> <td> n </td> <td> write data are ignored; <br> reads return 0; </td> </tr> <tr> <td> </td> <td> </td> <td> [7] </td> <td> RW ERROR_ACC </td> <td> 0 </td> <td> n </td> <td> 1 allows access to the error registers in non-secure mode, PI_ERROR; PI_EDATA is always accessible for emulation in the application; </td> </tr> <tr> <td> </td> <td> </td> <td> [6] </td> <td> RW IDE_ACC </td> <td> 0 </td> <td> n </td> <td> 1 allows access to the ide controller in non-secure mode, PIO_IOC_CONF, PI_IDE0, PI_IDE1, PI_IDE2, PI_IDE3; </td> </tr> <tr> <td> </td> <td> </td> <td> [5] </td> <td> RW GPIO_ACC </td> <td> 0 </td> <td> n </td> <td> 1 allows access to the gpio hardware in non-secure mode, PI_GPIO; </td> </tr> <tr> <td> </td> <td> </td> <td> [4] </td> <td> RW BDMA_ENA </td> <td> 0 </td> <td> n </td> <td> 1 enables the buffer dma, PI_DMA_BREAD, PI_DMA_BWRITE; </td> </tr> <tr> <td> </td> <td> </td> <td> [3] </td> <td> RW AES_ACC </td> <td> 0 </td> <td> n </td> <td> 1 allows access to aes controller in non-secure mode, PI_AES_EKEY, PI_AES_INIT, PI_AES_CTRL; </td> </tr> <tr> <td> </td> <td> </td> <td> [2] </td> <td> RW ATB_ACC </td> <td> 0 </td> <td> n </td> <td> 1 allows access to atb hardware in non-secure mode, PI_ATBU, PI_BUF_ATB, PI_BUF_ATBU; </td> </tr> <tr> <td> </td> <td> </td> <td> [1] </td> <td> RW FLASH_ACC </td> <td> 0 </td> <td> n </td> <td> 1 allows access to flash controller in non-secure mode, PI_FLASH_ADDR, PI_FLASH_CTRL, PI_FLASH_CONF; </td> </tr> <tr> <td> </td> <td> </td> <td> [0] </td> <td> RW BUF_ACC </td> <td> 0 </td> <td> n </td> <td> 1 allows access to pi data buffer 0 and 1 in non-secure mode, PI_BUF0, PI_BUF1, PI_SP0, PI_SP1; </td> </tr> </table> <p> <b><u> PI Interrupts </u></b> 
<p> The new pi has three interrupt paths. The completion of a pi dma is still reported by setting a bit in the bcp interrupt register and the activation of the cpu INTR[0] signal. Software has to read a bcp register to find the interrupt source. The flash controller, aes controller and ide controller report their interrupts on the new device interrupt signal INTR[3]. Fatal errors are signalled through either the system error interrupt on the INTR[1] signal or by a secure kernel trap, see PI_ERROR bit SK_ERR. <hr> <font size="-1"> Problems and comments to <a href="mailto:berndt@broadon.com"> berndt@broadon.com </a> </font> </body> </html>