ARM610 details
Taken from 'Eureka' by Dave Walker.
A Chip off the Old !block: ARM3 and ARM610 compared; ARM7 rumours
Recently (OK, not _too_ recently!) the ARM family was joined by a new member; the 610. This processor has received a great deal of speculative press, having been selected as the processor for Apple's Newton Palmtop machine. Also, Acorn forums on the net have been rife with rumours of yet another, incredibly high-end pair of ARMs, so far designated (for want of a better name) ARM 7 and ARM 8.
To add a little light relief to this issue of EurekA, I thought I'd distil what I've heard, and compare it to our friend the ARM 3; not only does this make a change from reading about my perennial wars with the PCM, but I haven't much choice; I deleted my emulator last week. Not from disgust at MS DOS, I hasten to add (although it was more than a passing contributor to the reasons why it didn't get reinstated!) but because of the dreaded "disc full" message. From my internal 120Mb hard drive (the 100Mb external hard drive being reserved for UNIX). One vaped PC partition and some rearranging later, I'm now happy with 45Mb free on it, and the partition ain't coming back. :-)
Anyhow, to work.
The ARM 610 is the first implementation of the ARM 6 macrocell, released jointly with the ARM 60 (an unaugmented ARM 600 package, pin-compatible with the ARM 2). I don't think an ARM 600 was ever released, although it was designed. Not surprisingly, it was designed to look like an ARM 3.
The ARM 6 cell is built out of fully static CMOS, which means that the clock signal to it can be turned off without losing information on the chip's state or the data contained therein. As the macro is based on micron-scale data paths, the final cell is sufficiently small to be included wholesale on a custom wafer of standard size (2.8 mm^2), with physical room to spare for cache, controllers etc. Indeed, the cell is already in use as an embedded controller in some of the latest laser printers.
At most levels, the 610 is instruction-compatible with the ARM 3. However, the address space has been expanded to the full 32 bits possible (the processor flags & status mode bits have been moved to a new register, making 17 registers visible), and an extra pair of lines (PROG32 and DATA32) has been added; these switch between the full 32 bit instruction and data paths and the 26 bit versions kept for ARM 1, 2 & 3 compatibility. (Does anyone out there have an old ARM 1 evaluation system? If so, please let me know!) Also note that BYTE (second-sourcing the relevant bits of a December 1991 article, here) seem to have it wrong again... the data path (even the coprocessor data path) is the full 32 bits wide! My print set says so!
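To make the 26 bit business concrete, here is a minimal sketch of how the older ARMs pack the program counter, flags and mode bits into the single register R15, which is exactly what the 610's new register undoes. The bit positions follow the usual ARM 2/3 layout (flags N, Z, C, V, I, F in bits 31-26, PC in bits 25-2, mode in bits 1-0); the function name split_r15 is my own invention for illustration.

```python
# Layout of R15 in the 26 bit (ARM 2/3 style) programming model.
FLAG_NAMES = "NZCVIF"            # bits 31..26, most significant first
PC_MASK = 0x03FFFFFC             # bits 25..2: word-aligned program counter
MODE_MASK = 0x00000003           # bits 1..0: processor mode
MODES = ["USR", "FIQ", "IRQ", "SVC"]


def split_r15(r15):
    """Split a combined 26 bit mode R15 into (pc, flags, mode).

    split_r15 is a hypothetical helper name, not anything from ARM.
    """
    pc = r15 & PC_MASK
    flags = {
        name: bool((r15 >> (31 - i)) & 1)
        for i, name in enumerate(FLAG_NAMES)
    }
    mode = MODES[r15 & MODE_MASK]
    return pc, flags, mode


# Example: N flag set, PC at &8000, supervisor mode.
pc, flags, mode = split_r15(0x80008003)
```

With only 24 bits of word address left over for the PC, code cannot live above 64Mb; moving the flags and mode out to their own register is what frees R15 to hold a full 32 bit address.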
In addition, there is a status line to switch the processor addressing mode between big and little-endian. This makes the ARM 610 compatible with EVERYBODY else!
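For anyone hazy on what big and little-endian actually mean, the following sketch shows the same 32 bit word laid out in memory both ways. It uses Python's standard struct module and says nothing about how the 610 itself does the switch; the helper name word_bytes is hypothetical.

```python
import struct


def word_bytes(word, big_endian):
    """Return the four bytes of a 32 bit word in memory order.

    word_bytes is a hypothetical helper for illustration only.
    """
    fmt = ">I" if big_endian else "<I"
    return struct.pack(fmt, word)


little = word_bytes(0x11223344, big_endian=False)  # least significant byte first
big = word_bytes(0x11223344, big_endian=True)      # most significant byte first
```

Here little.hex() gives "44332211" and big.hex() gives "11223344": same word, opposite byte order at ascending addresses.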
Virtual memory drivers have been implemented in hardware; instead of the old address exception error, the system now calls the drive controller via a software service routine to load the missing page into RAM, and the specific instruction into cache. These exceptions may be handled in both supervisor and user mode.
Recalling the Pre-Fetch and Data-Abort vectors in Archimedes zero page (see EurekA #2), it's interesting to see that on the 610, there are moves towards delaying the last possible abort until even later in the cycle. This would result in far less strain on the cache, but require extra garbage collection. The feature is not implemented on the ARM 6; expect it on the ARM 8. However, a LATEABT signal has been added, which when high, will simulate this extra feature.
Pages, and indeed page size (4-64K, variable), are controlled by the on-chip MMU. Virtual and real page numbers are stored in a cached translation lookaside buffer (TLB), which also stores memory protection level data. When accessing a page, the processor checks to see if the TLB contains a translation from the virtual address to a real address in RAM; if so, the physical address is output.
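The TLB check described above can be sketched as a simple lookup table. This is a toy model only: the class name, the fixed 4K page size and the dictionary storage are all my assumptions, and a real TLB has a limited number of entries and is searched in hardware.

```python
PAGE_SIZE = 4096  # assumed: the smallest page size mentioned above (4K)


class TLB:
    """Toy translation lookaside buffer (hypothetical model, not the 610's)."""

    def __init__(self):
        # virtual page number -> (physical page number, protection data)
        self.entries = {}

    def insert(self, vpn, ppn, protection):
        self.entries[vpn] = (ppn, protection)

    def translate(self, vaddr):
        """Return the physical address on a hit, or None on a miss."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        hit = self.entries.get(vpn)
        if hit is None:
            return None  # miss: the MMU must consult the translation table
        ppn, _protection = hit
        return ppn * PAGE_SIZE + offset


tlb = TLB()
tlb.insert(vpn=2, ppn=7, protection="rw")
physical = tlb.translate(0x2010)   # virtual page 2, offset &10
missing = tlb.translate(0x5000)    # nothing cached for virtual page 5
```

Only the page number is translated; the offset within the page passes straight through, which is why the lookup can be so quick.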
However, the interface to this (fairly conventional) memory model is based on an object-oriented user view of memory. If the page is not in RAM, the MMU allocates an index into the translation table, offset by an address held in the translation table base register. If the table entry is for a section (number of pages), the index will contain the section's base address, which may be combined with a virtual-address index to give a physical address to load the data at. If the entry is for a single page, the index contains a pointer to the page table, where the address can be found.
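Reading the paragraph above as a two-level table walk, the section and page cases might be sketched as follows. Everything specific here is an assumption of mine for the sake of a runnable example: the 1Mb section size, the 4K page size, the ("section", ...) and ("page", ...) tuples standing in for table entries, and the function name walk. The real entry formats are not given in anything I have quoted.

```python
SECTION_SIZE = 1 << 20   # assumed: 1Mb sections
PAGE_SIZE = 1 << 12      # assumed: 4K pages


def walk(first_level, vaddr):
    """Translate vaddr using a toy two-level table; None means a fault.

    first_level maps a section index to either
      ("section", physical_base)  or
      ("page", page_table), where page_table maps page index -> page base.
    """
    entry = first_level.get(vaddr // SECTION_SIZE)
    if entry is None:
        return None                      # no entry: translation fault
    kind, value = entry
    offset_in_section = vaddr % SECTION_SIZE
    if kind == "section":
        # Section entry: base address plus the low bits of the virtual address.
        return value + offset_in_section
    # Page entry: follow the pointer to the page table for the final address.
    page_base = value.get(offset_in_section // PAGE_SIZE)
    if page_base is None:
        return None                      # page not mapped: page fault
    return page_base + vaddr % PAGE_SIZE


first_level = {
    0: ("section", 0x00400000),
    1: ("page", {3: 0x0009A000}),
}
via_section = walk(first_level, 0x00012345)
via_page = walk(first_level, 0x00103ABC)
```

The section case needs one table access and the page case needs two, which is the trade-off between mapping big lumps of memory cheaply and mapping small ones precisely.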
One of the nicest tricks used in the new ARM is that lists of pointers are terminated by dummy instructions not on word boundaries. As the MMU traps accesses to all such words, the end of a list may be detected by intercepting the error raised by the MMU.
Access permissions are mapped separately from the virtual addresses, and can be manipulated in isolation. Page faults and permission faults are also handled by independent hardware. Access permissions are held in a 'domain space,' which is a contiguous arbitrary-sized block of pages. The chip supports up to 16 of these domains, each of which may be assigned an access level by the task managing it. Tasks are divided into clients, which may be denied access to a domain if the relevant access bits are set, and managers, which can always grab a domain directly. Each task may (and usually will) use multiple domains.
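The client/manager split can be sketched like this. It is a toy model under my own assumptions: the role names, the list of 16 roles and the function may_access are hypothetical, and the single page_permits flag stands in for whatever access bits the real hardware checks.

```python
NO_ACCESS, CLIENT, MANAGER = "none", "client", "manager"
NUM_DOMAINS = 16  # the chip supports up to 16 domains


def may_access(domain_roles, domain, page_permits):
    """Decide whether the current task may touch a page in the given domain.

    domain_roles: list of NUM_DOMAINS roles held by the current task.
    page_permits: True if the page's own access bits allow this access.
    """
    role = domain_roles[domain]
    if role == MANAGER:
        return True            # managers bypass the access bits entirely
    if role == CLIENT:
        return page_permits    # clients are subject to the access bits
    return False               # no access to this domain at all


roles = [NO_ACCESS] * NUM_DOMAINS
roles[3] = CLIENT
roles[5] = MANAGER
```

Because the roles live apart from the address translations, a task switch only has to swap the 16 roles rather than rewrite the page tables, which is presumably the point of mapping permissions separately.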
The trouble with this kind of management is that it generates a huge amount of garbage, as tasks claim and release domains. It is predicted that the ARM 610 will support (I've not had the specs on this) a garbage collection program, running concurrently with whatever other tasks are present, which will release areas of memory claimed by a task since terminated, by changing the permissions; I'm still not sure that this wouldn't leave some tasks stranded, though. Further info on this would be appreciated.
Rumours of the ARMs 7 and 8 exist, and it is believed that a prototype ARM 8 (probably built from ECL) has been run on Acorn's testbeds. Both ICs are said to be fully parallel-capable, and rumour has it that the ARM 8 is capable of asynchronous processing (although the current Arcs can be considered asynch, as each of the four main ICs runs at a different speed; I'd be interested to know how an ARM 250 clocks the various parts of its mega-wafer). If this is so, an Archimedes n (where n is large!) would basically put mainframe processing power squarely on the desk. Thank you, Intel, and goodnight.