
ARM ASSEMBLER PROGRAMMER'S POCKET REFERENCE (v.0.25)

A BITS PUBLICATION

WRITTEN AND ENTERED INTO THE PUBLIC DOMAIN BY DAVE WALKER JANUARY 1994


History
~~~~~~~
November 1992: Version 0.1,  placed in PD by Dave Walker
February 1993: Version 0.1A, incorporating corrections by Graham Willmott

(Dave Walker then goes and does assorted other things, including doing an MSc,
falling in love, losing an A540 and getting a job)

December 1993: Version 0.2, incorporating corrections and extensions
                            by Martin Ebourne and extra bits at suggestion
                            of Kevin Bracey; compiled from two issues of
                            EurekA into one document for John Veness.
January 1994: Version 0.25, correcting the obvious typo in 0.2 and adding 
                            rather more on the (very nontrivial) Floating Point 
                            system.

ARM 6xx Pocket Reference (which should also work for pretty much anything based 
on the uncustomised ARM 7xx) planned somewhen, but don't hold your breath.

Email corrections to bits-admin@bristol, and they will somehow get through to 
me.

Dedicated, as is everything else I do, to DVG. 


Dave Walker, Cambridge 1994.

Memory Assignment, Parameter Passing and Assembly Directives
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Memory is reserved with DIM <variable> <size>; set P% to the start of the
code as normal.
P% acts as the (usual) load address, and then as the execute address.
O% may be used for offset assembly; RMs, however, are NOT paged, but sent to
the Heap Manager. O% is still useful, so that the object code can be
assembled with the BASIC source remaining in RAM at &8000. Range checking
has now been implemented, so program assembly aborts if reserved memory is
exceeded. This is done using L%, and produces a listing like:

  DIM code% 256
  P%=code% 
  L%=P%+256
  [.... 

Multiple-pass assembly, and indeed all assembler options, are implemented 
using the OPT directive:

OPT <n>    is a pseudo-opcode which directs the assembler to perform specific
           functions, and when used, is always the first operator after the [.
           When OPT is not explicitly specified, the default value is set to 3.

<n>        is an integer, or variable containing an integer, which has
           bits flagged to produce the following effects:

Bit 0   clear = No listing produced
          set = Listing produced
Bit 1   clear = No errors reported
          set = Errors reported
Bit 2   clear = No offset assembly
          set = Offset assembly performed
Bit 3   clear = No range check performed
          set = Range check performed (via L%)


 The "no errors" bit is necessary for the first pass of a multi-pass assembly
(ie where forward-referenced labels have been used).
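
The OPT bit table above can be modelled in a few lines of Python. This is
only an illustrative sketch, not part of the BASIC assembler: the names
decode_opt and two_pass_opts are hypothetical, and the two-pass values
simply follow the rule that the "errors reported" bit is left clear on the
first pass and set on the second.

```python
# Bit meanings copied from the OPT table in this document.
OPT_BITS = {
    0: "listing produced",
    1: "errors reported",
    2: "offset assembly performed",
    3: "range check performed (via L%)",
}


def decode_opt(value):
    """Return the effects enabled by an OPT value, lowest bit first."""
    return [name for bit, name in OPT_BITS.items() if value & (1 << bit)]


def two_pass_opts(listing=False, offset=False, range_check=False):
    """Hypothetical helper: OPT values for a two-pass assembly.

    Pass 1 leaves bit 1 clear so forward-referenced labels do not raise
    errors; pass 2 sets it so that genuine errors are reported.
    """
    base = (4 if offset else 0) | (8 if range_check else 0)
    first = base
    second = base | 2 | (1 if listing else 0)
    return first, second
```

For instance, decode_opt(3) describes the default setting, and
two_pass_opts(range_check=True) gives the pair of OPT values to use when
L% checking is wanted on both passes.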


Parameters may be passed to an assembler program either by assigning integers
to variables A% to H%, which are then copied into R0-R7, or by using

CALL <address>{,<parameter list>}

The parameter set may be decoded by examining R9 and R10 on entry to the
routine; R9 contains a pointer to the parameter descriptor block, and R10
contains the number of parameters passed. The memory block indicated by R9
has a 2-word entry in it for each parameter, the entries being set up in
reverse order (last parameter passed is described by first 2 words of block).
The first word of an entry is a pointer to the address where the variable
itself is stored (or, in the case of a string, to where a pointer to it is
stored), and the second word indicates the type of the variable.

Type number   First word points to     Example of possible BASIC var passed
-----------   --------------------     ------------------------------------
0             Single-byte number       ?var
4             4-byte integer           !var,var%,var%(n)
5             5-byte real              var, var(n)
128           String info block        var$, var$(n)
129           Terminated char string   $var
256+4         Integer array block      var%()
256+5         Real array block         var()
256+128       String array block       var$()

The locations pointed to by the first word are not guaranteed to be
word-aligned. In the case of types 4, 5 and 129, the first word points
directly to the variable; otherwise, it points to a further information
block. For the case of strings, the first word points to a word-aligned
string information block, which has the format

  Bytes 0-3: Pointer to the characters comprising the string
  Byte 4:    Current number of characters in the string.

The value of R0 produced by a routine may be read back into a BASIC variable
using the command

<variable>=USR(<address>)

Registers and Processor Modes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

User Mode
---------
16 registers, R0-R15, visible. Conventionally:

R13     used as primary stack pointer
R14     used as link register
R0-R12  are general and available.
R15     used as program counter (PC) and is also location of status flags:

Bits 2-25 used as PC; "current" memory location is 2 instructions (8 bytes)
past instruction being executed, as a result of pipelining.
Need subtraction to compensate when writing interrupt routines.

Bits 26-31 are the main status flags:

Bit   Letter   Description
---   ------   -----------
26    F        Fast Interrupt disable
27    I        Interrupt disable
28    V        Arithmetic overflow
29    C        Carry
30    Z        Zero
31    N        Negative

Bits 0 and 1 set processor status mode:

Bit 1 (S1)   Bit 0 (S0)   Mode
----------   ----------   ----
0            0            User (default)
0            1            Fast interrupt (FIQ)
1            0            Interrupt (IRQ)
1            1            Supervisor (SVC)

Interrupts may NOT be disabled by directly writing masked bits to R15 in User
Mode, but SWI "OS_IntOn" and SWI "OS_IntOff" may be used to toggle IRQ
(bit 27).

FIQ Mode
--------
R8-R14 are replaced by private R8_FIQ - R14_FIQ

IRQ Mode
--------
R13 and R14 replaced by private R13_IRQ and R14_IRQ

SVC Mode
--------
R13 and R14 replaced by private R13_SVC and R14_SVC. Direct writing to
writable support hardware is permitted. All hardware devices are memory
mapped.

Interrupts and SVC Mode
-----------------------
SWI calls WILL corrupt R14_SVC. When executing an SWI from IRQ or FIQ, the
code supplied by Acorn to get round this is:

  MOV   R9,PC            ; Preserve current processor mode
  ORR   R8,R9,#3         ; Move to R8, selecting SVC mode
  TEQP  R8,#0            ; Enter SVC mode
  MOV   R0,R0            ; No-op to sync internal registers
  STMFD R13!,{R14}       ; Preserve R14_SVC on SVC stack
  SWI   <routine>        ; The SWI call itself goes here
  LDMFD R13!,{R14}       ; Restore R14_SVC from SVC stack
  TEQP  R9,#0            ; Back to original processor mode
  MOV   R0,R0            ; No-op to sync internal registers

In IRQ and FIQ modes, it is necessary to set bits 26 and 27 of R15, and then
reset them before re-entering User mode.

Note in the code segment above the change from the original MOVNV R0,R0 to
MOV R0,R0; an official statement has been made by Acorn to the effect that
the NV conditional extender is not to be used in further software.
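
The R15 layout described in this section (PC in bits 2-25, flags in bits
26-31, mode in bits 0-1) can be checked with a small Python model. This is a
sketch for illustration only; decode_r15 and executing_address are
hypothetical names, and the 8-byte correction is the pipelining offset
mentioned under User Mode.

```python
# Flag letters and bit numbers as given in the status flag table.
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "I": 27, "F": 26}

# Mode numbers formed from bits 1 (S1) and 0 (S0).
MODES = {0: "User", 1: "FIQ", 2: "IRQ", 3: "SVC"}

PC_MASK = 0x03FFFFFC  # bits 2-25 of R15


def decode_r15(r15):
    """Split a 32-bit R15 value into PC, set flags and processor mode."""
    r15 &= 0xFFFFFFFF
    flags = "".join(n for n, bit in FLAG_BITS.items() if r15 & (1 << bit))
    return {"pc": r15 & PC_MASK, "flags": flags, "mode": MODES[r15 & 3]}


def executing_address(r15):
    """Address of the instruction being executed (PC reads 8 bytes ahead)."""
    return (decode_r15(r15)["pc"] - 8) & PC_MASK


def select_svc(r15):
    """Model of ORR R8,R9,#3: the same flags and PC, with SVC mode bits."""
    return (r15 | 3) & 0xFFFFFFFF
```

As an example, decode_r15(&6000800B) reports a PC of &8008 with Z and C set
in SVC mode, so the instruction actually executing is the one at &8000.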

Addressing Modes in ARM Assembler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As R15 has a 26-bit wide address field, it is obviously not possible to fit
an absolute address description into a 32 bit instruction which also has to
contain the opcode, conditional(s) and operand registers. Hence, when
specifying absolute addresses in ARM code, it is necessary to use
indirection.

The basic instructions to load and store single-word registers are LDR and
STR. LDR will be used in the examples illustrating addressing modes.

Pre-Indexed Addressing
----------------------
Syntax: LDR <reg>,[<base>{,<offset>}]{!}

<base> is a simple register. <offset> is an optional quantity such that the
memory word loaded into <reg> is the contents of address (base+offset).
<offset> may be a simple register, a shifted register or an immediate
constant in the range -4095 to +4095.

Write-back is also implemented, and is requested by appending !, so the
instruction

  LDR R0,[R3,R8,LSL#2]!

has the effect

  Load R0 with the contents of (R3+4*R8) and set R3=R3+4*R8.

Post-Indexed Addressing
-----------------------
Syntax: LDR <reg>,[<base>],<offset>

The contents of <base> are taken as the address to load from, and then the
contents of <offset> are added to it. Write-back is implicit. Hence:

  LDR R8,[R2],R5,LSL#4   ; Load R8 from address R2: R2=R2+16*R5.

Assembler Mnemonics
~~~~~~~~~~~~~~~~~~~

Register Loading
----------------
MOV <reg>,<expression>
     Loads register with absolute value of expression; ie if expression is
     a label, the absolute address of same at compile time. Affects N,Z if
     S suffix used.

MVN <reg>,<op1>
     Loads NOT(op1) into reg. As 2's complement is -n=NOT(n)+1, use
     MVN <reg>,#(n-1) to load reg with -n.

ADR <reg>,<expr>
     Loads register with address given as offset from P%, giving
     relocatable code. For expr=label, the offset calculations are
     user-transparent.

Shift Instructions
------------------
LSL #n or Rx
     Shifts the operated register left by n places, or by the no. of places
     indicated by Rx. Zeroes are inserted into bit 0, successive bit 31s
     are shifted into the carry, overflow is lost.

ASL #n or Rx
     Use is exactly the same as LSL.

LSR #n or Rx
     Zeroes inserted at bit 31; data shifted right and through bit 0. Data
     overflow at least significant end is lost.

ASR #n or Rx
     As LSR, but bit 31 (sign) is preserved and bit 0 moves into the carry.
     For a shift of 1 place, bit 31 is copied to bit 30; afterwards treat
     as an LSR on a 31 bit word, bit 31 remaining as the sign.

ROR #n or Rx
     Barrel shift right n places; bits move from bit 0 to bit 31. A copy of
     the initial bit 0 is preserved in the carry flag.

RRX
     Rotate right by one place, using the carry as a "bit 32." Bit 0 goes
     to the carry, the carry goes to bit 31 etc. Essentially, a 33 bit
     rotate.

Logical Processing Instructions
-------------------------------
ADD <dest>,<op1>,<op2>
     Add operand 1 to operand 2, store in destination. N,Z,C,V are updated
     if S suffix used.

ADC <dest>,<op1>,<op2>
     As ADD, but accounts for the initial status of the carry flag.
     N,Z,C,V affected if S suffix used.

SUB <dest>,<op1>,<op2>
     Subtract operand 2 from operand 1, store in destination. Valid if op1
     and op2 are both unsigned or both 2's complement. Affects N,Z,C,V if
     S suffix used.

SBC <dest>,<op1>,<op2>
     Subtract with carry; ie dest=op1-op2-NOT(carry). Affects N,Z,C,V if
     S suffix used.

RSB <dest>,<op1>,<op2>
     Subtract op1 from op2, so shift ops can be done on op2. Affects
     N,Z,C,V if S suffix used.

RSC <dest>,<op1>,<op2>
     dest=op2-op1-NOT(carry). Affects N,Z,C,V if S suffix used.

CMP <op1>,<op2>
     Reflect notional result of op1-op2 in N,Z,C,V flags. S suffix is
     implicitly assumed.

CMN <op1>,<op2>
     Reflect notional result of op1-(-op2). Note not NOT(op2)! Sets Z if
     op1=-op2. Affects N,Z,C,V. S suffix implicitly assumed.

AND <dest>,<op1>,<op2>
     dest=op1 AND op2. Affects N,Z if S suffix used.

ORR <dest>,<op1>,<op2>
     dest=op1 OR op2. Affects N,Z if S suffix used.

EOR <dest>,<op1>,<op2>
     dest=op1 EOR op2. Affects N,Z if S suffix used.

BIC <dest>,<op1>,<op2>
     dest=op1 AND (NOT (op2)). If we take op1 and treat op2 as a mask, a
     set bit in op2 will cause the corresponding bit in op1 to clear.
     Perform for all 32 bits, then store the modified op1 in dest. Affects
     N,Z if S suffix used.

TST <op1>,<op2>
     Bitwise notional AND of op1 and op2.
     Either op can be the bit mask; Z is set if none of the bits set in the
     mask is also set in the op (ie the result of the AND is zero).
     Affects N,Z. S suffix is implicitly assumed.

TEQ <op1>,<op2>
     Notional bitwise EOR, used to test equivalence (Z is set if op1=op2).
     Affects N,Z. S suffix implicitly assumed.

MUL <dest>,<op1>,<op2>
     dest=op1*op2. Restrictions: op1 and op2 must be simple registers, dest
     and op1 must be different, and dest must not be R15. N and Z reflect
     the result, V is unchanged and C becomes ill-defined (if S suffix
     used).

MLA <dest>,<op1>,<op2>,<op3>
     dest=(op1*op2)+op3. Useful for running totals. Permissible for dest to
     be the same as op3. Flags set as with MUL.

Block Data Transfer
-------------------
LDM<type> <base>{!},<register list>   ]
                                      ] See "Stack Implementation" section
STM<type> <base>{!},<register list>   ]

Miscellany
----------
B <addr>
     Direct branch to routine at addr; wasteful on pipeline.

BL <addr>
     Copies pipeline-corrected PC to R14 and branches to routine at addr;
     return may be effected by MOV PC,R14 at end of subroutine.

SWI <routine>
     Executes an SWI routine. See available list of routines.

Conditional Execution Suffixes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All ARM assembler operations may be executed conditionally; the conditions
are defined by a two-letter extension appended to the mnemonic to be
executed subject to the conditions. In the table below, "flag status" refers
to the conditions required for a command extended by the suffix to be
executed.

Suffix  Literal Meaning        Flag Status         Comments
------  ---------------        -----------         --------
EQ      Equal                  Z set               eg from CMP with equal args
NE      Not equal              Z clear
VS      Overflow set           V set
VC      Overflow clear         V clear
AL      Always                                     Default, implicit
NV      Never                                      Only workable in a large
                                                   chunk of Boolean; use now
                                                   prohibited by Acorn
HI      Higher (unsigned)      C set AND Z clear   arg1>arg2  ] Use after a
LS      Less than or same as   C clear OR Z set    arg1<=arg2 ] CMP or CMN
        (unsigned)
PL      Plus                   N clear             Zero is positive
MI      Minus                  N set               Bit 31 of prev. result is 1
CS      Carry set              C set               prev. instruction carried
                                                   or overflowed.
                                                   Also HS (higher or same)
                                                   allowed
CC      Carry clear            C clear             Also LO (lower) allowed
GE      Greater than or equal  (N set AND V set) OR (N clear AND V clear)
        (signed)
LT      Less than (signed)     (N set AND V clear) OR (N clear AND V set)
GT      Greater than (signed)  ((N set AND V set) OR (N clear AND V clear))
                               AND Z clear
LE      Less than or equal     ((N set AND V clear) OR (N clear AND V set))
        (signed)               OR Z set

For signed suffixes, signing convention is 2's complement (ie NOT and add 1)

Other Mnemonic Extenders
------------------------
B   Byte flag          Operator only operates on 8-bit operand; word
                       alignment not necessary
S   Status flag        Reflect result of operation in status flags. Beware;
                       this is implicit in 6502, but must be specified in
                       ARM unless otherwise stated (above)
P   Pipeline suspend   Disable pipelining to allow writing to R15; result
                       of operand (usually TEQ) is written back _directly_
                       to status bits (26-31) of R15.

Stack Implementation
--------------------
Stacks are most easily implemented using STM and LDM. Both of these
instructions take the syntax

<op><type> <base>{!},<register list>

where

<op> is either STM (store multiple) or LDM (load multiple)

<type> is made up from
        I    Increment address for each register
        D    Decrement address for each register
        A    Modify address after storing each register
        B    Modify address before storing each register
     or is one of
        FA   Operate on full ascending stack
        FD   Operate on full descending stack
        EA   Operate on empty ascending stack
        ED   Operate on empty descending stack

<base> is the address of the stack pointer; usually held in R13

{!} specifies write-back of the modified address to the stack pointer (to
     prevent data being accidentally overwritten)

<register list> is a comma-separated list of registers, enclosed in braces
     {}, to be pushed to the stack. Registers are ALWAYS placed in ascending
     memory addresses according to their register number.

The FA-ED extenders save the problem of having to reverse the usual
extenders when pushing and pulling; to pull from a stack which had been
pushed to using STMIA, it was necessary to use LDMDB.
Now we can just use STMEA and LDMEA, for example.

Floating Point Model
~~~~~~~~~~~~~~~~~~~~
The following instructions are _NOT_ included in the standard ARM BASIC
assembler; however, for those of you who may have cross-assemblers or the
Acorn Object Assembler, or are prepared to hand-code the instructions in
hex, here goes.

The FP model complies with the IEEE specification, and provides a further
8 registers, F0-F7. Numbers may be stored in the following formats:

IEEE Single Precision (S)    32 bits:                 1 sign bit
                                                      23 bit mantissa
                                                      8 bit exponent

IEEE Double Precision (D)    64 bits (double word):   1 sign bit
                                                      52 bit mantissa
                                                      11 bit exponent
   [Note: Older FPE and FPPC had a different format to this current system]

Double Extended Prec. (E)    96 bits (triple word):   1 sign bit
                                                      64 bit mantissa
                                                      15 bit exponent
                                                      16 bits unused

Packed decimal BCD (P)       96 bits (triple word):   1 sign digit
                                                      19 digit mantissa
                                                      4 digit exponent

Expanded packed BCD (EP)     128 bits (quad word):    1 sign digit
                                                      24 digit mantissa
                                                      7 digit exponent
   [Note: This is not available in older FPE & FPPC]

Note: P and EP are mutually exclusive - the one available is determined by
the EP bit in the System Control byte.

The FP status register has separate flags for Overflow, Underflow, Division
by Zero, Inexact result and Invalid operation; a subset of the flags
indicating these is copied into the ARM status register (R15) by the
co-processor interface.

In brief, the instructions are:

LDF<precision> <Fd>,<address>
STF<precision> <Fd>,<address>

where: <precision> is one of S,D,E,P
       <Fd> is F0-F7
       <address> is either [Rn]{,#offset} or [Rn,#offset]{!} and is defined
       as an offset from the ARM base register specified; the offset is in
       the range -1020 to +1020. The offset is added to the base register
       when write-back {!} is specified with pre-indexed addressing, and is
       always added with post-indexed.

FLT   Change integer to FP
FIX   Change FP to integer
WFS   Write FP status to FP status reg.
RFS   Read FP status from FP status reg.
WFC   Write FP control   ] Processor SVC mode only
RFC   Read FP control    ]

This set uses the general form eg

FLT<precision>{rounding mode} <dest>,(<op> | #value)

where the rounding mode is chosen from:

Round to nearest      (no mnemonic)
Round to + infinity   (P)
Round to - infinity   (M)
Round to Zero         (Z)

Unary Operations
----------------
Command format: <mnemonic><precision>{rounding mode} <Fdest>,(<Fop> | #val)

Mnemonic  Effect               Calculation           Opcode
--------  ------               -----------           ------
MVF       Move                 Fdest=Fop             00001
MNF       Move negated         Fdest=-Fop            00011
ABS       Absolute value       Fdest=ABS(Fop)        00101
RND       Round to integer     Fdest=INT(Fop)        00111
SQT       Square root          Fdest=SQR(Fop)        01001
LOG       Log to base 10       Fdest=LOG(Fop)        01011
LGN       Natural log          Fdest=LN(Fop)         01101
EXP       Exponent             Fdest=EXP(Fop)        01111
SIN       Obvious              Fdest=SIN(Fop)        10001
COS       Equally obvious      Fdest=COS(Fop)        10011
TAN       Ditto                Fdest=TAN(Fop)        10101
ASN       ]                    Fdest=ASN(Fop)        10111
ACS       ] (Obvious)^-1       Fdest=ACS(Fop)        11001
ATN       ]                    Fdest=ATN(Fop)        11011

Binary Operations
-----------------
Command format:
<mnemonic><precision>{rounding mode} <Fdest>,<Fop1>,(<Fop2> | #value)

Mnemonic  Effect               Calculation           Opcode
--------  ------               -----------           ------
ADF       Add                  Fdest=Fop1+Fop2       00000
MUF       Multiply             Fdest=Fop1*Fop2       00010
SUF       Subtract             Fdest=Fop1-Fop2       00100
RSF       Reverse subtract     Fdest=Fop2-Fop1       00110
DVF       Divide               Fdest=Fop1/Fop2       01000
RDF       Reverse divide       Fdest=Fop2/Fop1       01010
POW       Raise to power       Fdest=Fop1^Fop2       01100
RPW       Reverse raise...
                               Fdest=Fop2^Fop1       01110
RMF       Remainder            Fdest=Fop1 MOD Fop2   10000
FML       Fast multiply        Fdest=Fop1*Fop2       10010
FDV       Fast divide          Fdest=Fop1/Fop2       10100
FRD       Fast reverse divide  Fdest=Fop2/Fop1       10110
POL       Polar angle          Fdest=angle between   11000
                               Fop1 and Fop2

Note that the "fast" instructions only produce single-precision accuracy,
regardless of the precision specified in the mnemonic.

System Memory Map
~~~~~~~~~~~~~~~~~
Please note that this is the provisional version of the memory map details;
the master table (at the top) is for an A540/R200 series machine (the
addresses are the same on the entire Archimedes range, but there are no
Acorn-endorsed RAM expansions to take you over the 4Mb), whereas the
logically-mapped RAM area was assigned from much hacking on an A310. The
addresses in this latter table are, of course, affected directly by machine
configuration, and the data thus given is merely representative of the setup
of that machine at that time.

Addr      Read                       Write
----      ----                       -----
3FFFFFF   ------------------------------------------------------------
          ROM (high)               | Logical to Physical
                                   | address translator
3800000   ------------------------------------------------------------
          ROM (low)                | DMA Address generators
3600000                            |----------------------------------
                                   | Video Controller
3400000   ------------------------------------------------------------
          Input/Output Controllers
3000000   ------------------------------------------------------------
          Physically mapped RAM
2000000   ------------------------------------------------------------
          Logically mapped RAM
0000000   ------------------------------------------------------------

The physically mapped RAM area is shown on the map as being made up of:

          4Mb daughter card, 3rd slot   MEMC (z)
          4Mb daughter card, 2nd slot   MEMC (y)
          4Mb daughter card, 1st slot   MEMC (x)
          4Mb on motherboard            Master MEMC (w)

The logically mapped RAM area breaks down as follows:

Addr
----
0         ---------------------------------
          Bootstrap and hardware exception
          vectors
1C        ---------------------------------
          System and BASIC workspace
8000      ---------------------------------
          Application RAM
          (RISC OS Desktop Tasks map to here)
A7FFF     ---------------------------------
          Unassigned address space
1000000   ---------------------------------
          RAM filing system
1800000   ---------------------------------
          RAM-based Relocatable Modules
          (incl. those downloaded from podules)
          Font definitions grow downwards
1825FFF   ---------------------------------
          Unassigned address space
1C00000   ---------------------------------
          System Heap
1C03FFF   ---------------------------------
          Unassigned address space
1F00000   ---------------------------------
          Cursor data and Desktop scratchpad
1F07FFF   ---------------------------------
          Unassigned address space
1FEC000   ---------------------------------
          Screen RAM (grows downwards)
1FFFFFF   ---------------------------------

The bootstrap and hardware exception vectors at the bottom of memory are:

Addr      Vector
----      ------
0         ARM Reset
4         Undefined instruction
8         Software Interrupt (SWI)
C         Abort (pre-fetch)
10        Abort (data)
14        Address exception
18        IRQ_Vec
1C        FIQ_Vec

IOC
---
The I/O controller is mapped into memory from &3000000 to &3400000, and
_ALL_ I/O devices are also thus memory mapped. The chip has 4 operation
modes; sync, fast, medium and slow. It therefore follows that a good many
operations will be duplicated in different modes. A (necessarily incomplete)
list of some device entry points follows; generally, these entry points are
only directly accessible in SVC mode.

Cycle Type  Base Address  Device                          Device IC
----------  ------------  ------                          ---------
Fast        &3310000      Floppy disc controller          WD1772
Sync        &33A0000      Econet controller               6854
Sync        &33B0000      Serial controller               6551
Fast        &3350000      Printer data                    LS374
Fast        &3360000      Podule IRQ register
Fast        &3360004      Podule IRQ mask register
Slow        &3270000      Extended external podule space

Slow        &3240000      Podule 0
Med         &32C0000      Podule 0
Fast        &3340000      Podule 0
Sync        &33C0000      Podule 0

Slow        &3244000      Podule 1
Med         &32C4000      Podule 1
Fast        &3344000      Podule 1
Sync        &33C4000      Podule 1

Slow        &3248000      Podule 2
Med         &32C8000      Podule 2
Fast        &3348000      Podule 2
Sync        &33C8000      Podule 2

Slow        &324C000      Podule 3
Med         &32CC000      Podule 3
Fast        &334C000      Podule 3
Sync        &33CC000      Podule 3

Although these are the current lookup addresses, Acorn make no guarantees
that they will stay here in future incarnations of the OS. Writing to and
reading from these addresses is currently the fastest way to interact with a
podule, but for future compatibility, it's safer to use the SWIs detailed
later this issue. Alternatively, build your own lookup table using
SWI "Podule_HardwareAddress", or (from what I've heard about it) take a look
at !StrongHlp.

NB -- When writing to a device via IOC, the data must be on the top 16 bits
of the data word. Data to be read appears on the bottom 16 bits.
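
The podule addresses in the table follow a regular pattern, and the NB above
amounts to a 16-bit shift. The Python sketch below models both. It is an
illustration only: the function names are hypothetical, the &4000 spacing
between podules is inferred from the table rather than stated by Acorn, and
(as noted above) the addresses themselves are not guaranteed to stay put.

```python
# Podule 0 base address for each cycle type, copied from the table.
CYCLE_BASE = {
    "slow": 0x3240000,
    "medium": 0x32C0000,
    "fast": 0x3340000,
    "sync": 0x33C0000,
}

# Spacing between successive podules, inferred from the table entries.
PODULE_STRIDE = 0x4000


def podule_address(podule, cycle):
    """Base address of podule 0-3 for the given IOC cycle type."""
    if not 0 <= podule <= 3:
        raise ValueError("table only covers podules 0 to 3")
    return CYCLE_BASE[cycle] + podule * PODULE_STRIDE


def ioc_write_word(data16):
    """Place 16 bits of data on the top half of the word to be written."""
    return (data16 & 0xFFFF) << 16


def ioc_read_data(word):
    """Extract the 16 bits of data that appear on the bottom half."""
    return word & 0xFFFF
```

For example, podule_address(3, "sync") reproduces the &33CC000 entry in the
table, and ioc_write_word(&1234) gives &12340000 as the word to store.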
