top left Image    Architecture


Introduction

The Megaprocessor is a 16 bit processor. There are four general purpose 16 bit registers: R0, R1, R2, R3. There is a program counter, PC. A register dedicated as the stack pointer SP. And a processor status register, PS. For computation there is a 16 bit ALU and a 16 bit adder. The registers and compute resources are connected via multiplexors which allow the Megaprocessor to configure its data flows in different ways for different instructions. For most instructions the processor is configured in this way:

base configuration

In this configuration the ALU takes two operands from the General Purpose Registers and writes the result of a calculation back to one of them. Concurrently with this the PC is being updated to get the next instruction.

The operation of the Megaprocessor is controlled by a state machine which is described here.

Nearly all instructions cause the PS register to be updated in some way, the meaning of the PS register bits are described here.

All registers are 16 bits wide and all operations are 16 bit orientated (there are no byte operations). The Megaprocessor has a load/store architecture: ALU operations can take place only between the general purpose registers so data must first be loaded into registers before being operated on and the results stored as required.

Whilst all internal data paths are 16 bits wide the external data bus is 8 bits wide. There is a 64K address space. There is no mechanism for stalling the processor, all byte accesses take one cycle. Data can be loaded/stored to memory as either bytes (8 bits) or words (16 bits). Words are stored in memory in littleendian format. Bytes loaded from memory are zero extended to 16 bits to fill a register. There is a SXT (sign extend) instruction to sign extend the LSB of a register if the byte should have been interpreted as a signed value.
I've added several peripherals, they're described here.

Instruction Set

Instructions consist of 8 bit opcodes with up to two bytes of following immediate data. This means that there are 256 possible instructions, and all are used. They are divided into 16 groups of 16 as per the table below.

Group: MOVER AND XOR OR ADD ADDQ SUB CMP
  0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70
0x00 sxt r0 test r0 xor r0,r0 inv r0 add r0,r0 addq r0,#2 neg r0 abs r0
0x01 move r1,r0 and r1,r0 xor r1,r0 or r1,r0 add r1,r0 addq r1,#2 sub r1,r0 cmp r1,r0
0x02 move r2,r0 and r2,r0 xor r2,r0 or r2,r0 add r2,r0 addq r2,#2 sub r2,r0 cmp r2,r0
0x03 move r3,r0 and r3,r0 xor r3,r0 or r3,r0 add r3,r0 addq r3,#2 sub r3,r0 cmp r3,r0
0x04 move r0,r1 and r0,r1 xor r0,r1 or r0,r1 add r0,r1 addq r0,#1 sub r0,r1 cmp r0,r1
0x05 sxt r1 test r1 xor r1,r1 inv r1 add r1,r1 addq r1,#1 neg r1 abs r1
0x06 move r2,r1 and r2,r1 xor r2,r1 or r2,r1 add r2,r1 addq r2,#1 sub r2,r1 cmp r2,r1
0x07 move r3,r1 and r3,r1 xor r3,r1 or r3,r1 add r3,r1 addq r3,#1 sub r3,r1 cmp r3,r1
0x08 move r0,r2 and r0,r2 xor r0,r2 or r0,r2 add r0,r2 addq r0,#-2 sub r0,r2 cmp r0,r2
0x09 move r1,r2 and r1,r2 xor r1,r2 or r1,r2 add r1,r2 addq r1,#-2 sub r1,r2 cmp r1,r2
0x0A sxt r2 test r2 xor r2,r2 inv r2 add r2,r2 addq r2,#-2 neg r2 abs r2
0x0B move r3,r2 and r3,r2 xor r3,r2 or r3,r2 add r3,r2 addq r3,#-2 sub r3,r2 cmp r3,r2
0x0C move r0,r3 and r0,r3 xor r0,r3 or r0,r3 add r0,r3 addq r0,#-1 sub r0,r3 cmp r0,r3
0x0D move r1,r3 and r1,r3 xor r1,r3 or r1,r3 add r1,r3 addq r1,#-1 sub r1,r3 cmp r1,r3
0x0E move r2,r3 and r2,r3 xor r2,r3 or r2,r3 add r2,r3 addq r2,#-1 sub r2,r3 cmp r2,r3
0x0F sxt r3 test r3 xor r3,r3 inv r3 add r3,r3 addq r3,#-1 neg r3 abs r3



Group: Indirect PostInc Stack rel Absolute Push/Pop Immediate Branch Misc
  0x80 0x90 0xA0 0xB0 0xC0 0xD0 0xE0 0xF0
0x00 ld.w r0,(r2) ld.w r0,(r2++) ld.w r0,(sp+m) ld.w r0,addr pop r0 ld.w r0,#data buc dd move r0,sp
0x01 ld.w r1,(r2) ld.w r1,(r2++) ld.w r1,(sp+m) ld.w r1,addr pop r1 ld.w r1,#data bus dd move sp,r0
0x02 ld.w r0,(r3) ld.w r0,(r3++) ld.w r2,(sp+m) ld.w r2,addr pop r2 ld.w r2,#data bhi dd jmp (r0)
0x03 ld.w r1,(r3) ld.w r1,(r3++) ld.w r3,(sp+m) ld.w r3,addr pop r3 ld.w r3#data bls dd jmp addr
0x04 ld.b r0,(r2) ld.b r0,(r2++) ld.b r0,(sp+m) ld.b r0,addr pop ps ld.b r0,#data bcc dd andi ps,#data
0x05 ld.b r1,(r2) ld.b r1,(r2++) ld.b r1,(sp+m) ld.b r1,addr [NOP2] ld.b r1,#data bcs dd ori ps,#data
0x06 ld.b r0,(r3) ld.b r0,(r3++) ld.b r2,(sp+m) ld.b r2,addr ret ld.b r2,#data bne dd add.b sp,#data
0x07 ld.b r1,(r3) ld.b r1,(r3++) ld.b r3,(sp+m) ld.b r3,addr reti ld.b r3#data beq dd sqrt
0x08 st.w (r2),r0 st.w (r2++),r0 st.w (sp+m),r0 st.w addr,r0 push r0 shift r0,descr bvc dd mulu
0x09 st.w (r2),r1 st.w (r2++),r1 st.w (sp+m),r1 st.w addr,r1 push r1 shift r1,descr bvs dd muls
0x0A st.w (r3),r0 st.w (r3++),r0 st.w (sp+m),r2 st.w addr,r2 push r2 shift r2,descr bpl dd divu
0x0B st.w (r3),r1 st.w (r3++),r1 st.w (sp+m),r3 st.w addr,r3 push r3 shift r3,descr bmi dd divs
0x0C st.b (r2),r0 st.b (r2++),r0 st.b (sp+m),r0 st.b addr,r0 push ps bit r0,descr bge dd addx r0,r1
0x0D st.b (r2),r1 st.b (r2++),r1 st.b (sp+m),r1 st.b addr,r1 trap #n bit r1,descr blt dd subx r0,r1
0x0E st.b (r3),r0 st.b (r3++),r0 st.b (sp+m),r2 st.b addr,r2 jsr (r0) bit r2,descr bgt dd negx r0
0x0F st.b (r3),r1 st.b (r3++),r1 st.b (sp+m),r3 st.b addr,r3 jsr addr bit r3,descr ble dd nop

A detailed description of each instruction is in this document.


There is one stage of pipelining, the next instruction is fetched whilst processing the current instruction. This means that in general instructions take one cycle per byte of memory access required (including the instruction itself). e.g. simple ALU instruction are one byte long, and take one cycle to execute. Loading a word from an absolute address takes 5 cycles as 5 bytes of memory access are required: 1 for the instruction, 2 for the address, 2 to load the word.

The multiply, divide and square root instructions are implemented using iterative algorithms described here.To implement some of the more complex instructions there is a "helper/hidden" register, H, which is used by the processor as a scratchpad. The H register does not carry any information from one instruction to the next.

Opcode 0xC5 is no longer used and is currently mapped to NOP.

Exceptions

There are four exceptions. For each exception there is an associated vector to which the processor will jump after completing exception processing. The base address for the vectors may be 0x0000 or 0xFFF0 selected by hardware switch. The vectors are offsets of  0x0, 0x4, 0x8 and 0xC relative to the vector base.

Structure

The Megaprocessor was designed as a set of 10 subsystems

  • State machine
  • Status register
  • Instruction Fetch
  • Instruction Decoding
  • Program Counter (PC)
  • Stack Pointer (SP)
  • Helper Register (H)
  • Address & Write data
  • ALU
  • General Purpose Registers (R0..R3)

For construction these were mapped onto 14 modules mounted in 5 frames as shown below:

layout of frames and modules






© 2014-2016 James Newman.