© 2003 by Charles C. Lin. All rights reserved.

Background

You should know what UB (unsigned binary) and 2C (two's complement) representations are. You should also know about sign extension.

ISA

When you started learning how to program, you were told that your program had to be compiled. That is, it had to be converted from a high-level language into a low-level language. For C and C++, the low-level language is basically machine code.

An ISA defines the machine and assembly code used by a CPU.

ISA stands for "Instruction Set Architecture". Effectively, the ISA is the programmer's view of the computer.

An ISA consists of:

  • The instruction set: This is the set of instructions supported. This is the part that's usually called the assembly language.
  • The register set: This is the set of registers you can use. (There are other hidden registers which you can't use directly. They are used indirectly, however.)
  • The address space: This is the set of memory addresses that can be used by your program.

The ISA is basically a hardware specification. It's the view of the hardware as seen by an assembly language programmer.

The ISP (instruction set processor) is an implementation of the ISA. There may be many implementations for a given ISA. For example, IA32 is the instruction set architecture for x86 processors. Intel has the Pentium and Celeron lines of CPUs that implement this ISA. AMD also has its own CPUs that implement the ISA. Each implementation is different, but they all run code written in IA32.

Why You Need to Know About Instructions

We study instruction sets because that's what CPUs process. They run one instruction after another. In order to understand how a computer works, you need to know what instructions are, and more importantly, how to write them.

There are two ways to write instructions. You can write them in assembly language, which is human-readable, or you can write them in machine code, which is basically 0's and 1's. CPUs process machine code, but humans usually program in assembly language.

You need to know both, in order to understand how a CPU works.

The MIPS ISA

For MIPS32, each machine code instruction is a 32-bit bitstring.

The "32" in MIPS32 refers to the size of the registers (i.e., how many bits each register holds) and to the number of bits used in an address. There is also a MIPS64, which has 64-bit addresses and 64-bit registers.

The MIPS32 architecture contains 32 general purpose integer registers. The registers are named $r0, $r1, ..., $r31. Each register can store 32 bits. Most of the time the registers store signed or unsigned ints. However, sometimes they store addresses, and occasionally ASCII characters, etc.

MIPS also has 32 floating point registers, but we won't worry about them too much.

Unlike programming languages where you can declare as many variables as you want, you can't create any more registers. The number of registers doesn't change.

MIPS32 allows you to access data in memory using 32-bit addresses. In principle, you can access up to 2^32 different addresses using 32 bits. In practice, some of those addresses may be invalid. For example, the CPU may simply not have that much memory (2^32 addresses is 4 GB). Thus, you might be able to generate the 32-bit address, but there may be nothing stored at that address (an error usually occurs when you access an invalid address).

In MIPS, nearly all registers are general purpose. You can classify ISAs into those that use general purpose registers (i.e., instructions can refer to any register, and all registers support the same operations) and those that use special purpose registers (certain instructions can only be used on specific registers, not all of them).

However, there is at least one exception. $r0 is not general purpose. It is hardwired to 0. No matter what you do to this register, it always has a value of 0. You might wonder why such a register is needed in MIPS.

The designers of MIPS used benchmarks (programs used to determine the performance of a CPU), which convinced them that having a register hardwired to 0 would improve the performance (speed) of the CPU as opposed to not having it. Not everyone agrees a register hardwired to 0 is essential, so not all ISAs have a zero register.

Assembly vs. Machine Code

CPUs process binary bitstrings. These bitstrings are really instructions, encoded in 0's and 1's. When people began to write programs for computers, they wrote them in binary. That is, programs were written in 0's and 1's. The code probably looked something like this:

0000 0000 0101 1000 0000 0000 0101 1000
1010 1101 0000 1011 1000 1100 1001 0110

This is called machine code.

As you might imagine, machine code was difficult to read and difficult to debug. The amount of time wasted trying to find whether you had accidentally written a 0 instead of a 1 led to the invention of assembly language.

Assembly language is a somewhat more human-readable version of machine code. For example, assembly code might look something like:

add   $r2, $r3, $r4
addi  $r2, $r3, -10

While you may not understand the code above, you've certainly got a much better chance of figuring it out than the machine code equivalent. Each line of assembly code contains an instruction. Each instruction tells the computer one small task to accomplish. Instructions are the building blocks of programs.

CPUs can't handle assembly code directly. Instead, assembly code is translated to machine code. If this sounds like compiling, that's because it basically is compiling. However, people usually call the process of translating assembly to machine code assembling, instead of compiling.

You'll write code in assembly, and learn how to translate some instructions from assembly to machine code. It's very important that you understand the machine code, because that's what the CPU processes. Furthermore, by studying machine code, you get to see how information is encoded into 0's and 1's, and you get to see how the CPU uses these binary values to execute the instruction in hardware.

Encoding Registers

In the previous set of notes, we talked about how many bits you need to create N different labels. We assume each label is k bits long.

You need k = ceil( lg N ) bits to uniquely label N items.

MIPS32 has 32 integer registers. We want to label each register by a number, so instructions can refer to registers by number. Since MIPS has 32 registers, you need ceil( lg 32 ) = 5 bits.
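A minimal Python sketch of the k = ceil( lg N ) rule (the function name bits_needed is illustrative, not from these notes). It uses integer bit_length() so no floating-point logarithm is involved:

```python
def bits_needed(n: int) -> int:
    """Return ceil(lg n): the number of bits needed to give n items distinct labels."""
    if n < 1:
        raise ValueError("need at least one item to label")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)), using only integer math.
    return (n - 1).bit_length()


# MIPS32 has 32 integer registers, so register numbers need 5 bits.
print(bits_needed(32))  # 5
print(bits_needed(33))  # 6: one more item than 5 bits can label
```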

If we think of the 5-bit numbers as unsigned binary numbers, then the registers are numbered from 0 up to 31, inclusive. In fact, that's exactly how MIPS numbers its registers: from $r0 up to $r31. In binary, they are numbered from 00000 to 11111.

In assembly language, you'd write $r6. In machine code, you'd write the same register as: 00110. In assembly language, you'd write $r30. In machine code, you'd write 11110.

This is important because we're going to use register encoding in the machine language instructions for MIPS. Recall that machine code is a 32-bit bitstring. When we refer to registers within the instruction, it's going to be using the 5 bit binary numbers written in UB (unsigned binary).
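As a small illustration, here is a hypothetical Python helper that turns a register name written in these notes' $rN style into the 5-bit UB field used inside an instruction. It assumes the $r0..$r31 spelling used here; real MIPS assemblers commonly write registers as $0..$31 or by names such as $t0, which this sketch does not handle:

```python
def encode_register(name: str) -> str:
    """Translate a register name such as "$r6" into its 5-bit UB encoding."""
    if not name.startswith("$r"):
        raise ValueError(f"expected a name like $r6, got {name!r}")
    number = int(name[2:])  # the register number after "$r"
    if not 0 <= number <= 31:
        raise ValueError(f"MIPS32 has registers $r0..$r31, got {name!r}")
    return format(number, "05b")  # 5-bit unsigned binary, zero-padded


print(encode_register("$r6"))   # 00110
print(encode_register("$r30"))  # 11110
```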

What is an instruction?

An assembly language instruction is basically a function call. Like C functions, assembly language instructions have a fixed number of arguments. You can't add or remove arguments.

Like C functions, arguments of assembly language instructions have type. Or at least, something that resembles type. Basically, there are 4 kinds of "types" for MIPS.

  • Registers: $r0, $r1, ..., $r31.

  • Immediates: Constants, such as 10, -20, etc. Sometimes written in hexadecimal, e.g., 0x3a.

  • Register offset: This is a constant and a register, written as -10($r3) or 214($r4). That is, you write the immediate (constant) value, then a left parenthesis, then a register, then a right parenthesis.

    The computation is performed by adding the contents of the register to the offset, usually resulting in a 32-bit address. Thus, -10($r3) is -10 added to the contents of register 3. This result is "temporary" and register 3 is not modified (just like x + y in a programming language merely adds x to y, but the sum does not change x or y).

  • Labels: These are identifiers for locations in memory. Generally, you write labels in uppercase letters and underscores, such as FOR_LOOP.
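The register-offset computation can be sketched in Python as follows. This assumes the sum wraps around modulo 2^32 like ordinary 32-bit addition; the helper name effective_address is made up for illustration:

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits


def effective_address(register_value: int, offset: int) -> int:
    """Compute offset($rN): add a signed offset to a 32-bit register value.

    The register value itself is only read, never changed; the sum is
    reduced modulo 2**32, as 32-bit addition would do.
    """
    return (register_value + offset) & MASK32


r3 = 0x1000_0040
print(hex(effective_address(r3, -10)))  # 0x10000036, i.e. -10($r3)
print(hex(r3))                          # 0x10000040, register 3 is unchanged
```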

For the most part, we'll only consider registers and immediate values.

Let's consider two examples of instructions and their operands:

  • add $r2, $r3, $r4: This instruction adds the contents of register 3 and register 4, and places the result in register 2. It's basically R[2] = R[3] + R[4], if you pretend that the registers form an array.

  • addi $r2, $r3, -10: This instruction adds -10 to the contents of register 3 and places the result in register 2. It's basically R[2] = R[3] - 10.
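To make the R[...] notation concrete, here is a small Python model of the register file, assuming 32-bit registers with $r0 hardwired to 0. The class and function names are hypothetical. Note one simplification: real MIPS add and addi signal an exception on signed overflow, while this sketch just wraps the result to 32 bits (the way addu and addiu behave):

```python
MASK32 = 0xFFFFFFFF  # registers hold 32 bits


class RegisterFile:
    """Hypothetical model: 32 registers of 32 bits each, with $r0 hardwired to 0."""

    def __init__(self) -> None:
        self._regs = [0] * 32

    def __getitem__(self, n: int) -> int:
        return self._regs[n]

    def __setitem__(self, n: int, value: int) -> None:
        if n != 0:  # writes to $r0 are silently discarded
            self._regs[n] = value & MASK32  # keep only the low 32 bits


def add(R: RegisterFile, d: int, s: int, t: int) -> None:
    """add $rd, $rs, $rt  ->  R[d] = R[s] + R[t] (wraps instead of trapping)"""
    R[d] = R[s] + R[t]


def addi(R: RegisterFile, t: int, s: int, imm: int) -> None:
    """addi $rt, $rs, imm  ->  R[t] = R[s] + imm (wraps instead of trapping)"""
    R[t] = R[s] + imm


R = RegisterFile()
R[3], R[4] = 7, 5
add(R, 2, 3, 4)     # add  $r2, $r3, $r4
print(R[2])         # 12
addi(R, 2, 3, -10)  # addi $r2, $r3, -10
print(R[2])         # 4294967293, the 32-bit 2C pattern for -3
addi(R, 0, 3, 1)    # addi $r0, $r3, 1: the write is discarded
print(R[0])         # 0
```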

The first instruction is an add instruction. add requires exactly 3 operands (arguments). Each operand must be a valid register; it cannot be anything else. In particular, you cannot create expressions such as:

# WRONG! Operands can't be expressions
add $r2, $r3, (add $r4, $r5, $r6) 

The second instruction is an addi instruction. addi must also have three operands. The first two operands must be registers, while the third must be an integer between -2^15 and 2^15 - 1, inclusive.

There is a reason for this restriction in value, which we will discuss momentarily.

Unlike higher level programming languages, you can't create new registers. You're forced to use the ones available. You can't create new instructions either. You must use the ones provided in the instruction set.

Machine Code

A machine language instruction usually consists of:

  • opcode: This is a binary representation of the instruction. For example, an add instruction has an opcode of 000 000.
  • operands: Operands means the same thing as arguments. It's older terminology usually associated with assembly/machine code instructions.

MIPS divides instructions into three formats. Instructions are either R-type (register type), I-type (immediate type), or J-type (jump type). The types refer to the format, not to the purpose. (For example, branch instructions are I-type because of their format, even though it might seem they should be J-type.)

Here are the layouts of the three kinds of instructions.

R-type Instruction

Opcode    Register s  Register t  Register d  Shift Amt  Function
B31..26   B25..21     B20..16     B15..11     B10..6     B5..0
ooo ooo   sssss       ttttt       ddddd       aaaaa      ffffff

  • R-type is short for "register type".
  • Bits B31..26 are used for the opcode. For R-type instructions, the opcode is almost always 000 000. At first this seems odd, since every instruction should have a unique opcode. However, bits B5..0 (the function field) use 6 more bits to specify the instruction. Only R-type instructions use a function field.
  • Bits B25..21 specify a 5-bit UB encoding for the first source register.
  • Bits B20..16 specify a 5-bit UB encoding for the second source register.
  • Bits B15..11 specify a 5-bit UB encoding for the destination register. This specifies which register stores the result of the operation.
  • Bits B10..6 specify the shift amount. This is usually 00000, except for shift instructions.
  • Bits B5..0 specify a 6-bit function. Each R-type instruction has a unique 6-bit function value. For example, add has a different function value from sub, because add and sub are two different instructions.
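The R-type layout amounts to shifting each field into position and OR-ing the pieces together. A minimal Python sketch (encode_r is an illustrative name), checked against the add $r2, $r3, $r4 example that these notes encode by hand, with opcode 000 000 and function 100 000:

```python
def encode_r(opcode: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack the six R-type fields into one 32-bit machine word."""
    if not (0 <= opcode < 64 and 0 <= funct < 64):
        raise ValueError("opcode and function are 6-bit fields")
    if not all(0 <= f < 32 for f in (rs, rt, rd, shamt)):
        raise ValueError("register and shift-amount fields are 5 bits")
    # B31..26 | B25..21 | B20..16 | B15..11 | B10..6 | B5..0
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $r2, $r3, $r4: source 1 = $r3, source 2 = $r4, destination = $r2
word = encode_r(0b000000, rs=3, rt=4, rd=2, shamt=0, funct=0b100000)
print(f"{word:032b}")  # 00000000011001000001000000100000
print(hex(word))       # 0x641020
```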

I-type Instruction

Opcode    Register s  Register t  Immediate
B31..26   B25..21     B20..16     B15..0
ooo ooo   sssss       ttttt       iiii iiii iiii iiii

  • I-type is short for "immediate type".
  • Bits B31..26 are used for the opcode. Unlike R-type instructions, the 6-bit value is NOT 000 000. There is no function code for I-type instructions.
  • Bits B25..21 specify a 5-bit UB encoding for the source register.
  • Bits B20..16 specify a 5-bit UB encoding for the destination register. Although this is called register t, instead of register d, it serves as the destination register for I-type instructions such as addi.
  • Bits B15..0 hold the 16-bit immediate value. This may be a 16-bit UB number or a 16-bit 2C number, depending on the instruction. Notice that the immediate value is encoded directly into the instruction.
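The I-type layout can be packed the same way. In this Python sketch (encode_i is a hypothetical helper), the immediate is masked to its low 16 bits, which yields the 16-bit 2C pattern for a negative value. For simplicity it accepts anything that fits as either 16-bit UB or 16-bit 2C; which interpretation is correct actually depends on the instruction. The example uses the addi $r3, $r10, -3 instruction that these notes encode by hand (opcode 001 000):

```python
def encode_i(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack an I-type instruction; imm may be a 16-bit UB or 16-bit 2C value."""
    if not 0 <= opcode < 64:
        raise ValueError("opcode is a 6-bit field")
    if not (0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("register fields are 5 bits")
    if not -(1 << 15) <= imm < (1 << 16):
        raise ValueError("immediate does not fit in 16 bits")
    # Masking keeps the low 16 bits, which is the 16-bit 2C pattern for negatives.
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


# addi $r3, $r10, -3: rs = $r10 (source), rt = $r3 (destination)
word = encode_i(0b001000, rs=10, rt=3, imm=-3)
print(f"{word:032b}")  # 00100001010000111111111111111101
print(hex(word))       # 0x2143fffd
```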

J-type Instruction

Opcode    Target
B31..26   B25..0
ooo ooo   tt tttt tttt tttt tttt tttt tttt

  • J-type is short for "jump type".
  • Bits B31..26 are used for the opcode. Unlike R-type instructions, the 6-bit value is NOT 000 000. There is no function code for J-type instructions.
  • Bits B25..0 hold the 26-bit target. This is used to generate the jump address.

Notice that the J-type instruction has no source or destination registers.
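A Python sketch of the J-type layout follows; the packing mirrors the J-type field table. The jump_address helper goes one step further and shows the standard MIPS32 rule for j and jal, which these notes only describe as generating an address: the upper 4 bits come from the address of the next instruction (PC + 4), then the 26-bit target, then two 0 bits. The opcode 000 010 for j comes from the MIPS32 specification, not from these notes, and the function names are illustrative:

```python
def encode_j(opcode: int, target: int) -> int:
    """Pack a J-type instruction: 6-bit opcode, then a 26-bit target."""
    if not 0 <= opcode < 64:
        raise ValueError("opcode is a 6-bit field")
    if not 0 <= target < (1 << 26):
        raise ValueError("target is a 26-bit field")
    return (opcode << 26) | target  # no register fields at all


def jump_address(pc: int, target: int) -> int:
    """MIPS32 j/jal rule: top 4 bits of PC + 4, then the target, then 00."""
    return ((pc + 4) & 0xF000_0000) | (target << 2)


word = encode_j(0b000010, 0x0100000)  # j with a 26-bit target of 0x0100000
print(hex(word))                                  # 0x8100000
print(hex(jump_address(0x0040_0000, 0x0100000)))  # 0x400000
```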

add, an R-type instruction

The general format for an add instruction is:
add $rd, $rs, $rt
$rd, $rs, and $rt are not real registers. They are merely placeholders. For example, if we write add $r2, $r3, $r4, then for this particular example, $rd = $r2, $rs = $r3, and $rt = $r4.

In assembly language, the instructions are written with the destination register (i.e., register d) first, then the first source register (i.e., register s), then the second source register (i.e., register t).

Note: This is NOT the same order as it is written in machine code. In assembly, it's destination, source 1, source 2. In MIPS machine code, it's written source 1, source 2, destination.

Don't ask me why the MIPS folks did it that way. They just did.

Let's translate the following instruction into MIPS machine code.

add $r2, $r3, $r4

For add, the opcode is 000 000. The function code is 100 000. Since the shift amount isn't used, it's set to 00000.

We encode $r2 as 00010, $r3 as 00011, and $r4 as 00100.

This is how the machine code equivalent looks:

Opcode    Register s  Register t  Register d  Shift Amt  Function
B31..26   B25..21     B20..16     B15..11     B10..6     B5..0
          $r3         $r4         $r2
000 000   00011       00100       00010       00000      100 000

Again, notice that bits B25..21 hold source 1 (i.e., $r3), bits B20..16 hold source 2 (i.e., $r4), and bits B15..11 hold the destination register (i.e., $r2).

It's important that you learn how to translate a few instructions, because the CPU manipulates the binary version of this, not the assembly version. In particular, pay attention to how the registers are encoded, and just as importantly, which bits refer to which registers.
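Decoding goes the other way: shift the word right and mask off each field. This Python sketch (decode_r is an illustrative name) recovers the fields of the add $r2, $r3, $r4 word, and the output shows source 1, source 2, and destination in exactly the bit positions described above:

```python
def decode_r(word: int) -> dict:
    """Split a 32-bit R-type word back into its fields by shifting and masking."""
    return {
        "opcode": (word >> 26) & 0x3F,  # B31..26
        "rs":     (word >> 21) & 0x1F,  # B25..21, source 1
        "rt":     (word >> 16) & 0x1F,  # B20..16, source 2
        "rd":     (word >> 11) & 0x1F,  # B15..11, destination
        "shamt":  (word >> 6) & 0x1F,   # B10..6
        "funct":  word & 0x3F,          # B5..0
    }


fields = decode_r(0b000000_00011_00100_00010_00000_100000)  # add $r2, $r3, $r4
print(fields)
# {'opcode': 0, 'rs': 3, 'rt': 4, 'rd': 2, 'shamt': 0, 'funct': 32}
```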

addi, an I-type instruction

addi stands for add immediate. It's an I-type instruction.

The general format for an addi instruction is:

addi $rt, $rs, IMMED
For I-type instructions, $rt is the destination register (not $rs). $rs is still the first source register. For addi, the immediate value is written in base 10 (or possibly, hexadecimal), but it eventually gets translated to 2C.

Let's look at a specific example.

addi $r3, $r10, -3

This instruction adds the contents of register 10 to the value -3 and stores the result in register 3.

The opcode for addi is 001 000. In 16-bit 2C, you write -3 (base ten) as 1111 1111 1111 1101.
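The 2C pattern for -3 can be produced with the usual hand method: write the magnitude in 16 bits, flip every bit, then add 1. A Python sketch of that method (twos_complement_16 is a hypothetical name), cross-checked against simply keeping the low 16 bits of -3:

```python
def twos_complement_16(value: int) -> str:
    """16-bit 2C bit pattern of value, computed by invert-and-add-one."""
    if not -(1 << 15) <= value < (1 << 15):
        raise ValueError("value does not fit in 16-bit 2C")
    if value >= 0:
        return format(value, "016b")  # non-negative values are plain UB
    magnitude = format(-value, "016b")  # e.g. 3 -> 0000000000000011
    inverted = "".join("1" if b == "0" else "0" for b in magnitude)  # flip every bit
    return format(int(inverted, 2) + 1, "016b")  # then add one


print(twos_complement_16(-3))       # 1111111111111101
print(format(-3 & 0xFFFF, "016b"))  # 1111111111111101, same pattern via masking
```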

This is how the instruction is encoded.

Opcode    Register s  Register t  Immediate
B31..26   B25..21     B20..16     B15..0
          $r10        $r3         -3, represented in 2C
001 000   01010       00011       1111 1111 1111 1101

Again, notice that in the assembly code $r3 (i.e., the destination register) appears first, while in the machine code $r3 appears second. Also, notice that the immediate value is written in 16 bits, two's complement.

Now that you see why it's written in 16-bit 2C, you can see why the immediate value can only be between -2^15 and 2^15 - 1 (that is, -32768 through 32767). This is the range of valid values for 16-bit 2C.

The assembler must translate base 10 representation to 2C representation when translating addi from assembly to machine code.
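A sketch of that assembler step in Python, assuming immediates are written in decimal or with a 0x hex prefix (assemble_signed_imm16 is an illustrative name, not a real assembler API). It rejects values outside the 16-bit 2C range rather than guessing what to do with them:

```python
IMM16_MIN, IMM16_MAX = -(1 << 15), (1 << 15) - 1  # -32768 .. 32767


def assemble_signed_imm16(text: str) -> int:
    """Turn an immediate as written in assembly ("-3", "0x3a") into a 16-bit 2C field.

    Raises ValueError if the value is outside the 16-bit 2C range.
    """
    value = int(text, 0)  # base 0: accepts decimal or a 0x hex prefix
    if not IMM16_MIN <= value <= IMM16_MAX:
        raise ValueError(f"{text} is outside {IMM16_MIN}..{IMM16_MAX}")
    return value & 0xFFFF  # the low 16 bits are the 2C encoding


print(hex(assemble_signed_imm16("-3")))    # 0xfffd
print(hex(assemble_signed_imm16("0x3a")))  # 0x3a
```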

Some instructions encode the immediate in 2C, while other instructions encode it in UB.
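Which encoding is used matters when the CPU widens the 16-bit field to 32 bits for the operation. In standard MIPS32, addi sign-extends its immediate (treating it as 2C), while logical instructions such as andi and ori zero-extend theirs (treating it as UB). A Python sketch of both widenings, applied to the immediate field from the addi $r3, $r10, -3 example; the helper names are illustrative:

```python
def zero_extend_16(field: int) -> int:
    """Treat a 16-bit field as UB and widen it to 32 bits (upper 16 bits become 0)."""
    return field & 0xFFFF


def sign_extend_16(field: int) -> int:
    """Treat a 16-bit field as 2C and widen it to 32 bits by copying bit 15 upward."""
    field &= 0xFFFF
    return (field | 0xFFFF_0000) if (field & 0x8000) else field


field = 0b1111_1111_1111_1101         # immediate field of addi $r3, $r10, -3
print(hex(zero_extend_16(field)))  # 0xfffd, read as UB: 65533
print(hex(sign_extend_16(field)))  # 0xfffffffd, read as 32-bit 2C: -3
```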

Summary

This section on instructions is not trying to teach you how to program in MIPS assembly. Instead, it's to briefly introduce you to what an instruction is, and how it is encoded.

While it's useful to know how to program in MIPS assembly, it isn't essential for understanding how a CPU works. To understand how a CPU works, at least initially, all you need to know is what an instruction looks like in binary, and what that individual instruction is supposed to do.

posted on 2007-01-23 15:42 by Charles (category: reposts)
