Implementation Supplement Fujitsu SPARC64 V User Manual

SPARC JPS1

Implementation Supplement:

Fujitsu SPARC64 V

Fujitsu Limited

Release 1.0, 1 July 2002

Fujitsu Limited

4-1-1 Kamikodanaka

Nakahara-ku, Kawasaki, 211-8588

Japan

Part No. 806-6755-1.0

SPARC JPS1 Implementation Supplement: Fujitsu SPARC64 V • Release 1.0, 1 July 2002

Contents

1. Overview
  Navigating the SPARC64 V Implementation Supplement
  Fonts and Notational Conventions
  The SPARC64 V processor
  Component Overview
  Instruction Control Unit (IU)
  Execution Unit (EU)
  Storage Unit (SU)
  Secondary Cache and External Access Unit (SXU)
2. Definitions
3. Architectural Overview
4. Data Formats
5. Registers
  Nonprivileged Registers
  Floating-Point State Register (FSR)
  Tick (TICK) Register
  Privileged Registers
  Trap State (TSTATE) Register
  Version (VER) Register
  Ancillary State Registers (ASRs)
  Registers Referenced Through ASIs
  Floating-Point Deferred-Trap Queue (FQ)
  IU Deferred-Trap Queue
6. Instructions
  Instruction Execution
  Data Prefetch
  Instruction Prefetch
  Syncing Instructions
  Instruction Formats and Fields
  Instruction Categories
  Control-Transfer Instructions (CTIs)
  Floating-Point Operate (FPop) Instructions
  Implementation-Dependent Instructions
  Processor Pipeline
  Instruction Fetch Stages
  Issue Stages
  Execution Stages
  Completion Stages
7. Traps
  Processor States, Normal and Special Traps
  Uses of the Trap Categories
  Trap Control
  PIL Control
  Trap-Table Entry Addresses
  Trap Type (TT)
  Details of Supported Traps
  Trap Processing
  Exception and Interrupt Descriptions
  SPARC V9 Implementation-Dependent, Optional Traps That Are Mandatory in SPARC JPS1
  SPARC JPS1 Implementation-Dependent Traps
8. Memory Models
  Overview
  SPARC V9 Memory Model
  Mode Control
  Synchronizing Instruction and Data Memory
A. Instruction Definitions: SPARC64 V Extensions
  Block Load and Store Instructions (VIS I)
  Call and Link
  Implementation-Dependent Instructions
  Floating-Point Multiply-Add/Subtract
  Jump and Link
  Load Quadword, Atomic [Physical]
  Memory Barrier
  Partial Store (VIS I)
  Prefetch Data
  Read State Register
  SHUTDOWN (VIS I)
  Write State Register
  Deprecated Instructions
  Store Barrier
B. IEEE Std 754-1985 Requirements for SPARC V9
  Traps Inhibiting Results
  Floating-Point Nonstandard Mode
  fp_exception_other Exception (ftt=unfinished_FPop)
  Operation Under FSR.NS = 1
C. Implementation Dependencies
  Definition of an Implementation Dependency
  Hardware Characteristics
  Implementation Dependency Categories
  List of Implementation Dependencies
D. Formal Specification of the Memory Models
E. Opcode Maps
F. Memory Management Unit
  Virtual Address Translation
  Translation Table Entry (TTE)
  TSB Organization
  TSB Pointer Formation
  Faults and Traps
  Reset, Disable, and RED_state Behavior
  Internal Registers and ASI operations
  Accessing MMU Registers
  I/D TLB Data In, Data Access, and Tag Read Registers
  I/D TSB Extension Registers
  I/D Synchronous Fault Status Registers (I-SFSR, D-SFSR)
  MMU Bypass
  TLB Replacement Policy
G. Assembly Language Syntax
H. Software Considerations
I. Extending the SPARC V9 Architecture
J. Changes from SPARC V8 to SPARC V9
K. Programming with the Memory Models
L. Address Space Identifiers
  SPARC64 V ASI Assignments
  Special Memory Access ASIs
  Barrier Assist for Parallel Processing
  Interface Definition
  ASI Registers
M. Cache Organization
  Cache Types
  Level-1 Instruction Cache (L1I Cache)
  Level-1 Data Cache (L1D Cache)
  Level-2 Unified Cache (L2 Cache)
  Cache Coherency Protocols
  Cache Control/Status Instructions
  Flush Level-1 Instruction Cache (ASI_FLUSH_L1I)
  Level-2 Cache Control Register (ASI_L2_CTRL)
  L2 Diagnostics Tag Read (ASI_L2_DIAG_TAG_READ)
  L2 Diagnostics Tag Read Registers (ASI_L2_DIAG_TAG_READ_REG)
N. Interrupt Handling
  Interrupt Dispatch
  Interrupt Receive
  Interrupt Global Registers
  Interrupt-Related ASR Registers
  Interrupt Vector Dispatch Register
  Interrupt Vector Dispatch Status Register
  Interrupt Vector Receive Register
O. Reset, RED_state, and error_state
  Reset Types
  Power-on Reset (POR)
  Watchdog Reset (WDR)
  Externally Initiated Reset (XIR)
  Software-Initiated Reset (SIR)
  RED_state and error_state
  RED_state
  error_state
  CPU Fatal Error state
  Processor State after Reset and in RED_state
  Operating Status Register (OPSR)
  Hardware Power-On Reset Sequence
  Firmware Initialization Sequence
P. Error Handling
  Error Classification
  Fatal Error
  error_state Transition Error
  Urgent Error
  Restrainable Error
  Action and Error Control
  Registers Related to Error Handling
  Summary of Actions Upon Error Detection
  Extent of Automatic Source Data Correction for Correctable Error
  Error Marking for Cacheable Data Error
  ASI_EIDR
  Control of Error Action (ASI_ERROR_CONTROL)
  Fatal Error and error_state Transition Error
  ASI_STCHG_ERROR_INFO
  Fatal Error Types
  Types of error_state Transition Errors
  Urgent Error
  URGENT ERROR STATUS (ASI_UGESR)
  Action of async_data_error (ADE) Trap
  Instruction End-Method at ADE Trap
  Expected Software Handling of ADE Trap
  Instruction Access Errors
  Data Access Errors
  Restrainable Errors
  ASI_ASYNC_FAULT_STATUS (ASI_AFSR)
  ASI_ASYNC_FAULT_ADDR_D1
  ASI_ASYNC_FAULT_ADDR_U2
  Expected Software Handling of Restrainable Errors
  Handling of Internal Register Errors
  ASR Error Handling
  ASI Register Error Handling
  Cache Error Handling
  Handling of a Cache Tag Error
  Handling of an I1 Cache Data Error
  Handling of a D1 Cache Data Error
  Handling of a U2 Cache Data Error
  Automatic Way Reduction of I1 Cache, D1 Cache, and U2 Cache
  TLB Error Handling
  Handling of TLB Entry Errors
  Automatic Way Reduction of sTLB
  Handling of Extended UPA Bus Interface Error
  Handling of Extended UPA Address Bus Error
  Handling of Extended UPA Data Bus Error
Q. Performance Instrumentation
  Performance Monitor Overview
  Sample Pseudocodes
  Performance Monitor Description
  Instruction Statistics
  Trap-Related Statistics
  MMU Event Counters
  Cache Event Counters
  UPA Event Counters
  Miscellaneous Counters
R. UPA Programmer’s Model
  Mapping of the CPU’s UPA Port Slave Area
  UPA PortID Register
  UPA Config Register
S. Summary of Differences between SPARC64 V and UltraSPARC-III
Bibliography
  General References
Index

F.CHAPTER 1

Overview

1.1

Navigating the SPARC64 V Implementation Supplement

We suggest that you approach this Implementation Supplement to the SPARC Joint Programming Specification as follows.

1. Familiarize yourself with the SPARC64 V processor and its components by reading these sections:

■ The SPARC64 V processor on page 2
■ Component Overview on page 4
■ Processor Pipeline on page 31

2. Study the terminology in Chapter 2, Definitions.

3. For details of architectural changes, see the remaining chapters in this Implementation Supplement as your interests direct.

For this revision, we added new appendixes: Appendix R, UPA Programmer’s Model, and Appendix S, Summary of Differences between SPARC64 V and UltraSPARC-III.

1.2

Fonts and Notational Conventions

Please refer to Section 1.2 of Commonality for font and notational conventions.

1.3

The SPARC64 V processor

The SPARC64 V processor is a high-performance, high-reliability, and high-integrity

processor that fully implements the instruction set architecture that conforms to

SPARC V9, as described in JPS1 Commonality. In addition, the SPARC64 V processor

implements the following features:

■ 64-bit virtual address space and 43-bit physical address space

■ Advanced RAS features that enable high-integrity error handling

Microarchitecture for High Performance

The SPARC64 V is an out-of-order execution superscalar processor that issues up to

four instructions per cycle. Instructions in the predicted path are issued in program

order and are stored temporarily in reservation stations until they are dispatched out

of program order to appropriate execution units. Instructions commit in program

order when no exceptional conditions occur during execution and all prior

instructions commit (that is, the result of the instruction execution becomes visible).

Out-of-order execution in SPARC64 V contributes to high performance.
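The relationship between out-of-order completion and in-order commit described above can be illustrated with a toy model. This is only a sketch, not the SPARC64 V commit logic: the function name `commit_cycles` and the finish-cycle inputs are hypothetical, and the model ignores commit width and exceptional conditions.

```python
def commit_cycles(finish_cycles):
    """Return the earliest commit cycle for each instruction.

    finish_cycles lists, in program order, the cycle at which each
    instruction finishes execution (possibly out of order). An instruction
    may commit only once it has finished and every prior instruction has
    committed, so the commit cycle is a running maximum.
    """
    commits = []
    latest = 0
    for finish in finish_cycles:
        latest = max(latest, finish)
        commits.append(latest)
    return commits


# The second instruction is slow (for example, a load that misses the
# cache); the younger instructions finish earlier but must wait for it.
print(commit_cycles([3, 10, 4, 5]))
```

Younger instructions that finish early still occupy their resources until the older instruction commits, which is why the model reports the same commit cycle for all three.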

SPARC64 V implements a large branch history buffer to predict its instruction path.

The history buffer is large enough to sustain a good prediction rate for large-scale

programs such as DBMS and to support the advanced instruction fetch mechanism

of SPARC64 V. This instruction fetch scheme predicts the execution path beyond the

multiple conditional branches in accordance with the branch history. It then tries to

prefetch instructions on the predicted path as much as possible to reduce the effect

of the performance penalty caused by instruction cache misses.

High Integration

SPARC64 V integrates an on-chip, associative, level-2 cache. The level-2 cache is unified for instruction and data. It is the lowest layer in the cache hierarchy.

This integration contributes to both performance and reliability of SPARC64 V. It

enables shorter access time and more associativity and thus contributes to higher

performance. It contributes to higher reliability by eliminating the external

connections for level-2 cache.

High Reliability and High Integrity

SPARC64 V implements the following advanced RAS features for reliability and

integrity beyond that of ordinary microprocessors.

1. Advanced RAS features for caches

■ Strong cache error protection:
    ■ ECC protection for D1 (data level-1) cache data, U2 (unified level-2) cache data, and the U2 cache tag.
    ■ Parity protection for I1 (instruction level-1) cache data.
    ■ Parity protection and duplication for the I1 cache tag and the D1 cache tag.
■ Automatic correction of all types of single-bit error:
    ■ Automatic single-bit error correction for the ECC protected data.
    ■ Invalidation and refilling of I1 cache data for the I1 cache data parity error.
    ■ Copying from duplicated tag for I1 cache tag and D1 cache tag parity errors.
■ Dynamic way reduction while cache consistency is maintained.
■ Error marking for cacheable data uncorrectable errors:
    ■ Special error-marking pattern for cacheable data with uncorrectable errors. The identification of the module that first detects the error is embedded in the special pattern.
    ■ Error-source isolation with faulty module identification in the special error-marking. The identification information enables the processor to avoid repetitive error logging for the same error cause.

2. Advanced RAS features for the core

■ Strong error protection:
    ■ Parity protection for all data paths.
    ■ Parity protection for most of the software-visible registers and internal temporary registers.
    ■ Parity prediction or residue checking for the accumulator output.
■ Hardware instruction retry
■ Support for software instruction retry (after failure of hardware instruction retry)
■ Error isolation for software recovery:
    ■ Error indication for each programmable register group.
    ■ Indication of retryability of the trapped instruction.
    ■ Use of different error traps to differentiate degrees of adverse effects on the CPU and the system.

3. Extended RAS interface to software

■ Error classification according to the severity of the effect on program execution:
    ■ Urgent error (nonmaskable): Unable to continue execution without OS intervention; reported through a trap.
    ■ Restrainable error (maskable): OS controls whether the error is reported through a trap, so the error does not directly affect program execution.
■ Isolated error indication to determine the effect on software
■ Asynchronous data error (ADE) trap for additional errors:
    ■ Relaxed instruction end method (precise, retryable, not retryable) for the async_data_error exception to indicate how the instruction should end; depends on the executing instruction and the detected error.
    ■ Some ADE traps that are deferred but retryable.
    ■ Simultaneous reporting of all detected ADE errors at the error barrier for correct handling of retryability.

1.3.1

Component Overview

The SPARC64 V processor contains these components.

■ Instruction control Unit (IU)

■ Execution Unit (EU)

■ Storage Unit (SU)

■ Secondary cache and eXternal access Unit (SXU)

FIGURE 1-1 illustrates the major units; the following subsections describe them.

[Figure: block diagram of the I-Unit, E-Unit, S-Unit, and SX-Unit and their connection to the Extended UPA Bus. Labels in the diagram include the instruction fetch pipeline, branch history, instruction buffer, reservation stations, and commit stack entries; the EXA, EXB, FLA, FLB, EAGA, and EAGB ALUs with the GUB, GPR, FUB, and FPR register files; the level-1 I and D caches (128 KB, 2-way each), the I-TLB and D-TLB (2048 + 32 entries each), and the store queue; and the U2$ tag and data (2M, 4-way), MoveIn and MoveOut buffers, and UPA interface logic.]

FIGURE 1-1 SPARC64 V Major Units

1.3.2

Instruction Control Unit (IU)

The IU predicts the instruction execution path, fetches instructions on the predicted

path, distributes the fetched instructions to appropriate reservation stations, and

dispatches the instructions to the execution pipeline. The instructions are executed

out of order, and the IU commits the instructions in order. Major blocks are defined

in TABLE 1-1.

TABLE 1-1 Instruction Control Unit Major Blocks

Name                        Description
Instruction fetch pipeline  Five stages: fetch address generation, iTLB access, iTLB match,
                            I-Cache fetch, and a write to I-buffer.
Branch history              16K entries, 4-way set associative.
Instruction buffer          Six entries, 32 bytes/entry.
Reservation station         Six reservation stations to hold instructions until they can
                            execute: RSBR for branch and the other control-transfer
                            instructions; RSA for load/store instructions; RSEA and RSEB for
                            integer arithmetic instructions; RSFA and RSFB for floating-point
                            arithmetic and VIS instructions.
Commit stack entries        Sixty-four entries; basically one instruction/entry, to hold
                            information about instructions issued but not yet committed.
PC, nPC, CCR, FSR           Program-visible registers for instruction execution control.
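As a reading aid, the assignment of instruction classes to reservation stations listed in TABLE 1-1 can be written as a small lookup table. The station names come from the table; the class keys (`"branch"`, `"load_store"`, `"integer"`, `"fp_vis"`) and the helper `stations_for` are hypothetical names chosen for this sketch.

```python
# Reservation stations per instruction class, as listed in TABLE 1-1.
# The dictionary keys are illustrative labels, not architectural names.
RESERVATION_STATIONS = {
    "branch": ("RSBR",),          # branch and other control-transfer instructions
    "load_store": ("RSA",),       # load/store instructions
    "integer": ("RSEA", "RSEB"),  # integer arithmetic instructions
    "fp_vis": ("RSFA", "RSFB"),   # floating-point arithmetic and VIS instructions
}


def stations_for(instruction_class):
    """Return the reservation stations that can hold the given class."""
    return RESERVATION_STATIONS[instruction_class]


total = sum(len(stations) for stations in RESERVATION_STATIONS.values())
print(stations_for("integer"), total)
```

The total of six matches the count given in the table.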

1.3.3

Execution Unit (EU)

The EU carries out execution of all integer arithmetic, logical, shift instructions, all

floating-point instructions, and all VIS graphic instructions. TABLE 1-2 describes the

EU major blocks.

TABLE 1-2 Execution Unit Major Blocks

Name                                  Description
General register (gr) renaming        Thirty-two entries, 8 read ports, 2 write ports.
buffer
Gr architecture register file (GPR)   160 entries, 1 read port, 2 write ports.
Floating-point (fr) renaming          Thirty-two entries, 8 read ports, 2 write ports.
buffer
Fr architecture register file (FPR)   Thirty-two entries, 6 read ports, 2 write ports.
EU control logic                      Controls the instruction execution stages: instruction
                                      selection, register read, and execution.
Interface registers                   Input/output registers to other units.
Two integer execution pipelines       64-bit ALU and shifters.
(EXA, EXB)
Two floating-point and graphics       Each floating-point execution pipeline can execute
execution pipelines (FLA, FLB)        floating-point multiply, floating-point add/sub,
                                      floating-point multiply and add, floating-point div/sqrt,
                                      and floating-point graphics instructions.
Two virtual address adders for        Two 64-bit virtual addresses for load/store.
memory access pipeline (EAGA,
EAGB)

1.3.4

Storage Unit (SU)

The SU handles all sourcing and sinking of data for load and store instructions.

TABLE 1-3 describes the SU major blocks.

TABLE 1-3 Storage Unit Major Blocks

Name                            Description
Instruction level-1 cache       128-Kbyte, 2-way associative, 64-byte line; provides low
                                latency instruction source.
Data level-1 cache              128-Kbyte, 2-way associative, 64-byte line, writeback;
                                provides the low latency data source for loads and stores.
Instruction Translation Buffer  1024 entries, 2-way associative TLB for 8-Kbyte pages;
                                1024 entries, 2-way associative TLB for 4-Mbyte pages¹;
                                32 entries, fully associative TLB for unlocked 64-Kbyte,
                                512-Kbyte, 4-Mbyte¹ pages and locked pages in all sizes.
Data Translation Buffer         1024 entries, 2-way associative TLB for 8-Kbyte pages;
                                1024 entries, 2-way associative TLB for 4-Mbyte pages¹;
                                32 entries, fully associative TLB for unlocked 64-Kbyte,
                                512-Kbyte, 4-Mbyte¹ pages and locked pages in all sizes.
Store queue                     Decouples the pipeline from the latency of store operations.
                                Allows the pipeline to continue flowing while the store
                                waits for data, and eventually writes into the data level-1
                                cache.

1. An unlocked 4-Mbyte page entry is stored either in the 2-way associative TLB or in the fully associative TLB exclusively, depending on the setting.

1.3.5

Secondary Cache and External Access Unit (SXU)

The SXU controls the operation of unified level-2 caches and the external data access

interface (extended UPA interface). TABLE 1-4 describes the major blocks of the SXU.

TABLE 1-4 Secondary Cache and External Access Unit Major Blocks

Name                        Description
Unified level-2 cache       2-Mbyte, 4-way associative, 64-byte line, writeback; provides
                            low latency data source for both instruction level-1 cache and
                            data level-1 cache.
Movein buffer               Sixteen entries, 64 bytes/entry; catches returning data from
                            the memory system in response to the cache line read request.
                            A maximum of 16 outstanding cache read operations can be issued.
Moveout buffer              Eight entries, 64 bytes/entry; holds writeback data. A maximum
                            of 8 outstanding writeback requests can be issued.
Extended UPA interface      Send/receive transaction packets to/from Extended UPA
control logic               interface connected to the system.
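The cache sizes, associativities, and line sizes quoted in the Storage Unit and SXU descriptions imply a number of sets per cache. The following sketch works through that arithmetic; it assumes a conventional set-associative organization (sets = size / line size / ways), and the function name `cache_sets` is hypothetical. How the hardware actually forms the index is not stated here.

```python
KB = 1024
MB = 1024 * KB


def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache of the given geometry."""
    lines = size_bytes // line_bytes  # total cache lines
    return lines // ways              # lines are divided evenly among ways


# Level-1 instruction and data caches: 128 Kbyte, 2-way, 64-byte line.
l1_sets = cache_sets(128 * KB, ways=2, line_bytes=64)

# Unified level-2 cache: 2 Mbyte, 4-way, 64-byte line.
l2_sets = cache_sets(2 * MB, ways=4, line_bytes=64)

print(l1_sets, l2_sets)
```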

F.CHAPTER 2

Definitions

This chapter defines concepts unique to the SPARC64 V, the Fujitsu implementation

of SPARC JPS1. For definition of terms that are common to all implementations,

please refer to Chapter 2 of Commonality.

committed: Term applied to an instruction when it has completed without error and all prior instructions have completed without error and have been committed. When an instruction is committed, the state of the machine is permanently changed to reflect the result of the instruction; the previously existing state is no longer needed and can be discarded.

completed: Term applied to an instruction after it has finished, has sent a nonerror status to the issue unit, and all of its source operands are nonspeculative. Note: Although the state of the machine has been temporarily altered by completion of an instruction, the state has not yet been permanently changed and the old state can be recovered until the instruction has been committed.

executed: Term applied to an instruction that has been processed by an execution unit such as a load unit. An instruction is in execution as long as it is still being processed by an execution unit.

fetched: Term applied to an instruction that is obtained from the I2 instruction cache or from the on-chip internal cache and sent to the issue unit.

finished: Term applied to an instruction when it has completed execution in a functional unit and has forwarded its result onto a result bus. Results on the result bus are transferred to the register file, as are the waiting instructions in the instruction queues.

initiated: Term applied to an instruction when it has all of the resources that it needs (for example, source operands) and has been selected for execution.

instruction dispatch: Synonym for instruction initiation.

instruction issued: Term applied to an instruction when it has been dispatched to a reservation station.

instruction retired: Term applied to an instruction when all machine resources (serial numbers, renamed registers) have been reclaimed and are available for use by other instructions. An instruction can only be retired after it has been committed.

instruction stall: Term applied to an instruction that is not allowed to be issued. Not every instruction can be issued in a given cycle. The SPARC64 V implementation imposes certain issue constraints based on resource availability and program requirements.

issue-stalling instruction: An instruction that prevents new instructions from being issued until it has committed.

machine sync: The state of a machine when all previously executing instructions have committed; that is, when no issued but uncommitted instructions are in the machine.

Memory Management Unit (MMU): Refers to the address translation hardware in SPARC64 V that translates a 64-bit virtual address into a physical address. The MMU is composed of the mITLB, mDTLB, uITLB, uDTLB, and the ASI registers used to manage address translation.

mTLB: Main TLB. Split into I and D, called mITLB and mDTLB, respectively. Contains address translations for the uITLB and uDTLB. When the uITLB or uDTLB does not contain a translation, it asks the mTLB for the translation. If the mTLB contains the translation, it sends the translation to the respective uTLB. If the mTLB does not contain the translation, it generates a fast access exception to a software translation trap handler, which will load the translation information (TTE) into the mTLB and retry the access. See also TLB.

uDTLB: Micro Data TLB. A small, fully associative buffer that contains address translations for data accesses. Misses in the uDTLB are handled by the mTLB.

uITLB: Micro Instruction TLB. A small, fully associative buffer that contains address translations for instruction accesses. Misses in the uITLB are handled by the mTLB.

nonspeculative: A distribution system whereby a result is guaranteed known correct or an operand state is known to be valid. SPARC64 V employs speculative distribution, meaning that results can be distributed from functional units before the point at which guaranteed validity of the result is known.

reclaimed: The status when all instruction-related resources that were held until commit have been released and are available for subsequent instructions. Instruction resources are usually reclaimed a few cycles after they are committed.

rename registers: A large set of hardware registers implemented by SPARC64 V that are invisible to the programmer. Before instructions are issued, source and destination registers are mapped onto this set of rename registers. This allows instructions that normally would be blocked, waiting for an architected register, to proceed in parallel. When instructions are committed, results in renamed registers are posted to the architected registers in the proper sequence to produce the correct program results.

scan: A method used to initialize all of the machine state within a chip. In a chip that has been designed to be scannable, all of the machine state is connected in one or several loops called “scan rings.” Initialization data can be scanned into the chip through the scan rings. The state of the machine also can be scanned out through the scan rings.

reservation station: A holding location that buffers dispatched instructions until all input operands are available. SPARC64 V implements dataflow execution based on operand availability. When operands are available, the instructions in the reservation station are scheduled for execution. Reservation stations also contain special tag-matching logic that captures the appropriate operand data. Reservation stations are sometimes referred to as queues (for example, the integer queue).

speculative: A distribution system whereby a result is not guaranteed as known to be correct or an operand state is not known to be valid. SPARC64 V employs speculative distribution, meaning results can be distributed from functional units before the point at which guaranteed validity of the result is known.

superscalar: An implementation that allows several instructions to be issued, executed, and committed in one clock cycle. SPARC64 V issues up to 4 instructions per clock cycle.

sync: Synonym for machine sync.

syncing instruction: An instruction that causes a machine sync. Thus, before a syncing instruction is issued, all previous instructions (in program order) must have been committed. At that point, the syncing instruction is issued, executed, completed, and committed by itself.

TLB: Translation lookaside buffer.
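The instruction-state terms defined in this chapter can be read as an ordered lifecycle. The sketch below encodes one plausible ordering and the machine sync condition; the enum `Stage`, its numeric values, and the function `is_machine_sync` are assumptions made for illustration, and the ordering is an interpretation of the definitions rather than a statement of the hardware state machine.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Instruction states from the definitions, in an assumed order."""
    FETCHED = 1
    ISSUED = 2      # dispatched to a reservation station
    INITIATED = 3   # selected for execution (synonym: dispatched)
    EXECUTED = 4
    FINISHED = 5    # result forwarded onto a result bus
    COMPLETED = 6   # nonerror status sent, operands nonspeculative
    COMMITTED = 7   # machine state permanently changed
    RETIRED = 8     # resources reclaimed


def is_machine_sync(in_flight):
    """True when no issued but uncommitted instruction is in the machine."""
    return all(
        stage >= Stage.COMMITTED
        for stage in in_flight
        if stage >= Stage.ISSUED
    )


print(is_machine_sync([Stage.RETIRED, Stage.COMMITTED, Stage.FETCHED]))
print(is_machine_sync([Stage.COMMITTED, Stage.COMPLETED]))
```

A completed instruction still counts as uncommitted, so the second call reports that the machine is not in sync.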

F.CHAPTER 3

Architectural Overview

Please refer to Chapter 3 in the Commonality section of SPARC Joint Programming

Specification.

F.CHAPTER 4

Data Formats

Please refer to Chapter 4, Data Formats in Commonality.

F.CHAPTER 5

Registers

The SPARC64 V processor includes two types of registers: general-purpose registers (that is, working, data, and control/status registers) and ASI registers.

The SPARC V9 architecture also defines two implementation-dependent registers: the IU Deferred-Trap Queue and the Floating-Point Deferred-Trap Queue (FQ); SPARC64 V does not need or contain either queue. All processor traps caused by instruction execution are precise, and there are several disrupting traps caused by asynchronous events, such as interrupts, asynchronous error conditions, and RED_state entry traps.

For general information, please see parallel subsections of Chapter 5 in

Commonality. For easier referencing, this chapter follows the organization of

Chapter 5 in Commonality.

For information on MMU registers, please refer to Section F.10, Internal Registers and

ASI operations, on page 92.

The chapter contains these sections:

■ Nonprivileged Registers on page 17

■ Privileged Registers on page 19

5.1

Nonprivileged Registers

Most of the definitions for the registers are as described in the corresponding

sections of Commonality. Only SPARC64 V-specific features are described in this

section.

5.1.7

Floating-Point State Register (FSR)

Please refer to Section 5.1.7 of Commonality for the description of FSR.

The sections below describe SPARC64 V-specific features of the FSR register.

FSR_nonstandard_fp (NS)

SPARC V9 defines the FSR.NS bit which, when set to 1, causes the FPU to produce implementation-dependent results that may not conform to IEEE Std 754-1985. SPARC64 V implements this bit.

When FSR.NS = 1, denormal input operands and denormal results that would otherwise trap are flushed to 0 of the same sign and an inexact exception is signalled (that may be masked by FSR.TEM.NXM). See Section B.6, Floating-Point Nonstandard Mode, on page 61 for details.

When FSR.NS = 0, the normal IEEE Std 754-1985 behavior is implemented.
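The value substitution performed under FSR.NS = 1 (a denormal replaced by a zero of the same sign) can be sketched for double-precision values as follows. This is an illustration only: the function name `flush_denormal` is hypothetical, and the sketch models neither the inexact exception nor the conditions that decide which denormals "would otherwise trap", which are covered in Section B.6.

```python
import math
import struct


def flush_denormal(x):
    """Return x, or a same-signed zero if x is a denormal double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    exponent = (bits >> 52) & 0x7FF          # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)        # 52-bit fraction
    if exponent == 0 and fraction != 0:      # denormal encoding
        return math.copysign(0.0, x)
    return x


smallest = 5e-324  # smallest positive denormal double
print(flush_denormal(smallest), flush_denormal(-smallest), flush_denormal(1.5))
```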

FSR_version (ver)

For each SPARC V9 IU implementation (as identified by its VER.impl field), there may be one or more FPU implementations or none. This field identifies the particular FPU implementation present. For the first SPARC64 V, FSR.ver = 0 (impl. dep. #19); however, future versions of the architecture may set FSR.ver to other values. Consult the SPARC64 V Data Sheet for the setting of FSR.ver for your chipset.

FSR_floating-point_trap_type (ftt)

The complete conditions under which SPARC64 V triggers fp_exception_other with trap type unfinished_FPop are described in Section B.6, Floating-Point Nonstandard Mode, on page 61 (impl. dep. #248).

FSR_current_exception (cexc)

Bits 4 through 0 indicate that one or more IEEE_754 floating-point exceptions were

generated by the most recently executed FPop instruction. The absence of an

exception causes the corresponding bit to be cleared.

In SPARC64 V, the cexc bits are set according to the following pseudocode:

    if (<LDFSR or LDXFSR commits>)
        <update using data from LDFSR or LDXFSR>;
    else if (<FPop commits with ftt = 0>)
        <update using value from FPU>;
    else if (<FPop commits with IEEE_754_exception>)
        <set one bit in the CEXC field as supplied by FPU>;
    else if (<FPop commits with unfinished_FPop error>)
        <no change>;
    else if (<FPop commits with unimplemented_FPop error>)
        <no change>;
    else
        <no change>;
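The cexc update rule can be modeled as a small function. This is a sketch under stated assumptions: the event labels (`"LDFSR"`, `"FPOP_OK"`, and so on) and the function `next_cexc` are hypothetical, and the ftt = 0 case is taken to update cexc from the FPU-supplied value, which is an assumed reading of the pseudocode.

```python
CEXC_MASK = 0x1F  # cexc occupies FSR bits 4 through 0


def next_cexc(cexc, event, value=0):
    """Return the new cexc field after the given commit event.

    value is the data loaded by LDFSR/LDXFSR or the bits supplied by the FPU.
    """
    if event in ("LDFSR", "LDXFSR"):
        return value & CEXC_MASK
    if event == "FPOP_OK":
        # ftt = 0: take the FPU-supplied value (assumed reading).
        return value & CEXC_MASK
    if event == "FPOP_IEEE_TRAP":
        bits = value & CEXC_MASK
        if bin(bits).count("1") != 1:
            raise ValueError("exactly one cexc bit is expected on an IEEE trap")
        return bits
    # unfinished_FPop, unimplemented_FPop, and all other cases: no change.
    return cexc


print(next_cexc(0b00011, "FPOP_UNFINISHED"))
print(next_cexc(0b00001, "FPOP_IEEE_TRAP", 0b01000))
```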

FSR Conformance

SPARC V9 allows the TEM, cexc, and aexc fields to be implemented in hardware in either of two ways (both of which comply with IEEE Std 754-1985). SPARC64 V follows case (1); that is, it implements all three fields in conformance with IEEE Std 754-1985. See FSR Conformance in Section 5.1.7 of Commonality for more information about other implementation methods.

5.1.9

Tick (TICK) Register

SPARC64 V implements TICK.counter as a 63-bit register (impl. dep. #105).

Implementation Note – On SPARC64 V, the counter part of the value returned when the TICK register is read is the value of TICK.counter when the RDTICK instruction is executed. The difference between the counter values read from the TICK register on two reads reflects the number of processor cycles executed between the executions of the RDTICK instructions, not between their commits. In longer code sequences, the difference between this value and the value that would be obtained at commit time is small.
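Because the counter is 63 bits wide, software that subtracts two TICK readings should mask off the top bit of the 64-bit register (the NPT bit in SPARC V9) and allow for wraparound. A minimal sketch, with hypothetical function names and plain integers standing in for the values returned by RDTICK:

```python
COUNTER_BITS = 63
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def tick_counter(tick):
    """Extract the 63-bit counter field from a 64-bit TICK value."""
    return tick & COUNTER_MASK


def elapsed_cycles(start_tick, end_tick):
    """Cycles between two TICK reads, tolerating one counter wraparound."""
    return (tick_counter(end_tick) - tick_counter(start_tick)) & COUNTER_MASK


npt = 1 << 63  # top bit set in both readings; it does not affect the result
print(elapsed_cycles(npt | 100, npt | 250))
print(elapsed_cycles(COUNTER_MASK, 4))
```

As the implementation note explains, the result counts cycles between the executions of the two RDTICK instructions, so very short measured sequences carry some skew relative to commit time.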

5.2

Privileged Registers

Please refer to Section 5.2 of Commonality for the description of privileged registers.

5.2.6

Trap State (TSTATE) Register

SPARC64 V implements only bits 2:0 of the TSTATE.CWP field. Writes to bits 4 and 3 are ignored, and reads of these bits always return zeroes.
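The effect of implementing only bits 2:0 of TSTATE.CWP can be sketched as a masked write. The function names are hypothetical, and the model treats TSTATE as a plain integer with CWP in its low five bits, as in the SPARC V9 register layout.

```python
CWP_FIELD_MASK = 0x1F        # architectural CWP field, TSTATE bits 4:0
CWP_IMPLEMENTED_MASK = 0x07  # bits 2:0, the only ones SPARC64 V implements


def write_tstate_cwp(tstate, cwp):
    """Return TSTATE after writing cwp; bits 4 and 3 of the field are ignored."""
    return (tstate & ~CWP_FIELD_MASK) | (cwp & CWP_IMPLEMENTED_MASK)


def read_tstate_cwp(tstate):
    """Read the CWP field; bits 4 and 3 always read as zero in this model."""
    return tstate & CWP_FIELD_MASK


updated = write_tstate_cwp(0xFF00, 0x1A)  # 0x1A = 0b11010
print(hex(updated), read_tstate_cwp(updated))
```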

Note – Privileged software should not set the PSTATE.RED bit spuriously, because doing so takes the SPARC64 V into RED_state without the required sequencing.

5.2.9

Version (VER) Register

TABLE 5-1 shows the values for the VER register for SPARC64 V.

TABLE 5-1 VER Register Encodings

Bits    Field    Value
63:48   manuf    0004₁₆ (impl. dep. #104)
47:32   impl     5 (impl. dep. #13)
31:24   mask     n (The value of n depends on the processor chip version)
15:8    maxtl
4:0     maxwin

The manuf field contains Fujitsu’s 8-bit JEDEC code in the lower 8 bits and zeroes in the upper 8 bits. The manuf, impl, and mask fields are implemented so that they may change in future SPARC64 V processor versions. The mask field is incremented by 1 any time a programmer-visible revision is made to the processor. See the SPARC64 V Data Sheet to determine the current setting of the mask field.
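The field positions in TABLE 5-1 can be applied to a raw 64-bit VER value as in the sketch below. The function name and the sample mask, maxtl, and maxwin values in the usage note are hypothetical; only the bit ranges and the manuf and impl values come from the table.

```python
# Bit ranges (high, low) taken from TABLE 5-1.
VER_FIELDS = {
    "manuf": (63, 48),
    "impl": (47, 32),
    "mask": (31, 24),
    "maxtl": (15, 8),
    "maxwin": (4, 0),
}

def decode_ver(ver: int) -> dict:
    """Split a raw 64-bit VER value into its named fields."""
    decoded = {}
    for name, (high, low) in VER_FIELDS.items():
        width = high - low + 1
        decoded[name] = (ver >> low) & ((1 << width) - 1)
    return decoded
```

For example, `decode_ver((0x0004 << 48) | (5 << 32) | (0x12 << 24) | (3 << 8) | 6)` yields manuf 4 and impl 5; the remaining three inputs are made-up values used only to exercise the decoder.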

5.2.11 Ancillary State Registers (ASRs)

Please refer to Section 5.2.11 of Commonality for details of the ASRs.

Performance Control Register (PCR) (ASR 16)

SPARC64 V implements the PCR register as described in SPARC JPS1 Commonality, with additional features as described in this section.

In SPARC64 V, the accessibility of PCR when PSTATE.PRIV = 0 is determined by PCR.PRIV. If PSTATE.PRIV = 0 and PCR.PRIV = 1, an attempt to execute either RDPCR or WRPCR will cause a privileged_action exception. If PSTATE.PRIV = 0 and PCR.PRIV = 0, RDPCR operates without privilege violation and WRPCR causes a privileged_action exception only when an attempt is made to change (that is, write 1 to) PCR.PRIV (impl. dep. #250).
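The access rule above can be summarized as a small decision function. This is a sketch only: the function name, argument names, and return strings are illustrative, and the privileged case (PSTATE.PRIV = 1) is assumed to succeed, since the paragraph covers only nonprivileged access.

```python
def pcr_access(op: str, pstate_priv: int, pcr_priv: int, new_pcr_priv: int = 0) -> str:
    """Return "ok" or "privileged_action" for an RDPCR or WRPCR attempt.

    op           -- "RDPCR" or "WRPCR"
    new_pcr_priv -- value being written to PCR.PRIV (used for WRPCR only)
    """
    if pstate_priv == 1:
        return "ok"                   # assumption: privileged access is unrestricted
    if pcr_priv == 1:
        return "privileged_action"    # PCR is inaccessible to nonprivileged code
    if op == "WRPCR" and new_pcr_priv == 1:
        return "privileged_action"    # nonprivileged code may not set PCR.PRIV
    return "ok"
```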

See Appendix Q, Performance Instrumentation, for a detailed discussion of the PCR and PIC register usage and event count definitions.


The Performance Control Register in SPARC64 V is illustrated in FIGURE 5-1 and

described in TABLE 5-2.

FIGURE 5-1 SPARC64 V Performance Control Register (PCR) (ASR 16)

Fields shown: OVF, OVRO, ULRO, UT, ST, PRIV. Bit boundaries shown: 48, 47, 32, 31, 27, 26, 25, 24, 22, 21, 18, 17, 16, 11, 10.

TABLE 5-2 PCR Bit Description

Bit      Field    Description

47:32    OVF      Overflow Clear/Set/Status. Used to read counter overflow status (via RDPCR) and to clear or set counter overflow status bits (via WRPCR). PCR.OVF is a SPARC64 V-specific field (impl. dep. #207). The bit layout of the SPARC64 V OVF field for four counter pairs is shown as: U3 L3 U2 L2 U1 L1 U0 L0. Counter status bits are cleared on write of 0 to the appropriate OVF bit.

         OVRO     Overflow read-only. Write-only/read-as-zero field specifying PCR.OVF update behavior for WRPCR.PCR. The OVRO field is implementation dependent (impl. dep. #207). WRPCR.PCR with PCR.OVRO = 1 inhibits updating of PCR.OVF for the current write only. The intention of PCR.OVRO is to allow PCR to be written while preserving the current PCR.OVF value. PCR.OVF is maintained internally by hardware, so a subsequent RDPCR.PCR returns the accurate overflow status at that time.

24:22    NC       Number of counter pairs. Three-bit, read-only field specifying the number of counter pairs, encoded as 0–7 for 1–8 counter pairs (impl. dep. #207). For SPARC64 V, the hardcoded value of NC is 3 (indicating presence of 4 counter pairs).

20:18    SC       Select PIC. In SPARC64 V, a three-bit field specifying which counter pair is currently selected as PIC (ASR 17) and which SU/SL values are visible to software. On write, PCR.SC selects which counter pair is updated (unless PCR.ULRO is set; see below). On read, PCR.SC selects which counter pair is to be read through PIC (ASR 17).

16:11    SU       Defined (as S1) in SPARC JPS1 Commonality.

9:4      SL       Defined (as S0) in SPARC JPS1 Commonality.

         ULRO     Implementation-dependent field (impl. dep. #207) that specifies whether SU/SL are read-only. In SPARC64 V, this field is write-only/read-as-zero, specifying update behavior of SU/SL on write. When PCR.ULRO = 1, SU/SL are considered read-only; the values set on PCR.SU/PCR.SL are not written into SU/SL. When PCR.ULRO = 0, SU/SL are updated. PCR.ULRO is intended to switch the visible PIC by writing PCR.SC without affecting the current selection of SU/SL of that PIC. On PCR read, PCR.SU/PCR.SL always show the current setting of the PIC regardless of PCR.ULRO.

         UT, ST   Defined in SPARC JPS1 Commonality.

         PRIV     Defined in SPARC JPS1 Commonality, with the additional function of controlling PCR accessibility as described above (impl. dep. #250).

Performance Instrumentation Counter (PIC) Register (ASR 17)

The PIC register is implemented as described in SPARC JPS1 Commonality.

Four PICs are implemented in SPARC64 V. Each is accessed through ASR 17, using PCR.SC as a select field. Read/write access to the PIC will access the PICU/PICL counter pair selected by PCR.SC. For PICU/PICL encodings of specific event counters, see Appendix Q, Performance Instrumentation.

Counter Overflow. On overflow, counters wrap to 0, SOFTINT register bit 15 is set, and an interrupt level-15 exception is generated. The counter overflow trap is triggered on the transition from value FFFFFFFF₁₆ to value 0. If multiple overflows are generated simultaneously, then multiple overflow status bits will be set. If overflow status bits are already set, then they remain set on counter overflow. Overflow status bits are cleared by software writing 0 to the appropriate bit of PCR.OVF and may be set by writing 1 to the appropriate bit. Setting these bits by software does not generate a level 15 interrupt.
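The overflow behavior described above can be modeled roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the assignment of a particular OVF bit index to a particular counter is left to the caller, and the OVF field is modeled as eight status bits (one per counter of the four pairs).

```python
COUNTER_MAX = 0xFFFFFFFF  # a counter overflows on the transition from this value to 0

def count_event(counter: int, ovf: int, ovf_bit: int):
    """Advance one 32-bit counter by one event.

    Returns (new_counter, new_ovf, interrupt_raised); interrupt_raised stands for
    SOFTINT bit 15 being set and the level-15 interrupt being generated.
    """
    if counter == COUNTER_MAX:
        return 0, ovf | (1 << ovf_bit), True  # wrap to 0, set (or keep) the status bit
    return counter + 1, ovf, False

def write_ovf(written: int, num_bits: int = 8):
    """Software write to PCR.OVF: 0 clears a bit, 1 sets it, and no interrupt results."""
    return written & ((1 << num_bits) - 1), False
```

A status bit that is already set stays set on a further overflow, which the bitwise OR in `count_event` reflects.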

Dispatch Control Register (DCR) (ASR 18)

The DCR is not implemented in SPARC64 V. Zero is returned on read, and writes to the register are ignored. The DCR is a privileged register; attempted access by nonprivileged (user) code generates a privileged_opcode exception.

5.2.12 Registers Referenced Through ASIs

Data Cache Unit Control Register (DCUCR)

ASI 45₁₆ (ASI_DCU_CONTROL_REGISTER), VA = 0₁₆

The Data Cache Unit Control Register contains fields that control several memory-related hardware functions. The functions include instruction, prefetch, write, and data caches, MMUs, and watchpoint setting. SPARC64 V implements most of DCUCR’s functions described in Section 5.2.12 of Commonality.


After a power-on reset (POR), all fields of DCUCR, including implementation-dependent fields, are set to 0. After a WDR, XIR, or SIR reset, all fields of DCUCR, including implementation-dependent fields, are set to 0.

The Data Cache Unit Control Register is illustrated in FIGURE 5-2 and described in

TABLE 5-3. In the table, bits are grouped by function rather than by strict bit sequence.

FIGURE 5-2 DCU Control Register Access Data Format (ASI 45₁₆)

Fields shown: implementation dependent, WEAK_SPCA, PM, VM, PR, PW, VR, VW, DM, IM. Bit boundaries shown: 49, 48, 47, 42, 41, 40, 33, 32, 25, 24, 23, 22, 21, 20.

TABLE 5-3 DCUCR Description

Bits     Field        Description

49:48    CP, CV       Not implemented in SPARC64 V (impl. dep. #232). It reads as 0 and writes to it are ignored.

47:42    impl. dep.   Not used. It reads as 0 and writes to it are ignored.

41       WEAK_SPCA    Used for disabling speculative memory access (impl. dep. #240). When DCUCR.WEAK_SPCA = 1, the branch history table is cleared and the processor no longer issues aggressive instruction prefetches.

                      While DCUCR.WEAK_SPCA = 1, aggressive instruction prefetching is disabled and all load and store instructions are treated as presync instructions, which are executed only after all previous instructions have committed. Because all CTIs are considered as not taken, instructions residing beyond 1 Kbyte of a CTI may be fetched and executed.

                      On entering aggressive instruction prefetch disable mode, supervisor software should issue membar #Sync to make sure all in-flight instructions in the pipeline are discarded.

                      While DCUCR.WEAK_SPCA = 1, an L2 cache flush requested by writing 1 to ASI_L2_CTRL.U2_FLUSH remains pending internally until DCUCR.WEAK_SPCA is set to 0. To wait for completion of the cache flush, a membar #Sync must be issued after DCUCR.WEAK_SPCA is set to 0. Executing a membar #Sync while DCUCR.WEAK_SPCA = 1 after writing 1 to ASI_L2_CTRL.U2_FLUSH does not wait for the cache flush to complete.

40:33    PM<7:0>      Defined in SPARC JPS1 Commonality.

32:25    VM<7:0>      Defined in SPARC JPS1 Commonality.

24, 23   PR, PW       Defined in SPARC JPS1 Commonality.

22, 21   VR, VW       Defined in SPARC JPS1 Commonality.

20:4     —            Reserved.

         DM, IM       Defined in SPARC JPS1 Commonality.

                      Not implemented in SPARC64 V (impl. dep. #252). It reads as 0 and writes to it are ignored.

                      Not implemented in SPARC64 V (impl. dep. #253). It reads as 0 and writes to it are ignored.

Data Watchpoint Registers

No implementation-dependent feature of SPARC64 V reduces the reliability of data

watchpoints (impl. dep. #244).

SPARC64 V employs a conservative check of PA/VA watchpoints on partial store instructions. See Section A.42, Partial Store (VIS I), on page 57 for details.

Instruction Trap Register

SPARC64 V implements the Instruction Trap Register (impl. dep. #205).

In SPARC64 V, the least significant 11 bits (bits 10:0) of a CALL or branch (BPcc, FBPfcc, Bicc, BPr) instruction in an instruction cache are identical to their architectural encoding (as it appears in main memory) (impl. dep. #245).

5.2.13 Floating-Point Deferred-Trap Queue (FQ)

SPARC64 V does not contain a Floating-Point Deferred-trap Queue (impl. dep. #24). An attempt to read FQ with an RDPR instruction generates an illegal_instruction exception (impl. dep. #25).

5.2.14 IU Deferred-Trap Queue

SPARC64 V neither has nor needs an IU deferred-trap queue (impl. dep. #16).


6. Instructions

This chapter presents SPARC64 V implementation-specific instruction details and the

processor pipeline information in these subsections:

■ Instruction Execution

■ Instruction Formats and Fields

■ Instruction Categories

■ Processor Pipeline

For additional, general information, please see parallel subsections of Chapter 6 in

Commonality. For easy referencing, we follow the organization of Chapter 6 in

Commonality.

6.1 Instruction Execution

SPARC64 V is an advanced superscalar implementation of SPARC V9. Several

instructions may be issued and executed in parallel. Although SPARC64 V provides

serial program execution semantics, some of the implementation characteristics

described below are part of the architecture visible to software for correctness and

efficiency. The affected software includes optimizing compilers and supervisor code.

6.1.1 Data Prefetch

SPARC64 V employs speculative (out of program order) execution of instructions; in most cases, the effect of these instructions can be undone if the speculation proves to be incorrect.¹ However, exceptions can occur because of speculative data prefetching. Formally, SPARC64 V employs the following rules regarding speculative prefetching:

1. If a memory operation y resolves to a volatile memory address (location[y]), SPARC64 V will not speculatively prefetch location[y] for any reason; location[y] will be fetched or stored to only when operation y is committable.

2. If a memory operation y resolves to a nonvolatile memory address (location[y]), SPARC64 V may speculatively prefetch location[y], subject to the following subrules:

a. If an operation y can be speculatively prefetched according to the prior rule, operations with store semantics are speculatively prefetched for ownership only if they are prefetched to cacheable locations. Operations without store semantics are speculatively prefetched even if they are noncacheable, as long as they are not volatile.

b. Atomic operations (CAS(X)A, LDSTUB, SWAP) are never speculatively prefetched.

¹ An async_data_error may be signalled during speculative data prefetching.
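The prefetch rules above can be read as a single predicate. The sketch below is only a model of those rules; the `MemOp` type and its attribute names are hypothetical and do not correspond to any hardware or software interface.

```python
from dataclasses import dataclass

@dataclass
class MemOp:
    volatile: bool          # resolves to a volatile memory address
    cacheable: bool         # targets a cacheable location
    store_semantics: bool   # the operation has store semantics
    atomic: bool = False    # CAS(X)A, LDSTUB, or SWAP

def may_speculatively_prefetch(op: MemOp) -> bool:
    """Apply rules 1, 2a, and 2b to one memory operation."""
    if op.volatile:
        return False            # rule 1: never prefetch volatile locations
    if op.atomic:
        return False            # rule 2b: atomics are never prefetched
    if op.store_semantics:
        return op.cacheable     # rule 2a: prefetch for ownership only if cacheable
    return True                 # rule 2a: loads may be prefetched even if noncacheable
```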

SPARC64 V provides two mechanisms to avoid speculative execution of a load:

1. Avoid speculation by disallowing speculative accesses to certain memory pages or I/O spaces. This can be done by setting the E (side-effect) bit in the PTE for all memory pages that should not allow speculation. All accesses made to memory pages that have the E bit set in their PTE will be delayed until they are no longer speculative or until they are cancelled. See Appendix F, Memory Management Unit, for details.

2. Alternate space load instructions that force program order, such as ASI_PHYS_BYPASS_WITH_EBIT[_L] (ASI = 15₁₆, 1D₁₆), will not be speculatively executed.

6.1.2 Instruction Prefetch

The processor prefetches instructions to minimize cases where the processor must

wait for instruction fetch. In combination with branch prediction, prefetching may

cause the processor to access instructions that are not subsequently executed. In

some cases, the speculative instruction accesses will reference data pages.

SPARC64 V does not generate a trap for any exception that is caused by an

instruction fetch until all of the instructions before it (in program order) have been

committed.¹

1. Hardware errors and other asynchronous errors may generate a trap even if the instruction that caused the

trap is never committed.


6.1.3 Syncing Instructions

SPARC64 V has instructions, called syncing instructions, that stop execution for the

number of cycles it takes to clear the pipeline and to synchronize the processor.

There are two types of synchronization, pre and post. A presyncing instruction waits

for all previous instructions to commit, commits by itself, and then issues successive

instructions. A postsyncing instruction issues by itself and prevents the successive

instructions from issuing until it is committed. Some instructions have both pre- and

postsync attributes.

In SPARC64 V, almost all instructions commit in order, but store instructions commit before becoming globally visible. A few syncing instructions cause the processor to discard prefetched instructions and to refetch the successive instructions. TABLE 6-1 lists all pre-/postsync instructions and the effects of instruction execution.

TABLE 6-1 SPARC64 V Syncing Instructions

Columns: Opcode; Presyncing (Sync?, Wait for store global visibility?); Postsyncing (Sync?, Discard prefetched instructions?)

ALIGNADDRESS{_LITTLE}

Yes

BMASK

DONE

Yes

FCMP(GT,LE,NE,EQ)(16,32)

Yes

FLUSH

FMOV(s,d)icc

FMOVr

LDD

Yes

LDDA

LDDFA

memory access with

ASI=ASI_PHYS_BYPASS_EC{_LITTLE},

ASI_PHYS_BYPASS_EC_WITH_E_BIT{_LITTLE}

LDFSR, LDXFSR

MEMBAR

MOVfcc

MULScc

PDIST

Yes

RDASR

Yes

RETRY

Yes

SIAM

STBAR

Yes

STD


STDA

Yes

STDFA

Yes

STFSR, STXFSR

Tcc

Yes

WRASR

1. When #cmask != 0.

2. WRGSR only.

6.2 Instruction Formats and Fields

Instructions are encoded in five major 32-bit formats and several minor formats.

Please refer to Section 6.2 of Commonality for illustrations of four major formats.

FIGURE 6-1 illustrates Format 5, unique to SPARC64 V.

Format 5 (op = 2, op3 = 37₁₆): FMADD, FMSUB, FNMADD, and FNMSUB (in place of IMPDEP2B)

Fields shown: op3, rs1, rs3, var, size, rs2. Bit boundaries shown: 31, 30, 29, 25, 24, 19, 18, 17, 14, 13, 12, 11, 10.

FIGURE 6-1 Summary of Instruction Formats: Format 5

Instruction fields are those shown in Section 6.2 of Commonality. Three additional

fields are implemented in SPARC64 V. They are described in TABLE 6-2.

TABLE 6-2 Instruction Fields Specific to SPARC64 V

Bits     Field    Description

13:9     rs3      This 5-bit field is the address of the third f register source operand for the floating-point multiply-add and multiply-subtract instructions.

8:7      var      This 2-bit field specifies which specific operation (variation) to perform for the floating-point multiply-add and multiply-subtract instructions.

6:5      size     This 2-bit field specifies the size of the operands for the floating-point multiply-add and multiply-subtract instructions.


Since size = 00 is not IMPDEP2B, and since size = 11 is assumed to denote quad operations but is not implemented in SPARC64 V, an instruction with size = 00 or 11 generates an illegal_instruction exception in SPARC64 V.
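A minimal decoder sketch for Format 5 follows. The rs3, var, and size positions come from TABLE 6-2; the op, rd, op3, rs1, and rs2 positions are the conventional SPARC V9 format-3 positions and are an assumption here, as are the function name and the dictionary keys.

```python
def decode_format5(insn: int) -> dict:
    """Split a 32-bit instruction word into Format 5 fields."""
    def bits(high: int, low: int) -> int:
        return (insn >> low) & ((1 << (high - low + 1)) - 1)

    fields = {
        "op": bits(31, 30),
        "rd": bits(29, 25),
        "op3": bits(24, 19),
        "rs1": bits(18, 14),
        "rs3": bits(13, 9),   # TABLE 6-2
        "var": bits(8, 7),    # TABLE 6-2
        "size": bits(6, 5),   # TABLE 6-2
        "rs2": bits(4, 0),
    }
    fields["is_format5"] = fields["op"] == 2 and fields["op3"] == 0x37
    # size = 00 or 11 raises illegal_instruction on SPARC64 V.
    fields["illegal_instruction"] = fields["is_format5"] and fields["size"] in (0b00, 0b11)
    return fields
```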

6.3 Instruction Categories

SPARC V9 instructions comprise the categories listed below. All categories are

described in Section 6.3 of Commonality. Subsections in bold face are SPARC64 V

implementation dependencies.

■ Memory access

■ Memory synchronization

■ Integer arithmetic

■ Control transfer (CTI)

■ Conditional moves

■ Register window management

■ State register access

■ Privileged register access

■ Floating-point operate (FPop)

■ Implementation-dependent

6.3.3 Control-Transfer Instructions (CTIs)

These are the basic control-transfer instruction types:

■ Conditional branch (Bicc, BPcc, BPr, FBfcc, FBPfcc)

■ Unconditional branch

■ Call and link (CALL)

■ Jump and link (JMPL, RETURN)

■ Return from trap (DONE, RETRY)

■ Trap (Tcc)

Instructions other than CALL and JMPL are described in their entirety in Section 6.3.2 of Commonality. SPARC64 V implements CALL and JMPL as described below.

CALL and JMPL Instructions

SPARC64 V writes all 64 bits of the PC into the destination register when PSTATE.AM = 0. The upper 32 bits of r[15] (CALL) or of r[rd] (JMPL) are written as zeroes when PSTATE.AM = 1 (impl. dep. #125).
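The value written to the destination register can be sketched as below; the function name is illustrative.

```python
def link_value(pc: int, pstate_am: int) -> int:
    """Value written to r[15] (CALL) or r[rd] (JMPL) for a given PC and PSTATE.AM."""
    pc &= (1 << 64) - 1            # PC is a 64-bit quantity
    if pstate_am:
        return pc & 0xFFFFFFFF     # upper 32 bits are written as zeroes
    return pc                      # all 64 bits are written
```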


SPARC64 V implements JMPL and CALL return prediction hardware in the form of a special stack, called the Return Address Stack (RAS). Whenever a CALL or JMPL that writes to %o7 (r[15]) occurs, SPARC64 V “pushes” the return address (PC+8) onto the RAS. When either of the synthetic instructions retl (JMPL [%o7+8]) or ret (JMPL [%i7+8]) is subsequently executed, the return address is predicted to be the address stored on the top of the RAS and the RAS is “popped.” If the prediction in the RAS is incorrect, SPARC64 V backs up and starts issuing instructions from the correct target address. This backup takes a few extra cycles.

Programming Note – For maximum performance, software and compilers must take into account how the RAS works. For example, tricks that do nonstandard returns in hopes of boosting performance may require more cycles if they cause the wrong RAS value to be used for predicting the address of the return. Heavily nested calls can also cause earlier entries in the RAS to be overwritten by newer entries, since the RAS only has a limited number of entries. Eventually, some return addresses will be mispredicted because of the overflow of the RAS.
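The effect of RAS overflow described in the note can be illustrated with a toy model. The stack depth used here is a free parameter, not the actual SPARC64 V RAS size (which this section does not state), and the class, the call-site addresses, and the behavior on an empty stack are all hypothetical.

```python
from collections import deque

class ReturnAddressStack:
    """Toy RAS: a bounded stack whose oldest entry is overwritten when full."""

    def __init__(self, depth: int):
        self.entries = deque(maxlen=depth)

    def push(self, call_pc: int) -> None:
        self.entries.append(call_pc + 8)      # return address is PC+8

    def predict_return(self):
        # An empty stack yields no prediction (modeled as None).
        return self.entries.pop() if self.entries else None

def count_mispredictions(nesting: int, depth: int) -> int:
    """Nest `nesting` calls, then return from each; count wrong predictions."""
    ras = ReturnAddressStack(depth)
    call_sites = [0x1000 + 0x100 * i for i in range(nesting)]
    for pc in call_sites:
        ras.push(pc)
    misses = 0
    for pc in reversed(call_sites):
        if ras.predict_return() != pc + 8:
            misses += 1
    return misses
```

With a depth of 4 in this model, four nested calls return without mispredictions, while six nested calls mispredict the two outermost returns.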

6.3.7 Floating-Point Operate (FPop) Instructions

The complete conditions of generating an fp_exception_other exception with FSR.ftt = unfinished_FPop are described in Section B.6, Floating-Point Nonstandard Mode on page 61.

The SPARC64 V-specific FMADD and FMSUB instructions (described below) are also floating-point operations. They require the floating-point unit to be enabled; otherwise, an fp_disabled trap is generated. They also affect the FSR, like FPop instructions. However, these instructions are not included in the FPop category and, hence, reserved encodings in these opcodes generate an illegal_instruction exception, as defined in Section 6.3.9 of Commonality.

6.3.8 Implementation-Dependent Instructions

SPARC64 V uses the IMPDEP2 instruction to implement the Floating-Point Multiply-Add/Subtract and Negative Multiply-Add/Subtract instructions; these have an op3 field = 37₁₆ (IMPDEP2). See Floating-Point Multiply-Add/Subtract on page 50 for fuller definitions of these instructions. Opcode space is reserved in IMPDEP2 for the quad-precision forms of these instructions. However, SPARC64 V does not currently implement the quad-precision forms, and the processor generates an illegal_instruction exception if a quad-precision form is specified. Since these instructions are not part of the required SPARC V9 architecture, the operating system does not supply software emulation routines for the quad versions of these instructions.

SPARC64 V uses the IMPDEP1 instruction to implement the graphics acceleration instructions.


6.4 Processor Pipeline

The pipeline of SPARC64 V consists of fifteen stages, shown in FIGURE 6-2. Each

stage is referenced by one or two letters as follows:

6.4.1 Instruction Fetch Stages

■ IA (Instruction Address generation) — Calculate fetch target address.

■ IT (Instruction TLB Tag access) — Instruction TLB tag search. Search of BRHIS

and RAS is also started.

■ IM (Instruction TLB tag Match) — Check TLB tag is matched.

The result of BRHIS and RAS search is also available at this stage and is

forwarded to IA stage for subsequent fetch.

■ IB (Instruction cache Buffer read) — Read L1 cache data if TLB is hit.

■ IR (Instruction read Result) — Write to I-Buffer.

IA through IR stages are dedicated to instruction fetch. These stages work in concert

with the cache access unit to supply instructions to subsequent stages. The

instructions fetched from memory or cache are stored in the Instruction Buffer (I-

buffer). The I-buffer has six entries, each of which can hold 32-byte-aligned 32-byte

data (eight instructions).

SPARC64 V has a branch prediction mechanism and resources named BRHIS

(BRanch HIStory) and RAS (Return Address Stack). Instruction fetch stages use these

resources to determine fetch addresses.

Instruction fetch stages are designed so that they work independently of subsequent stages as much as possible, and they can fetch instructions even when execution stages stall. These stages fetch until the I-Buffer is full; further fetches are possible by requesting prefetches to the L1 cache.


FIGURE 6-2 SPARC64 V Pipeline

Block labels shown: IF EAG, iTLB, L1I, BRHIS, Instruction Buffer, IWR, RSFA, RSFB, RSEA, RSEB, RSA, RSBR, CSE, FXA, FXB, EXA, EXB, EAGA, EAGB, dTLB, FUB, GUB, L1D, FPR, GPR, ccr, fsr, PC, nPC.


6.4.2 Issue Stages

■ E (Entry) — Instructions are passed from fetch stages.

■ D (Decode) — Assign resources and dispatch to reservation station (RS).

SPARC64 V is an out-of-order execution CPU. It has six execution units (two arithmetic and logic units, two floating-point units, and two load/store units). Each unit except the load/store unit has its own reservation station. E and D stages are issue stages that decode instructions and dispatch them to the target RS. SPARC64 V can issue up to four instructions per cycle.

The resources needed to execute an instruction are assigned in the issue stages. The

resources to be allocated include the following:

■ Commit stack entry (CSE)

■ Renaming registers of integer (GUB) and floating-point (FUB)

■ Entries of reservation stations

■ Memory access ports

Resources needed for an instruction are specific to the instruction, but all resources must be assigned at these stages. In normal execution, assigned resources are released at the very last stage of the pipeline, W-stage.¹ Instructions between the E-stage and W-stage are considered to be in-flight. When an exception is signalled, all in-flight instructions and the resources used by them are released immediately. This behavior enables the decoder to restart issuing instructions as quickly as possible.

The number of in-flight instructions depends on how many resources are needed by them. The maximum number is 64.

6.4.3 Execution Stages

■ P (priority) — Select an instruction from those that have met the conditions for execution.

■ B (buffer read) — Read register file, or receive forwarded data from other pipelines.

■ X (execute) — Execution.

Instructions in reservation stations are executed when certain conditions are met, for example, when the values of source registers are known and the execution unit is available. Execution latency varies from one cycle to many cycles, depending on the instruction.

1. An entry in a reservation station is released at the X-stage.


Execution Stages for Cache Access

Memory access requests are passed to the cache access pipeline after the target

address is calculated. Cache access stages work the same way as instruction fetch

stages, except for the handling of branch prediction. See Section 6.4.1, Instruction

Fetch Stages, for details. Stages in instruction fetch and cache access correspond as

follows:

Instruction Fetch Stages

Cache Access

When an exception is signalled, fetch ports and store ports used by memory access

instructions are released. The cache access pipeline itself remains working in order to

complete outgoing memory accesses. When data is returned, it is then stored to the

cache.

6.4.4 Completion Stages

■ U (Update) — Update of physical (renamed) register.

■ W (Write) — Update of architectural registers and retire; exception handling.

After an out-of-order execution, execution reverts to program order to complete. Exception handling is done in the completion stages. Exceptions occurring in execution stages are not handled immediately but are signalled when the instruction is completed.¹

1. RAS-related exception may be signalled before completion.


7. Traps

Please refer to Chapter 7 of Commonality. Section numbers in this chapter

correspond to those in Chapter 7 of Commonality.

This chapter adds SPARC64 V-specific information in the following sections:

■ Processor States, Normal and Special Traps

  ■ RED_state

  ■ error_state

■ Trap Categories

  ■ Deferred Traps

  ■ Reset Traps

  ■ Uses of the Trap Categories

■ Trap Control

  ■ PIL Control

■ Trap-Table Entry Addresses

  ■ Trap Type (TT)

  ■ Details of Supported Traps

■ Exception and Interrupt Descriptions

7.1 Processor States, Normal and Special Traps

Please refer to Section 7.1 of Commonality.

7.1.1 RED_state

RED_state Trap Table

The RED_state trap vector is located at an implementation-dependent address referred to as RSTVaddr. The value of RSTVaddr is a constant within each implementation; in SPARC64 V this virtual address is FFFFFFFFF0000000₁₆, which translates to physical address 000007FFF0000000₁₆ in RED_state (impl. dep. #114).

RED_state Execution Environment

In RED_state, the processor is forced to execute in a restricted environment by

overriding the values of some processor controls and state registers.

Note – The values are overridden, not set, allowing them to be switched atomically.

SPARC64 V has the following implementation-dependent behavior in RED_state (impl. dep. #115):

■ While in RED_state, all internal ITLB-based translation functions are disabled. DTLB-based translations are disabled upon entry but may be reenabled by software while in RED_state. However, ASI-based access functions to the TLBs are still available.

■ While mTLBs and uTLBs are disabled, all accesses are assumed to be noncacheable and strongly ordered for data access.

■ XIR errors are not masked and can cause a trap.

Note – When RED_state is entered because of component failures, the handler should attempt to recover from potentially catastrophic error conditions or to disable the failing components. When RED_state is entered after a reset, the software should create the environment necessary to restore the system to a running state.

7.1.2 error_state

The processor enters error_state when a trap occurs while the processor is already at its maximum supported trap level (that is, when TL = MAXTL) (impl. dep. #39).


Although the standard behavior of the CPU upon an entry into error_state is to internally generate a watchdog_reset (WDR), the CPU optionally stays halted upon an entry to error_state depending on a setting in the OPSR register (impl. dep. #40, #254).

7.2 Trap Categories

Please refer to Section 7.2 of Commonality.

An exception or interrupt request can cause any of the following trap types:

■ Precise trap

■ Deferred trap

■ Disrupting trap

■ Reset trap

7.2.2 Deferred Traps

Please refer to Section 7.2.2 of Commonality.

SPARC64 V implements a deferred trap to signal certain error conditions (impl. dep. #32). Please refer to the description of the I_UGE error in the “Relation between %tpc and the instruction that caused the error” row in TABLE P-2 (page 156) for details. See also Instruction End-Method at ADE Trap on page 170.

7.2.4 Reset Traps

Please refer to Section 7.2.4 of Commonality.

In SPARC64 V, a watchdog reset (WDR) occurs when the processor has not committed an instruction for 2³³ processor clocks.
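To put the 2³³-clock interval in wall-clock terms, the arithmetic can be sketched as below. The clock frequency is a caller-supplied parameter; the 1 GHz figure in the usage note is a hypothetical round number, not a stated SPARC64 V operating frequency.

```python
WDR_CLOCKS = 2 ** 33   # processor clocks without a committed instruction

def watchdog_timeout_seconds(clock_hz: float) -> float:
    """Time before a watchdog reset at the given processor clock frequency."""
    return WDR_CLOCKS / clock_hz
```

At a hypothetical 1 GHz clock, `watchdog_timeout_seconds(1e9)` is about 8.59 seconds.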

7.2.5 Uses of the Trap Categories

Please refer to Section 7.2.5 of Commonality.

All exceptions that occur as the result of program execution are precise in

SPARC64 V (impl. dep. #33).

An exception caused after the initial access of a multiple-access load or store

instruction (LDD(A), STD(A), LDSTUB, CASA, CASXA, or SWAP) that causes a

catastrophic exception is precise in SPARC64 V.


7.3 Trap Control

Please refer to Section 7.3 of Commonality.

7.3.1 PIL Control

SPARC64 V receives external interrupts from the UPA interconnect. They cause an interrupt_vector_trap (TT = 60₁₆). The interrupt vector trap handler reads the interrupt information and then schedules SPARC V9-compatible interrupts by writing bits in the SOFTINT register. Please refer to Section 5.2.11 of Commonality for details.

During handling of SPARC V9-compatible interrupts by SPARC64 V, the PIL

issuing new instructions, will flush all uncommitted instructions, and then will

vector to the trap handler. The only exception to this process occurs when

SPARC64 V is processing a higher-priority trap.

SPARC64 V takes a normal disrupting trap upon receipt of an interrupt request.

7.4 Trap-Table Entry Addresses

Please refer to Section 7.4 of Commonality.

7.4.2 Trap Type (TT)

Please refer to Section 7.4.2 of Commonality.

SPARC64 V implements all mandatory SPARC V9 and SPARC JPS1 exceptions, as

described in Chapter 7 of Commonality, plus the exception listed in TABLE 7-1, which

is specific to SPARC64 V (impl. dep. #35; impl. dep. #36).

TABLE 7-1 Exceptions Specific to SPARC64 V

Exception or Interrupt Request    TT       Priority
async_data_error                  040₁₆


7.4.4 Details of Supported Traps

Please refer to Section 7.4.4 in Commonality.

SPARC64 V Implementation-Specific Traps

SPARC64 V supports the following implementation-specific trap type:

■ async_data_error

7.5 Trap Processing

Please refer to Section 7.5 of Commonality.

7.6 Exception and Interrupt Descriptions

Please refer to Section 7.6 of Commonality.

7.6.4 SPARC V9 Implementation-Dependent, Optional Traps That Are Mandatory in SPARC JPS1

Please refer to Section 7.6.4 of Commonality.

SPARC64 V implements all six traps that are implementation dependent in SPARC V9 but mandatory in JPS1 (impl. dep. #35). See Section 7.6.4 of Commonality for details.

7.6.5

SPARC JPS1 Implementation-Dependent Traps

Please refer to Section 7.6.5 of Commonality.

SPARC64 V implements the following traps that are implementation dependent

(impl. dep. #35).

■ async_data_error [tt = 040₁₆] (Preemptive or disrupting) (impl. dep. #218):
SPARC64 V implements the async_data_error exception to signal the following
errors.

  ■ Uncorrectable errors in the internal architecture registers (general registers–gr,
    floating-point registers–fr, ASR, ASI registers)
  ■ Uncorrectable errors in the core pipeline
  ■ System data corruption
  ■ Watchdog timeout, first time
  ■ TLB access error upon access by an ldxa or stxa instruction

Multiple errors may be reported in a single generation of the async_data_error

exception. Depending on the situation, the async_data_error trap becomes a precise

trap, a disrupting trap, or a preemptive trap upon error detection. The TPC and

TNPC stacked by the exception may indicate the exact instruction, the preceding

instruction, or the subsequent instruction inducing the error. See Appendix P for

details of the async_data_error exception in SPARC64 V.


CHAPTER 8

Memory Models

The SPARC V9 architecture is a model that specifies the behavior observable by

software on SPARC V9 systems. Therefore, access to memory can be implemented in

any manner, as long as the behavior observed by software conforms to that of the

models described in Chapter 8 of Commonality and defined in Appendix D, Formal

Specification of the Memory Models, also in Commonality.

The SPARC V9 architecture defines three different memory models: Total Store Order

(TSO), Partial Store Order (PSO), and Relaxed Memory Order (RMO). All SPARC V9

processors must provide Total Store Order (or a more strongly ordered model, for

example, Sequential Consistency) to ensure SPARC V8 compatibility.

Whether the PSO or RMO models are supported by SPARC V9 systems is

implementation dependent; SPARC64 V behaves in a manner that guarantees

adherence to whichever memory model is currently in effect.

This chapter describes the following major SPARC64 V-specific details of memory

models.

■ SPARC V9 Memory Model on page 42

For general information, please see parallel subsections of Chapter 8 in

Commonality. For easier referencing, this chapter follows the organization of

Chapter 8 in Commonality, listing subsections whether or not there are

implementation-specific details.

8.1

Overview

Note – The words “hardware memory model” denote the underlying hardware

memory models as differentiated from the “SPARC V9 memory model,” which is the

memory model the programmer selects in PSTATE.MM.

SPARC64 V supports only one mode of memory handling to guarantee correct

operation under any of the three SPARC V9 memory ordering models (impl. dep.

#113):

■ Total Store Order — All loads are ordered with respect to loads, and all stores are

ordered with respect to loads and stores. This behavior is a superset of the

requirements for the SPARC V9 memory models TSO, PSO, and RMO. When

PSTATE.MM selects TSO or PSO, SPARC64 V operates in this mode. Since

programs written for PSO (or RMO) will always work if run under Total Store

Order, this behavior is safe but does not take advantage of the reduced restrictions

of PSO.

8.4

SPARC V9 Memory Model

Please refer to Section 8.4 of Commonality.

In addition, this section describes SPARC64 V-specific details about the processor/

memory interface model.

8.4.5

Mode Control

SPARC64 V implements Total Store Ordering for all PSTATE.MM values. Writing 11₂ into
PSTATE.MM also causes the machine to use TSO (impl. dep. #119). However, the
encoding 11₂ should not be used, since future versions of SPARC64 V may use this
encoding for a new memory model.
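The mode-control rule above can be sketched as a small lookup. This is an illustrative model only: the function names are hypothetical, and the 00₂/01₂/10₂ assignments are the standard SPARC V9 PSTATE.MM encodings rather than anything specific to this supplement.

```python
# SPARC V9 PSTATE.MM encodings; 0b11 is not assigned to a model by the architecture.
ARCHITECTURAL_MODEL = {0b00: "TSO", 0b01: "PSO", 0b10: "RMO", 0b11: "reserved"}


def effective_memory_model(pstate_mm: int) -> str:
    """Return the ordering SPARC64 V enforces for a given PSTATE.MM value."""
    if pstate_mm not in ARCHITECTURAL_MODEL:
        raise ValueError("PSTATE.MM is a 2-bit field")
    # Per Section 8.4.5, every encoding (including 0b11) is executed as TSO.
    return "TSO"


def is_recommended_encoding(pstate_mm: int) -> bool:
    """True when software may program this encoding without risking a
    behavior change on a future processor (0b11 should be avoided)."""
    if pstate_mm not in ARCHITECTURAL_MODEL:
        raise ValueError("PSTATE.MM is a 2-bit field")
    return pstate_mm != 0b11
```

Software that selects PSO or RMO therefore still runs correctly, but it sees no relaxation of ordering on this implementation.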

8.4.6

Synchronizing Instruction and Data Memory

All caches in a SPARC64 V-based system (uniprocessor or multiprocessor) have a

unified cache consistency protocol and implement strong coherence between

instruction and data caches. Writes to any data cache cause invalidations to the


corresponding locations in all instruction caches; references to any instruction cache

cause corresponding modified data to be flushed and corresponding unmodified

data to be invalidated from all data caches. The flush operation is still operative in

SPARC64 V, however.

Since the FLUSH instruction synchronizes the processor, the total latency varies
depending on the situation in SPARC64 V. Assuming all prior instructions are
completed, the latency of FLUSH is 18 CPU cycles.


APPENDIX A

Instruction Definitions:

SPARC64 V Extensions

This appendix describes the SPARC64 V-specific implementation of the instructions

in Appendix A of Commonality. If an instruction is not described in this appendix,

then no SPARC64 V implementation-dependency applies.

■ See TABLE A-1 of Commonality for the location at which general information about

the instruction can be found.

■ Section numbers refer to the parallel section numbers in Appendix A of

Commonality.

TABLE A-1 lists four instructions that are unique to SPARC64 V.

TABLE A-1 Implementation-Specific Instructions

Operation     Name                                      Page      V9 Ext?
FMADD(s,d)    Floating-point multiply add               page 50   ■
FMSUB(s,d)    Floating-point multiply subtract          page 50   ■
FNMADD(s,d)   Floating-point multiply negate add        page 50   ■
FNMSUB(s,d)   Floating-point multiply negate subtract   page 50   ■

Each instruction definition consists of these parts:

1. A table of the opcodes defined in the subsection with the values of the field(s)

that uniquely identify the instruction(s).

2. An illustration of the applicable instruction format(s). In these illustrations a dash

(—) indicates that the field is reserved for future versions of the architecture and

shall be 0 in any instance of the instruction. If a conforming SPARC V9

implementation encounters nonzero values in these fields, its behavior is

undefined.

3. A list of the suggested assembly language syntax, as described in Appendix G,

Assembly Language Syntax.

4. A description of the features, restrictions, and exception-causing conditions.

5. A list of exceptions that can occur as a consequence of attempting to execute the

instruction(s). Exceptions due to an instruction_access_error,

instruction_access_exception, fast_instruction_access_MMU_miss, async_data_error,

ECC_error, and interrupts are not listed because they can occur on any instruction.

Also, any instruction that is not implemented in hardware shall generate an

illegal_instruction exception (or fp_exception_other exception with

ftt= unimplemented_FPop for floating-point instructions) when it is executed.

The illegal_instruction trap can occur during chip debug on any instruction that has

been programmed into the processor’s IIU_INST_TRAP (ASI = 60₁₆, VA = 0).

These traps are also not listed under each instruction.

The following traps never occur in SPARC64 V:

■ instruction_access_MMU_miss
■ data_access_MMU_miss
■ data_access_protection
■ unimplemented_LDD
■ unimplemented_STD
■ LDQF_mem_address_not_aligned
■ STQF_mem_address_not_aligned
■ internal_processor_error
■ fp_exception_other (ftt = invalid_fp_register)

This appendix does not include any timing information (in either cycles or clock

time).

The following SPARC64 V-specific extensions are described.

■ Block Load and Store Instructions (VIS I) on page 47

■ Call and Link on page 49

■ Implementation-Dependent Instructions on page 49

■ Jump and Link on page 53

■ Load Quadword, Atomic [Physical] on page 54

■ Memory Barrier on page 55

■ Partial Store (VIS I) on page 57

■ Prefetch Data on page 57

■ Read State Register on page 58

■ SHUTDOWN (VIS I) on page 58

■ Write State Register on page 59

■ Deprecated Instructions on page 59


A.4

Block Load and Store Instructions (VIS I)

The following notes summarize the behavior of block load/store instructions in
SPARC64 V.

1. Block load and store operations are not atomic, in that they are internally
decomposed into eight independent, 8-byte load/store operations in SPARC64 V.
Each load/store is always issued and performed in the RMO memory model and
obeys all prior MEMBAR and atomic instruction-imposed ordering constraints.

2. Block load/store instructions are out of the scope of V9 memory models, meaning
that self-consistency of memory reference instructions is not always maintained if
block load/store instructions are involved in the execution flow. The following
table describes the implemented ordering constraints for block load/store
instructions with respect to the other memory reference instructions with an
operand address conflict in SPARC64 V:

Program Order for conflicting bld/bst/ld/st

Ordered/

first

Out-of-Order

store

blockstore

blockload

blockstore

blockload

store

Ordered

store

Ordered

load

Ordered

load

Ordered

blockstore

blockload

Out-of-Order

Ordered

load

blockstore

blockload

store

load

Ordered

blockstore

blockload

Ordered

To maintain memory ordering even when memory addresses conflict, MEMBAR
instructions shall be inserted at appropriate locations in the program.

Although self-consistency with respect to the block load/store and the other
memory reference instructions is not maintained in some cases, register conflicts
between the other instructions and block load/store instructions are maintained
in SPARC64 V. The read-after-write, write-after-read, and write-after-write
obstructions between a block load/store instruction and the other arithmetic
instructions are detected and handled appropriately.

3. Block load instructions operate on the cache if the operand is present.


4. The block store with commit instruction always stores the operand in main
storage and invalidates the line in the L1D cache if it is present. The invalidation
is performed through an S_INV_REQ transaction through UPA by the system
controller.

5. The block store instruction stores the operand into main storage if it is not present
in the operand cache and the status of the line is invalid, shared, or owned. In
case the line is not present in the L1D cache and is exclusive or modified on the
L2 cache, the block store instruction modifies only the line in L2 cache. If the line
is present in the operand cache and the status is either clean/shared or
clean/owned, the line is stored in main storage. If the line is present in the operand
cache and the status is clean/exclusive, the line in the operand cache is
invalidated and the operand is stored in the L2 cache. If the line is in the operand
cache and the status is modified/modified, the operand is stored in the operand
cache. The following table summarizes each cache status before block store and
the results of the block store. Blank cells mean that no action occurred in the
corresponding cache or memory, and the data, if it exists, is unchanged.

Storage

Status

Invalid

I, S, O

Valid

Cache status

before bst

E, M

—

S, O

—

invalidate

update

—

Action

update

—

update

Memory

update

—

update
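The prose rules in note 5 can be restated as a decision function. This is a minimal sketch under simplifying assumptions: the function name and return shape are hypothetical, the line state is reduced to a single letter (I, S, O, E, M), and only the outcomes stated in the text are modeled.

```python
def block_store_destination(in_l1d: bool, status: str) -> dict:
    """Where a block store (without commit) places its data, per note 5.

    in_l1d -- True if the line is present in the L1D (operand) cache
    status -- simplified line state: "I", "S", "O", "E", or "M"
    """
    if status not in ("I", "S", "O", "E", "M"):
        raise ValueError("unknown line status: " + status)
    if not in_l1d:
        # Line absent from L1D: I/S/O go to memory, E/M update only L2.
        target = "memory" if status in ("I", "S", "O") else "L2"
        return {"data_written_to": target, "l1d_invalidated": False}
    if status == "I":
        raise ValueError("a line present in L1D cannot be invalid")
    if status in ("S", "O"):
        # clean/shared or clean/owned: stored in main storage.
        return {"data_written_to": "memory", "l1d_invalidated": False}
    if status == "E":
        # clean/exclusive: L1D copy invalidated, operand stored in L2.
        return {"data_written_to": "L2", "l1d_invalidated": True}
    # modified/modified: operand stored in the operand cache itself.
    return {"data_written_to": "L1D", "l1d_invalidated": False}
```

The sketch does not say what happens to the L1D copy in the shared and owned cases, because the prose does not state it.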

Exceptions

fp_disabled

PA_watchpoint

VA_watchpoint

illegal_instruction (misaligned rd)

mem_address_not_aligned (see Block Load and Store ASIs on page 120)

data_access_exception (see Block Load and Store ASIs on page 120)

LDDF_mem_address_not_aligned (see Block Load and Store ASIs on page 120)

data_access_error

fast_data_access_MMU_miss

fast_data_access_protection


A.12 Call and Link

SPARC64 V clears the upper 32 bits of the PC value in r[15] when PSTATE.AM is
set (impl. dep. #125). The value written into r[15] is visible to the instruction in the
delay slot.

SPARC64 V has a special hardware table, called the return address stack, to predict
the return address from a subroutine. Though the return address stack achieves
better performance in normal cases, there is a special use of the CALL instruction
(call .+8) that may have an undesirable effect on the return address stack. In this
case, the CALL instruction is used to read the PC contents, not to call a subroutine. In
SPARC64 V, the return address of such a CALL (PC+8) is not stored in its return
address stack, to avoid a detrimental performance effect. When a ret or retl is
executed, the value in the return address stack is used to predict the return address.
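The return address stack behavior described above can be illustrated with a toy model. The class name, the stack depth, and the overflow policy are assumptions made for illustration; the supplement states only that call .+8 is not recorded and that ret/retl consume a prediction.

```python
class ReturnAddressStack:
    """Toy model of a return address predictor (not the real hardware)."""

    def __init__(self, depth: int = 4):
        # The real depth is not given in this section; 4 is a placeholder.
        self.depth = depth
        self.entries = []

    def on_call(self, pc: int, target: int) -> None:
        """Record the return address of a CALL at address pc."""
        if target == pc + 8:
            # "call .+8" only reads the PC, so nothing is pushed.
            return
        if len(self.entries) == self.depth:
            # Assumed overflow policy: discard the oldest entry.
            self.entries.pop(0)
        self.entries.append(pc + 8)

    def predict_return(self):
        """Prediction used by ret/retl, or None if the stack is empty."""
        return self.entries.pop() if self.entries else None
```

For example, a real call at 0x1000 followed by a call .+8 at 0x2004 leaves a single entry, so the next ret is predicted to return to 0x1008.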

A.24 Implementation-Dependent Instructions

Opcode    op3       Operation
IMPDEP1   11 0110   Implementation-Dependent Instruction 1
IMPDEP2   11 0111   Implementation-Dependent Instruction 2

The IMPDEP1 and IMPDEP2 instructions are completely implementation dependent.
Implementation-dependent aspects include their operation, the interpretation of bits
29–25 and 18–0 in their encodings, and which (if any) exceptions they may cause.

SPARC64 V uses IMPDEP1 to encode VIS instructions (impl. dep. #106).

SPARC64 V uses IMPDEP2B to encode the Floating-Point Multiply Add/Subtract
instructions (impl. dep. #106). See Section A.24.1, Floating-Point
Multiply-Add/Subtract, on page 50 for details.

See I.1.2, Implementation-Dependent and Reserved Opcodes, in Commonality for

information about extending the SPARC V9 instruction set by means of the

implementation-dependent instructions.

Compatibility Note – These instructions replace the CPopn instructions in

SPARC V8.

Exceptions

implementation-dependent (IMPDEP2)


A.24.1

Floating-Point Multiply-Add/Subtract

SPARC64 V uses the IMPDEP2B opcode space to encode the Floating-Point Multiply
Add/Subtract instructions.

Opcode    Variation   Size†   Operation
FMADDs    00          01      Multiply-Add Single
FMADDd    00          10      Multiply-Add Double
FMSUBs    01          01      Multiply-Subtract Single
FMSUBd    01          10      Multiply-Subtract Double
FNMADDs   11          01      Negative Multiply-Add Single
FNMADDd   11          10      Negative Multiply-Add Double
FNMSUBs   10          01      Negative Multiply-Subtract Single
FNMSUBd   10          10      Negative Multiply-Subtract Double

† 11 is reserved for quad.

Format (5)

Bits    31:30   29:25   24:19    18:14   13:9   8:7   6:5    4:0
Field   10      rd      110111   rs1     rs3    var   size   rs2

Operation                    Implementation
Multiply-Add                 rd ← rs1 × rs2 + rs3
Multiply-Subtract            rd ← rs1 × rs2 − rs3
Negative Multiply-Subtract   rd ← −(rs1 × rs2 − rs3)
Negative Multiply-Add        rd ← −(rs1 × rs2 + rs3)

Assembly Language Syntax

fmadds    freg_rs1, freg_rs2, freg_rs3, freg_rd
fmaddd    freg_rs1, freg_rs2, freg_rs3, freg_rd
fmsubs    freg_rs1, freg_rs2, freg_rs3, freg_rd
fmsubd    freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmadds   freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmaddd   freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmsubs   freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmsubd   freg_rs1, freg_rs2, freg_rs3, freg_rd


Description

The Floating-point Multiply-Add instructions multiply the registers specified by the
rs1 field times the registers specified by the rs2 field, add that product to the
registers specified by the rs3 field, then write the result into the registers specified
by the rd field.

The Floating-point Multiply-Subtract instructions multiply the registers specified by
the rs1 field times the registers specified by the rs2 field, subtract from that
product the registers specified by the rs3 field, and then write the result into the
registers specified by the rd field.

The Floating-point Negative Multiply-Add instructions multiply the registers
specified by the rs1 field times the registers specified by the rs2 field, negate the
product, subtract from that negated value the registers specified by the rs3 field, and
then write the result into the registers specified by the rd field.

The Floating-point Negative Multiply-Subtract instructions multiply the registers
specified by the rs1 field times the registers specified by the rs2 field, negate the
product, add that negated product to the registers specified by the rs3 field, and
then write the result into the registers specified by the rd field.

All of the operations above are treated as separate multiply and add/subtract
operations in SPARC64 V. That is, a multiply operation is first performed with a
complete rounding step (as if it were a single multiply operation), and then an
add/subtract operation is performed with a complete rounding step (as if it were a
single add/subtract operation). Consequently, at most two rounding errors can be
incurred.¹
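The effect of rounding twice can be seen with ordinary double-precision arithmetic. The sketch below is an illustration, not a model of the hardware: the function names are hypothetical, Python floats stand in for the double-precision registers, and the single-rounding reference is computed exactly with rational arithmetic for comparison only.

```python
from fractions import Fraction


def fmadd_separate(a: float, b: float, c: float) -> float:
    """Multiply, round, then add, round: the behavior described above."""
    product = a * b          # first rounding
    return product + c       # second rounding


def fmsub_separate(a: float, b: float, c: float) -> float:
    return (a * b) - c


def fnmadd_separate(a: float, b: float, c: float) -> float:
    return -((a * b) + c)    # negation is exact, so it adds no rounding error


def fnmsub_separate(a: float, b: float, c: float) -> float:
    return -((a * b) - c)


def fmadd_single_rounding(a: float, b: float, c: float) -> float:
    """Reference result with one rounding, for comparison only."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)      # the only rounding step


# Operands chosen so that the product 1 - 2**-54 is not representable.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0
two_roundings = fmadd_separate(a, b, c)
one_rounding = fmadd_single_rounding(a, b, c)
```

With these operands the rounded product is exactly 1.0, so the separate-operation result is 0.0, while the single-rounding reference keeps the residual −2⁻⁵⁴.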

Special behaviors in handling traps arise in a Floating-point Multiply-Add/Subtract
instruction in SPARC64 V because of its implementation characteristics. If
any trapping exception is detected in the multiply part in the process of a
Floating-point Multiply-Add/Subtract instruction, the execution of the instruction is
aborted, the exception condition is recorded in FSR.cexc and FSR.aexc, and the CPU traps
with the exception condition. The add/subtract part of the instruction is only
performed when the multiply part of the instruction does not have any trapping
exceptions.

As described in TABLE A-2, if there are trapping IEEE754 exception conditions in
either of the operations FMUL or FADD/SUB, only the trapping exception condition is
recorded in the cexc, and the aexc is not modified. If there are no trapping IEEE754
exception conditions, every nontrapping exception condition is ORed into the cexc
and the cexc is accumulated into the aexc. The boundary conditions of an
unfinished_FPop trap for Floating-point Multiply-Add/Subtract instructions are
exactly the same as for FMUL and FADD/SUB instructions; if either of the operations
detects any conditions for an unfinished_FPop trap, the Floating-point
Multiply-Add/Subtract instruction generates the unfinished_FPop exception. In this
case, none of rd, cexc, or aexc are modified.

1. Note that this implementation differs from previous SPARC64 implementations, which
incurred at most one rounding error.

TABLE A-2 Exceptions in Floating-Point Multiply-Add/Subtract Instructions

FMUL       IEEE754 trap                  No trap                       No trap
FADD/SUB   —                             IEEE754 trap                  No trap
cexc       Exception condition of FMUL   Exception condition of FADD   Logical OR of the nontrapping exception
                                                                       conditions of FMUL and FADD/SUB
aexc       No change                     No change                     Logical OR of the cexc (above) and the
                                                                       aexc
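The cexc/aexc update rule summarized in TABLE A-2 can be written as a short function. This is a simplified sketch: the function name is hypothetical, the flag values follow the usual SPARC V9 FSR bit order (nx, dz, uf, of, nv from bit 0 upward), and unfinished_FPop handling is deliberately left out.

```python
# IEEE 754 exception flags in SPARC V9 cexc/aexc/TEM bit order.
NX, DZ, UF, OF, NV = 0x01, 0x02, 0x04, 0x08, 0x10


def fma_exception_update(fmul_exc: int, fadd_exc: int, tem: int, aexc: int):
    """Return (traps, cexc, aexc) after a multiply-add/subtract.

    fmul_exc, fadd_exc -- flags raised by the multiply and add/subtract parts
    tem                -- trap enable mask (FSR.TEM)
    aexc               -- accrued exceptions before the instruction
    """
    if fmul_exc & tem:
        # Multiply part traps: the add/subtract part is never performed.
        return True, fmul_exc & tem, aexc
    if fadd_exc & tem:
        # Add/subtract part traps: only its trapping condition is recorded.
        return True, fadd_exc & tem, aexc
    # No trap: nontrapping conditions of both parts are ORed together,
    # then accumulated into aexc.
    cexc = fmul_exc | fadd_exc
    return False, cexc, aexc | cexc
```

As an example, an inexact multiply followed by an underflowing, inexact add with all traps disabled yields cexc = uf | nx and ORs the same bits into aexc.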

Detailed contents of cexc and aexc depending on the various conditions are
described in TABLE A-3 and TABLE A-4. The following terminology is used: uf, of, inv,
and nx are nontrapping IEEE exception conditions: underflow, overflow, invalid
operation, and inexact, respectively.

TABLE A-3 Non-Trapping cexc When FSR.NS = 0

FADD

none

of nx

uf of nx

—

inv

none

inv

inv nx

inv of nx

uf inv nx

inv

FMUL

of nx

uf nx

inv

of nx

uf nx

inv

of nx

uf nx

—

TABLE A-4 Non-Trapping aexc When FSR.NS = 1

FADD

none

of nx

—

uf nx

—

inv

none

inv

inv nx

inv of nx

uf inv nx

inv

FMUL

of nx

uf nx

inv

of nx

uf nx

inv

of nx

—

In the tables, the conditions in the shaded columns are all reported as an

unfinished_FPop trap by SPARC64 V. In addition, the conditions with “—” do not

exist.


Programming Note – The Multiply Add/Subtract instructions are encoded in the
SPARC V9 IMPDEP2 opcode space, and they are specific to the SPARC64 V
implementation. They cannot be used in any programs that will be executed on any
other SPARC V9 processor, unless that implementation exactly matches the
SPARC64 V use for the IMPDEP2 opcode.

Exceptions

fp_disabled

fp_exception_ieee_754 (NV, NX, OF, UF)

illegal_instruction (size = 00 or 11) (fp_disabled is not checked for these encodings)

fp_exception_other (unfinished_FPop)

A.29 Jump and Link

SPARC64 V clears the upper 32 bits of the PC value in r[rd] when PSTATE.AM is set
(impl. dep. #125). The value written into r[rd] is visible to the instruction in the
delay slot.

If either of the low-order two bits of the jump address is nonzero, a
mem_address_not_aligned exception occurs. However, when the JMPL instruction
causes a mem_address_not_aligned trap, DSFSR and DSFAR are not updated.

If the JMPL instruction has rd = 15, SPARC64 V stores PC + 8 in a hardware table
called the return address stack (RAS). When a ret (jmpl %i7+8, %g0) or retl (jmpl
%o7+8, %g0) is executed, the value in the RAS is used to predict the return address.

JMPL with rd = 0 can be used to return from a subroutine. The typical return
address is “r[31] + 8” if a nonleaf routine (one that uses the SAVE instruction) is
entered by a CALL instruction, or “r[15] + 8” if a leaf routine (one that does not
use the SAVE instruction) is entered by a CALL instruction or by a JMPL instruction
with rd = 15.
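The link-value and alignment rules for JMPL described in this section can be sketched as follows. The function and exception names are hypothetical, and the model covers only the two rules stated here: the alignment check on the jump address and the 32-bit masking of the value written to r[rd] when PSTATE.AM is set.

```python
MASK32 = 0xFFFF_FFFF


class MemAddressNotAligned(Exception):
    """Stands in for the mem_address_not_aligned trap."""


def jmpl(pc: int, target: int, pstate_am: bool):
    """Return (value written to r[rd], jump target) for a JMPL at pc."""
    if target & 0x3:
        # Either of the low-order two bits set: the trap is taken and,
        # on SPARC64 V, DSFSR and DSFAR are left unchanged.
        raise MemAddressNotAligned(hex(target))
    # With PSTATE.AM set, the upper 32 bits of the saved PC are cleared.
    link = pc & MASK32 if pstate_am else pc
    return link, target
```

A leaf routine entered this way with rd = 15 would later return with retl, that is, to the saved value plus 8.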


A.30 Load Quadword, Atomic [Physical]

The Load Quadword ASIs in this section are specific to SPARC64 V, as an extension

to SPARC JPS1.

Format (3) LDDA

Description

ASIs 34₁₆ and 3C₁₆ are used with the LDDA instruction to atomically read a 128-bit


■ TTE.NFO = 0

■ TTE.CP = 1

■ TTE.CV = 0

■ TTE.E = 0

■ TTE.P = 1

■ TTE.W = 0

Note – TTE.IE depends on the endianness of the ASI. When the ASI is 034₁₆,
TTE.IE = 0; TTE.IE = 1 when the ASI is 03C₁₆.

Therefore, the atomic quad load physical instruction can only be applied to a
cacheable memory area. Semantically, ASI_QUAD_LDD_PHYS{_L} (034₁₆ and
03C₁₆) is a combination of ASI_NUCLEUS_QUAD_LDD and ASI_PHYS_USE_EC.

With respect to little endian memory, a Load Quadword Atomic instruction behaves

as if it comprises two 64-bit loads, each of which is byte-swapped independently

before being written into its respective destination register.

Exceptions:

privileged_action

PA_watchpoint (recognized on only the first 8 bytes of a transfer)

illegal_instruction (misaligned rd)

mem_address_not_aligned

data_access_exception

data_access_error

fast_data_access_MMU_miss

fast_data_access_protection

A.35 Memory Barrier

Format (3)

Bits    31:30   29:25   24:19   18:14    13    12:7   6:4     3:0
Field   10      0       op3     0 1111   i=1   —      cmask   mmask

Assembly Language Syntax

membar    membar_mask


Description

The memory barrier instruction, MEMBAR, has two complementary functions: to
express order constraints between memory references and to provide explicit control
of memory-reference completion. The membar_mask field in the suggested assembly
language is the concatenation of the cmask and mmask instruction fields.

The mmask field is encoded in bits 3 through 0 of the instruction. TABLE A-5 specifies
the order constraint that each bit of mmask (selected when set to 1) imposes on
memory references appearing before and after the MEMBAR. From zero to four mask
bits can be selected in the mmask field.

TABLE A-5 Order Constraints Imposed by mmask Bits

Mask Bit   Name          Description
mmask<3>   #StoreStore   The effects of all stores appearing before the MEMBAR instruction must be
                         visible to all processors before the effect of any stores following the MEMBAR.
                         Equivalent to the deprecated STBAR instruction. Has no effect on SPARC64 V
                         since all stores are performed in program order.
mmask<2>   #LoadStore    All loads appearing before the MEMBAR instruction must have been performed
                         before the effects of any stores following the MEMBAR are visible to any other
                         processor. Has no effect on SPARC64 V since all stores are performed in
                         program order and must occur after performance of any load.
mmask<1>   #StoreLoad    The effects of all stores appearing before the MEMBAR instruction must be
                         visible to all processors before loads following the MEMBAR may be performed.
mmask<0>   #LoadLoad     All loads appearing before the MEMBAR instruction must have been performed
                         before any loads following the MEMBAR may be performed. Has no effect on
                         SPARC64 V since all loads are performed after any prior loads.

The cmask field is encoded in bits 6 through 4 of the instruction. Bits in the cmask
field, described in TABLE A-6, specify additional constraints on the order of memory
references and the processing of instructions. If cmask is zero, then MEMBAR enforces
the partial ordering specified by the mmask field; if cmask is nonzero, then
completion and partial order constraints are applied.

TABLE A-6 Bits in the cmask Field

Mask Bit   Function                  Name         Description
cmask<2>   Synchronization barrier   #Sync        All operations (including nonmemory reference operations)
                                                  appearing before the MEMBAR must have been performed, and
                                                  the effects of any exceptions become visible before any
                                                  instruction after the MEMBAR may be initiated.
cmask<1>   Memory issue barrier      #MemIssue    All memory reference operations appearing before the MEMBAR
                                                  must have been performed before any memory operation after
                                                  the MEMBAR may be initiated. Equivalent to #Sync in
                                                  SPARC64 V.
cmask<0>   Lookaside barrier         #Lookaside   A store appearing before the MEMBAR must complete before
                                                  any load following the MEMBAR referencing the same address
                                                  can be initiated. Equivalent to #Sync in SPARC64 V.
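The two mask tables can be combined into a small helper that builds a membar_mask value and reports which constraints actually change behavior on SPARC64 V. The helper names are hypothetical; the bit positions follow the field layout given above (mmask in bits 3 through 0, cmask in bits 6 through 4), and the no-effect and equivalent-to-#Sync rules are taken from TABLE A-5 and TABLE A-6.

```python
MMASK = {"#LoadLoad": 0x01, "#StoreLoad": 0x02, "#LoadStore": 0x04, "#StoreStore": 0x08}
CMASK = {"#Lookaside": 0x10, "#MemIssue": 0x20, "#Sync": 0x40}
ALL_BITS = {**MMASK, **CMASK}

# mmask bits that TABLE A-5 describes as having no effect on SPARC64 V.
NO_EFFECT = {"#LoadLoad", "#LoadStore", "#StoreStore"}


def membar_mask(*names: str) -> int:
    """Build the 7-bit membar_mask (cmask concatenated with mmask)."""
    mask = 0
    for name in names:
        mask |= ALL_BITS[name]
    return mask


def effective_constraints(mask: int) -> set:
    """Constraints that matter on SPARC64 V for a given membar_mask."""
    effective = set()
    for name, bit in ALL_BITS.items():
        if not mask & bit or name in NO_EFFECT:
            continue
        # Every cmask bit behaves like #Sync on this implementation.
        effective.add("#Sync" if name in CMASK else name)
    return effective
```

Under these rules a MEMBAR that requests only #LoadLoad, #LoadStore, and #StoreStore imposes nothing beyond what the hardware already guarantees.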


A.42 Partial Store (VIS I)

Please refer to Section A.42 of Commonality for general details.

Watchpoint exceptions on partial store instructions occur conservatively on
SPARC64 V. The DCUCR Data Watchpoint masks are only checked for a nonzero value
(watchpoint enabled). The byte store mask (r[rs2]) in the partial store instruction
is ignored, and a watchpoint exception can occur even if the mask is zero (that is, no
store will take place) (impl. dep. #249).

For a partial store instruction with mask = 0, SPARC64 V still issues a UPA
transaction with a zero byte mask.

Exceptions:

fp_disabled

PA_watchpoint

VA_watchpoint

illegal_instruction (misaligned rd)

mem_address_not_aligned (see Partial Store ASIs on page 120)

data_access_exception (see Partial Store ASIs on page 120)

LDDF_mem_address_not_aligned (see Partial Store ASIs on page 120)

data_access_error

fast_data_access_MMU_miss

fast_data_access_protection

A.49 Prefetch Data

Please refer to Section A.49, Prefetch Data, of Commonality for principal information.

The prefetcha instruction of SPARC64 V works for the following ASIs.

■ ASI_PRIMARY (080₁₆), ASI_PRIMARY_LITTLE (088₁₆)
■ ASI_SECONDARY (081₁₆), ASI_SECONDARY_LITTLE (089₁₆)
■ ASI_NUCLEUS (04₁₆), ASI_NUCLEUS_LITTLE (0C₁₆)
■ ASI_PRIMARY_AS_IF_USER (010₁₆), ASI_PRIMARY_AS_IF_USER_LITTLE (018₁₆)
■ ASI_SECONDARY_AS_IF_USER (011₁₆), ASI_SECONDARY_AS_IF_USER_LITTLE (019₁₆)

If an ASI other than the above is specified, prefetcha is executed as a nop.
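The ASI rule above amounts to a membership test. The sketch below uses a hypothetical helper name; the ASI numbers and names are the ones listed in this section.

```python
# ASIs for which prefetcha performs a prefetch on SPARC64 V.
PREFETCHA_ASIS = {
    0x80: "ASI_PRIMARY",
    0x88: "ASI_PRIMARY_LITTLE",
    0x81: "ASI_SECONDARY",
    0x89: "ASI_SECONDARY_LITTLE",
    0x04: "ASI_NUCLEUS",
    0x0C: "ASI_NUCLEUS_LITTLE",
    0x10: "ASI_PRIMARY_AS_IF_USER",
    0x18: "ASI_PRIMARY_AS_IF_USER_LITTLE",
    0x11: "ASI_SECONDARY_AS_IF_USER",
    0x19: "ASI_SECONDARY_AS_IF_USER_LITTLE",
}


def prefetcha_is_nop(asi: int) -> bool:
    """True when prefetcha with this ASI is executed as a nop."""
    return asi not in PREFETCHA_ASIS
```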


TABLE A-7 describes prefetch variants implemented in SPARC64 V.

TABLE A-7 Prefetch Variants

fcn

Fetch to:

L1D

Status

Description

L1D

—

NOP

5-15

16-19

reserved (SPARC V9)

illegal_instruction exception is signalled.

implementation

NOP

dependent.

L1D

If an access causes an mTLB miss,

fast_data_access_MMU_miss exception is signalled.

If an access causes an mTLB miss,

fast_data_access_MMU_miss exception is signalled.

L1D

If an access causes an mTLB miss,

fast_data_access_MMU_miss exception is signalled.

If an access causes an mTLB miss,

fast_data_access_MMU_miss exception is signalled.

24-31

implementation

dependent

NOP

A.51 Read State Register

In SPARC64 V, an RDPCR instruction will generate a privileged_action exception if
PSTATE.PRIV = 0 and PCR.PRIV = 1. If PSTATE.PRIV = 0 and PCR.PRIV = 0,
RDPCR will not cause any access privilege violation exception (impl. dep. #250).

A.70 SHUTDOWN (VIS I)

In SPARC64 V, SHUTDOWN acts as a NOP in privileged mode (impl. dep. #206).


A.70 Write State Register

In SPARC64 V, a WRPCR instruction will cause a privileged_action exception if
PSTATE.PRIV = 0 and PCR.PRIV = 1. If PSTATE.PRIV = 0 and PCR.PRIV = 0,
WRPCR causes a privileged_action exception only when an attempt is made to change
(that is, write 1 to) PCR.PRIV (impl. dep. #250).
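The privilege checks for RDPCR (Section A.51) and WRPCR can be summarized in two predicates. The function names are hypothetical, and the sketch assumes that privileged code (PSTATE.PRIV = 1) never takes privileged_action on these instructions, which the text implies but does not state outright.

```python
def rdpcr_traps(pstate_priv: int, pcr_priv: int) -> bool:
    """True when RDPCR raises privileged_action."""
    return pstate_priv == 0 and pcr_priv == 1


def wrpcr_traps(pstate_priv: int, pcr_priv: int, new_pcr_priv: int) -> bool:
    """True when WRPCR raises privileged_action.

    new_pcr_priv is the value the write would place in PCR.PRIV.
    """
    if pstate_priv == 1:
        return False          # assumption: privileged code is always allowed
    if pcr_priv == 1:
        return True           # PCR is reserved for privileged software
    # Nonprivileged write with PCR.PRIV = 0: only setting PRIV to 1 traps.
    return new_pcr_priv == 1
```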

A.71 Deprecated Instructions

The deprecated instructions in A.71 of Commonality are provided only for

compatibility with previous versions of the architecture. They should not be used in

new software.

A.71.10 Store Barrier

In SPARC64 V, STBAR behaves as a NOP since the hardware memory models always

enforce the semantics of these MEMBARs for all memory accesses.


APPENDIX B

IEEE Std 754-1985 Requirements for

SPARC V9

The IEEE Std 754-1985 floating-point standard contains a number of implementation

dependencies.

Please see Appendix B of Commonality for choices for these implementation

dependencies, to ensure that SPARC V9 implementations are as consistent as

possible.

Following is information specific to the SPARC64 V implementation of SPARC V9 in

these sections:

■ Traps Inhibiting Results on page 61

■ Floating-Point Nonstandard Mode on page 61

B.1

Traps Inhibiting Results

Please refer to Section B.1 of Commonality.

The SPARC64 V hardware, in conjunction with kernel or emulation code, produces
the results described in this section.

B.6

Floating-Point Nonstandard Mode

In this section, the hardware boundary conditions for the unfinished_FPop exception

and the nonstandard mode of SPARC64 V floating-point hardware are discussed.

SPARC64 V floating-point hardware has its specific range of computation. If either
the values of input operands or the value of the intermediate result shows that the
computation may not fall in the range that hardware provides, SPARC64 V generates
an fp_exception_other exception (tt = 022₁₆) with FSR.ftt = 02₁₆ (unfinished_FPop)
and the operation is taken over by software.

The kernel emulation routine completes the remaining floating-point operation in
accordance with the IEEE 754-1985 floating-point standard (impl. dep. #3).

SPARC64 V implements a nonstandard mode, enabled when FSR.NS is set (see
FSR_nonstandard_fp (NS) on page 18). Depending on the setting in FSR.NS, the
behavior of SPARC64 V with respect to the floating-point computation varies.

B.6.1

fp_exception_other Exception (ftt=unfinished_FPop)

SPARC64 V may invoke an fp_exception_other (tt = 022₁₆) exception with FSR.ftt =
unfinished_FPop (ftt = 02₁₆) in FsTOd, FdTOs, FADD(s,d), FSUB(s,d),
FsMULd(s,d), FMUL(s,d), FDIV(s,d), FSQRT(s,d) floating-point instructions. In
addition, Floating-point Multiply-Add/Subtract instructions generate the exception,
since the instruction is the combination of a multiply and an add/subtract operation:
FMADD(s,d), FMSUB(s,d), FNMADD(s,d), and FNMSUB(s,d).

The following basic policies govern the detection of boundary conditions:

1. When one of the operands is a denormalized number and the other operand is a

normal non-zero floating-point number (except for a NaN or an infinity), an

fp_exception_other with unfinished_FPop condition is signalled. The cases in which

the result is a zero or an overflow are excluded.

2. When both operands are denormalized numbers, except for the cases in which the

result is a zero or an overflow, an fp_exception_other with unfinished_FPop condition

is signalled.

3. When both operands are normal, the result before rounding is a denormalized
number, and TEM.UFM = 0, an fp_exception_other with unfinished_FPop condition
is signalled, except for the cases in which the result is a zero.

When the result is expected to be a constant, such as an exact zero or an infinity, and

an insignificant computation will furnish the result, SPARC64 V tries to calculate the

result without signalling an unfinished_FPop exception.


Implementation Note – Detecting the exact boundary conditions requires a large

amount of hardware. SPARC64 V detects approximate boundary conditions by

calculating the exponent intermediate result (the exponent before rounding) from

input operands, to avoid the hardware cost. Since the computation of the boundary

conditions is approximate, the detection of a zero result or an overflow result shall

be pessimistic. SPARC64 V generates an unfinished_FPop exception pessimistically.

The equations to calculate the result exponent to detect the boundary conditions

from the input exponents are presented in TABLE B-1, where Er is the approximation

of the biased result exponent before rounding and is calculated only from the input

exponents (esrc1, esrc2). Er is to be used for detecting the boundary condition for an

unfinished_FPop.

TABLE B-1 Result Exponent Approximation for Detecting unfinished_FPop Boundary Conditions

Operation    Formula

fmuls        Er = esrc1 + esrc2 − 126

fmuld        Er = esrc1 + esrc2 − 1022

fdivs        Er = esrc1 − esrc2 + 126

fdivd        Er = esrc1 − esrc2 + 1022

esrc1 and esrc2 are the biased exponents of the input operands. When the

corresponding input operand is a denormalized number, the value is 0.
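
As a worked illustration of TABLE B-1, the sketch below evaluates Er from the biased input exponents. It is a minimal sketch, not a description of the hardware: the function name approx_result_exponent and the string operation names are hypothetical, and the caller is assumed to pass 0 for a denormalized operand, as stated above.

```python
# Offsets taken from TABLE B-1: 126 for single, 1022 for double precision.
_OFFSET = {"fmuls": 126, "fmuld": 1022, "fdivs": 126, "fdivd": 1022}


def approx_result_exponent(op: str, esrc1: int, esrc2: int) -> int:
    """Return Er for op in {"fmuls", "fmuld", "fdivs", "fdivd"}.

    esrc1 and esrc2 are biased exponent fields; a denormalized operand
    is represented by the value 0.
    """
    if op not in _OFFSET:
        raise ValueError(f"unsupported operation: {op}")
    offset = _OFFSET[op]
    if op.startswith("fmul"):
        return esrc1 + esrc2 - offset
    return esrc1 - esrc2 + offset
```

For instance, an fmuls of a denormalized operand (exponent 0) by a normal operand with biased exponent 100 gives Er = 0 + 100 − 126 = −26.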

From Er, eres is calculated. eres is a biased result exponent, after mantissa alignment

and before rounding, where the appropriate adjustment of the exponent is applied to

the result mantissa: left-shifting or right-shifting the mantissa to the implicit 1 at the

left of the binary point, subtracting or adding the shift-amount to the exponent. The

result mantissa is assumed to be 1.xxxx in calculating eres. If the result is a

denormalized number, eres is less than zero.

TABLE B-2 describes the boundary condition of each floating-point instruction that

generates an unfinished_FPop exception.

TABLE B-2 unfinished_FPop Boundary Conditions

Operation         Boundary Conditions

FdTOs             −25 < eres < 1 and TEM.UFM = 0.

FsTOd             Second operand (rs2) is a denormalized number.

FADDs, FSUBs,     1. One of the operands is a denormalized number, and the other operand is a
FADDd, FSUBd         normal, nonzero floating-point number (except for a NaN and an infinity).
                  2. Both operands are denormalized numbers.
                  3. Both operands are normal, nonzero floating-point numbers (except for a NaN
                     and an infinity), eres < 1, and TEM.UFM = 0.

FMULs, FMULd      1. One of the operands is a denormalized number, the other operand is a normal,
                     nonzero floating-point number (except for a NaN and an infinity), and
                       single precision: −25 < Er
                       double precision: −54 < Er
                  2. Both operands are normal, nonzero floating-point numbers (except for a NaN
                     and an infinity), TEM.UFM = 0, and
                       single precision: −25 < eres < 1
                       double precision: −54 < eres < 1

FsMULd            1. One of the operands is a denormalized number, and the other operand is a
                     normal, nonzero floating-point number (except for a NaN and an infinity).
                  2. Both operands are denormalized numbers.

FDIVs, FDIVd      1. The dividend (operand1; rs1) is a normal, nonzero floating-point number
                     (except for a NaN and an infinity), the divisor (operand2; rs2) is a
                     denormalized number, and
                       single precision: Er < 255
                       double precision: Er < 2047
                  2. The dividend (operand1; rs1) is a denormalized number, the divisor
                     (operand2; rs2) is a normal, nonzero floating-point number (except for a
                     NaN and an infinity), and
                       single precision: −25 < Er
                       double precision: −54 < Er
                  3. Both operands are denormalized numbers.
                  4. Both operands are normal, nonzero floating-point numbers (except for a NaN
                     and an infinity), TEM.UFM = 0, and
                       single precision: −25 < eres < 1
                       double precision: −54 < eres < 1

FSQRTs, FSQRTd    The input operand (operand2; rs2) is a positive, nonzero, denormalized number.

1. Operation of 0 and denormalized number generates a result in accordance with the IEEE754-1985 standard.

Pessimistic Zero

If a condition in TABLE B-3 is true, SPARC64 V generates the result as a pessimistic

zero, meaning that the result is a denormalized minimum or a zero, depending on

the rounding mode (FSR.RD).


TABLE B-3 Conditions for a Pessimistic Zero

                Conditions

Operations      One operand is denormalized¹    Both are denormalized    Both are normal fp-number²

FdTOs           always                          —                        eres ≤ −25

FMULs, FMULd    single precision: Er ≤ −25      always                   single precision: eres ≤ −25
                double precision: Er ≤ −54                               double precision: eres ≤ −54

FDIVs, FDIVd    single precision: Er ≤ −25      never                    single precision: eres ≤ −25
                double precision: Er ≤ −54                               double precision: eres ≤ −54

1. Both operands are non-zero, non-NaN, and non-infinity numbers.

2. Both may be zero, but both are non-NaN and non-infinity numbers.
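
Read together, the Er conditions for FMUL in TABLE B-2 (−25 < Er or −54 < Er) and in TABLE B-3 (Er ≤ −25 or Er ≤ −54) partition the case of one denormalized operand. The sketch below illustrates that split. It is an assumption-laden illustration rather than the hardware algorithm: the function name fmul_one_denormal_outcome is hypothetical, and the other operand is assumed to be a normal, nonzero, non-NaN, non-infinity number.

```python
# (offset from TABLE B-1, pessimistic-zero limit from TABLE B-3)
_FMUL_PARAMS = {"single": (126, -25), "double": (1022, -54)}


def fmul_one_denormal_outcome(esrc_normal: int, precision: str) -> str:
    """Outcome of FMULs/FMULd when exactly one operand is denormalized.

    esrc_normal is the biased exponent of the normal operand; the
    denormalized operand contributes an exponent value of 0.
    """
    offset, zero_limit = _FMUL_PARAMS[precision]
    er = esrc_normal + 0 - offset
    if er <= zero_limit:
        return "pessimistic_zero"
    return "unfinished_FPop"
```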

Pessimistic Overflow

If a condition in TABLE B-4 is true, SPARC64 V regards the operation as having an

overflow condition.

TABLE B-4 Pessimistic Overflow Conditions

Operations    Conditions

FDIVs         The divisor (operand2; rs2) is a denormalized number and Er ≥ 255.

FDIVd         The divisor (operand2; rs2) is a denormalized number and Er ≥ 2047.
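
The pessimistic overflow test can be sketched by combining the FDIV formulas of TABLE B-1 with the limits of TABLE B-4. This is an illustrative sketch under stated assumptions: the function name fdiv_pessimistic_overflow is hypothetical, and a denormalized divisor is given the exponent value 0 as described for TABLE B-1.

```python
# (offset from TABLE B-1, overflow limit from TABLE B-4)
_FDIV_PARAMS = {"single": (126, 255), "double": (1022, 2047)}


def fdiv_pessimistic_overflow(esrc1: int, divisor_is_denormal: bool,
                              precision: str) -> bool:
    """Return True when the TABLE B-4 condition holds.

    False only means that this particular condition is not met; it says
    nothing about overflow detected by other means.
    """
    if not divisor_is_denormal:
        return False
    offset, limit = _FDIV_PARAMS[precision]
    er = esrc1 - 0 + offset   # denormalized divisor: esrc2 = 0
    return er >= limit
```

With these constants, a single-precision dividend needs a biased exponent of at least 129 for the condition to hold, since 129 + 126 = 255.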

B.6.2

Operation Under FSR.NS = 1

When FSR.NS= 1 (nonstandard mode), SPARC64 V zeroes all the input

denormalized operands before the operation and signals an inexact exception if

enabled. If the operation generates a denormalized result, SPARC64 V zeroes the

result and also signals an inexact exception if enabled. The following list defines the

operation in detail.

■ If either operand is a denormalized number and both operands are non-zero, non-NaN,

and non-infinity numbers, the input denormalized operand is replaced with a zero

with the same sign, and the operation is performed. If enabled, an inexact

exception is signalled; an fp_exception_ieee_754 (tt = 021₁₆) is generated, with

nxc = 1 in FSR.cexc (FSR.ftt = 01₁₆; IEEE754_exception). However, if the

operation is FDIV(s,d) and either a division_by_zero or an invalid_operation

condition is detected, or if the operation is FSQRT(s,d) and an invalid_operation

condition is detected, the inexact condition is not reported.

■ If the result before rounding is a denormalized number, the result is flushed to a

zero with the same sign, and either an underflow exception or an inexact

exception is signalled, depending on FSR.TEM.
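
The input-side replacement described in the first item can be sketched on single-precision bit patterns. This is a minimal illustration, not the hardware implementation: the function name flush_denormal_input_f32 is hypothetical, and the returned flag only indicates that a replacement took place, which is the situation in which the text above says the inexact condition arises (subject to the FDIV and FSQRT exceptions noted there).

```python
SIGN_MASK_F32 = 0x80000000


def flush_denormal_input_f32(bits: int):
    """Replace a denormalized single-precision operand with a zero of the
    same sign. Returns (new_bits, was_flushed)."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0 and fraction != 0:
        # Keep only the sign bit: the operand becomes +0 or -0.
        return bits & SIGN_MASK_F32, True
    return bits, False
```

A negative denormalized operand such as 0x80000001 becomes 0x80000000 (negative zero); normal numbers and zeros pass through unchanged.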

As observed from the preceding, when FSR.NS = 1, SPARC64 V generates neither

an unfinished_FPop exception nor a denormalized number as a result. TABLE B-5


summarizes the behavior of SPARC64 V floating-point hardware depending on

FSR.NS.

Note – The result and behavior of SPARC64 V in the shaded column of

TABLE B-5 and TABLE B-6 conform to the IEEE 754-1985 standard.

Note – Throughout TABLE B-5 and TABLE B-6, lowercase exception conditions such as

nx, uf, of, dz, and nv are nontrapping IEEE 754 exceptions. Uppercase exception

conditions such as NX, UF, OF, DZ, and NV are trapping IEEE 754 exceptions.

TABLE B-5


TABLE B-6 describes how SPARC64 V behaves when FSR.NS= 1 (nonstandard mode).

TABLE B-6 Nonarithmetic Operations Under FSR.NS = 1

Operations    op1 = denorm    op2 = denorm

UFM

NXM

DVM

—

NVM

—

Result

FsTOd

—

Yes

—

nx, a signed zero

FdTOs

—

uf + nx, a signed zero

FADDs,

FSUBs,

FADDd,

FSUBd

Yes

—

Yes

—

nx, op2

—

nx, op1

nx, a signed zero

FMULs,

FMULd,

FsMULd

—

nx, a signed zero

Yes

nx, a signed zero

FDIVs,

FDIVd

Yes

—

nx, a signed zero

—

dz, a signed infinity

—

nv, dNaN

FSQRTs,

FSQRTd

Yes and op2

> 0

—

nx, zero

Yes and op2

< 0

—

nv, dNaN

1. A single precision dNaN is 7FFF.FFFF₁₆ and a double precision dNaN is 7FFF.FFFF.FFFF.FFFF₁₆.
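
The two dNaN bit patterns quoted in the footnote can be checked against the IEEE 754 encoding with a few lines of Python. This is only a sanity check of the encoding; the constant and helper names are hypothetical.

```python
import math
import struct

SINGLE_DNAN = 0x7FFFFFFF
DOUBLE_DNAN = 0x7FFFFFFFFFFFFFFF


def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def bits_to_f64(bits: int) -> float:
    """Reinterpret a 64-bit pattern as a double-precision value."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

Both patterns have an all-ones exponent field and a nonzero fraction, so math.isnan reports True for each of them.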


F.APPENDIX C

Implementation Dependencies

This appendix summarizes implementation dependencies. In SPARC V9 and SPARC

JPS1, the notation “IMPL. DEP. #nn:” identifies the definition of an implementation

dependency; the notation “(impl. dep. #nn)” identifies a reference to an

implementation dependency. These dependencies are described by their number nn

in TABLE C-1 on page 70. These numbers have been removed from the body of this

document for SPARC64 V to make the document more readable. TABLE C-1 has been

modified to include descriptions of the manner in which SPARC64 V has resolved

each implementation dependency.

Note – SPARC International maintains a document, Implementation Characteristics of

Current SPARC-V9-based Products, Revision 9.x, that describes the implementation-

dependent design features of all SPARC V9-compliant implementations. Contact

SPARC International for this document at

home page: www.sparc.org

email: info@sparc.org

C.1

Definition of an Implementation Dependency

Please refer to Section C.1 of Commonality.

C.2

Hardware Characteristics

Please refer to Section C.2 of Commonality.

C.3

Implementation Dependency Categories

Please refer to Section C.3 of Commonality.

C.4

List of Implementation Dependencies

TABLE C-1 provides a complete list of how each implementation dependency is

treated in the SPARC64 V implementation.

TABLE C-1 SPARC64 V Implementation Dependencies

Nbr     SPARC64 V Implementation Notes

1       Software emulation of instructions
        The operating system emulates all instructions that generate
        illegal_instruction or unimplemented_FPop exceptions.

2       Number of IU registers
        SPARC64 V supports eight register windows (NWINDOWS = 8).
        SPARC64 V supports an additional two global register sets (Interrupt
        globals and MMU globals) for a total of 160 integer registers.

3       Incorrect IEEE Std 754-1985 results
        See Section B.6, Floating-Point Nonstandard Mode, on page 61 for details.

4–5     Reserved.

6       I/O registers privileged status
        This dependency is beyond the scope of this publication. It should be
        defined in each system that uses SPARC64 V.

7       I/O register definitions
        This dependency is beyond the scope of this publication. It should be
        defined in each system that uses SPARC64 V.

8       RDASR/WRASR target registers
        See A.50 and A.70 in Commonality for details of implementation-dependent
        RDASR/WRASR instructions.

9       RDASR/WRASR privileged status
        See A.50 and A.70 in Commonality for details of implementation-dependent
        RDASR/WRASR instructions.

10–12   Reserved.

13      VER.impl
        VER.impl = 5 for the SPARC64 V processor.

14–15   Reserved.

16      IU deferred-trap queue
        SPARC64 V neither has nor needs an IU deferred-trap queue.

17      Reserved.

18      Nonstandard IEEE 754-1985 results
        SPARC64 V flushes denormal operands and results to zero when
        FSR.NS = 1. For the treatment of denormalized numbers, please refer to
        Section B.6, Floating-Point Nonstandard Mode, on page 61.

19      FPU version, FSR.ver
        FSR.ver = 0 for SPARC64 V.

20–21   Reserved.

22      FPU TEM, cexc, and aexc
        SPARC64 V implements all bits in the TEM, cexc, and aexc fields in
        hardware.

23      Floating-point traps
        In SPARC64 V floating-point traps are always precise; no FQ is needed.

24      FPU deferred-trap queue (FQ)
        SPARC64 V neither has nor needs a floating-point deferred-trap queue.

25      RDPR of FQ with nonexistent FQ
        Attempting to execute an RDPR of the FQ causes an illegal_instruction
        exception.

26–28   Reserved.

29      Address space identifier (ASI) definitions
        The ASIs that are supported by SPARC64 V are defined in Appendix L,
        Address Space Identifiers.

30      ASI address decoding
        SPARC64 V supports all of the listed ASIs.

31      Catastrophic error exceptions
        SPARC64 V contains a watchdog timer that times out after no instruction
        has been committed for a specified number of cycles. If the timer times out,
        the CPU tries to invoke an async_data_error trap. If the counter continues to
        count and reaches 2³³, the processor enters error_state. Upon entry to
        error_state, the processor optionally generates a WDR reset to recover
        from error_state.


32      Deferred traps
        SPARC64 V signals a deferred trap in a few of its severe error conditions.
        SPARC64 V does not contain a deferred trap queue.

33      Trap precision
        There are no deferred traps in SPARC64 V other than the trap caused by a
        few severe error conditions. All traps that occur as the result of program
        execution are precise.

34      Interrupt clearing
        For details of interrupt handling see Appendix N, Interrupt Handling.

35      Implementation-dependent traps
        SPARC64 V supports the following traps that are implementation
        dependent:
        • interrupt_vector_trap (tt = 060₁₆)
        • PA_watchpoint (tt = 061₁₆)
        • VA_watchpoint (tt = 062₁₆)
        • ECC_error (tt = 063₁₆)
        • fast_instruction_access_MMU_miss (tt = 064₁₆ through 067₁₆)
        • fast_data_access_MMU_miss (tt = 068₁₆ through 06B₁₆)
        • fast_data_access_protection (tt = 06C₁₆ through 06F₁₆)
        • async_data_error (tt = 040₁₆)

36      Trap priorities
        SPARC64 V’s implementation-dependent traps have the following
        priorities:
        • interrupt_vector_trap (priority = 16)
        • PA_watchpoint (priority = 12)
        • VA_watchpoint (priority = 1)
        • ECC_error (priority = 33)
        • fast_instruction_access_MMU_miss (priority = 2)
        • fast_data_access_MMU_miss (priority = 12)
        • fast_data_access_protection (priority = 12)
        • async_data_error (priority = 2)

37      Reset trap
        SPARC64 V implements power-on reset (POR) and watchdog reset.

38      Effect of reset trap on implementation-dependent registers
        See Section O.3, Processor State after Reset and in RED_state, on page 141.

39      Entering error_state on implementation-dependent errors
        CPU watchdog timeout at 2³³ ticks, a normal trap, or an SIR at TL = MAXTL
        causes the CPU to enter error_state.

40      Error_state processor state
        SPARC64 V optionally takes a watchdog reset trap after entry to
        error_state. Most error-logging register state will be preserved. (See also
        impl. dep. #254.)

41      Reserved.


42      FLUSH instruction
        SPARC64 V implements the FLUSH instruction in hardware.

43      Reserved.

44      Data access FPU trap
        The destination register(s) are unchanged if an access error occurs.

45–46   Reserved.

47      RDASR
        See A.50, Read State Register, in Commonality for details.

48      WRASR
        See A.70, Write State Register, in Commonality for details.

49–54   Reserved.

55      Floating-point underflow detection
        See FSR_underflow in Section 5.1.7 of Commonality for details.

56–100  Reserved.

101     Maximum trap level
        MAXTL = 5.

102     Clean windows trap
        SPARC64 V generates a clean_window exception; register windows are
        cleaned in software.

103     Prefetch instructions
        SPARC64 V implements PREFETCH variations 0–3 and 20–23 with the
        following implementation-dependent characteristics:
        • The prefetches have observable effects in privileged code.
        • Prefetch variants 0–3 do not cause a fast_data_access_MMU_miss trap,
          because the prefetch is dropped when a fast_data_access_MMU_miss
          condition happens. On the other hand, prefetch variants 20–23 cause
          data_access_MMU_miss traps on TLB misses.
        • All prefetches are for 64-byte cache lines, which are aligned on a 64-byte
          boundary.
        • See Section A.49, Prefetch Data, on page 57, for implemented variations
          and their characteristics.
        • Prefetches will work normally if the ASI is ASI_PRIMARY,
          ASI_SECONDARY, ASI_NUCLEUS, ASI_PRIMARY_AS_IF_USER,
          ASI_SECONDARY_AS_IF_USER, or one of their little-endian pairs.

104     VER.manuf
        VER.manuf = 0004₁₆. The least significant 8 bits are Fujitsu’s JEDEC
        manufacturing code.

105     TICK register
        SPARC64 V implements 63 bits of the TICK register; it increments on every
        clock cycle.


106     IMPDEPn instructions
        SPARC64 V uses the IMPDEP2 opcode for the Multiply Add/Subtract
        instructions. SPARC64 V also conforms to Sun’s specification for VIS-1 and
        VIS-2.

107     Unimplemented LDD trap
        SPARC64 V implements LDD in hardware.

108     Unimplemented STD trap
        SPARC64 V implements STD in hardware.

109     LDDF_mem_address_not_aligned
        If the address is word aligned but not doubleword aligned, SPARC64 V
        generates the LDDF_mem_address_not_aligned exception. The trap handler
        software emulates the instruction.

110     STDF_mem_address_not_aligned
        If the address is word aligned but not doubleword aligned, SPARC64 V
        generates the STDF_mem_address_not_aligned exception. The trap handler
        software emulates the instruction.

111     LDQF_mem_address_not_aligned
        SPARC64 V generates an illegal_instruction exception for all LDQFs. The
        processor does not perform the check for fp_disabled. The trap handler
        software emulates the instruction.

112     STQF_mem_address_not_aligned
        SPARC64 V generates an illegal_instruction exception for all STQFs. The
        processor does not perform the check for fp_disabled. The trap handler
        software emulates the instruction.

113     Implemented memory models
        SPARC64 V implements Total Store Order (TSO) for all the memory models
        specified in PSTATE.MM. See Chapter 8, Memory Models, for details.

114     RED_state trap vector address (RSTVaddr)
        RSTVaddr is a constant in SPARC64 V, where:
        VA = FFFF FFFF F000 0000₁₆ and
        PA = 07FF F000 0000₁₆

115     RED_state processor state
        See RED_state on page 36 for details of implementation-specific actions in
        RED_state.

116     SIR_enable control flag
        See Section A.60, SIR, in Commonality for details.

117     MMU disabled prefetch behavior
        Prefetch and nonfaulting Load always succeed when the MMU is disabled.

118     Identifying I/O locations
        This dependency is beyond the scope of this publication. It should be
        defined in a system that uses SPARC64 V.


119     Unimplemented values for PSTATE.MM
        Writing 11₂ into PSTATE.MM causes the machine to use the TSO memory
        model. However, the encoding 11₂ should not be used, since future versions
        of SPARC64 V may use this encoding for a new memory model.

120     Coherence and atomicity of memory operations
        Although SPARC64 V implements the UPA-based cache coherency
        mechanism, this dependency is beyond the scope of this publication. It
        should be defined in a system that uses SPARC64 V.

121     Implementation-dependent memory model
        SPARC64 V implements TSO, PSO, and RMO memory models. See
        Chapter 8, Memory Models, for details.
        Accesses to pages with the E (Volatile) bit of their MMU page table entry set
        are also made in program order.

122     FLUSH latency
        Since the FLUSH instruction synchronizes the processor, its total latency
        varies depending on many portions of the SPARC64 V processor’s state.
        Assuming that all prior instructions are completed, the latency of FLUSH is
        18 processor cycles.

123     Input/output (I/O) semantics
        This dependency is beyond the scope of this publication. It should be
        defined in a system that uses SPARC64 V.

124     Implicit ASI when TL > 0
        See Section 5.1.7 of Commonality for details.

125     Address masking
        When PSTATE.AM = 1, SPARC64 V masks out the high-order 32 bits of
        the PC when transmitting it to the destination register.

126     NWINDOWS for SPARC64 V is 8; therefore, only 3 bits are implemented for
        the following registers: CWP, CANSAVE, CANRESTORE, OTHERWIN. If an
        attempt is made to write a value greater than NWINDOWS − 1 to any of these
        registers, the extraneous upper bits are discarded. The CLEANWIN register
        contains 3 bits.

127–201 Reserved.

202     fast_ECC_error trap
        fast_ECC_error trap is not implemented in SPARC64 V.

203     Dispatch Control Register bits 13:6 and 1
        SPARC64 V does not implement DCR.

204     DCR bits 5:3 and 0
        SPARC64 V does not implement DCR.

205     Instruction Trap Register
        SPARC64 V implements the Instruction Trap Register.


206     SHUTDOWN instruction
        In privileged mode the SHUTDOWN instruction executes as a NOP in
        SPARC64 V.

207     PCR register bits 47:32, 26:17, and bit 3
        SPARC64 V uses these bits for the following purposes:
        • Bits 47:32 for set/clear/show status of overflow (OVF).
        • Bit 26 for validity of OVF field (OVRO).
        • Bits 24:22 for number of counter pair (NC).
        • Bits 20:18 for counter selector (SC).
        • Bit 3 for validity of SU/SL field (ULRO).
        Other implementation-dependent bits are read as 0 and writes to them are
        ignored.

208     Ordering of errors captured in instruction execution
        The order in which errors are captured during instruction execution is
        implementation dependent. Ordering can be in program order or in order of
        detection.

209     Software intervention after instruction-induced error
        Precision of the trap to signal an instruction-induced error for which
        recovery requires software intervention is implementation dependent.

210     ERROR output signal
        The causes and the semantics of the ERROR output signal are implementation
        dependent.

211     Error logging registers’ information
        The information that the error logging registers preserve beyond the reset
        induced by an ERROR signal is implementation dependent.

212     Trap with fatal error
        Generation of a trap along with ERROR signal assertion upon detection of a
        fatal error is implementation dependent.

213     AFSR.PRIV
        SPARC64 V does not implement the AFSR.PRIV bit.

214     Enable/disable control for deferred traps
        SPARC64 V does not implement a control feature for deferred traps.

215     Error barrier
        DONE and RETRY instructions may implicitly provide an error barrier
        function as MEMBAR #Sync. Whether DONE and RETRY instructions provide
        an error barrier is implementation dependent.

216     data_access_error trap precision
        data_access_error trap is always precise in SPARC64 V.

217     instruction_access_error trap precision
        instruction_access_error trap is always precise in SPARC64 V.


218     async_data_error
        async_data_error trap is implemented in SPARC64 V, using tt = 040₁₆. See
        Appendix P for details.

219     Asynchronous Fault Address Register (AFAR) allocation
        SPARC64 V implements two AFARs:
        • VA = 00 for an error occurring in D1 cache.
        • VA = 08 for an error occurring in U2 cache.

220     Addition of logging and control registers for error handling
        SPARC64 V implements various features for sustaining reliability. See
        Appendix P for details.

221     Special/signalling ECCs
        The method to generate “special” or “signalling” ECCs and whether
        processor-ID is embedded into the data associated with special/signalling
        ECCs is implementation dependent.

222     TLB organization
        SPARC64 V has the following TLB organization:
        • Level-1 micro ITLB (uITLB), 32-way fully associative
        • Level-1 micro DTLB (uDTLB), 32-way fully associative
        • Level-2 IMMU-TLB, consisting of sITLB (set-associative Instruction TLB)
          and fITLB (fully associative Instruction TLB).
        • Level-2 DMMU-TLB, consisting of sDTLB (set-associative Data TLB) and
          fDTLB (fully associative Data TLB).

223     TLB multiple-hit detection
        On SPARC64 V, TLB multiple hit detection is supported. However, the
        multiple hit is not detected at every TLB reference. When the micro-TLB
        (uTLB), which is the cache of sTLB and fTLB, matches the virtual address,
        the multiple hit in sTLB and fTLB is not detected. The multiple hit is
        detected only when the micro-TLB mismatches and the main TLB is
        referenced.

224     MMU physical address width
        The SPARC64 V MMU implements 43-bit physical addresses. The PA field of
        the TTE holds a 43-bit physical address. Bits 46:43 of each TTE always read
        as 0 and writes to them are ignored. The MMU translates virtual addresses
        into 43-bit physical addresses. Each cache tag holds bits 42:6 of physical
        addresses.

225     TLB locking of entries
        In SPARC64 V, when a TTE with its lock bit set is written into TLB through
        the Data In register, the TTE is automatically written into the corresponding
        fully associative TLB and locked in the TLB. Otherwise, the TTE is written
        into the corresponding sTLB or fTLB, depending on its page size.

226     TTE support for CV bit
        SPARC64 V does not support the CV bit in TTE. Since I1 and D1 are
        virtually indexed caches, unaliasing is supported by SPARC64 V. See also
        impl. dep. #232.


227     TSB number of entries
        SPARC64 V supports a maximum of 16 million entries in the common TSB
        and a maximum of 32 million lines in the Split TSB.

228     TSB_Hash supplied from TSB or context-ID register
        TSB_Hash is generated from the context-ID register in SPARC64 V.

229     TSB_Base address generation
        SPARC64 V generates the TSB_Base address directly from the TLB
        Extension Registers. To maintain compatibility with UltraSPARC I/II,
        SPARC64 V provides the mode flag MCNTL.JPS1_TSBP. When
        MCNTL.JPS1_TSBP = 0, the TSB_Base register is used.

230     data_access_exception trap
        SPARC64 V generates data_access_exception only for the causes listed in
        Section 7.6.1 of Commonality.

231     MMU physical address variability
        SPARC64 V supports both 41-bit and 43-bit physical address mode. The
        initial width of the physical address is controlled by OPSR.

232     DCU Control Register CP and CV bits
        SPARC64 V does not implement the CP and CV bits in the DCU Control
        Register.

233     TSB_Hash field
        SPARC64 V does not implement TSB_Hash.

234     TLB replacement algorithm
        For fTLB, SPARC64 V implements a pseudo-LRU. For sTLB, LRU is used.

235     TLB data access address assignment
        The MMU TLB data-access address assignment and the purpose of the
        address are implementation dependent.

236     TSB_Size field width
        In SPARC64 V, TSB_Size is 4 bits wide, occupying bits 3:0 of the TSB
        entries.

237     DSFAR/DSFSR for JMPL/RETURN mem_address_not_aligned
        A mem_address_not_aligned exception that occurs during a JMPL or RETURN
        instruction does not update either the D-SFAR or D-SFSR register.

238     TLB page offset for large page sizes
        On SPARC64 V, even for a large page, written data for TLB Data Register is
        preserved for bits representing an offset in a page, so the data previously
        written is returned regardless of the page size.

239     In SPARC64 V, VA<63:19> of IMMU ASI 55₁₆ and DMMU ASI 5D₁₆ are
        ignored. An access to virtual addresses 40000₁₆ to 60FF8₁₆ is treated as an
        access to 00000₁₆ to 20FF8₁₆.


240     DCU Control Register bits 47:41
        SPARC64 V uses bit 41 for WEAK_SPCA, which enables/disables memory
        access in speculative paths.

241     Address Masking and DSFAR
        SPARC64 V writes zeroes to the more significant 32 bits of DSFAR.

242     TLB lock bit
        In SPARC64 V, only the fITLB and the fDTLB support the lock bit. The lock
        bit in sITLB and sDTLB is read as 0 and writes to it are ignored.

243     Interrupt Vector Dispatch Status Register BUSY/NACK pairs
        In SPARC64 V, 32 BUSY/NACK pairs are implemented in the Interrupt
        Vector Dispatch Status Register.

244     Data Watchpoint Reliability
        No implementation-dependent features of SPARC64 V reduce the reliability
        of data watchpoints.

245     Call/Branch displacement encoding in I-Cache
        In SPARC64 V, the least significant 11 bits (bits 10:0) of a CALL or branch
        (BPcc, FBPfcc, Bicc, BPr) instruction in an instruction cache are identical
        to the architectural encoding (as they appear in main memory).

246     VA<38:29> for Interrupt Vector Dispatch Register Access
        SPARC64 V ignores all 10 bits of VA<38:29> when the Interrupt Vector
        Dispatch Register is written.

247     Interrupt Vector Receive Register SID fields
        SPARC64 V obtains the interrupt source identifier SID_L from the UPA
        packet.

248     Conditions for fp_exception_other with unfinished_FPop
        SPARC64 V triggers fp_exception_other with trap type unfinished_FPop
        under the standard conditions described in Commonality Section 5.1.7.

249     Data watchpoint for Partial Store instruction
        Watchpoint exceptions on Partial Store instructions occur conservatively on
        SPARC64 V. The DCUCR Data Watchpoint masks are only checked for
        nonzero value (watchpoint enabled). The byte store mask (r[rs2]) in the
        Partial Store instruction is ignored, and a watchpoint exception can occur
        even if the mask is zero (that is, no store will take place).

250     PCR accessibility when PSTATE.PRIV = 0
        In SPARC64 V, the accessibility of PCR when PSTATE.PRIV = 0 is
        determined by PCR.PRIV. If PSTATE.PRIV = 0 and PCR.PRIV = 1, an
        attempt to execute either RDPCR or WRPCR will cause a privileged_action
        exception. If PSTATE.PRIV = 0 and PCR.PRIV = 0, RDPCR operates without
        privilege violation and WRPCR generates a privileged_action exception only
        when an attempt is made to change (that is, write 1 to) PCR.PRIV.

251     Reserved.


252     DCUCR.DC (Data Cache Enable)
        SPARC64 V does not implement DCUCR.DC.

253     DCUCR.IC (Instruction Cache Enable)
        SPARC64 V does not implement DCUCR.IC.

254     Means of exiting error_state
        The standard behavior of a SPARC64 V CPU upon entry into
        error_state is to reset itself by internally generating a watchdog_reset
        (WDR). However, OPSR can be set so that when error_state is entered, the
        processor remains halted in error_state instead of generating a
        watchdog_reset.

255     LDDFA with ASI E0₁₆ or E1₁₆ and misaligned destination register number
        No exception is generated based on the destination register rd.

256     LDDFA with ASI E0₁₆ or E1₁₆ and misaligned memory address
        For LDDFA with ASI E0₁₆ or E1₁₆ and a memory address aligned on a 2ⁿ-byte
        boundary, a SPARC64 V processor behaves as follows:
        n ≥ 3 (≥ 8-byte alignment): no exception related to memory address
        alignment is generated.
        n = 2 (4-byte alignment): LDDF_mem_address_not_aligned exception is
        generated.
        n ≤ 1 (≤ 2-byte alignment): mem_address_not_aligned exception is
        generated.

257     LDDFA with ASI C0₁₆–C5₁₆ or C8₁₆–CD₁₆ and misaligned memory address
        For LDDFA with ASI C0₁₆–C5₁₆ or C8₁₆–CD₁₆ and a memory address aligned on
        a 2ⁿ-byte boundary, a SPARC64 V processor behaves as follows:
        n ≥ 3 (≥ 8-byte alignment): no exception related to memory address
        alignment is generated.
        n = 2 (4-byte alignment): LDDF_mem_address_not_aligned exception is
        generated.
        n ≤ 1 (≤ 2-byte alignment): mem_address_not_aligned exception is
        generated.

258     ASI_SERIAL_ID
        SPARC64 V provides an identification code for each processor.
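
The alignment rule given for impl. dep. #256 and #257 can be written as a small decision function. This is a sketch for illustration only: the function name lddfa_alignment_exception is hypothetical, the address is modelled as a plain integer, and only the address-alignment outcome is covered (no other exception sources).

```python
def lddfa_alignment_exception(address: int):
    """Return the alignment-related exception for an LDDFA access with
    one of the ASIs named in impl. dep. #256 and #257, or None."""
    if address % 8 == 0:
        # n >= 3: aligned on an 8-byte (or larger) boundary
        return None
    if address % 4 == 0:
        # n = 2: word aligned but not doubleword aligned
        return "LDDF_mem_address_not_aligned"
    # n <= 1: aligned on a 2-byte boundary or less
    return "mem_address_not_aligned"
```

For example, address 1004₁₆ is word aligned but not doubleword aligned, so the function returns LDDF_mem_address_not_aligned.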

F.APPENDIX D

Formal Specification of the Memory Models

Please refer to Appendix D of Commonality.


F.APPENDIX E

Opcode Maps

Please refer to Appendix E in Commonality. TABLE E-1 lists the opcode map for the SPARC64 V IMPDEP2 instruction.

TABLE E-1 IMPDEP2 (op = 2, op3 = 37₁₆)

var (instruction <8:7>)

(not used — reserved)

FMADDs

FMADDd

FMSUBs

FMSUBd

FNMADDs

FNMSUBd

FNMADDs

FNMSUBd

size

(instruction<6:5>)

(reserved for quad operations)


F.APPENDIX F

Memory Management Unit

The Memory Management Unit (MMU) architecture of SPARC64 V conforms to the

MMU architecture defined in Appendix F of Commonality but with some model

dependency. See Appendix F in Commonality for the basic definitions of the

SPARC64 V MMU.

Section numbers in this appendix correspond to those in Appendix F of

Commonality. Figures and tables, however, are numbered consecutively.

This appendix describes the implementation dependencies and other additional

information about the SPARC64 V MMU. For SPARC64 V implementations, we first

list the implementation dependency as given in TABLE C-1 of Commonality, then

describe the SPARC64 V implementation.

F.1

Virtual Address Translation

IMPL. DEP. #222: TLB organization is JPS1 implementation dependent.

SPARC64 V has the following TLB organization:

■ Level-1 micro ITLB (uITLB), 32-way fully associative

■ Level-1 micro DTLB (uDTLB), 32-way fully associative

■ Level-2 IMMU-TLB consists of sITLB (set-associative Instruction TLB) and fITLB (fully associative Instruction TLB).

■ Level-2 DMMU-TLB consists of sDTLB (set-associative Data TLB) and fDTLB (fully associative Data TLB).

TABLE F-1 shows the organization of SPARC64 V TLBs.

The hardware contains a micro-ITLB and a micro-DTLB that serve as temporary storage for the main TLBs, as shown in TABLE F-1. In contrast to the micro-TLBs, the sTLB and fTLB are called main TLBs.

The micro-TLBs are coherent with the main TLBs and are not visible to software, with the exception of TLB multiple hit detection. Hardware maintains the consistency between the micro-TLBs and the main TLBs.

No other details on the micro-TLBs are provided because software cannot operate on them directly and their configuration is invisible to software.

TABLE F-1 Organization of SPARC64 V TLBs

Feature                       sITLB and sDTLB          fITLB and fDTLB
Entries                       2048
Associativity                 2-way set associative    Fully associative
Page size supported           8 KB / 4 MB              8 KB / 64 KB / 512 KB / 4 MB
Locked translation entry      Not supported            Supported
Unlocked translation entry    Supported                Supported

IMPL. DEP. #223: Whether TLB multiple-hit detections are supported in JPS1 is

implementation dependent.

On SPARC64 V, TLB multiple hit detection is supported. However, a multiple hit is not detected at every TLB reference. When the micro-TLB (uTLB), which is the cache of the sTLB and fTLB, matches the virtual address, a multiple hit in the sTLB and fTLB is not detected. A multiple hit is detected only when the micro-TLB misses and the main TLB is referenced.
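The consequence of this rule is that software can see a multiple hit only on the path that actually consults the main TLBs. The following is a simplified sketch of that lookup order, not a model of the real hardware: the function `translate`, the dictionary used for the micro-TLB, and the list of (tag, physical page) pairs used for the main TLB are all assumptions made for illustration, and context matching and page sizes are ignored.

```python
def translate(va_page, micro_tlb, main_tlb_entries):
    """Return (physical_page, multiple_hit) for a virtual page number.

    micro_tlb: dict mapping virtual page -> physical page (hypothetical model).
    main_tlb_entries: list of (virtual page, physical page) pairs (hypothetical model).
    """
    if va_page in micro_tlb:
        # Micro-TLB hit: the main TLBs are not referenced, so a multiple
        # hit among main TLB entries goes undetected on this access.
        return micro_tlb[va_page], False

    # Micro-TLB miss: the main TLBs are referenced and checked for duplicates.
    matches = [pa for tag, pa in main_tlb_entries if tag == va_page]
    if len(matches) > 1:
        return None, True   # multiple hit detected
    if not matches:
        return None, False  # ordinary TLB miss
    return matches[0], False
```

With two main TLB entries for the same virtual page, the sketch reports a multiple hit only when the micro-TLB does not already hold that page.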

F.2

Translation Table Entry (TTE)

IMPL. DEP. in Commonality TABLE F-1: TTE_Data bits 46–43 are implementation dependent.

On SPARC64 V, TTE_Data bits 46:43 are reserved.

IMPL. DEP. #224: Physical address width support by the MMU is implementation dependent in JPS1; minimum PA width is 43 bits.

The SPARC64 V MMU implements 43-bit physical addresses. The PA field of the TTE holds a 43-bit physical address. The MMU translates virtual addresses into 43-bit physical addresses. Each cache tag holds bits 42:6 of physical addresses. Bits 46:43 of each TTE always read as 0 and writes to them are ignored.

A cacheable access for a physical address ≥ 400 0000 0000₁₆ always causes a cache miss in the U2 cache and generates a UPA request for the cacheable access. The urgent error ASI_UGESR.SDC is signalled after the UPA cacheable access is requested.


The physical address length to be passed to the UPA interface is 41 bits or 43 bits, as designated in the ASI_UPA_CONFIG.AM field. When the 41-bit PA is specified in ASI_UPA_CONFIG.AM, the most significant 2 bits of the CPU internal physical address are discarded and only the remaining least significant 41 bits are passed to the UPA address bus. If the discarded most significant 2 bits are not 0, the urgent error ASI_UGESR.SDC is detected after the invalid address transfer to the UPA interface. Otherwise, when the 43-bit PA is specified in ASI_UPA_CONFIG.AM, the entire 43 bits of the CPU internal physical address are passed to the UPA address bus.
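The truncation rule can be sketched as follows. This is a minimal illustration rather than the hardware logic: the function name `upa_bus_address` is hypothetical, and the `am_width` parameter (41 or 43) stands in for whatever encoding the ASI_UPA_CONFIG.AM field actually uses, which is not given here.

```python
PA_BITS = 43  # width of the CPU internal physical address


def upa_bus_address(pa, am_width):
    """Return (address driven on the UPA bus, SDC error flag).

    am_width is 41 or 43 and is a stand-in for ASI_UPA_CONFIG.AM (assumption).
    """
    if am_width not in (41, 43):
        raise ValueError("am_width must be 41 or 43")
    if not 0 <= pa < (1 << PA_BITS):
        raise ValueError("physical address exceeds 43 bits")

    if am_width == 43:
        # All 43 bits are passed through unchanged.
        return pa, False

    # 41-bit mode: the two most significant bits are discarded.
    discarded = pa >> 41
    bus_address = pa & ((1 << 41) - 1)
    # Nonzero discarded bits correspond to the ASI_UGESR.SDC urgent error.
    return bus_address, discarded != 0
```

In 41-bit mode an address with bit 42 or bit 41 set is still transferred, but in truncated form, which is why the error is reported only after the transfer.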

IMPL. DEP. #238: When page offset bits for larger page size (PA<15:13>, PA<18:13>,

and PA<21:13> for 64-Kbyte, 512-Kbyte, and 4-Mbyte pages, respectively) are stored

in the TLB, it is implementation dependent whether the data returned from those

fields by a Data Access read are zero or the data previously written to them.

On SPARC64 V, the data returned from PA<15:13>, PA<18:13>, and PA<21:13> for

64-Kbyte, 512-Kbyte, and 4-Mbyte pages, respectively, by a Data Access read are

the data previously written to them.

IMPL. DEP. #225: The mechanism by which entries in TLB are locked is

implementation dependent in JPS1.

In SPARC64 V, when a TTE with its lock bit set is written into TLB through the

Data In register, the TTE is automatically written into the corresponding fully

associative TLB and locked in the TLB. Otherwise, the TTE is written into the

corresponding sTLB or fTLB, depending on its page size.
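The selection of the target TLB for a Data In write can be sketched as below. This is only an illustration under stated assumptions: the function name `data_in_target` and the page-size strings are hypothetical, and the rule that an unlocked 8-Kbyte or 4-Mbyte entry goes to the sTLB while an unlocked 64-Kbyte or 512-Kbyte entry goes to the fTLB is inferred from the page sizes listed in TABLE F-1, not stated explicitly.

```python
S_TLB_PAGE_SIZES = {"8K", "4M"}                 # page sizes the sTLB supports
F_TLB_PAGE_SIZES = {"8K", "64K", "512K", "4M"}  # page sizes the fTLB supports


def data_in_target(page_size, locked):
    """Return "sTLB" or "fTLB" for a TTE written through the Data In register."""
    if page_size not in F_TLB_PAGE_SIZES:
        raise ValueError("unsupported page size")
    if locked:
        # A TTE with its lock bit set always goes to the fully associative TLB.
        return "fTLB"
    # Assumption: unlocked entries use the sTLB whenever it supports the page size.
    return "sTLB" if page_size in S_TLB_PAGE_SIZES else "fTLB"
```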

IMPL. DEP. #242: An implementation containing multiple TLBs may implement the L

(lock) bit in all TLBs but is only required to implement a lock bit in one TLB for each

page size. If the lock bit is not implemented in a particular TLB, it is read as 0 and

writes to it are ignored.

In SPARC64 V, only the fITLB and the fDTLB support the lock bit as described in

TABLE F-1. The lock bit in sITLB and sDTLB is read as 0 and writes to it are

ignored.

IMPL. DEP. #226: Whether the CV bit is supported in TTE is implementation dependent in JPS1. When the CV bit in TTE is not provided and the implementation has virtually indexed caches, the implementation should support hardware unaliasing for the caches.

In SPARC64 V, no TLB supports the CV bit in TTE. SPARC64 V supports hardware unaliasing for the caches. The CV bit in any TLB entry is read as 0 and writes to it are ignored.


F.3.3

TSB Organization

IMPL. DEP. #227: The maximum number of entries in a TSB is implementation dependent in JPS1. See impl. dep. #228 for the limitation of TSB_size in TSB registers.

SPARC64 V supports a maximum of 16 million lines in the common TSB and a maximum of 32 million lines in the split TSB. The maximum number N in FIGURE F-4 of Commonality is 16 million (16 * 2²⁰).

F.4.2

TSB Pointer Formation

IMPL. DEP. #228: Whether TSB_Hash is supplied from a TSB Extension Register or from a context-ID register is implementation dependent in JPS1. Only for cases of direct hash with context-ID can the width of the TSB_size field be wider than 3 bits.

On SPARC64 V, TSB_Hash is supplied from a context-ID register. The width of the TSB_size field is 4 bits.

IMPL. DEP. #229: Whether the implementation generates the TSB Base address by exclusive-ORing the TSB Base Register and a TSB Extension Register or by taking the TSB_Base field directly from the TSB Extension Register is implementation dependent in JPS1. This implementation dependency is only to maintain compatibility with the TLB miss handling software of UltraSPARC I/II.

On SPARC64 V, when ASI_MCNTL.JPS1_TSBP = 1, the TSB Base address is generated by taking the TSB_Base field directly from the TSB Extension Register.

TSB Pointer Formation

On SPARC64 V, the number N in the following equations ranges from 0 to 15; N is defined to be the TSB_Size field of the TSB Base or TSB Extension Register.


8K_POINTER = TSB_Extension[63:14+N] ∥ 0 ∥ (VA[21+N:13] ⊕ TSB_Hash) ∥ 0000

64K_POINTER = TSB_Extension[63:14+N] ∥ 1 ∥ (VA[24+N:16] ⊕ TSB_Hash) ∥ 0000

Value of TSB_Hash for both a shared TSB and a split TSB

When 0 <= N <= 4,

TSB_Hash = context_register[N+8:0]

Otherwise, when 5 <= N <= 15,

TSB_Hash[12:0] = context_register[12:0]

TSB_Hash[N+8:13] = 0 (N-4 bits zero)
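The pointer formation can be worked through in code. The sketch below is an illustration under stated assumptions, not the hardware implementation: the helper names (`bits`, `tsb_hash`, `tsb_pointer`, `tsb_entries`) are hypothetical, the pointer uses the TSB_Extension[63:14+N] form given above with a single select bit between the base and the index, and the value 1 for that bit in the 64-Kbyte pointer is an assumption. The entry count of 2^(9+N) per table is inferred from the (9+N)-bit index VA[21+N:13]; at N = 15 it yields the 16 million lines quoted for the common TSB.

```python
def bits(value, hi, lo):
    """Extract value[hi:lo] (inclusive bit range)."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def tsb_hash(context, n):
    """TSB_Hash derived from the context register for TSB_Size field n."""
    if not 0 <= n <= 15:
        raise ValueError("TSB_Size must be in 0..15")
    if n <= 4:
        return bits(context, n + 8, 0)
    # For n >= 5, only context[12:0] contributes; the upper n-4 bits are zero.
    return bits(context, 12, 0)


def tsb_pointer(tsb_ext, va, context, n, page_64k=False):
    """Form the 8K or 64K TSB pointer (select-bit form; see assumptions above)."""
    va_lo = 16 if page_64k else 13
    index = bits(va, va_lo + 8 + n, va_lo) ^ tsb_hash(context, n)
    base = bits(tsb_ext, 63, 14 + n) << (14 + n)
    select = 1 if page_64k else 0      # assumed: 0 for 8K, 1 for 64K
    return base | (select << (13 + n)) | (index << 4)  # low 4 bits are 0000


def tsb_entries(n, split=False):
    """Number of TSB lines implied by a (9+n)-bit index (assumption)."""
    if not 0 <= n <= 15:
        raise ValueError("TSB_Size must be in 0..15")
    per_table = 1 << (9 + n)
    return 2 * per_table if split else per_table
```

Because the hash is exclusive-ORed into the index, two contexts that map the same virtual page generally land on different TSB lines, which is the purpose of hashing with the context ID.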

F.5

Faults and Traps

IMPL. DEP. #230: The cause of a data_access_exception trap is implementation dependent in JPS1, but there are several mandatory causes of data_access_exception trap.

SPARC64 V signals a data_access_exception for the causes defined in Section F.5 of Commonality. However, caution is needed when dealing with an invalid ASI. See Section F.10.9 for details.

IMPL. DEP. #237: Whether the fault status and/or address (DSFSR/DSFAR) are captured when mem_address_not_aligned is generated during a JMPL or RETURN instruction is implementation dependent.

On SPARC64 V, the fault status and address (DSFSR/DSFAR) are not captured when a mem_address_not_aligned exception is generated during a JMPL or RETURN instruction.

Additional information: On SPARC64 V, two precise traps, instruction_access_error and data_access_error, are recorded by the MMU in addition to those in TABLE F-2 of Commonality. A modified version of that table, with the two traps added, is shown below.

TABLE F-2 MMU Trap Types, Causes, and Stored State Register Update Policy

Ref #  Trap Name                         Trap Cause  Registers Updated (Stored State in MMU)                      Trap Type
                                                     I-SFSR  I-MMU Tag Access  D-SFSR, SFAR  D-MMU Tag Access
       fast_instruction_access_MMU_miss  I-TLB miss                                                              64₁₆–67₁₆
