SPARC JPS1
Implementation Supplement:
Fujitsu SPARC64 V
Fujitsu Limited
Release 1.0, 1 July 2002
Fujitsu Limited
4-1-1 Kamikodanaka
Nakahara-ku, Kawasaki, 211-8588
Japan
Part No. 806-6755-1.0
SPARC JPS1 Implementation Supplement: Fujitsu SPARC64 V • Release 1.0, 1 July 2002
Contents
Component Overview 4
Execution Unit (EU) 6
Storage Unit (SU) 7
4. Data Formats 15
Tick (TICK) Register 19
Privileged Registers 19
Trap State (TSTATE) Register 19
Version (VER) Register 20
Ancillary State Registers (ASRs) 20
Registers Referenced Through ASIs 22
Floating-Point Deferred-Trap Queue (FQ) 24
IU Deferred-Trap Queue 24
Processor Pipeline 31
Instruction Fetch Stages 31
Issue Stages 33
Execution Stages 33
Completion Stages 34
Trap-Table Entry Addresses 38
Trap Type (TT) 38
Details of Supported Traps 39
Trap Processing 39
Exception and Interrupt Descriptions 39
SPARC V9 Implementation-Dependent, Optional Traps That Are
Mandatory in SPARC JPS1 39
SPARC JPS1 Implementation-Dependent Traps 39
8. Memory Models 41
Overview 42
SPARC V9 Memory Model 42
Mode Control 42
Synchronizing Instruction and Data Memory 42
Read State Register 58
SHUTDOWN (VIS I) 58
Write State Register 59
Deprecated Instructions 59
Store Barrier 59
B. IEEE Std 754-1985 Requirements for SPARC V9 61
Traps Inhibiting Results 61
Floating-Point Nonstandard Mode 61
fp_exception_other Exception (ftt=unfinished_FPop) 62
Operation Under FSR.NS = 1 65
C. Implementation Dependencies 69
Definition of an Implementation Dependency 69
Hardware Characteristics 70
Implementation Dependency Categories 70
List of Implementation Dependencies 70
E. Opcode Maps 83
Reset, Disable, and RED_state Behavior 91
Internal Registers and ASI operations 92
I/D TLB Data In, Data Access, and Tag Read Registers 93
MMU Bypass 104
K. Programming with the Memory Models 115
L. Address Space Identifiers 117
SPARC64 V ASI Assignments 117
Special Memory Access ASIs 119
Barrier Assist for Parallel Processing 121
Interface Definition 121
ASI Registers 122
M. Cache Organization 125
Cache Types 125
Level-1 Instruction Cache (L1I Cache) 126
Cache Control/Status Instructions 128
Flush Level-1 Instruction Cache (ASI_FLUSH_L1I) 129
Level-2 Cache Control Register (ASI_L2_CTRL) 130
L2 Diagnostics Tag Read (ASI_L2_DIAG_TAG_READ) 130
L2 Diagnostics Tag Read Registers (ASI_L2_DIAG_TAG_READ_REG) 131
Interrupt Global Registers 136
Interrupt-Related ASR Registers 136
Interrupt Vector Dispatch Register 136
Interrupt Vector Dispatch Status Register 136
Interrupt Vector Receive Register 136
CPU Fatal Error state 141
Processor State after Reset and in RED_state 141
Operating Status Register (OPSR) 146
Hardware Power-On Reset Sequence 147
Firmware Initialization Sequence 147
P. Error Handling 149
Error Classification 149
Fatal Error 149
error_state Transition Error 150
Urgent Error 150
Restrainable Error 152
Action and Error Control 153
Action of async_data_error (ADE) Trap 168
Instruction End-Method at ADE Trap 170
Expected Software Handling of ADE Trap 171
Instruction Access Errors 173
Data Access Errors 173
Restrainable Errors 174
ASI_ASYNC_FAULT_STATUS (ASI_AFSR) 174
ASI_ASYNC_FAULT_ADDR_D1 177
ASI_ASYNC_FAULT_ADDR_U2 178
Expected Software Handling of Restrainable Errors 179
Handling of Internal Register Errors 181
Register Error Handling (Excluding ASRs and ASI Registers) 181
ASR Error Handling 182
ASI Register Error Handling 183
Cache Error Handling 188
Handling of a Cache Tag Error 188
Handling of an I1 Cache Data Error 190
Handling of a D1 Cache Data Error 190
Handling of a U2 Cache Data Error 192
Automatic Way Reduction of I1 Cache, D1 Cache, and U2 Cache 193
Handling of TLB Entry Errors 195
Automatic Way Reduction of sTLB 196
Handling of Extended UPA Bus Interface Error 197
Handling of Extended UPA Address Bus Error 197
Handling of Extended UPA Data Bus Error 197
Trap-Related Statistics 206
MMU Event Counters 207
Cache Event Counters 208
UPA Event Counters 210
UPA PortID Register 214
UPA Config Register 215
S. Summary of Differences between SPARC64 V and UltraSPARC-III 219
Bibliography 223
General References 223
Index 225
CHAPTER 1
Overview
1.1 Navigating the SPARC64 V Implementation Supplement
We suggest approaching this supplement to the SPARC Joint Programming Specification as follows.
1. Familiarize yourself with the SPARC64 V processor and its components by
reading these sections:
■ The SPARC64 V processor on page 2
■ Processor Pipeline on page 31
2. Study the terminology in Chapter 2, Definitions.
3. For details of architectural changes, see the remaining chapters in this
Implementation Supplement as your interests direct.
For this revision, we added new appendixes: Appendix R, UPA Programmer’s Model,
and Appendix S, Summary of Differences between SPARC64 V and UltraSPARC-III.
1.2 Fonts and Notational Conventions
Please refer to Section 1.2 of Commonality for font and notational conventions.
1.3 The SPARC64 V processor
The SPARC64 V processor is a high-performance, high-reliability, and high-integrity
processor that fully implements the instruction set architecture that conforms to
SPARC V9, as described in JPS1 Commonality. In addition, the SPARC64 V processor
implements the following features:
■ 64-bit virtual address space and 43-bit physical address space
■ Advanced RAS features that enable high-integrity error handling
Microarchitecture for High Performance
The SPARC64 V is an out-of-order execution superscalar processor that issues up to
four instructions per cycle. Instructions in the predicted path are issued in program
order and are stored temporarily in reservation stations until they are dispatched out
of program order to appropriate execution units. Instructions commit in program
order when no exceptional conditions occur during execution and all prior
instructions commit (that is, the result of the instruction execution becomes visible).
Out-of-order execution in SPARC64 V contributes to high performance.
SPARC64 V implements a large branch history buffer to predict its instruction path.
The history buffer is large enough to sustain a good prediction rate for large-scale
programs such as DBMS and to support the advanced instruction fetch mechanism
of SPARC64 V. This instruction fetch scheme predicts the execution path beyond the
multiple conditional branches in accordance with the branch history. It then tries to
prefetch instructions on the predicted path as much as possible to reduce the effect
of the performance penalty caused by instruction cache misses.
High Integration
SPARC64 V integrates an on-board, associative, level-2 cache. The level-2 cache is
unified for instruction and data. It is the lowest layer in the cache hierarchy.
This integration contributes to both performance and reliability of SPARC64 V. It
enables shorter access time and more associativity and thus contributes to higher
performance. It contributes to higher reliability by eliminating the external
connections for level-2 cache.
High Reliability and High Integrity
SPARC64 V implements the following advanced RAS features for reliability and
integrity beyond that of ordinary microprocessors.
1. Advanced RAS features for caches
■ Strong cache error protection:
  ■ ECC protection for D1 (Data level 1) cache data, U2 (unified level 2) cache data,
    and the U2 cache tag.
  ■ Parity protection for I1 (Instruction level 1) cache data.
  ■ Parity protection and duplication for the I1 cache tag and the D1 cache tag.
■ Automatic correction of all types of single-bit error:
  ■ Automatic single-bit error correction for the ECC protected data.
  ■ Invalidation and refilling of I1 cache data for the I1 cache data parity error.
  ■ Copying from duplicated tag for I1 cache tag and D1 cache tag parity errors.
■ Dynamic way reduction while cache consistency is maintained.
■ Error marking for cacheable data uncorrectable errors:
  ■ Special error-marking pattern for cacheable data with uncorrectable errors. The
    identification of the module that first detects the error is embedded in the
    special pattern.
  ■ Error-source isolation with faulty module identification in the special
    error-marking. The identification information enables the processor to avoid
    repetitive error logging for the same error cause.
2. Advanced RAS features for the core
■ Strong error protection:
  ■ Parity protection for all data paths.
  ■ Parity protection for most software-visible registers and internal temporary
    registers.
  ■ Parity prediction or residue checking for the accumulator output.
■ Hardware instruction retry
■ Support for software instruction retry (after failure of hardware instruction retry)
■ Error isolation for software recovery:
  ■ Error indication for each programmable register group.
  ■ Indication of retryability of the trapped instruction.
  ■ Use of different error traps to differentiate degrees of adverse effects on the
    CPU and the system.
3. Extended RAS interface to software
■ Error classification according to the severity of the effect on program execution:
  ■ Urgent error (nonmaskable): Unable to continue execution without OS
    intervention; reported through a trap.
  ■ Restrainable error (maskable): OS controls whether the error is reported
    through a trap, so the error does not directly affect program execution.
■ Isolated error indication to determine the effect on software
■ Asynchronous data error (ADE) trap for additional errors:
  ■ Relaxed instruction end method (precise, retryable, not retryable) for the
    async_data_error exception to indicate how the instruction should end; depends
    on the executing instruction and the detected error.
  ■ Some ADE traps that are deferred but retryable.
  ■ Simultaneous reporting of all detected ADE errors at the error barrier for correct
    handling of retryability.
1.3.1 Component Overview
The SPARC64 V processor contains these components.
■ Instruction control Unit (IU)
■ Execution Unit (EU)
■ Storage Unit (SU)
■ Secondary cache and eXternal access Unit (SXU)
FIGURE 1-1 illustrates the major units; the following subsections describe them.
FIGURE 1-1 SPARC64 V Major Units
[Block diagram. I-Unit: instruction fetch pipeline, branch history, instruction buffer, reservation stations, commit stack entries, control logic, and the PC, nPC, CCR, and FSR registers. E-Unit: GUB, GPR, FUB, FPR, ALU input and output registers, and the EXA, EXB, FLA, FLB, EAGA, and EAGB pipelines. S-Unit: I-TLB and D-TLB (tag and data, 2048 + 32 entries each), level-1 I cache (128 KB, 2-way), level-1 D cache (128 KB, 2-way), store queue, and SX order queue. SX-Unit: U2 cache tag and data (2 MB, 4-way), MoveIn buffer, MoveOut buffer, and UPA interface logic connected to the Extended UPA Bus.]
1.3.2 Instruction Control Unit (IU)
The IU predicts the instruction execution path, fetches instructions on the predicted
path, distributes the fetched instructions to appropriate reservation stations, and
dispatches the instructions to the execution pipeline. The instructions are executed
out of order, and the IU commits the instructions in order. Major blocks are defined
in TABLE 1-1.
TABLE 1-1 Instruction Control Unit Major Blocks
Name                        Description
Instruction fetch pipeline  Five stages: fetch address generation, iTLB access, iTLB match,
                            I-Cache fetch, and a write to I-buffer.
Branch history              16K entries, 4-way set associative.
Instruction buffer          Six entries, 32 bytes/entry.
Reservation station         Six reservation stations to hold instructions until they can
                            execute: RSBR for branch and the other control-transfer
                            instructions; RSA for load/store instructions; RSEA and RSEB for
                            integer arithmetic instructions; RSFA and RSFB for floating-point
                            arithmetic and VIS instructions.
Commit stack entries        Information about instructions issued but not yet committed.
PC, nPC, CCR, FSR           Program-visible registers for instruction execution control.
1.3.3 Execution Unit (EU)
The EU carries out execution of all integer arithmetic, logical, and shift instructions, all
floating-point instructions, and all VIS graphic instructions. TABLE 1-2 describes the
EU major blocks.
TABLE 1-2 Execution Unit Major Blocks
Name                                 Description
General register (gr) renaming       Thirty-two entries, 8 read ports, 2 write ports.
register file (GUB: gr update
buffer)
Gr architecture register file (GPR)  160 entries, 1 read port, 2 write ports.
Floating-point (fr) renaming         Thirty-two entries, 8 read ports, 2 write ports.
register file (FUB: fr update
buffer)
Fr architecture register file (FPR)  Thirty-two entries, 6 read ports, 2 write ports.
EU control logic                     Controls the instruction execution stages: instruction
                                     selection, register read, and execution.
Interface registers                  Input/output registers to other units.
Two integer execution pipelines      64-bit ALU and shifters.
(EXA, EXB)
Two floating-point and graphics      Each floating-point execution pipeline can execute
execution pipelines (FLA, FLB)       floating-point multiply, floating-point add/sub,
                                     floating-point multiply and add, floating-point div/sqrt,
                                     and floating-point graphics instructions.
Two virtual address adders for       Two 64-bit virtual addresses for load/store.
memory access pipeline (EAGA,
EAGB)
1.3.4 Storage Unit (SU)
The SU handles all sourcing and sinking of data for load and store instructions.
TABLE 1-3 describes the SU major blocks.
TABLE 1-3 Storage Unit Major Blocks
Name                            Description
Instruction level-1 cache       128-Kbyte, 2-way associative, 64-byte line; provides low latency
                                instruction source.
Data level-1 cache              128-Kbyte, 2-way associative, 64-byte line, writeback; provides
                                low latency data source.
Instruction Translation Buffer  1024 entries, 2-way associative TLB for 8-Kbyte pages,
                                1024 entries, 2-way associative TLB for 4-Mbyte pages [1],
                                32 entries, fully associative TLB for unlocked 64-Kbyte,
                                512-Kbyte, 4-Mbyte [1] pages and locked pages in all sizes.
Data Translation Buffer         1024 entries, 2-way associative TLB for 8-Kbyte pages,
                                1024 entries, 2-way associative TLB for 4-Mbyte pages [1],
                                32 entries, fully associative TLB for unlocked 64-Kbyte,
                                512-Kbyte, 4-Mbyte [1] pages and locked pages in all sizes.
Store queue                     Decouples the pipeline from the latency of store operations.
                                Allows the pipeline to continue flowing while the store waits for
                                data, and eventually writes into the data level 1 cache.
[1] An unlocked 4-Mbyte page entry is stored exclusively in either the 2-way associative TLB or
the fully associative TLB, depending on the setting.
1.3.5 Secondary Cache and External Access Unit (SXU)
The SXU controls the operation of unified level-2 caches and the external data access
interface (extended UPA interface). TABLE 1-4 describes the major blocks of the SXU.
TABLE 1-4 Secondary Cache and External Access Unit Major Blocks
Name                    Description
Unified level-2 cache   2-Mbyte, 4-way associative, 64-byte line, writeback; provides low
                        latency data source for both instruction level-1 cache and data
                        level-1 cache.
Movein buffer           Sixteen entries, 64 bytes/entry; catches returning data from the
                        memory system in response to the cache line read request. A
                        maximum of 16 outstanding cache read operations can be issued.
Moveout buffer          Eight entries, 64 bytes/entry; holds writeback data. A maximum
                        of 8 outstanding writeback requests can be issued.
Extended UPA interface  Send/receive transaction packets to/from the Extended UPA
control logic           interface connected to the system.
CHAPTER 2
Definitions
This chapter defines concepts unique to the SPARC64 V, the Fujitsu implementation
of SPARC JPS1. For definition of terms that are common to all implementations,
please refer to Chapter 2 of Commonality.
committed Term applied to an instruction when it has completed without error and all
prior instructions have completed without error and have been committed. When
an instruction is committed, the state of the machine is permanently changed
to reflect the result of the instruction; the previously existing state is no longer
needed and can be discarded.
completed Term applied to an instruction after it has finished, has sent a nonerror status to
the issue unit, and all of its source operands are nonspeculative. Note:
Although the state of the machine has been temporarily altered by completion
of an instruction, the state has not yet been permanently changed and the old
state can be recovered until the instruction has been committed.
executed Term applied to an instruction that has been processed by an execution unit
such as a load unit. An instruction is in execution as long as it is still being
processed by an execution unit.
fetched Term applied to an instruction that is obtained from the I2 instruction cache or
from the on-chip internal cache and sent to the issue unit.
finished Term applied to an instruction when it has completed execution in a functional
unit and has forwarded its result onto a result bus. Results on the result bus are
transferred to the register file, as are the waiting instructions in the instruction
queues.
initiated Term applied to an instruction when it has all of the resources that it needs (for
example, source operands) and has been selected for execution.
instruction dispatch Synonym: instruction initiation.
instruction issued Term applied to an instruction when it has been dispatched to a reservation
station.
instruction retired Term applied to an instruction when all machine resources (serial numbers,
renamed registers) have been reclaimed and are available for use by other
instructions. An instruction can only be retired after it has been committed.
instruction stall Term applied to an instruction that is not allowed to be issued. Not every
instruction can be issued in a given cycle. The SPARC64 V implementation
imposes certain issue constraints based on resource availability and program
requirements.
issue-stalling instruction An instruction that prevents new instructions from being issued until it has
committed.
machine sync The state of a machine when all previously executing instructions have
committed; that is, when no issued but uncommitted instructions are in the
machine.
Memory Management Unit (MMU) Refers to the address translation hardware in SPARC64 V that translates a 64-bit
virtual address into a physical address. The MMU is composed of the mITLB,
mDTLB, uITLB, uDTLB, and the ASI registers used to manage address
translation.
mTLB Main TLB. Split into I and D, called mITLB and mDTLB, respectively. Contains
address translations for the uITLB and uDTLB. When the uITLB or uDTLB do
not contain a translation, they ask the mTLB for the translation. If the mTLB
contains the translation, it sends the translation to the respective uTLB. If the
mTLB does not contain the translation, it generates a fast access exception to a
software translation trap handler, which will load the translation information
(TTE) into the mTLB and retry the access. See also TLB.
uDTLB Micro Data TLB. A small, fully associative buffer that contains address
translations for data accesses. Misses in the uDTLB are handled by the mTLB.
uITLB Micro Instruction TLB. A small, fully associative buffer that contains address
translations for instruction accesses. Misses in the uITLB are handled by the
mTLB.
nonspeculative A distribution system whereby a result is guaranteed to be correct or an
operand state is known to be valid. SPARC64 V employs speculative
distribution, meaning that results can be distributed from functional units
before the point at which guaranteed validity of the result is known.
reclaimed The status when all instruction-related resources that were held until commit
have been released and are available for subsequent instructions. Instruction
resources are usually reclaimed a few cycles after they are committed.
rename registers A large set of hardware registers implemented by SPARC64 V that are invisible
to the programmer. Before instructions are issued, source and destination
registers are mapped onto this set of rename registers. This allows instructions
that normally would be blocked, waiting for an architected register, to proceed
in parallel. When instructions are committed, results in renamed registers are
posted to the architected registers in the proper sequence to produce the correct
program results.
scan A method used to initialize all of the machine state within a chip. In a chip that
has been designed to be scannable, all of the machine state is connected in one
or several loops called “scan rings.” Initialization data can be scanned into the
chip through the scan rings. The state of the machine also can be scanned out
through the scan rings.
reservation station A holding location that buffers dispatched instructions until all input operands
are available. SPARC64 V implements dataflow execution based on operand
availability. When operands are available, the instructions in the reservation
station are scheduled for execution. Reservation stations also contain special
tag-matching logic that captures the appropriate operand data. Reservation
stations are sometimes referred to as queues (for example, the integer queue).
speculative A distribution system whereby a result is not guaranteed as known to be
correct or an operand state is not known to be valid. SPARC64 V employs
speculative distribution, meaning results can be distributed from functional
units before the point at which guaranteed validity of the result is known.
superscalar An implementation that allows several instructions to be issued, executed, and
committed in one clock cycle. SPARC64 V issues up to 4 instructions per clock
cycle.
sync Synonym: machine sync.
syncing instruction An instruction that causes a machine sync. Thus, before a syncing instruction is
issued, all previous instructions (in program order) must have been committed.
At that point, the syncing instruction is issued, executed, completed, and
committed by itself.
TLB Translation lookaside buffer.
CHAPTER 5
Registers
The SPARC64 V processor includes two types of registers: general-purpose registers (that is,
working, data, and control/status registers) and ASI registers.
The SPARC V9 architecture also defines two implementation-dependent registers:
the IU Deferred-Trap Queue and the Floating-Point Deferred-Trap Queue (FQ).
On SPARC64 V, traps caused by instruction execution are precise, and there are several
disrupting traps caused by asynchronous events, such as interrupts, asynchronous error
conditions, and RED_state entry traps.
For easier referencing, this chapter follows the organization of
Chapter 5 in Commonality.
For information on MMU registers, please refer to Section F.10, Internal Registers and
ASI operations, on page 92.
The chapter contains these sections:
■ Nonprivileged Registers on page 17
■ Privileged Registers on page 19
5.1 Nonprivileged Registers
Most of the definitions for the registers are as described in the corresponding
sections of Commonality. Only SPARC64 V-specific features are described in this
section.
5.1.7 Floating-Point State Register (FSR)
Please refer to Section 5.1.7 of Commonality for the description of FSR.
The sections below describe SPARC64 V-specific features of the FSR register.
SPARC V9 defines the FSR.NS bit which, when set to 1, causes the FPU to produce
implementation-dependent results that may not conform to IEEE Std 754-1985.
SPARC64 V implements this bit.
When FSR.NS = 1, denormal input operands and denormal results that would
otherwise trap are flushed to 0 of the same sign and an inexact exception is signalled
(that may be masked by FSR.TEM.NXM). See Section B.6, Floating-Point Nonstandard
Mode, on page 61 for details.
When FSR.NS = 0, the normal IEEE Std 754-1985 behavior is implemented.
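As a rough illustration of the flush-to-zero behavior described above, the following C sketch models what happens to a single denormal value when FSR.NS = 1. The function name and the inexact out-parameter are hypothetical. The sketch flushes every denormal it is given; the exact conditions under which hardware flushes (only operands and results that would otherwise trap) are specified in Section B.6 and are not modeled here.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical model of the FSR.NS = 1 flush: a denormal (subnormal) value is
 * replaced by a zero of the same sign. *inexact is set when a flush occurs,
 * standing in for the inexact exception that FSR.TEM.NXM may mask. */
double ns_flush_denormal(double x, bool *inexact)
{
    if (fpclassify(x) == FP_SUBNORMAL) {
        *inexact = true;
        return copysign(0.0, x);   /* keep the sign, drop the magnitude */
    }
    *inexact = false;
    return x;                      /* normal values, zeros, infinities, NaNs pass through */
}
```

Calling ns_flush_denormal on a negative denormal yields negative zero with the inexact flag set, while a normal value such as 1.5 is returned unchanged.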
FSR_version (ver)
For each SPARC V9 IU implementation (as identified by its VER.impl field), there
may be one or more FPU implementations or none. This field identifies the
FPU implementation present. Consult the SPARC64 V Data Sheet for the setting of
FSR.ver for your chipset.
FSR_floating-point_trap_type (ftt)
The complete conditions under which SPARC64 V triggers fp_exception_other with
trap type unfinished_FPop are described in Section B.6, Floating-Point Nonstandard Mode,
on page 61 (impl. dep. #248).
FSR_current_exception (cexc)
Bits 4 through 0 indicate that one or more IEEE_754 floating-point exceptions were
generated by the most recently executed FPop instruction. The absence of an
exception causes the corresponding bit to be cleared.
In SPARC64 V, the cexc bits are set according to the following pseudocode:
    if (<LDFSR or LDXFSR commits>)
        <update using data from LDFSR or LDXFSR>;
    else if (<FPop commits with ftt = 0>)
        <update using value from FPU>;
    else if (<FPop commits with IEEE_754_exception>)
        <set one bit in the CEXC field as supplied by FPU>;
    else if (<FPop commits with unfinished_FPop error>)
        <no change>;
    else if (<FPop commits with unimplemented_FPop error>)
        <no change>;
    else
        <no change>;
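The pseudocode above can be restated as a small C function, which may help when writing a simulator or a test model. This is only a sketch: the commit_event_t categories and the function name are hypothetical, and the values supplied by the load instruction and by the FPU are simply passed in as arguments.

```c
#include <stdint.h>

/* Hypothetical categories for an instruction that commits. */
typedef enum {
    COMMIT_LDFSR,               /* LDFSR or LDXFSR */
    COMMIT_FPOP_OK,             /* FPop with ftt = 0 */
    COMMIT_FPOP_IEEE_EXC,       /* FPop with IEEE_754_exception */
    COMMIT_FPOP_UNFINISHED,     /* FPop with unfinished_FPop */
    COMMIT_FPOP_UNIMPLEMENTED,  /* FPop with unimplemented_FPop */
    COMMIT_OTHER                /* any other instruction */
} commit_event_t;

#define CEXC_MASK 0x1Fu         /* cexc occupies FSR bits 4:0 */

/* Returns the new cexc value given the current one and the committing event.
 * ld_cexc is the cexc field of the data loaded by LDFSR/LDXFSR;
 * fpu_cexc is the value supplied by the FPU. */
uint32_t next_cexc(uint32_t cur, commit_event_t ev,
                   uint32_t ld_cexc, uint32_t fpu_cexc)
{
    switch (ev) {
    case COMMIT_LDFSR:
        return ld_cexc & CEXC_MASK;
    case COMMIT_FPOP_OK:
        return fpu_cexc & CEXC_MASK;
    case COMMIT_FPOP_IEEE_EXC:
        return fpu_cexc & CEXC_MASK;   /* FPU supplies the single trapping bit */
    default:
        return cur & CEXC_MASK;        /* unfinished, unimplemented, other: no change */
    }
}
```

The three "no change" branches of the pseudocode collapse into the default case, which makes it easy to see that only loads of FSR and FPops that finish (with or without an IEEE exception) ever modify cexc.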
FSR Conformance
SPARC V9 allows the TEM, cexc, and aexc fields to be implemented in hardware in
either of two ways (both of which comply with IEEE Std 754-1985). SPARC64 V
follows case (1); that is, it implements all three fields in conformance with IEEE Std
754-1985. See FSR Conformance in Section 5.1.7 of Commonality for more
information about other implementation methods.
5.1.9 Tick (TICK) Register
SPARC64 V implements the TICK.counter register as a 63-bit register (impl. dep.
#105).
Implementation Note – On SPARC64 V, the counter part of the value returned
when the TICK register is read is the value of TICK.counter when the RDTICK
instruction is executed. The difference between the counter values read from the
TICK register on two reads reflects the number of processor cycles executed between
the executions of the RDTICK instructions, not their commits. In longer code
sequences, the difference between this value and the value that would have been
obtained had the instructions been measured at commit is small.
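A minimal sketch of how software might turn two TICK reads into an elapsed cycle count is shown below. It assumes the two raw 64-bit register values have already been obtained (the RDTICK instruction itself is not shown) and that bit 63 holds the SPARC V9 NPT bit, so only bits 62:0 are treated as the counter. The function names are hypothetical.

```c
#include <stdint.h>

/* TICK.counter is 63 bits wide on SPARC64 V: bits 62:0 of the register. */
#define TICK_COUNTER_MASK ((UINT64_C(1) << 63) - 1)

/* Extract the counter field from a raw TICK register value. */
uint64_t tick_counter(uint64_t tick_reg)
{
    return tick_reg & TICK_COUNTER_MASK;
}

/* Cycles elapsed between two raw TICK reads. The subtraction is taken
 * modulo 2^63, so a single wraparound of the counter is tolerated. */
uint64_t tick_elapsed(uint64_t start_reg, uint64_t end_reg)
{
    return (tick_counter(end_reg) - tick_counter(start_reg)) & TICK_COUNTER_MASK;
}
```

Because the counter is sampled when RDTICK executes rather than when it commits, the result is best treated as an approximation for short code sequences.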
5.2 Privileged Registers
Please refer to Section 5.2 of Commonality for the description of privileged registers.
5.2.6 Trap State (TSTATE) Register
SPARC64 V implements only bits 2:0 of the TSTATE.CWP field. Writes to bits 4 and 3
are ignored, and reads of these bits always return zeroes.
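The effect of the unimplemented CWP bits can be sketched in C as follows. This is a hypothetical software model, not hardware documentation: it assumes the SPARC V9 TSTATE layout in which CWP occupies bits 4:0, and it passes all other TSTATE bits through unchanged.

```c
#include <stdint.h>

/* TSTATE.CWP occupies bits 4:0; SPARC64 V implements only bits 2:0. */
#define TSTATE_CWP_UNIMPL_MASK UINT64_C(0x18)   /* bits 4 and 3 */

/* Value actually held in TSTATE after software writes `value`:
 * bits 4 and 3 are dropped, so later reads return zero there. */
uint64_t tstate_after_write(uint64_t value)
{
    return value & ~TSTATE_CWP_UNIMPL_MASK;
}
```

For example, writing a value whose low five bits are all ones leaves only bits 2:0 set when the register is read back.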
be performed, since it will take the SPARC64 V into RED_state without the
required sequencing.
5.2.9 Version (VER) Register
TABLE 5-1 shows the values for the VER register for SPARC64 V.
TABLE 5-1 VER Register Encodings
Bits   Field   Value
63:48  manuf   0004₁₆ (impl. dep. #104)
47:32  impl    5 (impl. dep. #13)
31:24  mask    n (The value of n depends on the processor chip version)
15:8   maxtl   5
4:0    maxwin  7
The manuf field contains Fujitsu’s 8-bit JEDEC code in the lower 8 bits and zeroes in
the upper 8 bits. The manuf, impl, and mask fields are implemented so that they
may change in future SPARC64 V processor versions. The mask field is incremented
by 1 any time a programmer-visible revision is made to the processor. See the
SPARC64 V Data Sheet to determine the current setting of the mask field.
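The field positions in TABLE 5-1 translate directly into shifts and masks. The sketch below decodes a raw 64-bit VER value into its fields; the struct and function names are hypothetical, and the identification check uses the manuf and impl values listed in the table, which the text notes may change in future processor versions.

```c
#include <stdint.h>

/* Decoded VER register fields, per TABLE 5-1. */
typedef struct {
    uint16_t manuf;   /* bits 63:48 */
    uint16_t impl;    /* bits 47:32 */
    uint8_t  mask;    /* bits 31:24 */
    uint8_t  maxtl;   /* bits 15:8  */
    uint8_t  maxwin;  /* bits 4:0   */
} ver_fields_t;

/* Split a raw VER value into its fields. */
ver_fields_t ver_decode(uint64_t ver)
{
    ver_fields_t f;
    f.manuf  = (uint16_t)(ver >> 48);
    f.impl   = (uint16_t)(ver >> 32);
    f.mask   = (uint8_t)(ver >> 24);
    f.maxtl  = (uint8_t)(ver >> 8);
    f.maxwin = (uint8_t)(ver & 0x1F);
    return f;
}

/* Nonzero when manuf and impl match the values listed for SPARC64 V. */
int ver_matches_sparc64_v(uint64_t ver)
{
    ver_fields_t f = ver_decode(ver);
    return f.manuf == 0x0004 && f.impl == 5;
}
```

The mask field is deliberately left out of the identification check because its value depends on the chip version.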
5.2.11 Ancillary State Registers (ASRs)
Please refer to Section 5.2.11 of Commonality for details of the ASRs.
Performance Control Register (PCR) (ASR 16)
SPARC64 V implements the PCR register as described in SPARC JPS1 Commonality,
with the additions described in this section.
In SPARC64 V, the accessibility of PCR when PSTATE.PRIV = 0 is determined by
PCR.PRIV. If PSTATE.PRIV = 0 and PCR.PRIV = 1, an attempt to execute either
RDPCR or WRPCR will cause a privileged_action exception. If PSTATE.PRIV = 0 and
PCR.PRIV = 0, RDPCR operates without privilege violation and WRPCR causes a
privileged_action exception only when an attempt is made to change (that is, write 1
to) PCR.PRIV (impl. dep. #250).
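The access rules above form a small decision table, sketched here in C. The type and function names are hypothetical, and the model assumes that privileged code (PSTATE.PRIV = 1) may always read and write PCR.

```c
#include <stdbool.h>

typedef enum {
    PCR_ACCESS_OK,
    PCR_ACCESS_PRIVILEGED_ACTION   /* privileged_action exception */
} pcr_access_t;

/* Outcome of RDPCR for the given PSTATE.PRIV and PCR.PRIV values. */
pcr_access_t check_rdpcr(bool pstate_priv, bool pcr_priv)
{
    if (!pstate_priv && pcr_priv)
        return PCR_ACCESS_PRIVILEGED_ACTION;
    return PCR_ACCESS_OK;
}

/* Outcome of WRPCR; new_pcr_priv is the PRIV bit in the value being written. */
pcr_access_t check_wrpcr(bool pstate_priv, bool pcr_priv, bool new_pcr_priv)
{
    if (pstate_priv)
        return PCR_ACCESS_OK;
    if (pcr_priv)
        return PCR_ACCESS_PRIVILEGED_ACTION;
    /* Nonprivileged write with PCR.PRIV = 0: only setting PRIV traps. */
    return new_pcr_priv ? PCR_ACCESS_PRIVILEGED_ACTION : PCR_ACCESS_OK;
}
```

A consequence of these rules is that nonprivileged code can never grant itself continued access by rewriting PCR.PRIV; only privileged code can set that bit.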
See Appendix Q, Performance Instrumentation, for a detailed discussion of the PCR
and PIC register usage and event count definitions.
The Performance Control Register in SPARC64 V is illustrated in FIGURE 5-1 and
described in TABLE 5-2.
Bits:   63:48  47:32  31:27  26    25  24:22  21  20:18  17  16:11  10  9:4  3     2   1   0
Field:  0      OVF    0      OVRO  0   NC     0   SC     0   SU     0   SL   ULRO  UT  ST  PRIV
FIGURE 5-1 SPARC64 V Performance Control Register (PCR) (ASR 16)
TABLE 5-2 PCR Bit Description
Bit    Field  Description
47:32  OVF    Overflow Clear/Set/Status. Used to read counter overflow status (via RDPCR) and
              clear or set counter overflow status bits (via WRPCR). PCR.OVF is a
              SPARC64 V-specific field (impl. dep. #207).
              The following figure depicts the bit layout of the SPARC64 V OVF field for four
              counter pairs. Counter status bits are cleared on write of 0 to the appropriate
              OVF bit.
                Bits:   15:8  7   6   5   4   3   2   1   0
                Field:  0     U3  L3  U2  L2  U1  L1  U0  L0
26     OVRO   Overflow read-only. Write-only/read-as-zero field specifying PCR.OVF update
              behavior for WRPCR.PCR. The OVRO field is implementation-dependent (impl. dep.
              #207). WRPCR.PCR with PCR.OVRO = 1 inhibits updating of PCR.OVF for the current
              write only. The intention of PCR.OVRO is to write PCR while preserving the current
              PCR.OVF value. PCR.OVF is maintained internally by hardware, so a subsequent
              RDPCR.PCR returns accurate overflow status at the time.
24:22  NC     Number of counter pairs. Three-bit, read-only field specifying the number of
              counter pairs, encoded as 0–7 for 1–8 counter pairs (impl. dep. #207).
              For SPARC64 V, the hardcoded value of NC is 3 (indicating presence of 4 counter
              pairs).
20:18  SC     Select PIC. In SPARC64 V, three-bit field specifying which counter pair is
              currently selected as PIC (ASR 17) and which SU/SL values are visible to software.
              On write, PCR.SC selects which counter pair is updated (unless PCR.ULRO is set;
              see below). On read, PCR.SC selects which counter pair is to be read through PIC
              (ASR 17).
16:11  SU     Defined (as S1) in SPARC JPS1 Commonality.
9:4    SL     Defined (as S0) in SPARC JPS1 Commonality.
3      ULRO   Implementation-dependent field (impl. dep. #207) that specifies whether SU/SL are
              read-only. In SPARC64 V, this field is write-only/read-as-zero, specifying update
              behavior of SU/SL on write. When PCR.ULRO = 1, SU/SL are considered as read-only;
              the values set on PCR.SU/PCR.SL are not written into SU/SL. When PCR.ULRO = 0,
              SU/SL are updated. PCR.ULRO is intended to switch the visible PIC by writing
              PCR.SC, without affecting the current selection of SU/SL of that PIC. On PCR read,
              PCR.SU/PCR.SL always shows the current setting of the PIC regardless of PCR.ULRO.
2      UT     Defined in SPARC JPS1 Commonality.
1      ST     Defined in SPARC JPS1 Commonality.
0      PRIV   Defined in SPARC JPS1 Commonality, with the additional function of controlling PCR
              accessibility as described above (impl. dep. #250).
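To make the intended use of OVRO and ULRO concrete, the following C sketch builds a value for WRPCR that switches the visible counter pair without disturbing the overflow status or the SU/SL event selections. The macro and function names are hypothetical; the bit positions come from TABLE 5-2. The sketch assumes the caller passes the value most recently read from PCR so that UT, ST, and PRIV can be carried over.

```c
#include <stdint.h>

/* Field positions from TABLE 5-2. */
#define PCR_OVRO       (UINT64_C(1) << 26)
#define PCR_NC_SHIFT   22
#define PCR_SC_SHIFT   18
#define PCR_ULRO       (UINT64_C(1) << 3)
#define PCR_CTL_MASK   UINT64_C(0x7)        /* UT, ST, PRIV (bits 2:0) */

/* Number of counter pairs encoded in PCR.NC (0-7 means 1-8 pairs). */
unsigned pcr_num_counter_pairs(uint64_t pcr)
{
    return (unsigned)((pcr >> PCR_NC_SHIFT) & 0x7) + 1;
}

/* Build a WRPCR value that selects counter pair `pair` through PCR.SC.
 * OVRO = 1 keeps PCR.OVF untouched; ULRO = 1 keeps SU/SL untouched.
 * UT, ST, and PRIV are copied from the current PCR value. */
uint64_t pcr_select_counter_pair(uint64_t current_pcr, unsigned pair)
{
    return (current_pcr & PCR_CTL_MASK)
         | ((uint64_t)(pair & 0x7) << PCR_SC_SHIFT)
         | PCR_OVRO
         | PCR_ULRO;
}
```

On SPARC64 V, where NC is 3, callers would be expected to pass a pair index from 0 to 3; the sketch does not enforce that limit.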
Performance Instrumentation Counter (PIC) Register (ASR 17)
The PIC register is implemented as described in SPARC JPS1 Commonality.
Four PICs are implemented in SPARC64 V. Each is accessed through ASR 17, using
PCR.SC as a select field. Read/write access to the PIC will access the PICU/PICL
counter pair selected by PCR. For PICU/PICL encodings of specific event counters,
see Appendix Q, Performance Instrumentation.
Counter Overflow. On overflow, counters wrap to 0, SOFTINT register bit 15 is set,
and an interrupt level-15 exception is generated. The counter overflow trap is
triggered on the transition from value FFFFFFFF₁₆ to value 0. If multiple overflows
are generated simultaneously, then multiple overflow status bits will be set. If
overflow status bits are already set, then they remain set on counter overflow.
Overflow status bits are cleared by software writing 0 to the appropriate bit of
PCR.OVF and may be set by writing 1 to the appropriate bit. Setting these bits by
software does not generate a level-15 interrupt.
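The overflow behavior described above can be sketched as a software model of one 32-bit counter. This is an illustration under stated assumptions, not the hardware implementation: the class `PicModel` and its attribute names are hypothetical, and a single boolean stands in for the counter's PCR.OVF bit because the OVF bit layout is not given in this section.

```python
MASK32 = 0xFFFFFFFF
SOFTINT_LEVEL15 = 1 << 15  # SOFTINT register bit 15, as stated in the text


class PicModel:
    """Hypothetical model of one 32-bit performance counter."""

    def __init__(self, value=0):
        self.value = value & MASK32
        self.ovf = False      # stands in for this counter's PCR.OVF bit
        self.softint = 0

    def tick(self):
        """Count one event; return True if the counter overflowed."""
        overflow = self.value == MASK32
        self.value = (self.value + 1) & MASK32   # wraps FFFFFFFF -> 0
        if overflow:
            self.ovf = True                      # stays set if already set
            self.softint |= SOFTINT_LEVEL15      # level-15 interrupt request
        return overflow

    def write_ovf(self, bit):
        """Software write to the overflow bit; never raises an interrupt."""
        self.ovf = bool(bit)


pic = PicModel(0xFFFFFFFE)
pic.tick()          # FFFFFFFE -> FFFFFFFF, no overflow
pic.tick()          # FFFFFFFF -> 0, overflow
print(hex(pic.value), pic.ovf, hex(pic.softint))
```

The model keeps the two ways of setting the overflow bit separate: a hardware overflow also sets SOFTINT bit 15, while a software write does not.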
Dispatch Control Register (DCR) (ASR 18)
The DCR is not implemented in SPARC64 V. Zero is returned on read, and writes to
the register are ignored. The DCR is a privileged register; attempted access by
nonprivileged (user) code generates a privileged_opcode exception.
5.2.12
Registers Referenced Through ASIs
Data Cache Unit Control Register (DCUCR)
ASI 45₁₆ (ASI_DCU_CONTROL_REGISTER), VA = 0₁₆.
The Data Cache Unit Control Register contains fields that control several
memory-related hardware functions. The functions include instruction, prefetch,
write, and data caches, MMUs, and watchpoint setting. SPARC64 V implements most of
DCUCR’s functions described in Section 5.2.12 of Commonality.
After a power-on reset (POR), all fields of DCUCR, including
implementation-dependent fields, are set to 0. After a WDR, XIR, or SIR reset, all
fields of DCUCR, including implementation-dependent fields, are set to 0.
The Data Cache Unit Control Register is illustrated in FIGURE 5-2 and described in
TABLE 5-3. In the table, bits are grouped by function rather than by strict bit sequence.
Bits:   63:50  49:48  47:42                     41         40:33  32:25  24  23  22  21  20:4  3   2   1  0
Field:  —      0      Implementation dependent  WEAK_SPCA  PM     VM     PR  PW  VR  VW  —     DM  IM  0  0
FIGURE 5-2 DCU Control Register Access Data Format (ASI 45₁₆)
TABLE 5-3 DCUCR Description
Bits    Field       Type  Use — Description
49:48   CP, CV      RW
    Not implemented in SPARC64 V (impl. dep. #232). It reads as 0 and writes to
    it are ignored.
47:42   impl. dep.
    Not used. It reads as 0 and writes to it are ignored.
41      WEAK_SPCA   RW
    Used for disabling speculative memory access (impl. dep. #240). When
    DCUCR.WEAK_SPCA = 1, the branch history table is cleared and aggressive
    instruction prefetch is no longer issued.
    While DCUCR.WEAK_SPCA = 1, aggressive instruction prefetching is
    disabled and all load and store instructions are treated as presync
    instructions, which are executed when all previous instructions are committed.
    Because all CTIs are considered not taken, instructions residing beyond 1
    Kbyte of a CTI may be fetched and executed.
    On entering the aggressive instruction prefetch disable mode, supervisor
    software should issue membar #Sync to make sure all in-flight instructions
    in the pipeline are discarded.
    While DCUCR.WEAK_SPCA = 1, an L2 cache flush requested by writing 1 to
    ASI_L2_CTRL.U2_FLUSH remains pending internally until
    DCUCR.WEAK_SPCA is set to 0. To wait for completion of the cache flush, a
    membar #Sync must be issued after DCUCR.WEAK_SPCA is set to 0.
    Executing a membar #Sync while DCUCR.WEAK_SPCA = 1, after writing 1
    to ASI_L2_CTRL.U2_FLUSH, does not wait for the cache flush to complete.
40:33   PM<7:0>
    Defined in SPARC JPS1 Commonality.
32:25   VM<7:0>
    Defined in SPARC JPS1 Commonality.
24, 23  PR, PW
    Defined in SPARC JPS1 Commonality.
22, 21  VR, VW
    Defined in SPARC JPS1 Commonality.
20:4    —
    Reserved.
3       DM
    Defined in SPARC JPS1 Commonality.
2       IM
    Defined in SPARC JPS1 Commonality.
1       DC          RW
    Not implemented in SPARC64 V (impl. dep. #252). It reads as 0 and writes to
    it are ignored.
0       IC          RW
    Not implemented in SPARC64 V (impl. dep. #253). It reads as 0 and writes to
    it are ignored.
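The interaction between DCUCR.WEAK_SPCA, ASI_L2_CTRL.U2_FLUSH, and membar #Sync described in the WEAK_SPCA row can be sketched as a tiny state model. This is only an illustration of the ordering rule: the class and method names are hypothetical, and the model simplifies by treating the flush as completing at the membar #Sync that waits for it, whereas real hardware completes it asynchronously.

```python
class L2FlushModel:
    """Hypothetical model of the WEAK_SPCA / U2_FLUSH ordering rule."""

    def __init__(self):
        self.weak_spca = 0
        self.flush_outstanding = False  # 1 was written to U2_FLUSH, not yet complete

    def write_weak_spca(self, value):
        self.weak_spca = value & 1

    def write_u2_flush(self):
        # While WEAK_SPCA = 1 the request stays pending internally.
        self.flush_outstanding = True

    def membar_sync(self):
        """Return True if no flush is outstanding once this membar #Sync retires."""
        if self.flush_outstanding and self.weak_spca == 0:
            # Only with WEAK_SPCA = 0 does membar #Sync wait for the flush.
            self.flush_outstanding = False
        return not self.flush_outstanding


m = L2FlushModel()
m.write_weak_spca(1)
m.write_u2_flush()
print(m.membar_sync())   # membar while WEAK_SPCA = 1: flush still outstanding
m.write_weak_spca(0)
print(m.membar_sync())   # membar after clearing WEAK_SPCA: flush waited for
```

The sequence that waits for the flush is therefore: clear WEAK_SPCA first, then issue membar #Sync.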
Data Watchpoint Registers
No implementation-dependent feature of SPARC64 V reduces the reliability of data
watchpoints (impl. dep. #244).
SPARC64 V employs a conservative check of PA/VA watchpoints over partial store
instructions. See Section A.42, Partial Store (VIS I), on page 57 for details.
Instruction Trap Register
SPARC64 V implements the Instruction Trap Register (impl. dep. #205).
In SPARC64 V, the least significant 11 bits (bits 10:0) of a CALL or branch (BPcc,
FBPfcc, Bicc, BPr) instruction in an instruction cache are identical to their
architectural encoding (as it appears in main memory) (impl. dep. #245).
5.2.13
Floating-Point Deferred-Trap Queue (FQ)
SPARC64 V does not contain a Floating-Point Deferred-Trap Queue (impl. dep. #24).
An attempt to read FQ with an RDPR instruction generates an illegal_instruction
exception (impl. dep. #25).
5.2.14
IU Deferred-Trap Queue
SPARC64 V neither has nor needs an IU deferred-trap queue (impl. dep. #16).
F.CHAPTER
6
Instructions
This chapter presents SPARC64 V implementation-specific instruction details and the
processor pipeline information in these subsections:
■ Instruction Execution on page 25
■ Instruction Formats and Fields on page 28
■ Instruction Categories on page 29
■ Processor Pipeline on page 31
For additional, general information, please see parallel subsections of Chapter 6 in
Commonality. For easy referencing, we follow the organization of Chapter 6 in
Commonality.
6.1
Instruction Execution
SPARC64 V is an advanced superscalar implementation of SPARC V9. Several
instructions may be issued and executed in parallel. Although SPARC64 V provides
serial program execution semantics, some of the implementation characteristics
described below are part of the architecture visible to software for correctness and
efficiency. The affected software includes optimizing compilers and supervisor code.
6.1.1
Data Prefetch
SPARC64 V employs speculative (out of program order) execution of instructions; in
most cases, the effect of these instructions can be undone if the speculation proves to
be incorrect. (An async_data_error may nevertheless be signalled during speculative
data prefetching.) However, exceptions can occur because of speculative data
prefetching. Formally, SPARC64 V employs the following rules regarding speculative
prefetching:
1. If a memory operation y resolves to a volatile memory address (location[y]),
   SPARC64 V will not speculatively prefetch location[y] for any reason; location[y]
   will be fetched or stored to only when operation y is committable.
2. If a memory operation y resolves to a nonvolatile memory address (location[y]),
   SPARC64 V may speculatively prefetch location[y], subject to the
   following subrules:
   a. If an operation y can be speculatively prefetched according to the prior rule,
      operations with store semantics are speculatively prefetched for ownership
      only if they are prefetched to cacheable locations. Operations without store
      semantics are speculatively prefetched even if they are noncacheable as long as
      they are not volatile.
prefetched.
SPARC64 V provides two mechanisms to avoid speculative execution of a load:
1. Avoid speculation by disallowing speculative accesses to certain memory pages or
   I/O spaces. This can be done by setting the E (side-effect) bit in the PTE for all
   memory pages that should not allow speculation. All accesses made to memory
   pages that have the E bit set in their PTE will be delayed until they are no longer
   speculative or until they are cancelled. See Appendix F, Memory Management Unit,
   for details.
2. Alternate space load instructions that force program order, such as
   ASI_PHYS_BYPASS_WITH_EBIT[_L] (ASI = 15₁₆, 1D₁₆), will not be speculatively
   executed.
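The speculative prefetch rules of this section (rule 1 and subrule 2a) reduce to a small predicate. The following is a minimal sketch, assuming the three properties of a memory operation are already known as booleans; the function name and parameters are hypothetical, and any subrule not legible in this section is not modeled.

```python
def may_speculatively_prefetch(volatile, cacheable, store_semantics):
    """Return True if the rules permit a speculative prefetch of the location.

    volatile        -- the operation resolves to a volatile address (e.g. E bit set)
    cacheable       -- the location is cacheable
    store_semantics -- the operation has store semantics
    """
    if volatile:
        # Rule 1: never prefetched; accessed only when committable.
        return False
    if store_semantics:
        # Subrule 2a: prefetch for ownership only to cacheable locations.
        return cacheable
    # Subrule 2a: operations without store semantics may be prefetched
    # even if noncacheable, as long as they are not volatile.
    return True


for case in [(True, True, False), (False, False, True), (False, False, False)]:
    print(case, may_speculatively_prefetch(*case))
```

A `True` result means only that the processor is allowed to prefetch, not that it will.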
6.1.2
Instruction Prefetch
The processor prefetches instructions to minimize cases where the processor must
wait for instruction fetch. In combination with branch prediction, prefetching may
cause the processor to access instructions that are not subsequently executed. In
some cases, the speculative instruction accesses will reference data pages.
SPARC64 V does not generate a trap for any exception that is caused by an
instruction fetch until all of the instructions before it (in program order) have been
committed.¹
¹ Hardware errors and other asynchronous errors may generate a trap even if the
instruction that caused the trap is never committed.
6.1.3
Syncing Instructions
SPARC64 V has instructions, called syncing instructions, that stop execution for the
number of cycles it takes to clear the pipeline and to synchronize the processor.
There are two types of synchronization, pre and post. A presyncing instruction waits
for all previous instructions to commit, commits by itself, and then issues successive
instructions. A postsyncing instruction issues by itself and prevents the successive
instructions from issuing until it is committed. Some instructions have both pre- and
postsync attributes.
In SPARC64 V almost all instructions commit in order, but store instructions commit
before becoming globally visible. A few syncing instructions cause the processor to
discard prefetched instructions and to refetch the successive instructions. TABLE 6-1
lists all pre-/postsync instructions and the effects of instruction execution.
TABLE 6-1 SPARC64 V Syncing Instructions
Columns: Opcode; Presyncing (Sync?, Wait for store global visibility?);
Postsyncing (Sync?, Discard prefetched instructions?)
ALIGNADDRESS{_LITTLE}
Yes
Yes
Yes
BMASK
DONE
Yes
Yes
FCMP(GT,LE,NE,EQ)(16,32)
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
FLUSH
FMOV(s,d)icc
FMOVr
LDD
Yes
Yes
Yes
LDDA
LDDFA
memory access with
ASI=ASI_PHYS_BYPASS_EC{_LITTLE},
ASI_PHYS_BYPASS_EC_WITH_E_BIT{_LITTLE}
LDFSR, LDXFSR
MEMBAR
MOVfcc
MULScc
PDIST
Yes
Yes
1
Yes
Yes
Yes
Yes
Yes
RDASR
Yes
RETRY
Yes
Yes
Yes
SIAM
STBAR
Yes
Yes
STD
STDA
Yes
STDFA
Yes
Yes
Yes
Yes
STFSR, STXFSR
Tcc
Yes
Yes
Yes
2
WRASR
1. When #cmask != 0.
2. WRGSR only.
6.2
Instruction Formats and Fields
Instructions are encoded in five major 32-bit formats and several minor formats.
Please refer to Section 6.2 of Commonality for illustrations of four major formats.
FIGURE 6-1 illustrates Format 5, unique to SPARC64 V.
Format 5 (op = 2, op3 = 37₁₆): FMADD, FMSUB, FNMADD, and FNMSUB (in place of IMPDEP2B)
Field:  op     rd     op3    rs1    rs3   var  size  rs2
Bits:   31:30  29:25  24:19  18:14  13:9  8:7  6:5   4:0
FIGURE 6-1 Summary of Instruction Formats: Format 5
Instruction fields are those shown in Section 6.2 of Commonality. Three additional
fields are implemented in SPARC64 V. They are described in TABLE 6-2.
TABLE 6-2 Instruction Fields Specific to SPARC64 V
Bits   Field  Description
13:9   rs3    This 5-bit field is the address of the third f register source operand for
              the floating-point multiply-add and multiply-subtract instructions.
8:7    var    This 2-bit field specifies which specific operation (variation) to perform
              for the floating-point multiply-add and multiply-subtract instructions.
6:5    size   This 2-bit field specifies the size of the operands for the floating-point
              multiply-add and multiply-subtract instructions.
Since size = 00 is not IMPDEP2B, and since size = 11 is assumed to denote quad
operations but is not implemented in SPARC64 V, an instruction with size = 00 or 11
generates an illegal_instruction exception in SPARC64 V.
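The Format 5 layout (op 31:30, rd 29:25, op3 24:19, rs1 18:14, rs3 13:9, var 8:7, size 6:5, rs2 4:0) can be checked with a short encode/decode sketch. The function names are hypothetical, and the sketch deliberately does not assign meanings to individual var or size values other than the size = 00 and size = 11 cases stated above; those assignments are defined elsewhere in the supplement.

```python
OP_FORMAT5 = 2        # op field for Format 5
OP3_IMPDEP2 = 0x37    # op3 field (37 hex)


def encode_format5(rd, rs1, rs2, rs3, var, size):
    """Pack the Format 5 fields into a 32-bit instruction word."""
    for name, val, width in (("rd", rd, 5), ("rs1", rs1, 5), ("rs2", rs2, 5),
                             ("rs3", rs3, 5), ("var", var, 2), ("size", size, 2)):
        if not 0 <= val < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return ((OP_FORMAT5 << 30) | (rd << 25) | (OP3_IMPDEP2 << 19) |
            (rs1 << 14) | (rs3 << 9) | (var << 7) | (size << 5) | rs2)


def decode_format5(word):
    """Unpack a 32-bit word into its Format 5 fields."""
    return {
        "op":   (word >> 30) & 0x3,
        "rd":   (word >> 25) & 0x1F,
        "op3":  (word >> 19) & 0x3F,
        "rs1":  (word >> 14) & 0x1F,
        "rs3":  (word >> 9) & 0x1F,
        "var":  (word >> 7) & 0x3,
        "size": (word >> 5) & 0x3,
        "rs2":  word & 0x1F,
    }


def raises_illegal_instruction(word):
    """size = 00 or 11 generates illegal_instruction in SPARC64 V."""
    return decode_format5(word)["size"] in (0b00, 0b11)


word = encode_format5(rd=1, rs1=2, rs2=3, rs3=4, var=0, size=1)
print(hex(word), decode_format5(word), raises_illegal_instruction(word))
```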
6.3
Instruction Categories
SPARC V9 instructions comprise the categories listed below. All categories are
described in Section 6.3 of Commonality. Subsections in bold face are SPARC64 V
implementation dependencies.
■ Memory access
■ Memory synchronization
■ Integer arithmetic
■ Control transfer (CTI)
■ Conditional moves
■ Register window management
■ State register access
■ Privileged register access
■ Floating-point operate (FPop)
■ Implementation-dependent
6.3.3
Control-Transfer Instructions (CTIs)
These are the basic control-transfer instruction types:
■ Conditional branch (Bicc, BPcc, BPr, FBfcc, FBPfcc)
■ Unconditional branch
■ Call and link (CALL)
■ Jump and link (JMPL, RETURN)
■ Return from trap (DONE, RETRY)
■ Trap (Tcc)
Instructions other than CALL and JMPL are described in their entirety in Section 6.3.2
of Commonality. SPARC64 V implements CALL and JMPL as described below.
CALL and JMPL Instructions
SPARC64 V writes all 64 bits of the PC into the destination register when
PSTATE.AM = 0. The upper 32 bits of r[15] (CALL) or of r[rd] (JMPL) are written
as zeroes when PSTATE.AM = 1 (impl. dep. #125).
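The effect of PSTATE.AM on the value written by CALL and JMPL amounts to a 32-bit mask. A minimal sketch follows; the function name `link_value` is hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def link_value(pc, pstate_am):
    """Value written to r[15] (CALL) or r[rd] (JMPL) for a given PC."""
    pc &= MASK64
    # PSTATE.AM = 1: upper 32 bits are written as zeroes.
    return pc & MASK32 if pstate_am else pc


print(hex(link_value(0x0000010012345678, 0)))
print(hex(link_value(0x0000010012345678, 1)))
```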
SPARC64 V implements JMPL and CALL return prediction hardware in the form of a
special stack, called the Return Address Stack (RAS). Whenever a CALL or JMPL that
writes to %o7 (r[15]) occurs, SPARC64 V “pushes” the return address (PC+8) onto
the RAS. When either of the synthetic instructions retl (JMPL [%o7+8]) and ret (JMPL
[%i7+8]) is subsequently executed, the return address is predicted to be the
address stored on the top of the RAS and the RAS is “popped.” If the prediction in
the RAS is incorrect, SPARC64 V backs up and starts issuing instructions from the
correct target address. This backup takes a few extra cycles.
Programming Note – For maximum performance, software and compilers must
take into account how the RAS works. For example, tricks that do nonstandard
returns in hopes of boosting performance may require more cycles if they cause the
wrong RAS value to be used for predicting the address of the return. Heavily nested
calls can also cause earlier entries in the RAS to be overwritten by newer entries;
the corresponding return addresses will then be mispredicted because of the
overflow of the RAS.
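The push/pop behavior and the overflow effect mentioned in the Programming Note can be illustrated with a bounded stack. This is a sketch under assumptions: the RAS depth is not given in this section, so the depth of 4 used here is purely hypothetical, and the model returns `None` for a lost entry where real hardware would produce some (wrong) prediction.

```python
from collections import deque


class ReturnAddressStack:
    """Hypothetical software model of a return address stack."""

    def __init__(self, depth=4):          # depth is an assumed value
        # A deque with maxlen silently drops the oldest entry on overflow.
        self.entries = deque(maxlen=depth)

    def push_call(self, pc):
        """CALL/JMPL writing %o7: push the return address PC+8."""
        self.entries.append(pc + 8)

    def predict_return(self):
        """ret/retl: pop the predicted return address, or None if lost."""
        return self.entries.pop() if self.entries else None


ras = ReturnAddressStack(depth=4)
call_sites = [0x1000, 0x2000, 0x3000, 0x4000, 0x5000, 0x6000]
for pc in call_sites:                      # six nested calls, four entries
    ras.push_call(pc)
predictions = [ras.predict_return() for _ in call_sites]
print([hex(p) if p is not None else None for p in predictions])
```

With six nested calls and four entries, the four innermost returns are predicted correctly and the two outermost are not, which is the overflow case the note warns about.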
6.3.7
Floating-Point Operate (FPop) Instructions
The complete conditions for generating an fp_exception_other exception with
FSR.ftt = unfinished_FPop are described in Section B.6, Floating-Point Nonstandard
Mode on page 61.
The SPARC64 V-specific FMADD and FMSUB instructions (described below) are also
floating-point operations. They require the floating-point unit to be enabled;
otherwise, an fp_disabled trap is generated. They also affect the FSR, like FPop
instructions. However, these instructions are not included in the FPop category and,
defined in Section 6.3.9 of Commonality.
6.3.8
Implementation-Dependent Instructions
SPARC64 V uses the IMPDEP2 instruction to implement the Floating-Point
Multiply-Add/Subtract and Negative Multiply-Add/Subtract instructions; these have an
op3 field = 37₁₆ (IMPDEP2). See Floating-Point Multiply-Add/Subtract on page 50 for
fuller definitions of these instructions. Opcode space is reserved in IMPDEP2 for the
quad-precision forms of these instructions. However, SPARC64 V does not currently
implement the quad-precision forms, and the processor generates an illegal_instruction
exception if a quad-precision form is specified. Since these instructions are not part
of the required SPARC V9 architecture, the operating system does not supply
software emulation routines for the quad versions of these instructions.
SPARC64 V uses the IMPDEP1 instruction to implement the graphics acceleration
instructions.
6.4
Processor Pipeline
The pipeline of SPARC64 V consists of fifteen stages, shown in FIGURE 6-2. Each
stage is referenced by one or two letters as follows:
IA, IT, IM, IB, IR, E, D, P, B, X, U, W, Ps, Ts, Ms, Bs, Rs
6.4.1
Instruction Fetch Stages
■ IA (Instruction Address generation) — Calculate fetch target address.
■ IT (Instruction TLB Tag access) — Instruction TLB tag search. Search of BRHIS
and RAS is also started.
■ IM (Instruction TLB tag Match) — Check whether the TLB tag matches.
  The result of the BRHIS and RAS search is also available at this stage and is
  forwarded to the IA stage for the subsequent fetch.
■ IB (Instruction cache Buffer read) — Read L1 cache data if the TLB is hit.
■ IR (Instruction read Result) — Write to the I-buffer.
IA through IR stages are dedicated to instruction fetch. These stages work in concert
with the cache access unit to supply instructions to subsequent stages. The
instructions fetched from memory or cache are stored in the Instruction Buffer
(I-buffer). The I-buffer has six entries, each of which can hold 32-byte-aligned 32-byte
data (eight instructions).
SPARC64 V has a branch prediction mechanism and resources named BRHIS
(BRanch HIStory) and RAS (Return Address Stack). Instruction fetch stages use these
resources to determine fetch addresses.
Instruction fetch stages are designed so that they work independently of subsequent
stages as much as possible, and they can fetch instructions even when execution
stages stall. These stages fetch until the I-buffer is full; further fetches are possible by
requesting prefetches to the L1 cache.
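[Pipeline diagram not reproduced. Labels recoverable from the figure: stages IA, IT,
IM, IB, IR, E, D, P, B, X, U, W, Ps, Ts, Ms, Bs, Rs; blocks IF EAG, iTLB, L1I, BRHIS,
Instruction Buffer, IWR, RSFA, RSFB, RSEA, RSEB, RSA, RSBR, CSE, FXA, FXB, EXA, EXB,
EAGA, EAGB, RR, dTLB, L1D, LB, LR, FUB, GUB, FPR, GPR, ccr, fsr, PC, nPC.]
FIGURE 6-2 SPARC64 V Pipeline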
6.4.2
Issue Stages
■ E (Entry) — Instructions are passed from fetch stages.
■ D (Decode) — Assign resources and dispatch to a reservation station (RS).
SPARC64 V is an out-of-order execution CPU. It has six execution units (two
arithmetic and logic units, two floating-point units, and two load/store units). Each
unit except the load/store unit has its own reservation station. E and D stages are
issue stages that decode instructions and dispatch them to the target RS. SPARC64 V
can issue up to four instructions per cycle.
The resources needed to execute an instruction are assigned in the issue stages. The
resources to be allocated include the following:
■ Commit stack entry (CSE)
■ Renaming registers of integer (GUB) and floating-point (FUB)
■ Entries of reservation stations
■ Memory access ports
Resources needed for an instruction are specific to the instruction, but all resources
must be assigned at these stages. In normal execution, assigned resources are
released at the very last stage of the pipeline, the W-stage.¹ Instructions between the
E-stage and W-stage are considered to be in-flight. When an exception is signalled, all
in-flight instructions and the resources used by them are released immediately. This
behavior enables the decoder to restart issuing instructions as quickly as possible.
The number of in-flight instructions depends on how many resources are needed by
them. The maximum number is 64.
6.4.3
Execution Stages
■ P (priority) — Select an instruction from those that have met the conditions for
execution.
■ B (buffer read) — Read register file, or receive forwarded data from other
  pipelines.
■ X (execute) — Execution.
Instructions in reservation stations will be executed when certain conditions are met,
for example, when the values of the source registers are known and the execution
unit is available. Execution latency varies from one cycle to many cycles, depending
on the instruction.
¹ An entry in a reservation station is released at the X-stage.
Execution Stages for Cache Access
Memory access requests are passed to the cache access pipeline after the target
address is calculated. Cache access stages work the same way as instruction fetch
stages, except for the handling of branch prediction. See Section 6.4.1, Instruction
Fetch Stages, for details. Stages in instruction fetch and cache access correspond as
follows:
Instruction Fetch Stages   Cache Access
IA                         Ps
IT                         Ts
IM                         Ms
IB                         Bs
IR                         Rs
When an exception is signalled, fetch ports and store ports used by memory access
instructions are released. The cache access pipeline itself remains working in order to
complete outgoing memory accesses. When data is returned, it is then stored to the
cache.
6.4.4
Completion Stages
■ U (Update) — Update of physical (renamed) register.
■ W (Write) — Update of architectural registers and retire; exception handling.
After an out-of-order execution, execution reverts to program order to complete.
Exception handling is done in the completion stages. Exceptions occurring in
execution stages are not handled immediately but are signalled when the
instruction is completed.¹
¹ A RAS-related exception may be signalled before completion.
F.CHAPTER
7
Traps
■ Reset Traps on page 37
■ Uses of the Trap Categories on page 37
■ Trap Control on page 38
■ PIL Control on page 38
■ Trap-Table Entry Addresses on page 38
■ Trap Type (TT) on page 38
■ Details of Supported Traps on page 39
■ Exception and Interrupt Descriptions on page 39
7.1
Processor States, Normal and Special Traps
Please refer to Section 7.1 of Commonality.
7.1.1
RED_state
RED_state Trap Table
The RED_state trap vector is located at an implementation-dependent address
referred to as RSTVaddr. The value of RSTVaddr is a constant within each
implementation; in SPARC64 V this virtual address is FFFFFFFFF0000000₁₆,
which translates to physical address 000007FFF0000000₁₆ in RED_state (impl.
dep. #114).
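As a quick arithmetic check, the two RSTVaddr constants quoted above are consistent with keeping only the low 43 bits of the virtual address. The 43-bit width is an inference from these two constants alone, not a statement taken from this section, and the function name is hypothetical.

```python
RSTV_VA = 0xFFFFFFFFF0000000   # virtual address of the RED_state trap vector
RSTV_PA = 0x000007FFF0000000   # physical address quoted for RED_state

ASSUMED_PA_BITS = 43           # assumption inferred from the constants above


def truncate_to_pa(va, pa_bits=ASSUMED_PA_BITS):
    """Keep only the low pa_bits bits of an address."""
    return va & ((1 << pa_bits) - 1)


print(hex(truncate_to_pa(RSTV_VA)), truncate_to_pa(RSTV_VA) == RSTV_PA)
```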
RED_state Execution Environment
In RED_state, the processor is forced to execute in a restricted environment by
overriding the values of some processor controls and state registers.
Note – The values are overridden, not set, allowing them to be switched atomically.
SPARC64 V has the following implementation-dependent behavior in RED_state
(impl. dep. #115):
■ While in RED_state, all internal ITLB-based translation functions are disabled.
  DTLB-based translations are disabled upon entry but may be reenabled by
  software while in RED_state. However, ASI-based access functions to the TLBs
  are still available.
■ While mTLBs and uTLBs are disabled, all accesses are assumed to be
  noncacheable and strongly ordered for data access.
■ XIR errors are not masked and can cause a trap.
Note – When RED_state is entered because of component failures, the handler
should attempt to recover from potentially catastrophic error conditions or to disable
the failing components. When RED_state is entered after a reset, the software
should create the environment necessary to restore the system to a running state.
7.1.2
error_state
The processor enters error_state when a trap occurs while the processor is
already at its maximum supported trap level (that is, when TL = MAXTL) (impl. dep.
#39).
Although the standard behavior of the CPU upon entry into error_state is to
internally generate a watchdog_reset (WDR), the CPU optionally stays halted upon
entry to error_state, depending on a setting in the OPSR register (impl. dep. #40,
#254).
7.2
Trap Categories
Please refer to Section 7.2 of Commonality.
An exception or interrupt request can cause any of the following trap types:
■ Precise trap
■ Deferred trap
■ Disrupting trap
■ Reset trap
7.2.2
Deferred Traps
Please refer to Section 7.2.2 of Commonality.
SPARC64 V implements a deferred trap to signal certain error conditions (impl. dep.
#32). Please refer to the description of the I_UGE error in the “Relation between %tpc
and the instruction that caused the error” row in TABLE P-2 (page 156) for details. See
also Instruction End-Method at ADE Trap on page 170.
7.2.4
Reset Traps
Please refer to Section 7.2.4 of Commonality.
In SPARC64 V, a watchdog reset (WDR) occurs when the processor has not
committed an instruction for 2^33 processor clocks.
7.2.5
Uses of the Trap Categories
Please refer to Section 7.2.5 of Commonality.
All exceptions that occur as the result of program execution are precise in
SPARC64 V (impl. dep. #33).
An exception caused after the initial access of a multiple-access load or store
instruction (LDD(A), STD(A), LDSTUB, CASA, CASXA, or SWAP) that causes a
catastrophic exception is precise in SPARC64 V.
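As a worked example for the watchdog interval given under Reset Traps (2^33 processor clocks without a committed instruction), the interval can be converted to wall-clock time once a clock frequency is chosen. The 1 GHz figure below is a hypothetical value picked for easy arithmetic, not a SPARC64 V specification, and the function name is illustrative.

```python
WDR_CLOCKS = 2 ** 33   # processor clocks without a commit before WDR


def watchdog_seconds(clock_hz):
    """Seconds that 2^33 clocks take at the given clock frequency."""
    return WDR_CLOCKS / clock_hz


ASSUMED_CLOCK_HZ = 1_000_000_000   # hypothetical 1 GHz clock
print(WDR_CLOCKS, watchdog_seconds(ASSUMED_CLOCK_HZ))
```

At the assumed 1 GHz this comes to roughly 8.6 seconds; the interval shortens in proportion as the clock frequency rises.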
7.3
Trap Control
Please refer to Section 7.3 of Commonality.
7.3.1
PIL Control
SPARC64 V receives external interrupts from the UPA interconnect. They cause an
interrupt_vector_trap (TT = 60₁₆). The interrupt vector trap handler reads the interrupt
information and then schedules SPARC V9-compatible interrupts by writing bits in
the SOFTINT register. Please refer to Section 5.2.11 of Commonality for details.
During handling of SPARC V9-compatible interrupts by SPARC64 V, the PIL
register is checked. If an interrupt has sufficient priority, SPARC64 V will stop
issuing new instructions, will flush all uncommitted instructions, and then will
vector to the trap handler. The only exception to this process occurs when
SPARC64 V is processing a higher-priority trap.
SPARC64 V takes a normal disrupting trap upon receipt of an interrupt request.
7.4
Trap-Table Entry Addresses
Please refer to Section 7.4 of Commonality.
7.4.2
Trap Type (TT)
Please refer to Section 7.4.2 of Commonality.
SPARC64 V implements all mandatory SPARC V9 and SPARC JPS1 exceptions, as
described in Chapter 7 of Commonality, plus the exception listed in TABLE 7-1, which
is specific to SPARC64 V (impl. dep. #35; impl. dep. #36).
TABLE 7-1 Exceptions Specific to SPARC64 V
Exception or Interrupt Request   TT      Priority
async_data_error                 040₁₆   2
7.4.4
Details of Supported Traps
Please refer to Section 7.4.4 in Commonality.
SPARC64 V Implementation-Specific Traps
SPARC64 V supports the following implementation-specific trap type:
■ async_data_error
7.5
Trap Processing
Please refer to Section 7.5 of Commonality.
7.6
Exception and Interrupt Descriptions
Please refer to Section 7.6 of Commonality.
7.6.4
SPARC V9 Implementation-Dependent, Optional
Traps That Are Mandatory in SPARC JPS1
Please refer to Section 7.6.4 of Commonality.
SPARC64 V implements all six traps that are implementation dependent in SPARC
V9 but mandatory in JPS1 (impl. dep. #35). See Section 7.6.4 of Commonality for
details.
7.6.5
SPARC JPS1 Implementation-Dependent Traps
Please refer to Section 7.6.5 of Commonality.
SPARC64 V implements the following traps that are implementation dependent
(impl. dep. #35).
■ async_data_error [tt = 040₁₆] (Preemptive or disrupting) (impl. dep. #218) —
  SPARC64 V implements the async_data_error exception to signal the following
  errors.
  ■ Uncorrectable errors in the internal architecture registers (general registers–gr,
    floating-point registers–fr, ASR, ASI registers)
  ■ Uncorrectable errors in the core pipeline
  ■ System data corruption
  ■ Watchdog timeout first time
  ■ TLB access error upon access by an ldxa or stxa instruction
Multiple errors may be reported in a single generation of the async_data_error
exception. Depending on the situation, the async_data_error trap becomes a precise
trap, a disrupting trap, or a preemptive trap upon error detection. The TPC and
TNPC stacked by the exception may indicate the exact instruction, the preceding
instruction, or the subsequent instruction inducing the error. See Appendix P for
details of the async_data_error exception in SPARC64 V.
F.CHAPTER
8
Memory Models
The SPARC V9 architecture is a model that specifies the behavior observable by
software on SPARC V9 systems. Therefore, access to memory can be implemented in
any manner, as long as the behavior observed by software conforms to that of the
models described in Chapter 8 of Commonality and defined in Appendix D, Formal
Specification of the Memory Models, also in Commonality.
The SPARC V9 architecture defines three different memory models: Total Store Order
(TSO), Partial Store Order (PSO), and Relaxed Memory Order (RMO). All SPARC V9
processors must provide Total Store Order (or a more strongly ordered model, for
example, Sequential Consistency) to ensure SPARC V8 compatibility.
Whether the PSO or RMO models are supported by SPARC V9 systems is
implementation dependent; SPARC64 V behaves in a manner that guarantees
adherence to whichever memory model is currently in effect.
This chapter describes the following major SPARC64 V-specific details of memory
models.
■ SPARC V9 Memory Model on page 42
For general information, please see parallel subsections of Chapter 8 in
Commonality. For easier referencing, this chapter follows the organization of
Chapter 8 in Commonality, listing subsections whether or not there are
implementation-specific details.
8.1
Overview
Note – The words “hardware memory model” denote the underlying hardware
memory models as differentiated from the “SPARC V9 memory model,” which is the
memory model the programmer selects in PSTATE.MM.
SPARC64 V supports only one mode of memory handling to guarantee correct
operation under any of the three SPARC V9 memory ordering models (impl. dep.
#113):
■ Total Store Order — All loads are ordered with respect to loads, and all stores are
ordered with respect to loads and stores. This behavior is a superset of the
requirements for the SPARC V9 memory models TSO, PSO, and RMO. When
PSTATE.MM selects TSO or PSO, SPARC64 V operates in this mode. Since
programs written for PSO (or RMO) will always work if run under Total Store
Order, this behavior is safe but does not take advantage of the reduced restrictions
of PSO.
8.4
SPARC V9 Memory Model
Please refer to Section 8.4 of Commonality.
In addition, this section describes SPARC64 V-specific details about the processor/
memory interface model.
8.4.5
Mode Control
SPARC64 V implements Total Store Ordering for all PSTATE.MM values. Writing 11₂ into
PSTATE.MM also causes the machine to use TSO (impl. dep. #119). However, the
encoding 11₂ should not be used, since future versions of SPARC64 V may use this
encoding for a new memory model.
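The mode-control rule can be summarized as a lookup in which every PSTATE.MM encoding yields TSO behavior. The sketch below uses the SPARC V9 encodings of PSTATE.MM (00 = TSO, 01 = PSO, 10 = RMO, 11 = reserved); the function and table names are hypothetical.

```python
# SPARC V9 encodings of the two-bit PSTATE.MM field.
REQUESTED_MODEL = {0b00: "TSO", 0b01: "PSO", 0b10: "RMO", 0b11: "reserved"}


def effective_model(pstate_mm):
    """Memory model SPARC64 V actually applies for a PSTATE.MM value."""
    if pstate_mm not in REQUESTED_MODEL:
        raise ValueError("PSTATE.MM is a two-bit field")
    # SPARC64 V uses Total Store Order for every encoding, including 11.
    return "TSO"


def is_discouraged(pstate_mm):
    """Encoding 11 should not be used; it may be reassigned in the future."""
    return pstate_mm == 0b11


for mm in range(4):
    print(format(mm, "02b"), REQUESTED_MODEL[mm], effective_model(mm), is_discouraged(mm))
```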
8.4.6
Synchronizing Instruction and Data Memory
All caches in a SPARC64 V-based system (uniprocessor or multiprocessor) have a
unified cache consistency protocol and implement strong coherence between
instruction and data caches. Writes to any data cache cause invalidations to the
corresponding locations in all instruction caches; references to any instruction cache
cause corresponding modified data to be flushed and corresponding unmodified
data to be invalidated from all data caches. The flush operation is still operative in
SPARC64 V, however.
Since the FLUSH instruction synchronizes the processor, the total latency varies
depending on the situation in SPARC64 V. Assuming all prior instructions are
completed, the latency of FLUSH is 18 CPU cycles.
F.APPENDIX
A
Instruction Definitions:
SPARC64 V Extensions
in Appendix A of Commonality. If an instruction is not described in this appendix,
then no SPARC64 V implementation-dependency applies.
■ See TABLE A-1 of Commonality for the location at which general information about
the instruction can be found.
Commonality.
TABLE A-1 lists four instructions that are unique to SPARC64 V.
TABLE A-1 Implementation-Specific Instructions
Operation      Name                                      Page      V9 Ext?
FMADD(s,d)     Floating-point multiply add               page 50   ■
FMSUB(s,d)     Floating-point multiply subtract          page 50   ■
FNMADD(s,d)    Floating-point multiply negate add        page 50   ■
FNMSUB(s,d)    Floating-point multiply negate subtract   page 50   ■
Each instruction definition consists of these parts:
1. A table of the opcodes defined in the subsection with the values of the field(s)
that uniquely identify the instruction(s).
2. An illustration of the applicable instruction format(s). In these illustrations a dash
(—) indicates that the field is reserved for future versions of the architecture and
shall be 0 in any instance of the instruction. If a conforming SPARC V9
implementation encounters nonzero values in these fields, its behavior is
undefined.
3. A list of the suggested assembly language syntax, as described in Appendix G,
Assembly Language Syntax.
4. A description of the features, restrictions, and exception-causing conditions.
5. A list of exceptions that can occur as a consequence of attempting to execute the
instruction(s). Exceptions due to an instruction_access_error,
instruction_access_exception, fast_instruction_access_MMU_miss, async_data_error,
ECC_error, and interrupts are not listed because they can occur on any instruction.
Also, any instruction that is not implemented in hardware shall generate an
illegal_instruction exception (or fp_exception_other exception with
ftt = unimplemented_FPop for floating-point instructions) when it is executed.
The illegal_instruction trap can occur during chip debug on any instruction that has
been programmed into the processor’s IIU_INST_TRAP (ASI = 60₁₆, VA = 0).
These traps are also not listed under each instruction.
The following traps never occur in SPARC64 V:
■ instruction_access_MMU_miss
■ data_access_MMU_miss
■ data_access_protection
■ unimplemented_LDD
■ unimplemented_STD
■ internal_processor_error
time).
■ Load Quadword, Atomic [Physical] on page 54
■ Memory Barrier on page 55
■ Partial Store (VIS I) on page 57
■ Prefetch Data on page 57
■ Read State Register on page 58
■ SHUTDOWN (VIS I) on page 58
■ Write State Register on page 59
■ Deprecated Instructions on page 59
A.4 Block Load and Store Instructions (VIS I)
The following notes summarize the behavior of block load/store instructions in
SPARC64 V.
1. Block load and store operations are not atomic, in that they are internally
decomposed into eight independent, 8-byte load/store operations in SPARC64 V.
Each load/store is always issued and performed in the RMO memory model and
obeys all prior MEMBAR and atomic instruction-imposed ordering constraints.
2. Block load/store instructions are out of the scope of V9 memory models, meaning
that self-consistency of memory reference instructions is not always maintained if
block load/store instructions are involved in the execution flow. The following
table describes the implemented ordering constraints for block load/store
instructions with respect to the other memory reference instructions with an
operand address conflict in SPARC64 V:
Program Order for conflicting bld/bst/ld/st
first         next          Ordered/Out-of-Order
store         blockstore    Ordered
store         blockload     Ordered
load          blockstore    Ordered
load          blockload     Ordered
blockstore    store         Out-of-Order
blockstore    load          Out-of-Order
blockstore    blockstore    Out-of-Order
blockstore    blockload     Out-of-Order
blockload     store         Ordered
blockload     load          Ordered
blockload     blockstore    Ordered
blockload     blockload     Ordered
To maintain the memory ordering even for memory address conflicts, MEMBAR
instructions shall be inserted at appropriate locations in the program.
Although self-consistency with respect to the block load/store and the other
memory reference instructions is not maintained in some cases, register conflicts
between the other instructions and block load/store instructions are maintained
in SPARC64 V. The read-after-write, write-after-read, and write-after-write
obstructions between a block load/store instruction and the other arithmetic
instructions are detected and handled appropriately.
3. Block load instructions operate on the cache if the operand is present.
4. The block store with commit instruction always stores the operand in main
storage and invalidates the line in the L1D cache if it is present. The invalidation
is performed through an S_INV_REQ transaction through UPA by the system
controller.
5. The block store instruction stores the operand into main storage if it is not present
in the operand cache and the status of the line is invalid, shared, or owned. In
case the line is not present in the L1D cache and is exclusive or modified in the
L2 cache, the block store instruction modifies only the line in the L2 cache. If the line
is present in the operand cache and the status is either clean/shared or clean/owned,
the line is stored in main storage. If the line is present in the operand
cache and the status is clean/exclusive, the line in the operand cache is
invalidated and the operand is stored in the L2 cache. If the line is in the operand
cache and the status is modified/modified, the operand is stored in the operand
cache. The following table summarizes each cache status before block store and
the results of the block store. Blank cells mean that no action occurred in the
corresponding cache or memory, and the data, if it exists, is unchanged.
Cache status before bst          Action
L1              L2               L1            L2        Memory
Invalid         E, M             —             update    —
Invalid         I, S, O          —             —         update
Valid (E)       —                invalidate    update    —
Valid (M)       —                update        —         —
Valid (S, O)    —                —             —         update
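The ordering table in note 2 can be turned into a small lookup that tells software when a MEMBAR is needed between two address-conflicting accesses. This is only a sketch: the dictionary is a transcription of that table, and the names ORDERED and needs_membar are hypothetical helpers, not part of any SPARC64 V tool.

```python
# Ordering of address-conflicting accesses, transcribed from the table in note 2.
# Keys are (first, next) in program order; True means the pair is "Ordered".
ORDERED = {
    ("store", "blockstore"): True,
    ("store", "blockload"): True,
    ("load", "blockstore"): True,
    ("load", "blockload"): True,
    ("blockstore", "store"): False,
    ("blockstore", "load"): False,
    ("blockstore", "blockstore"): False,
    ("blockstore", "blockload"): False,
    ("blockload", "store"): True,
    ("blockload", "load"): True,
    ("blockload", "blockstore"): True,
    ("blockload", "blockload"): True,
}


def needs_membar(first: str, nxt: str) -> bool:
    """Return True when hardware does not order the pair, so a MEMBAR is required."""
    return not ORDERED[(first, nxt)]
```

In this transcription every pair whose first access is a block store is out of order, so a MEMBAR after each block store covers all four cases.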
Exceptions
fp_disabled
PA_watchpoint
VA_watchpoint
illegal_instruction (misaligned rd)
mem_address_not_aligned (see Block Load and Store ASIs on page 120)
data_access_exception (see Block Load and Store ASIs on page 120)
LDDF_mem_address_not_aligned (see Block Load and Store ASIs on page 120)
data_access_error
fast_data_access_MMU_miss
fast_data_access_protection
A.12 Call and Link
SPARC64 V clears the upper 32 bits of the PC value in r[15] when PSTATE.AM is
set (impl. dep. #125). The value written into r[15] is visible to the instruction in the
delay slot.
SPARC64 V has a special hardware table, called the return address stack, to predict
the return address from a subroutine. Though the return address stack achieves
better performance in normal cases, there is a special use of the CALL instruction
(call .+8) that may have an undesirable effect on the return address stack. In this
case, the CALL instruction is used to read the PC contents, not to call a subroutine. In
SPARC64 V, the return address of the CALL (PC + 8) is not stored in its return
address stack, to avoid a detrimental performance effect. When a ret or retl is
executed, the value in the return address stack is used to predict the return address.
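As a minimal sketch of the PSTATE.AM rule for CALL, the value written to r[15] can be modeled as follows. The function name and the use of plain Python integers for the PC are illustrative assumptions.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def call_link_value(pc: int, pstate_am: bool) -> int:
    """Model the value CALL writes to r[15] (hypothetical helper).

    With PSTATE.AM set, the upper 32 bits of the PC value are cleared
    (impl. dep. #125); otherwise the full 64-bit PC is written.
    """
    pc &= MASK64
    return pc & MASK32 if pstate_am else pc
```

The same masking applies to the r[rd] value written by JMPL, since the supplement cites the same implementation dependency for both instructions.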
A.24 Implementation-Dependent Instructions
Opcode     op3       Operation
IMPDEP1    11 0110   Implementation-Dependent Instruction 1
IMPDEP2    11 0111   Implementation-Dependent Instruction 2
The IMPDEP1 and IMPDEP2 instructions are completely implementation dependent.
Implementation-dependent aspects include their operation, the interpretation of bits
29–25 and 18–0 in their encodings, and which (if any) exceptions they may cause.
SPARC64 V uses IMPDEP1 to encode VIS instructions (impl. dep. #106).
SPARC64 V uses IMPDEP2B to encode the Floating-Point Multiply Add/Subtract
instructions (impl. dep. #106). See Section A.24.1, Floating-Point Multiply-Add/Subtract,
on page 50 for details.
See I.1.2, Implementation-Dependent and Reserved Opcodes, in Commonality for
information about extending the SPARC V9 instruction set by means of the
implementation-dependent instructions.
Compatibility Note – These instructions replace the CPopn instructions in
SPARC V8.
Exceptions
implementation-dependent (IMPDEP2)
A.24.1 Floating-Point Multiply-Add/Subtract
SPARC64 V uses the IMPDEP2B opcode space to encode the Floating-Point Multiply
Add/Subtract instructions.
Opcode      Variation   Size†   Operation
FMADDs      00          01      Multiply-Add Single
FMADDd      00          10      Multiply-Add Double
FMSUBs      01          01      Multiply-Subtract Single
FMSUBd      01          10      Multiply-Subtract Double
FNMADDs     11          01      Negative Multiply-Add Single
FNMADDd     11          10      Negative Multiply-Add Double
FNMSUBs     10          01      Negative Multiply-Subtract Single
FNMSUBd     10          10      Negative Multiply-Subtract Double
† 11 is reserved for quad.
Format (5)
|  10   |  rd   | 110111 |  rs1  |  rs3  | var | size | rs2 |
 31   30 29   25 24    19 18   14 13    9 8   7 6    5 4   0
Operation                     Implementation
Multiply-Add                  rd ← rs1 × rs2 + rs3
Multiply-Subtract             rd ← rs1 × rs2 − rs3
Negative Multiply-Subtract    rd ← −(rs1 × rs2 − rs3)
Negative Multiply-Add         rd ← −(rs1 × rs2 + rs3)
Assembly Language Syntax
fmadds     freg_rs1, freg_rs2, freg_rs3, freg_rd
fmaddd     freg_rs1, freg_rs2, freg_rs3, freg_rd
fmsubs     freg_rs1, freg_rs2, freg_rs3, freg_rd
fmsubd     freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmadds    freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmaddd    freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmsubs    freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmsubd    freg_rs1, freg_rs2, freg_rs3, freg_rd
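To make the Format (5) layout concrete, the following sketch packs and unpacks an instruction word using the var and size values from the opcode table. The helper names encode_fma and decode_fma are hypothetical, and the register arguments are assumed to be the raw 5-bit field values rather than assembler register names.

```python
# Field values taken from the opcode table and Format (5) of this section.
OP_FIELD = 0b10            # bits 31:30
OP3_IMPDEP2 = 0b110111     # bits 24:19
VAR = {"fmadd": 0b00, "fmsub": 0b01, "fnmsub": 0b10, "fnmadd": 0b11}
SIZE = {"s": 0b01, "d": 0b10}   # size 00 and 11 are not valid encodings


def encode_fma(mnemonic: str, rs1: int, rs2: int, rs3: int, rd: int) -> int:
    """Pack a multiply-add/subtract instruction word (hypothetical helper)."""
    base, suffix = mnemonic[:-1], mnemonic[-1]
    if base not in VAR or suffix not in SIZE:
        raise ValueError(f"unknown mnemonic: {mnemonic}")
    for value in (rs1, rs2, rs3, rd):
        if not 0 <= value < 32:
            raise ValueError("register field must fit in 5 bits")
    return (
        (OP_FIELD << 30) | (rd << 25) | (OP3_IMPDEP2 << 19) | (rs1 << 14)
        | (rs3 << 9) | (VAR[base] << 7) | (SIZE[suffix] << 5) | rs2
    )


def decode_fma(word: int):
    """Unpack a word produced by encode_fma into (mnemonic, rs1, rs2, rs3, rd)."""
    var, size = (word >> 7) & 0b11, (word >> 5) & 0b11
    base = next(name for name, v in VAR.items() if v == var)
    suffix = next(name for name, s in SIZE.items() if s == size)
    return (base + suffix, (word >> 14) & 31, word & 31,
            (word >> 9) & 31, (word >> 25) & 31)
```

For example, encode_fma("fmadds", 1, 2, 3, 4) places rs1 = 1, rs2 = 2, rs3 = 3, and rd = 4 in their respective fields with var = 00 and size = 01.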
Description
The Floating-point Multiply-Add instructions multiply the registers specified by the
rs1 field times the registers specified by the rs2 field, add that product to the
registers specified by the rs3 field, then write the result into the registers specified
by the rd field.
The Floating-point Multiply-Subtract instructions multiply the registers specified by
the rs1 field times the registers specified by the rs2 field, subtract from that
product the registers specified by the rs3 field, and then write the result into the
registers specified by the rd field.
The Floating-point Negative Multiply-Add instructions multiply the registers
specified by the rs1 field times the registers specified by the rs2 field, negate the
product, subtract from that negated value the registers specified by the rs3 field, and
then write the result into the registers specified by the rd field.
The Floating-point Negative Multiply-Subtract instructions multiply the registers
specified by the rs1 field times the registers specified by the rs2 field, negate the
product, add that negated product to the registers specified by the rs3 field, and
then write the result into the registers specified by the rd field.
All of the operations above are treated as separate multiply and add/subtract
operations in SPARC64 V. That is, a multiply operation is first performed with a
complete rounding step (as if it were a single multiply operation), and then an
add/subtract operation is performed with a complete rounding step (as if it were a single
add/subtract operation). Consequently, at most two rounding errors can be
incurred.¹
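The effect of the two rounding steps can be seen with ordinary double-precision arithmetic. The sketch below assumes round-to-nearest and uses Python floats as a stand-in for FMADDd; the single-rounding variant is computed exactly with fractions.Fraction and is included only for comparison. Both function names are illustrative.

```python
from fractions import Fraction


def fmadd_two_roundings(a: float, b: float, c: float) -> float:
    """Multiply, round, then add and round again (the behavior described above)."""
    product = a * b        # first rounding
    return product + c     # second rounding


def fmadd_single_rounding(a: float, b: float, c: float) -> float:
    """Exact a*b + c rounded once, for comparison only."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)    # the only rounding


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
two_step = fmadd_two_roundings(a, b, c)     # product rounds to 1.0, so the sum is 0.0
one_step = fmadd_single_rounding(a, b, c)   # exact result is -2**-60
```

Here the exact product 1 − 2⁻⁶⁰ is not representable in double precision, so the separately rounded multiply loses the term that a single-rounding computation would preserve.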
Subtract instruction in SPARC64 V because of its implementation characteristics. If
any trapping exception is detected in the multiply part in the process of a
Floating-point Multiply-Add/Subtract instruction, the execution of the instruction is aborted,
the exception condition is recorded in FSR.cexc and FSR.aexc, and the CPU traps
with the exception condition. The add/subtract part of the instruction is only
performed when the multiply part of the instruction does not have any trapping
exceptions.
As described in TABLE A-2, if there are trapping IEEE754 exception conditions in
either of the operations FMUL or FADD/SUB, only the trapping exception condition is
recorded in the cexc, and the aexc is not modified. If there are no trapping IEEE754
exception conditions, every nontrapping exception condition is ORed into the cexc
and the cexc is accumulated into the aexc. The boundary conditions of an
unfinished_FPop trap for Floating-point Multiply-Add/Subtract instructions are
exactly the same as for FMUL and FADD/SUB instructions; if either of the operations
detects any conditions for an unfinished_FPop trap, the Floating-point Multiply-Add/Subtract
instruction generates the unfinished_FPop exception. In this case, none of rd,
cexc, or aexc is modified.
1. Note that this implementation differs from previous SPARC64 implementations, which incurred at most one
rounding error.
TABLE A-2 Exceptions in Floating-Point Multiply-Add/Subtract Instructions
FMUL            FADD/SUB        cexc                               aexc
IEEE754 trap    —               Exception condition of FMUL        No change
No trap         IEEE754 trap    Exception condition of FADD        No change
No trap         No trap         Logical OR of the nontrapping      Logical OR of the cexc
                                exception conditions of FMUL       (above) and the aexc
                                and FADD/SUB
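The three rows of TABLE A-2 can be read as a small decision procedure. The following sketch models it, assuming the standard SPARC V9 FSR bit layout for the condition bits; OpResult and fma_exception_state are hypothetical names, and whether each step traps is taken as an input rather than derived from FSR.TEM.

```python
from typing import NamedTuple

# FSR cexc/aexc bit positions (SPARC V9 layout).
NX, DZ, UF, OF, NV = 0x01, 0x02, 0x04, 0x08, 0x10


class OpResult(NamedTuple):
    traps: bool   # True if this step raised a trapping IEEE 754 exception
    cond: int     # exception condition bits reported by this step


def fma_exception_state(fmul: OpResult, fadd: OpResult, aexc: int):
    """Return (trap_taken, cexc, aexc) following the rows of TABLE A-2."""
    if fmul.traps:
        # Row 1: the add/subtract step is not performed; aexc is unchanged.
        return True, fmul.cond, aexc
    if fadd.traps:
        # Row 2: only the trapping condition of FADD/SUB is recorded.
        return True, fadd.cond, aexc
    # Row 3: OR the nontrapping conditions, then accumulate into aexc.
    cexc = fmul.cond | fadd.cond
    return False, cexc, aexc | cexc
```

For instance, a nontrapping overflow in the multiply followed by a nontrapping inexact add yields cexc = of nx, and those bits are ORed into the previous aexc.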
Detailed contents of cexc and aexc depending on the various conditions are
described in TABLE A-3 and TABLE A-4. The following terminology is used: uf, of, inv,
and nx are nontrapping IEEE exception conditions: underflow, overflow, invalid
operation, and inexact, respectively.
TABLE A-3 Non-Trapping cexc When FSR.NS = 0
FADD
none
none
nx
nx
of nx
of nx
of nx
of nx
uf of nx
—
inv
none
nx
nx
inv
nx
inv nx
inv of nx
uf inv nx
inv
FMUL
of nx
uf nx
inv
of nx
uf nx
inv
of nx
uf nx
—
TABLE A-4 Non-Trapping aexc When FSR.NS = 1
FADD
none
none
nx
nx
of nx
of nx
of nx
of nx
—
uf nx
uf nx
uf nx
—
inv
none
nx
nx
inv
nx
inv nx
inv of nx
uf inv nx
inv
FMUL
of nx
uf nx
inv
of nx
uf nx
inv
of nx
—
—
—
—
—
In the tables, the conditions in the shaded columns are all reported as an
unfinished_FPop trap by SPARC64 V. In addition, the conditions with “—” do not
exist.
Programming Note – The Multiply Add/Subtract instructions are encoded in the
SPARC V9 IMPDEP2 opcode space, and they are specific to the SPARC64 V
implementation. They cannot be used in any programs that will be executed on any
other SPARC V9 processor, unless that implementation exactly matches the
SPARC64 V use for the IMPDEP2 opcode.
Exceptions
fp_disabled
fp_exception_ieee_754 (NV, NX, OF, UF)
illegal_instruction (size = 00₂ or 11₂) (fp_disabled is not checked for these encodings)
fp_exception_other (unfinished_FPop)
A.29 Jump and Link
SPARC64 V clears the upper 32 bits of the PC value in r[rd] when PSTATE.AM is set
(impl. dep. #125). The value written into r[rd] is visible to the instruction in the
delay slot.
If either of the low-order two bits of the jump address is nonzero, a
mem_address_not_aligned exception occurs. However, when the JMPL instruction
causes a mem_address_not_aligned trap, DSFSR and DSFAR are not updated.
If the JMPL instruction has rd = 15, SPARC64 V stores PC + 8 in a hardware table
called the return address stack (RAS). When a ret (jmpl %i7+8, %g0) or retl (jmpl
%o7+8, %g0) is executed, the value in the RAS is used to predict the return address.
JMPL with rd = 0 can be used to return from a subroutine. The typical return
address is “r[31] + 8” if a nonleaf routine (one that uses the SAVE instruction) is
entered by a CALL instruction, or “r[15] + 8” if a leaf routine (one that does not
use the SAVE instruction) is entered by a CALL instruction or by a JMPL instruction
with rd = 15.
A.30 Load Quadword, Atomic [Physical]
The Load Quadword ASIs in this section are specific to SPARC64 V, as an extension
to SPARC JPS1.
Format (3) LDDA
Description
ASIs 34₁₆ and 3C₁₆ are used with the LDDA instruction to atomically read a 128-bit
■ TTE.NFO = 0
■ TTE.CP = 1
■ TTE.CV = 0
■ TTE.E = 0
■ TTE.P = 1
■ TTE.W = 0
Note – TTE.IE depends on the endianness of the ASI. When the ASI is 034₁₆,
TTE.IE = 0; TTE.IE = 1 when the ASI is 03C₁₆.
Therefore, the atomic quad load physical instruction can only be applied to a
cacheable memory area. Semantically, ASI_QUAD_LDD_PHYS{_L} (034₁₆ and
03C₁₆) is a combination of ASI_NUCLEUS_QUAD_LDD and ASI_PHYS_USE_EC.
With respect to little endian memory, a Load Quadword Atomic instruction behaves
as if it comprises two 64-bit loads, each of which is byte-swapped independently
before being written into its respective destination register.
Exceptions:
privileged_action
PA_watchpoint (recognized on only the first 8 bytes of a transfer)
illegal_instruction (misaligned rd)
mem_address_not_aligned
data_access_exception
data_access_error
fast_data_access_MMU_miss
fast_data_access_protection
A.35 Memory Barrier
Format (3)
|  10   |   0   |  op3   | 0 1111 | i=1 |   —    | cmask | mmask |
 31   30 29   25 24    19 18    14   13  12     7 6     4 3     0
Assembly Language Syntax
membar    membar_mask
Description
The memory barrier instruction, MEMBAR, has two complementary functions: to
express order constraints between memory references and to provide explicit control
of memory-reference completion. The membar_mask field in the suggested assembly
language is the concatenation of the cmask and mmask instruction fields.
The mmask field is encoded in bits 3 through 0 of the instruction. TABLE A-5 specifies
the order constraint that each bit of mmask (selected when set to 1) imposes on
memory references appearing before and after the MEMBAR. From zero to four mask
bits can be selected in the mmask field.
TABLE A-5 Order Constraints Imposed by mmask Bits
Mask Bit    Name           Description
mmask<3>    #StoreStore    The effects of all stores appearing before the MEMBAR instruction must be
                           visible to all processors before the effect of any stores following the MEMBAR.
                           Equivalent to the deprecated STBAR instruction. Has no effect on SPARC64 V
                           since all stores are performed in program order.
mmask<2>    #LoadStore     All loads appearing before the MEMBAR instruction must have been performed
                           before the effects of any stores following the MEMBAR are visible to any other
                           processor. Has no effect on SPARC64 V since all stores are performed in
                           program order and must occur after performance of any load.
mmask<1>    #StoreLoad     The effects of all stores appearing before the MEMBAR instruction must be
                           visible to all processors before loads following the MEMBAR may be performed.
mmask<0>    #LoadLoad      All loads appearing before the MEMBAR instruction must have been performed
                           before any loads following the MEMBAR may be performed. Has no effect on
                           SPARC64 V since all loads are performed after any prior loads.
The cmask field is encoded in bits 6 through 4 of the instruction. Bits in the cmask
field, described in TABLE A-6, specify additional constraints on the order of memory
references and the processing of instructions. If cmask is zero, then MEMBAR enforces
the partial ordering specified by the mmask field; if cmask is nonzero, then
completion and partial order constraints are applied.
TABLE A-6 Bits in the cmask Field
Mask Bit    Function           Name          Description
cmask<2>    Synchronization    #Sync         All operations (including nonmemory reference operations)
            barrier                          appearing before the MEMBAR must have been performed, and
                                             the effects of any exceptions become visible before any
                                             instruction after the MEMBAR may be initiated.
cmask<1>    Memory issue       #MemIssue     All memory reference operations appearing before the MEMBAR
            barrier                          must have been performed before any memory operation after
                                             the MEMBAR may be initiated. Equivalent to #Sync in
                                             SPARC64 V.
cmask<0>    Lookaside          #Lookaside    A store appearing before the MEMBAR must complete before
            barrier                          any load following the MEMBAR referencing the same address
                                             can be initiated. Equivalent to #Sync in SPARC64 V.
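The concatenation of cmask and mmask into membar_mask, as described for TABLE A-5 and TABLE A-6, can be sketched as follows. The helper names are hypothetical; the bit positions follow the two tables, and the set of names with no effect on SPARC64 V is taken from the TABLE A-5 descriptions.

```python
# Bit values within each field, per TABLE A-5 (mmask) and TABLE A-6 (cmask).
MMASK = {"#LoadLoad": 0x1, "#StoreLoad": 0x2, "#LoadStore": 0x4, "#StoreStore": 0x8}
CMASK = {"#Lookaside": 0x1, "#MemIssue": 0x2, "#Sync": 0x4}

# mmask bits that TABLE A-5 describes as having no effect on SPARC64 V.
NO_EFFECT_ON_SPARC64_V = {"#StoreStore", "#LoadStore", "#LoadLoad"}


def membar_mask(*names: str) -> int:
    """Build the 7-bit membar_mask: cmask in bits 6:4, mmask in bits 3:0."""
    mmask = cmask = 0
    for name in names:
        if name in MMASK:
            mmask |= MMASK[name]
        elif name in CMASK:
            cmask |= CMASK[name]
        else:
            raise ValueError(f"unknown MEMBAR mask name: {name}")
    return (cmask << 4) | mmask


def effective_on_sparc64_v(*names: str) -> list:
    """Return the requested constraints that are not already implied by the hardware."""
    return [name for name in names if name not in NO_EFFECT_ON_SPARC64_V]
```

Under this reading, only #StoreLoad and the cmask bits add ordering beyond what the hardware already enforces.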
A.42 Partial Store (VIS I)
Please refer to Section A.42 of Commonality for general details.
Watchpoint exceptions on partial store instructions occur conservatively on
SPARC64 V. The DCUCR Data Watchpoint masks are only checked for a nonzero value
(watchpoint enabled). The byte store mask (r[rs2]) in the partial store instruction
is ignored, and a watchpoint exception can occur even if the mask is zero (that is, no
store will take place) (impl. dep. #249).
transaction with zero-byte mask.
Exceptions:
fp_disabled
PA_watchpoint
VA_watchpoint
illegal_instruction (misaligned rd)
mem_address_not_aligned (see Partial Store ASIs on page 120)
data_access_exception (see Partial Store ASIs on page 120)
LDDF_mem_address_not_aligned (see Partial Store ASIs on page 120)
data_access_error
fast_data_access_MMU_miss
fast_data_access_protection
A.49 Prefetch Data
Please refer to Section A.49, Prefetch Data, of Commonality for principal information.
The prefetcha instruction of SPARC64 V works for the following ASIs.
■ ASI_PRIMARY (080₁₆), ASI_PRIMARY_LITTLE (088₁₆)
■ ASI_SECONDARY (081₁₆), ASI_SECONDARY_LITTLE (089₁₆)
■ ASI_NUCLEUS (04₁₆), ASI_NUCLEUS_LITTLE (0C₁₆)
■ ASI_PRIMARY_AS_IF_USER (010₁₆), ASI_PRIMARY_AS_IF_USER_LITTLE (018₁₆)
■ ASI_SECONDARY_AS_IF_USER (011₁₆), ASI_SECONDARY_AS_IF_USER_LITTLE (019₁₆)
If an ASI other than the above is specified, prefetcha is executed as a nop.
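A minimal sketch of the ASI check follows, using the ASI numbers from the list above. The dictionary and function names are hypothetical.

```python
# ASIs for which prefetcha performs a prefetch, per the list above.
PREFETCHA_ASIS = {
    0x80: "ASI_PRIMARY",
    0x88: "ASI_PRIMARY_LITTLE",
    0x81: "ASI_SECONDARY",
    0x89: "ASI_SECONDARY_LITTLE",
    0x04: "ASI_NUCLEUS",
    0x0C: "ASI_NUCLEUS_LITTLE",
    0x10: "ASI_PRIMARY_AS_IF_USER",
    0x18: "ASI_PRIMARY_AS_IF_USER_LITTLE",
    0x11: "ASI_SECONDARY_AS_IF_USER",
    0x19: "ASI_SECONDARY_AS_IF_USER_LITTLE",
}


def prefetcha_is_nop(asi: int) -> bool:
    """Return True when prefetcha with this ASI is executed as a nop."""
    return asi not in PREFETCHA_ASIS
```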
TABLE A-7 describes prefetch variants implemented in SPARC64 V.
TABLE A-7 Prefetch Variants
fcn      Fetch to:                   Status   Description
0        L1D                         S
1        L2                          S
2        L1D                         M
3        L2                          M
4        —                           —        NOP
5-15     reserved (SPARC V9)                  illegal_instruction exception is signalled.
16-19    implementation dependent             NOP
20       L1D                         S        If an access causes an mTLB miss,
                                              fast_data_access_MMU_miss exception is signalled.
21       L2                          S        If an access causes an mTLB miss,
                                              fast_data_access_MMU_miss exception is signalled.
22       L1D                         M        If an access causes an mTLB miss,
                                              fast_data_access_MMU_miss exception is signalled.
23       L2                          M        If an access causes an mTLB miss,
                                              fast_data_access_MMU_miss exception is signalled.
24-31    implementation dependent             NOP
A.51 Read State Register
In SPARC64 V, an RDPCR instruction will generate a privileged_action exception if
PSTATE.PRIV = 0 and PCR.PRIV = 1. If PSTATE.PRIV = 0 and PCR.PRIV = 0,
RDPCR will not cause any access privilege violation exception (impl. dep. #250).
A.70 SHUTDOWN (VIS I)
In SPARC64 V, SHUTDOWN acts as a NOP in privileged mode (impl. dep. #206).
A.70 Write State Register
In SPARC64 V, a WRPCR instruction will cause a privileged_action exception if
PSTATE.PRIV = 0 and PCR.PRIV = 1. If PSTATE.PRIV = 0 and PCR.PRIV = 0,
WRPCR causes a privileged_action exception only when an attempt is made to change
(that is, write 1 to) PCR.PRIV (impl. dep. #250).
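The RDPCR and WRPCR privilege rules can be summarized as two predicates. This is a sketch with hypothetical function names; it assumes that accesses made with PSTATE.PRIV = 1 never raise privileged_action, which the text implies but does not state explicitly.

```python
def rdpcr_traps(pstate_priv: int, pcr_priv: int) -> bool:
    """True when RDPCR raises privileged_action."""
    return pstate_priv == 0 and pcr_priv == 1


def wrpcr_traps(pstate_priv: int, pcr_priv: int, new_pcr_priv: int) -> bool:
    """True when WRPCR raises privileged_action.

    new_pcr_priv is the value the instruction attempts to write to PCR.PRIV.
    """
    if pstate_priv == 1:
        return False            # assumption: privileged code may always write PCR
    if pcr_priv == 1:
        return True             # nonprivileged access while PCR.PRIV = 1
    return new_pcr_priv == 1    # nonprivileged attempt to set PCR.PRIV
```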
A.71 Deprecated Instructions
The deprecated instructions in A.71 of Commonality are provided only for
compatibility with previous versions of the architecture. They should not be used in
new software.
A.71.10 Store Barrier
In SPARC64 V, STBAR behaves as a NOP since the hardware memory models always
enforce the semantics of these MEMBARs for all memory accesses.
F.APPENDIX
B
IEEE Std 754-1985 Requirements for
SPARC V9
The IEEE Std 754-1985 floating-point standard contains a number of implementation
dependencies.
Please see Appendix B of Commonality for choices for these implementation
dependencies, to ensure that SPARC V9 implementations are as consistent as
possible.
Following is information specific to the SPARC64 V implementation of SPARC V9 in
these sections:
■ Traps Inhibiting Results on page 61
■ Floating-Point Nonstandard Mode on page 61
B.1 Traps Inhibiting Results
Please refer to Section B.1 of Commonality.
The SPARC64 V hardware, in conjunction with kernel or emulation code, produces
the results described in this section.
B.6 Floating-Point Nonstandard Mode
In this section, the hardware boundary conditions for the unfinished_FPop exception
and the nonstandard mode of SPARC64 V floating-point hardware are discussed.
SPARC64 V floating-point hardware has its specific range of computation. If either
the values of input operands or the value of the intermediate result shows that the
computation may not fall in the range that hardware provides, SPARC64 V generates
an fp_exception_other exception (tt = 022₁₆) with FSR.ftt = 02₁₆ (unfinished_FPop)
and the operation is taken over by software.
The kernel emulation routine completes the remaining floating-point operation in
accordance with the IEEE 754-1985 floating-point standard (impl. dep. #3).
SPARC64 V implements a nonstandard mode, enabled when FSR.NS is set (see
FSR_nonstandard_fp (NS) on page 18). Depending on the setting in FSR.NS, the
behavior of SPARC64 V with respect to the floating-point computation varies.
B.6.1 fp_exception_other Exception (ftt=unfinished_FPop)
SPARC64 V may invoke an fp_exception_other (tt = 022₁₆) exception with FSR.ftt =
unfinished_FPop (ftt = 02₁₆) in FsTOd, FdTOs, FADD(s,d), FSUB(s,d),
FsMULd(s,d), FMUL(s,d), FDIV(s,d), FSQRT(s,d) floating-point instructions. In
addition, Floating-point Multiply-Add/Subtract instructions generate the exception,
since the instruction is the combination of a multiply and an add/subtract operation:
FMADD(s,d), FMSUB(s,d), FNMADD(s,d), and FNMSUB(s,d).
The following basic policies govern the detection of boundary conditions:
1. When one of the operands is a denormalized number and the other operand is a
normal non-zero floating-point number (except for a NaN or an infinity), an
fp_exception_other with unfinished_FPop condition is signalled. The cases in which
the result is a zero or an overflow are excluded.
2. When both operands are denormalized numbers, except for the cases in which the
result is a zero or an overflow, an fp_exception_other with unfinished_FPop condition
is signalled.
3. When both operands are normal, the result before rounding is a denormalized
number, and TEM.UFM = 0, an fp_exception_other with unfinished_FPop condition
is signalled, except for the cases in which the result is a zero.
When the result is expected to be a constant, such as an exact zero or an infinity, and
an insignificant computation will furnish the result, SPARC64 V tries to calculate the
result without signalling an unfinished_FPop exception.
amount of hardware. SPARC64 V detects approximate boundary conditions by
calculating the exponent intermediate result (the exponent before rounding) from
input operands, to avoid the hardware cost. Since the computation of the boundary
conditions is approximate, the detection of a zero result or an overflow result shall
be pessimistic. SPARC64 V generates an unfinished_FPop exception pessimistically.
The equations to calculate the result exponent to detect the boundary conditions
from the input exponents are presented in TABLE B-1, where Er is the approximation
of the biased result exponent before rounding and is calculated only from the input
exponents (esrc1, esrc2). Er is to be used for detecting the boundary condition for an
unfinished_FPop.
TABLE B-1 Result Exponent Approximation for Detecting unfinished_FPop Boundary
Conditions
Operation    Formula
fmuls        Er = esrc1 + esrc2 − 126
fmuld        Er = esrc1 + esrc2 − 1022
fdivs        Er = esrc1 − esrc2 + 126
fdivd        Er = esrc1 − esrc2 + 1022
esrc1 and esrc2 are the biased exponents of the input operands. When the
From Er, eres is calculated. eres is a biased result exponent, after mantissa alignment
and before rounding, where the appropriate adjustment of the exponent is applied to
the result mantissa: left-shifting or right-shifting the mantissa to the implicit 1 at the
left of the binary point, subtracting or adding the shift-amount to the exponent. The
result mantissa is assumed to be 1.xxxx in calculating eres. If the result is a
denormalized number, eres is less than zero.
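The TABLE B-1 formulas are simple enough to evaluate directly. The sketch below assumes IEEE 754 double-precision layout for extracting a biased exponent from a Python float; the function names are hypothetical, and the step from Er to eres (mantissa alignment) is not modeled.

```python
import struct


def biased_exponent_d(x: float) -> int:
    """Biased exponent field (11 bits) of a double-precision value."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF


def approx_result_exponent(op: str, esrc1: int, esrc2: int) -> int:
    """Er from TABLE B-1, computed only from the biased input exponents."""
    if op == "fmuls":
        return esrc1 + esrc2 - 126
    if op == "fmuld":
        return esrc1 + esrc2 - 1022
    if op == "fdivs":
        return esrc1 - esrc2 + 126
    if op == "fdivd":
        return esrc1 - esrc2 + 1022
    raise ValueError(f"no TABLE B-1 formula for {op}")
```

For example, multiplying 2.0 by 4.0 in double precision gives biased input exponents 1024 and 1025, so Er = 1027.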
TABLE B-2 describes the boundary condition of each floating-point instruction that
generates an unfinished_FPop exception.
TABLE B-2 unfinished_FPop Boundary Conditions
Operation          Boundary Conditions
FdTOs              −25 < eres < 1 and TEM.UFM = 0.
FsTOd              Second operand (rs2) is a denormalized number.
FADDs, FSUBs,      1. One of the operands is a denormalized number, and the other operand is a normal,
FADDd, FSUBd          nonzero floating-point number (except for a NaN and an infinity).¹
                   2. Both operands are denormalized numbers.
                   3. Both operands are normal nonzero floating-point numbers (except for a NaN and
                      an infinity), eres < 1, and TEM.UFM = 0.
FMULs, FMULd       1. One of the operands is a denormalized number, the other operand is a normal,
                      nonzero floating-point number (except for a NaN and an infinity), and
                      single precision: −25 < Er
                      double precision: −54 < Er
                   2. Both operands are normal, nonzero floating-point numbers (except for a NaN and
                      an infinity), TEM.UFM = 0, and
                      single precision: −25 < eres < 1
                      double precision: −54 < eres < 1
FsMULd             1. One of the operands is a denormalized number, and the other operand is a normal,
                      nonzero floating-point number (except for a NaN and an infinity).
                   2. Both operands are denormalized numbers.
FDIVs, FDIVd       1. The dividend (operand1; rs1) is a normal, nonzero floating-point number (except
                      for a NaN and an infinity), the divisor (operand2; rs2) is a denormalized number,
                      and
                      single precision: Er < 255
                      double precision: Er < 2047
                   2. The dividend (operand1; rs1) is a denormalized number, the divisor (operand2;
                      rs2) is a normal, nonzero floating-point number (except for a NaN and an infinity),
                      and
                      single precision: −25 < Er
                      double precision: −54 < Er
                   3. Both operands are denormalized numbers.
                   4. Both operands are normal, nonzero floating-point numbers (except for a NaN and
                      an infinity), TEM.UFM = 0 and
                      single precision: −25 < eres < 1
FSQRTs, FSQRTd     The input operand (operand2; rs2) is a positive nonzero and is a denormalized
                   number.
1. Operation of 0 and denormalized number generates a result in accordance with the IEEE754-1985 standard.
Pessimistic Zero
If a condition in TABLE B-3 is true, SPARC64 V generates the result as a pessimistic
zero, meaning that the result is a denormalized minimum or a zero, depending on
the rounding mode (FSR.RD).
TABLE B-3 Conditions for a Pessimistic Zero
                 Conditions
Operations       One operand is denormalized¹    Both are denormalized    Both are normal fp-number²
FdTOs            always                          —                        eres ≤ −25
FMULs, FMULd     single precision: Er ≤ −25      Always                   single precision: eres ≤ −25
                 double precision: Er ≤ −54                               double precision: eres ≤ −54
FDIVs, FDIVd     single precision: Er ≤ −25      Never                    single precision: eres ≤ −25
                 double precision: Er ≤ −54                               double precision: eres ≤ −54
1. Both operands are non-zero, non-NaN, and non-infinity numbers.
2. Both may be zero, but both are non-NaN and non-infinity numbers.
Pessimistic Overflow
If a condition in TABLE B-4 is true, SPARC64 V regards the operation as having an
overflow condition.
TABLE B-4 Pessimistic Overflow Conditions
Operations    Conditions
FDIVs         The divisor (operand2; rs2) is a denormalized number and Er ≥ 255.
FDIVd         The divisor (operand2; rs2) is a denormalized number and Er ≥ 2047.
B.6.2 Operation Under FSR.NS = 1
When FSR.NS = 1 (nonstandard mode), SPARC64 V zeroes all the input
denormalized operands before the operation and signals an inexact exception if
enabled. If the operation generates a denormalized result, SPARC64 V zeroes the
result and also signals an inexact exception if enabled. The following list defines the
operation in detail.
■ If either operand is a denormalized number and both operands are non-zero, non-NaN,
and non-infinity numbers, the input denormalized operand is replaced with
a zero with the same sign, and the operation is performed. If enabled, an inexact
exception is signalled; an fp_exception_ieee_754 (tt = 021₁₆) is generated, with
nxc = 1 in FSR.cexc (FSR.ftt = 01₁₆; IEEE754_exception). However, if the
condition is detected, or if the operation is FSQRT(s,d) and an invalid_operation
condition is detected, the inexact condition is not reported.
■ If the result before rounding is a denormalized number, the result is flushed to a
zero with a same sign and signals either an underflow exception or an inexact
exception, depending on FSR.TEM.
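The two flushing rules above can be illustrated with a small software model. This is only a sketch: the function names and the event labels ("input_flushed", "result_flushed") are hypothetical, it models a double-precision multiply only, and it tests the rounded host result rather than the result before rounding, so it may differ from hardware at the boundary of the denormalized range. Trap generation and FSR updates are not modelled.

```python
import math
import struct


def is_denormal(x: float) -> bool:
    """True if x is a nonzero double whose exponent field is zero."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent == 0 and fraction != 0


def flush(x: float) -> float:
    """Replace a denormalized value with a zero of the same sign."""
    return math.copysign(0.0, x) if is_denormal(x) else x


def ns_multiply(a: float, b: float):
    """Hypothetical model of a double multiply under FSR.NS = 1.

    Returns (result, events); events records which flushing rule applied.
    """
    events = set()
    # Rule 1 applies only when both operands are non-zero, non-NaN, non-infinity.
    ordinary = all(math.isfinite(v) and v != 0.0 for v in (a, b))
    if ordinary and (is_denormal(a) or is_denormal(b)):
        a, b = flush(a), flush(b)
        events.add("input_flushed")
    result = a * b
    # Rule 2: a denormalized result is flushed to a zero of the same sign.
    if is_denormal(result):
        result = math.copysign(0.0, result)
        events.add("result_flushed")
    return result, events
```

In this model a denormalized operand paired with an infinity is left untouched, because the first rule requires both operands to be ordinary numbers.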
As observed from the preceding, when FSR.NS = 1, SPARC64 V generates neither
an unfinished_FPop exception nor a denormalized number as a result. TABLE B-5
summarizes the behavior of SPARC64 V floating-point hardware depending on
FSR.NS.
Note – The results and behavior of SPARC64 V shown in the shaded columns of
Table B-5 and Table B-6 conform to the IEEE 754-1985 standard.
Note – Throughout Table B-5 and Table B-6, lowercase exception conditions such as
nx, uf, of, dz, and nv are nontrapping IEEE 754 exceptions. Uppercase exception
conditions such as NX, UF, OF, DZ, and NV are trapping IEEE 754 exceptions.
TABLE B-5
TABLE B-6 describes how SPARC64 V behaves when FSR.NS= 1 (nonstandard mode).
TABLE B-6 Nonarithmetic Operations Under FSR.NS = 1

Operations        op1=denorm  op2=denorm       UFM  NXM  DVM  NVM  Result
FsTOd             —           Yes              —    1    —    —    NX
                                               —    0    —    —    nx, a signed zero
FdTOs             —           Yes              1    —    —    —    UF
                                               0    1    —    —    NX
                                               0    0    —    —    uf + nx, a signed zero
FADDs, FSUBs,     Yes         No               —    1    —    —    NX
FADDd, FSUBd                                   —    0    —    —    nx, op2
                  No          Yes              —    1    —    —    NX
                                               —    0    —    —    nx, op1
                  Yes         Yes              —    1    —    —    NX
                                               —    0    —    —    nx, a signed zero
FMULs, FMULd,     Yes         —                —    1    —    —    NX
FsMULd                                         —    0    —    —    nx, a signed zero
                  —           Yes              —    1    —    —    NX
                                               —    0    —    —    nx, a signed zero
FDIVs, FDIVd      Yes         No               —    1    —    —    NX
                                               —    0    —    —    nx, a signed zero
                  No          Yes              —    —    1    —    DZ
                                               —    —    0    —    dz, a signed infinity
                  Yes         Yes              —    —    —    1    NV
                                               —    —    —    0    nv, dNaN (1)
FSQRTs, FSQRTd    —           Yes and op2 > 0  —    1    —    —    NX
                                               —    0    —    —    nx, zero
                  —           Yes and op2 < 0  —    —    —    1    NV
                                               —    —    —    0    nv, dNaN (1)

1. A single precision dNaN is 7FFF.FFFF₁₆ and a double precision dNaN is
7FFF.FFFF.FFFF.FFFF₁₆.
F.APPENDIX
C
Implementation Dependencies
This appendix summarizes implementation dependencies. In SPARC V9 and SPARC
JPS1, the notation “IMPL. DEP. #nn:” identifies the definition of an implementation
dependency; the notation “(impl. dep. #nn)” identifies a reference to an
implementation dependency. These dependencies are described by their number nn
in TABLE C-1 on page 70. These numbers have been removed from the body of this
document for SPARC64 V to make the document more readable. TABLE C-1 has been
modified to include descriptions of the manner in which SPARC64 V has resolved
each implementation dependency.
Note – SPARC International maintains a document, Implementation Characteristics of
Current SPARC-V9-based Products, Revision 9.x, that describes the implementation-
dependent design features of all SPARC V9-compliant implementations. Contact
SPARC International for this document at
home page: www.sparc.org
email: info@sparc.org
C.1
Definition of an Implementation
Dependency
Please refer to Section C.1 of Commonality.
C.2
Hardware Characteristics
Please refer to Section C.2 of Commonality.
C.3
Implementation Dependency Categories
Please refer to Section C.3 of Commonality.
C.4
List of Implementation Dependencies
TABLE C-1 provides a complete list of how each implementation dependency is
treated in the SPARC64 V implementation.
TABLE C-1 SPARC64 V Implementation Dependencies (1 of 11)
Nbr     SPARC64 V Implementation Notes    Page
1       Software emulation of instructions    —
        The operating system emulates all instructions that generate
        illegal_instruction or unimplemented_FPop exceptions.
2       Number of IU registers    —
        SPARC64 V supports eight register windows (NWINDOWS = 8).
        SPARC64 V supports an additional two global register sets (Interrupt
        globals and MMU globals) for a total of 160 integer registers.
3       Incorrect IEEE Std 754-1985 results    62
        See Section B.6, Floating-Point Nonstandard Mode, on page 61 for details.
4–5     Reserved.
6       I/O registers privileged status    —
        This dependency is beyond the scope of this publication. It should be
        defined in each system that uses SPARC64 V.
7       I/O register definitions    —
        This dependency is beyond the scope of this publication. It should be
        defined in each system that uses SPARC64 V.
8       RDASR/WRASR target registers    —
        See A.50 and A.70 in Commonality for details of implementation-dependent
        RDASR/WRASR instructions.
TABLE C-1 SPARC64 V Implementation Dependencies (2 of 11)
Nbr     SPARC64 V Implementation Notes    Page
9       RDASR/WRASR privileged status    —
        See A.50 and A.70 in Commonality for details of implementation-dependent
        RDASR/WRASR instructions.
10–12   Reserved.
13      VER.impl    20
        VER.impl = 5 for the SPARC64 V processor.
14–15   Reserved.    —
16      IU deferred-trap queue    24
        SPARC64 V neither has nor needs an IU deferred-trap queue.
17      Reserved.    —
18      Nonstandard IEEE 754-1985 results    18, 62
        SPARC64 V flushes denormal operands and results to zero when
        FSR.NS = 1. For the treatment of denormalized numbers, see
        Section B.6, Floating-Point Nonstandard Mode, on page 61.
19      FPU version, FSR.ver    18
        FSR.ver = 0 for SPARC64 V.
20–21   Reserved.
22      FPU TEM, cexc, and aexc    19
        SPARC64 V implements all bits in the TEM, cexc, and aexc fields in
        hardware.
23      Floating-point traps    24
        In SPARC64 V, floating-point traps are always precise; no FQ is needed.
24      FPU deferred-trap queue (FQ)    24
        SPARC64 V neither has nor needs a floating-point deferred-trap queue.
25      RDPR of FQ with nonexistent FQ    24
        Attempting to execute an RDPR of the FQ causes an illegal_instruction
        exception.
26–28   Reserved.    —
29      Address space identifier (ASI) definitions    —
        The ASIs that are supported by SPARC64 V are defined in Appendix L,
        Address Space Identifiers.
30      ASI address decoding    117
        SPARC64 V supports all of the listed ASIs.
31      Catastrophic error exceptions    138
        SPARC64 V contains a watchdog timer that times out after no instruction
        has been committed for a specified number of cycles. If the timer times out,
        the CPU tries to invoke an async_data_error trap. If the counter continues to
        count and reaches 2^33, the processor enters error_state. Upon entry to
        error_state, the processor optionally generates a WDR reset to recover
        from error_state.
TABLE C-1 SPARC64 V Implementation Dependencies (3 of 11)
Nbr     SPARC64 V Implementation Notes    Page
32      Deferred traps    37, 149
        SPARC64 V signals a deferred trap in a few of its severe error conditions.
        SPARC64 V does not contain a deferred trap queue.
33      Trap precision    37
        There are no deferred traps in SPARC64 V other than the trap caused by a
        few severe error conditions. All traps that occur as the result of program
        execution are precise.
34      Interrupt clearing    —
        For details of interrupt handling see Appendix N, Interrupt Handling.
35      Implementation-dependent traps    39, 39
        SPARC64 V supports the following traps that are implementation
        dependent:
        • interrupt_vector_trap (tt = 060₁₆)
        • PA_watchpoint (tt = 061₁₆)
        • VA_watchpoint (tt = 062₁₆)
        • ECC_error (tt = 063₁₆)
        • fast_instruction_access_MMU_miss (tt = 064₁₆ through 067₁₆)
        • fast_data_access_MMU_miss (tt = 068₁₆ through 06B₁₆)
        • fast_data_access_protection (tt = 06C₁₆ through 06F₁₆)
        • async_data_error (tt = 040₁₆)
36      Trap priorities    38
        SPARC64 V’s implementation-dependent traps have the following
        priorities:
        • interrupt_vector_trap (priority = 16)
        • PA_watchpoint (priority = 12)
        • VA_watchpoint (priority = 1)
        • fast_instruction_access_MMU_miss (priority = 2)
        • fast_data_access_MMU_miss (priority = 12)
        • fast_data_access_protection (priority = 12)
        • async_data_error (priority = 2)
37      Reset trap    37
        SPARC64 V implements power-on reset (POR) and watchdog reset.
38      Effect of reset trap on implementation-dependent registers    141
        See Section O.3, Processor State after Reset and in RED_state, on page 141.
39      Entering error_state on implementation-dependent errors    36
        CPU watchdog timeout at 2^33 ticks, a normal trap, or an SIR at TL = MAXTL
        causes the CPU to enter error_state.
40      Error_state processor state    36
        SPARC64 V optionally takes a watchdog reset trap after entry to
        error_state. Most error-logging register state will be preserved. (See also
        impl. dep. #254.)
41      Reserved.
TABLE C-1 SPARC64 V Implementation Dependencies (4 of 11)
Nbr     SPARC64 V Implementation Notes    Page
42      FLUSH instruction    —
        SPARC64 V implements the FLUSH instruction in hardware.
43      Reserved.
44      Data access FPU trap    —
        The destination register(s) are unchanged if an access error occurs.
45–46   Reserved.
47      RDASR    —
        See A.50, Read State Register, in Commonality for details.
48      WRASR    —
        See A.70, Write State Register, in Commonality for details.
49–54   Reserved.
55      Floating-point underflow detection    —
        See FSR_underflow in Section 5.1.7 of Commonality for details.
56–100  Reserved.
101     Maximum trap level    20
        MAXTL = 5.
102     Clean windows trap    —
        SPARC64 V generates a clean_window exception; register windows are
        cleaned in software.
103     Prefetch instructions    —
        Prefetch instructions on SPARC64 V have the following
        implementation-dependent characteristics:
        • The prefetches have observable effects in privileged code.
        • Prefetch variants 0–3 do not cause a fast_data_access_MMU_miss trap,
          because the prefetch is dropped when a fast_data_access_MMU_miss
          condition happens. On the other hand, prefetch variants 20–23 cause
          data_access_MMU_miss traps on TLB misses.
        • All prefetches are for 64-byte cache lines, which are aligned on a 64-byte
          boundary.
        • See Section A.49, Prefetch Data, on page 57, for implemented variations
          and their characteristics.
        • Prefetches will work normally if the ASI is ASI_PRIMARY,
          ASI_SECONDARY, ASI_NUCLEUS, ASI_PRIMARY_AS_IF_USER,
          ASI_SECONDARY_AS_IF_USER, or one of their little-endian pairs.
104     VER.manuf    20
        VER.manuf = 0004₁₆. The least significant 8 bits are Fujitsu’s JEDEC
        manufacturing code.
105     TICK register    19
        SPARC64 V implements 63 bits of the TICK register; it increments on every
        clock cycle.
TABLE C-1 SPARC64 V Implementation Dependencies (5 of 11)
Nbr     SPARC64 V Implementation Notes    Page
106     IMPDEPn instructions    49
        SPARC64 V uses the IMPDEP2 opcode for the Multiply Add/Subtract
        instructions. SPARC64 V also conforms to Sun’s specification for VIS-1 and
        VIS-2.
107     Unimplemented LDD trap    —
        SPARC64 V implements LDD in hardware.
108     Unimplemented STD trap    —
        SPARC64 V implements STD in hardware.
109     LDDF_mem_address_not_aligned    —
        If the address is word aligned but not doubleword aligned, SPARC64 V
        generates the LDDF_mem_address_not_aligned exception. The trap handler
        software emulates the instruction.
110     STDF_mem_address_not_aligned    —
        If the address is word aligned but not doubleword aligned, SPARC64 V
        generates the STDF_mem_address_not_aligned exception. The trap handler
        software emulates the instruction.
111     LDQF_mem_address_not_aligned    —
        SPARC64 V generates an illegal_instruction exception for all LDQFs. The
        software emulates the instruction.
112     STQF_mem_address_not_aligned    —
        SPARC64 V generates an illegal_instruction exception for all STQFs. The
        processor does not perform the check for fp_disabled. The trap handler
        software emulates the instruction.
113     Implemented memory models    42
        SPARC64 V implements Total Store Order (TSO) for all the memory models
        specified in PSTATE.MM. See Chapter 8, Memory Models, for details.
114     RED_state trap vector address (RSTVaddr)    36
        RSTVaddr is a constant in SPARC64 V, where:
        VA = FFFF FFFF F000 0000₁₆ and
        PA = 07FF F000 0000₁₆
115     RED_state processor state    36
        See RED_state on page 36 for details of implementation-specific actions in
        RED_state.
116     SIR_enable control flag    —
        See Section A.60, SIR, in Commonality for details.
117     MMU disabled prefetch behavior    91
        Prefetch and nonfaulting Load always succeed when the MMU is disabled.
118     Identifying I/O locations    —
        This dependency is beyond the scope of this publication. It should be
        defined in a system that uses SPARC64 V.
TABLE C-1 SPARC64 V Implementation Dependencies (6 of 11)
Nbr     SPARC64 V Implementation Notes    Page
119     Unimplemented values for PSTATE.MM    42
        Writing 11₂ into PSTATE.MM causes the machine to use the TSO memory
        model. However, this encoding should not be used, because future versions
        of SPARC64 V may use this encoding for a new memory model.
120     Coherence and atomicity of memory operations    —
        Although SPARC64 V implements the UPA-based cache coherency
        mechanism, this dependency is beyond the scope of this publication. It
        should be defined in a system that uses SPARC64 V.
121     Implementation-dependent memory model    —
        SPARC64 V implements TSO, PSO, and RMO memory models. See
        Chapter 8, Memory Models, for details.
        Accesses to pages with the E (Volatile) bit of their MMU page table entry set
        are also made in program order.
122     FLUSH latency    —
        Since the FLUSH instruction synchronizes the processor, its total latency
        varies depending on many portions of the SPARC64 V processor’s state.
        Assuming that all prior instructions are completed, the latency of FLUSH is
        18 processor cycles.
123     Input/output (I/O) semantics    —
        This dependency is beyond the scope of this publication. It should be
        defined in a system that uses SPARC64 V.
124     Implicit ASI when TL > 0    —
        See Section 5.1.7 of Commonality for details.
125     Address masking    29, 49, 53
        When PSTATE.AM = 1, SPARC64 V does mask out the high-order 32 bits of
        the PC when transmitting it to the destination register.
126     Register Windows State Registers width    —
        NWINDOWS for SPARC64 V is 8; therefore, only 3 bits are implemented for
        the following registers: CWP, CANSAVE, CANRESTORE, OTHERWIN. If an
        attempt is made to write a value greater than NWINDOWS − 1 to any of these
        registers, the extraneous upper bits are discarded. The CLEANWIN register
        contains 3 bits.
127–201 Reserved.
202     fast_ECC_error trap    —
        fast_ECC_error trap is not implemented in SPARC64 V.
203     Dispatch Control Register bits 13:6 and 1    22
        SPARC64 V does not implement DCR.
204     DCR bits 5:3 and 0    22
        SPARC64 V does not implement DCR.
205     Instruction Trap Register    24
        SPARC64 V implements the Instruction Trap Register.
TABLE C-1 SPARC64 V Implementation Dependencies (7 of 11)
Nbr     SPARC64 V Implementation Notes    Page
206     SHUTDOWN instruction    58
        In privileged mode the SHUTDOWN instruction executes as a NOP in
        SPARC64 V.
207     PCR register bits 47:32, 26:17, and bit 3    20, 21, 201
        SPARC64 V uses these bits for the following purposes:
        • Bits 47:32 for set/clear/show status of overflow (OVF).
        • Bit 26 for validity of OVF field (OVRO).
        • Bits 24:22 for number of counter pair (NC).
        • Bits 20:18 for counter selector (SC).
        • Bit 3 for validity of SU/SL field (ULRO).
        Other implementation-dependent bits are read as 0 and writes to them are
        ignored.
208     Ordering of errors captured in instruction execution    —
        The order in which errors are captured during instruction execution is
        implementation dependent. Ordering can be in program order or in order of
        detection.
209     Software intervention after instruction-induced error    —
        Precision of the trap to signal an instruction-induced error for which
        recovery requires software intervention is implementation dependent.
210     ERROR output signal    —
        The causes and the semantics of the ERROR output signal are implementation
        dependent.
211     Error logging registers’ information    —
        The information that the error logging registers preserve beyond the reset
        induced by an ERROR signal is implementation dependent.
212     Trap with fatal error    —
        Generation of a trap along with ERROR signal assertion upon detection of a
        fatal error is implementation dependent.
213     AFSR.PRIV    —
        SPARC64 V does not implement the AFSR.PRIV bit.
214     Enable/disable control for deferred traps    —
        SPARC64 V does not implement a control feature for deferred traps.
215     Error barrier    —
        DONE and RETRY instructions may implicitly provide an error barrier
        function as MEMBAR #Sync. Whether DONE and RETRY instructions provide
        an error barrier is implementation dependent.
216     data_access_error trap precision    —
        data_access_error trap is always precise in SPARC64 V.
217     instruction_access_error trap precision    —
        instruction_access_error trap is always precise in SPARC64 V.
TABLE C-1 SPARC64 V Implementation Dependencies (8 of 11)
Nbr     SPARC64 V Implementation Notes    Page
218     async_data_error    39
        async_data_error trap is implemented in SPARC64 V, using tt = 40₁₆. See
        Appendix P for details.
219     Asynchronous Fault Address Register (AFAR) allocation    177, 178
        SPARC64 V implements two AFARs:
        • VA = 00₁₆ for an error occurring in D1 cache.
        • VA = 08₁₆ for an error occurring in U2 cache.
220     Addition of logging and control registers for error handling    —
        SPARC64 V implements various features for sustaining reliability. See
        Appendix P for details.
221     Special/signalling ECCs    —
        The method to generate “special” or “signalling” ECCs and whether
        processor-ID is embedded into the data associated with special/signalling
        ECCs is implementation dependent.
222     TLB organization    85
        SPARC64 V has the following TLB organization:
        • Level-1 micro ITLB (uITLB), 32-way fully associative
        • Level-1 micro DTLB (uDTLB), 32-way fully associative
        • Level-2 IMMU-TLB, consisting of sITLB (set-associative Instruction TLB)
          and fITLB (fully associative Instruction TLB).
        • Level-2 DMMU-TLB, consisting of sDTLB (set-associative Data TLB) and
          fDTLB (fully associative Data TLB).
223     TLB multiple-hit detection    86
        On SPARC64 V, TLB multiple hit detection is supported. However, the
        multiple hit is not detected at every TLB reference. When the micro-TLB
        (uTLB), which is the cache of sTLB and fTLB, matches the virtual address,
        the multiple hit in sTLB and fTLB is not detected. The multiple hit is
        detected only when the micro-TLB mismatches and the main TLB is
        referenced.
224     MMU physical address width    86
        The SPARC64 V MMU implements 43-bit physical addresses. The PA field of
        the TTE holds a 43-bit physical address. Bits 46:43 of each TTE always read
        as 0 and writes to them are ignored. The MMU translates virtual addresses
        into 43-bit physical addresses. Each cache tag holds bits 42:6 of physical
        addresses.
225     TLB locking of entries    87
        In SPARC64 V, when a TTE with its lock bit set is written into TLB through
        the Data In register, the TTE is automatically written into the corresponding
        fully associative TLB and locked in the TLB. Otherwise, the TTE is written
        into the corresponding sTLB or fTLB, depending on its page size.
226     TTE support for CV bit    87
        SPARC64 V does not support the CV bit in TTE. Since I1 and D1 are
        virtually indexed caches, unaliasing is supported by SPARC64 V. See also
        impl. dep. #232.
TABLE C-1 SPARC64 V Implementation Dependencies (9 of 11)
Nbr     SPARC64 V Implementation Notes    Page
227     TSB number of entries    88
        SPARC64 V supports a maximum of 16 million entries in the common TSB
        and a maximum of 32 million lines in the split TSB.
228     TSB_Hash supplied from TSB or context-ID register    88
        TSB_Hash is generated from the context-ID register in SPARC64 V.
229     TSB_Base address generation    88
        SPARC64 V generates the TSB_Base address directly from the TSB
        Extension Registers. To maintain compatibility with UltraSPARC I/II,
        SPARC64 V provides mode flag MCNTL.JPS1_TSBP. When
        MCNTL.JPS1_TSBP = 0, the TSB_Base register is used.
230     data_access_exception trap    89
        SPARC64 V generates data_access_exception only for the causes listed in
        Section 7.6.1 of Commonality.
231     MMU physical address variability    91
        SPARC64 V supports both 41-bit and 43-bit physical address mode. The
        initial width of the physical address is controlled by OPSR.
232     DCU Control Register CP and CV bits    23, 91
        SPARC64 V does not implement CP and CV bits in the DCU Control
        Register. See also impl. dep. #226.
233     TSB_Hash field    92
        SPARC64 V does not implement TSB_Hash.
234     TLB replacement algorithm    93
        For fTLB, SPARC64 V implements a pseudo-LRU. For sTLB, LRU is used.
235     TLB data access address assignment    94
        The MMU TLB data-access address assignment and the purpose of the
        address are implementation dependent.
236     TSB_Size field width    97
        In SPARC64 V, TSB_Size is 4 bits wide, occupying bits 3:0 of the TSB
        register. The maximum number of TSB entries is, therefore, 512 × 2^15 (16M
        entries).
237     DSFAR/DSFSR for JMPL/RETURN mem_address_not_aligned    89, 97
        A mem_address_not_aligned exception that occurs during a JMPL or RETURN
        instruction does not update either the D-SFAR or D-SFSR register.
238     TLB page offset for large page sizes    87
        On SPARC64 V, even for a large page, written data for TLB Data Register is
        preserved for bits representing an offset in a page, so the data previously
        written is returned regardless of the page size.
239     Register access by ASIs 55₁₆ and 5D₁₆    92
        In SPARC64 V, VA<63:19> of IMMU ASI 55₁₆ and DMMU ASI 5D₁₆ are
        ignored. An access to virtual addresses 40000₁₆ to 60FF8₁₆ is treated as an
        access to 00000₁₆ to 20FF8₁₆.
TABLE C-1 SPARC64 V Implementation Dependencies (10 of 11)
Nbr     SPARC64 V Implementation Notes    Page
240     DCU Control Register bits 47:41    23
        SPARC64 V uses bit 41 for WEAK_SPCA, which enables/disables memory
        access in speculative paths.
241     Address Masking and DSFAR    —
        SPARC64 V writes zeroes to the more significant 32 bits of DSFAR.
242     TLB lock bit    87
        In SPARC64 V, only the fITLB and the fDTLB support the lock bit. The lock
        bit in sITLB and sDTLB is read as 0 and writes to it are ignored.
243     Interrupt Vector Dispatch Status Register BUSY/NACK pairs    136
        In SPARC64 V, 32 BUSY/NACK pairs are implemented in the Interrupt
        Vector Dispatch Status Register.
244     Data Watchpoint Reliability    24
        No implementation-dependent features of SPARC64 V reduce the reliability
        of data watchpoints.
245     Call/Branch displacement encoding in I-Cache    24
        In SPARC64 V, the least significant 11 bits (bits 10:0) of a CALL or branch
        (BPcc, FBPfcc, Bicc, BPr) instruction in an instruction cache are identical
        to the architectural encoding (as they appear in main memory).
246     VA<38:29> for Interrupt Vector Dispatch Register Access    136
        SPARC64 V ignores all 10 bits of VA<38:29> when the Interrupt Vector
        Dispatch Register is written.
247     Interrupt Vector Receive Register SID fields    136
        SPARC64 V obtains the interrupt source identifier SID_L from the UPA
        packet.
248     Conditions for fp_exception_other with unfinished_FPop    18
        SPARC64 V triggers fp_exception_other with trap type unfinished_FPop
        under the standard conditions described in Commonality Section 5.1.7.
249     Data watchpoint for Partial Store instruction    57
        Watchpoint exceptions on Partial Store instructions occur conservatively on
        SPARC64 V. The DCUCR Data Watchpoint masks are only checked for
        nonzero value (watchpoint enabled). The byte store mask (r[rs2]) in the
        Partial Store instruction is ignored, and a watchpoint exception can occur
        even if the mask is zero (that is, no store will take place).
250     PCR accessibility when PSTATE.PRIV = 0    20, 22, 58
        In SPARC64 V, the accessibility of PCR when PSTATE.PRIV = 0 is
        determined by PCR.PRIV. If PSTATE.PRIV = 0 and PCR.PRIV = 1, an
        attempt to execute either RDPCR or WRPCR will cause a privileged_action
        exception. If PSTATE.PRIV = 0 and PCR.PRIV = 0, RDPCR operates without
        privilege violation and WRPCR generates a privileged_action exception only
        when an attempt is made to change (that is, write 1 to) PCR.PRIV.
251     Reserved.    —
TABLE C-1 SPARC64 V Implementation Dependencies (11 of 11)
Nbr     SPARC64 V Implementation Notes    Page
252     DCUCR.DC (Data Cache Enable)    24
        SPARC64 V does not implement DCUCR.DC.
253     DCUCR.IC (Instruction Cache Enable)    24
        SPARC64 V does not implement DCUCR.IC.
254     Means of exiting error_state    37, 146
        The standard behavior of a SPARC64 V CPU upon entry into
        error_state is to reset itself by internally generating a watchdog_reset
        (WDR). However, OPSR can be set so that when error_state is entered, the
        processor remains halted in error_state instead of generating a
        watchdog_reset.
255     LDDFA with ASI E0₁₆ or E1₁₆ and misaligned destination register number    120
        No exception is generated based on the destination register rd.
256     LDDFA with ASI E0₁₆ or E1₁₆ and misaligned memory address    120
        For LDDFA with ASI E0₁₆ or E1₁₆ and a memory address aligned on a 2^n-byte
        boundary, a SPARC64 V processor behaves as follows:
        n ≥ 3 (≥ 8-byte alignment): no exception related to memory address
        alignment is generated.
        n = 2 (4-byte alignment): LDDF_mem_address_not_aligned exception is
        generated.
        n ≤ 1 (≤ 2-byte alignment): mem_address_not_aligned exception is
        generated.
257     LDDFA with ASI C0₁₆–C5₁₆ or C8₁₆–CD₁₆ and misaligned memory address    120
        For LDDFA with C0₁₆–C5₁₆ or C8₁₆–CD₁₆ and a memory address aligned on
        a 2^n-byte boundary, a SPARC64 V processor behaves as follows:
        n ≥ 3 (≥ 8-byte alignment): no exception related to memory address
        alignment is generated.
        n = 2 (4-byte alignment): LDDF_mem_address_not_aligned exception is
        generated.
        n ≤ 1 (≤ 2-byte alignment): mem_address_not_aligned exception is
        generated.
258     ASI_SERIAL_ID    119
        SPARC64 V provides an identification code for each processor.
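The alignment rules given for impl. dep. #256 and #257 reduce to a three-way classification of the memory address. The following is a minimal sketch of that classification; the function name is hypothetical, and only the alignment-related exception is modelled (ASI decoding and any other checks are left out).

```python
def lddfa_alignment_exception(address: int):
    """Return the alignment exception named in the table, or None.

    address is the effective memory address of an LDDFA that uses one of
    the ASIs covered by impl. dep. #256 or #257.
    """
    if address % 8 == 0:
        # n >= 3: aligned on 8 bytes or more, no alignment exception
        return None
    if address % 4 == 0:
        # n = 2: word aligned but not doubleword aligned
        return "LDDF_mem_address_not_aligned"
    # n <= 1: aligned on 2 bytes or less
    return "mem_address_not_aligned"
```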
F.APPENDIX
E
Opcode Maps
Please refer to Appendix E in Commonality. TABLE E-1 lists the opcode map for the
SPARC64 V IMPDEP2 instruction.
TABLE E-1 IMPDEP2 (op = 2, op3 = 37₁₆)

size                  var (instruction<8:7>)
(instruction<6:5>)    00         01         10         11
00                    (not used — reserved)
01                    FMADDs     FMSUBs     FNMSUBs    FNMADDs
10                    FMADDd     FMSUBd     FNMSUBd    FNMADDd
11                    (reserved for quad operations)
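As an illustration of how the var and size fields select an instruction, the sketch below decodes a 32-bit instruction word. It assumes the standard SPARC V9 format-3 field positions (op in bits 31:30, op3 in bits 24:19) and assumes that var = 10 selects FNMSUB and var = 11 selects FNMADD; the function and table names are hypothetical.

```python
# Assumed mapping of the var field (instruction<8:7>) to the operation.
VAR_NAMES = {0b00: "FMADD", 0b01: "FMSUB", 0b10: "FNMSUB", 0b11: "FNMADD"}
# size field (instruction<6:5>): 01 = single, 10 = double; 00 and 11 are reserved.
SIZE_SUFFIX = {0b01: "s", 0b10: "d"}


def decode_impdep2(instr: int):
    """Return the mnemonic for an IMPDEP2 multiply-add word, or None."""
    op = (instr >> 30) & 0x3
    op3 = (instr >> 19) & 0x3F
    if op != 2 or op3 != 0x37:
        return None  # not an IMPDEP2 instruction
    var = (instr >> 7) & 0x3
    size = (instr >> 5) & 0x3
    suffix = SIZE_SUFFIX.get(size)
    if suffix is None:
        return None  # reserved size encoding
    return VAR_NAMES[var] + suffix
```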
F.APPENDIX
F
Memory Management Unit
The Memory Management Unit (MMU) architecture of SPARC64 V conforms to the
MMU architecture defined in Appendix F of Commonality but with some model
dependency. See Appendix F in Commonality for the basic definitions of the
SPARC64 V MMU.
Section numbers in this appendix correspond to those in Appendix F of
Commonality. Figures and tables, however, are numbered consecutively.
This appendix describes the implementation dependencies and other additional
information about the SPARC64 V MMU. For SPARC64 V implementations, we first
list the implementation dependency as given in TABLE C-1 of Commonality, then
describe the SPARC64 V implementation.
F.1
Virtual Address Translation
IMPL. DEP. #222: TLB organization is JPS1 implementation dependent.
SPARC64 V has the following TLB organization:
■ Level-1 micro ITLB (uITLB), 32-way fully associative
■ Level-1 micro DTLB (uDTLB), 32-way fully associative
■ Level-2 IMMU-TLB consists of sITLB (set-associative Instruction TLB) and
fITLB (fully associative Instruction TLB).
■ Level-2 DMMU-TLB consists of sDTLB (set-associative Data TLB) and fDTLB
(fully associative Data TLB).
TABLE F-1 shows the organization of SPARC64 V TLBs.
Hardware contains micro-ITLB and micro-DTLB as the temporary memory of the
main TLBs, as shown in TABLE F-1. In contrast to the micro-TLBs, sTLB and fTLB
are called main TLBs.
The micro-TLBs are coherent to main TLBs and are not visible to software, with
the exception of TLB multiple hit detection. Hardware maintains the consistency
between micro-TLBs and main TLBs.
No other details on micro-TLB are provided because software cannot execute
direct operations to micro-TLB and its configuration is invisible to software.
TABLE F-1 Organization of SPARC64 V TLBs
Feature                       sITLB and sDTLB          fITLB and fDTLB
Entries                       2048                     32
Associativity                 2-way set associative    Fully associative
Page size supported           8 KB/4 MB                8 KB/64 KB/512 KB/4 MB
Locked translation entry      Not supported            Supported
Unlocked translation entry    Supported                Supported
IMPL. DEP. #223: Whether TLB multiple-hit detections are supported in JPS1 is
implementation dependent.
On SPARC64 V, TLB multiple hit detection is supported. However, the multiple
hit is not detected at every TLB reference. When the micro-TLB (uTLB), which is
the cache of sTLB and fTLB, matches the virtual address, the multiple hit in sTLB
and fTLB is not detected. The multiple hit is detected only when the micro-TLB
mismatches and main TLB is referenced.
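The consequence of this rule is that a duplicate mapping in the main TLBs can go unnoticed for as long as the micro-TLB keeps hitting. A toy model makes this visible. It is a sketch only: the data structures, the `MultipleHit` exception, and the function name are hypothetical, and the real micro-TLB is not visible to software.

```python
class MultipleHit(Exception):
    """Raised when more than one main-TLB entry matches (hypothetical)."""


def lookup(vpn, micro_tlb, main_tlb_entries):
    """Translate a virtual page number.

    micro_tlb is a dict mapping vpn to ppn; main_tlb_entries is a list of
    (vpn, ppn) pairs standing in for the combined sTLB and fTLB contents.
    """
    if vpn in micro_tlb:
        # Micro-TLB match: the main TLBs are not referenced, so a
        # multiple hit among their entries is not detected.
        return micro_tlb[vpn]
    matches = [ppn for (entry_vpn, ppn) in main_tlb_entries if entry_vpn == vpn]
    if len(matches) > 1:
        raise MultipleHit(f"{len(matches)} main-TLB entries match vpn {vpn:#x}")
    return matches[0] if matches else None
```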
F.2
Translation Table Entry (TTE)
IMPL. DEP. in Commonality TABLE F-1: TTE_Data bits 46–43 are implementation
dependent.
On SPARC64 V, TTE_Data bits 46:43 are reserved.
IMPL. DEP. #224: Physical address width support by the MMU is implementation
dependent in JPS1; minimum PA width is 43 bits.
The SPARC64 V MMU implements 43-bit physical addresses. The PA field of the
TTE holds a 43-bit physical address. The MMU translates virtual addresses into
43-bit physical addresses. Each cache tag holds bits 42:6 of physical addresses.
Bits 46:43 of each TTE always read as 0 and writes to them are ignored.
A cacheable access for a physical address ≥ 400 0000 0000₁₆ always causes a
cache miss in the U2 cache and generates a UPA request for the cacheable access.
The urgent error ASI_UGESR.SDC is signalled after the UPA cacheable access is
requested.
The physical address length to be passed to the UPA interface is 41 bits or 43 bits,
as designated in the ASI_UPA_CONFIG.AM field. When the 41-bit PA is specified
in ASI_UPA_CONFIG.AM, the most significant 2 bits of the CPU internal physical
address are discarded and only the remaining least significant 41 bits are passed
to the UPA address bus. If the discarded most significant 2 bits are not 0, the
urgent error ASI_UGESR.SDC is detected after the invalid address transfer to the
UPA interface. Otherwise, when the 43-bit PA is specified in
ASI_UPA_CONFIG.AM, the entire 43 bits of CPU internal physical address are
passed to the UPA address bus.
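The address narrowing described above can be sketched as follows. The function name and the boolean `pa_41bit_mode` parameter are hypothetical stand-ins for the ASI_UPA_CONFIG.AM setting, whose encoding is not given here; the second return value only indicates that the condition reported as ASI_UGESR.SDC would arise.

```python
PA_MASK_43 = (1 << 43) - 1
PA_MASK_41 = (1 << 41) - 1


def upa_address(internal_pa: int, pa_41bit_mode: bool):
    """Return (address passed to the UPA bus, error_flag).

    internal_pa is the 43-bit CPU internal physical address.
    """
    pa = internal_pa & PA_MASK_43
    if not pa_41bit_mode:
        return pa, False  # all 43 bits are passed through
    discarded = pa >> 41  # the most significant 2 bits
    # Nonzero discarded bits mean an invalid address was transferred.
    return pa & PA_MASK_41, discarded != 0
```

For example, an internal address with bit 42 set is truncated to 0 in 41-bit mode and flags the error, while the same address passes unchanged in 43-bit mode.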
IMPL. DEP. #238: When page offset bits for larger page size (PA<15:13>, PA<18:13>,
and PA<21:13> for 64-Kbyte, 512-Kbyte, and 4-Mbyte pages, respectively) are stored
in the TLB, it is implementation dependent whether the data returned from those
fields by a Data Access read are zero or the data previously written to them.
On SPARC64 V, the data returned from PA<15:13>, PA<18:13>, and PA<21:13> for
64-Kbyte, 512-Kbyte, and 4-Mbyte pages, respectively, by a Data Access read are
the data previously written to them.
IMPL. DEP. #225: The mechanism by which entries in TLB are locked is
implementation dependent in JPS1.
In SPARC64 V, when a TTE with its lock bit set is written into TLB through the
Data In register, the TTE is automatically written into the corresponding fully
associative TLB and locked in the TLB. Otherwise, the TTE is written into the
corresponding sTLB or fTLB, depending on its page size.
IMPL. DEP. #242: An implementation containing multiple TLBs may implement the L
(lock) bit in all TLBs but is only required to implement a lock bit in one TLB for each
page size. If the lock bit is not implemented in a particular TLB, it is read as 0 and
writes to it are ignored.
In SPARC64 V, only the fITLB and the fDTLB support the lock bit as described in
TABLE F-1. The lock bit in sITLB and sDTLB is read as 0 and writes to it are
ignored.
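Taken together with TABLE F-1, the locking rules suggest a simple placement decision for a TTE written through the Data In register. The sketch below is an assumption-laden illustration: it assumes that an unlocked TTE goes to the sTLB whenever the sTLB supports its page size (8 KB or 4 MB) and to the fTLB otherwise, which the text implies but does not state outright. The function name and the string return values are hypothetical.

```python
KB = 1024
MB = 1024 * KB

# Page sizes per TABLE F-1.
STLB_PAGE_SIZES = {8 * KB, 4 * MB}
FTLB_PAGE_SIZES = {8 * KB, 64 * KB, 512 * KB, 4 * MB}


def target_tlb(lock_bit: bool, page_size: int) -> str:
    """Pick the main TLB that receives a TTE written via Data In."""
    if page_size not in FTLB_PAGE_SIZES:
        raise ValueError(f"unsupported page size: {page_size}")
    if lock_bit:
        return "fTLB"  # locked entries always go to the fully associative TLB
    # Assumption: unlocked entries use the sTLB when it supports the page size.
    return "sTLB" if page_size in STLB_PAGE_SIZES else "fTLB"
```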
IMPL. DEP. #226: Whether the CV bit is supported in TTE is implementation
dependent in JPS1. When the CV bit in TTE is not provided and the implementation
has virtually indexed caches, the implementation should support hardware
unaliasing for the caches.
In SPARC64 V, no TLB supports the CV bit in TTE. SPARC64 V supports hardware
unaliasing for the caches. The CV bit in any TLB entry is read as 0 and writes to it
are ignored.
F.3.3
TSB Organization
IMPL. DEP. #227: The maximum number of entries in a TSB is implementation
dependent in JPS1. See impl. dep. #228 for the limitation of TSB_size in TSB
registers.
SPARC64 V supports a maximum of 16 million lines in the common TSB and a
maximum of 32 million lines in the split TSB. The maximum number N in
FIGURE F-4 of Commonality is 16 million (16 × 2^20).
F.4.2
TSB Pointer Formation
IMPL. DEP. #228: Whether TSB_Hash is supplied from a TSB Extension Register or
from a context-ID register is implementation dependent in JPS1. Only for cases of
direct hash with context-ID can the width of the TSB_size field be wider than 3
bits.
On SPARC64 V, TSB_Hash is supplied from a context-ID register. The width of
the TSB_size field is 4 bits.
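As a consistency check, the 4-bit TSB_size field and the 16-million and 32-million line limits quoted for impl. dep. #227 can be related by a short calculation. The sketch below assumes that a TSB holds 512 × 2^N entries and that a split TSB consists of two such halves. Both are assumptions drawn from the usual JPS1 convention and are not stated in this supplement; the names are hypothetical.

```python
TSB_SIZE_FIELD_BITS = 4   # from the text: TSB_size is 4 bits wide on SPARC64 V
BASE_ENTRIES = 512        # ASSUMPTION: a TSB holds 512 * 2**N entries


def tsb_entries(n, split=False):
    """Number of TSB lines for TSB_size = n (hypothetical helper).

    ASSUMPTION: a split TSB is modeled as two halves of 512 * 2**n lines each.
    """
    if not 0 <= n < (1 << TSB_SIZE_FIELD_BITS):
        raise ValueError("TSB_size out of range for a 4-bit field")
    per_tsb = BASE_ENTRIES << n
    return 2 * per_tsb if split else per_tsb


if __name__ == "__main__":
    max_n = (1 << TSB_SIZE_FIELD_BITS) - 1
    print("max N:", max_n)
    print("common TSB lines:", tsb_entries(max_n))
    print("split TSB lines:", tsb_entries(max_n, split=True))
```

Under these assumptions the largest value of N reproduces both limits given in the text.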
IMPL. DEP. #229: Whether the implementation generates the TSB Base address by
exclusive-ORing the TSB Base Register and a TSB Extension Register or by taking the
TSB_Base field directly from the TSB Extension Register is implementation
dependent in JPS1. This implementation dependency is only to maintain
compatibility with the TLB miss handling software of UltraSPARC I/II.
On SPARC64 V, when ASI_MCNTL.JPS1_TSBP = 1, the TSB Base address is
generated by taking the TSB_Base field directly from the TSB Extension Register.
TSB Pointer Formation
On SPARC64 V, the number N in the following equations ranges from 0 to 15; N is
defined to be the TSB_size field of the TSB Base or TSB Extension Register.
8K_POINTER  = TSB_Extension[63:14+N] || 0 || (VA[21+N:13] ⊕ TSB_Hash) || 0000
64K_POINTER = TSB_Extension[63:14+N] || 1 || (VA[24+N:16] ⊕ TSB_Hash) || 0000
(Here || denotes concatenation of bit fields.)
Value of TSB_Hash for both a shared TSB and a split TSB:
When 0 <= N <= 4,
    TSB_Hash = context_register[N+8:0]
Otherwise, when 5 <= N <= 15,
    TSB_Hash[12:0] = context_register[12:0]
    TSB_Hash[N+8:13] = 0 (N-4 zero bits)
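The pointer and hash definitions above can be sketched in Python as follows. This is an illustrative model only, under stated assumptions: it follows the two pointer equations as printed (base taken from bit 14+N upward, with a one-bit selector of 0 for 8-Kbyte and 1 for 64-Kbyte pointers), the fields are joined by concatenation, and the context register is treated as 13 bits wide. All function names are hypothetical.

```python
def bits(value, high, low):
    """Extract the inclusive bit field value[high:low]."""
    return (value >> low) & ((1 << (high - low + 1)) - 1)


def tsb_hash(context_register, n):
    """TSB_Hash for TSB_size = n, per the two cases in the text."""
    if not 0 <= n <= 15:
        raise ValueError("N must be in 0..15")
    if n <= 4:
        return bits(context_register, n + 8, 0)
    # For 5 <= N <= 15 only context bits 12:0 contribute;
    # TSB_Hash[N+8:13] stays zero.
    return bits(context_register, 12, 0)


def tsb_pointer(tsb_extension, va, context_register, n, page_64k=False):
    """Form 8K_POINTER (default) or 64K_POINTER as printed above."""
    va_low = 16 if page_64k else 13           # VA[24+N:16] or VA[21+N:13]
    index = bits(va, va_low + 8 + n, va_low) ^ tsb_hash(context_register, n)
    select = 1 if page_64k else 0             # the single bit after the base
    base = bits(tsb_extension, 63, 14 + n)
    # base || select || index || 0000
    return (base << (14 + n)) | (select << (13 + n)) | (index << 4)


if __name__ == "__main__":
    ext = 0x1_0000_0000                       # example base value, not from the source
    print(hex(tsb_pointer(ext, 0x2000, 0x0, 0)))
    print(hex(tsb_pointer(ext, 0x2000, 0x3, 0)))
    print(hex(tsb_pointer(ext, 0x2000, 0x0, 0, page_64k=True)))
```

The index field is 9+N bits wide in both cases, which matches the width of TSB_Hash for 0 <= N <= 4 and leaves the upper N-4 hash bits zero for larger N.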
F.5
IMPL. DEP. #230: The cause of a data_access_exception trap is implementation
dependent in JPS1, but there are several mandatory causes of a data_access_exception
trap.
SPARC64 V signals a data_access_exception for the causes defined in Section F.5 of
Commonality. However, an invalid ASI must be handled with care. See
Section F.10.9 for details.
IMPL. DEP. #237: Whether the fault status and/or address (DSFSR/DSFAR) are
captured when mem_address_not_aligned is generated during a JMPL or RETURN
instruction is implementation dependent.
On SPARC64 V, the fault status and address (DSFSR/DSFAR) are not captured
when a mem_address_not_aligned exception is generated during a JMPL or RETURN
instruction.
Additional information: On SPARC64 V, two precise traps,
instruction_access_error and data_access_error, are recorded by the MMU in addition
to those in TABLE F-2 of Commonality. The table below is a modified version of
that table with the two traps added.
TABLE F-2  MMU Trap Types, Causes, and Stored State Register Update Policy

                                                     Registers Updated (Stored State in MMU)
Ref #  Trap Name                         Trap Cause  I-SFSR  I-MMU Tag Access  D-SFSR, SFAR  D-MMU Tag Access  Trap Type
1.     fast_instruction_access_MMU_miss  I-TLB miss  X²      X                                                 64₁₆–67₁₆