Sun Microsystems Server X4140 User Manual

Sun Fire™ X4140, X4240, and X4440  
Servers Diagnostics Guide  
Sun Microsystems, Inc.  
Part No. 820-3067-11  
August 2008, Revision A  
Submit comments about this document at: http://www.sun.com/hwdocs/feedback  
Download from Www.Somanuals.com. All Manuals Search And Download.  
Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008  
Preface  
The Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide contains information  
and procedures for using available tools to diagnose problems with the servers.  
Before You Read This Document  
It is important that you review the safety guidelines in the Sun Fire X4140, X4240,  
and X4440 Safety and Compliance Guide.  
Related Documentation  
The document set for the Sun Fire X4140, X4240, and X4440 Servers is described in  
the Where To Find Sun Fire X4140, X4240, and X4440 Servers Documentation sheet that  
is packed with your system. You can also find the documentation at  
Translated versions of some of these documents are available at  
http://docs.sun.com. Select a language from the drop-down list and navigate to  
the Sun Fire X4140, X4240, and X4440 Servers document collection using the Product  
category link. Available translations for the Sun Fire X4140, X4240, and X4440  
Servers include Simplified Chinese, Traditional Chinese, French, Japanese, and  
Korean.  
English documentation is revised more frequently and might be more up-to-date  
than the translated documentation. For all Sun documentation, go to the following  
URL:  
 
Typographic Conventions
Typeface* | Meaning | Examples
AaBbCc123 | The names of commands, files, and directories; onscreen computer output | Edit your .login file. Use ls -a to list all files. % You have mail.
AaBbCc123 | What you type, when contrasted with onscreen computer output | % su Password:
AaBbCc123 | Book titles, new words or terms, words to be emphasized. Replace command-line variables with real names or values. | Read Chapter 6 in the User’s Guide. These are called class options. You must be superuser to do this. To delete a file, type rm filename.
* The settings on your browser might differ from these settings.
Third-Party Web Sites
Sun is not responsible for the availability of third-party web sites mentioned in this
document. Sun does not endorse and is not responsible or liable for any content,  
advertising, products, or other materials that are available on or through such sites  
or resources. Sun will not be responsible or liable for any actual or alleged damage  
or loss caused by or in connection with the use of or reliance on any such content,  
goods, or services that are available on or through such sites or resources.  
Sun Welcomes Your Comments  
Sun is interested in improving its documentation and welcomes your comments and  
suggestions. You can submit your comments by going to: http://www.sun.com/hwdocs/feedback
Please include the title and part number of your document with your feedback:  
Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide, part number 820-3067-11  
 
CHAPTER 1
Initial Inspection of the Server  
This chapter includes the following topics:
Service Troubleshooting Flowchart
Gathering Service Information
System Inspection
Service Troubleshooting Flowchart  
Use the following flowchart as a guide to the sections of this book that help you
troubleshoot the server.
TABLE 1-1 Troubleshooting Flowchart
To perform this task | Refer to this section
Gather initial service information. | “Gathering Service Information” on page 2
Investigate any powering-on problems. | “Troubleshooting Power Problems” on page 3
Perform external visual inspection and internal visual inspection. | “Externally Inspecting the Server” on page 3 and “Internally Inspecting the Server” on page 4
View BIOS event logs and POST messages. | “Viewing Event Logs” on page 21 and “Power-On Self-Test (POST)” on page 25
View service processor logs and sensor information... |
...or view service processor logs and sensor information. | “Using IPMItool to View System Information” on page 55
Run SunVTS diagnostics. | “Running SunVTS Diagnostic Tests” on page 7
Gathering Service Information  
The first step in determining the cause of a problem with the server is to gather  
information from the service-call paperwork or the onsite personnel. Use the
following general guidelines when you begin troubleshooting.
To gather service information:  
1. Collect information about the following items:  
Events that occurred prior to the failure  
Whether any hardware or software was modified or installed  
Whether the server was recently installed or moved  
How long the server exhibited symptoms  
The duration or frequency of the problem  
2. Document the server settings before you make any changes.  
If possible, make one change at a time in order to isolate potential problems. In  
this way, you can maintain a controlled environment and reduce the scope of  
troubleshooting.  
3. Take note of the results of any change that you make. Include any errors or  
informational messages.  
4. Check for potential device conflicts before you add a new device.  
5. Check for version dependencies, especially with third-party software.  
System Inspection  
Controls that have been improperly set and cables that are loose or improperly  
connected are common causes of problems with hardware components.  
Troubleshooting Power Problems  
If the server will power on, skip this section and go to “Externally Inspecting the Server” on page 3.
If the server will not power on, check the following:  
1. Check that AC power cords are attached firmly to the server’s power supplies  
and to the AC sources.  
2. Check that the main cover is firmly in place.  
There is an intrusion switch on the motherboard that automatically shuts down  
the server power to standby mode when the cover is removed.  
Externally Inspecting the Server  
To perform a visual inspection of the external system:  
1. Inspect the external status indicator LEDs, which can indicate component  
malfunction.  
For the LED locations and descriptions of their behavior, see “External Status  
2. Verify that nothing in the server environment is blocking air flow or making a  
contact that could short out power.  
3. If the problem is not evident, continue with the next section, “Internally Inspecting the Server” on page 4.
Internally Inspecting the Server  
To perform a visual inspection of the internal system:  
1. Choose a method for shutting down the server from main power mode to  
standby power mode. See FIGURE 1-1 and FIGURE 1-2.  
Graceful shutdown – Use a ballpoint pen or other stylus to press and release  
the Power button on the front panel. This causes Advanced Configuration and  
Power Interface (ACPI) enabled operating systems to perform an orderly  
shutdown of the operating system. Servers not running ACPI-enabled  
operating systems will shut down to standby power mode immediately.  
Emergency shutdown – Use a ballpoint pen or other stylus to press and hold  
the Power button for four seconds to force main power off and enter standby  
power mode.  
Caution – Performing an emergency shutdown can cause open files to become  
corrupt. Use an emergency shutdown only when necessary.  
When main power is off, the Power/OK LED on the front panel will begin  
flashing, indicating that the server is in standby power mode.  
Caution – When you use the Power button to enter standby power mode, power is
still directed to the service processor and power supply fans, as indicated by the
flashing Power/OK LED. To completely power off the server, you must disconnect
the AC power cords from the back panel of the server.
FIGURE 1-1 X4140 Server Front Panel (callouts: Locate Button/LED, Power Button)
FIGURE 1-2 X4440 Server Front Panel (callouts: Locate Button/LED, Power Button)
2. Remove the server cover.  
For instructions on removing the server cover, refer to your server’s service  
manual.  
3. Inspect the internal status indicator LEDs. These can indicate component  
malfunction.  
For the LED locations and descriptions of their behavior, see “Internal Status  
Note – The server must be in standby power mode for viewing the internal LEDs.  
You can hold down the Locate button on the server back panel or front panel for  
5 seconds to initiate a “push-to-test” mode that illuminates all other LEDs both  
inside and outside of the chassis for 15 seconds.  
4. Verify that there are no loose or improperly seated components.  
5. Verify that all cable connectors inside the system are firmly and correctly  
attached to their appropriate connectors.  
6. Verify that any after-factory components are qualified and supported.  
For a list of supported PCI cards and DIMMs, refer to your server’s service  
manual.  
7. Check that the installed DIMMs comply with the supported DIMM population  
rules and configurations, as described in “DIMM Population Rules” on page 11.  
8. Replace the server cover.  
9. To restore the server to main power mode (all components powered on), use a  
ballpoint pen or other stylus to press and release the Power button on the  
server front panel. See FIGURE 1-1 and FIGURE 1-2.  
When main power is applied to the full server, the Power/OK LED next to the  
Power button lights and remains lit.  
10. If the problem with the server is not evident, you can obtain additional
information by viewing the power-on self-test (POST) messages and BIOS
event logs during system startup. Continue with “Viewing Event Logs” on page 21.
CHAPTER 2
Using SunVTS Diagnostic Software  
This chapter contains information about the SunVTS™ diagnostic software tool.  
Running SunVTS Diagnostic Tests  
The servers are shipped with a Bootable Diagnostics CD that contains the Sun  
Validation Test Suite (SunVTS) software.  
SunVTS is a comprehensive diagnostic tool that tests and validates Sun hardware
by verifying the connectivity and functionality of most hardware controllers and
devices on Sun platforms. You can tailor SunVTS software with modifiable test
instances and processor affinity features.
The following tests are supported on x86 platforms:  
CD DVD Test (cddvdtest)  
CPU Test (cputest)  
Cryptographics Test (cryptotest)  
Disk and Diskette Drives Test (disktest)  
Data Translation Look-aside Buffer Test (dtlbtest)
Emulex HBA Test (emlxtest)  
Floating Point Unit Test (fputest)  
InfiniBand Host Channel Adapter Test (ibhcatest)  
Level 1 Data Cache Test (l1dcachetest)  
Level 2 SRAM Test (l2sramtest)  
Ethernet Loopback Test (netlbtest)  
Network Hardware Test (nettest)  
Physical Memory Test (pmemtest)  
QLogic Host Bus Adapter Test (qlctest)  
RAM Test (ramtest)  
Serial Port Test (serialtest)  
System Test (systest)  
Tape Drive Test (tapetest)  
Universal Serial Bus Test (usbtest)
Virtual Memory Test (vmemtest)  
SunVTS software has a sophisticated graphical user interface (GUI) that provides  
test configuration and status monitoring. The user interface can be run on one  
system to display the SunVTS testing of another system on the network. SunVTS  
software also provides a TTY-mode interface for situations in which running a GUI  
is not possible.  
SunVTS Documentation  
For the most up-to-date information on SunVTS software, go to:  
Diagnosing Server Problems With the Bootable Diagnostics CD
SunVTS 6.4 or later software is preinstalled on your server. The server is also
shipped with the Bootable Diagnostics CD. The server can boot from this CD, which
then starts SunVTS software. The diagnostic tests write their output to log files that
the service technician can use to determine the problem with the server.
Requirements  
To use the diagnostics CD, you must have a keyboard, mouse, and monitor
attached to the server on which you are performing diagnostics, or available  
through a remote KVM.  
Using the Bootable Diagnostics CD  
To use the diagnostics CD to perform diagnostics:  
1. With the server powered on, insert the CD into the DVD-ROM drive.  
2. Reboot the server, and press F2 as the server starts so that you can change
the BIOS setting for boot-device priority.
3. When the BIOS Main menu appears, navigate to the BIOS Boot menu.  
Instructions for navigating within the BIOS screens appear on the BIOS screens.  
4. On the BIOS Boot menu screen, select Boot Device Priority.  
The Boot Device Priority screen appears.  
5. Select the DVD-ROM drive to be the primary boot device.  
6. Save and exit the BIOS screens.  
7. Reboot the server.  
When the server reboots from the CD in the DVD-ROM drive, the Solaris  
Operating System boots and SunVTS software starts and opens its first GUI  
window.  
8. In the SunVTS GUI, press Enter or click the Start button when you are  
prompted to start the tests.  
The test suite runs until it encounters an error or the tests complete.
Note – The CD will take approximately nine minutes to boot.  
9. When SunVTS software completes the test, review the log files generated  
during the test.  
SunVTS provides access to four different log files:  
SunVTS test error log contains time-stamped SunVTS test error messages. The  
log file path name is /var/opt/SUNWvts/logs/sunvts.err. This file is not  
created until a SunVTS test failure occurs.  
SunVTS kernel error log contains time-stamped SunVTS kernel and SunVTS  
probe errors. SunVTS kernel errors are errors that relate to running SunVTS,  
and not to testing of devices. The log file path name is  
/var/opt/SUNWvts/logs/vtsk.err. This file is not created until SunVTS  
reports a SunVTS kernel error.  
SunVTS information log contains informative messages that are generated  
when you start and stop the SunVTS test sessions. The log file path name is  
/var/opt/SUNWvts/logs/sunvts.info. This file is not created until a  
SunVTS test session runs.  
Solaris system message log is a log of all the general Solaris events logged by  
syslogd. The path name of this log file is /var/adm/messages.  
a. Click the Log button.  
The Log file window is displayed.  
b. Specify the log file that you want to view by selecting it from the Log file  
window.  
The content of the selected log file is displayed in the window.  
c. With the three lower buttons you can perform the following actions:  
Print the log file – A dialog box appears for you to specify your printer  
options and printer name.  
Delete the log file – The file remains on the display, but it will not be  
available the next time you try to display it.  
Close the Log file window – The window is closed.  
Note – If you want to save the log files: When you use the Bootable Diagnostics  
CD, the server boots from the CD. Therefore, the test log files are not on the server’s  
hard disk drive and they will be deleted when you power cycle the server. To save  
the log files, you must save them to a removable media device or FTP them to  
another system.  
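As a sketch of the save step, the Python function below copies whichever of the four log files listed above exist into a directory on removable media. Three of the files are created only after a matching event, so it skips missing ones. It assumes a Python 3 interpreter is available where the logs are read, which the diagnostics CD environment may not provide. The function name save_logs and the mount point in the usage note are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

# The four log files described in this section. The SunVTS .err and .info
# files exist only after the corresponding event, so missing files are skipped.
SUNVTS_LOGS = (
    "/var/opt/SUNWvts/logs/sunvts.err",
    "/var/opt/SUNWvts/logs/vtsk.err",
    "/var/opt/SUNWvts/logs/sunvts.info",
    "/var/adm/messages",
)


def save_logs(dest_dir: str, logs: Iterable[str] = SUNVTS_LOGS) -> List[str]:
    """Copy each existing log file into dest_dir and return the names copied."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for log in map(Path, logs):
        if log.is_file():
            # copy2 also preserves the file's modification time
            shutil.copy2(log, dest / log.name)
            copied.append(log.name)
    return copied
```

For example, save_logs("/mnt/usb/sunvts-logs") would copy the logs to a USB device mounted at the hypothetical path /mnt/usb. Run it before you power cycle the server.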
CHAPTER 3
Troubleshooting DIMM Problems  
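This chapter describes how to detect and correct problems with the server’s Dual
Inline Memory Modules (DIMMs). It includes the following sections:
DIMM Population Rules
DIMM Replacement Policy
How DIMM Errors Are Handled by the System
Isolating and Correcting DIMM ECC Errors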
DIMM Population Rules  
The DIMM population rules for the server are as follows:  
Each CPU can support a maximum of eight DIMMs.  
The DIMM slots are paired, and the DIMMs must be installed in pairs (0-1, 2-3,
4-5, and 6-7). See FIGURE 3-1 and FIGURE 3-2. The memory sockets are colored
black or white, and paired slots have matching colors.
DIMMs are populated starting from the outside (away from the CPU) and  
working toward the inside.  
CPUs with only a single pair of DIMMs must have those DIMMs installed in that  
CPU’s outside white DIMM slots (6 and 7). See FIGURE 3-1 and FIGURE 3-2.  
Only DDR2 800 MHz, 667 MHz, and 533 MHz DIMMs are supported.
Each pair of DIMMs must be identical (same manufacturer, size, and speed).  
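The population rules above can be checked mechanically. The following Python sketch validates one CPU's slots. The Dimm record, check_population, and the manufacturer name in the example are hypothetical. The outside-in check assumes that slot numbers rise from the pair nearest the CPU (0-1) to the outside pair (6-7); the guide states this explicitly only for slots 6 and 7.

```python
from dataclasses import dataclass
from typing import Dict, List

SUPPORTED_SPEEDS_MHZ = {533, 667, 800}         # DDR2 speeds listed above
SLOT_PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7)]  # paired slots, max eight per CPU


@dataclass(frozen=True)
class Dimm:
    manufacturer: str
    size_gb: int
    speed_mhz: int


def check_population(slots: Dict[int, Dimm]) -> List[str]:
    """Return the rule violations for one CPU's slots; an empty list means OK."""
    problems = []
    for slot, dimm in sorted(slots.items()):
        if not 0 <= slot <= 7:
            problems.append(f"slot {slot} does not exist")
        if dimm.speed_mhz not in SUPPORTED_SPEEDS_MHZ:
            problems.append(f"slot {slot}: {dimm.speed_mhz} MHz is not supported")

    full_pairs = []
    for a, b in SLOT_PAIRS:
        da, db = slots.get(a), slots.get(b)
        if (da is None) != (db is None):
            problems.append(f"pair {a}-{b} is only half populated")
        elif da is not None:
            full_pairs.append((a, b))
            if da != db:  # same manufacturer, size, and speed required
                problems.append(f"pair {a}-{b} is not identical")

    # Assumption: populating from the outside in means the filled pairs must be
    # the highest-numbered ones (a single pair goes in slots 6 and 7).
    if full_pairs and full_pairs != SLOT_PAIRS[len(SLOT_PAIRS) - len(full_pairs):]:
        problems.append("pairs are not populated from the outside (6-7) inward")
    return problems
```

For example, an identical pair of 667 MHz DIMMs in slots 6 and 7 returns an empty list. The same pair in slots 0 and 1 is reported as not populated from the outside inward.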
DIMM Replacement Policy  
Replace a DIMM when one of the following events takes place:  
The DIMM fails memory testing under BIOS due to Uncorrectable Memory Errors  
(UCEs).  
UCEs occur and investigation shows that the errors originated from memory.  
In addition, a DIMM should be replaced whenever more than 24 Correctable  
Errors (CEs) originate in 24 hours from a single DIMM and no other DIMM is  
showing further CEs.  
If more than one DIMM has experienced multiple CEs, other possible causes of  
CEs have to be ruled out by a qualified Sun Support specialist before replacing  
any DIMMs.  
Before you call Sun, keep copies of the logs that show the memory errors described
by the above rules so that you can send them to Sun for verification.
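The CE part of this policy can be sketched as a sliding-window count over error timestamps. The Python code below is illustrative only. It takes hypothetical per-DIMM lists of CE timestamps, such as ones collected from the OS logs described under Correctable DIMM Errors, and uses the "24 or more CEs in 24 hours" threshold stated in that section. It treats more than one DIMM with multiple CEs as the signal to escalate to Sun Support rather than replace anything. The function names and return strings are hypothetical, and UCE-driven replacement is not modeled.

```python
from datetime import datetime, timedelta
from typing import Dict, List

WINDOW = timedelta(hours=24)
THRESHOLD = 24  # CEs within one 24-hour window that mark a DIMM as defective


def max_ces_in_window(times: List[datetime], window: timedelta = WINDOW) -> int:
    """Largest number of CEs that fall within any single 24-hour window."""
    ts = sorted(times)
    best, start = 0, 0
    for end in range(len(ts)):
        # Advance the left edge until the window spans less than 24 hours.
        while ts[end] - ts[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


def ce_advice(ce_times: Dict[str, List[datetime]]) -> str:
    """Apply the CE replacement rules to per-DIMM CE timestamps."""
    with_multiple = [d for d, t in ce_times.items() if len(t) > 1]
    if len(with_multiple) > 1:
        # Other causes must be ruled out by Sun Support before replacing DIMMs.
        return "escalate to Sun Support"
    over = [d for d, t in ce_times.items() if max_ces_in_window(t) >= THRESHOLD]
    if over:
        return f"replace {over[0]}"
    return "keep monitoring"
```

Using a sliding window rather than calendar days catches a burst of errors that straddles midnight.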
How DIMM Errors Are Handled by the System
This section describes how the system handles the two types of DIMM errors, UCEs
and CEs, and describes the BIOS DIMM error messages.
Uncorrectable DIMM Errors  
UCEs are handled the same way under all operating systems (OSs):
1. When a UCE occurs, the memory controller causes an immediate reboot of the
system.
2. During reboot, the BIOS checks the Machine Check registers, determines that
the previous reboot was due to a UCE, and reports this in POST after the
memtest stage:
A Hypertransport Sync Flood occurred on last boot  
3. BIOS reports this event in the service processor’s system event log (SEL) as  
shown in the sample IPMItool output below:  
# ipmitool -H 10.6.77.249 -U root -P changeme -I lanplus sel list  
8 | 09/25/2007 | 03:22:03 | System Boot Initiated #0x02 | Initiated by warm reset | Asserted
9 | 09/25/2007 | 03:22:03 | Processor #0x04 | Presence detected | Asserted  
a | 09/25/2007 | 03:22:03 | OEM #0x12 | | Asserted  
b | 09/25/2007 | 03:22:03 | System Event #0x12 | Undetermined system hardware failure | Asserted
c | OEM record e0 | 00000002000000000029000002  
d | OEM record e0 | 00000004000000000000b00006  
e | OEM record e0 | 00000048000000000011110322  
f | OEM record e0 | 00000058000000000000030000  
10 | OEM record e0 | 000100440000000000fefff000  
11 | OEM record e0 | 00010048000000000000ff3efa  
12 | OEM record e0 | 10ab0000000010000006040012  
13 | OEM record e0 | 10ab0000001111002011110020  
14 | OEM record e0 | 0018304c00f200002000020c0f  
15 | OEM record e0 | 0019304c00f200004000020c0f  
16 | OEM record e0 | 001a304c00f45aa10015080a13  
17 | OEM record e0 | 001a3054000000000320004880  
18 | OEM record e0 | 001b304c00f200001000020c0f  
19 | OEM record e0 | 80000002000000000029000002  
1a | OEM record e0 | 80000004000000000000b00006  
1b | OEM record e0 | 80000048000000000011110322  
1c | OEM record e0 | 80000058000000000000030000  
1d | OEM record e0 | 800100440000000000fefff000  
1e | OEM record e0 | 80010048000000000000ff3efa  
1f | 09/25/2007 | 03:22:06 | System Boot Initiated #0x03 | Initiated by warm reset | Asserted
20 | 09/25/2007 | 03:22:06 | Processor #0x04 | Presence detected | Asserted  
21 | 09/25/2007 | 03:22:15 | System Firmware Progress #0x01 | Memory initialization | Asserted
22 | 09/25/2007 | 03:22:16 | Memory | Uncorrectable ECC | Asserted | CPU 2 DIMM 0  
23 | 09/25/2007 | 03:22:16 | Memory | Uncorrectable ECC | Asserted | CPU 2 DIMM 1  
24 | 09/25/2007 | 03:22:16 | Memory | Memory Device Disabled | Asserted | CPU 2 DIMM 0
25 | 09/25/2007 | 03:22:16 | Memory | Memory Device Disabled | Asserted | CPU 2 DIMM 1
The lines in the display start with event numbers (in hex), followed by a description  
of the event. TABLE 3-1 describes the contents of the display:  
TABLE 3-1 Lines in IPMI Output
Event (hex) | Description
8 | A UCE caused a HyperTransport sync flood, which led to the system's warm reset. #0x02 refers to a reboot count maintained since the last AC power reset.
9 | BIOS detected and initiated 4 processors in the system.
a | BIOS detected that a sync flood caused this reboot.
b | BIOS detected that a hardware error caused the sync flood.
c to 1e | BIOS retrieved and reported some hardware evidence, including all processors' Machine Check Error registers (events 14 to 18).
1f | After BIOS detected that a UCE had occurred, it located the DIMM and reset. #0x03 refers to the reboot count.
21 to 25 | BIOS off-lined the faulty DIMMs from system memory space and reported them. Both DIMMs of the pair are reported, because the hardware UCE evidence lets BIOS narrow the fault only to a pair, not to a single DIMM.
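To pull the affected DIMM locations out of a saved copy of this output, a parser along the following lines could be used. It is a Python sketch that assumes the pipe-separated layout of the sample output above. In that layout, memory events carry a seventh field such as "CPU 2 DIMM 0", and each event is a single line as ipmitool prints it. Other BIOS or ipmitool versions may format records differently. The function names are hypothetical.

```python
import re
from typing import List, Set, Tuple

# Location field seen in the sample output, for example "CPU 2 DIMM 0".
DIMM_RE = re.compile(r"CPU\s+(\d+)\s+DIMM\s+(\d+)")


def memory_events(sel_text: str) -> List[Tuple[int, str, int, int]]:
    """Return (event number, event text, CPU, DIMM) for each Memory SEL entry."""
    events = []
    for line in sel_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Assumed layout: id | date | time | sensor | event | state | location
        if len(fields) < 7 or fields[3] != "Memory":
            continue
        m = DIMM_RE.search(fields[6])
        if m:
            events.append((int(fields[0], 16), fields[4],
                           int(m.group(1)), int(m.group(2))))
    return events


def uce_dimms(sel_text: str) -> Set[Tuple[int, int]]:
    """(CPU, DIMM) locations reported with Uncorrectable ECC."""
    return {(cpu, dimm) for _, text, cpu, dimm in memory_events(sel_text)
            if text == "Uncorrectable ECC"}
```

Pass it the text captured from the ipmitool sel list command shown above. For that sample, uce_dimms returns {(2, 0), (2, 1)}: both DIMMs of one CPU 2 pair. This matches the note in TABLE 3-1 that BIOS can isolate a UCE only to a pair.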
Correctable DIMM Errors  
If a DIMM has 24 or more correctable errors in 24 hours, it is considered defective  
and should be replaced.  
At this time, CEs are not logged in the server’s system event logs. They are reported
or handled in the supported OSs as follows:
Windows Server:  
a. A Machine Check error-message bubble appears on the task bar.  
b. The user must manually open Event Viewer to view errors. Access Event  
Viewer through this menu path:  
Start > Administrative Tools > Event Viewer
c. The user can then view individual errors (by time) to see details of the error.  
Solaris:  
Solaris FMA reports and (sometimes) retires memory with correctable Error  
Correction Code (ECC) errors. See your Solaris Operating System documentation  
for details. Use the command:  
fmdump -eV  
14  
Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008  
Download from Www.Somanuals.com. All Manuals Search And Download.  
 
to view ECC errors  
Linux:  
The HERD utility can be used to manage DIMM errors in Linux. See the x64  
Servers Utilities Reference Manual for details.  
If HERD is installed, it copies messages from /dev/mcelog to
/var/log/messages.  
If HERD is not installed, a program called mcelog copies messages from  
/dev/mcelog to /var/log/mcelog.  
The Bootable Diagnostics CD described in Chapter 2 also captures and logs CEs.  
BIOS DIMM Error Messages  
The BIOS displays and logs the following DIMM error messages:  
NODE-n Memory Configuration Mismatch  
The following conditions will cause this error message:  
The DIMMs are not running in paired mode (they run in 64-bit mode instead of
128-bit mode).
The DIMM speeds are not the same.
The DIMMs do not support ECC.  
The DIMMs are not registered.  
The MCT stopped due to errors in the DIMM.  
The DIMM module type (buffer) is mismatched.  
The DIMM generation (I or II) is mismatched.  
The DIMM CL/T is mismatched.  
The banks on a two-sided DIMM are mismatched.  
The DIMM organization is mismatched (128-bit).  
The SPD is missing Trc or Trfc information.  
DIMM Fault LEDs  
When you press the Press to See Fault button on the motherboard or the mezzanine
board, the LED next to a DIMM flashes if the system has detected 24 or more CEs
on that DIMM in a 24-hour period.
Note – The DIMM Fault and Motherboard Fault LEDs operate on stored power for
up to a minute when the system is powered down, even after the AC power is
disconnected and the motherboard (or mezzanine board) is out of the system. The
stored power lasts for about half an hour.
Note – Disconnecting the AC power removes the fault indication. To recover fault
information, look in the SP SEL, as described in the Sun Integrated Lights Out Manager
2.0 User's Guide.
DIMM fault LED is off – The DIMM is operating properly.  
DIMM fault LED is flashing (amber) – At least one of the DIMMs in this DIMM  
pair has reported 24 CEs within a 24-hour period.  
Motherboard Fault LED on mezzanine is on – There is a fault on the motherboard.
This LED is provided because you cannot see the motherboard LEDs when the
mezzanine board is installed.
Note – The Motherboard Fault LED operates independently of the Press to See Fault  
button, and does not operate on stored power.  
See FIGURE 3-1 for the locations of DIMMs and LEDs on the motherboard. See  
FIGURE 3-2 for the locations of DIMMs and LEDs on the mezzanine board.  
FIGURE 3-1 DIMMs and LEDs on Motherboard  
FIGURE 3-2 DIMMs and LEDs on Mezzanine Board  
Isolating and Correcting DIMM ECC Errors
If your log files report an ECC error or a problem with a DIMM, complete the steps  
below until you can isolate the fault.  
In this example, the log file reports an error with the DIMM in CPU0, slot 7. The  
fault LEDs on CPU0, slots 6 and 7 are on.  
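Because slots are paired 0-1, 2-3, 4-5, and 6-7, you can find the pair that contains any slot by clearing the lowest bit of the slot number. This small Python helper shows that arithmetic, which is why an error in slot 7 lights the LEDs for slots 6 and 7. The function name pair_slots is hypothetical.

```python
from typing import Tuple


def pair_slots(slot: int) -> Tuple[int, int]:
    """Return the two slots of the DIMM pair that contains `slot` (0-7)."""
    if not 0 <= slot <= 7:
        raise ValueError(f"slot {slot} is outside the range 0-7")
    low = slot & ~1  # clear the lowest bit: 7 -> 6, 4 -> 4
    return (low, low + 1)
```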
To isolate and correct DIMM ECC errors:  
1. If you have not already done so, shut down your server to standby power mode  
and remove the cover.  
2. Inspect the installed DIMMs to ensure that they comply with the “DIMM
Population Rules” on page 11.
3. Press the PRESS TO SEE FAULT button, and inspect the DIMM fault LEDs. See  
FIGURE 3-1 and FIGURE 3-2.  
A flashing LED identifies a component with a fault.  
For CEs, the LEDs correctly identify the DIMM where the errors were detected.  
For UCEs, both LEDs in the pair flash if there is a problem with either DIMM  
in the pair.  
Note – If your server is equipped with a mezzanine board, the motherboard DIMMs  
and LEDs will be hidden beneath it. However, the Motherboard Fault LED lights to  
indicate that there is a problem on the motherboard (only while AC power is still  
connected). If the Motherboard Fault LED on the mezzanine board lights, remove  
the mezzanine board as described in your server’s service manual, and inspect the  
LEDs on the motherboard.  
4. Disconnect the AC power cords from the server.  
Caution – Before handling components, attach an ESD wrist strap to a chassis  
ground (any unpainted metal surface). The system’s printed circuit boards and hard  
disk drives contain components that are extremely sensitive to static electricity.  
Note – To recover fault information, look in the SP SEL, as described in the Sun
Integrated Lights Out Manager 2.0 User's Guide.
5. Remove the DIMMs from the DIMM slots of the affected CPU.
Refer to your server’s service manual for details.  
6. Visually inspect the DIMMs for physical damage, dust, or any other  
contamination on the connector or circuits.  
7. Visually inspect the DIMM slot for physical damage. Look for cracked or  
broken plastic on the slot.  
8. Dust off the DIMMs, clean the contacts, and reseat them.  
Caution – Use only compressed air to dust DIMMs.  
9. If there is no obvious damage, replace any failed DIMMs.  
For UCEs, if the LEDs indicate a fault with the pair, replace both DIMMs. Ensure  
that they are inserted correctly with ejector latches secured.  
10. Reconnect AC power cords to the server.  
11. Power on the server and run the diagnostics test again.  
12. Review the log file.  
If the tests identify the same error, the problem is in the CPU, not the DIMMs.  
APPENDIX A
Event Logs and POST Codes  
This appendix contains information about the BIOS event log, the BMC system event  
log, the power-on self-test (POST), and console redirection. It contains the following  
sections:  
Viewing Event Logs  
Use this procedure to view the BIOS event log and the BMC system event log.  
1. If necessary, turn on main power mode (all components powered on): use a
ballpoint pen or other stylus to press and release the Power button on the
server front panel. See FIGURE 1-1.
When main power is applied to the full server, the Power/OK LED next to the  
Power button lights and remains lit.  
2. Enter the BIOS Setup utility by pressing the F2 key while the system is  
performing the power-on self-test (POST).  
The BIOS Main menu screen is displayed.  
3. View the BIOS event log.  
a. From the BIOS Main Menu screen, select Advanced.  
The Advanced Settings screen is displayed:  
Main   Advanced   PCIPnP   Boot   Security   Chipset   Exit
Advanced Settings
WARNING: Setting wrong values in below sections may cause system to malfunction.
> CPU Configuration
> IDE Configuration
> Hyper Transport Configuration
> ACPI Configuration
> Event Log Configuration
> IPMI 2.0 Configuration
> MPS Configuration
> PCI Express Configuration
> Remote Access Configuration
> USB Configuration
Help: Configure CPU.
Keys: Select Screen | Select Item | Enter Go to Sub Screen | F1 General Help | F10 Save and Exit | ESC Exit
v02.61 (C)Copyright 1985-2006, American Megatrends, Inc.
b. From the Advanced Settings screen, select Event Log Configuration.  
The Advanced Menu Event Logging Details screen is displayed.  
Advanced
Event Logging details
  View Event Log
  Mark all events as read
  Clear Event Log
Help: View all unread events on the Event Log.
Keys: Select Screen | Select Item | Enter Go to Sub Screen | F1 General Help | F10 Save and Exit | ESC Exit
v02.61 (C)Copyright 1985-2006, American Megatrends, Inc.
c. From the Event Logging Details screen, select View Event Log.  
All unread events are displayed.  
4. View the BMC system event log:  
a. From the BIOS Main Menu screen, select Advanced.  
The Advanced Settings screen is displayed.  
b. From the Advanced Settings screen, select IPMI 2.0 Configuration.  
The Advanced Menu IPMI 2.0 Configuration screen is displayed:  
Advanced
IPMI 2.0 Configuration
  Status Of BMC                        Working
  > View BMC System Event Log
    Reload BMC System Event Log
    Clear BMC System Event Log
  > LAN Configuration
  > PEF Configuration
  BMC Watch Dog Timer Action           [Disabled]
  Help text: View all events in the BMC Event Log. It will take up to
             60 Seconds approx. to read all BMC SEL records.
  Keys: Left/Right Select Screen, Up/Down Select Item, Enter Go to Sub Screen,
        F1 General Help, F10 Save and Exit, ESC Exit
v02.61 (C)Copyright 1985-2006, American Megatrends, Inc.
c. From the IPMI 2.0 Configuration screen, select View BMC System Event Log.
The log takes about 60 seconds to load and is then displayed on the screen.
5. If the problem with the server is not evident, continue with “Using the ILOM  
Power-On Self-Test (POST)  
The system BIOS provides a rudimentary power-on self-test. The basic devices  
required for the server to operate are checked, memory is tested, the LSI 1064 disk  
controller and attached disks are probed and enumerated, and the two Intel dual  
Gigabit Ethernet controllers are initialized.  
The progress of the self-test is indicated by a series of POST codes. These codes are
displayed in the bottom right corner of the system’s VGA screen (once the self-test
has progressed far enough to initialize the system video). However, the codes change
as the self-test runs and scroll off the screen too quickly to be read. An alternate
method of displaying the POST codes is to redirect the console output to a serial
port (see “Redirecting Console Output” on page 26).
How BIOS POST Memory Testing Works  
The BIOS POST memory testing is performed as follows:  
1. The first megabyte of DRAM is tested by the BIOS before the BIOS code is  
shadowed (that is, copied from ROM to DRAM).  
2. Once executing out of DRAM, the BIOS performs a simple memory test (a  
write/read of every location with the pattern 55aa55aa).  
Note – Enabling Quick Boot causes the BIOS to skip the memory test. See  
Note – Because the server can contain up to 64 GB of memory (128 GB for the
X4440), the memory test can take several minutes. You can cancel POST testing by
pressing any key during POST.
3. The BIOS polls the memory controllers for both correctable and uncorrectable
memory errors and logs those errors to the service processor.
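The write/read pass in step 2 can be pictured with a short simulation. The following Python sketch is illustrative only: it treats a bytearray as a stand-in for DRAM, assumes the 55aa55aa pattern is stored as a 32-bit little-endian word, and uses hypothetical function names. It is not the BIOS code.

```python
import struct

PATTERN = 0x55AA55AA  # the 55aa55aa pattern described in step 2


def _word_offsets(memory: bytearray) -> range:
    """Offsets of every complete 32-bit word in the buffer."""
    return range(0, len(memory) - len(memory) % 4, 4)


def write_pattern(memory: bytearray, pattern: int = PATTERN) -> None:
    """Write pass: store the pattern in every 32-bit location."""
    word = struct.pack("<I", pattern)  # assumed little-endian layout
    for offset in _word_offsets(memory):
        memory[offset:offset + 4] = word


def verify_pattern(memory: bytearray, pattern: int = PATTERN) -> list[int]:
    """Read pass: return offsets of words that no longer hold the pattern."""
    word = struct.pack("<I", pattern)
    return [o for o in _word_offsets(memory) if memory[o:o + 4] != word]


if __name__ == "__main__":
    mem = bytearray(16)         # 16 bytes standing in for DRAM
    write_pattern(mem)
    mem[5] ^= 0x01              # simulate a single flipped bit
    print(verify_pattern(mem))  # prints [4]
```

Separating the write and read passes mirrors the description in step 2 and lets the usage example inject a fault between them; a real memory test would, of course, detect faults in hardware rather than in a simulated buffer.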
Redirecting Console Output  
Use the following instructions to access the service processor and redirect the  
console output so that the BIOS POST codes can be read.  
1. Start the BIOS Setup utility by pressing the F2 key while the system is
performing the power-on self-test (POST).
The BIOS Main menu screen is displayed.  
2. Select the Advanced menu tab.  
The Advanced Settings screen is displayed.  
3. Select IPMI 2.0 Configuration.  
The IPMI 2.0 Configuration screen is displayed.  
4. Select the LAN Configuration menu item.  
The LAN Configuration screen displays the service processor’s IP address.  
5. To configure the service processor’s IP address (optional):  
a. Select the IP Assignment option that you want to use (DHCP or Static).
If you choose DHCP, the service processor’s IP address is retrieved from your
network’s DHCP server and displayed using the following format:
Current IP address in BMC : xxx.xxx.xxx.xxx
If you choose Static to assign the IP address manually, perform the
following steps:
i. Type the IP address in the IP Address field.  
You can also enter the subnet mask and default gateway settings in their  
respective fields.  
ii. Select Commit and press Return to commit the changes.  
iii. Select Refresh and press Return to see your new settings displayed in the
Current IP address in BMC field.
6. Start a web browser and type the service processor’s IP address in the  
browser’s URL field.  
7. When you are prompted for a user name and password, type the following:  
User Name: root  
Password: changeme  
The Sun Integrated Lights Out Manager main GUI screen is displayed.  
8. Click the Remote Control tab.  
9. Click the Redirection tab.  
10. Set the color depth for the redirection console to either 6 or 8 bits.
11. Click the Start Redirection button.  
12. When you are prompted for a user name and password, type the following:  
User Name: root  
Password: changeme  
The current POST screen is displayed.  
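The static assignment in step 5 can be sanity-checked before you select Commit. The following Python sketch uses the standard ipaddress module; the function name and the particular checks are illustrative assumptions, not something the BIOS or ILOM performs. Malformed addresses or masks raise ValueError.

```python
import ipaddress


def check_static_settings(ip: str, netmask: str, gateway: str) -> list[str]:
    """Return problems found in a proposed static SP address.

    An empty list means the IP address, subnet mask, and default
    gateway are at least consistent with one another.
    """
    problems = []
    # strict=False lets a host address be combined with its mask.
    network = ipaddress.IPv4Network(f"{ip}/{netmask}", strict=False)
    address = ipaddress.IPv4Address(ip)
    gw = ipaddress.IPv4Address(gateway)
    if address in (network.network_address, network.broadcast_address):
        problems.append("IP address is the network or broadcast address")
    if gw not in network:
        problems.append("gateway is not on the same subnet as the IP address")
    if gw == address:
        problems.append("gateway and IP address are identical")
    return problems


# Example: a gateway on a different subnet is reported.
print(check_static_settings("192.168.1.10", "255.255.255.0", "192.168.2.1"))
```

These checks only catch internal inconsistencies; they cannot tell whether the address is free on your network.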
Changing POST Options  
This procedure is optional. Use it to change the operations that the server
performs during POST testing. To change POST options:
1. Start the BIOS Setup utility by pressing the F2 key while the system is
performing the power-on self-test (POST).
The BIOS Main menu screen is displayed.  
2. Select Boot.  
The Boot Settings screen is displayed.  
Main   Advanced   PCIPnP   Boot   Security   Chipset   Exit
Boot Settings
  > Boot Settings Configuration
  > Boot Device Priority
  > Hard Disk Drives
  > CD/DVD Drives
  Help text: Configure Settings during System Boot.
  Keys: Left/Right Select Screen, Up/Down Select Item, Enter Go to Sub Screen,
        F1 General Help, F10 Save and Exit, ESC Exit
v02.61 (C)Copyright 1985-2006, American Megatrends, Inc.
3. Select Boot Settings Configuration.  
The Boot Settings Configuration screen is displayed.  
Boot
Boot Settings Configuration
  Quick Boot                      [Disabled]
  Quiet Boot                      [Disabled]
  AddOn ROM Display Mode          [Force BIOS]
  Bootup Num-Lock                 [On]
  Wait For 'F1' If Error          [Disabled]
  Interrupt 19 Capture            [Enabled]
  Help text: Allows BIOS to skip certain tests while booting. This will
             decrease the time needed to boot the system.
  Keys: Left/Right Select Screen, Up/Down Select Item, +/- Change Option,
        F1 General Help, F10 Save and Exit, ESC Exit
v02.61 (C)Copyright 1985-2006, American Megatrends, Inc.
4. On the Boot Settings Configuration screen, there are several options that you  
can enable or disable:  
Quick Boot – This option is disabled by default. If you enable this, the BIOS  
skips certain tests while booting, such as the extensive memory test. This  
decreases the time it takes for the system to boot.  
Quiet Boot – This option is disabled by default. If you enable this, the Sun  
Microsystems logo is displayed instead of POST codes.  
Add On ROM Display Mode – This option is set to Force BIOS by default. It takes
effect only if you have also enabled the Quiet Boot option, and it controls whether
output from the option ROMs is displayed. The two settings for this option are as
follows:
Force BIOS – Remove the Sun logo and display option ROM output.
Keep Current – Do not remove the Sun logo. The option ROM output is not
displayed.
Bootup Num-Lock – This option is On by default (keyboard Num-Lock is turned
on during boot). If you set this option to Off, keyboard Num-Lock is not turned on
during boot.
Wait for F1 if Error – This option is disabled by default. If you enable it, the
system pauses when an error is found during POST and resumes only when you
press the F1 key.
Interrupt 19 Capture – This option is reserved for future use. Do not change.  
Default Boot Order – The letters in the brackets represent the boot devices. To
see what each letter stands for, position your cursor over the field and read the
definition on the right side of the screen.
POST Codes  
TABLE A-1 contains descriptions of each of the POST codes, listed in the same order  
in which they are generated. These POST codes appear as a four-digit string that is a  
combination of two-digit output from primary I/O port 80 and two-digit output  
from secondary I/O port 81. In the POST codes listed in TABLE A-1, the first two  
digits are from port 81 and the last two digits are from port 80.  
TABLE A-1 POST Codes

Post Code   Description
00d0        Coming out of POR, PCI configuration space initialization, enabling 8111’s SMBus.
00d2        Disable cache, full memory sizing, and verify that flat mode is enabled.
00d3        Memory detections and sizing in boot block, cache disabled, IO APIC enabled.
01d4        Test base 512KB memory. Adjust policies and cache first 8MB.
01d5        Bootblock code is copied from ROM to lower RAM. BIOS is now executing out of RAM.
01d6        Key sequence and OEM specific method is checked to determine if BIOS recovery is forced. If next code is E0, BIOS recovery is being executed. Main BIOS checksum is tested.
01d7        Restoring CPUID; moving bootblock-runtime interface module to RAM; determine whether to execute serial flash.
01d8        Uncompressing runtime module into RAM. Storing CPUID information in memory.
01d9        Copying main BIOS into memory.
01da        Giving control to BIOS POST.
0004        Check CMOS diagnostic byte to determine if battery power is OK and CMOS checksum is OK. If the CMOS checksum is bad, update CMOS with power-on default values.
00c2        Set up boot strap processor for POST. This includes frequency calculation, loading BSP microcode, and applying user requested value for GART Error Reporting setup question.
00c3        Errata workarounds applied to the BSP (#78 & #110).
00c6        Re-enable cache for boot strap processor, and apply workarounds in the BSP for errata #106, #107, #69, and #63 if appropriate.
00c7        HT sets link frequencies and widths to their final values.
000a        Initializing the 8042 compatible Keyboard Controller.
000c        Detecting the presence of Keyboard in KBC port.
000e        Testing and initialization of different Input Devices. Traps the INT09h vector, so that the POST INT09h handler gets control for IRQ1.
8600        Preparing CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT state.
de00        Preparing CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT state.
8613        Initialize PM regs and PM PCI regs at Early-POST. Initialize multi-host bridge, if system supports it. Setup ECC options before memory clearing. Enable PCI-X clock lines in the 8131.
0024        Uncompress and initialize any platform specific BIOS modules.
862a        BBS ROM initialization.
002a        Generic Device Initialization Manager (DIM) - Disable all devices.
042a        ISA PnP devices - Disable all devices.
052a        PCI devices - Disable all devices.
122a        ISA devices - Static device initialization.
152a        PCI devices - Static device initialization.
252a        PCI devices - Output device initialization.
202c        Initializing different devices. Detecting and initializing the video adapter installed in the system that have optional ROMs.
002e        Initializing all the output devices.
0033        Initializing the silent boot module. Set the window for displaying text information.
0037        Displaying sign-on message, CPU information, setup key message, and any OEM specific information.
4538        PCI devices - IPL device initialization.
5538        PCI devices - General device initialization.
8600        Preparing CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT state.
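Because each four-digit code in TABLE A-1 combines a port 81 byte (first two digits) with a port 80 byte (last two digits), a code can be split mechanically. The following Python sketch shows the idea; the function name is hypothetical. In several rows the port 80 byte matches a checkpoint in TABLE A-2 (for example, 8613 ends in 13 and 0004 ends in 04).

```python
import string


def split_post_code(code: str) -> tuple[int, int]:
    """Split a four-digit TABLE A-1 POST code into (port 81, port 80) bytes.

    Following the text, the first two hex digits are the port 81 value
    and the last two are the port 80 value.
    """
    if len(code) != 4 or any(c not in string.hexdigits for c in code):
        raise ValueError(f"expected four hex digits, got {code!r}")
    value = int(code, 16)
    return value >> 8, value & 0xFF


port81, port80 = split_post_code("8613")
print(f"port 81 = {port81:02x}, port 80 = {port80:02x}")  # port 81 = 86, port 80 = 13
```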
POST Code Checkpoints  
The POST code checkpoints are the largest set of checkpoints during the BIOS
pre-boot process. TABLE A-2 describes the types of checkpoints that might occur
during the POST portion of the BIOS. These two-digit checkpoints are the output
from primary I/O port 80.
TABLE A-2 POST Code Checkpoints

Post Code   Description
03          Disable NMI, Parity, video for EGA, and DMA controllers. At this point, only ROM accesses go to the GPNV. If BB size is 64K, turn on ROM Decode below FFFF0000h. It should allow USB to run in the E000 segment. The HT must program the NB specific initialization and OEM specific initialization, and can program if need be at the beginning of BIOS POST, similar to overriding the default values of kernel variables.
04          Check CMOS diagnostic byte to determine if battery power is OK and CMOS checksum is OK. Verify CMOS checksum manually by reading storage area. If the CMOS checksum is bad, update CMOS with power-on default values and clear passwords. Initialize status register A. Initialize data variables that are based on CMOS setup questions. Initialize both the 8259-compatible PICs in the system.
05          Initialize the interrupt controlling hardware (generally PIC) and interrupt vector table.
06          Do R/W test to CH-2 count reg. Initialize CH-0 as system timer. Install the POSTINT1Ch handler. Enable IRQ-0 in PIC for system timer interrupt. Traps INT1Ch vector to “POSTINT1ChHandlerBlock.”
C0          Early CPU Init Start--Disable Cache--Init Local APIC.
C1          Set up boot strap processor information.
C2          Set up boot strap processor for POST. This includes frequency calculation, loading BSP microcode, and applying user requested value for GART Error Reporting setup question.
C3          Errata workarounds applied to the BSP (#78 & #110).
C5          Enumerate and set up application processors. This includes microcode loading and workarounds for errata (#78, #110, #106, #107, #69, #63).
C6          Re-enable cache for boot strap processor, and apply workarounds in the BSP for errata #106, #107, #69, and #63 if appropriate. In case of mixed CPU steppings, errors are sought and logged, and an appropriate frequency for all CPUs is found and applied. NOTE: APs are left in the CLI HLT state.
C7          The HT sets link frequencies and widths to their final values. This routine gets called after CPU frequency has been calculated to prevent bad programming.
0A          Initializes the 8042 compatible Keyboard Controller.
0B          Detects the presence of PS/2 mouse.
0C          Detects the presence of Keyboard in KBC port.
0E          Testing and initialization of different Input Devices. Also, update the Kernel Variables. Traps the INT09h vector, so that the POST INT09h handler gets control for IRQ1. Uncompress all available language, BIOS logo, and Silent logo modules.
13          Initialize PM regs and PM PCI regs at Early-POST. Initialize multi-host bridge, if system will support it. Setup ECC options before memory clearing. REDIRECTION causes corrected data to be written to RAM immediately. CHIPKILL provides 4-bit error detection/correction for x4-type memory. Enable PCI-X clock lines in the 8131.
20          Relocate all the CPUs to a unique SMBASE address. The BSP will be set to have its entry point at A000:0. If fewer than 5 CPU sockets are present on a board, subsequent CPUs' entry points will be separated by 8000h bytes. If more than 4 CPU sockets are present, entry points are separated by 200h bytes. CPU module will be responsible for the relocation of the CPU to the correct address. NOTE: APs are left in the INIT state.
24          Uncompress and initialize any platform-specific BIOS modules.
30          Initialize System Management Interrupt.
2A          Initializes different devices through DIM.
2C          Initializes different devices. Detects and initializes the video adapter installed in the system that has an optional ROM.
2E          Initializes all the output devices.
31          Allocate memory for ADM module and uncompress it. Give control to ADM module for initialization. Initialize language and font modules for ADM. Activate ADM module.
33          Initializes the silent boot module. Set the window for displaying text information.
37          Displaying sign-on message, CPU information, setup key message, and any OEM specific information.
38          Initializes different devices through DIM.
39          Initializes DMAC-1 and DMAC-2.
3A          Initialize RTC date/time.
3B          Test for total memory installed in the system. Also, check for DEL or ESC keys to limit memory test. Display total memory in the system.
3C          By this point, RAM read/write test is completed; program memory holes or handle any adjustments needed in RAM size with respect to NB. Test if HT Module found an error in BootBlock and CPU compatibility for MP environment.
40          Detect different devices (parallel ports, serial ports, and coprocessor in CPU, etc.) successfully installed in the system and update the BDA, EBDA, etc.
50          Programming the memory hole or any kind of implementation that needs an adjustment in system RAM size if required.
52          Updates CMOS memory size from memory found in memory test. Allocates memory for Extended BIOS Data Area from base memory.
60          Initializes NUM-LOCK status and programs the KBD typematic rate.
75          Initialize Int-13 and prepare for IPL detection.
78          Initializes IPL devices controlled by BIOS and option ROMs.
7A          Initializes remaining option ROMs.
7C          Generate and write contents of ESCD in NVRam.
84          Log errors encountered during POST.
85          Displays errors to the user and gets the user response for error.
87          Execute BIOS setup if needed/requested.
8C          After all device initialization is done, program any user selectable parameters relating to NB/SB, such as timing parameters, non-cacheable regions and the shadow RAM cacheability, and do any other NB/SB/PCIX/OEM specific programming needed during Late-POST. Background scrubbing for DRAM, and L1 and L2 caches are set up based on setup questions. Get the DRAM scrub limits from each node.
8D          Build ACPI tables (if ACPI is supported).
8E          Program the peripheral parameters. Enable/Disable NMI as selected.
90          Late POST initialization of system management interrupt.
A0          Check boot password if installed.
A1          Clean-up work needed before booting to OS.
A2          Takes care of runtime image preparation for different BIOS modules. Fills the free area in F000h segment with 0FFh. Initializes the Microsoft IRQ Routing Table. Prepares the runtime language module. Disables the system configuration display if needed.
A4          Initialize runtime language module.
A7          Displays the system configuration screen if enabled. Initializes the CPUs before boot, which includes the programming of the MTRRs.
A8          Prepare CPU for OS boot including final MTRR values.
A9          Wait for user input at configuration display if needed.
AA          Uninstall POST INT1Ch vector and INT09h vector. Deinitializes the ADM module.
AB          Prepare BBS for Int 19 boot.
AC          Any kind of Chipsets (NB/SB) specific programming needed during End-POST, just before giving control to runtime code booting to OS. Program the system BIOS (0F0000h shadow RAM) cacheability. Ported to handle any OEM specific programming needed during End-POST. Copy OEM specific data from POST_DSEG to RUN_CSEG.
B1          Save system context for ACPI.
00          Prepares CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT state.
61-70       OEM POST Error. This range is reserved for chipset vendors and system manufacturers. The error associated with this value may be different from one platform to the next.
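When a console log or a port 80 display shows a bare checkpoint, a lookup table keyed on the two-digit value turns it into a description. The following Python sketch holds only a small, abbreviated excerpt of TABLE A-2; the function name is hypothetical, and treating the 61-70 range as hexadecimal (like the other values) is an assumption.

```python
# Abbreviated excerpt of TABLE A-2, keyed by the port 80 checkpoint value.
# Only a few rows are included; extend the dictionary as needed.
CHECKPOINTS = {
    0x00: "Prepares CPU for booting to OS",
    0x04: "Check CMOS diagnostic byte and CMOS checksum",
    0x0A: "Initializes the 8042 compatible Keyboard Controller",
    0x3B: "Test for total memory installed in the system",
    0x84: "Log errors encountered during POST",
    0x85: "Displays errors to the user and gets the user response",
}


def describe_checkpoint(code: str) -> str:
    """Return a description for a two-digit hex checkpoint such as '3B'."""
    value = int(code, 16)
    # Assumption: the 61-70 range in TABLE A-2 is hexadecimal.
    if 0x61 <= value <= 0x70:
        return "OEM POST error (meaning differs between platforms)"
    return CHECKPOINTS.get(value, "not in this excerpt of TABLE A-2")


print(describe_checkpoint("3B"))
```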
APPENDIX B
Status Indicator LEDs
This appendix contains information about the locations and behavior of the LEDs on  
the server. It describes the external LEDs that can be viewed on the outside of the  
server and the internal LEDs that can be viewed only with the main cover removed.  
External Status Indicator LEDs  
See the following figures and tables for information about the LEDs that are  
viewable on the outside of the server.  
FIGURE B-1 shows and describes the front panel LEDs.  
FIGURE B-2 shows and describes the back panel LEDs.  
FIGURE B-3 shows and describes the hard drive LEDs.  
FIGURE B-4 and FIGURE B-5 show the location of the internal LEDs.  
Front Panel LEDs  
FIGURE B-1 Front Panel LEDs (X4140 shown)
Figure Legend
1  Locator LED/Locator button: White
2  Service Required LED: Amber
3  Power/OK LED: Green
4  Rear PS LED: (Amber) Power supply fault
5  System Over Temperature LED: (Amber)
6  Top Fan LED: (Amber) Service action required on fan(s)
Back Panel LEDs  
FIGURE B-2 Back Panel LEDs (X4140 shown)
Figure Legend
1  Power Supply LEDs:
   Power Supply OK: Green
   Power Supply Fail: Amber
   AC OK: Green
2  Locator LED Button
3  Service Required LED
4  Power OK LED
5  Ethernet Port LEDs
   Left side: Green indicates link activity
   Right side: Green indicates link activity; Amber indicates link is operating at
   less than maximum speed.
Hard Drive LEDs  
FIGURE B-3 Hard Drive LEDs
Figure Legend
1  Ready to remove LED: Blue – Service action is allowed
2  Fault LED: Amber – Service action is required
3  Status LED: Green – Blinks when data is being transferred
Internal Status Indicator LEDs  
The server has internal status indicators on the motherboard, and on the mezzanine  
board. For motherboard locations, see FIGURE B-4. For mezzanine board locations, see  
FIGURE B-5.  
The DIMM Fault LEDs indicate a problem with the corresponding DIMM. They  
are located next to the DIMM ejector handles.  
When you press the Press to See Fault button, if there is a problem with a DIMM,  
the corresponding DIMM Fault LED flashes. See “DIMM Fault LEDs” on page 15  
for details.  
The CPU Fault LEDs indicate a problem with the corresponding CPU.  
When you press the Press to See Fault button, if there is a problem with a CPU,  
the corresponding CPU Fault LED flashes.  
Note – The DIMM Fault and Motherboard Fault LEDs can operate on stored power
when the system is powered down, even after the AC power is disconnected and the
motherboard (or mezzanine board) is out of the system. When you press the Press
to See Fault button, the LEDs light for up to a minute. The stored power lasts for
about half an hour.
The Motherboard Fault LED on the mezzanine board indicates that there is a  
problem with the motherboard.  
Note – The mezzanine board, when present, obscures part of the motherboard,  
including the LEDs. The Motherboard Fault LED indicates that one or more of the  
LEDs on the motherboard is active.  
FIGURE B-4 DIMMs and LEDs on Motherboard  
FIGURE B-5 DIMMs and LEDs on Mezzanine Board  
APPENDIX C
Using the ILOM Service Processor GUI to View System Information
This appendix contains information about using the Integrated Lights Out Manager
(ILOM) service processor (SP) GUI to view monitoring and maintenance information
for your server.
For more information on using the ILOM SP GUI to maintain the server (for  
example, configuring alerts), refer to the Integrated Lights Out Manager Administration  
Guide.  
If any of the logs or information screens indicate a DIMM error, see Chapter 3.  
If the problem with the server is not evident after viewing ILOM SP logs and  
Making a Serial Connection to the SP  
To make a serial connection to the SP:  
1. Connect a serial cable from the RJ-45 Serial Management port on the server to a
terminal device.
2. Press ENTER on the terminal device to establish a connection between that  
terminal device and the ILOM SP.  
Note – If you are connecting to the serial port on the SP before it has been powered  
up or during its power-up sequence, you will see boot messages.  
The service processor eventually displays a login prompt. For example:  
SUNSP0003BA84D777 login:  
The first string in the prompt is the default host name for the ILOM SP. It consists
of the prefix SUNSP and the MAC address of the ILOM SP. The MAC address for
each ILOM SP is unique.
3. Log in to the SP with the default user name, root, and the default password,
changeme.
Once you have successfully logged in to the SP, it displays its default command  
prompt.  
->  
4. To start the serial console, type the following commands:  
cd /SP/console  
start  
To exit console mode and return to the service processor, press the Esc key
and then type ( (Shift-9).
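The default host name shown in the login prompt of step 2 (for example, SUNSP0003BA84D777) can be predicted from the SP's MAC address, which helps when matching a serial session to a particular server. The following Python sketch assumes, based only on that example, that the MAC digits appear in uppercase with no separators; the function name is hypothetical.

```python
def default_sp_hostname(mac: str) -> str:
    """Build the default ILOM SP host name: 'SUNSP' + MAC digits.

    Accepts MAC addresses written with ':' or '-' separators.
    """
    digits = mac.replace(":", "").replace("-", "").upper()
    if len(digits) != 12 or any(c not in "0123456789ABCDEF" for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return "SUNSP" + digits


print(default_sp_hostname("00:03:ba:84:d7:77"))  # SUNSP0003BA84D777
```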
Continue with the following procedures:  
Viewing ILOM SP Event Logs  
Events are notifications that occur in response to specific actions. The IPMI system
event log (SEL) provides status information about the server’s hardware and
software to the ILOM software, which displays the events in the ILOM web GUI. To
view event logs:
1. Log in to the SP as Administrator or Operator to reach the ILOM web GUI:  
a. Type the IP address of the server’s SP into your web browser.  
The Sun Integrated Lights Out Manager Login screen is displayed.  
b. Type your user name and password.  
When you first try to access the ILOM SP, you are prompted to type the default  
user name and password. The default user name and password are:  
Default user name: root  
Default password: changeme  
2. From the System Monitoring tab, select Event Logs.  
The System Event Logs page is displayed. See FIGURE C-1 for a page that shows  
sample information.  
FIGURE C-1 System Event Logs Page  
3. Select the category of event that you want to view in the log from the drop-  
down list box.  
You can select from the following types of events:  
Sensor-specific events. These events relate to a specific sensor for a component,  
for example, a fan sensor or a power supply sensor.  
BIOS-generated events. These events relate to error messages generated in the  
BIOS.  
System management software events. These events relate to events that occur  
within the ILOM software.  
After you have selected a category of event, the Event Log table is updated with the  
specified events. The fields in the Event Log are described in TABLE C-1.  
TABLE C-1 Event Log Fields

Field         Description
Event ID      The number of the event, in sequence from number 1.
Time Stamp    The day and time the event occurred. If the Network Time Protocol
              (NTP) server is enabled to set the SP time, the SP clock will use
              Coordinated Universal Time (UTC). For more information about
Sensor Name   The name of a component for which an event was recorded. The
              sensor name abbreviations correspond to these components:
              • sys: System or chassis
              • p0: Processor 0
              • p1: Processor 1
              • io: I/O board
              • ps: Power supply
              • fp: Front panel
              • ft: Fan tray
              • mb: Motherboard
Sensor Type   The type of sensor for the specified event.
Description   A description of the event.
4. To clear the event log, click the Clear Event Log button.  
A confirmation dialog box is displayed.  
5. Click OK to clear all entries in the log.  
6. If the problem with the server is not evident after viewing ILOM SP logs and  
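The sensor name abbreviations in TABLE C-1 lend themselves to a lookup table when reading exported event logs. The following Python sketch is a minimal illustration; the assumption that the abbreviation is the text before the first "." in a sensor name, and the sample names used below, are hypothetical and not confirmed by this guide.

```python
# Abbreviations from TABLE C-1.
SENSOR_PREFIXES = {
    "sys": "System or chassis",
    "p0": "Processor 0",
    "p1": "Processor 1",
    "io": "I/O board",
    "ps": "Power supply",
    "fp": "Front panel",
    "ft": "Fan tray",
    "mb": "Motherboard",
}


def component_for_sensor(sensor_name: str) -> str:
    """Map a sensor name to a component using its leading abbreviation.

    Assumption (hypothetical): the abbreviation is the text before the
    first '.' in the sensor name.
    """
    prefix = sensor_name.split(".", 1)[0].lower()
    return SENSOR_PREFIXES.get(prefix, "unknown component")


print(component_for_sensor("p1.t_core"))  # hypothetical sensor name
```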
Interpreting Event Log Time Stamps  
The system event log time stamps are related to the service processor clock settings.  
If the clock settings change, the change is reflected in the time stamps.  
When the service processor reboots, the SP clock is set to Thu Jan 1 00:00:00 UTC  
1970. The SP reboots as a result of the following:  
A complete system unplug/replug power cycle  
An IPMI command; for example, mc reset cold  
A command-line interface (CLI) command; for example, reset /SP  
ILOM web GUI operation; for example, from the Maintenance tab, selecting Reset  
SP  
An SP firmware upgrade  
After an SP reboot, the SP clock is changed by the following events:
• When the host boots. The host's BIOS unconditionally sets the SP time to the
  time indicated by the host's real-time clock (RTC). The host's RTC is set by
  the following operations:
  - When the host's CMOS is cleared as a result of changing the host's RTC
    battery or inserting the CMOS-clear jumper on the motherboard. The host's
    RTC then starts at Jan 1 00:01:00 2002.
  - When the host's operating system sets the host's RTC. The BIOS does not
    consider time zones, but the Solaris and Linux operating systems respect
    time zones and set the system clock to UTC. Therefore, after the OS adjusts
    the RTC, the time that the BIOS sets on the SP will be UTC.
  - When the user sets the RTC using the host BIOS Setup screen.
• Continuously via NTP, if NTP is enabled on the SP. NTP jumping is enabled so
  that the SP clock recovers quickly from an erroneous update by the BIOS or a
  user. NTP servers provide UTC time, so if NTP is enabled on the SP, the SP
  clock will be in UTC.
• Manually, via the CLI, the ILOM web GUI, or IPMI.
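Because both default clock values are known (1970 for the SP after a reboot, 2002 for the host RTC after a CMOS clear), a script that reviews exported event log entries can flag time stamps that probably came from a reset clock rather than from the real event time. The following Python sketch is a heuristic only: the function name suspect_clock_source is hypothetical, and because it checks nothing but the year, it can only suggest, not prove, that a clock was reset.

```python
from datetime import datetime
from typing import Optional

# Default clock values described in "Interpreting Event Log Time Stamps":
SP_RESET_YEAR = 1970    # SP clock after an SP reboot: Thu Jan 1 00:00:00 UTC 1970
CMOS_CLEAR_YEAR = 2002  # host RTC after a CMOS clear: Jan 1 00:01:00 2002


def suspect_clock_source(stamp: datetime) -> Optional[str]:
    """Return a hint if an event time stamp looks like a default clock value.

    Heuristic only (hypothetical helper): it looks at the year and nothing else.
    """
    if stamp.year == SP_RESET_YEAR:
        return "SP clock not set since SP reboot (1970 epoch)"
    if stamp.year == CMOS_CLEAR_YEAR:
        return "host RTC possibly reset by CMOS clear (2002 default)"
    return None  # no sign of a default clock value
```

For example, suspect_clock_source(datetime.strptime("08/26/2005 11:36:09", "%m/%d/%Y %H:%M:%S")) returns None, because 2005 matches neither default.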
Viewing Replaceable Component Information
Depending on the component you select, information about the manufacturer,  
component name, serial number, and part number can be displayed. To view  
replaceable component information:  
1. Log in to the SP as Administrator or Operator to reach the ILOM web GUI:  
a. Type the IP address of the server’s SP into your web browser.  
The Sun Integrated Lights Out Manager Login screen is displayed.  
b. Type your user name and password.  
When you first try to access the ILOM Service Processor, you are prompted to  
type the default user name and password. The default user name and  
password are:  
Default user name: root  
Default password: changeme  
2. From the System Information tab, select Components.  
The Replaceable Component Information page is displayed. See FIGURE C-2.  
FIGURE C-2 Replaceable Component Information Page  
3. Select a component from the drop-down list.  
Information about the selected component is displayed.  
4. If the problem with the server is not evident after viewing replaceable  
component information, continue with “Running SunVTS Diagnostic Tests” on  
page 7.  
Viewing Sensors  
This section describes how to view the server temperature, voltage, and fan sensor  
readings.  
For a complete list of sensors, see Appendix D.  
To view sensor readings:  
1. Log in to the SP as Administrator or Operator to reach the ILOM web GUI:  
a. Type the IP address of the server’s SP into your web browser.  
The Sun Integrated Lights Out Manager Login screen is displayed.  
b. Type your user name and password.  
When you first try to access the ILOM Service Processor, you are prompted to  
type the default user name and password. The default user name and  
password are:  
Default user name: root  
Default password: changeme  
2. From the System Monitoring tab, select Sensor Readings.  
The Sensor Readings page is displayed. See FIGURE C-3.  
FIGURE C-3 Sensor Readings Page  
3. Click the Refresh button to update the sensor readings to their current status.  
4. Click a sensor to display its thresholds.  
A display of properties and values appears. See the example in FIGURE C-4.  
FIGURE C-4 Sensor Details Page  
5. If the problem with the server is not evident after viewing sensor readings  
APPENDIX D
Error Handling  
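This appendix contains information about how the servers process and log errors.
See the following sections:
• "Handling of Uncorrectable Errors"
• "Handling of Correctable Errors"
• "Handling of Parity Errors (PERR)"
• "Handling of System Errors (SERR)"
• "Handling Mismatching Processors"
• "Hardware Error Handling Summary"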
Handling of Uncorrectable Errors  
This section lists facts and considerations about how the server handles  
uncorrectable errors.  
Note – The BIOS ChipKill feature must be disabled if you are testing for failures of  
multiple bits within a DRAM (ChipKill corrects for the failure of a four-bit wide  
DRAM).  
• The BIOS logs the error to the SP system event log (SEL) through the baseboard
  management controller (BMC).
• The SP's SEL is updated with the bank address of the failing DIMM pair.
• The system reboots.
• The BIOS logs the error in DMI.
Note – If the error is in the low 1 MB of memory, the BIOS freezes after rebooting,
so no DMI log is recorded.
Examples of the errors reported in the SEL through IPMI 2.0 follow.
When low memory is faulty, the BIOS freezes during the pre-boot low-memory test,
because it cannot decompress itself into faulty DRAM and execute from there. The
SEL shows entries such as the following:
ipmitool> sel list  
100 | 08/26/2005 | 11:36:09 | OEM #0xfb |  
200 | 08/26/2005 | 11:36:12 | System Firmware Error | No usable system memory  
300 | 08/26/2005 | 11:36:12 | Memory | Memory Device Disabled | CPU 0 DIMM 0  
When the faulty DIMM is beyond the BIOS's low 1 MB extraction space, the system
boots normally:
ipmitool> sel list  
100 | 08/26/2005 | 05:04:04 | OEM #0xfb |  
200 | 08/26/2005 | 05:04:09 | Memory | Memory Device Disabled | CPU 0 DIMM 0  
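If you capture sel list output to a file, the pipe-separated layout shown above is easy to scan programmatically, for example to list every DIMM that the BIOS disabled. The following Python sketch is an illustration only: parse_sel_list and disabled_dimms are hypothetical helper names, and the sketch assumes that each record uses the ID | date | time | ... layout of the samples above. Other ipmitool versions might print different columns.

```python
from typing import List, NamedTuple


class SelEntry(NamedTuple):
    record_id: str
    date: str
    time: str
    fields: List[str]  # remaining "|"-separated columns, e.g. sensor, event, detail


def parse_sel_list(text: str) -> List[SelEntry]:
    """Parse the "|"-separated lines printed by "ipmitool sel list"."""
    entries = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Drop the empty column left by a trailing "|" (as in record 100 above).
        while parts and parts[-1] == "":
            parts.pop()
        if len(parts) < 4:
            continue  # skip prompt lines such as "ipmitool> sel list" and blank lines
        record_id, date, time, *rest = parts
        entries.append(SelEntry(record_id, date, time, rest))
    return entries


def disabled_dimms(entries: List[SelEntry]) -> List[str]:
    """Return the location reported after each "Memory Device Disabled" event."""
    found = []
    for entry in entries:
        if "Memory Device Disabled" in entry.fields:
            i = entry.fields.index("Memory Device Disabled")
            if i + 1 < len(entry.fields):
                found.append(entry.fields[i + 1])
    return found
```

Applied to the first sample above, disabled_dimms(parse_sel_list(text)) returns ["CPU 0 DIMM 0"].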
Note the following considerations for this revision:
• An Uncorrectable ECC Memory Error event is not reported.
• Multi-bit ECC errors are reported as Memory Device Disabled.
• On the first reboot, the BIOS logs a HyperTransport Error in the DMI log.
• The BIOS disables the DIMM.
• The BIOS sends the SEL records to the BMC.
• The BIOS reboots again.
• The BIOS skips the faulty DIMM on the next POST memory test.
• The BIOS reports available memory, excluding the faulty DIMM pair.
FIGURE D-1 shows an example of a DMI log screen from the BIOS Setup page.
FIGURE D-1 DMI Log Screen, Uncorrectable Error  
Handling of Correctable Errors  
This section lists facts and considerations about how the server handles correctable  
errors.  
• During BIOS POST:
  - The BIOS polls the MCK registers.
  - The BIOS logs to DMI.
  - The BIOS logs to the SP SEL through the BMC.
  - This feature is turned off by default at OS boot time.
• The following Linux versions report correctable ECC syndrome and memory fill
  errors in /var/log if the mce kernel flag is specified at boot time, or if mce
  is enabled through the kernel compile or installation:
  - RH3 Update5 single core
  - RH4 Update1+
  - SLES9 SP1+
• The Linux kernel (x86_64/kernel/mce.c) repeats a report every 30 seconds
  until another error is encountered and an 8131 flag is reset.
• The Solaris OS provides full self-healing and automated diagnosis for the CPU
  and memory subsystems.
FIGURE D-2 shows an example of a DMI log screen from the BIOS Setup page:
FIGURE D-2 DMI Log Screen, Correctable Error  
If at any stage of memory testing the BIOS is unable to read from or write to the
DIMM, it takes the following actions:
• The BIOS disables the DIMM, as indicated by the Memory Decreased message in
  EXAMPLE D-1.
• The BIOS logs an SEL record.
• The BIOS logs an event in DMI.
EXAMPLE D-1 DMI Log Screen, Correctable Error, Memory Decreased
Handling of Parity Errors (PERR)  
This section lists facts and considerations about how the server handles parity errors  
(PERR).  
Parity errors are handled through nonmaskable interrupts (NMIs).
During BIOS POST, the NMI is logged in DMI and in the SP SEL. See the following
example command and output:
[root@d-mpk12-53-238 root]# ipmitool -H 129.146.53.95 -U root -P changeme -I lan sel list -v
SEL Record ID          : 0100
Record Type            : 00
Timestamp              : 01/10/2002 20:16:16
Generator ID           : 0001
EvM Revision           : 04
Sensor Type            : Critical Interrupt
Sensor Number          : 00
Event Type             : Sensor-specific Discrete
Event Direction        : Assertion Event
Event Data             : 04ff00
Description            : PCI PERR
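The verbose output prints one "Name : value" field per line, so saved output can be turned into one dictionary per record and filtered by the Description field. The following Python sketch is illustrative: parse_sel_verbose and pci_bus_errors are hypothetical helpers, and the sketch assumes that records in a multi-record capture are separated by blank lines. Only the first colon on a line splits it, so time stamps such as 20:16:16 stay intact.

```python
from typing import Dict, List


def parse_sel_verbose(text: str) -> List[Dict[str, str]]:
    """Parse "ipmitool sel list -v" output into one dict per SEL record.

    Assumes blank lines separate records; lines without a ":" (such as the
    shell prompt and command line) are ignored.
    """
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        name, sep, value = line.partition(":")  # split on the first ":" only
        if sep:
            current[name.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def pci_bus_errors(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the records whose Description is PCI PERR or PCI SERR."""
    return [r for r in records if r.get("Description") in ("PCI PERR", "PCI SERR")]
```

The same helpers apply unchanged to the PCI SERR record in "Handling of System Errors (SERR)," because it uses the same field layout.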
FIGURE D-3 shows an example of a DMI log screen from the BIOS Setup page, with a
parity error.
FIGURE D-3 DMI Log Screen, PCI Parity Error  
The BIOS displays the following messages and freezes (during POST or DOS):  
NMI EVENT!!  
System Halted due to Fatal NMI!  
The Linux NMI trap catches the interrupt and reports the following NMI  
“confusion report” sequence:  
Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 1.
Aug 5 05:15:00 d-mpk12-53-159 kernel: Dazed and confused, but trying to continue
Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?
Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 1.
Aug 5 05:15:00 d-mpk12-53-159 kernel: Dazed and confused, but trying to continue
Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?
Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Aug 5 05:15:00 d-mpk12-53-159 kernel: Dazed and confused, but trying to continue
Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?
Aug 5 05:15:00 d-mpk12-53-159 kernel: Dazed and confused, but trying to continue
Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?
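Because the kernel reports each unexplained NMI with a reason code and a CPU number, counting these messages in a saved kernel log shows at a glance which CPUs received them. The following Python sketch assumes that the messages appear in the form shown above. The log file location depends on the distribution's syslog configuration, and count_unknown_nmis is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the kernel message shown above; \s+ also tolerates a line wrapped
# before "on CPU".
NMI_UNKNOWN = re.compile(r"NMI received for unknown reason (\w+)\s+on CPU (\d+)")


def count_unknown_nmis(log_text: str) -> Counter:
    """Count "unknown reason" NMIs per (reason code, CPU number)."""
    counts: Counter = Counter()
    for match in NMI_UNKNOWN.finditer(log_text):
        counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Applied to the sequence above, the sketch counts four NMIs: reason 2d and reason 3d, each once on CPU 0 and once on CPU 1.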
Note – The Linux system reboots, but does not inform the BIOS of this incident.  
Handling of System Errors (SERR)  
This section lists facts and considerations about how the server handles system  
errors (SERR).  
System error handling works through the HyperTransport sync flood error
mechanism on the 8111 and 8131.
The following events happen during BIOS POST:
• POST reports any previous system errors at the bottom of the screen. See
  FIGURE D-4 for an example.
FIGURE D-4 POST Screen, Previous System Error Listed
• SERR and HyperTransport sync flood errors are logged in DMI and in the SP
  SEL. See the following sample output:
SEL Record ID          : 0a00
Record Type            : 00
Timestamp              : 08/10/2005 06:05:32
Generator ID           : 0001
EvM Revision           : 04
Sensor Type            : Critical Interrupt
Sensor Number          : 00
Event Type             : Sensor-specific Discrete
Event Direction        : Assertion Event
Event Data             : 05ffff
Description            : PCI SERR
FIGURE D-5 shows an example DMI log screen from the BIOS Setup Page with a  
system error.  
FIGURE D-5 DMI Log Screen with Error  
Handling Mismatching Processors  
This section lists facts and considerations about how the server handles
mismatched processors.
• The BIOS performs a complete POST.
• The BIOS displays a report of any mismatched CPUs, as shown in the following
  example:
AMIBIOS(C)2003 American Megatrends, Inc.  
BIOS Date: 08/10/05 14:51:11 Ver: 08.00.10  
CPU : AMD Opteron(tm) Processor 254, Speed : 2.4 GHz  
Count : 3, CPU Revision, CPU0 : E4, CPU1 : E6  
Microcode Revision, CPU0 : 0, CPU1 : 0  
DRAM Clocking CPU0 = 400 MHz, CPU1 Core0/1 = 400 MHz  
Sun Fire Server, 1 AMD North Bridge, Rev E4  
1 AMD North Bridge, Rev E6  
1 AMD 8111 I/O Hub, Rev C2  
2 AMD 8131 PCI-X Controllers, Rev B2  
System Serial Number : 0505AMF028  
BMC Firmware Revision : 1.00  
Checking NVRAM..  
Initializing USB Controllers .. Done.  
Press F2 to run Setup (CTRL+E on Remote Keyboard)  
Press F12 to boot from the network (CTRL+N on Remote Keyboard)  
Press F8 for BBS POPUP (CTRL+P on Remote Keyboard)  
• No SEL or DMI event is recorded.
• The system enters Halt mode and the following message is displayed:
******** Warning: Bad Mix of Processors *********  
Multiple core processors cannot be installed with single core  
processors.  
Fatal Error... System Halted.  
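When you compare POST banners from several systems, the per-CPU revisions on the "CPU Revision" line of the banner can be pulled out automatically. The following Python sketch is illustrative: cpu_revisions and revisions_differ are hypothetical helpers, and they assume that the banner line has the "CPUn : revision" form shown above. A difference in revision alone is only a prompt to check further. TABLE D-1 states that the BIOS supports mismatched frequency and steppings, and the halt message above concerns mixing multiple-core and single-core processors.

```python
import re
from typing import Dict


def cpu_revisions(post_banner: str) -> Dict[str, str]:
    """Extract per-CPU revisions from the "CPU Revision" line of a POST banner."""
    for line in post_banner.splitlines():
        if "CPU Revision" in line:
            # "CPU\d+" requires a digit, so "CPU Revision" itself is not matched.
            return dict(re.findall(r"(CPU\d+)\s*:\s*(\w+)", line))
    return {}


def revisions_differ(revisions: Dict[str, str]) -> bool:
    """True when the CPUs report more than one distinct revision."""
    return len(set(revisions.values())) > 1
```

For the banner above, cpu_revisions returns {"CPU0": "E4", "CPU1": "E6"}, and revisions_differ reports True.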
Hardware Error Handling Summary  
TABLE D-1 summarizes the most common hardware errors that you might encounter  
with these servers.  
TABLE D-1 Hardware Error Handling Summary
Columns: Error; Description; Handling; Logged (DMI Log or SP SEL); Fatal?

SP failure
  Description: The SP fails to boot upon application of system power.
  Handling: The SP controls the system reset, so the system may power on, but
  will not come out of reset.
    • During power up, the SP's boot loader turns on the power LED.
    • During SP boot, Linux startup, and SP sanity check, the power LED blinks.
    • The LED is turned off when SP management code (the IPMI stack) is started.
    • At exit of BIOS POST, the LED goes to STEADY ON state.
  Logged: Not logged
  Fatal?: Fatal

SP failure
  Description: The SP boots but fails POST.
  Handling: The SP controls the system reset, so the system will not come out of
  reset.
  Logged: Not logged
  Fatal?: Fatal

BIOS POST failure
  Description: The server BIOS does not pass POST.
  Handling: There are fatal and non-fatal errors in POST. The BIOS detects some
  errors and announces them during POST as POST codes in the bottom right corner
  of the serial console and video displays. Some POST codes are forwarded to the
  SP for logging.
  The POST codes do not appear in sequential order, and some are repeated,
  because some POST codes are issued by code in add-in card BIOS expansion ROMs.
  In the case of early POST failures (for example, the BSP fails to operate
  correctly), the BIOS halts without logging. For some other POST failures that
  occur after memory and SP initialization, the BIOS logs a message to the SP's
  SEL.

Single-bit DRAM ECC error
  Description: With ECC enabled in the BIOS Setup, the CPU detects and corrects a
  single-bit error on the DIMM interface.
  Handling: The CPU corrects the error in hardware. No interrupt or machine check
  is generated by the hardware. The polling is triggered every half-second by SMI
  timer interrupts and is done by the BIOS SMI handler. The BIOS SMI handler
  starts logging each detected error and stops logging when the limit for the
  same error is reached. The BIOS's polling can be disabled through a software
  interface.
  Logged: SP SEL
  Fatal?: Normal operation

Single four-bit DRAM error
  Description: With CHIP-KILL enabled in the BIOS Setup, the CPU detects and
  corrects for the failure of a four-bit-wide DRAM on the DIMM interface.
  Handling: The CPU corrects the error in hardware. No interrupt or machine check
  is generated by the hardware. The polling is triggered every half-second by SMI
  timer interrupts and is done by the BIOS SMI handler. The BIOS SMI handler
  starts logging each detected error and stops logging when the limit for the
  same error is reached. The BIOS's polling can be disabled through a software
  interface.
  Logged: SP SEL
  Fatal?: Normal operation

Uncorrectable DRAM ECC error
  Description: The CPU detects an uncorrectable multiple-bit DIMM error.
  Handling: The "sync flood" method is used to prevent the erroneous data from
  being propagated across the HyperTransport links. The system reboots, and the
  BIOS recovers the machine check register information, maps this information to
  the failing DIMM (when CHIPKILL is disabled) or DIMM pair (when CHIPKILL is
  enabled), and logs that information to the SP. The BIOS then halts the CPU.
  Logged: SP SEL
  Fatal?: Fatal

Unsupported DIMM configuration
  Description: Unsupported DIMMs are used, or supported DIMMs are loaded
  improperly.
  Handling: The BIOS displays an error message, logs an error, and halts the
  system.
  Logged: DMI Log, SP SEL
  Fatal?: Fatal

HyperTransport link failure
  Description: A CRC or link error on one of the HyperTransport links.
  Handling: A sync flood occurs on the HyperTransport links, the machine resets
  itself, and the error information is retained through the reset. The BIOS
  reports, "A Hyper Transport sync flood error occurred on last boot, press F1
  to continue."
  Logged: DMI Log, SP SEL
  Fatal?: Fatal

PCI SERR, PERR
  Description: A system or parity error on a PCI bus.
  Handling: A sync flood occurs on the HyperTransport links, the machine resets
  itself, and the error information is retained through the reset. The BIOS
  reports, "A Hyper Transport sync flood error occurred on last boot, press F1
  to continue."
  Logged: DMI Log, SP SEL
  Fatal?: Fatal

BIOS POST Microcode Error
  Description: The BIOS could not find or load the CPU microcode update to the
  CPU. The message most likely appears when a new CPU is installed in a
  motherboard with an outdated BIOS. In this case, the BIOS must be updated.
  Handling: The BIOS displays an error message, logs the error to DMI, and boots.
  Logged: DMI Log
  Fatal?: Non-fatal

BIOS POST CMOS Checksum Bad
  Description: The CMOS contents failed the checksum check.
  Handling: The BIOS displays an error message, logs the error to DMI, and boots.
  Logged: DMI Log
  Fatal?: Non-fatal

Unsupported CPU configuration
  Description: The BIOS supports mismatched frequency and steppings in CPU
  configuration, but some CPUs might not be supported.
  Handling: The BIOS displays an error message, logs the error, and halts the
  system.
  Logged: DMI Log
  Fatal?: Fatal

Correctable error
  Description: The CPU detects a variety of correctable errors in the MCi_STATUS
  registers.
  Handling: The CPU corrects the error in hardware. No interrupt or machine check
  is generated by the hardware. The polling is triggered every half-second by SMI
  timer interrupts and is done by the BIOS SMI handler. The SMI handler logs a
  message to the SP SEL if the SEL is available; otherwise, it logs a message to
  DMI. The BIOS's polling can be disabled through a software SMI.
  Logged: DMI Log, SP SEL
  Fatal?: Normal operation

Single fan failure
  Description: Fan failure is detected by reading tach signals.
  Handling: The Front Fan Fault, Service Action Required, and individual fan
  module LEDs are lit.
  Logged: SP SEL
  Fatal?: Non-fatal

Multiple fan failure
  Description: Fan failure is detected by reading tach signals.
  Handling: The Front Fan Fault, Service Action Required, and individual fan
  module LEDs are lit.
  Logged: SP SEL
  Fatal?: Fatal

Single power supply failure
  Description: Any of the AC/DC PS_VIN_GOOD or PS_PWR_OK signals is deasserted.
  Handling: The Service Action Required and Power Supply Fault LEDs are lit.
  Logged: SP SEL
  Fatal?: Non-fatal

DC/DC power converter failure
  Description: Any POWER_GOOD signal from the DC/DC converters is deasserted.
  Handling: The Service Action Required LED is lit, the system is powered down to
  standby power mode, and the Power LED enters the standby blink state.
  Logged: SP SEL
  Fatal?: Fatal

Voltage above/below threshold
  Description: The SP monitors system voltages and detects voltage above or below
  a given threshold.
  Handling: The Service Action Required LED and Power Supply Fault LED blink.
  Logged: SP SEL
  Fatal?: Fatal

High temperature
  Description: The SP monitors CPU and system temperatures and detects
  temperatures above a given threshold.
  Handling: The Service Action Required LED and System Overheat Fault LED blink.
  The motherboard is shut down above the specified critical level.
  Logged: SP SEL
  Fatal?: Fatal

Processor thermal trip
  Description: The CPU drives the THERMTRIP_L signal upon detecting an
  overtemperature condition.
  Handling: The CPLD shuts down power to the CPU. The Service Action Required LED
  and System Overheat Fault LED blink.
  Logged: SP SEL
  Fatal?: Fatal

Boot device failure
  Description: The BIOS is not able to boot from a device in the boot device
  list.
  Handling: The BIOS goes to the next boot device in the list. If all devices in
  the list fail, an error message is displayed and the BIOS retries from the
  beginning of the list. The SP can control or change the boot order.
  Logged: DMI Log
  Fatal?: Non-fatal
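For scripted triage, for example to decide whether an alert needs immediate attention, the Fatal? column of TABLE D-1 can be held in a lookup table. The following Python sketch copies that column as given. The dictionary keys are shortened error names chosen for illustration, and errors_by_severity is a hypothetical helper. BIOS POST failure is left out because the table gives no single value for it.

```python
from collections import defaultdict
from typing import Dict, List

# "Fatal?" column of TABLE D-1 (BIOS POST failure omitted: no single value given).
FATAL_COLUMN: Dict[str, str] = {
    "SP failure (SP fails to boot)": "Fatal",
    "SP failure (SP boots but fails POST)": "Fatal",
    "Single-bit DRAM ECC error": "Normal operation",
    "Single four-bit DRAM error": "Normal operation",
    "Uncorrectable DRAM ECC error": "Fatal",
    "Unsupported DIMM configuration": "Fatal",
    "HyperTransport link failure": "Fatal",
    "PCI SERR, PERR": "Fatal",
    "BIOS POST Microcode Error": "Non-fatal",
    "BIOS POST CMOS Checksum Bad": "Non-fatal",
    "Unsupported CPU configuration": "Fatal",
    "Correctable error": "Normal operation",
    "Single fan failure": "Non-fatal",
    "Multiple fan failure": "Fatal",
    "Single power supply failure": "Non-fatal",
    "DC/DC power converter failure": "Fatal",
    "Voltage above/below threshold": "Fatal",
    "High temperature": "Fatal",
    "Processor thermal trip": "Fatal",
    "Boot device failure": "Non-fatal",
}


def errors_by_severity(summary: Dict[str, str]) -> Dict[str, List[str]]:
    """Group error names by their Fatal? value, keeping table order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for error, severity in summary.items():
        groups[severity].append(error)
    return dict(groups)
```

Grouping the table this way yields 12 fatal, 5 non-fatal, and 3 normal-operation entries.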