Compaq Server SC RMS User Manual

Compaq AlphaServer SC RMS

Reference Manual

Quadrics Supercomputers World Ltd. Document Version 7 - June 22nd 2001 - AA-RLAZB-TE

Download from Www.Somanuals.com. All Manuals Search And Download.

Contents

1 Introduction
    1.1 Scope of Manual
    1.2 Audience
    1.3 Using this Manual
    1.4 Related Information
    1.5 Location of Online Documentation
    1.6 Reader’s Comments
    1.7 Conventions

2 Overview of RMS
    2.1 Introduction
    2.2 The System Architecture
        2.2.1 Nodes
    2.3 The Role of the RMS
        2.3.1 The Structure of the RMS
        2.3.2 The RMS Daemons
        2.3.3 The RMS Commands
        2.3.4 The RMS Database
    2.4 RMS Management Functions
        2.4.1 Allocating Resources
        2.4.2 Scheduling
        2.4.3 Access Control and Accounting
        2.4.4 RMS Configuration

3 Parallel Programs Under RMS
    3.1 Introduction
    3.2 Resource Requests
    3.3 Loading and Running Programs

4 RMS Daemons
    4.1 Introduction
        4.1.1 Startup
        4.1.2 Log Files
        4.1.3 Daemon Status
    4.2 The Database Manager
    4.3 The Machine Manager
        4.3.1 Interaction with the Database
    4.4 The Partition Manager
        4.4.1 Partition Startup
        4.4.2 Interaction with the Database
    4.5 The Switch Network Manager
        4.5.1 Interaction with the Database
    4.6 The Transaction Log Manager
        4.6.1 Interaction with the Database
    4.7 The Event Manager
        4.7.1 Interaction with the Database
    4.8 The Process Manager
        4.8.1 Interaction with the Database
    4.9 The RMS Daemon
        4.9.1 Interaction with the Database

5 RMS Commands
    5.1 Introduction
        allocate(1)
        nodestatus(1)
        msqladmin(1)
        prun(1)
        rcontrol(1)
        rinfo(1)
        rmsbuild(1)
        rmsctl(1)
        rmsexec(1)
        rmshost(1)
        rmsquery(1)
        rmstbladm(1)

6 Access Control, Usage Limits and Accounting
    6.1 Introduction
    6.2 Users and Projects
    6.3 Access Controls
        6.3.1 Access Controls Example
    6.4 How Access Controls are Applied
        6.4.1 Memory Limit Rules
        6.4.2 Priority Rules
        6.4.3 CPU Usage Limit Rules
    6.5 Accounting

7 RMS Scheduling
    7.1 Introduction
    7.2 Scheduling Policies
    7.3 Scheduling Constraints
    7.4 What Happens When a Request is Received
        7.4.1 Memory Limits
        7.4.2 Swap Space
        7.4.3 Time Slicing
        7.4.4 Suspend and Resume
        7.4.5 Idle Time

8 Event Handling
    8.1 Introduction
        8.1.1 Posting Events
        8.1.2 Waiting on Events
    8.2 Event Handling
    8.3 List of Events Generated
        8.3.1 Extending the RMS Event Handling Mechanism

9 Setting up RMS
    9.1 Introduction
    9.2 Installation Planning
        9.2.1 Node Names
    9.3 Setting up RMS
        9.3.1 Starting RMS
        9.3.2 Initial Setup with One Partition
        9.3.3 Simple Day/Night Setup
    9.4 Day-to-Day Operation
        9.4.1 Periodic Shift Changes
        9.4.2 Backing Up the Database
        9.4.3 Summarizing Accounting Data
        9.4.4 Archiving Data
        9.4.5 Database Maintenance
        9.4.6 Configuring Nodes Out
    9.5 Local Customization of RMS
        9.5.1 Partition Startup
        9.5.2 Core File Handling
        9.5.3 Event Handling
        9.5.4 Switch Manager Configuration
    9.6 Log Files

10 The RMS Database
    10.1 Introduction
        10.1.1 General Information about the Tables
        10.1.2 Access to the Database
        10.1.3 Categories of Table
    10.2 Listing of Tables
        10.2.1 The Access Controls Table
        10.2.2 The Accounting Statistics Table
        10.2.3 The Attributes Table
        10.2.4 The Elans Table
        10.2.5 The Elites Table
        10.2.6 The Events Table
        10.2.7 The Event Handlers Table
        10.2.8 The Fields Table
        10.2.9 The Installed Components Table
        10.2.10 The Jobs Table
        10.2.11 The Link Errors Table
        10.2.12 The Modules Table
        10.2.13 The Module Types Table
        10.2.14 The Nodes Table
        10.2.15 The Node Statistics Table
        10.2.16 The Partitions Table
        10.2.17 The Projects Table
        10.2.18 The Resources Table
        10.2.19 The Servers Table
        10.2.20 The Services Table
        10.2.21 The Software Products Table
        10.2.22 The Switch Boards Table
        10.2.23 The Transactions Table
        10.2.24 The Users Table

A Compaq AlphaServer SC Interconnect Terms
    A.1 Introduction
    A.2 Link States
    A.3 Link Errors

B RMS Status Values
    B.1 Overview
    B.2 Generic Status Values
    B.3 Job Status Values
    B.4 Link Status Values
    B.5 Module Status Values
    B.6 Node Status Values
    B.7 Partition Status Values
    B.8 Resource Status Values
    B.9 Transaction Status Values

C RMS Kernel Module
    C.1 Introduction
    C.2 Capabilities
    C.3 System Call Interface
        rms_setcorepath(3)
        rms_getcorepath(3)
        rms_prgcreate(3)
        rms_prgdestroy(3)
        rms_prgids(3)
        rms_prginfo(3)
        rms_getprgid(3)
        rms_prgsuspend(3)
        rms_prgresume(3)
        rms_prgsignal(3)
        rms_prgaddcap(3)
        rms_setcap(3)
        rms_ncaps(3)
        rms_getcap(3)
        rms_prggetstats(3)

D RMS Application Interface
    D.1 Introduction
        rms_allocateResource(3)
        rms_deallocateResource(3)
        rms_run(3)
        rms_suspendResource(3)
        rms_resumeResource(3)
        rms_killResource(3)
        rms_defaultPartition(3)
        rms_numCpus(3)
        rms_numNodes(3)
        rms_freeCpus(3)

E Accounting Summary Script
    E.1 Introduction
    E.2 Command Line Interface
    E.3 Example Output
    E.4 Listing of the Script

Glossary

Index

List of Figures

2.1 A Network of Nodes
2.2 High Availability RMS Configuration
2.3 The Database
2.4 Partitioning a System
2.5 Distribution of Processes
2.6 Preemption of Low Priority Jobs
2.7 Two Configurations
3.1 Distribution of Parallel Processes
3.2 Loading and Running a Parallel Program
A.1 A 2-Stage, 16-Node, Switch Network
A.2 A 3-Stage, 64-Node, Switch Network
A.3 A 3-Stage, 128-Node, Switch Network

List of Tables

10.1 Access Controls Table
10.2 Accounting Statistics Table
10.3 Machine Attributes
10.4 Performance Statistics Attributes
10.5 Server Attributes
10.6 Scheduling Attributes
10.7 Elans Table
10.8 Elites Table
10.9 Events Table
10.10 Example of Status Changes
10.11 Event Handlers Table
10.12 Fields Table
10.13 Type Values
10.14 Installed Components Table
10.15 Jobs Table
10.16 Link Errors Table
10.17 Modules Table
10.18 Module Types Table
10.19 Valid Module Types
10.20 Nodes Table
10.21 Node Statistics Table
10.22 Partitions Table
10.23 Projects Table
10.24 Resources Tables
10.25 Servers Table
10.26 Services Table
10.27 Entries in the Services Table
10.28 Software Products Table
10.29 Component Attribute Values
10.30 Switch Boards Table
10.31 Transaction Log Table
10.32 Entry in the Transactions Table
10.33 Users Table
A.1 Switch Network Parameters
B.1 Job Status Values
B.2 Link Status Values
B.3 Module Status Values
B.4 Node Status Values
B.5 Run Level Status Values
B.6 Partition Status Values
B.7 Resource Status Values
B.8 Transaction Status Values

Introduction

1.1 Scope of Manual

This manual describes the Resource Management System (RMS). Its purpose is to provide a technical overview of RMS, its functionality and its programmable interfaces. It covers the RMS daemons, the client applications, the RMS database, the system call interface to the RMS kernel module and the application program interface to the RMS database.

1.2 Audience

This manual is intended for system administrators and developers. It provides a detailed technical description of the operation and features of RMS and describes the programming interface between RMS and third-party systems.

The manual assumes that the reader is familiar with the following:

• The UNIX operating system, including shell scripts
• The C programming language

1.3 Using this Manual

This manual contains ten chapters and five appendices. Their contents are as follows:

Chapter 1 (Introduction)
    explains the layout of the manual and the conventions used to present information

Chapter 2 (Overview of RMS)
    gives an overview of the functions of RMS and introduces its components

Chapter 3 (Parallel Programs Under RMS)
    shows how parallel programs are executed under RMS

Chapter 4 (RMS Daemons)
    describes the functionality of the RMS daemons

Chapter 5 (RMS Commands)
    describes the RMS commands

Chapter 6 (Access Control, Usage Limits and Accounting)
    explains RMS access controls, usage limits and accounting

Chapter 7 (RMS Scheduling)
    describes how RMS schedules parallel jobs

Chapter 8 (Event Handling)
    describes RMS event handling

Chapter 9 (Setting up RMS)
    explains how to set up RMS

Chapter 10 (The RMS Database)
    presents the structure of the tables in the RMS database

Appendix A (Compaq AlphaServer SC Interconnect Terms)
    defines terms relating to support for QsNet in RMS

Appendix B (RMS Status Values)
    lists the status values of RMS objects

Appendix C (RMS Kernel Module)
    describes the RMS kernel module and its system call interface

Appendix D (RMS Application Interface)
    describes the RMS application interface

Appendix E (Accounting Summary Script)
    contains an example script that produces accounting information

1.4 Related Information

The following manuals provide additional information about RMS from the point of view of either the system administrator or the user:

• Compaq AlphaServer SC User Guide
• Compaq AlphaServer SC System Administration Guide

1.5 Location of Online Documentation

Online documentation in HTML format is installed in the directory /usr/opt/rms/docs/html and can be accessed from a browser at http://rmshost:8081/html/index.html. PostScript and PDF versions of the documents are in /usr/opt/rms/docs. Please consult your system administrator if you have difficulty accessing the documentation. Online documentation can also be found on the AlphaServer SC System Software CD-ROM.

New versions of this and other Quadrics documentation can be found on the Quadrics website, http://www.quadrics.com.

Further information on AlphaServer SC can be found on the Compaq website, http://www.compaq.com/hpc.

1.6 Reader’s Comments

If you would like to comment on this or any other AlphaServer SC manual, please contact your local Compaq support centre.

1.7 Conventions

The following typographical conventions have been used in this document:

monospace type
    Monospace type denotes literal text. This is used for command descriptions, file names and examples of output.

bold monospace type
    Bold monospace type indicates text that the user enters when contrasted with on-screen computer output.

italic monospace type
    Italic (slanted) monospace type denotes meta text. It is used most often in command or parameter descriptions to show where a textual value is to be substituted.

italic type
    Italic (slanted) proportional type is used in the text to introduce new terms. It is also used when referring to labels on graphical elements such as buttons.

Ctrl/x
    This symbol indicates that you hold down the Ctrl key while you press another key or mouse button (shown here by x).

TLA
    Small capital letters indicate an abbreviation (see Glossary).

ls(1)
    A cross-reference to a reference page includes the appropriate section number in parentheses.

#
    A number sign represents the superuser prompt.

%, $
    A percent sign represents the C shell system prompt. A dollar sign represents the system prompt for the Bourne, Korn, and POSIX shells.

Overview of RMS

2.1 Introduction

This chapter describes the role of the Resource Management System (RMS). The RMS provides tools for the management and use of a Compaq AlphaServer SC system. To put the functions that RMS performs into context, Section 2.2 first gives a brief overview of the system architecture. Section 2.3 outlines the main functions of the RMS and introduces its major components: a set of UNIX daemons, a suite of command line utilities and a SQL database. Finally, Section 2.4 describes the resource management facilities from the system administrator’s point of view.

2.2 The System Architecture

An RMS system looks like a standard UNIX system: it has the familiar command shells, editors, compilers, linkers and libraries, and it runs the same applications. It differs from a conventional UNIX system in that it can run parallel applications as well as sequential ones. The processes that execute on the system, particularly the parallel programs, are controlled by the RMS.

2.2.1 Nodes

An RMS system comprises a network of computers (referred to as nodes) as shown in Figure 2.1. Each node may have a single processor or multiple processors (such as an SMP server); each node runs a single copy of UNIX. Nodes used interactively to log in to the RMS system are also connected to an external LAN. The application nodes, used for running parallel programs, are accessed through the RMS.

Figure 2.1: A Network of Nodes

[Diagram labels: QM-S16 Switch, Switch Network Control, Switch Network, Interactive Nodes with LAN/FDDI Interface, Terminal Concentrator, Application Nodes, Management Network]

All of the nodes are connected to a management network (normally, a 100BaseT Ethernet). They may also be connected to a Compaq AlphaServer SC Interconnect, to provide high-performance user-space communications between application processes.

The RMS processes that manage the system reside either on an interactive node or on a separate management server. This node, known as rmshost, holds the RMS database, which stores all state for the RMS system.

For high-availability installations, the rmshost node should be an interactive node rather than a management server. This allows you to configure the system for failover, as shown in Figure 2.2 (see Chapter 15 of the System Administration Guide for details).

Figure 2.2: High Availability RMS Configuration

[Diagram labels: RMS Host, Backup RMS Host, RMS Database]

The RMS processes run on the node with the name rmshost, which migrates to the backup node on failover. The database is held on a shared disk, accessible to both the primary and the backup node.

2.3 The Role of the RMS

The RMS provides a single-point interface to the system for resource management. This interface enables a system administrator to manage the system resources (CPUs, memory, disks, and so on) effectively and easily. The RMS includes facilities for the following administrative functions:

Monitoring
    controlling and monitoring the nodes in the network to ensure the correct operation of the hardware

Fault diagnosis
    diagnosing faults and isolating errors; instigating fault recovery and escalation procedures

Data collection
    recording statistics on system performance

Allocating CPUs
    allocating system resources to applications

Access control
    controlling user access to resources

Accounting
    providing a single point for collecting accounting data

Parallel jobs
    providing the system support required to run parallel programs

Scheduling
    deciding when and where to run parallel jobs

Audit
    maintaining an audit trail of system state changes

From the user’s point of view, RMS provides tools for:

Information
    querying the resources of the system

Execution
    loading and running parallel programs on a given set of resources

Monitoring
    monitoring the execution of parallel programs

2.3.1 The Structure of the RMS

RMS is implemented as a set of UNIX commands and daemons, programmed in C and C++, using sockets for communications. All of the details of the system (its configuration, its current state, usage statistics) are maintained in a SQL database, as shown in Figure 2.3. See Section 2.3.4 for an overview and Chapter 10 (The RMS Database) for details of the database.

2.3.2 The RMS Daemons

A set of daemons provides the services required for managing the resources of the system. To do this, the daemons both query and update the database (see Section 2.3.4).

• The Database Manager, msqld, provides SQL database services.
• The Machine Manager, mmanager, monitors the status of nodes in an RMS system.
• The Partition Manager, pmanager, controls the allocation of resources to users and the scheduling of parallel programs.
• The Switch Network Manager, swmgr, supervises the operation of the Compaq AlphaServer SC Interconnect, monitoring it for errors and collecting performance data.
• The Event Manager, eventmgr, runs handlers in response to system incidents and notifies clients who have registered an interest in them.
• The Transaction Log Manager, tlogmgr, instigates database transactions that have been requested in the Transaction Log. All client transactions are made through this mechanism. This ensures that changes to the database are serialized and an audit trail is kept.
• The Process Manager, rmsmhd, runs on each node in the system. It starts the other RMS daemons.

• The RMS Daemon, rmsd, runs on each node in the system. It loads and runs user processes and monitors resource usage and system performance.

The RMS daemons are described in more detail in Chapter 4 (RMS Daemons).
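
As a rough illustration of the daemon list above, the following shell sketch reports which of the named daemons do not appear in a process listing. The function name missing_rms_daemons is hypothetical and is not part of RMS. The sketch assumes that each daemon appears in the listing under the executable name given above. Only rmsmhd and rmsd are described as running on every node, so a daemon reported as missing on a particular node is not necessarily a fault.

```shell
#!/bin/sh
# Daemon names as listed in Section 2.3.2.
RMS_DAEMONS="msqld mmanager pmanager swmgr eventmgr tlogmgr rmsmhd rmsd"

# missing_rms_daemons (hypothetical helper, not an RMS command):
# reads one process name per line on stdin and prints each daemon
# from RMS_DAEMONS that does not appear in the listing.
missing_rms_daemons() {
    listing=$(cat)
    for d in $RMS_DAEMONS; do
        # -x matches whole lines only, so "rmsd" does not match "rmsmhd";
        # -F treats the name as a fixed string rather than a pattern.
        printf '%s\n' "$listing" | grep -qxF "$d" || printf '%s\n' "$d"
    done
}

# Possible use, assuming a ps that supports the POSIX -o option and
# prints bare command names for the "comm" field:
#   ps -e -o comm= | missing_rms_daemons
```

Reading the listing from standard input keeps the helper independent of any particular ps implementation, whose output format varies between UNIX systems.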

2.3.3 The RMS Commands

RMS commands call on the RMS daemons to get information about the system, to distribute work across the system, to monitor the state of programs and, in the case of administrators, to configure the system and back it up. A suite of these RMS client applications is supplied. There are commands for users and commands for system administrators.

The user commands for gaining access to the system and running parallel programs are as follows:

• allocate reserves resources for a user.
• prun loads and runs parallel programs.
• rinfo gets information about the resources in the system.
• rmsexec performs load balancing for the efficient execution of sequential programs.
• rmsquery queries the database. Administrators can also use rmsquery to update the database.

The system administration commands for managing the system are as follows:

• nodestatus gets and sets node status information.

• rcontrol starts, stops and reconﬁgures services.

• rmsbuild populates the RMS database with information on a given system.

• rmsctl starts and stops RMS and shows the system status.

• rmshost reports the name of the node hosting the RMS database.

• rmstbladm builds and maintains the database.

• msqladmin performs database server administration.

The services available to the different types of user (application programmer, operator,

system administrator) are subject to access control. Access control restrictions are

embedded in the SQL database, based on standard UNIX group IDs (see

Section 10.2.20). Users have read access to all tables but no write access. Operator and

administrative applications are granted limited write access. Password-protected

administrative applications and RMS itself have full read/write access.

The RMS commands are described in more detail in Chapter 5 (RMS Commands).

2.3.4 The RMS Database

The database provides a platform-independent interface to the RMS system. Users and

administrators can interact with the database using standard SQL queries. For example,

the following query displays details about the nodes in the machine. It selects ﬁelds

from the table called nodes (see Section 10.2.14). The query is submitted through the

RMS client rmsquery.

$ rmsquery "select name,status from nodes"

atlasms running

atlas0

atlas1

atlas2

atlas3

running

Figure 2.3: The Database (contents shown: Node Configuration, Network Configuration, Access Control, Resource Quotas, Accounting, Auditing, Usage Statistics, System State, Internal Support)

RMS uses the mSQL database engine from Hughes Technologies (for details see

http://www.Hughes.com.au). Client applications may use C, C++, Java, HTML or

UNIX script interfaces to generate SQL queries. See the Quadrics support page

http://www.quadrics.com/web/support for details of the SQL language.
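For scripted use, the output of rmsquery can be captured and split into fields. The following Python sketch assumes that rmsquery is on the PATH and prints one row per line with whitespace-separated fields, as the transcript above suggests. The helper names parse_rows and node_status are illustrative and are not part of RMS.

```python
import subprocess


def parse_rows(text, columns):
    """Split rmsquery-style output into dicts, one per non-empty line.

    Assumes whitespace-separated fields. The last column absorbs any
    remaining words so that multi-word values stay together.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.strip().split(None, len(columns) - 1)
        # Pad short rows so every dict has all requested columns.
        fields += [""] * (len(columns) - len(fields))
        rows.append(dict(zip(columns, fields)))
    return rows


def node_status():
    """Run the query shown above (requires an RMS installation)."""
    out = subprocess.run(
        ["rmsquery", "select name,status from nodes"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_rows(out, ["name", "status"])
```

Only parse_rows can be exercised without an RMS system; node_status simply wraps the documented query.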

2.4 RMS Management Functions

The RMS gives the system administrator control over how the resources of a system are

assigned to the tasks it must perform. This includes the allocation of resources

(Section 2.4.1), scheduling policies (Section 2.4.2), access controls and accounting

(Section 2.4.3) and system conﬁguration (Section 2.4.4).

2.4.1 Allocating Resources

The nodes in an RMS system can be conﬁgured into mutually exclusive sets known as

partitions as shown in Figure 2.4. The administrator can create partitions with different

mixes of resources to support a range of uses. For example, a system may have to cater

for a variety of processing loads, including the following:

• Interactive login sessions for conventional UNIX processes

• Parallel program development

• Production execution of parallel programs

• Distributed system services, such as database or ﬁle system servers, used by

conventional UNIX processes

• Sequential batch streams

Figure 2.4: Partitioning a System (partition labels shown: Parallel, Sequential batch)

The system administrator can allocate a partition with appropriate resources for each of

these tasks. Furthermore, the administrator can control who accesses the partitions (by

user or by project) and how much of the resource they can consume. This ensures that

resources intended for a particular purpose, for example, running production parallel

codes, are not diverted to other uses, for example, running user shells.

A further partition, the root partition, is always present. It includes all nodes. It does

not have a scheduler. The root partition can only be used by administrative users (root

and rms by default).
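The two structural rules in this section (partitions are mutually exclusive, and the root partition spans every node) can be expressed as a small check. This is a minimal Python sketch over plain sets; the function names and the node and partition names in the usage note are illustrative only and do not reflect how RMS stores partitions.

```python
def find_overlaps(partitions):
    """Return the sorted names of nodes that appear in more than one partition.

    partitions maps a partition name to the set of node names it contains.
    An empty result means the partitions are mutually exclusive.
    """
    owner = {}
    overlaps = set()
    for name, nodes in partitions.items():
        for node in nodes:
            if node in owner and owner[node] != name:
                overlaps.add(node)
            owner.setdefault(node, name)
    return sorted(overlaps)


def root_partition(all_nodes):
    """The root partition always exists and includes all nodes."""
    return set(all_nodes)
```

For example, with nodes atlas0 to atlas3 split into a two-node login partition and a two-node parallel partition, find_overlaps returns an empty list, while root_partition returns all four nodes.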

2.4.2 Scheduling

Partitions enable different scheduling policies to be put into action. On each partition,

one or more of three scheduling policies can be deployed to suit the intended usage:

1. Gang scheduling of parallel programs, where all processes in a program are

scheduled and de-scheduled together. This is the default scheduling policy for parallel

partitions.

2. Regular UNIX scheduling with the addition of load balancing, whereby the user can

run a sequential program on a lightly loaded node. The load may be judged in terms

of free CPU time, free memory or number of users.

3. Batch scheduling, where the use of resources is controlled by a batch system.

Scheduling parameters such as time limits, time slice interval and minimum request

size are applied on an individual partition basis. Default priorities, memory limits and

CPU usage limits can be applied to users or projects to tune the partition’s workload. For

details see Chapter 6 (Access Control, Usage Limits and Accounting) and

Chapter 7 (RMS Scheduling).

The partition shown in Figure 2.5 has its CPUs allocated to ﬁve parallel jobs. The jobs

have been allocated CPUs in two different ways: jobs 1 and 2 use all of the CPUs on each

node; jobs 3, 4 and 5 are running with only one or two CPUs per node. RMS allows the

user to specify how their job will be laid out, trading off the competing beneﬁts of

increased locality on the one hand against increased total memory size on the other.

With this allocation of resources, all ﬁve parallel programs can run concurrently on the

partition.

Figure 2.5: Distribution of Processes (16 Nodes by 4 CPUs; regions shown: Job 1, Job 2, Job 3, Job 4, Job 5)

The RMS scheduler allocates contiguous ranges of nodes with a given number of CPUs

per node ¹. Where possible each resource request is met by allocating a single range of

nodes. If this is not possible, an unconstrained request (one that specifies only the

number of CPUs required) may be satisfied by allocating CPUs on disjoint nodes. This

ensures that an unconstrained resource request can utilize all of the available CPUs.
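The allocation rule described above can be illustrated with a toy model. The Python sketch below is not the RMS scheduler: it assumes the partition is a simple list of free CPU counts per node, and the order in which it tries different CPUs-per-node values for an unconstrained request is an assumption made for illustration.

```python
def contiguous_range(free, nnodes, per_node):
    """Find the first run of nnodes adjacent nodes with per_node free CPUs each."""
    for start in range(len(free) - nnodes + 1):
        if all(f >= per_node for f in free[start:start + nnodes]):
            return [(start + i, per_node) for i in range(nnodes)]
    return None


def allocate(free, ncpus, per_node=None):
    """Return a list of (node_index, cpus) pairs, or None if the request must wait.

    free[i] is the number of free CPUs on node i.
    """
    if per_node is not None:
        # Constrained request: only a single contiguous range will do.
        if ncpus % per_node:
            raise ValueError("ncpus must be a multiple of per_node")
        return contiguous_range(free, ncpus // per_node, per_node)
    # Unconstrained request: prefer a single range, widest layout first.
    for width in range(max(free, default=0), 0, -1):
        if ncpus % width == 0:
            found = contiguous_range(free, ncpus // width, width)
            if found:
                return found
    # Fall back to free CPUs on disjoint nodes.
    picked, needed = [], ncpus
    for node, f in enumerate(free):
        take = min(f, needed)
        if take:
            picked.append((node, take))
            needed -= take
        if needed == 0:
            return picked
    return None
```

A result of None corresponds to the request blocking until CPUs become available.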

The scheduler attempts to ﬁnd free CPUs for each request. If this is not possible, the

request blocks until CPUs are available. RMS preempts programs when a higher priority

job is submitted, as shown in Figure 2.6. Initially, CPUs have been allocated for resource

requests 1 and 2. When the higher priority resource request 3 is submitted, 1 and 2 are

suspended; 3 runs to completion after which 1 and 2 are restarted.

Figure 2.6: Preemption of Low Priority Jobs (sequence shown: start jobs for Resource 1 and Resource 2; suspend jobs; start job for Resource 3; job ends; resume jobs for Resource 1 and Resource 2)
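The preemption sequence in Figure 2.6 can be reproduced with a simplified model in which requests are served in priority order and anything that no longer fits is suspended. This Python sketch is an illustration only; the actual policies are described in Chapter 7 (RMS Scheduling), and the tuple layout and function name are assumptions.

```python
def schedule(requests, total_cpus):
    """Decide which requests run: highest priority first, then submission order.

    requests is a list of (name, priority, cpus) tuples in submission order.
    Returns (running, suspended) as two lists of names.
    """
    order = sorted(range(len(requests)), key=lambda i: (-requests[i][1], i))
    free, running, suspended = total_cpus, [], []
    for i in order:
        name, _priority, cpus = requests[i]
        if cpus <= free:
            running.append(name)
            free -= cpus
        else:
            suspended.append(name)
    return running, suspended
```

Calling schedule again after the high-priority request has been removed from the list models the point at which the suspended requests are resumed.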

2.4.3 Access Control and Accounting

Users are allocated resources on a per-partition basis. Resources in this context include

both CPUs and memory. The system administrator can control access to resources both

at the individual user level and at the project level (where a project is a list of users).

This means that default access controls can be set up at the project level and overridden

on an individual user basis as required. The access controls mechanism is described in
detail in Chapter 6 (Access Control, Usage Limits and Accounting).

¹The scheduler allocates contiguous ranges of nodes so that processes may take advantage of the Compaq
AlphaServer SC Interconnect hardware support for broadcast and barrier operations which operate over a
contiguous range of network addresses.

Each partition, except the root partition, is managed by a Partition Manager (see

Section 4.4), which mediates user requests, checking access permissions and usage

limits before scheduling CPUs and starting user jobs.

An accounting record is created as CPUs are allocated to each request. It is updated

periodically until the resources are freed. The accounting record itemizes CPU and

memory usage, indexed by job, by user and by project.
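The rule that project-level defaults can be overridden for individual users amounts to a two-level lookup. The Python sketch below assumes limits are held in plain dictionaries; in RMS they live in the database tables described in Chapter 10, and the function names here are hypothetical.

```python
def effective_limit(user, project, project_limits, user_limits):
    """Return the CPU limit that applies to user when working in project.

    A per-user entry overrides the project default. None means that no
    limit was configured at either level.
    """
    if user in user_limits:
        return user_limits[user]
    return project_limits.get(project)


def may_allocate(requested, in_use, limit):
    """Check a request against a limit before it is scheduled."""
    return limit is None or in_use + requested <= limit
```

A check such as may_allocate corresponds to the step the Partition Manager performs before scheduling CPUs for a request.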

2.4.4 RMS Conﬁguration

The set of partitions active at any time is known as a conﬁguration. A system will

normally have a number of conﬁgurations, each appropriate to a particular operating

pattern. For example, there may be one conﬁguration for normal working hours and

another for night time and weekend operation.

The CPUs allocated to a partition may vary between conﬁgurations. For example, a login

partition (nodes allocated for interactive use) may have more nodes allocated during

working hours than at night – it may even be absent from the night time conﬁguration.

A pair of configurations is shown in Figure 2.7.

Figure 2.7: Two Configurations (16 nodes, 4 CPUs per node; Day: Parallel, Development; Night: Parallel)

RMS supports automated reconﬁguration at shift changes as well as dynamic

reconﬁguration in response to a request from an operator or administrator. The RMS

client rcontrol (Page 5-20) manages the switch-over from one conﬁguration to another.

For automatic reconﬁguration, rcontrol can be invoked from a cron job.

Parallel Programs Under RMS

3.1 Introduction

RMS provides users with tools for running parallel programs and monitoring their

execution, as described in Chapter 5 (RMS Commands). Users can determine what

resources are available to them and request allocation of the CPUs and memory required

to run their programs. This chapter describes the structure of parallel programs under

RMS and how they are run.

A parallel program consists of a controlling process, prun, and a number of application

processes distributed over one or more nodes. Each process may have multiple threads

running on one or more CPUs. prun can run on any node in the system but it normally

runs in a login partition or on an interactive node.

In a system with SMP nodes, RMS can allocate CPUs so as to use all of the CPUs on the

minimum number of nodes (a block distribution); alternatively, it can allocate a speciﬁed

number of CPUs on each node (a cyclic distribution). This ﬂexibility allows users to

choose between the competing beneﬁts of increased CPU count and memory size on each

node (generally good for multithreaded applications) and increased numbers of nodes

(generally best for applications requiring increased total memory size, memory

bandwidth and I/O bandwidth).
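The difference between the two distributions is easiest to see as a count of CPUs used on each node. The following Python sketch computes both layouts for a given request; the function names are illustrative and the model ignores which nodes are actually free.

```python
def block_layout(ncpus, cpus_per_node):
    """Block distribution: use all CPUs on the minimum number of nodes."""
    full, rest = divmod(ncpus, cpus_per_node)
    return [cpus_per_node] * full + ([rest] if rest else [])


def cyclic_layout(ncpus, per_node):
    """Cyclic distribution: take a specified number of CPUs on each node."""
    if ncpus % per_node:
        raise ValueError("ncpus must be a multiple of per_node")
    return [per_node] * (ncpus // per_node)
```

For an 8-CPU request on 4-CPU nodes, the block layout uses two nodes, whereas a cyclic layout with one CPU per node spreads the same request over eight nodes.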

Parallel programs can be written so that they will run with varying numbers of CPUs

and varying numbers of CPUs per node. They can, for example, query the number of

processors allocated and determine their data distributions and communications

patterns accordingly (see Appendix C (RMS Kernel Module) for details).

3.2 Resource Requests

Having logged into the system, a user makes a request for the resources needed to run a

parallel program by using the RMS commands prun (see Page 5-11) or allocate (see

Page 5-3). When using the prun command, the request can specify details such as the

following:

• The partition on which to run the program (the -p option)

• The number of processes to run (the -n option)

• The number of nodes required (the -N option)

• The number of CPUs required per process (the -c option)

• The memory required per process (the RMS_MEMLIMIT environment variable)

• The distribution of processes over the nodes (the -m, -B and -R options)

• How standard input, output and error streams should be handled (the -i, -o and -e

options)

• The project to which the program belongs for accounting and scheduling purposes

(the -P option)
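When prun is launched from a script, the options listed above can be assembled programmatically. The Python sketch below builds an argument list and environment from a subset of those options. It assumes that each option takes its value as a separate argument and that RMS_MEMLIMIT is passed as a plain string; see Page 5-11 for the exact syntax. The function and parameter names are hypothetical.

```python
def prun_command(program, partition=None, nprocs=None, nnodes=None,
                 cpus_per_proc=None, project=None, memlimit=None):
    """Build (argv, env) for a prun invocation without running it."""
    argv = ["prun"]
    options = (
        ("-p", partition),      # partition on which to run
        ("-n", nprocs),         # number of processes
        ("-N", nnodes),         # number of nodes
        ("-c", cpus_per_proc),  # CPUs per process
        ("-P", project),        # project for accounting and scheduling
    )
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    argv.append(program)
    # Memory per process is requested through the environment.
    env = {} if memlimit is None else {"RMS_MEMLIMIT": str(memlimit)}
    return argv, env
```

The returned list and dictionary could then be handed to a process launcher such as subprocess.run, merging env into the existing environment.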

Two variants of a program with eight processes are shown in Figure 3.1: ﬁrst, with one

process per node; and then, with two processes per node.

Figure 3.1: Distribution of Parallel Processes (panels: 1 Process Per Node; 2 Processes Per Node)

The resource request is sent to the Partition Manager, pmanager (described in

Section 4.4). The Partition Manager performs access checks (described in

Chapter 6 (Access Control, Usage Limits and Accounting)) and then allocates CPUs

according to the policies established for the partition (see Chapter 7 (RMS Scheduling)).

RMS makes a distinction between allocating resources and starting jobs on them. Before

the Partition Manager schedules a parallel program, it will ensure that the required

CPUs and memory are allocated. Note that this may cause requests to block for longer

than you might expect – especially when the job has not speciﬁed how much memory it

requires. Once CPUs have been allocated, jobs can be started on them immediately.

3.3 Loading and Running Programs

A simple parallel program is shown in Figure 3.2. It has eight application processes,

distributed over four nodes, two processes per node.

Figure 3.2: Loading and Running a Parallel Program (components shown: Partition Manager, rmsd, rmsloader, stdio, prun; Four Nodes in a Parallel Partition; RMS Node)

Once the CPUs have been allocated, prun asks the pmanager to start the application

processes on the allocated CPUs. The pmanager does this by instructing the daemons

running on each of the allocated nodes to start the loader process rmsloader on the

user’s behalf.

The rmsloader process starts the application processes executing, forwarding their

stdout and stderr streams to prun (unless otherwise directed). Meanwhile, prun

supplies information on the application processes as requested by rmsloader and

forwards stdout and stderr to the controlling terminal or output ﬁles.

prun forwards stdin and certain signals (QUIT, USR1, USR2, WINCH) to the application

processes. If prun is killed, RMS cleans up the parallel program, killing the application

processes, removing any core ﬁles if requested (see Page 5-11) and then deallocating the

CPUs.

The application processes are run from the user’s current working directory with the

current limits and group rights. The data and stack size limits may be reduced if RMS

has applied a memory limit to the program.

During execution, the processes may be suspended at any time by the scheduler to allow

a program with higher priority to run. All of the processes in a parallel program are

suspended together under the gang-scheduling policy used by RMS for parallel programs

(see Chapter 7 (RMS Scheduling) for details). They are restarted together when the

higher priority program has completed.

A parallel program exits when all of its processes have exited. When this happens, the

rmsloader processes reduce the exit status back to the controlling process by

performing a global OR of the exit status of each of the processes. If prun is run with

verbose reporting enabled, a non-zero exit status is accompanied by a message, as shown

in the following example:

$ prun -v myprog

...

myprog: process 0 exited with status 1
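The global OR described above is a bitwise reduction over the per-process exit statuses. This minimal Python sketch shows the arithmetic only; it does not model how the rmsloader processes exchange the values, and the function name is illustrative.

```python
from functools import reduce
from operator import or_


def combined_exit_status(statuses):
    """Bitwise OR of all process exit statuses; 0 only if every process returned 0."""
    return reduce(or_, statuses, 0)
```

One consequence of using OR is that the combined value is non-zero whenever any process fails, but it need not equal the status of any single process: statuses 1 and 2 combine to 3.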

If the level of reporting is increased with the -vv option, prun provides a commentary

on the resource request. With the -vvv option, rmsloader also outputs information

identifying the activity on each node running the program, as shown in the following

example.

$ prun -vvv myprog

prun: running /home/duncan/myprog

prun: requesting 2 CPUs

prun: starting 2 processes on 2 cpus default memlimit no timelimit

prun: stdio server running

prun: loader 1 starting on atlas1 (10.128.0.7)

prun: loader 0 starting on atlas0 (10.128.0.8)

loader[atlas1]: program description complete

loader[atlas1]: nodes 2 contexts 1 capability type 0xffff8002 entries 2

loader[atlas1]: run process 1 node=5 cntx=244

prun: process 1 is pid 1265674 on atlas1

loader[atlas0]: program description complete

loader[atlas0]: nodes 2 contexts 1 capability type 0xffff8002 entries 2

loader[atlas0]: run process 0 node=4 cntx=244

prun: process 0 is pid 525636 on atlas0

...

When the program has exited, the CPUs are deallocated and the scheduler is called to

service the queue of waiting jobs.

Sometimes, it is desirable for a user to be granted more control over the use of a

resource. For instance, the user may want to run several jobs concurrently or use the

same nodes for a sequence of jobs. This functionality is supported by the command

allocate (see Page 5-3) which allows a user to allocate CPUs in a parallel partition to a

UNIX shell. These CPUs are used for subsequent parallel jobs started from this shell.

The CPUs remain allocated until the shell exits or a time limit expires (see Section 7.3

and Section 7.4.5).

RMS Daemons

4.1 Introduction

This chapter describes the role of the RMS daemons. There are daemons that run on the

rmshost node providing services for the system as a whole:

msqld
Manages the database (see Section 4.2).

mmanager
Monitors the health of the machine as a whole (see Section 4.3).

pmanager
Controls the use of resources (see Section 4.4).

swmgr
Monitors the health of the Compaq AlphaServer SC Interconnect (see Section 4.5).

tlogmgr
Carries out transactions on behalf of RMS servers (see Section 4.6).

eventmgr
Provides a system-wide event-handling service (see Section 4.7).

There are daemons that run on each node, providing support for RMS functionality on

that node:

rmsmhd
Acts as the Process Manager, starting all of the other RMS daemons (see Section 4.8).

rmsd
Carries out instructions from pmanager to run users' programs (see Section 4.9).

4.1.1 Startup

RMS is started as each node executes the initialization script /sbin/init.d/rms with

the start argument on startup. This starts the rmsmhd daemon which, in turn, starts

the other daemons on that node.

The daemons can also be started, stopped and reloaded individually by rcontrol once

RMS is running. See Page 5-20 for details.

4.1.2 Log Files

Output from the management daemons is logged to the directory /var/rms/adm/log.

The log ﬁles are called daemon.log, where daemon gives the name of the RMS daemon,

such as swmgr. The Partition Managers are distinguished by sufﬁxing pmanager with a

hyphen and then the name of the partition. For example, the Partition Manager for the

partition par1 is known as pmanager-par1.

Errors are logged to /var/rms/adm/log/error.log.

Output from rmsmhd and rmsd is logged to /tmp/rms.log on each node.
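The naming convention for the management daemon logs can be captured in a small helper. The Python sketch below assumes that a Partition Manager's log file follows the same daemon.log pattern using its suffixed name (for example pmanager-par1.log); the text above gives the suffixed daemon name but does not spell out the file name, so treat that part as an assumption. The function name is hypothetical.

```python
import posixpath

LOG_DIR = "/var/rms/adm/log"


def daemon_log(daemon, partition=None):
    """Return the expected log file path for a management daemon.

    For a Partition Manager, pass the partition name so that the daemon
    name becomes pmanager-<partition>.
    """
    name = f"{daemon}-{partition}" if partition else daemon
    return posixpath.join(LOG_DIR, name + ".log")
```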

4.1.3 Daemon Status

The servers table contains information on the status of each daemon: the time it was

started, its process ID and the name of its host node (see Section 10.2.19 for details of

the table structure).

Note that the status ﬁeld in the servers table is set to error if an error occurs when

starting an RMS daemon. The corresponding entry in the events table describes what

went wrong (see Chapter 8 (Event Handling) for details).

The command rinfo can be used to get reports on the status of each daemon. See

Page 5-32 for details.

4.2 The Database Manager

The Database Manager, msqld, manages the RMS database, providing an SQL interface

for its clients. Client applications may use C, C++, Java or UNIX scripts to generate SQL

queries for msqld.

The database holds all state information for RMS. This information is initially created

by the RMS client application rmsbuild (see Page 5-35). The information is updated by

the other RMS daemons as RMS operates. The information can be backed up, restored

and generally maintained using the database administration program, rmstbladm (see

Page 5-44).

4.3 The Machine Manager

The Machine Manager, mmanager, is responsible for detecting and reporting changes in

the state of each node in the system. It records the current state of each node and any

changes in state in the database.

When a node is functioning correctly, rmsd, a daemon which runs on each node,

periodically updates the database. However, if the node crashes, or IP trafﬁc to and from

the node stops, then these updates stop. RMS uses the external monitor, mmanager, to

periodically check the service level of each node. It monitors whether IP is functioning

and whether the RMS daemons on each node are operating.

4.3.1 Interaction with the Database

The Machine Manager records the current status of nodes in the nodes table (see

Section 10.2.14) while changes to node status are entered in the events table (see

Section 10.2.6).

The interval at which the Machine Manager performs status checks is set in the

attributes table (see Section 10.2.3) with the node-status-poll-interval

attribute. If this attribute is not present, the general attribute rms-poll-interval is

used instead.
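The fallback between the two attributes is a simple ordered lookup. In the Python sketch below the attributes table is represented as a dictionary of names to values; how the values are read from the database, and their units, are outside the scope of this illustration.

```python
def node_poll_interval(attributes):
    """Return the Machine Manager's status check interval.

    Prefers node-status-poll-interval and falls back to the general
    rms-poll-interval. Returns None if neither attribute is present.
    """
    for key in ("node-status-poll-interval", "rms-poll-interval"):
        if key in attributes:
            return attributes[key]
    return None
```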

4.4 The Partition Manager

The nodes in the RMS machine are conﬁgured into mutually exclusive sets known as

partitions (see Section 2.4). By restricting access to partitions, the system administrator

can reserve particular partitions for speciﬁc types of tasks or users. In this way, the

system administrator can ensure that resources are used most effectively; for example,

that resources intended for running parallel programs are not consumed running user

shells. The access restrictions are set up in the access_controls table (see

Section 10.2.1) of the RMS database.

Each partition is controlled by a Partition Manager, pmanager. The Partition Manager

mediates each user’s requests for resources (CPUs and memory) to run jobs in the

partition. It checks the user’s access permissions and resource limits before adding the

request to its scheduling queue. The request blocks until the resources are allocated for

the job.

When the resources requested by the user become available, the Partition Manager

instructs rmsd, a daemon that runs on each node in the partition (see Section 4.9), to

create a communications context for the user’s job. Finally, the Partition Manager

replies to the user’s request and the user’s job starts.

The Partition Manager makes new scheduling decisions periodically and in response to

incoming resource requests (see Chapter 7 (RMS Scheduling) for details). These

decisions may result in jobs being suspended or resumed. Such scheduling operations,

together with those performed as jobs are killed, are performed by the Partition

Manager sending scheduling or signal delivery requests to the rmsds.

The Partition Manager is connected to its rmsds by a tree of sockets. Commands are

routed down this tree; they complete when an acknowledgement is returned. For

example, jobs are only marked as ﬁnished when the Partition Manager has conﬁrmed

that all of their processes have exited.

If the tree of sockets is broken by a node crash, the Partition Manager marks the node’s

partition as blocked and generates an event. The node can then be rebooted or

conﬁgured out of the machine. If the node is rebooted, the rmsds reconnect and the

Partition Manager continues as before. If the node cannot be rebooted then the partition

must be halted, the node conﬁgured out and the partition restarted. Jobs that spanned

the failing node are cleaned up at this point. The other jobs run on unless explicitly

killed. Scheduling and signal delivery operations are suspended while the partition is

blocked.

4.4.1 Partition Startup

The Partition Manager is started by the rmsmhd daemon, running on the rmshost node,

on instruction from rcontrol (see Page 5-20). Once the partition is running, a startup

script /opt/rms/etc/pstartup is executed. This script performs site-speciﬁc and

OS-speciﬁc actions depending upon the partition type.

4.4.2 Interaction with the Database

The Partition Manager makes updates to the partitions table (see Section 10.2.16)

when it starts and as CPUs are allocated and freed.

The Partition Manager creates an entry in the resources table (see Section 10.2.18)

each time a user makes a request for resources to run a job. This entry is updated each

time CPUs are allocated or deallocated. The Partition Manager adds an entry to the

jobs table (see Section 10.2.10) as each job starts, updating it if the job is suspended or

resumed and when the job completes.

The Partition Manager creates an entry in the accounting statistics (acctstats) table

(see Section 10.2.2) when CPUs are allocated. The entry is updated periodically until the

request completes.

The Partition Manager consults the users table (see Section 10.2.24), the projects

table (see Section 10.2.17) and the access_controls table (see Section 10.2.1) to verify

users’ access permissions and usage limits.

Conﬁguration information about each partition is held in the partitions table (see

Section 10.2.16). The information is indexed by the name of the partition together with

the name of the active conﬁguration.

4.5 The Switch Network Manager

The Switch Network Manager, swmgr, controls and monitors the Compaq AlphaServer

SC Interconnect (see Appendix A (Compaq AlphaServer SC Interconnect Terms)). It does

this using the switch network control interface connected to the parallel port of the

primary management node. If swmgr detects an error in the switch network, it updates

the status of the switch concerned and generates an event.

swmgr collects fan, power supply and temperature data from the Compaq AlphaServer

SC Interconnect modules, updating status information and generating events if

components fail or temperatures exceed their operating limits. See Section 9.5.4 for

site-speciﬁc details of conﬁguring the swmgr.

4.5.1 Interaction with the Database

The Switch Network Manager creates and maintains the entries in the elites table

(see Section 10.2.5) and the switch_boards table (see Section 10.2.22). It maintains

entries in the elans table (see Section 10.2.4). In the event of errors, it creates entries

in the link_errors table (see Section 10.2.11).

4.6 The Transaction Log Manager

The Transaction Log Manager, tlogmgr, executes change of state requests that have

been entered in the transactions table (see Section 10.2.23) by RMS administrative

clients. This mechanism is employed to serialize changes to the database and to provide

an audit trail of such changes.

The entry in the transactions table records who requested the change, and names the

service required together with any arguments to pass to the process on startup. A

transaction handle (a unique ID) is generated for the entry and passed to both the client

and the RMS daemon that provides the service.

The RMS daemon uses the transaction handle to label any results it produces, such as an

entry in the transaction_outputs table (see Section 10.1.3). The client uses the

handle to select the result from the relevant table. Output from the service is appended

to an output log. The name of this log is entered in the transactions table together

with the status of the transaction.

The services that are available are listed in the services table (see Section 10.2.20).

Each entry in the services table speciﬁes which command to run, who can run it and

on which host.
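The transaction mechanism described in this section (requests queued with a unique handle, executed one at a time, results labelled by handle) can be modelled in a few lines. The Python sketch below is an in-memory illustration under stated assumptions: the class name, the status strings and the use of Python callables as services are all hypothetical, whereas in RMS the entries live in the transactions and services tables.

```python
import itertools
from collections import deque


class TransactionLog:
    """Toy model of a serialized transaction log with an audit trail."""

    def __init__(self, services):
        self.services = services           # service name -> callable
        self.pending = deque()             # requests awaiting execution
        self.audit = []                    # (handle, user, service, args)
        self.status = {}                   # handle -> status string
        self.outputs = {}                  # handle -> result or error text
        self._handles = itertools.count(1)

    def submit(self, user, service, args=()):
        """Record a request and return its unique transaction handle."""
        handle = next(self._handles)
        entry = (handle, user, service, tuple(args))
        self.pending.append(entry)
        self.audit.append(entry)
        self.status[handle] = "queued"
        return handle

    def run_pending(self):
        """Execute queued requests strictly in submission order."""
        while self.pending:
            handle, _user, service, args = self.pending.popleft()
            try:
                self.outputs[handle] = self.services[service](*args)
                self.status[handle] = "complete"
            except Exception as exc:
                self.outputs[handle] = str(exc)
                self.status[handle] = "error"
```

A client keeps the handle returned by submit and uses it later to look up the status and output, which mirrors how the handle is used to select results from the relevant table.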

4.6.1 Interaction with the Database

The Transaction Log Manager maintains the transactions table (see Section 10.2.23).

It consults the services table (see Section 10.2.20) in order to execute transactions on

behalf of its clients.

4.7 The Event Manager

When an RMS daemon detects an anomaly (such as a node crash or a high temperature

reading), it writes an event description to the events table (see Section 10.2.6). It is the

job of the Event Manager, eventmgr, to execute recovery scripts that either correct the

fault or report it to the operators if manual intervention is required.

On receiving an event notiﬁcation, the Event Manager looks for a matching entry in the

event_handlers table (see Section 10.2.7), executing the handler script if it ﬁnds a

match (see Section 8.2 for details). If no match is found, it runs the default event

handler script; this script is site-speciﬁc, but it would typically run a command to

escalate the event through SNMP or email.
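The handler lookup described above is a first-match search with a default. The Python sketch below assumes that an event and a handler's matching criteria can both be represented as dictionaries of field names to values; the field names and script names used in any call are placeholders, and the real matching rules are given in Section 8.2.

```python
def pick_handler(event, handlers, default_script):
    """Return the script for the first handler whose criteria all match.

    handlers is a list of (criteria, script) pairs, where criteria maps
    event field names to required values. Falls back to default_script.
    """
    for criteria, script in handlers:
        if all(event.get(field) == value for field, value in criteria.items()):
            return script
    return default_script
```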

The Event Manager also implements the event-waiting mechanism that enables client

applications both to generate and to wait efﬁciently on a speciﬁed event. Typical events

include the following:

• Nodes changing state

• Partitions starting

• Transaction log entries being executed

The details that describe the event are held in the events table (see Section 10.2.6).

The Event Manager’s job is to notify interested clients that the event has occurred. This

frees the clients from having to poll for the information. For more information on RMS

event handling, see Chapter 8 (Event Handling).

4.7.1 Interaction with the Database

The Event Manager consults the events table (see Section 10.2.6) and the

event_handlers table (see Section 10.2.7).

4.8 The Process Manager

The Process Manager, rmsmhd, is responsible for starting and stopping the other RMS

daemons. It runs on each node and is responsible for managing the other daemons that

run on its node. It starts them as the node boots, stops them as the node halts and starts

or stops them in response to requests from the RMS client application rcontrol (see

Page 5-20).

4.8.1 Interaction with the Database

RMS stores information regarding which daemons run on which nodes; this information

is stored centrally in the RMS database, rather than in node-speciﬁc conﬁguration ﬁles.

On startup, the Process Manager checks the servers table (see Section 10.2.19) for

entries matching its node. This information is used to start the other daemons. If its

child processes (the other daemons) are killed, it checks the table to see whether they

should be restarted. The Process Manager creates its own entry in the servers table.

4.9 The RMS Daemon

The RMS daemon rmsd runs on each node in the machine. Its purpose is as follows:

• To start application processes

• To implement scheduling decisions made by the Partition Manager

• To clean up after parallel programs when they have ﬁnished

• To execute RMS remote procedure calls on behalf of clients elsewhere in the network

• To collect accounting data and performance statistics

rmsd carries out the following tasks on behalf of the Partition Manager to run a user’s

parallel program:

• Creating and destroying communication contexts (see Section C.2)

• Starting the application loader, rmsloader

• Delivering signals

• Suspending and resuming processes

• Collecting accounting data from the kernel

The rmsds communicate with each other and with the Partition Manager that controls

their node over a balanced tree of sockets. Requests (for example, to deliver a signal to

all processes in a parallel program) are passed down this tree to the appropriate range of

nodes. The results of each request are combined as they pass back up the tree.
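The way a request fans out over a range of nodes and its results are combined on the way back can be modelled as below. This is a sketch under stated assumptions: the manual says only that the tree is balanced, so the fanout parameter, the choice of the first node in a range as its root, and the function name deliver are all illustrative, and no sockets are involved.

```python
def deliver(lo, hi, action, combine, fanout=2):
    """Apply `action` to nodes lo..hi-1 through a balanced tree.

    The node at `lo` handles the request itself, splits the rest of the
    range into at most `fanout` sub-ranges, forwards the request to the
    first node of each sub-range, and combines the results that come back.
    """
    result = action(lo)
    rest = list(range(lo + 1, hi))
    if not rest:
        return result
    size = -(-len(rest) // fanout)  # ceiling division
    for start in range(0, len(rest), size):
        chunk = rest[start:start + size]
        child = deliver(chunk[0], chunk[-1] + 1, action, combine, fanout)
        result = combine(result, child)
    return result


# Count how many of 16 nodes handled a (simulated) signal delivery.
print(deliver(0, 16, lambda node: 1, lambda a, b: a + b))
```

Because each level combines its children's results before replying, the caller at the root receives a single value regardless of how many nodes took part.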

rmsd is started by the Process Manager, rmsmhd, and is restarted whenever it exits, which happens when a partition is shut down.

4.9.1 Interaction with the Database

rmsd records conﬁguration information about each node (number of CPUs, amount of

memory and so on) in the nodes table (see Section 10.2.14) as it starts. It periodically

records usage statistics in the node statistics (node_stats) table (see Section 10.2.15).

The interval at which these statistics are sampled is set in the attributes table with

the cpu-stats-poll-interval attribute.

rmsd records details of the node’s Compaq AlphaServer SC Interconnect conﬁguration in

the elans table as it starts (see Section 10.2.4 and

Appendix A (Compaq AlphaServer SC Interconnect Terms)).

RMS Commands

5.1 Introduction

This chapter describes the RMS commands. RMS includes utilities that enable system

administrators to conﬁgure and manage the system, in addition to those that enable

users to run their programs.

RMS includes the following commands intended for use by system administrators:

rcontrol

The rcontrol command is used to control the system resources.

rmsbuild

The rmsbuild command creates and populates an RMS database for a given machine.

rmsctl

The rmsctl script is used to stop and start the RMS system and to report its status.

rmsquery

The rmsquery command is used to select data from the database and, in the case of system administrators, to update it.

rmstbladm

The table administration rmstbladm program is used to create a database, to back it up and to restore it.

The following utilities are used internally by RMS and may also be used by system

administrators:

nodestatus

The nodestatus command is used to get or set the status or run

level of a node.

rmshost

The rmshost command reports the name of the node running the

RMS management daemons.

msqladmin

The msqladmin command is used for creating and deleting databases

and stopping the mSQL server.

RMS includes the following commands for all users of the system:

allocate

The allocate command is used to reserve access to a set of CPUs

either for running multiple tasks in parallel or for running a sequence

of commands on the same CPUs.

prun

The prun command is used to run a parallel program or to run

multiple copies of a sequential program.

rinfo

The rinfo command is used to determine what resources are available and which jobs are running.

rmsexec

The rmsexec command is used to run a sequential program on a lightly loaded node.

The following sections describe the commands in more detail, listing them in

alphabetical order.

allocate(1)

NAME

allocate – Reserves access to CPUs

SYNOPSIS

allocate [-hIv] [-B base] [-C CPUs] [-N nodes | all] [-n CPUs]

[-p partition] [-P project] [-R request]

[script [args ...]]

OPTIONS

-B base

Speciﬁes the number of the base node (the ﬁrst node to use) in the

partition. Numbering within the partition starts at 0. By default, the

base node is unassigned, leaving the scheduler free to select nodes

that are not in use.

-C CPUs

Specifies the number of CPUs required per node (default 1).

-h

Display the list of options.

-I

Allocate CPUs immediately or fail. By default, allocate blocks until

resources become available.

-N nodes | all

Speciﬁes the number of nodes to allocate (default 1). To allocate one

CPU on each node in the partition, use the argument all as follows:

allocate -N all. Either the -C option or the -n option can be

combined with -N but not both.

-n CPUs

Speciﬁes the total number of CPUs required.

-P project

Speciﬁes the name of the project with which the job should be

associated for scheduling and accounting purposes.

-p partition

Specifies the target partition from which the resources are to be allocated.

-R request

Requests a particular conﬁguration of resources. The types of

request currently supported are as follows:

immediate=0 | 1

With a value of 1, this speciﬁes that the request

should fail if it cannot be met immediately (this is

the same as the -I option).

hwbcast=0 | 1

With a value of 1, this specifies a contiguous range of nodes and constrains the scheduler to queue the request until a contiguous range becomes available.

rails=n

In a multirail system, this speciﬁes the number of

rails required, where 1 ≤ n ≤ 32.

Multiple requests can be entered as a comma-separated list, for

example, -R hwbcast=1,immediate=1.

-v

Speciﬁes verbose operation.
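The syntax of the -R argument described above (a comma-separated list of name=value pairs) can be checked with a small parser such as the following. This is a sketch for illustration, not the parser used by allocate; the function name parse_request and the choice to raise ValueError are assumptions.

```python
def parse_request(spec):
    """Parse a -R argument such as 'hwbcast=1,immediate=1' into a dict."""
    allowed = {
        "immediate": range(0, 2),  # 0 | 1
        "hwbcast": range(0, 2),    # 0 | 1
        "rails": range(1, 33),     # 1 <= n <= 32
    }
    request = {}
    for item in spec.split(","):
        name, sep, value = item.partition("=")
        if not sep or name not in allowed:
            raise ValueError(f"unsupported request: {item!r}")
        number = int(value)  # raises ValueError if not an integer
        if number not in allowed[name]:
            raise ValueError(f"value out of range: {item!r}")
        request[name] = number
    return request


print(parse_request("hwbcast=1,immediate=1"))
```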

DESCRIPTION

The allocate program allocates resources for subsequent use by the prun(1)

command. allocate is intended for use where a user wants to run a sequence of

commands or several programs concurrently on the same set of CPUs.

The -p, -N, -C, -B and -n options control which CPUs are allocated. The -N option

speciﬁes how many nodes are to be allocated. When this option is speciﬁed the user is

allocated a constant number of CPUs per node (default 1). The -C option speciﬁes the

number of CPUs required per node. The alternative -n option speciﬁes the total number

of CPUs to allocate. This option does not force the allocation of a constant number of

CPUs per node.
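As a worked reading of the paragraph above, the following sketch computes the total number of CPUs implied by a combination of -N, -C and -n. It is an interpretation of the text rather than the scheduler's own logic: the function name is hypothetical, -N all is not modelled, and the defaults of one node and one CPU per node are taken from the option descriptions.

```python
def cpus_requested(nodes=None, cpus_per_node=None, total_cpus=None):
    """Return the total number of CPUs implied by -N, -C and -n."""
    if None not in (nodes, cpus_per_node, total_cpus):
        raise ValueError("-N can be combined with -C or -n, but not both")
    if total_cpus is not None:
        return total_cpus  # -n gives the total directly
    n = 1 if nodes is None else nodes                  # -N defaults to 1
    c = 1 if cpus_per_node is None else cpus_per_node  # -C defaults to 1
    return n * c


print(cpus_requested(nodes=16, cpus_per_node=2))
```

For example, a request made with -N 16 -C 2 asks for two CPUs on each of sixteen nodes.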

The -B option speciﬁes the base of a contiguous range of nodes relative to the start of the

partition. The -N option specifies its extent. So, for example, -B0 -N4 specifies the first

four nodes in the partition. Note that nodes that have been conﬁgured out are excluded.

The -B option should be used to gain access to a speciﬁc ﬁle system or device that is not

available on all nodes. If the -B option is used, the scheduler allocates a contiguous

range of nodes and the same number of CPUs on each node. Using this option causes a

request to block until the base node and any additional nodes required to run the

program are free.

The -p option speciﬁes the partition from which CPUs can be allocated. CPUs cannot be

allocated from the root partition.

The Partition Manager, pmanager, allocates processing resources to users as and when

the resources are requested and become available. (See Section 4.4). By default, a

contiguous range of nodes is allocated to the request where possible. This enables

programs to take advantage of the system’s hardware broadcast facilities. The -R option

can be used with hwbcast set to 1 to ensure that the range of nodes allocated is

contiguous.

Before allocating resources, the Partition Manager checks the resource limits imposed

on the current project. The project can be speciﬁed explicitly with the -P option. This

overrides the value of the environment variable RMS_PROJECT or any default setting in

the users table. (See Section 10.2.24).

The script argument (with optional arguments) can be used in two different ways, as

follows:

1. script is not speciﬁed, in which case an interactive command shell is spawned with

the resources allocated to it. The user can conﬁrm that resources have been allocated

to an interactive shell by using the rinfo command. (See Page 5-32).

The resources are reserved until the shell exits or until a time limit deﬁned by the

system administrator expires, whichever happens ﬁrst. (See Section 10.2.16).

Parallel programs, executed from this interactive shell, all run on the shell’s

resources (concurrently, if sufﬁcient resources are available).

2. script speciﬁes a shell script, in which case the resources are allocated to the named

subshell and freed when execution of the script completes.

ENVIRONMENT VARIABLES

The following environment variables may be used to identify resource requirements and

modes of operation to allocate. They are used where no equivalent command line

options are given.

RMS_IMMEDIATE Controls whether to exit (value 1) rather than block (value 0) if

resources are not immediately available. The -I option overrides the

value of this environment variable. By default, allocate blocks until

resources become available. Root resource requests are always met.

RMS_MEMLIMIT Speciﬁes the maximum amount of memory required. This must be

less than or equal to the limit set by the system administrator.

RMS_PARTITION Speciﬁes the name of a partition. The -p option overrides the value of

this environment variable.

RMS_PROJECT

Speciﬁes the name of the project with which the request should be

associated for accounting purposes. The -P option overrides the value

of this environment variable.

RMS_TIMELIMIT Speciﬁes the execution time limit in seconds. The program will be

signaled either after this time has elapsed or after any time limit

imposed by the system has elapsed. The shorter of the two time limits

is used.

RMS_DEBUG

Speciﬁes whether to execute in verbose mode and display diagnostic

messages. Setting a value of 1 or more will generate additional

information that may be useful in diagnosing problems. (See

Section 9.6). If this environment variable is not set the -v option

enables reporting of resource request debug information.

allocate passes all existing environment variables through to the shell that it

executes. In addition, it sets the following environment variable:

RMS_RESOURCEID

The identiﬁer of the allocated resource.
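The precedence rules in this section (a command line option overrides the corresponding environment variable, and the shorter of two time limits applies) can be sketched as follows. The helper names and the sample values are hypothetical; only the variable names RMS_PARTITION and RMS_TIMELIMIT come from this manual.

```python
def resolve(option_value, env, name, default=None):
    """Command line option first, then environment variable, then default."""
    if option_value is not None:
        return option_value
    return env.get(name, default)


def effective_time_limit(env, system_limit=None):
    """Return the shorter of RMS_TIMELIMIT and the system limit, in seconds."""
    limits = []
    if "RMS_TIMELIMIT" in env:
        limits.append(int(env["RMS_TIMELIMIT"]))
    if system_limit is not None:
        limits.append(system_limit)
    return min(limits) if limits else None


# Hypothetical environment, for illustration only.
env = {"RMS_PARTITION": "parallel", "RMS_TIMELIMIT": "600"}
print(resolve(None, env, "RMS_PARTITION"), effective_time_limit(env, 300))
```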

EXAMPLES

To run a sequence of jobs on the same CPUs:

$ allocate -N 16 jobscript

where jobscript is a shell script such as the following:

#!/bin/sh

# simple job script

prun -n 16 program1

prun -n 16 program2

If the script were run directly, each prun command would make its own resource request. Each request would block until resources became available, and there would be no guarantee that both requests used the same CPUs. Running the script under allocate makes a single resource request, so both jobs run on the same CPUs.

To run two programs on the same CPUs at the same time:

$ allocate -N 16 -C 2 << EOF

prun program1 &

prun program2 &

rinfo

wait

EOF

WARNINGS

In earlier versions, the -i option speciﬁed immediate mode. This functionality has been

moved to the -I option. Use of -i is now deprecated. If -i is speciﬁed without an

argument, it is interpreted as -I and the user is warned that this feature should not be

used anymore.

SEE ALSO

prun, rinfo

nodestatus(1)

NAME

nodestatus – Gets or sets the status or run level of each node

SYNOPSIS

nodestatus [-bhr] [status]

OPTIONS

-b

Operate in the background.

-h

Display the list of options.

-r

Get/set run level.

DESCRIPTION

The nodestatus command is used to update status information in the RMS database as

nodes are booted or halted. When run without arguments, nodestatus gets the status

of the node on which it is running from the Machine Manager. When run with the -r

ﬂag, nodestatus gets the current run level.

When nodestatus is run with the status argument, it updates the node’s status or,

if the -r ﬂag is set, it updates the node’s run level. The change is reﬂected in the

nodes table for the node on which the command is running. (See Section 10.2.14). This

mechanism is used to track the progress of booting a node. Administrative privileges are

required to update the status or run level of a node.

The status can be one of these values: not responding, active or running.

Status updates may be delayed if the node running the database server is down. If

background operation is speciﬁed with the -b option, nodestatus runs in the

background and keeps trying until the database server is up and running.
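The background behaviour described above amounts to a retry loop. The sketch below shows the idea only; the update callable, the use of ConnectionError to signal that the database server is down, and the retry interval are all assumptions, since the manual does not say how often nodestatus retries.

```python
import time


def update_until_up(update, retry_interval=5.0, sleep=time.sleep):
    """Call `update` until it succeeds; return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        try:
            update()
            return attempts
        except ConnectionError:
            sleep(retry_interval)  # database server is still down


# Simulated update that fails twice before the server comes up.
state = {"calls": 0}


def flaky_update():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("database server is down")


print(update_until_up(flaky_update, sleep=lambda seconds: None))
```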

msqladmin(1)

NAME

msqladmin – Perform administrative operations on the mSQL database server

SYNOPSIS

msqladmin [-q] [-f confFile] [-h host] command

OPTIONS

-f confFile

Specify a non-default conﬁguration ﬁle to be loaded. The default

action is to load the standard conﬁguration ﬁle located in

/var/rms/msql.conf.

-h host

Specify a remote hostname or IP address on which the mSQL server

(msql2d) is running. The default is to connect to a server on the

localhost using a UNIX domain socket rather than TCP/IP (which

gives better performance).

-q

Put msqladmin into quiet mode. If this ﬂag is speciﬁed, msqladmin

will not prompt the user to verify dangerous actions (such as dropping

a database).

DESCRIPTION

msqladmin is used to perform administrative operations on an mSQL database server.

Such tasks include the creation of databases, performing server shutdowns and so on.

The available commands for msqladmin are:

create db_name

Creates a new database called db_name.

drop db_name

Removes the database called db_name from the server. This will also delete all data contained in the database specified.

shutdown

Terminates the mSQL server.

reload

Forces the server to reload ACL information.

version

Displays version and conﬁguration information about the currently

running server.

stats

Displays server statistics.

Most administrative functions can only be executed by the user speciﬁed in the run-time

conﬁguration as the admin user (rms). They can also only be executed from the host on

which the server process is running (for example you cannot shut down a remote server

process).

EXAMPLES

# msqladmin version

Version Details :-

    msqladmin version       2.0.11
    mSQL server version     2.0.11
    mSQL protocol version
    mSQL connection         Localhost via UNIX socket
    Target platform         OSF1-V5.0-alpha

Configuration Details :-

    Default config file     /var/rms/msql.conf
    TCP socket              1114
    UNIX socket             /var/rms/adm/msql/msql2.sock
    mSQL user               rms
    Admin user              rms
    Install directory       /var/rms
    PID file location       /var/rms/adm/msql/msql2.pid
    Memory Sync Timer       120
    Hostname Lookup         True

prun(1)

NAME

prun – Runs a parallel program

SYNOPSIS

prun [-hIOrstv] [-B base] [-c cpus] [-e mode] [-i mode] [-o mode]

[-N nodes | all] [-n procs] [-m block | cyclic] [-P project]

[-p partition] [-R request] program [args ...]

OPTIONS

-B base

Speciﬁes the number of the base node (the ﬁrst node to use) in the

partition. Numbering within the partition starts at 0. By default, the

base node is unassigned, leaving the scheduler free to select nodes

that are not in use.

-c cpus

Speciﬁes the number of CPUs required per process (default 1).

-h

Display the list of options.

-I

Allocate CPUs immediately or fail. By default, prun blocks until resources become available.

-e mode

Specifies how standard error output is redirected. Valid values for mode and their meanings are described below.

-i mode

Specifies how standard input is redirected. Valid values for mode and their meanings are described below.

-o mode

Specifies how standard output is redirected. Valid values for mode and their meanings are described below.

-m block | cyclic

Speciﬁes whether to use block (the default) or cyclic distribution of

processes over nodes.

-N nodes | all

Speciﬁes the number of nodes required. To use all nodes in a partition

select the all argument as follows: prun -N all. If the number of

nodes is not speciﬁed then the RMS scheduler will allocate one CPU

per process.

-n procs

Speciﬁes the number of processes required. The -n and -N options

can be combined to control how processes are distributed over nodes.

If neither is speciﬁed, prun starts one process.

-O

Allows resources to be over-committed. Set this ﬂag to run more than

one process per CPU.

-P project

Speciﬁes the name of the project with which the job should be

associated for scheduling and accounting purposes.

-p partition

Specifies the partition on which to run the program. By default, the partition specified in the attributes table is used. The default is parallel. (See Section 10.2.3).

-R request

Requests a particular conﬁguration of resources. The types of

request currently supported are as follows:

immediate=0 | 1

With a value of 1, this speciﬁes that the request

should fail if it cannot be met immediately (the

same as the -I option).

hwbcast=0 | 1

With a value of 1, this specifies a contiguous range of nodes and constrains the scheduler to queue the request until a contiguous range of nodes becomes available.

rails=n

In a multirail system, this speciﬁes the number of

rails required, where 1 ≤ n ≤ 32.

Multiple requests can be entered as a comma-separated list, for

example, -R hwbcast=1,immediate=1.

-r

Run processes using rsh. Used for administrative operations such as

starting and stopping RMS.

-s

Print statistics as the job exits.

-t

Prefix output with the process number.

-v

Speciﬁes verbose operation. Multiple vs increase the level of output:

-vv shows each stage in running a program and -vvv enables debug

output from the rmsloader processes on each node.

DESCRIPTION

The prun program executes multiple copies of the speciﬁed program on a partition.

prun automatically requests resources for the program unless it is executed from a shell

that already has resources allocated to it. (See Page 5-3).

The way in which processes are allocated to CPUs is controlled by the -c, -n, -p, -B and

-N options. The -n option speciﬁes the total number of processes to run. The -c option

specifies the number of CPUs required per process (default 1). The -N option

speciﬁes how many nodes are to be used.

If the -N option is not used then the scheduler selects CPUs for the program from any of

the available nodes. Where possible RMS will allocate a contiguous range of nodes, but

will only be constrained to do so if the -B or -R hwbcast=1 options are set. If the -N option is

used, the scheduler allocates the speciﬁed number of nodes (allocating a contiguous

range of nodes if possible) and the same number of CPUs on each node. By default, a

contiguous range of nodes is allocated to the request where possible. This enables

programs to take advantage of the system’s hardware broadcast facilities. The -R option

can be used with hwbcast set to 1 to ensure that the range of nodes allocated is

contiguous.

The -B option speciﬁes the base of a contiguous range of nodes relative to the start of the

partition. The -N option speciﬁes its extent. So for example -B0 -N4 speciﬁes the ﬁrst

four nodes in the partition. Note that nodes that have been conﬁgured out are excluded.

The -B option should be used to gain access to a speciﬁc ﬁle system or device that is not

available on all nodes. If the -B option is used, the scheduler allocates a contiguous

range of nodes and the same number of CPUs on each node. Using this option causes a

request to block until the base node and any additional nodes required to run the

program are free.

The -I option speciﬁes that resource requests should fail if they cannot be met

immediately. The default is to block until CPUs are available.

The -m option speciﬁes how processes are to be distributed over nodes. The choice is

between block (the default) and cyclic. If a program has n processes with identiﬁers

0,1,...n-1 distributed over N nodes then, in a block distribution, the ﬁrst n/N

processes are allocated to the ﬁrst node and so on. If the distribution is cyclic, process 0

runs on the ﬁrst node, process 1 on the second and so on until process N-1 is placed on

the last node, at which stage the distribution wraps around, with process N running on

the ﬁrst node and so on.
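The two distributions described above can be expressed as a mapping from process identifier to node index. The sketch below is illustrative: the function name is hypothetical, and rounding n/N up when n is not a multiple of N is an assumption, because the text does not say how a remainder is handled.

```python
def node_of(rank, nprocs, nnodes, mode="block"):
    """Return the index of the node that runs process `rank`."""
    if mode == "cyclic":
        return rank % nnodes  # wrap around after the last node
    per_node = -(-nprocs // nnodes)  # ceiling of n/N (assumed rounding)
    return rank // per_node


for mode in ("block", "cyclic"):
    print(mode, [node_of(rank, 4, 2, mode) for rank in range(4)])
```

With four processes on two nodes, block places processes 0 and 1 on the first node and 2 and 3 on the second, whereas cyclic places 0 and 2 on the first node and 1 and 3 on the second. This agrees with the prun -t examples later in this entry.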

The -p option speciﬁes the partition to use. If no partition is speciﬁed then the default

partition is used. The default partition is stored in the attributes table. (See

Section 10.2.3). Note that use of the root partition (all nodes in the machine) is

restricted to administrative users.

Before allocating resources, prun checks the resource limits imposed on the current

project. The project can be speciﬁed explicitly with the -P option. This overrides the

value of the environment variable RMS_PROJECT or any default setting in the users

table. (See Section 10.2.24).

By default, when running a parallel program, prun forwards standard input to the

process with an identiﬁer of 0. The -i option requests a different mode of operation.

Valid values for mode and their meanings are as follows:

rank

Forward standard input to the process that is identiﬁed by rank

where 0 ≤ rank ≤ n-1 and n is the number of processes in the

program.

all

Broadcast standard input to all of the processes.

none

Do not forward standard input.

file

prun opens the named file and associates it with the standard input

stream so that each process reads standard input from the ﬁle. If the

ﬁle does not exist, a read returns EOF.

file.%

prun expands the % character to generate and open a separate ﬁle

name for each process: process 0 reads standard input from file.0,

process 1 reads standard input from file.1 and so on. If the ﬁle does

not exist, a read returns EOF.

If the mode is rank or all, prun polls its standard input and forwards the data to the

rmsloader of the application process (or processes if the mode is all). rmsloader

writes the data to the standard input pipe for the process. This write may fail if the pipe is full because the application has not yet read the data. If this happens, rmsloader will

periodically attempt to resend the data to the pipe. prun will not poll for further

standard input until it has received an acknowledgement from the process (or all

processes in the case of broadcast input) to say that this operation has completed.

The -o and -e options control the redirection and ﬁltering of standard output and

standard error respectively. Valid values for mode and their meanings for these options

are as follows:

rank

Redirect to prun standard output (or standard error) from the process

identiﬁed by rank where 0 ≤ rank ≤ n-1 and n is the number of

processes in the program.

all

Redirect standard output (or standard error) from all processes to

prun. This is the default.

none

Do not redirect standard output (or standard error) from any process.

file

prun opens the named file for output and associates it with the

standard output (standard error) stream so that each process writes

standard output (standard error) to the ﬁle.

file.%

prun expands the % character to generate and open for output a

separate ﬁle name for each process: process 0 writes standard output

(standard error) to file.0, process 1 writes to file.1 and so on.
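The mode values for the -o and -e options listed above can be interpreted per process as in the following sketch. It is a model of the documented behaviour, not prun's implementation: the function name and the returned tuples are hypothetical, an all-digit mode is assumed to be a rank rather than a file name, and in rank mode the output of the other processes is assumed not to be redirected.

```python
def interpret_mode(mode, rank, nprocs):
    """Return what happens to the output of process `rank` for a given mode.

    ("prun",)      output is redirected to prun
    ("none",)      output is not redirected
    ("file", path) output is written to the named file
    """
    if mode == "all":
        return ("prun",)
    if mode == "none":
        return ("none",)
    if mode.isdigit():
        if not 0 <= int(mode) < nprocs:
            raise ValueError("rank must satisfy 0 <= rank <= n-1")
        return ("prun",) if int(mode) == rank else ("none",)
    # file or file.%: expand % to the identifier of this process
    return ("file", mode.replace("%", str(rank)))


print([interpret_mode("file.%", rank, 4) for rank in range(2)])
```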

Standard output from a parallel program is line-buffered and redirected to prun when a

newline character is received. Output that does not end in a newline is buffered by

rmsloader.

Standard error is unbuffered and forwarded to prun as soon as it is received by

rmsloader.
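The difference between the two streams can be illustrated with a small line buffer of the kind the standard output path implies: data is held until a newline arrives and only complete lines are released. The class below is a sketch with a hypothetical name and is not taken from rmsloader; standard error, being unbuffered, would bypass it.

```python
class LineBuffer:
    """Hold output until a newline arrives, then release complete lines."""

    def __init__(self):
        self._pending = ""

    def feed(self, data):
        """Add `data`; return the complete lines that can now be forwarded."""
        self._pending += data
        *lines, self._pending = self._pending.split("\n")
        return [line + "\n" for line in lines]


buffer = LineBuffer()
print(buffer.feed("hel"))        # nothing is forwarded yet
print(buffer.feed("lo\nwor"))    # the first line is complete
print(buffer.feed("ld\n"))       # the second line is complete
```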

There is no global synchronization of output from a parallel program. If multiple

processes output data, the order in which the data is output will not necessarily be the

same each time the program is run.

prun exits when all of the processes in the parallel program have exited or when one

process has been killed. If all processes exit cleanly then the exit status of prun is the

global OR of their individual exit status values. If one of the processes is killed, prun will

exit with a status value of 128 plus the signal number. prun can also exit with the

following codes:

125 One or more processes were still running when the exit timeout expired.

126 prun was run with the -I option and resources were not available.

127 prun was run with invalid arguments.
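The exit status rules above can be summarised in a few lines. This is a sketch under the assumption that "global OR" means a bitwise OR of the individual exit status values; the function and parameter names are hypothetical, and the codes 125 to 127 are not modelled.

```python
from functools import reduce
from operator import or_


def prun_exit_status(exit_codes, killed_by=None):
    """Combine per-process results into a single exit status.

    exit_codes: exit status of each process that exited cleanly.
    killed_by:  signal number if one of the processes was killed.
    """
    if killed_by is not None:
        return 128 + killed_by
    return reduce(or_, exit_codes, 0)  # bitwise OR (assumed meaning)


print(prun_exit_status([0, 0, 0]), prun_exit_status([0, 1, 2]))
```

A calling script can therefore treat any status above 128 as a sign that a process was killed, and subtract 128 to recover the signal number.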

If an application process started by prun is killed, RMS will run a post mortem core

analysis script that generates a backtrace if it can ﬁnd a core ﬁle for the process.

The attribute rms-keep-core in the attributes table determines whether core ﬁles

are saved. (See Section 10.2.3). The environment variable RMS_KEEP_CORE can be set to

override the value in the attributes table.

Core ﬁles are saved in the directory local-corepath/resource-id. The value of

local-corepath is deﬁned in the attributes table. The resource-id can be listed

by rinfo. (See Page 5-32). prun also sets the environment variable RMS_RESOURCEID

to the value of the resource identiﬁer.

ENVIRONMENT VARIABLES

The following environment variables may be used to identify resource requirements and

modes of operation to prun. These environment variables are used where no equivalent

command line options are given:

RMS_IMMEDIATE Controls whether to exit rather than block if resources are not

immediately available. The -I option overrides the value of this

environment variable. By default, prun blocks until resources become

available. Root resource requests are always met.

RMS_KEEP_CORE Controls whether core ﬁles are saved. Overrides the default

behaviour set by the system administrator.

RMS_MEMLIMIT The maximum amount of memory required per process in megabytes.

This must be less than or equal to the limit set by the system

administrator.

RMS_PARTITION Speciﬁes the name of a partition. The -p option overrides the value of

this environment variable.

RMS_PROJECT

The name of the project with which the job should be associated for

scheduling and accounting purposes. The -P option overrides the

value of this environment variable.

RMS_TIMELIMIT Speciﬁes the execution time limit in seconds. The program will be

signaled either after this time has elapsed or after any time limit

imposed by the system has elapsed. The shorter of the two time limits

is used.

RMS_DEBUG

Whether to execute in verbose mode and display diagnostic messages.

Setting a value of 1 or more generates additional information that

may be useful in diagnosing problems. (See Section 9.6).

RMS_EXITTIMEOUT

Speciﬁes the time allowed in seconds between the ﬁrst process exit

and the last. This option can be useful in parallel programs where one

process can exit leaving the others blocked in interprocess

communication. It should be used in conjunction with an exit barrier

at the end of correct execution of the program.

RMS_STDINMODE Speciﬁes the mode for forwarding standard input to a parallel

program. The -i option overrides the value of this environment

variable. Values for mode are the same as those used with the -i

option.

RMS_STDOUTMODE

Specifies the mode for redirecting standard output from a parallel program. The -o option overrides the value of this environment variable. Values for mode are the same as those used with the -o option.

RMS_STDERRMODE

Specifies the mode for redirecting standard error from a parallel program. The -e option overrides the value of this environment variable. Values for mode are the same as those used with the -e option.

prun passes all existing environment variables through to the processes that it

executes. In addition, it sets the following environment variables:

RMS_JOBID

The identifier for the job.

RMS_NNODES

The number of nodes used by the application.

RMS_NODEID

The logical identifier of the node within the set allocated to the application.

RMS_NPROCS

The total number of processes in the application.

RMS_RANK

The rank of the process in the application. The rank ranges from 0 to n-1, where n is the number of processes in the program.

RMS_RESOURCEID

The identiﬁer of the allocated resource.
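A process started by prun can use these variables to decide which part of the work it owns. The sketch below reads RMS_RANK and RMS_NPROCS and splits a list of items between the processes; the helper names are hypothetical, and the fallback values (rank 0 of 1) are an assumption so that the code also runs outside prun.

```python
import os


def rms_context(env=None):
    """Read the rank and process count set by prun."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RMS_RANK", 0)),      # assumed default
        "nprocs": int(env.get("RMS_NPROCS", 1)),  # assumed default
    }


def my_slice(items, ctx):
    """Give each rank every nprocs-th item, starting at its own rank."""
    return items[ctx["rank"]::ctx["nprocs"]]


# Simulated environment of process 1 in a four-process program.
ctx = rms_context({"RMS_RANK": "1", "RMS_NPROCS": "4"})
print(my_slice(list(range(10)), ctx))
```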

EXAMPLES

In the following example, prun is used to run a four-process program with no

speciﬁcation of where the processes should run.

$ prun -n 4 hostname
atlas0.quadrics.com
atlas0.quadrics.com
atlas0.quadrics.com
atlas0.quadrics.com

The machine atlas has four CPUs per node and so, by default, the scheduler allocates

all four CPUs on one node to run the program. Add the -N option, as follows, to control

how the processes are distributed over nodes.

$ prun -n 4 -N 2 hostname
atlas0.quadrics.com
atlas0.quadrics.com
atlas1.quadrics.com
atlas1.quadrics.com

$ prun -n 4 -N 4 hostname

atlas1.quadrics.com

atlas3.quadrics.com

atlas0.quadrics.com

atlas2.quadrics.com

The -m option controls how processes are distributed over nodes. It is used in the

following example in conjunction with the -t option which tags each line of output with

the identiﬁer of the process that wrote it.

$ prun -t -n 4 -N 2 -m block hostname

0 atlas0.quadrics.com

1 atlas0.quadrics.com

2 atlas1.quadrics.com

3 atlas1.quadrics.com

$ prun -t -n 4 -N 2 -m cyclic hostname

0 atlas0.quadrics.com

2 atlas0.quadrics.com

1 atlas1.quadrics.com

3 atlas1.quadrics.com

The examples so far have used simple UNIX utilities to illustrate where processes are

run. Parallel programs are run in just the same way. The following example measures

DMA performance between a pair of processes on different nodes.

$ prun -N 2 dping 0 1k

      0 bytes      2.33 uSec      0.00 MB/s
      1 bytes      3.58 uSec      0.28 MB/s
      2 bytes      3.61 uSec      0.55 MB/s
      4 bytes      2.44 uSec      1.64 MB/s
      8 bytes      2.47 uSec      3.24 MB/s
     16 bytes      2.55 uSec      6.27 MB/s
     32 bytes      2.57 uSec     12.45 MB/s
     64 bytes      3.48 uSec     18.41 MB/s
    128 bytes      4.23 uSec     30.25 MB/s
    256 bytes      4.99 uSec     51.32 MB/s
    512 bytes      6.39 uSec     80.08 MB/s
   1024 bytes      9.26 uSec    110.55 MB/s

The -s option instructs prun to print a summary of the resources used by the job when

it ﬁnishes.

$ prun -s -N 2 dping 0 32

      0 bytes      2.35 uSec      0.00 MB/s
      1 bytes      3.60 uSec      0.28 MB/s
      2 bytes      3.53 uSec      0.57 MB/s
      4 bytes      2.44 uSec      1.64 MB/s
      8 bytes      2.47 uSec      3.23 MB/s
     16 bytes      2.54 uSec      6.29 MB/s
     32 bytes      2.57 uSec     12.46 MB/s

Elapsed time      1.00 secs      Allocated time    1.99 secs
User time         0.93 secs      System time       0.13 secs
Cpus used

Note that the allocated time (in CPU seconds) is twice the elapsed time (in seconds)

because two CPUs were allocated.
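The relationship stated in the note above is a simple product, sketched here with a hypothetical function name. The example output reports 1.99 rather than 2.00 CPU seconds against an elapsed time of 1.00 seconds; this small difference is presumably due to rounding of the displayed values, which the sketch does not try to reproduce.

```python
def allocated_time(elapsed_secs, cpus):
    """Allocated time in CPU seconds for a job that holds `cpus` CPUs."""
    return elapsed_secs * cpus


print(allocated_time(1.00, 2))
```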

WARNINGS

In earlier versions, the -i option speciﬁed immediate mode. This functionality has been

moved to the -I option. Use of -i is now deprecated. If -i is speciﬁed without an

argument, it is interpreted as -I and the user is warned that this feature should not be

used anymore.

SEE ALSO

allocate, rinfo

rcontrol(1)

NAME

rcontrol – Controls use of system resources

SYNOPSIS

rcontrol command [args ...] [-ehs] [-r level] [command args ...]

OPTIONS

-e

Exit on the first error.

-h

Display the list of options.

-r level

Set reporting level.

-s

Stop and print warning on error.

command is speciﬁed as follows:

create object [=] name [configuration=val] [partition=val] [attr=val]

object may be one of: access_control, attribute,

configuration, node, partition, project, user. If an

access_control is speciﬁed, a partition must also be named to

identify the object uniquely. Similarly, if a partition is speciﬁed, a

configuration must also be named together with a list of nodes.

remove object [=] name [configuration=val] [partition=val]

object may be one of: access_control, attribute,

configuration, node, partition, project, user. If an

access_control is speciﬁed, a partition must also be named to

identify the object uniquely. If a partition is speciﬁed, a

configuration must also be named to identify the object uniquely.

configure in nodes[=] list

list specifies a quoted list of nodes, such as 'atlas[1-3,6,8]'.

configure out nodes[=] list

list specifies a quoted list of nodes, such as 'atlas[1-3,6,8]'.


start object [=] name

object may be one of: configuration, partition, server.

stop object [=] name [option [=] kill | wait]

object may be one of: configuration, partition, server. If

server is speciﬁed as the object, no option should be given.

reload object [=] name [debug [=] value]

object may be one of: partition, server.

suspend job [=] name [name ...]

job may be one of: resource, batchid.

suspend attribute [=] value [attribute [=] value ...]

Attributes of the same name are ORed together. Attributes with

different names are ANDed together. The result of the logical

expression identiﬁes a resource or set of resources as the target of the

command.

resume job [=] name [name ...]

job may be one of: resource, batchid.

resume attribute [=] value [attribute [=] value ...]

Attributes of the same name are ORed together. Attributes with

different names are ANDed together. The result of the logical

expression identiﬁes a resource or set of resources as the target of the

command.

kill job [=] name [name ...] [signal [=] sig]

job may be one of: resource, batchid.

kill attribute [=] value [attribute [=] value ...] [signal [=] sig]

Attributes of the same name are ORed together. Attributes with

different names are ANDed together. The result of the logical

expression identiﬁes a resource or set of resources as the target of the

command.

set job [=] name priority [=] value

job may be one of: resource, batchid.

set object [=] name attribute [=] value [attribute [=] value ...]

object may be one of: access_control, configuration, node,

partition, project, user.


set attribute [=] name val [=] value

exit

help [all | command]

show object [=] name

object may be one of: nodes, configuration, partition.

DESCRIPTION

rcontrol is used to manage the following: nodes, partitions and conﬁgurations; servers;

users and their resource requests, projects and access controls; system attributes.

rcontrol can create, start, stop and remove a conﬁguration or partition. It can create,

remove and set the attributes of nodes and conﬁgure them in and out of the machine.

Operations on nodes may specify a single host name, such as atlas4, or a list of host

names, such as 'atlas[4-7]'. Lists of host names must always be quoted.
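To make the list notation concrete, the following Python sketch expands a bracketed list into individual host names. It covers only the forms shown in this manual (a prefix followed by comma-separated numbers and ranges); the function name is illustrative, and the parser inside RMS may accept other forms.

```python
import re


def expand_node_list(spec):
    """Expand a bracketed node list such as 'atlas[1-3,6,8]' into host names."""
    match = re.fullmatch(r"([^\[\]]+)\[([0-9,-]+)\]", spec)
    if match is None:
        # No bracket expression: treat the text as a single host name.
        return [spec]
    prefix, body = match.groups()
    hosts = []
    for item in body.split(","):
        first, _, last = item.partition("-")
        last = last or first  # a bare number is a range of one
        for index in range(int(first), int(last) + 1):
            hosts.append(f"{prefix}{index}")
    return hosts


print(expand_node_list("atlas[1-3,6,8]"))
print(expand_node_list("atlas4"))
```

The quoting requirement is most likely there because square brackets are pattern characters in UNIX shells; without quotes the shell may try to match the list against file names before rcontrol sees it.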

rcontrol can start or stop an RMS server. It can also instruct a running server to

reload access control information or change its reporting level.

rcontrol can be used to suspend or resume the allocation of CPUs to a resource request,

alter its scheduling priority or send a signal to its jobs. Operations on resource requests

may specify a request by name or by using the batch system identiﬁer. Alternatively,

requests can be identiﬁed by attributes such as user name, partition, project or status.

rcontrol can be used to create or remove or to set the attributes of users, projects and

access controls. Details of which attributes can be modiﬁed in this way are speciﬁed in

the fields table in the RMS database. System attributes can also be created, removed

or have their value set.

The help command prints information on all of the commands and their arguments.

When used with the name of a command as an argument, it prints more information on

the speciﬁed command.

When used without arguments, rcontrol runs interactively. A sequence of commands

can be entered. Use the exit command or Ctrl/d to exit.

Most rcontrol commands are restricted to administrative users (root and rms users,

by default). The job control commands (suspend, resume, kill and set priority)

may also be issued by the user running the job in question.

In all of the rcontrol commands, the use of the equals sign is optional. The following

two examples – using rcontrol to conﬁgure into the system three nodes named

atlas1, atlas2 and atlas3 – are equivalent.


# rcontrol configure in nodes = 'atlas[1-3]'

# rcontrol configure in nodes 'atlas[1-3]'
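When rcontrol is driven from a script, the two equivalent forms can be generated from the same data. The sketch below builds an argument vector from keyword and value pairs, with or without the equals sign. The helper name and its use_equals parameter are hypothetical; only the rcontrol syntax itself comes from this manual.

```python
import shlex


def rcontrol_argv(command, *pairs, use_equals=True):
    """Build an argument vector for rcontrol from (keyword, value) pairs."""
    argv = ["rcontrol"] + command.split()
    for keyword, value in pairs:
        argv.append(keyword)
        if use_equals:
            argv.append("=")  # optional, as described above
        argv.append(str(value))
    return argv


with_equals = rcontrol_argv("configure in", ("nodes", "atlas[1-3]"))
without_equals = rcontrol_argv("configure in", ("nodes", "atlas[1-3]"), use_equals=False)

# shlex.join shows how each vector would be typed at a shell prompt.
print(shlex.join(with_equals))
print(shlex.join(without_equals))
```

A vector built this way can be handed to subprocess.run as a list, in which case no shell is involved and the node list needs no extra quoting.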

Creating and Removing Nodes

To create a new node description, use rcontrol with the create command and the

argument node followed by the hostname of the node. Additional attribute-value pairs

specify properties of the node, such as its type and position. The attributes rack and

unit specify the position of the node in the system.

# rcontrol create node = atlas1 type = ES40 rack = 0 unit = 3

To remove a node description from the RMS database, use rcontrol with the remove

command and the argument node followed by the name of the node.

# rcontrol remove node = atlas1

Creating and Removing Partitions

RMS scheduling policy and access controls are based on partitions. Partitions are

non-overlapping sets of nodes. The set of partitions in operation at any time is called the

active conﬁguration. RMS provides for several operational conﬁgurations and includes

mechanisms for switching between them with rcontrol.

To create a new partition description, use rcontrol with the create command and the

argument partition followed by the name of the partition. In addition, you must

specify the conﬁguration to which the partition belongs. Additional attribute-value pairs

specify properties of the partition: a list of its nodes, its scheduling type, time limit, time

slice interval, memory limit or minimum number of CPUs that may be allocated. The

nodes attribute must be speciﬁed. Default values will be selected for the other

attributes if none are given.

# rcontrol create partition = p1 configuration = day nodes = 'atlas[1-4]' type = parallel

The scheduling type attribute of the partition may be one of the following:

parallel    The partition is for the exclusive use of gang-scheduled parallel programs.

login       The partition runs interactive user logins and load-balanced sequential jobs.

general     The partition runs all classes of job. This is the default partition type.

batch       The partition is for the exclusive use of a batch system.


The timelimit attribute speciﬁes the maximum time in seconds for which CPUs can be

allocated on the partition. On expiry of the time limit, jobs will be sent the signal

SIGXCPU. If they have not exited within a grace period, they will be killed. The grace

period for a site is deﬁned in the attributes table (attribute name grace-period).

Its default value is 60 seconds.
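A job that wants to exit cleanly inside the grace period can catch SIGXCPU itself. The following Python sketch shows one way to do that on a UNIX system; the function names and the save_state callback are illustrative assumptions and are not part of RMS.

```python
import signal


class TimeLimitExceeded(Exception):
    """Raised inside the job when SIGXCPU arrives."""


def _raise_on_sigxcpu(signum, frame):
    raise TimeLimitExceeded()


def run_until_time_limit(work, save_state):
    """Run work(); if SIGXCPU arrives, call save_state() and return promptly."""
    previous = signal.signal(signal.SIGXCPU, _raise_on_sigxcpu)
    try:
        work()
        return "completed"
    except TimeLimitExceeded:
        # The grace period is short (60 seconds by default), so keep this quick.
        save_state()
        return "time limit reached"
    finally:
        # Put back whatever handler was installed before.
        if previous is not None:
            signal.signal(signal.SIGXCPU, previous)
```

The handler must be installed from the main thread of the program. Work that cannot be interrupted safely at an arbitrary point would need a flag-based handler instead of an exception.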

The timeslice attribute speciﬁes the period in seconds for which jobs are allocated

CPUs before the CPUs may be reallocated to another job of equal priority. The default

value for timeslice is NULL, disabling time-slicing.

The memlimit attribute deﬁnes the default memory limit per CPU for applications

running on this partition. It can be overridden on a per-user or per-project basis. The

default value of memlimit is NULL, disabling memory limits unless they are set for

speciﬁc users or projects.

The mincpus attribute controls the minimum number of CPUs that may be allocated to

a job running on this partition. The default value of mincpus is 0. The maximum

number of CPUs that can be allocated is controlled on a per-user or per-project basis.

To remove a partition description from the RMS database, use rcontrol with the

remove command and the argument partition followed by the name of the partition.

You must also specify the name of the conﬁguration since the same partition name may

appear in a number of conﬁgurations. To remove an entire conﬁguration from the RMS

database, use rcontrol with the remove command and the argument configuration

followed by the name of the conﬁguration.

# rcontrol remove partition = par1 configuration = night

# rcontrol remove configuration = night

Note that partitions cannot be removed while they are in use. Similarly, the nodes and

type of a partition cannot be changed while the partition is running. If the other

attributes of a partition are changed while the partition is running, the Partition

Manager is reloaded automatically so that it uses the new information for subsequent

jobs. Jobs that are already running are not affected.

Starting and Stopping Partitions

To start a partition in the active conﬁguration, use rcontrol with the start command

and the partition argument followed by the name of the partition. To start all of the

partitions in a conﬁguration, use rcontrol with the start command and the

configuration argument followed by the name of the conﬁguration. A conﬁguration is

made active by starting it in this way.

# rcontrol start partition = par1

# rcontrol start configuration = day


To stop a partition in the active conﬁguration, use rcontrol with the stop command

and the partition argument followed by the name of the partition. To stop all of the

partitions in the active conﬁguration, use rcontrol with the stop command and the

configuration argument followed by the name of the conﬁguration.

When stopping partitions you can optionally specify what should happen to the running

jobs. The options are to leave them running, to wait for them to exit or to kill them. The

default is to leave them running.

# rcontrol stop partition = par1 option = kill

# rcontrol stop configuration = day option = wait

Conﬁguring Nodes In or Out

To conﬁgure a node in or out, use rcontrol with the configure in or configure

out commands. Use the nodes argument to specify the list of nodes being conﬁgured in

or out.

# rcontrol configure in nodes = 'atlas[2-4]'

# rcontrol configure out nodes = 'atlas[2,5-7]'

Note that partitions must be stopped before nodes can be conﬁgured in or out. Jobs may

be left running but any jobs running on a node while it is being conﬁgured out will be

killed. When stopping a partition, it is advisable to wait until jobs have exited (or kill

them).

Reloading Database Information

To instruct a Partition Manager to reload its access_controls, users, and projects

tables, use rcontrol with the reload command and the partition argument

followed by the name of the partition.

# rcontrol reload partition = par1

To instruct a Partition Manager to change its reporting level, use rcontrol with the

reload command and the partition argument followed by the name of the partition.

In addition, you should specify the attribute debug and a value. The Partition Manager

writes its reports to a log ﬁle in the directory /var/rms/adm/log. See Section 4.1.2 and

Section 9.6.

# rcontrol reload partition = par1 debug = 1

Managing Servers

To stop an RMS server, use rcontrol with the stop command and the server

argument followed by the name of the server. To start it again, use rcontrol with the


start command, the server argument and the name of the server. The command

rinfo (with the -s ﬂag) can be used to show the status of the RMS servers.

To instruct an RMS server to change its reporting level, use the reload command and

the server argument with the name of the server. In addition, you should specify the

attribute debug and a value. RMS servers write their log ﬁles to the directory

/var/rms/adm/log on the rmshost. See Section 9.6.

# rcontrol stop server = mmanager

# rcontrol start server = mmanager

# rcontrol reload server = mmanager debug = 1

Managing Resources

To instruct the scheduler to suspend the allocation of CPUs to a resource request, use

rcontrol with the suspend command followed by either the name of the resource or

the batch system’s identiﬁer for the request. This suspends jobs running on the allocated

CPUs and decrements the user’s CPU usage count.

# rcontrol suspend resource = 2234

# rcontrol suspend batchid = 14

Note that a resource request that has been suspended by an administrative user cannot

be resumed by its owner.

To instruct the scheduler to resume the allocation of CPUs to a resource request, use

rcontrol with the resume command followed by either the name of the resource or the

batch system’s identiﬁer for the request. This reschedules jobs that were running on the

allocated CPUs, unless doing so would cause the user’s CPU usage limit to be exceeded.

# rcontrol resume resource = 2267

# rcontrol resume batchid = 384

To instruct RMS to send a signal to the jobs running on an allocated resource request,

use rcontrol with the kill command followed by either the name of the resource or

the batch system’s identiﬁer for the request. This kills the jobs running on the allocated

CPUs (by sending the signal SIGKILL to each process). The optional attribute signal

can be used to send a speciﬁc signal. For example, to send the signal SIGTERM:

# rcontrol kill resource = 9835 signal = 15

# rcontrol kill batchid = 396 signal = 15

To instruct the scheduler to change the priority of a resource request, use rcontrol

with the set command and the resource argument followed by either the name of the

resource or the batch system’s identiﬁer for the request. In addition, you should specify

the attribute priority and the new value. Priority values range from 0 to 100 (default

50).


# rcontrol set resource = 32 priority = 25

# rcontrol set batchid = 48 priority = 40

rcontrol can also be used to suspend, kill or resume jobs identiﬁed by their attributes.

The attributes that can be speciﬁed are: partition, project, status and user.

Attributes of the same name are ORed together, attributes with different names are

ANDed.

For example, to kill a job run by a user called tom on the partition par1 whether its

status is blocked or queued:

# rcontrol kill user = tom status = blocked status = queued partition = par1

To suspend all of the jobs belonging to the project called science:

# rcontrol suspend project = science
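The selection rule can be sketched in Python as follows: values given for the same attribute form a set of alternatives (OR), and a request is selected only if it satisfies every attribute named (AND). The dictionary representation of a request is an assumption made for illustration.

```python
from collections import defaultdict


def selects(request, criteria):
    """Return True if `request` matches the (attribute, value) pairs in `criteria`."""
    wanted = defaultdict(set)
    for name, value in criteria:
        wanted[name].add(value)  # same name: alternatives are ORed
    # different names: every attribute must match (AND)
    return all(request.get(name) in values for name, values in wanted.items())


# Equivalent of: kill user = tom status = blocked status = queued partition = par1
criteria = [("user", "tom"), ("status", "blocked"),
            ("status", "queued"), ("partition", "par1")]

queued = {"user": "tom", "status": "queued", "partition": "par1"}
running = {"user": "tom", "status": "running", "partition": "par1"}
print(selects(queued, criteria), selects(running, criteria))
```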

Managing Users, Projects and Access Controls

In addition to managing partitions and nodes, rcontrol can be used to create, remove

and set the attributes of users, projects and access controls. The fields table contains

details of which objects and attributes may be modiﬁed. See Section 10.2.8.

The table has seven ﬁelds: the tablename ﬁeld speciﬁes the table that will be modiﬁed;

the name ﬁeld speciﬁes which entry in the named table will be modiﬁed; the type ﬁeld

determines the range of valid values; the min ﬁeld gives the minimum for values of type

integer while the max ﬁeld gives the maximum; the textattr ﬁeld either gives a

comma-separated list of valid values or a table-name.table-field pair. In the case

of the table-name.table-field pair, the value in the name ﬁeld of the fields table

must also be present in the table named table-name in the ﬁeld called table-field.

The access ﬁeld speciﬁes whether this ﬁeld can be updated by the system

administrator.
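One way to read these rules is as a validation step applied before a value is stored. The Python sketch below models a row of the fields table with the seven field names given above and checks a candidate value against it. The example rows, the type name text, the access value admin and the pair projects.name are hypothetical; only the type integer is named in this manual.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSpec:
    """One row of the fields table, using the seven field names given above."""
    tablename: str
    name: str
    type: str
    min: Optional[int]
    max: Optional[int]
    textattr: Optional[str]
    access: str


def is_valid(spec, value, database):
    """Check value against spec; database maps a table name to a list of row dicts."""
    if spec.type == "integer":
        if not isinstance(value, int):
            return False
        if spec.min is not None and value < spec.min:
            return False
        if spec.max is not None and value > spec.max:
            return False
        return True
    if not spec.textattr:
        return True  # no restriction recorded for this field
    if "," not in spec.textattr and "." in spec.textattr:
        # table-name.table-field: the value must already exist in that column
        table, column = spec.textattr.split(".", 1)
        return any(row.get(column) == value for row in database.get(table, []))
    # otherwise a comma-separated list of permitted values
    return value in [item.strip() for item in spec.textattr.split(",")]


# Hypothetical rows and data, for illustration only
priority = FieldSpec("access_controls", "priority", "integer", 0, 100, None, "admin")
projects = FieldSpec("users", "projects", "text", None, None, "projects.name", "admin")
database = {"projects": [{"name": "parallax"}, {"name": "science"}]}

print(is_valid(priority, 75, database))
print(is_valid(projects, "parallax", database))
print(is_valid(projects, "astro", database))
```

The sketch checks a single value; a comma-separated value such as a list of projects would need each element checked in turn.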

To create a user, use the rcontrol create command to specify the object type (in this

case, user) and the object name (for example, frank).

# rcontrol create user = frank

To update an existing user record, use the rcontrol set command. For example, to

change the projects to which a user belongs, use rcontrol set followed by the object

type (in this case, user), the object name (in this example, frank), the attribute to be

changed (projects), and its new value (in this example, parallax); the new value

must already have been deﬁned as a project.

# rcontrol set user = frank projects = parallax


Note that a user can be in more than one project in which case the value would be a

comma-separated list:

# rcontrol set user = frank projects = parallax,science

To create an access control called, for example, science, in the par1 partition, use

rcontrol with the create command followed by the type of the object, its name and

the name of the partition. Additional attribute-value pairs specify attributes of the

access control, for example, its class.

# rcontrol create access_control = science partition = par1 class = project

Just as partitions require a conﬁguration name to identify them uniquely, access

controls require a partition name.

To set the attributes of an object, use rcontrol with the set command followed by the

name of the object. Specify the name of the attribute and the required value. An

attribute’s value can be set to null by entering NULL, Null or null as the value.

# rcontrol set access_control = std partition = par1 priority=75 memlimit=NULL

To remove an object, use rcontrol with the remove command and the name of the

object.

# rcontrol remove user = frank

# rcontrol remove access_control = science partition = par1

After changing user, project or access control information, the Partition Managers must

be reloaded so that they use the new information.

# rcontrol reload partition = par1

Jobs that were already running will not be affected by any change to resource limits

except that they may be suspended if the new CPU usage limits are lower than before.

Setting System Attributes

System attributes can be created, removed or set using rcontrol create, remove and

set.

# rcontrol create attribute = name val=value

# rcontrol remove attribute = name

# rcontrol set attribute = name val=value

Any system attributes can be modiﬁed in this way but there are some, mentioned below,

whose values are checked if they are created or set. (See Section 10.2.3).


The attribute pmanager-queuedepth limits the number of resource requests that a

Partition Manager will handle at any time. If the attribute is undeﬁned or set to NULL or

0, no limit is imposed. By default, it is set to 0.

If a limit is set and reached, subsequent resource requests by prun will block or, if the

immediate option to prun is set, fail. The blocked requests will not appear in the RMS

database.
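The behavior described above amounts to a small decision rule, sketched here in Python. The function and its arguments are illustrative and are not an RMS interface; None stands for an undefined or NULL attribute.

```python
def admit_request(active_requests, queuedepth, immediate=False):
    """Decide what happens to a new resource request under pmanager-queuedepth."""
    if not queuedepth:
        return "accept"  # undefined, NULL or 0: no limit is imposed
    if active_requests < queuedepth:
        return "accept"
    # Limit reached: prun blocks, or fails if its immediate option is set.
    return "fail" if immediate else "block"


print(admit_request(5, 0))
print(admit_request(20, 20))
print(admit_request(20, 20, immediate=True))
```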

To set the pmanager-queuedepth attribute, use rcontrol with the set command.

Specify attribute, give the attribute name and set the val argument to the required

value.

# rcontrol set attribute = pmanager-queuedepth val = 20

If you set a limit while the partition is running, you should also reload the partition to

make the limit take effect.

# rcontrol reload partition = par1

The attribute pmanager-idletimeout limits the amount of time an allocated resource

may remain idle. If a resource request exceeds this limit, it will time out with an exit

status of 125 and allocate will exit with the following message:

allocate: Error: idle timeout expired for resource allocation

If the attribute is undeﬁned or set to NULL, no limit is imposed. By default, it is not set.

To set a limit, use rcontrol with the set command. Specify attribute, give the

attribute name and set the val argument to the required timeout value in seconds.

# rcontrol set attribute = pmanager-idletimeout val = 5

If you set a time limit while the partition is running, you should also reload the partition

to make the limit take effect.

# rcontrol reload partition = par1

The attribute default-priority determines the default priority given to resource

requests. Priorities may range from 0 to 100. The default is 50.

To set the default-priority attribute, use rcontrol with the set command. Specify

attribute, give the attribute name and set the val argument to the required value.

# rcontrol set attribute = default-priority val = 75

The attribute grace-period speciﬁes the amount of time in seconds that jobs are given

to exit after they have exceeded their time limit and received a signal to quit. It may be

set to any value between 0 and 3600, the default being 60.


The attribute cpu-poll-stats-interval speciﬁes the interval between successive

polls for gathering node statistics. The interval is speciﬁed in seconds and must be in

the range 0 to 86400 (1 day).

The attribute rms-keep-core determines whether core ﬁles are deleted or saved. By

default, it is set to 1 so that core ﬁles are saved. Change this to 0 to delete core ﬁles. The

attribute local-corepath speciﬁes the directory in which core ﬁles are saved. By

default, it is set to /local/core/rms.
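The numeric ranges quoted above can be collected into a single check, for example in a script that prepares attribute settings before passing them to rcontrol. This Python sketch includes only the attributes for which this section states a range; the names CHECKED_RANGES and check_attribute are illustrative.

```python
# Inclusive ranges stated in this section
CHECKED_RANGES = {
    "default-priority": (0, 100),
    "grace-period": (0, 3600),
    "cpu-poll-stats-interval": (0, 86400),
}


def check_attribute(name, value):
    """Return True if value is acceptable for a range-checked attribute."""
    if name not in CHECKED_RANGES:
        return True  # no range is stated for this attribute
    low, high = CHECKED_RANGES[name]
    return isinstance(value, int) and low <= value <= high


print(check_attribute("grace-period", 60))
print(check_attribute("grace-period", 7200))
```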

EXAMPLES

The following command line creates a partition called par1 with eight nodes called

atlas1, atlas2 and so on in the conﬁguration called day.

# rcontrol create partition=par1 configuration=day nodes='atlas[1-8]'

The partition is started and stopped as follows:

# rcontrol start partition = par1

# rcontrol stop partition = par1

Stopping the partition in this way will leave the jobs running. Alternatives are to wait

for them to exit or to kill them.

# rcontrol stop partition = par1 option = wait

# rcontrol stop partition = par1 option = kill

If the system has several operating conﬁgurations, for example, one for the prime shift

(called day) and another for evening and weekends (called night) then the set of

partitions making up a conﬁguration can be started and stopped together:

# rcontrol stop configuration = day

# rcontrol start configuration = night

To suspend or resume the jobs running on a speciﬁed resource:

# rcontrol suspend resource = 2212

# rcontrol resume resource = 2212

To set the priority of a resource:

# rcontrol set resource = 2212 priority = 4

To kill the jobs running on some speciﬁed resources:


# rcontrol kill resource = 2212 2213

# rcontrol kill batchid = 44 45

To instruct a Partition Manager to reread the user, projects and access_controls

tables:

# rcontrol reload partition = par1

To enable debug reporting from the RMS scheduler for the partition called par1:

# rcontrol reload partition = par1 debug = 41


rinfo(1)

NAME

rinfo – Displays resource usage and availability information for parallel jobs

SYNOPSIS

rinfo [-chjlmnpqr] [-L [partition] [statistic]] [-s daemon

[hostname] | all] [-t node | name]

OPTIONS

-c    List the configuration names.
-h    Display the list of options.
-j    List current jobs.
-l    Give more detailed information.
-m    Show the machine name.
-n    Show the status of each node. This can be combined with -l.
-p    Identify each active partition by name and indicate the number of
      CPUs in each partition.
-q    Print information on the user's quotas and projects.
-r    Show the allocated resources.

-L [partition] [statistic]

Print the hostname of a lightly loaded node in the machine or the

speciﬁed partition. RMS provides a load-balancing service,

accessible through rmsexec, that enables users to run their processes

on lightly loaded nodes, where loading is evaluated according to a

given statistic. (See Page 5-39).

-s daemon [hostname] | all

Show the status of the daemon. Used with the argument all, rinfo

shows the status of all daemons running on the rmshost node. For

daemons that run on multiple nodes, such as rmsd, the optional

hostname argument speciﬁes the hostname of the node on which the

daemon is running.


-t node | name

Where node is the network identiﬁer of a node, rinfo translates it

into the hostname; where name is a hostname, rinfo translates it

into the network identiﬁer. See Section A.1 for more information on

network identiﬁers.

DESCRIPTION

The rinfo program displays information about resource usage and availability. Its

default output is in four parts that identify: the machine, the active conﬁguration,

resource requests and the current jobs. Note that the latter sections are only displayed if

jobs are active.

$ rinfo
MACHINE          CONFIGURATION
atlas            day

PARTITION        CPUS   STATUS      TIME       TIMELIMIT   NODES
root                                                       atlas[0-2]
parallel         2/4    running     01:02:29               atlas[0-1]

RESOURCE         CPUS   STATUS      TIME       USERNAME    NODES
parallel.996     2      allocated   00:05      user        atlas0

JOB              CPUS   STATUS      TIME       USERNAME    NODES
parallel.1115           running     00:04      user        atlas0

The machine section gives the name of the machine and the active conﬁguration.

For each partition in the active conﬁguration, rinfo shows the number of CPUs in use,

the total number of CPUs, the partition status, the time since the partition was started,

any CPU time limits imposed on jobs, and the node names. This information is extracted

from the partitions table. See Section 10.2.16. The description of the root partition

shows the resources of the whole machine.

The resource section identiﬁes the resource allocated to the user, the number of CPUs

that the resource includes, the user name, the node names and the status of the

resource. The time ﬁeld speciﬁes how long the resource has been held in hours, minutes

and seconds.

The job section gives the job identiﬁer, the user name, the number of CPUs the job is

using, on which nodes and the status of the job. The time ﬁeld speciﬁes how long the job

has been running in hours, minutes and seconds.
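Scripts that post-process rinfo output often need these times as plain seconds. The sketch below converts values such as 01:02:29 and 00:05, both of which appear in the TIME column above. It assumes that a two-part value means minutes and seconds, which this manual does not state explicitly.

```python
def time_to_seconds(text):
    """Convert 'hh:mm:ss' or 'mm:ss' (assumed) into a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        # Each colon moves the running total up one base-60 place.
        seconds = seconds * 60 + int(part)
    return seconds


print(time_to_seconds("01:02:29"))
print(time_to_seconds("00:05"))
```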


EXAMPLES

When used with the -q ﬂag, rinfo prints information on the user’s projects, CPU usage

limits, memory limits and priorities.

$ rinfo -q
PARTITION   CLASS     NAME        CPUS    MEMLIMIT   PRIORITY
parallel    project   default     0/8     100
parallel    project   divisionA   16/64   none       1

In this example, the access controls allow any user to run jobs on up to 8 CPUs with a

memory limit of 100MB. Jobs submitted for the divisionA project run at priority 1,

have no memory limit and can use up to 64 CPUs. 16 of these 64 CPUs are in use.

When used with the -s option, rinfo prints information on the status of the RMS

servers.

$ rinfo -l -s all
SERVER              HOSTNAME   STATUS    PID
tlogmgr             rmshost    running   239241
eventmgr            rmshost    running   239246
mmanager            rmshost    running   239260
swmgr               rmshost    running   239252
pmanager-parallel   rmshost    running   239175

$ rinfo -l -s rmsd
SERVER   HOSTNAME   STATUS    PID
rmsd     atlas0     running   740600
rmsd     atlas1     running   1054968
rmsd     atlas2     running   1580438
rmsd     atlas3     running   2143669
rmsd     atlasms    running   239212

In the above example, the system is functioning correctly. In the following example, one

of the nodes has crashed.

$ rinfo -l -s rmsd
SERVER   HOSTNAME   STATUS           PID
rmsd     atlas0     running          740600
rmsd     atlas1     running          1054968
rmsd     atlas2     not responding
rmsd     atlas3     running          2143669
rmsd     atlasms    running          239212
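A monitoring script could pick out failed nodes from this output automatically. The Python sketch below parses text in the four-column layout SERVER, HOSTNAME, STATUS, PID and lists the hosts whose daemon is not running. The layout and the sample text are assumptions based on the example above, and the function names are illustrative.

```python
def parse_server_status(output):
    """Parse 'rinfo -l -s ...' text into (server, hostname, status, pid) tuples."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] in ("SERVER", "$"):
            continue  # skip blank lines, the header and any echoed command
        server, hostname, rest = fields[0], fields[1], fields[2:]
        # The PID is the last column when present; a crashed node has none.
        pid = int(rest.pop()) if rest[-1].isdigit() else None
        rows.append((server, hostname, " ".join(rest), pid))
    return rows


def unresponsive_hosts(output):
    """Return the host names whose status is anything other than 'running'."""
    return [host for _, host, status, _ in parse_server_status(output)
            if status != "running"]


SAMPLE = """\
SERVER   HOSTNAME   STATUS           PID
rmsd     atlas0     running          740600
rmsd     atlas1     running          1054968
rmsd     atlas2     not responding
rmsd     atlas3     running          2143669
rmsd     atlasms    running          239212
"""
print(unresponsive_hosts(SAMPLE))
```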

SEE ALSO

allocate, prun


rmsbuild(1)

NAME

rmsbuild – Creates and populates an RMS database

SYNOPSIS

rmsbuild [-dhv] [-I list] [-m machine] [-n nodes | -N list]

[-p ports] [-t type]

OPTIONS

-d            Create a demonstration database.
-h            Display the list of options.
-I list       Specifies the names of any interactive nodes.
-m machine    Specifies a name for the machine.
-n nodes      Specifies the number of nodes in the machine.
-N list       Specifies the nodes in the machine by name.
-p ports      Specifies the number of ports on a terminal server (default 32).
-t type       Specifies the node type.
-v            Specifies verbose operation.

Nodes can be specified by number (-n) or by name (-N) but not both. Lists of node names

should be quoted, for example 'atlas[0-15]'.

DESCRIPTION

rmsbuild creates a database for a machine of a given size, adding default entries to the

nodes table and modules table. For detailed information on these tables see

Section 10.2.14 and Section 10.2.12 respectively.

rmsbuild is used during the initial installation of a machine. It should be run on the

rmshost node. rmsbuild runs rmstbladm to create a new database or update an

existing one. (See Page 5-44).


Detailed information about each node (number of CPUs, amount of memory and so on) is

added later by rmsd as it starts on each node.

The machine name is specified with the -m option. Machines should be given a short

name that does not end in a digit. Node names are generated by appending a number to

the machine name.

Database entries for the nodes are generated by the -n or -N options. Use -n with a

number to generate entries for nodes 0 through n-1. Use -N to generate entries for a

named list of nodes such as atlas[4-8].

Some systems include a management server. You should use the -I option to specify the

management server name and create a description of the management server in the

RMS database. To devise the management server name, append the letters ms to the

machine name; for example, atlasms.
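The naming rules above are simple enough to sketch in Python: node names are the machine name followed by 0 through n-1, and the management server name is the machine name followed by ms. The function names are illustrative and are not part of rmsbuild.

```python
def node_names(machine, count):
    """Return the node names that -n would generate: machine0 .. machine(count-1)."""
    if machine[-1:].isdigit():
        # A trailing digit would make generated names ambiguous.
        raise ValueError("machine name should not end in a digit")
    return [f"{machine}{index}" for index in range(count)]


def management_server_name(machine):
    """Return the conventional management server name for a machine."""
    return machine + "ms"


print(node_names("atlas", 4))
print(management_server_name("atlas"))
```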

rmsbuild is run after the system is installed, creating database entries for all installed

nodes. Additional entries can be added later if further nodes are installed.

If the demonstration mode is selected with the -d option, rmsbuild constructs the

entries for a demonstration database; that is to say, a database that does not necessarily

correspond to the physical resources of the system. Attributes of the nodes that would

normally be set by rmsd are set to representative values and a default partition is

created. The -d option is primarily for testing purposes but can be useful when

demonstrating RMS. When creating such a database, you should take care to give it a

different name from that of your system.

EXAMPLES

To create a description of a 64-node system called atlas with one management server,

use rmsbuild as follows:

# rmsbuild -m atlas -I 'atlasms' -N 'atlas[0-63]'

To create a machine description for a 128-node system called demo, use rmsbuild as

follows:

# rmsbuild -d -m demo -n 128

SEE ALSO

rmstbladm, msqladmin


rmsctl(1)

NAME

rmsctl – Stops, starts or shows the status of the RMS system.

SYNOPSIS

rmsctl [-aehv] [start | stop | restart | show]

OPTIONS

-a    Show all servers, when used with the show command.
-e    Only show errors, when used with the show command.
-h    Display the list of options.
-v    Verbose operation.

DESCRIPTION

The rmsctl script is used to start, stop or restart the RMS system on all nodes in a

machine, and to show status information.

rmsctl starts and stops RMS by executing the /sbin/init.d/rms script on each node.

Note that rsh must be enabled for root users in order for this to function correctly.

rmsctl start starts all of the partitions in the active conﬁguration and sets their

autostart ﬁelds in the servers table to 1. rmsctl stop stops all of the partitions and

sets the autostart ﬁelds to 0. (See Section 10.2.19).

This contrasts with the behavior of the /sbin/init.d/rms script, run from the

rmshost node, which preserves the current state of the active conﬁguration over a

stop/start cycle. (See Section 9.3.1).

When used with the command show, rmsctl shows the current status of the system.

EXAMPLES

To stop the RMS system, use rmsctl as follows:

# rmsctl stop

RMS service stopped on atlas1

RMS service stopped on atlas0

RMS service stopped on atlas3

RMS service stopped on atlas2

RMS service stopped on atlasms

To start the RMS system, use rmsctl as follows:

# rmsctl start

RMS service started on atlas0

RMS service started on atlas1

RMS service started on atlasms

RMS service started on atlas2

RMS service started on atlas3

pmanager-parallel: cpus=16 (4 per node) maxfree=4096MB swap=5171MB no memory limits

pstartup.OSF1: general partition parallel starting

pstartup.OSF1: enabling login on partition parallel

Enabling login on node atlas1.quadrics.com

Enabling login on node atlas3.quadrics.com

Enabling login on node atlas0.quadrics.com

Enabling login on node atlas2.quadrics.com

To show the status of the RMS system, use rmsctl as follows:

# rmsctl show

SERVER             HOSTNAME  STATUS   PID
tlogmgr            rmshost   running  778
eventmgr           rmshost   running  780
mmanager           rmshost   running  789
swmgr              rmshost   running  799
pmanager-parallel  rmshost   running  33357

STATUS   NODES
running  atlas[0-3] atlasms

CPUS  NODES
      atlas[0-3] atlasms

MEMORY  NODES
4096    atlas[0-3]
1024    atlasms

SWAP SPACE  NODES
5171        atlas[0-3] atlasms

TMP SPACE  NODES
6032       atlas[0-3]
5703       atlasms

SEE ALSO

rcontrol

rmsexec(1)

NAME

rmsexec – Runs a sequential program on a lightly loaded node

SYNOPSIS

rmsexec [-hv] [-p partition] [-s stat] [hostname] program [args ...]

OPTIONS

-h            Display the list of options.

-v            Specifies verbose operation.

-p partition  Specifies the target partition. The request will fail if load-balancing is not enabled on the partition. (See Section 10.2.16).

-s stat       Specifies the statistic on which to base the load-balancing calculation (see below).

DESCRIPTION

The rmsexec program provides a mechanism for running sequential programs on

lightly loaded nodes – nodes, for example, with free memory or low CPU usage. It locates

a suitable node and then runs the program on it.

The user can select a node from a speciﬁc partition (of type login or general) with the

-p option. Without the -p option, rmsexec uses the default load-balancing partition

(speciﬁed with the lbal-partition attribute in the attributes table). In addition,

the hostname of the node can be speciﬁed explicitly. The request will fail if this node is

not available to the user. System administrators may select any node.

The -s option can be used to specify a statistic on which to base the loading calculation.

Available statistics are:

usercpu   Percentage of CPU time spent in the user state.

syscpu    Percentage of CPU time spent in the system state - a measure of the I/O load on a node.

idlecpu   Percentage of CPU time spent in the idle state.

freemem   Free memory in megabytes.

users     Lowest number of users.

By default, usercpu is used as the statistic. Statistics can be used on their own, in

which case a node is chosen that is lightly loaded according to this statistic, or you can

specify a threshold using statistic < value or statistic > value.

EXAMPLES

Some examples follow:

$ rmsexec -s usercpu myprog

$ rmsexec -s "usercpu < 50" myprog

$ rmsexec -s "freemem > 256" myprog
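The following Python sketch illustrates how a statistic expression of this form might be interpreted. It is not the rmsexec implementation: the node data, the helper names parse_stat and pick_node, and the assumption that higher values of idlecpu and freemem (and lower values of the other statistics) indicate a lightly loaded node are all invented for illustration.

```python
import re

# Hypothetical per-node statistics; the names match the rmsexec statistics,
# the values are invented for illustration.
NODES = {
    "atlas0": {"usercpu": 80.0, "freemem": 128.0},
    "atlas1": {"usercpu": 35.0, "freemem": 512.0},
    "atlas2": {"usercpu": 10.0, "freemem": 300.0},
}

# Assumption: for these statistics a larger value means a less loaded node.
HIGHER_IS_BETTER = {"idlecpu", "freemem"}


def parse_stat(spec):
    """Split 'usercpu < 50' into (statistic, operator, value).

    A bare statistic name yields (name, None, None).
    """
    match = re.fullmatch(r"\s*(\w+)\s*(?:([<>])\s*(\d+(?:\.\d+)?))?\s*", spec)
    if match is None:
        raise ValueError(f"bad statistic expression: {spec!r}")
    name, op, value = match.groups()
    return name, op, float(value) if value is not None else None


def pick_node(spec, nodes=NODES):
    """Return the most lightly loaded node satisfying the expression, or None."""
    name, op, value = parse_stat(spec)
    candidates = {
        host: stats[name]
        for host, stats in nodes.items()
        if op is None
        or (op == "<" and stats[name] < value)
        or (op == ">" and stats[name] > value)
    }
    if not candidates:
        return None
    chooser = max if name in HIGHER_IS_BETTER else min
    return chooser(candidates, key=candidates.get)
```

With the invented data above, pick_node("usercpu < 50") considers only atlas1 and atlas2 and chooses the one with the lower user CPU percentage.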

SEE ALSO

rinfo

5-40 RMS Commands

Download from Www.Somanuals.com. All Manuals Search And Download.

rmshost(1)

NAME

rmshost – Prints the name of the node running the RMS management daemons

SYNOPSIS

rmshost [-hl]

OPTIONS

-h    Display the list of options.

-l    Prints the fully qualified domain name.

DESCRIPTION

The rmshost command prints the name of the node that is running (or should run) the

RMS management daemons. It is used by the RMS system.

rmsquery(1)

NAME

rmsquery – Submits SQL queries to the RMS database

SYNOPSIS

rmsquery [-huv] [-d name] [-m machine] [SQLquery]

OPTIONS

-d name     Select database by name.

-h          Display the list of options.

-m machine  Select database by machine name.

-u          Print dates as seconds since January 1st 1970. The default is to print dates as a string created with localtime(3).

-v          Verbosely prints field names above each column of output.

DESCRIPTION

rmsquery is used to submit SQL queries to the RMS database. Users are restricted to

using the select statement to extract information from the database. System

administrators may also submit SQL statements that update the database: create,

delete, drop, insert and update. Note that queries modifying the database are

logged.

When used without arguments, rmsquery operates interactively and a sequence of

commands can be issued.

When used interactively, rmsquery supports GNU readline and history mechanisms.

Type history to see recent commands; use Ctrl/p and Ctrl/n to step back and

forward through them. The tables command lists the tables in the selected database.

The command fields followed by the name of a table lists the ﬁelds in a table. The

command verbose toggles printing of ﬁeld names. To quit interactive mode, type

Ctrl/d or exit or quit.

rmsquery is distributed under the terms of the GNU General Public License. See

http://www.gnu.org for details and more information on GNU readline and history.

The source is provided in /usr/opt/rms/src. Details of the SQL language can be

found on the Quadrics support page http://www.quadrics.com/web/support.

EXAMPLES

An example follows of a select statement that results in a list of the names of all of the

nodes in the machine. Note that the query must be quoted. This is because rmsquery

expects a single argument.

$ rmsquery "select name from nodes"

atlas0

atlas1

atlas2

atlas3

In the following example, rmsquery is used to print information on all jobs run by a

user:

$ rmsquery "select name,status,hostnames,cpus,startTime,endTime from \
resources where username='user'"

finished  atlas[0-3]  12/21/99 11:16:44  12/21/99 11:16:46
finished  atlas0      12/21/99 11:54:23  12/21/99 11:54:29
finished  atlas[0-3]  12/21/99 11:54:35  12/21/99 11:54:39

The -v option prints ﬁeld names. In the following example, rmsquery is used to print

resource usage statistics:

$ rmsquery -v "select * from acctstats"

name  uid   project  started            etime  atime  utime  stime ...
-----------------------------------------------------------------------
      1507           12/21/99 11:16:44  2.00   8.00   0.10   0.22 ...
      1507           12/21/99 11:54:23  6.65   13.30  10.62  0.10 ...
      1507           12/21/99 11:54:35  4.27   16.63  12.28  0.44 ...

When used without arguments, rmsquery operates interactively and a sequence of

commands can be issued.

$ rmsquery -v

sql> select name, status from partitions
name      status
------------------
parallel  running
sql>

rmstbladm(1)

NAME

rmstbladm – Database administration

SYNOPSIS

rmstbladm [-BcdDfhmuv] [-r file] [-t table] [machine]

OPTIONS

-B        Dump the first five rows of each table to stdout as a sequence of SQL statements. A specific table can be dumped if the -t option is used.

-c        Clean out old entries from the node statistics (node_stats) table, the resources table, the events table and the jobs table. (See Chapter 10 (The RMS Database).) rmstbladm uses the data-lifetime and stats-lifetime attributes, specified in the attributes table, to determine how many entries are to be removed. The default is to keep statistics for 24 hours and job descriptions for 48 hours.

-d        Dump the contents of the database to stdout as a sequence of SQL statements. A specific table can be dumped if the -t option is used.

-D        Dump the contents of the database to stdout as plain text. A specific table can be dumped if the -t option is used.

-f        Recreate the database from scratch. A specific table can be recreated if the -t option is used.

-h        Displays the list of options.

-m        Displays the names of machines in the RMS databases managed by the msqld server.

-u        By default, rmstbladm checks the consistency of the database. If the -u flag is specified, the database is updated to the current revision level. A specific table can be updated if the -t option is used.

-v        Specifies verbose operation.

-r file   Restore database tables from the named file.

-t table  Specifies a single table to be worked on.

DESCRIPTION

The command rmstbladm is used to administer the RMS database. It creates the tables

and their default entries. It can be used to back up individual tables (or the whole

database) to a text file, to restore tables from a file or to force the recreation of tables.

Unless a speciﬁc machine is speciﬁed, rmstbladm operates on the database of the host

machine.

When installing or upgrading a system, rmstbladm is used to check the consistency of

the database, to change the structure of the tables and to add default entries. Once the

system is installed and working correctly, the database should be backed up using

rmstbladm with the -d option. The backup should be kept safely so that the database

can be restored later should this prove necessary.

Certain tables in the RMS database (the resources, jobs, events, acctstats and

node_stats tables in particular) grow over time and as each job is run. To remove old

entries from the database, use rmstbladm with the -c option. Note that this does not

remove entries from the accounting statistics table. These should be removed once the

accounting data has been processed. (See Section 9.4.5).

Access to rmstbladm options that update the database is restricted to administrative

users.

EXAMPLES

To back up the contents of the RMS database or a selected table to a text file as a

sequence of SQL statements:

$ rmstbladm -d > backup_full.sql

$ rmstbladm -d -t nodes > backup_nodes.sql

To update the database on installing a new version of RMS:

$ rmstbladm -u

Access Control, Usage Limits and Accounting

6.1 Introduction

RMS access controls and usage limits operate on a per-user or per-project basis (a project

is a list of named users). Each partition may have its own controls. This mechanism

allows system administrators to control the way in which the resources of a machine are

allocated amongst the user community.

RMS accounts for resource usage by user and by project. As each request is allocated

CPUs, an accounting record is created, containing the uid of the user, the project name,

the resource identiﬁer and information on resource usage (see Section 6.5). This record

is updated periodically while the CPUs remain allocated.

6.2 Users and Projects

When a system is ﬁrst installed, there is only one project, called the default project. All

users are members of this project and anyone who has logged into the system can

request all of the CPUs. This simple setup is intended for a single class of cooperating

users.

To account for resource usage by user or by project, the administrator must create

additional user and project records in the RMS database. To control the resource usage

of individuals or groups of users, the administrator must, in addition, create access

control records.

When submitting requests for CPUs, users can select any project of which they are a

member (by setting the RMS_PROJECT environment variable or by using the -P ﬂag

when executing prun or allocate). RMS rejects requests to use projects that do not

exist or requests to use projects of which the user is not a member. Users without an

RMS user record are subject to the constraints on the default project.

In general, each user is a member of several projects, while projects may have many

users. Membership of a project is speciﬁed in the users table with the projects ﬁeld

(see Section 10.2.24). The value of projects may be either a single name or a list of

project names separated by commas or spaces. The wildcard character, *, may be entered

as a project name, denoting that the user is a member of all projects. The ordering of the

names in the list is signiﬁcant: the ﬁrst project speciﬁed becomes the user’s default

project.
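As an illustration of these rules, the following Python sketch interprets the value of a projects field. The function names are hypothetical and the code is not part of RMS; it only mirrors the behavior described above (comma or space separators, the * wildcard, and the first entry acting as the default project).

```python
import re


def parse_projects(field):
    """Split a projects field on commas and/or whitespace, preserving order."""
    return [name for name in re.split(r"[,\s]+", field.strip()) if name]


def default_project(field):
    """The first project listed is the user's default project."""
    names = parse_projects(field)
    return names[0] if names else None


def is_member(field, project):
    """A '*' entry makes the user a member of every project."""
    names = parse_projects(field)
    return "*" in names or project in names
```

For example, a field containing "design, test" gives design as the default project, and a field containing only * passes the membership test for any project name.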

User and project records are created by the system administrator and stored in the

users and projects tables (see Section 10.2.24 and Section 10.2.17).

6.3 Access Controls

Access control records specify the maximum resource usage of a user or project on a

given partition. They are created by the system administrator using rcontrol or

rmsquery and stored in the access_controls table (see Section 10.2.1).

Each entry speciﬁes the following attributes:

name       The name of the user or project.

class      Whether the entry refers to a user or a project.

partition  The partition to which the access control applies.

priority   The default priority of requests submitted by this user or project. Priorities range from 0, the lowest priority, to 100. The default is 50.

maxcpus    The total number of CPUs that this user or project can have allocated at any time.

memlimit   The maximum amount of memory in megabytes per CPU that can be allocated.

A suspended request does not count against a user’s or project’s maximum number of

CPUs. However, when the request is resumed, a usage check is performed and the

request is blocked if starting it would take the user or project over their usage limit.

The access controls for individual users must set lower limits than those of the projects

of which they are a member. That is to say, they must have a lower priority, smaller

number of CPUs, smaller memory limit and so on than the access control record for the

project. Where a memory limit exists for a user or project, it takes precedence over any

default limit set on the partition (see Section 10.2.16).

When the system is installed, there are no access control records. If there is no default

access control record in the database when a Partition Manager starts, it creates one

using information from the partition. The memory limit is set to that of the partition,

the priority is 0 and the CPU usage limit is equal to the number of CPUs in the partition.

If the partition has no memory limit then all jobs run with memory limits disabled until

access control records are created.

6.3.1 Access Controls Example

To illustrate how the RMS access controls mechanism works, we consider an example in

which a system is primarily intended for use by Jim, Mary and John, members of the

project called design. When they are not using the system, anyone else can submit small

jobs.

First, create a project record for design:

rcontrol create project = design description = "System Design Team"

name    id  description
design      System Design Team

Now create user records for Jim, Mary and John:

rcontrol create user = jim project = design

rcontrol create user = mary project = design

rcontrol create user = john project = design

name  projects
jim   design
mary  design
john  design

Now create access controls for the design project and for the default project (all other

users):

rcontrol create access_control = design class = project partition = \
parallel priority = 5

rcontrol create access_control = default class = project partition = \
parallel priority = 0 memlimit = 256

name     class    partition  priority  maxcpus  memlimit
design   project  parallel   5         Null     Null
default  project  parallel   0         Null     256

Requests submitted by Jim, Mary and John run at priority 5, causing other users’ jobs to

be suspended if running. These requests are not subject to CPU or memory limits.

Requests submitted by other users run at priority 0 and are subject to a memory limit of

256MB per CPU. Note that on a system with 4 CPUs and 4GB of memory per node, it

would be necessary for each node to have at least 5GB of swap space to ensure that jobs

submitted by the design group were not blocked by other users (see Section 7.4.2 for

details).

In this example, we have not set the maxcpus limit as we do not mind how many CPUs

the users allocate. This limit could be set if there were two groups of users of equal

priority and you wanted to bound the number of CPUs that each could allocate.

6.4 How Access Controls are Applied

The rules governing memory limits, priority values and CPU usage limits are described

in more detail in the following sections.

6.4.1 Memory Limit Rules

Memory limits for a resource request are derived by applying the following rules in

sequence until an access control record with a memory limit is found.

1. The root user has no memory limits.

2. If the user has an access control record for the partition, the memory limit in the

access control record applies.

3. The access control record for the user’s current project determines the memory limit.

4. The access control record for the default project determines the memory limit.

Having selected an access control record, the memory limit for the program is set by the

value of its memlimit ﬁeld. A null value disables memory limits. Other values are

interpreted as the memory limit in megabytes for each CPU. A process with one CPU

allocated has its memory limits set to this value. A process with more than one CPU

allocated has proportionately higher memory limits.

The RMS_MEMLIMIT environment variable can be used to reduce the memory limit set by

the system, but not to raise it.
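The rules above can be sketched in Python as follows. This is an illustrative model, not RMS code: the record layout, the function names and the use of None for a null value are assumptions. It also assumes that the first matching access control record is selected and that a null memlimit in that record disables limits, which is the reading consistent with the example in Section 6.3.1, and that the limit scales linearly with the number of CPUs allocated.

```python
ROOT = "root"

# Hypothetical access control records keyed by (class, name, partition);
# memlimit is in megabytes per CPU, None stands for a null value.
RECORDS = {
    ("project", "design", "parallel"): {"memlimit": None},
    ("project", "default", "parallel"): {"memlimit": 256},
}


def select_record(user, project, partition, records=RECORDS):
    """Return the first matching record: user, current project, default project."""
    for key in (("user", user, partition),
                ("project", project, partition),
                ("project", "default", partition)):
        if key in records:
            return records[key]
    return None


def memory_limit_mb(user, project, partition, cpus,
                    requested_per_cpu=None, records=RECORDS):
    """Memory limit in MB for a request, or None when limits are disabled."""
    if user == ROOT:
        return None                      # rule 1: root has no memory limits
    record = select_record(user, project, partition, records)
    if record is None or record["memlimit"] is None:
        return None                      # a null memlimit disables limits
    per_cpu = record["memlimit"]
    if requested_per_cpu is not None:    # RMS_MEMLIMIT may lower, never raise
        per_cpu = min(per_cpu, requested_per_cpu)
    return per_cpu * cpus                # assumed linear scaling with CPUs
```

With the records from Section 6.3.1, a member of design gets no limit, while a user in the default project requesting 4 CPUs gets 4 x 256 = 1024 MB.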

By default, the memory limit is capped at the smaller of the following two amounts,

taking the minimum over all nodes in the partition:

1. The amount of memory on the node.

2. The amount of swap space.

If lazy swap allocation is enabled (see Section 7.4.2), the memory limit is instead capped at

the smallest amount of memory on any node in the partition.
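A minimal Python sketch of this cap follows. The function name and the representation of a node as a (memory, swap) pair in megabytes are assumptions made for the example; the sketch computes the per-node figure only and does not model how RMS applies it to individual requests.

```python
def memory_cap_mb(nodes, lazy_swap=False):
    """Cap on the memory limit for a partition.

    nodes is a list of (memory_mb, swap_mb) pairs, one per node.
    """
    if lazy_swap:
        # Lazy swap allocation: only physical memory bounds the limit.
        return min(mem for mem, _swap in nodes)
    # Default: each node contributes the smaller of its memory and swap.
    return min(min(mem, swap) for mem, swap in nodes)
```

For instance, a partition with one node of 4096 MB memory and 5171 MB swap and a second (hypothetical) node of 4096 MB memory and 3000 MB swap is capped at 3000 MB by default and at 4096 MB with lazy swap allocation.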

6.4.2 Priority Rules

The priority of a resource request is derived by applying the following rules in sequence

until an access control record with a priority is found.

1. The root user has priority over all other users.

2. If the user has an access control record for the partition then this record determines

the priority.

3. The access control record for the user’s current project determines the priority.

4. The access control record for the default project determines the priority.

Having selected an access control record, the priority of the resource request is set by

the value of its priority ﬁeld. A null value sets the priority to 50, the default. Higher

priority jobs are scheduled ﬁrst. The user can instruct rcontrol to lower the initial

priority but not to raise it. An administrator can raise or lower priorities.
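The handling of the priority field can be sketched as below. The function name is hypothetical, and clamping a non-administrator's request to the record's priority (rather than rejecting it) is an assumption; the text only states that users may lower but not raise the initial priority.

```python
DEFAULT_PRIORITY = 50


def initial_priority(record_priority, requested=None, is_admin=False):
    """Priority of a request given the selected access control record.

    record_priority is None for a null priority field.
    """
    base = DEFAULT_PRIORITY if record_priority is None else record_priority
    if requested is None:
        return base
    if is_admin:
        return requested            # administrators may raise or lower
    return min(base, requested)     # users may only lower (assumed clamp)
```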

6.4.3 CPU Usage Limit Rules

RMS keeps track of the number of CPUs in use by each user and each project. A request

to allocate additional CPUs is blocked if it would cause the usage limit for the user or the

usage limit for the user’s current project to be exceeded. The request remains blocked

until the user or other users in the user’s current project free enough CPUs to allow the

request to be granted. The CPUs can be freed either because the resources are

deallocated or because the user suspends the resource using rcontrol.

The CPU usage limit is derived by applying the following rules in sequence until an

access control record with a CPU usage limit is found.

1. No CPU usage limits are set on jobs run by the root user.

2. If the user has an access control record for the partition, the CPU usage limit is

determined by the maxcpus ﬁeld in this record.

3. The access control record for the user’s current project determines the CPU usage

limit.

4. The access control record for the default project determines the CPU usage limit.

CPU usage limits can be set to a higher value than the actual number of CPUs available

in the partition. This is useful if gang scheduling and time slicing are in operation on

the partition. For example, if a partition has 16 CPUs and the usage limit for a given

user is 32 then RMS will allow two 16 CPU jobs to run (see Section 7.4.3 for details).
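The usage check in this example can be expressed as a short Python sketch. The function names are hypothetical and None is used here to represent an unset maxcpus field; this models only the counting rule, not the scheduler.

```python
def usage_check(in_use, requested, maxcpus):
    """Return True if the request may proceed, False if it must be blocked."""
    if maxcpus is None:
        return True
    return in_use + requested <= maxcpus


def admit(jobs, maxcpus):
    """Walk a list of CPU counts and return those allowed to hold CPUs at once."""
    in_use, admitted = 0, []
    for cpus in jobs:
        if usage_check(in_use, cpus, maxcpus):
            in_use += cpus
            admitted.append(cpus)
    return admitted
```

With a usage limit of 32, admit([16, 16, 16], 32) lets the first two 16 CPU jobs through and blocks the third, matching the example above.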

6.5 Accounting

As each request is allocated CPUs, an entry is added to the accounting statistics

(acctstats) table (see Section 10.2.2) specifying the following details about the job:

name      Resource name (see Section 10.2.18).

uid       Identifier of the user.

project   Name of the user's current project.

started   Time at which resources were allocated (UTC).

etime     Elapsed time (in seconds) since CPUs were allocated.

atime     Time (in CPU seconds) for which CPUs have been allocated. Note that atime stops ticking while a request is suspended.

utime     Time (in seconds) for which processes were executing in user state.

stime     Time (in seconds) for which processes were executing in system state.

cpus      Number of CPUs allocated.

mem       Maximum memory extent of the program in megabytes.

pageflts  Number of page faults requiring I/O summed over processes.

memint    Memory integral for the program in megabyte hours.

running   Set to show that the CPUs are in use.

Accounting records are updated periodically until the CPUs are deallocated. The

running ﬂag is set to 0 at this point.

The atime statistic is summed over all CPUs allocated to the resource request. The

utime and stime statistics are accumulated over all processes in all jobs running on the

allocated CPUs.

Note

The memint statistic is not implemented in the current release. All values

for this field are 0.

RMS Scheduling

7.1 Introduction

The Partition Manager (see Section 4.4) is responsible for scheduling resource requests

and enforcing usage limits. This chapter describes the RMS scheduling policies and

explains how the Partition Manager responds to resource requests.

7.2 Scheduling Policies

The scheduling policy in use on a partition is controlled by the type attribute of the

partition. The type attribute can take one of four values:

login

Normal UNIX time-sharing applies. This scheduling policy is used for

partitions that do not run parallel programs, such as interactive login

partitions.

In addition, RMS supports load-balanced sequential processing,

whereby users can request to have sequential programs executed on a

lightly loaded node. Load balancing is enabled on a per-partition basis

by an entry in the partitions table (see Section 10.2.16). rmsexec

(see Page 5-39) can be used to run a program with load balancing.

parallel

A gang scheduling policy is used. This is for partitions intended for

production runs of parallel programs. With gang scheduling, the

scheduling decisions apply to all processes in a parallel program

together. That is to say, all of the processes in a program are either

running or suspended at the same time.

Gang scheduling is required for tightly coupled parallel programs

which communicate frequently. It becomes increasingly important as

the rate of interprocess communication increases. For example, if a

program is executing a barrier synchronization, all processes must be

scheduled before the barrier completes.

Effective scheduling of parallel programs requires that user access

through commands such as rsh, rlogin and telnet is disabled.

This is carried out by the partition startup script (see Section 4.4).

general

The scheduling policy supports UNIX time-sharing with load

balancing and gang scheduling. It is appropriate for a login partition

that is used for developing and debugging parallel programs.

batch

The scheduling policy is determined by a batch system. It is

appropriate for partitions that are for the exclusive use of a batch

system. The batch system may run sequential or parallel programs as

it wishes but interactive use is prohibited.

7.3 Scheduling Constraints

The scheduling decisions made while gang scheduling are controlled by a number of

parameters. These parameters can be speciﬁed for individual users and for projects

(groups of users) in the access_controls table (see Section 10.2.1). Restrictions on the

partition itself are speciﬁed in the partitions table (see Section 10.2.16). The

parameters are as follows:

Priority

Each resource request is assigned a priority taken from the priority ﬁeld of the

access_controls table. The Partition Manager schedules resource requests in order

of priority. Where a number of requests are queued with the same priority, they are

scheduled by order of submission time. The submission into the queue of a high priority

request may cause existing low priority jobs to be suspended. Changing the priority of a

request requires administrator privileges.

Maximum Number of CPUs

An upper limit can be set on the number of CPUs that may be allocated to a user or

project at any point in time. Requests that take the usage count for the user or project

above this limit are blocked. Requests for more CPUs than the limit on a user or project

are rejected.

Time Limit

Jobs are normally run to completion or until they are preempted by a higher priority

request. Each partition may have a time limit associated with it which restricts the

amount of time the Partition Manager may allow for a parallel job. On expiry of this

time limit, the job is sent a SIGXCPU signal. A period of grace is allowed following this

signal for the job to clean up and exit. After this period, the job is killed and the resource

deallocated. The duration of the grace period is speciﬁed in the attributes table (see

Section 10.2.3) and can be set using rcontrol.
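A job can use the grace period by catching SIGXCPU. The following Python sketch shows one way a program might do this on a UNIX system; the names on_time_limit and run are hypothetical and the "work" is a placeholder. It relies only on the standard signal module.

```python
import signal

cleanup_requested = False


def on_time_limit(signum, frame):
    """Record that the time limit expired so the main loop can clean up."""
    global cleanup_requested
    cleanup_requested = True


# Install the handler; without one, the default action for SIGXCPU
# terminates the process.
signal.signal(signal.SIGXCPU, on_time_limit)


def run(steps):
    """Run up to `steps` units of work, stopping once SIGXCPU has arrived."""
    done = 0
    for _ in range(steps):
        if cleanup_requested:
            break          # flush files, write a checkpoint, then exit
        done += 1
    return done
```

The handler only sets a flag; the cleanup itself happens in the main loop, which keeps the handler short and leaves the job time to exit before the grace period ends.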

Memory Size

The Partition Manager can enforce memory limits that restrict the size of a job. The

default memory limits are designed to prevent memory starvation (a node having free

CPUs but no memory) and to control whether parallel jobs page or not.

7.4 What Happens When a Request is Received

A user’s request for resources, made through the RMS commands prun or allocate,

speciﬁes the following parameters:

cpus        The total number of CPUs to be allocated.

nodes       The number of nodes across which the CPUs are to be allocated. This parameter is optional.

base node   The identifier of the first node to be allocated. This parameter is optional.

hwbcast     A contiguous range of nodes. This parameter is optional. When a contiguous range of nodes is allocated to a job, messages can be broadcast in hardware. This offers advantages of speed over a software implementation if the job makes use of broadcast operations.

memory      The amount of memory required per CPU. This parameter is optional (set through the environment variable RMS_MEMLIMIT) but jobs with low memory requirements may be scheduled sooner if they make these requirements explicit.

time limit  The length of time for which the CPUs are required. This parameter is optional (set through the environment variable RMS_TIMELIMIT).

samecpus    The same set of CPUs on each node. This parameter is optional.

immediate   The request should fail rather than block if resources are not available immediately.

Note

The RMS scheduler attempts to allocate CPUs on a contiguous range of nodes. If

a contiguous range of nodes is not available then requests that explicitly specify

a contiguous range with the hwbcast parameter will block if the requested

CPUs cannot be allocated.

When the Partition Manager receives a request, it ﬁrst checks to see if the partition has

sufﬁcient resources. If the resources are available, the next check is on the resource

limits applied to the user and the project. If these checks fail, the request is rejected.

If the checks succeed, the scheduler attempts to allocate CPUs from those that are

currently free. If sufﬁcient CPUs are free but allocating them would exceed the user’s

CPU usage limit, the request is marked as blocked (or, if the immediate parameter is

set, the request fails). If CPUs can be allocated, the resource request is marked as

allocated and job(s) may use the CPUs. If the request cannot be met, it is added to the

list of active requests and marked as queued. The scheduler then re-evaluates the

allocation of CPUs to all of the requests in the list.
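The sequence of checks described above can be summarized in a Python sketch. The function and its parameters are hypothetical and reduce each check to a CPU count; the real Partition Manager also considers memory, node ranges and other constraints. Returning "failed" for an immediate request that cannot be allocated at once follows the description of the immediate parameter.

```python
def classify_request(cpus, partition_cpus, free_cpus, in_use,
                     maxcpus=None, immediate=False):
    """Return the state a new request would be given.

    cpus           CPUs requested
    partition_cpus total CPUs in the partition
    free_cpus      CPUs currently unallocated
    in_use         CPUs already held by the user or project
    maxcpus        usage limit, or None if unset
    """
    # Checks against the partition and the resource limits: failure rejects.
    if cpus > partition_cpus or (maxcpus is not None and cpus > maxcpus):
        return "rejected"
    over_limit = maxcpus is not None and in_use + cpus > maxcpus
    if cpus <= free_cpus:
        if over_limit:
            return "failed" if immediate else "blocked"
        return "allocated"
    # Not enough free CPUs: wait in the list of active requests.
    return "failed" if immediate else "queued"
```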

The list of resource requests is sorted in priority order. Requests of the same priority are

sorted by submission time. When evaluating the list, the scheduler works down the

requests trying to allocate CPUs to them. The highest priority request is allocated CPUs

ﬁrst except when doing so would cause the system to run out of swap space (see

Section 7.4.2).
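The ordering of the request list can be illustrated with a short Python sketch. The Request class and its field names are assumptions made for the example; only the sort order (highest priority first, then earliest submission) comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    priority: int      # 0 (lowest) to 100
    submitted: float   # submission time, e.g. seconds since the epoch


def schedule_order(requests):
    """Sort by descending priority, then by ascending submission time."""
    return sorted(requests, key=lambda r: (-r.priority, r.submitted))
```

For example, a priority 5 request submitted last is still placed ahead of two earlier priority 0 requests, which keep their submission order relative to each other.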

In considering each request, the scheduler ﬁrst looks at whether it has already been

allocated CPUs (a bound request). CPUs remain allocated to a request unless they are

preempted by a higher priority request, in which case the request of lower priority is

suspended together with any jobs that were running on it. If the request is not yet

bound then CPUs are allocated, if sufﬁcient are free.

The list of requests is re-evaluated when free CPUs cannot be found for a new request,

when an existing request completes or on the expiry of the time-slice period (see

Section 7.4.3).

Consider what happens when a high priority request is submitted to a partition that is

already running jobs. If there are sufﬁcient CPUs free (matching the constraints of the

request) then the job(s) start. If there are not enough free CPUs, the list of requests is

re-evaluated. CPUs are allocated to the high priority request and its job(s) are allowed to

start. The jobs of the lower priority requests, whose CPUs were taken for the high

priority request, are suspended. Any of the low priority jobs for which CPUs are

available continue.

7.4.1 Memory Limits

If memory limits are enabled (by setting the memlimit attribute of a partition or access

control) then a request is only allocated CPUs on nodes that have sufﬁcient memory

available. RMS enforces memory limits by setting the data and stack size limits on a

process. If the process exceeds the allowed size, it is killed (and the parallel program

terminated).

Users whose programs consume a known amount of memory each time they run can set

their own memory limit with the environment variable RMS_MEMLIMIT. Setting this

variable (especially if the new value is much smaller than their default memory limit)

may cause their jobs to be scheduled sooner than would otherwise be the case. Users

cannot raise their memory limits above the level set by the system administrator. They

can only lower them.
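The rule that users can lower but never raise their limit can be sketched as follows. This is an illustration only: effective_memlimit is a hypothetical helper, not an RMS command, and the use of megabytes is an assumption, since this section does not state the unit that RMS_MEMLIMIT expects.

```shell
#!/bin/sh
# Sketch only: effective_memlimit is a hypothetical helper, not part of RMS.
# Both arguments must use the same unit (megabytes are assumed here).
effective_memlimit() {
    requested=$1      # value the user asks for, e.g. taken from RMS_MEMLIMIT
    admin_limit=$2    # limit set by the system administrator
    if [ "$requested" -lt "$admin_limit" ]; then
        echo "$requested"      # lowering the limit is allowed
    else
        echo "$admin_limit"    # an attempt to raise it is capped
    fi
}
```

For example, a request for 256 against an administrator limit of 1024 yields 256, whereas a request for 2048 is capped at 1024.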

The default memory limit is calculated by dividing the memory available by the number

of CPUs per node. For example, if a node has 4GB of memory and 4 CPUs then each CPU

that is allocated comes with 1GB of memory. Larger memory limits can be set but this

risks having CPUs idle through memory starvation, unless there is a plentiful supply of

jobs requesting small amounts of memory.
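The default calculation described above is a simple division. The following sketch reproduces the 4GB, 4 CPU example; default_memlimit_mb is a hypothetical name and the values are given in megabytes for convenience.

```shell
#!/bin/sh
# Sketch only: default_memlimit_mb is a hypothetical helper, not part of RMS.
# Default limit = memory available on the node / number of CPUs per node.
default_memlimit_mb() {
    node_mem_mb=$1    # memory available on the node, in MB
    cpus=$2           # number of CPUs per node
    echo $((node_mem_mb / cpus))
}

# A node with 4GB (4096MB) and 4 CPUs gives each allocated CPU 1024MB.
default_memlimit_mb 4096 4
```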

If memory limits are enabled, the RMS scheduler keeps track of the maximum memory

usage per node. The ratio of memory limit to memory size determines how many

requests can be present (allocated or suspended) before jobs start to page.

7.4.2 Swap Space

By default, the operating system reserves swap space as a program allocates memory.

Hence, a process requiring 1GB of memory must also have 1GB of swap space. If

memory limits are enabled, RMS does not allocate CPUs to new requests if the addition

of their maximum memory usage to that already allocated would cause the total for the

node to exceed the swap space available.
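The admission test described in this paragraph amounts to one comparison per node. A minimal sketch, assuming all quantities are in megabytes; swap_admits is a hypothetical name and is not how RMS exposes this check.

```shell
#!/bin/sh
# Sketch only: swap_admits is a hypothetical helper, not part of RMS.
# Succeeds (exit status 0) if a new request still fits within swap space.
swap_admits() {
    allocated_mb=$1   # maximum memory usage already allocated on the node
    request_mb=$2     # maximum memory usage of the new request
    swap_mb=$3        # swap space available on the node
    [ $((allocated_mb + request_mb)) -le "$swap_mb" ]
}
```

With 8192MB of swap, a 1024MB request is admitted when 6144MB is already allocated, but not when 7680MB is already allocated.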

Each node normally has signiﬁcantly more swap space than memory. The ratio of

memory limit to swap space determines how many requests (allocated or suspended) can

be present on each node.

Tru64 UNIX supports a lazy swap allocation policy in which swap space is only

allocated when required. If this policy is enabled then RMS uses the total memory

available on the node to limit the size and number of jobs run. This enables large

memory jobs to run on nodes with relatively little swap space.

Warning

If lazy swap allocation is enabled, there must be sufﬁcient swap space for the

UNIX daemons and any other processes running on such nodes. Without this,

processes (including those belonging to the system) will be killed if the system

runs out of swap space.

7.4.3 Time Slicing

Time slicing is enabled on a partition by setting its timeslice attribute; values of

15–120 seconds are recommended. If a timeslice is set, the Partition Manager

evaluates the list of requests periodically. The list of requests is still sorted by priority

but requests of the same priority are sorted on the number of time slices since they were

last scheduled (rather than the submission time). When the system has requests for

more CPUs than are available, the scheduler suspends requests at the end of each time

slice so that others can use the CPUs.

When setting up a system for time slicing, it is important to set memory limits that

ensure that all jobs remain resident in memory. System performance will be poor if time

slicing between large jobs causes paging. The ratio of memory limit to memory size

controls how many requests can progress concurrently. For example, on a node with 4

CPUs and 4GB of memory, setting a memory limit of 512MB will allow two jobs to be

time sliced without paging.
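The arithmetic behind this example can be written out as follows. The sketch assumes that the memory limit applies per CPU and that each time-sliced job uses every CPU on the node; resident_jobs is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: resident_jobs is a hypothetical helper, not part of RMS.
# Assumes a per-CPU memory limit and jobs that each use all CPUs on the node.
resident_jobs() {
    node_mem_mb=$1    # memory on the node, in MB
    cpus=$2           # CPUs per node
    memlimit_mb=$3    # memory limit per CPU, in MB
    echo $((node_mem_mb / (cpus * memlimit_mb)))
}

# 4096MB / (4 CPUs * 512MB) = 2 jobs resident at once
resident_jobs 4096 4 512
```

With the default limit of 1024MB per CPU, the same node holds only one such job in memory, so time slicing between two of them would cause paging.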

The scheduler processes resource requests as it receives them. It tries to ﬁt new

requests to free CPUs. If no CPUs are available the request blocks, at least until the next

time slice.

7.4.4 Suspend and Resume

The allocation of CPUs to a request can be suspended using rcontrol. Doing this

reduces the CPU usage counts for the user and project, enabling other jobs to start.

Either the user or the administrator can resume the allocation at a later time. When a
request is resumed, its CPUs are reallocated unless doing so would exceed a CPU usage
limit. In that case, the request is marked as blocked; CPUs are allocated and the jobs
restarted only when sufficient CPUs become available. Note that requests that are
suspended by the administrator cannot be resumed by their owner.

7.4.5 Idle Time

The amount of time that the resources allocated to a request can remain idle can be

constrained by using rcontrol to set an idle timer (see also Section 10.2.3). By default,

no timer is set. If an idle timer is set, it starts timing as soon as the resource has been

allocated. It is stopped if the request is suspended and restarted when the request is

resumed. If the idle timeout expires the CPUs are deallocated.

Event Handling

8.1 Introduction

RMS includes a general mechanism for posting, waiting on and handling events. This

functionality is provided by the Event Manager, eventmgr (see Section 4.7).

Events are speciﬁed by RMS class, name, type and description.

class          The class of object generating the event

name           The instance of the object

type           The event type

description    A text description of the event

Generally, the description is either a single word that an event handler script can act on

or a full description of some problem.

Events have a string representation as follows:

class:name:type:description

For example, the following event signiﬁes that the new status of the node atlas0 is

running.

node:atlas0:status:running
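A handler or test script may need to split this string form back into its four fields. The following is a minimal sketch in POSIX shell; parse_event and the ev_* variable names are hypothetical. Everything after the third colon is treated as the description, so a description may itself contain spaces or colons.

```shell
#!/bin/sh
# Sketch only: parse_event and the ev_* variables are hypothetical names.
# Splits class:name:type:description into four shell variables.
parse_event() {
    event=$1
    ev_class=${event%%:*}      # text before the first colon
    rest=${event#*:}
    ev_name=${rest%%:*}        # text before the second colon
    rest=${rest#*:}
    ev_type=${rest%%:*}        # text before the third colon
    ev_description=${rest#*:}  # everything after the third colon
}

parse_event "node:atlas0:status:running"
echo "$ev_class $ev_name $ev_type $ev_description"
```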

The Event Manager writes events to the events table in the RMS database. The

following query prints the contents of the events table in time order:

$ rmsquery -v "select * from events order by ctime"
id  name    class  type    ctime              handled  description
---------------------------------------------------------------
20  atlas0  node   status  05/04/01 15:53:02           running
21  atlas0  node   status  05/05/01 11:27:29           not responding

8.1.1 Posting Events

Events are normally posted by RMS servers but they can also be generated by the

command line utility rmspost. This is useful for testing the response of the system to

rare events. It can be run with a single argument as follows:

rmspost "class:name:type:description"

Alternatively, it can be run with 4 arguments as follows:

rmspost class name type "long description"

Note that the multiple word description, given as the fourth argument, must be quoted.

8.1.2 Waiting on Events

The command line utility rmswait waits on events. It can be run with a single

argument as follows:

rmswait "class:name:type"

rmswait ":name:"

Alternatively, ﬂags can be used to specify the class, name and type. The following

example speciﬁes the class with the -c option and the name with the -n option. The -t

ﬂag is used to specify the type of the event.

$ rmswait -c node -n atlas0

rmswait completes when a matching event is posted, after printing the event details on

stdout.

Two events match if their class, name and type are the same. They also match if one or
more of the class, name and type is null. For example:

node:atlas0:status    matches node:atlas0:status
node::status          matches node:atlas0:status
node::                matches node:atlas0:status
:::                   matches node:atlas0:status

Note that the class, name, type and description must all be speciﬁed when posting

events but one or more of the class, name and type can be null when waiting on events.
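The matching rule can be sketched in POSIX shell as follows. This illustrates the rule as stated above and is not the code that rmswait uses; field_matches and event_matches are hypothetical names. Only the first three fields are compared, so a trailing description is ignored.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers illustrating the matching rule.
# Two fields match if they are equal or if either of them is null.
field_matches() {
    [ -z "$1" ] || [ -z "$2" ] || [ "$1" = "$2" ]
}

# Compare class, name and type of two events written as class:name:type.
event_matches() {
    a=$1
    b=$2
    i=1
    while [ "$i" -le 3 ]; do
        field_matches "${a%%:*}" "${b%%:*}" || return 1
        # Drop the field just compared; a missing field counts as null.
        case $a in *:*) a=${a#*:} ;; *) a= ;; esac
        case $b in *:*) b=${b#*:} ;; *) b= ;; esac
        i=$((i + 1))
    done
    return 0
}

event_matches "node::status" "node:atlas0:status" && echo "match"
```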

8.2 Event Handling

Event handler scripts are speciﬁed in the event_handlers table. The default handlers

installed by RMS are as follows:

rmsquery -v "select * from event_handlers"
name  class      type        timeout  handler
-------------------------------------------------------------------------------
      node       status      600      /usr/opt/rms/etc/rmsevent_node
                 temphigh    300      /usr/opt/rms/etc/rmsevent_env
                 tempwarn
                 fan
                 psu
      event      escalation  -1       /usr/opt/rms/etc/rmsevent_escalate
      partition  status      600      /usr/opt/srasysman/bin/rmsevent_partition

The script rmsevent_node is run for all node status events. rmsevent_env is run for

all environment events (temperature warnings, fan failures and PSU failures).

A timeout can be associated with each event handler. If the timeout is exceeded, the
handler is killed and an escalation event (class event, type escalation) is posted.
rmsevent_escalate is run when one of the other handlers does not complete in the time
allowed.

The eventmgr daemon runs on the rmshost node.

The handler scripts can send mail to users warning them of events. To enable this, set

the users-to-mail attribute in the machine attributes table.

By default, event handler scripts are filed in /usr/opt/rms/etc. Local scripts should
be filed in /usr/local/rms/etc, as the contents of the default directory may change
when a new release is installed.

Each of the scripts tests for the existence of both site-speciﬁc and OS-speciﬁc handler

scripts before performing the default action.

Event handler scripts are called with ﬁve arguments: the event identiﬁer, the class,

name and type of the event and the event description. For example:

#!/bin/sh
# default OSF1 handler for node status events
# args: id class name type description

program=`basename $0`
id=$1
class=$2
name=$3
type=$4
description=$5

# format event description message
message()
{
    echo "`date '+%h %e %X'` OSF1 event $id $type $class $name $description"
}

# log the event
message >> /var/rms/adm/log/event.log

# execute OSF1 specific handler
/usr/opt/srasysman/bin/checkout.exp -I -R -i $id -c $class -n $name -t $type -d $description

8.3 List of Events Generated

The following events are generated by RMS:

class = node type = status

The name ﬁeld contains the name of the node. The description

contains one of the following:

unknown           node status cannot be determined

not responding    node does not respond to ping

active            node is booted but RMS is not running

running           RMS is running

class = module type = temperature

The name ﬁeld contains the name of the module. The description

contains one of the following:

ambient=value     DS20, ES40, QM-S16, QM-S128

class = module type = temphigh

If the temperature exceeds the threshold value, the event type is

temphigh and the description contains the above report and, in

addition, the words threshold exceeded.

In the event of multiple failures, the reports are concatenated.

class = module type = psu

The name ﬁeld contains the name of the module. The description ﬁeld

contains one of the following where value is a bitmap that identiﬁes

the PSUs:

psu value failure QM-S128

In the event of multiple failures, the reports are concatenated.

class = module type = fan

The name ﬁeld contains the name of the module. The description ﬁeld

contains one of the following where value is a bitmap that identiﬁes

the component (fan or PSU).

enc fan value failure DS20, ES40, QM-S128

In the event of multiple failures, the reports are concatenated.

class = partition type = status

The name ﬁeld contains the name of the partition. The description

contains one of the following:

running    partition is running

blocked    partition is blocked

closing    partition is closing down

down       partition has been shut down

class = transaction type = status

The name contains the unique identiﬁer for the transaction (the

transaction handle) and the description contains one of the following:

submitted    transaction submitted

started      transaction being executed

complete     transaction completed successfully

failed       transaction failed to execute

error        transaction completed but there were errors

In the case of a transaction completing with errors (a link error test or

boundary scan, for example), details of the failures are added to the

transaction outputs table.

class = event type = escalation

The name contains the name of the event being escalated and the

description contains did not complete. If a handler is registered,

it is called with a description of the event that was not handled. The

handler should pass the event to an external management agent.

class = server type = status

The description contains information on errors that occurred when

starting the server.

8.3.1 Extending the RMS Event Handling Mechanism

The RMS event handling mechanism is open and extensible. New event types and

handlers for them can be added on a site-by-site basis. For example, you might run a

periodic ﬁle system status check on each node, execute a local cleanup script and post a

ﬁle system event to RMS if the free space in the ﬁle system dropped below a prescribed

level. The handler script could perform partition-wide or machine-wide cleanup and post

a notiﬁcation of the problem via email or an SNMP message.
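The file system check described above might be sketched as follows. This is an illustration under stated assumptions: the event class filesystem and type space are invented for the example (RMS does not define them), fs_event is a hypothetical helper, and the way free space is obtained is left to the site. Only the rmspost single-argument form comes from Section 8.1.1.

```shell
#!/bin/sh
# Sketch only: fs_event is a hypothetical helper. The class "filesystem" and
# type "space" are site-defined examples, not events generated by RMS.
# Prints an event string if free space is below the threshold, else nothing.
fs_event() {
    node=$1           # name of the node being checked
    mount_point=$2    # file system to report on
    free_kb=$3        # free space, in KB (for example from df -k)
    threshold_kb=$4   # prescribed minimum, in KB
    if [ "$free_kb" -lt "$threshold_kb" ]; then
        echo "filesystem:$node:space:$mount_point has ${free_kb}KB free"
    fi
}

# Possible use from a periodic script on each node:
#   ev=`fs_event atlas0 /local "$free_kb" 102400`
#   [ -n "$ev" ] && rmspost "$ev"
```

A matching entry in the event_handlers table would then be needed for the new class and type so that a site-specific handler is run.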

Setting up RMS

9.1 Introduction

This chapter describes how to set up RMS and carry out routine operations. The

information is organized as follows:

• Planning the installation (see Section 9.2).

• Starting RMS and conﬁguring the system (see Section 9.3).

• Carrying out day-to-day operations and establishing backup and archive procedures

(see Section 9.4).

• Customizing RMS (see Section 9.5).

• Dealing with log ﬁles (see Section 9.6).

9.2 Installation Planning

Before you install RMS, think about how the resources of the system will be used and

who is going to use them. Ask yourself the following questions:

• Will the system be open to anyone to use or is it for a speciﬁc group of users?

• Will the machine run a constant workload or do you expect cyclical patterns in usage,

for example, a prime shift versus evenings and weekends?

• Is the machine primarily for running parallel jobs or do you expect a signiﬁcant

workload from sequential jobs?

• Will some of your users have jobs that consume all of the resources of the system for

extended periods of time? If so, are you happy for other users to wait until the

machine is available or do they need access to resources of their own?

• How do you wish to process the accounting data?

The answers to these questions should help you to determine how to conﬁgure the

system. It may be that you cannot answer these questions, in which case you should

start with one of the basic conﬁgurations described below.

9.2.1 Node Names

While planning the machine installation give some thought to its name and the names

of its nodes. We recommend selection of a short name for the machine (for example

atlas). Node names should use the name of the machine as a base and their network

port number as a sufﬁx (for example atlas0, atlas1, ..., atlas63). RMS will

compress such lists of hostnames (for example atlas[0-63]).
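Local scripts sometimes need the expanded form of such a compressed list. The following sketch expands a single base[first-last] range; expand_nodes is a hypothetical helper, and it deliberately handles only this one form, since the full syntax that RMS accepts is not described in this section.

```shell
#!/bin/sh
# Sketch only: expand_nodes is a hypothetical helper, not part of RMS.
# Expands one compressed range such as atlas[0-3], one name per line.
expand_nodes() {
    spec=$1
    case $spec in
        *\[*-*\]) ;;                    # looks like base[first-last]
        *) echo "$spec"; return 0 ;;    # plain name, e.g. atlasms
    esac
    base=${spec%%\[*}                   # text before the bracket
    range=${spec#*\[}
    range=${range%\]}                   # first-last
    i=${range%-*}
    last=${range#*-}
    while [ "$i" -le "$last" ]; do
        echo "$base$i"
        i=$((i + 1))
    done
}

expand_nodes 'atlas[0-3]'
```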

Some machines have a management server node that is connected to the management

network but is not connected to the Compaq AlphaServer SC Interconnect. By

convention, this node is given the sufﬁx ms (for example atlasms).

9.3 Setting up RMS

RMS should be installed according to the instructions in the Compaq AlphaServer SC

Installation Guide.

For the purposes of this section, we assume a machine with 64 nodes, where each node

has 4 CPUs, 4GB of memory and an 18GB disk. You should make adjustments for the

actual number of nodes in your system. If RMS is already running on the machine, skip

to Section 9.3.2.

9.3.1 Starting RMS

The RMS initialization script, /sbin/init.d/rms, is run on each node with the

argument start as the node boots. Conversely, when the node halts, the script is run

with the stop argument.

To start or stop RMS manually on all of the nodes at once, run rmsctl on the rmshost

node with the appropriate argument (start or stop). This command runs

/sbin/init.d/rms on each of the nodes in turn. rsh must be enabled for root users

for this command to work correctly. This should have been enabled as part of the

installation.

# rmsctl start

Conﬁgure all of the nodes into the machine using rcontrol.

# rcontrol configure in 'atlas[0-63]'

Use rinfo with the -n option to check the status of the nodes. The output should show

that all of the nodes are running.

# rinfo -n

running atlas[0-63], atlasms

If any of the nodes show a status other than running, restart them by running

/sbin/init.d/rms on the nodes in question. For example, to restart RMS on atlas3,

enter the following:

# /sbin/init.d/rms stop

# /sbin/init.d/rms start

If necessary, conﬁgure out any nodes that fail:

# rcontrol configure out atlas3

Restarting RMS

RMS daemons such as the Machine Manager and the Partition Manager can be stopped

and started by executing the /sbin/init.d/rms script on the rmshost node. When

run on the rmshost node, the init script checks the status of each of the partitions in the

active conﬁguration. If a partition is in the running state or blocked state, the

partition is stopped and its autostart ﬁeld in the servers table is set to 1, otherwise

the field is set to 0. When the node boots, only those partitions that have their
autostart field set to 1 are restarted. This means that the state of the configuration is
preserved.

By contrast, if rmsctl is used to start and stop the machine, all of the partitions in the

active conﬁguration are started: when rmsctl stops RMS, it sets all of the autostart

ﬁelds to 0; when it starts RMS, it sets them to 1.

9.3.2 Initial Setup with One Partition

This example describes the simplest possible setup. All nodes are in a single partition

and there are no memory limits, time limits or access controls. Any user can run a job

using all of the CPUs.

Once RMS is running on all of the nodes, you set up a single partition as follows:

# rcontrol create partition=parallel configuration=day nodes='atlas[0-63]'

# rcontrol start partition=parallel

You should now be able to run a parallel program across all 64 nodes, for example:

# prun -N64 hostname

...

# prun -N64 dping 0 32

...

9.3.3 Simple Day/Night Setup

In this example, the system is set up with two operating configurations: one called day
and the other called night. During the day, the resources are split into two partitions: a
login partition (called login) for interactive use and a parallel partition (called
parallel) for execution of short parallel programs. At night, all of the nodes are
assigned to a single partition (again called parallel) with a longer time limit for
running parallel jobs.

Use the following commands to create this pair of conﬁgurations:

# rcontrol create partition=login configuration=day type=login nodes='atlas[0-7]'

# rcontrol create partition=parallel configuration=day type=parallel \
    timelimit=600 nodes='atlas[8-63]'

# rcontrol create partition=parallel configuration=night type=parallel \
    timelimit=3600 nodes='atlas[0-63]'

To start the day conﬁguration, enter the following:

# rcontrol start configuration=day

...

To switch to the night conﬁguration, use this command:

# rcontrol start configuration=night

...

Note that, after the switch, any jobs running on the parallel partition continue to
run, because the parallel partition in the night configuration has more nodes (it
includes all of the nodes that it had in the day configuration). However, when
changing back from night to day, you must decide what to do with any jobs that are
running on nodes atlas[0-7]. The options are to wait for them to finish or to kill
them. To wait for them to finish, stop the partition with the wait option.

# rcontrol stop partition=parallel option=wait

# rcontrol start configuration=day

...

Note

In the current release, any requests that are suspended when a partition is

stopped must be resumed manually if the partition is restarted.

9.4 Day-to-Day Operation

Once the system is up and running, give some thought to automating some routine or

day-to-day operations:

• Periodic shift changes

• Backing up the database

• Summarizing accounting data

• Archiving data

• Database maintenance

You may also want to conﬁgure nodes out of the system in the event of failures.

9.4.1 Periodic Shift Changes

The commands for switching between conﬁgurations are described in Section 9.3.3.

When you are satisﬁed with the shift changes, install a cron job to perform them

automatically.
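A cron-driven script needs some way to decide which configuration should be active. The sketch below picks a configuration name from the hour of the day; config_for_hour is a hypothetical helper and the shift boundaries (08:00 and 18:00) are assumptions chosen for illustration, not values recommended by RMS.

```shell
#!/bin/sh
# Sketch only: config_for_hour is a hypothetical helper, and the shift
# boundaries below are assumed values that each site would choose.
config_for_hour() {
    hour=$1                      # hour of the day, 0 to 23
    if [ "$hour" -ge 8 ] && [ "$hour" -lt 18 ]; then
        echo day
    else
        echo night
    fi
}

# Possible use from a script run by cron at each shift boundary:
#   rcontrol start configuration=`config_for_hour "$hour"`
```

When switching from night back to day, the script would first need to stop the parallel partition (for example with option=wait), as described in Section 9.3.3.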

9.4.2 Backing Up the Database

As soon as the RMS installation is stable, back up the database to a text ﬁle so that it

can be recovered in the event of failure. Do this as follows, using rmstbladm, the table

administration program (see Page 5-44).

$ rmstbladm -d > database_backup.txt

The backup ﬁle contains the sequence of SQL statements required to recover the current

state of the database.

The RMS database is stored in /var/rms/msqldb on the rmshost node. The database

server will exit if this ﬁle system ﬁlls up. RMS will not operate until sufﬁcient space has

been created in this ﬁle system. Ensure that there is at least 100MB free to allow for

updates. The database server can be restarted using the script /sbin/init.d/msqld.
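A routine check of the free space under /var/rms/msqldb can guard against the database server exiting. The following is a minimal sketch; has_min_free is a hypothetical helper, and how the free space figure is obtained (for example from df -k) is left to the site because df output differs between systems.

```shell
#!/bin/sh
# Sketch only: has_min_free is a hypothetical helper, not part of RMS.
# Succeeds if the free space is at least the required minimum (default 100MB).
has_min_free() {
    free_kb=$1            # free space in the file system, in KB
    min_mb=${2:-100}      # required minimum, in MB
    [ "$free_kb" -ge $((min_mb * 1024)) ]
}

# Possible use, with free_kb taken from df -k /var/rms/msqldb:
#   has_min_free "$free_kb" || echo "RMS database file system is nearly full"
```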

9.4.3 Summarizing Accounting Data

Accounting records accumulate in the RMS database as each job is run. By default, they

are not processed as each site has its own requirements in this respect. A simple

example script to produce a summary of resource usage is included in the release in

/usr/opt/rms/examples/scripts/accounting_summary. See

Appendix E (Accounting Summary Script) for a listing. The script produces the following

output.

Accounting Summary of Machine atlas at 16:01 Wed 21 Feb 2001

Usage by Project/User For Previous Day

Project   User      CPU     User    Sys     Number
Name      Name      Secs    Secs    Secs    Sessions

-------------------------------------------------------------------------

default

addy

596

540

29272

286

533

227

duncan

johnt

root

stephen

134

-------------------------------------------------------------------------

Total default 30751 885 201 122

-------------------------------------------------------------------------

Grand Total 30751 885 201 122

-------------------------------------------------------------------------

When the accounts have been processed, the script can optionally delete the accounting

records for resource requests that have completed.

This script (or one based on it) can be run nightly with a cron job, as shown in the

following example.

0 0 * * * /usr/opt/rms/examples/scripts/accounting_summary

9.4.4 Archiving Data

To keep the database to a reasonable size, old entries should be removed on a regular

basis as described in Section 9.4.5. Before clearing old entries from the database,

archive any data you want to preserve. Generally, this is data from the following tables:

resources     Descriptions of each request to allocate resources

jobs          Descriptions of each job

node_stats    Utilization statistics for each node

acctstats     Accounting statistics logged by RMS

The data can be archived as a sequence of SQL statements using rmstbladm. The

following example archives data from the node statistics (node_stats) table (see

Section 10.2.15):

$ rmstbladm -d -t node_stats > nodestats.sql

Alternatively, you can execute a SQL query to extract the data, as follows:

$ rmsquery -v -u "select * from node_stats"

name  ctime  usercpu  syscpu  freemem  ubc  wired  freeswap  pages  interrupts ...

--------------------------------------------------------------------------

atlas0 973162911

atlas3 973162917

atlas2 973162922

atlasms 973162907 22

atlasms 973163027 23

295 483 162

5103

5108

137

531

566

...

117

124

61 301

62 301

138

Use the ctime ﬁeld to select old data. For example, select data that was collected 2 days

ago or more as follows:

now=`rmsgettime`

old=`expr $now - 172800`

rmsquery -v "select * from node_stats where ctime <= $old \
    order by ctime" > node_stats.sql
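The constant 172800 in this example is two days expressed in seconds (2 x 86400). For other retention periods the cutoff can be computed as sketched below; archive_cutoff is a hypothetical helper, and ctime values are assumed to be in seconds, as the arithmetic in the example above implies.

```shell
#!/bin/sh
# Sketch only: archive_cutoff is a hypothetical helper, not part of RMS.
# Returns the ctime before which data counts as old.
archive_cutoff() {
    now=$1     # current time in seconds, e.g. the output of rmsgettime
    days=$2    # how many days of data to retain
    echo $((now - days * 86400))
}

# Two days before ctime 973162911
archive_cutoff 973162911 2
```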

The following queries return data from the jobs, resources and accounting statistics

tables. Accounting statistics can also be managed using the script described in

Section 9.4.3.

rmsquery -v "select * from jobs where endTime <> 0 and \

endTime < $old order by startTime" > jobs.dat

rmsquery -v "select * from resources where endTime <> 0 and \

endTime < $old order by startTime" > resources.dat

rmsquery -v "select * from acctstats where running = 0 and \

started < $old order by started" > acctstats.dat

After executing these queries, run rmstbladm to clean up the database as described in

Section 9.4.5.

9.4.5 Database Maintenance

Certain tables in the RMS database grow over time or as jobs are submitted, in

particular, the node statistics (node_stats) table, the resources table, the events

table and the jobs table. These tables can be kept to a reasonable size by periodically

instructing the table administration program, rmstbladm, to remove old entries. Before

running rmstbladm, archive any data you want to keep as described in Section 9.4.4.

Remove old entries as follows:

# rmstbladm -c

rmstbladm clears out all entries that are older than a speciﬁed lifetime. The lifetime for

job data and the lifetime for statistical data are speciﬁed in the attributes table (see

Section 10.2.3).

Failure to clear old entries can cause problems as described at the end of this section.

See Section 9.4.3 for details about the accounting statistics table which also grows over

time.

A cron job can be set up to clear out the tables. In the following example, this task is

performed at 2 a.m. each weekday morning.

0 2 * * 1-5 /usr/bin/rmstbladm -c

Troubleshooting

If the tables are not cleared out on a regular basis, the database continues to grow until

the performance of RMS is affected. Indications that this is happening include the

following:

• The database server, msqld, uses more memory.

• The table join operations performed by rinfo take longer.

• Queries acting on large tables may exceed normal user memory limits.

• rmstbladm takes a long time to clear out old entries or may fail, although insert

operations succeed and the tables continue to grow.

The point at which memory limits are exceeded varies with the number of nodes in the

machine and the amount of memory on the rmshost node. To check that the size of the

database is within operating limits, enter the following query:

$ rmsquery "select * from node_stats" > /tmp/stats.sql

If this fails, follow these steps to recover from the problem:

1. Log in to the rmshost node as root and stop the database server, as follows:

# /sbin/init.d/msqld stop

MSQL: service stopped

2. Change to the directory that contains the database, for example:

# cd /var/rms/msqldb/rms_atlas

Delete the following ﬁles: node_stats.dat, node_stats.def, node_stats.idx

and node_stats.ofl.

# rm node_stats.*

3. Restart the database server, as follows:

# /sbin/init.d/msqld start

MSQL: daemon started

4. Create a new node statistics table, as follows:

# rmstbladm -u

After this, rmstbladm should succeed in cleaning out old entries.

9.4.6 Conﬁguring Nodes Out

If a node fails and cannot be rebooted, it must be conﬁgured out while it is being

repaired. The procedure for this is as follows:

1. Stop the partition containing the failed node. Any jobs that are running on the failed

node when the partition is stopped will be killed. Other jobs will continue to run.

# rcontrol stop partition=parallel

2. Conﬁgure out the node (atlas2 in this case). Note that RMS reports an error if you

try to conﬁgure a node in or out while the partition is running.

# rcontrol configure out node=atlas2

3. Restart the partition:

# rcontrol start partition=parallel

After this procedure, the partition runs without the node. This temporarily reduces the
maximum size of job that can run.

When the node has been repaired, stop the partition again and conﬁgure the node back

in as follows:

1. Stop the partition containing the failed node:

# rcontrol stop partition=parallel

2. Conﬁgure in the repaired node (atlas2 in this case):

# rcontrol configure in node=atlas2

3. Restart the partition:

# rcontrol start partition=parallel

This brings the partition back up to its full complement of nodes.
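Both procedures follow the same stop, configure, start pattern, so they can be generated by one helper. The sketch below only prints the rcontrol commands from the steps above so that they can be reviewed before being run as root; node_config_commands is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: node_config_commands is a hypothetical helper, not part of RMS.
# Prints the three commands needed to configure a node in or out.
node_config_commands() {
    action=$1       # "in" or "out"
    node=$2         # node to configure, e.g. atlas2
    partition=$3    # partition containing the node, e.g. parallel
    case $action in
        in|out) ;;
        *) echo "usage: node_config_commands in|out node partition" >&2
           return 1 ;;
    esac
    echo "rcontrol stop partition=$partition"
    echo "rcontrol configure $action node=$node"
    echo "rcontrol start partition=$partition"
}

node_config_commands out atlas2 parallel
```

Printing rather than executing is a deliberate choice here: stopping a partition kills the jobs running on the failed node, so the commands are best checked first.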

9.5 Local Customization of RMS

RMS can be customized to suit local operating patterns in a variety of ways.

Customization is done through site-speciﬁc scripts in /usr/local/rms/etc. The

following site-speciﬁc customizations are supported:

• Core ﬁle analysis

• Partition startup

• Event handling

• Switch manager conﬁguration

If site-speciﬁc scripts exist then they override the defaults supplied with RMS.

9.5.1 Partition Startup

The default partition startup script enables or disables logging in to a node according to

the partition type. Site-speciﬁc variants might check whether users are logged in to the

node and warn them of changes. They might also check on the availability of space in

local temporary ﬁle systems.

To create a site-speciﬁc partition startup script, copy the default script

/opt/rms/etc/pstartup to /usr/local/rms/etc and modify it as required.

9.5.2 Core File Handling

By default, RMS instructs the operating system to dump core ﬁles to local temporary ﬁle

space under /local/core/rms. Change the attribute local-corepath in the

attributes table to select an alternative default directory for core ﬁles. Subdirectories

are created in local-corepath/resource-id for each resource request. Change the
attribute rms-keep-core to control whether core files are kept.

If core files are not kept, a core file analysis script is run on at least one node before the
core files are deleted. The default script prints a backtrace showing why the program

crashed. A site-speciﬁc variant might copy core ﬁles from the local temporary directory

to a global ﬁle system for subsequent analysis.

To create a site-speciﬁc core ﬁle analysis script, copy the default script

/opt/rms/etc/core_analysis to /usr/local/rms/etc and modify it as required.

9.5.3 Event Handling

The default event handlers check for the existence of a site-speciﬁc handler of the same

name in /usr/local/rms/etc. If such a script exists, it will be executed in preference

to the default handler. To make site-speciﬁc changes, copy the default scripts to this

directory and amend them to your needs. Use rmspost to test their correct operation.

9.5.4 Switch Manager Conﬁguration

The switch network manager (swmgr) must be run on the node to which the switch

network control cable is connected. By default, this is the rmshost node. Depending on

the conﬁguration of your system, you may need to change this default.

The swmgr process consumes CPU time while sampling the network for errors.

Therefore, it should ideally be run on a lightly loaded node that is not used to run

parallel jobs; for example, a management server. Use rcontrol to stop the running

swmgr, and run rmsquery to set the node that should run the swmgr, as follows:

# rcontrol stop server=swmgr

# rmsquery "update servers set hostname='atlasms' where name = 'swmgr'"

# rmsquery "select name,hostname from servers where name = 'swmgr'"

swmgr atlasms

# rcontrol start server=swmgr

If your system does not have a suitable management server, you should run the swmgr

on the rmshost node. If rmshost is an alias for one node of a resilient pair, the swmgr

should run on the primary node. Under these circumstances, reduce the impact of the

swmgr on other processes by increasing its polling interval from the default value

(30 seconds) to 15 minutes. Use rcontrol to do

this, as follows:

# rcontrol create attribute name=swmgr-poll-interval val=900

The change in polling frequency will take effect next time the swmgr is started. To force

the change to occur immediately, use rcontrol to stop and start the server, as described

above.

9.6 Log Files

The RMS daemons output reports to log ﬁles in the directory /var/rms/adm/log. The

amount of detail is controlled for each daemon by setting a reporting level. By default,

the reporting level is set to 0.

The reporting level is a bitmap that turns on different reports. Values for the reports are

as follows:

Symbolic Name        Value  Description
INIT_DEBUGGING              Server initialization and shutdown messages
REQ_DEBUGGING               Requests made to servers
JOB_DEBUGGING               Job startup and change of state
RESOURCE_DEBUGGING   32     Resource allocation and change of state
EDIT_DEBUGGING       64     SQL queries
MALLOC_DEBUGGING     256    Monitor server memory allocation

The level of reporting can be controlled in three ways.

1. On an individual user basis, by setting the environment variable RMS_DEBUG.

2. Using rcontrol to reload the daemon with a speciﬁed debug value. For example, the

following command reloads the Machine Manager with a reporting level of 1:

# rcontrol reload server=mmanager debug=1

The following example reloads the Partition Manager for the par1 partition with a

reporting level of 41 (initialisation, job and resource information):

# rcontrol reload partition=par1 debug=41

3. Using rmsquery to set the args ﬁeld of the daemon’s entry in the servers table (see

Section 10.2.19) to -r value, where value is the required reporting level.

The following example gives the Partition Manager for the par1 partition a reporting

level of 33.

# rmsquery "update servers set args='-r 33' where name='pmanager-par1'"

Then restart the Partition Manager. This change remains in place each time the

partition is restarted. The output ﬁles in /var/rms/adm/log can grow in size

rapidly when debug options are enabled. Take care not to ﬁll the ﬁle system.
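Because the reporting level is a bitmap, a combined value such as 41 or 33 can be split back into its individual report bits. The following Python sketch is an illustration only and is not part of RMS; the function names are hypothetical, and only the three bit values that are legible in the table above (32, 64 and 256) are given names.

```python
# Bit values copied from the table of reporting levels. The remaining
# symbolic names have no legible value in the table, so they are not listed.
KNOWN_BITS = {
    32: "RESOURCE_DEBUGGING",
    64: "EDIT_DEBUGGING",
    256: "MALLOC_DEBUGGING",
}


def split_level(level):
    """Return the individual bit values that make up a reporting level."""
    bits = []
    bit = 1
    while bit <= level:
        if level & bit:
            bits.append(bit)
        bit <<= 1
    return bits


def describe_level(level):
    """Name each bit where the name is known, otherwise show the raw bit."""
    return [KNOWN_BITS.get(bit, "bit %d" % bit) for bit in split_level(level)]


if __name__ == "__main__":
    print(split_level(41))     # [1, 8, 32]
    print(describe_level(96))  # ['RESOURCE_DEBUGGING', 'EDIT_DEBUGGING']
```

A level is built the other way round by adding (or OR-ing) the bit values of the reports required.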

The RMS Database

10.1 Introduction

This chapter describes the tables which make up the RMS database. Each machine has

its own database, called rms_machine, where machine is the name of the machine.

This allows a single database server to support multiple machines.

The database contains tables storing information on the following:

• The conﬁguration of the machine: its nodes and the Compaq AlphaServer SC

Interconnect

• The users of the machine: the access controls and resource quotas applied to them;

their requests to run jobs; the accounting records for these jobs

• The operation of the machine, including its current state and performance statistics

10.1.1 General Information about the Tables

• All of the ﬁeld names in the database are case sensitive.

• Fields are of these types:

char(length)  This denotes a character string of the specified length.
int           This denotes an integer value.
              This denotes a percentage value stored in an integer field.
UTC           This denotes a UTC time value stored in an integer field.
x-y           This denotes a range of possible integer values.
text          This denotes a character string of arbitrary length.

• Fields of type text can be selected by the ﬁeld name but the text entry cannot be

matched.

If the text is a list of items, for example, a list of node names, the items in the list

may be separated by white space. A list of names, all of which share a common base,

for example, atlas0 atlas1 atlas2, may also be represented by a glob-like

expression, in this example, atlas[0-2]. The normal glob(7) expression syntax is

relaxed to include multiple digit numbers. For example, atlas[0-10,14]

represents the nodes numbered from atlas0 to atlas10 inclusive plus atlas14.

• Information on time is stored as UTC time in integer ﬁelds. Client programs should

convert time to local time and output the result as a string.
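A client program that reads a node list from a text field has to expand the relaxed glob-like form described above. The Python sketch below shows one way this might be done; it is an assumption about a reasonable client-side approach, not the parser used by RMS, and the function name expand_nodes is hypothetical.

```python
import re


def expand_nodes(text):
    """Expand a white-space separated node list with glob-like ranges.

    "atlas[0-10,14]" becomes atlas0 .. atlas10 plus atlas14. Names without
    a bracket expression are returned unchanged.
    """
    names = []
    for item in text.split():
        match = re.fullmatch(r"([^\[\]]+)\[([0-9,\-]+)\]", item)
        if match is None:
            names.append(item)  # plain node name
            continue
        base, spec = match.groups()
        for part in spec.split(","):
            if "-" in part:
                low, high = part.split("-")
                numbers = range(int(low), int(high) + 1)  # inclusive range
            else:
                numbers = [int(part)]
            names.extend("%s%d" % (base, n) for n in numbers)
    return names


if __name__ == "__main__":
    print(expand_nodes("atlas[0-2]"))  # ['atlas0', 'atlas1', 'atlas2']
```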

10.1.2 Access to the Database

There are three levels of access to the database:

1. Users can extract information from all of the tables but cannot update them.

2. Operators and administrators can extract information from all of the tables and, in

addition, update a limited selection of ﬁelds in some tables.

3. RMS itself can extract and update information in all ﬁelds of all tables. The

description of the tables in Section 10.2 includes information about which RMS

programs create and update each ﬁeld.

10.1.3 Categories of Table

This chapter describes the tables in the database, listing them in alphabetical order.

This section groups the tables by category.

Conﬁguration of Nodes

The following tables contain information about the individual nodes and about the

machine as a whole.

nodes            describes the attributes of each node
node statistics  contains performance statistics for each node
partitions       defines each partition and its scheduling parameters
modules          physical location and environmental data for each module
module types     describes the characteristics of each supported module

Operational State

The following tables hold details of the current state of the machine.

events                records changes to the state of the machine
event handlers        lists the handlers used to act on events
attributes            holds site-specific attribute-value pairs
fields                specifies how objects and attributes may be modified
servers               holds details on each daemon (hostname, port number, pid)
transactions          records requests to change the machine configuration
software products     describes the components of each software product
installed components  describes the components installed on each node

User Details

The following tables contain information about the users of the RMS: their privileges

and priorities and their usage of the system.

users                  lists the projects to which users belong
projects               lists the projects
access controls        describes limits on user access to resources
resources              describes the allocation of resources to users
jobs                   describes the users' jobs
accounting statistics  contains an accounting record for each resource

Conﬁguration of the Network

The following tables describe the network components. Deﬁnitions of terms used in

describing the Compaq AlphaServer SC Interconnect can be found in

Appendix A (Compaq AlphaServer SC Interconnect Terms).

elans          records the position and state of the Elan network adapters
elites         records the position and state of the Elite switches
switch boards  records the position and state of each switch board
link errors    logs network errors

Internal Tables

The RMS database includes a number of tables that are mainly used internally. These

are noted in this chapter as being of internal use but are not described in any detail.

transaction outputs  contains output from requests posted to the transaction log
request types        describes output formats in the transaction outputs table
statistics           lists the performance statistics available in the current release
services             describes the services available and who can use them

10.2 Listing of Tables

This section lists the tables in alphabetical order.

10.2.1 The Access Controls Table

The access_controls table, shown in Table 10.1, contains access control and usage
limit descriptions for users and projects.

Table 10.1: Access Controls Table

Field      Type      Description
name       char(16)  name of the user or project
class      char(8)   class of control: user or project
partition  char(16)  partition to which access controls apply
priority   int       default scheduling priority
maxcpus    int       maximum number of CPUs
memlimit   int       maximum memory per CPU in megabytes

An entry for the reserved partition name default specifies the priority, maxcpus

and memlimit values that apply to any partition not explicitly named.

The priority ﬁeld stores the default scheduling priority of jobs submitted by a user or

project. The higher the value, the more likely the job is to run. Priority values range

from 0 to 100, the default being 50.

The maxcpus ﬁeld stores the maximum number of CPUs that a user or project may have

allocated at once. Requests for more than this number of CPUs fail. Once this number of

CPUs is allocated, additional requests block until some CPUs are freed.
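The effect of maxcpus on a new request can be pictured with the short Python sketch below. It is an interpretation of the two sentences above rather than the Partition Manager's actual logic; in particular, treating a request that would take the total above maxcpus as one that blocks is an assumption, and the function name check_request is hypothetical.

```python
def check_request(requested, allocated, maxcpus):
    """Classify a request for CPUs against a user's or project's maxcpus.

    requested: CPUs asked for by the new request
    allocated: CPUs the user or project already holds
    """
    if requested > maxcpus:
        return "fail"   # can never be satisfied within the limit
    if allocated + requested > maxcpus:
        return "block"  # assumed: waits until some CPUs are freed
    return "grant"


if __name__ == "__main__":
    print(check_request(8, 0, 4))  # fail
    print(check_request(2, 4, 4))  # block
    print(check_request(2, 1, 4))  # grant
```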

The memlimit ﬁeld stores the default memory limit per CPU for jobs submitted by the

named user or project.

10.2.2 The Accounting Statistics Table

Each time CPUs are allocated to a request, a record is created in the accounting

statistics (acctstats) table shown in Table 10.2. Records are updated periodically and

at the end of each job by the Partition Manager, pmanager (see Section 4.4).

Table 10.2: Accounting Statistics Table

Field     Type      Description
name      char(16)  name of the allocated resource
uid       int       user ID
project   char(16)  project name
started   UTC       time when CPUs were allocated
etime     int       elapsed time in seconds
atime     int       CPU seconds allocated
utime     int       user time in seconds
stime     int       system time in seconds
cpus      int       number of CPUs allocated
mem       int       maximum memory extent in megabytes
pageflts  int       number of page faults requiring I/O
memint    int       memory integral in megabyte hours
running   0 | 1     CPUs allocated and running jobs

The etime ﬁeld stores the elapsed time (in seconds) since CPUs were ﬁrst allocated to

the resource, including any time during which the resource was suspended.

If a partition is stopped while a job is running, and the partition is restarted before the
job completes, the etime field correctly shows the total elapsed time of the running
job, including the time when the partition was down. If a partition is stopped while a job
is running, and the job completes before the partition is restarted, the acctstats table
entry reflects only the time when the partition was running; any additional time
that the job ran while the partition was down is not included. If a job is terminated
because it exceeds its timelimit or is terminated with rcontrol, the etime field
reflects the time for which CPUs were allocated.

The atime ﬁeld stores the total elapsed time (in seconds) that CPUs have been allocated

– this excludes time during which the resource was suspended, but includes any time

when the partition was down while jobs were running. The value stored is the total for

all CPUs used by the resource.
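The difference between etime and atime can be illustrated with a small worked example. The Python sketch below assumes that the number of CPUs stays constant for the life of the resource and that the suspended time is known; both are simplifying assumptions, and the function name is hypothetical.

```python
def accounting_times(elapsed, suspended, cpus):
    """Return (etime, atime) in seconds for one resource.

    etime counts all elapsed time, including time spent suspended.
    atime excludes suspended time and is totalled over all CPUs.
    """
    etime = elapsed
    atime = cpus * (elapsed - suspended)
    return etime, atime


if __name__ == "__main__":
    # 4 CPUs held for 600 seconds, of which 100 seconds were suspended.
    print(accounting_times(600, 100, 4))  # (600, 2000)
```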

The utime and stime ﬁelds are summed over all processes in the program.

The running ﬁeld is set to 1 while CPUs are allocated. It is set to 0 when the CPUs are

deallocated, at which point no further updates take place.

The pageflts ﬁeld shows the number of page faults requiring I/O summed over all of

the processes in the parallel program. It is normally 0. A non-zero and growing value

indicates that the program is paging on some or all nodes.

The memint ﬁeld is set to 0 in AlphaServer SC Version 2.0.

The number of entries in the accounting statistics table can grow rapidly. The table

should be cleared periodically of old entries as described in Section 9.4.3.

10.2.3 The Attributes Table

The attributes table, shown in Table 10.3, stores information specific to the site or the

release. This information is stored as attribute-value pairs. The table is created by the

table administration program, rmstbladm (see Page 5-44), which adds a minimal set of

default entries. Further attributes are added by RMS daemons. The values can be

modiﬁed by the administrator.

The entries in the attributes table can be grouped into four sections:

1. Machine attributes

2. Performance statistics attributes

3. Server attributes

4. Parallel processing attributes

The machine attributes in the following table are supported:

Table 10.3: Machine Attributes

Attribute       Default  Description
network-type    QM-S16   data network type (QM-S16 or QM-S128)
network-levels           number of levels of switch network
network-layers           number of layers (rails) of switch network
racks                    number of 19" racks in the machine
units-per-rack           height of a 19" rack in units
rmshost                  node running the RMS daemons

The performance statistics attributes shown in Table 10.4 control the collection and

lifetime of performance statistics. The statistics are collected by rmsd at the intervals

given in this table. In the current release, only CPU statistics are gathered.

The number of entries in the jobs table, the resources table, the accounting statistics

table, the events table and the node statistics table can grow rapidly, especially on a

large busy machine or if the value of cpu-stats-poll-interval is very small. The

lifetime entries in Table 10.4 assign a ﬁnite life to this data. Once this lifetime has

been reached, the RMS table administration program, rmstbladm, will clean out old

entries, if called with the -c option (see Page 5-44). Note that the accounting statistics

table is not cleared out (see Section 10.2.2).

Table 10.4: Performance Statistics Attributes

Attribute                Default  Description
node-statistics          cpu      statistics collected per node
cpu-stats-poll-interval  120      time in seconds between CPU samples
data-lifetime                     time in hours to keep job data
stats-lifetime                    time in hours to keep statistical data

The server attributes in Table 10.5 control the behavior of the RMS daemons. All of the

modiﬁcation times are in UTC. Client applications should convert this to local time and

print it as a string.

If the attribute node-status-poll-interval is not set or set to zero, the value of

rms-poll-interval is used instead.

Table 10.5: Server Attributes

Attribute                  Default  Description
rms-poll-interval                   polling interval for RMS daemons
node-status-poll-interval           time between monitoring node status
status-modify-time                  last time the status changed
resource-modify-time                last time a resource was modified
version                             RMS version number
altzone                             shift in seconds to apply to UTC time to
                                    get local time on rmshost

The attributes in Table 10.6 control the scheduling of parallel programs. If the number

of resource requests reaches pmanager-queuedepth, subsequent requests either block

or fail immediately (if the immediate option to prun has been selected). The blocked

requests do not appear in the database. If the pmanager-idletimeout is exceeded, the

resource times out with an exit status of 125.

Table 10.6: Scheduling Attributes

Attribute             Default          Description
default-partition     parallel         the partition used by default for parallel
                                       programs
default-priority                       the default scheduling priority
grace-period                           the time allowed in seconds for a parallel
                                       program to exit after a CPU time signal
lbal-partition                         the default partition for load balancing
                                       requests
exit-timeout                           default exit timeout (absent by default)
pmanager-queuedepth                    maximum number of queued requests
pmanager-idletimeout                   number of seconds an allocated resource
                                       may remain idle
rms-keep-core                          keep (1) or remove (0) core files
local-corepath        /local/core/rms  directory path for core files

10.2.4 The Elans Table

The elans table, shown in Table 10.7, contains one entry for each Elan network adapter
connected to the Compaq AlphaServer SC Interconnect. Entries are created and
maintained by the rmsd running on the node containing the Elan.

Table 10.7: Elans Table

Field       Type      Description
name        char(8)   unique identifier for the adapter
hostname    char(16)  name of node containing the adapter
layer       0–31      layer (or rail) number
netid       int       network address within the layer
revision    int       chip revision level
ecount      int       error count for the last sample
ecount10    int       error count for the last 10 samples
status      char(8)   Elan status (ok, unknown, error)
linkstate   char(2)   state of the link
linkerrors  text      description of errors in the last 10 samples

Entries in the linkerrors ﬁeld give the ID of the link and then, in brackets, a vector of

error counts (see Appendix A (Compaq AlphaServer SC Interconnect Terms)).

10.2.5 The Elites Table

The elites table, shown in Table 10.8, contains one entry for each switch in the
network. Its entries are created and maintained by the Switch Network Manager,
swmgr (see Section 4.5).

Table 10.8: Elites Table

Field       Type     Description
name        char(8)  Elite name, a unique ID for each switch
layer       0–31     layer (or rail) number
level       0–3      level number
netid       0–255    network address within the layer
plane       0–63     plane number
board       char(8)  name of the board containing the switch
chip        0–7      chip number on the board
revision    int      chip revision number
ecount      int      error count at the last sample
ecount10    int      error count for the last 10 samples
status      char(8)  Elite status (ok, unknown, error)
linkstate   char(8)  state of each link
linkerrors  text     description of errors in the last 10 samples

The linkstate ﬁeld contains a character for each of the 8 links. Each link can be in one

of the states shown in Table B.2.

Entries in the linkerrors ﬁeld give the ID of the link and then, in brackets, a vector of

counts for each of the supported error types (for more details on Compaq AlphaServer

SC Interconnect terms see Appendix A (Compaq AlphaServer SC Interconnect Terms)).

10.2.6 The Events Table

Entries are added to the events table, shown in Table 10.9, each time an object
managed by the RMS changes state, for example, when a node status changes, a
partition starts or a component fails.

Table 10.9: Events Table

Field        Type      Description
             int       unique identifier for each event
name         char(16)  name of object that has changed state
class        char(16)  class of the object, such as node or partition
type         char(16)  type of event
ctime        UTC       time at which the event occurred
handled      0 | 1     whether the event has been handled or not
description  text      description of the event

Table 10.10 shows three typical events. The ﬁrst shows a node status change as RMS

starts on node cfs1, the second shows a temperature change on module mod2 and the

third shows the partition parallel starting.

Table 10.10: Example of Status Changes

name      class      ctime      type         description
cfs1      node       893427468  status       running
mod2      module     894991521  temperature  ambient=15
parallel  partition  894991490  status       running
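The ctime values in these examples are integers that a client must turn into readable strings. The Python sketch below assumes that the stored value counts seconds since 1 January 1970 UTC (the usual UNIX convention; this section does not state the epoch explicitly) and that the caller supplies any local-time offset in seconds east of UTC. The function name format_ctime and the offset parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def format_ctime(ctime, offset_seconds=0):
    """Format an integer UTC time value as a date and time string.

    offset_seconds is the local offset east of UTC; 0 prints UTC itself.
    """
    zone = timezone(timedelta(seconds=offset_seconds))
    return datetime.fromtimestamp(ctime, zone).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    print(format_ctime(893427468))        # 1998-04-24 14:17:48
    print(format_ctime(893427468, 3600))  # 1998-04-24 15:17:48
```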

The events table can grow rapidly. Running the table administration program,

rmstbladm, with the -c option removes old entries. This should be done periodically

using a cron script. See Page 5-44 for details. The data-lifetime attribute in the

attributes table (see Section 10.2.3) determines how old the entries must be before

they are removed.

Events are discussed in detail in Chapter 8 (Event Handling).

10.2.7 The Event Handlers Table

The event_handlers table, shown in Table 10.11, defines the handler scripts that are run in
response to events. Event handling is discussed in detail in Chapter 8 (Event Handling).

Table 10.11: Event Handlers Table

Field    Type      Description
name     char(16)  name of object that has changed state
class    char(16)  class of the object, such as node or partition
type     char(16)  type of event
timeout  int       time limit in seconds for the handler to complete
handler  char(32)  handler script to run

10.2.8 The Fields Table

The fields table, shown in Table 10.12, defines which RMS objects and attributes can be
created and modified using rcontrol (see Page 5-20), identifying them by a table name
and field name within that table.

Table 10.12: Fields Table

Field      Type      Description
name       char(16)  name of the field
tablename  char(16)  name of the table
access     char(8)   currently unused; always set to admin
type       char(16)  defines the type of value
rangemin   int       minimum value
rangemax   int       maximum value
textattr   text      specifies how values are validated

The value of the type ﬁeld determines how rcontrol checks the validity of values

entered by an administrator. The type ﬁeld may hold one of the values shown in

Table 10.13.

Table 10.13: Type Values

Value         Description
null          no checking
selectedtext  textattr gives a comma-separated list of valid values
integer       entry must be in the range bounded by rangemin and rangemax
relation      textattr gives a tablename.fieldname pair; entry must
              be a value of fieldname in tablename

Values in the attributes table are not checked using this method; the valid values for

attributes are built into rcontrol.
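The four type values can be read as a small validation rule set. The following Python sketch mirrors the descriptions in Table 10.13; it is not the code used by rcontrol. The function name validate and the lookup callable (which stands in for a database query returning the existing values of a field) are hypothetical, as are the sample values in the usage lines.

```python
def validate(field_type, value, rangemin=None, rangemax=None,
             textattr="", lookup=None):
    """Check a value against one row of the fields table."""
    if field_type == "null":
        return True  # no checking
    if field_type == "selectedtext":
        allowed = [item.strip() for item in textattr.split(",")]
        return value in allowed
    if field_type == "integer":
        try:
            number = int(value)
        except ValueError:
            return False
        return rangemin <= number <= rangemax
    if field_type == "relation":
        tablename, fieldname = textattr.split(".", 1)
        # lookup is assumed to return the values of fieldname in tablename
        return value in lookup(tablename, fieldname)
    raise ValueError("unknown type: %s" % field_type)


if __name__ == "__main__":
    print(validate("selectedtext", "user", textattr="user,project"))  # True
    print(validate("integer", "150", rangemin=0, rangemax=100))       # False
```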

10.2.9 The Installed Components Table

The installed_components table, shown in Table 10.14, contains information about
software components installed on each node.

Table 10.14: Installed Components Table

Field        Type      Description
hostname     char(16)  hostname of the node on which the component is installed
product      char(16)  name of the software product to which the component belongs
prodversion  char(16)  version of the software product to which the component belongs
component    char(16)  name of the component
compversion  char(32)  version of the component
ctime        UTC       time the component was installed

10.2.10 The Jobs Table

The jobs table, shown in Table 10.15, contains one entry for each parallel job. The

entries are maintained by the Partition Manager, pmanager (see Section 4.4).

The jobs table can grow rapidly. Running the table administration program,

rmstbladm, with the -c option removes old entries. This should be done periodically

using a cron script. See Page 5-44 for details. The data-lifetime attribute in the

attributes table (see Section 10.2.3) determines how old the entries must be before

they are removed.

Table 10.15: Jobs Table

Field       Type      Description
name        char(16)  unique identifier for each job
resource    char(16)  the name of the resource on which the job is running
status      char(16)  status of the job
cpus        text      list of CPUs allocated to this job
nodes       text      list of nodes allocated to this job
hostnames   text      list of hostnames allocated to this job
startTime   UTC       time the job started
endTime     UTC       time the job completed
contexts    char(16)  range of Elan contexts allocated to the job
exitStatus  int       exit status of the job
session     int       UNIX session ID of the allocating process
cmd         text      command being executed

Job names are sequence numbers generated automatically. The status ﬁeld holds one

of the values shown in Table B.1.

While the job is running, endTime is set to the time by which the job must end,
assuming there is a timelimit on the partition. If there is no time limit, endTime
is set to 0. When the job completes, endTime is updated to show the completion time.

The nodes and cpus ﬁelds contain lists of node and CPU numbers in use by a job. Each

pair of values deﬁnes a cpus x nodes box allocated to the job. The total number of

CPUs allocated is the sum of the area of these boxes. See also Section 2.4.2.
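The box calculation can be sketched as follows. The exact text format of the cpus and nodes fields is not given in this section, so the Python example assumes that each field is a white-space separated list whose entries are either a single number or an inclusive range written low-high, and that the entries of the two lists pair up in order. The function names are hypothetical.

```python
def count(item):
    """Number of values in "7" (one value) or "0-3" (inclusive range)."""
    low, _, high = item.partition("-")
    return int(high or low) - int(low) + 1


def total_cpus(cpus_field, nodes_field):
    """Sum the areas of the cpus x nodes boxes described by the two lists."""
    pairs = zip(cpus_field.split(), nodes_field.split())
    return sum(count(cpus) * count(nodes) for cpus, nodes in pairs)


if __name__ == "__main__":
    # Two boxes: 4 CPUs on each of 2 nodes, plus 2 CPUs on 1 node.
    print(total_cpus("0-3 0-1", "0-1 2"))  # 10
```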

A command name, cmd, passed to prun, may be up to MAXPATHLEN in length. In the

jobs table, the command name is truncated to 32 characters, including three dots (...)

appended to the name to show that it has been truncated.
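The truncation rule amounts to keeping the first 29 characters and appending three dots, so that the stored string is 32 characters long. A minimal Python sketch, assuming that commands of 32 characters or fewer are stored unchanged (the function name is hypothetical):

```python
def truncate_cmd(cmd, limit=32):
    """Shorten a command name to limit characters, ending in three dots."""
    if len(cmd) <= limit:
        return cmd
    return cmd[:limit - 3] + "..."


if __name__ == "__main__":
    stored = truncate_cmd("/usr/users/example/bin/a_very_long_program_name")
    print(stored, len(stored))
```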

10.2.11 The Link Errors Table

The link_errors table, shown in Table 10.16, contains one entry for each link error detected
by the Switch Network Manager, swmgr (see Section 4.5).

Table 10.16: Link Errors Table

Field        Type      Description
             int       unique identifier for each error
name         char(16)  name of the chip detecting the fault
class        char(16)  type of chip detecting the fault (elan, elite)
ctime        UTC       time at which the error was detected
description  text      description of the error

Entries in this table are updated by the swmgr. High error counts in the description ﬁeld

indicate that an error is persistent. Increasing counts indicate that it is current.

Entries in the description ﬁeld give the ID of the link and then, in brackets, a vector

of counts for each of the supported error types. Compaq AlphaServer SC Interconnect

link errors are described in Appendix A (Compaq AlphaServer SC Interconnect Terms).

10.2.12 The Modules Table

The modules table, shown in Table 10.17, contains descriptions of each hardware

module in a machine. The modules may be nodes, network components or storage

devices. It is created by rmsbuild. Entries are added and removed by rcontrol and

updated by rmsd and the Switch Network Manager, swmgr.

Table 10.17: Modules Table

Field        Type      Description
name         char(16)  name of the module
type         char(16)  type of the module, from the module types table
class        char(16)  class of module (node, network)
rack         int       ID of the rack that contains the module
unit         int       location of the module in the rack
psus         int       bitmap of the functioning power supply units
fans         int       bitmap of the functioning fans
estatus      char(16)  environmental status of the module
environment  text      description of the environmental status

Valid values for the type field are listed in the module types table (see Section 10.2.13).

The psus and fans ﬁelds are bitmaps; their width is controlled by the corresponding

values in the module types table (see Section 10.2.13).
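Given a bitmap and its width, a client can list which units are functioning and which are not. The Python sketch below assumes that bit 0 corresponds to unit 0 and that a set bit means the unit is functioning; the bit numbering is an assumption, since it is not stated here, and the function name is hypothetical.

```python
def unit_status(bitmap, width):
    """Split units 0 .. width-1 into (functioning, failed) lists.

    width is the number of PSUs or fans for the module type, taken from
    the psus or fans field of the module types table.
    """
    functioning = [i for i in range(width) if bitmap & (1 << i)]
    failed = [i for i in range(width) if not bitmap & (1 << i)]
    return functioning, failed


if __name__ == "__main__":
    # Three units, with the bit for unit 1 clear.
    print(unit_status(0b101, 3))  # ([0, 2], [1])
```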

rmsd collects environmental data from the kernel on each node. The operational status

of the cooling fans and power supplies is logged along with the temperature status of

vital system components. This is used to generate an environment status, estatus, and

an environment string, environment. The environment status can take one of the

values shown in Table B.3.

If the environment status is recorded as ok, the environment string contains

temperature readings from the CPU, power supply unit and enclosure. If a node has

more than one instance of each type of temperature sensor, the maximum of their values

is recorded.

Temperature information is recorded as a list of attribute-value pairs, for example:

ambient=15 cpu=40 psu=20

If an error occurs, the environment string contains details of what has failed. For

example, the following string indicates that CPU fan number 1 has failed on the

node:

cpu fan 1 failure
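A client reading the environment field therefore has to cope with both forms. The Python sketch below parses the attribute-value form into a dictionary and returns None for anything else, such as a failure message; it assumes that the readings are integers, as in the example above, and the function name is hypothetical.

```python
def parse_environment(text):
    """Parse "ambient=15 cpu=40 psu=20" into a dictionary of readings.

    Returns None if the string is not a list of attribute=value pairs,
    for example when it holds a failure description instead.
    """
    readings = {}
    for item in text.split():
        key, separator, value = item.partition("=")
        if not separator:
            return None
        readings[key] = int(value)
    return readings


if __name__ == "__main__":
    print(parse_environment("ambient=15 cpu=40 psu=20"))
    print(parse_environment("cpu fan 1 failure"))  # None
```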

10.2.13 The Module Types Table

The module_types table, shown in Table 10.18, contains descriptions of each of the

module types supported in a given release of the RMS. It is updated by the table

administration program, rmstbladm (see Page 5-44), when a new release is installed.

Table 10.18: Module Types Table

Field        Type      Description
name         char(16)  name of the module type
class        char(16)  class of module (node, network, storage)
units        int       height of the module in units
cpus         int       number of CPUs in the module
psus         int       number of PSUs in the module
fans         int       number of fans in the module
thermistors  int       number of thermistors in the module
description  text      description of the module

The module types supported in the current release are shown in Table 10.19.

Table 10.19: Valid Module Types

Name  Class  Units  CPUs  PSUs  Fans  Therm  Description

DS20  node  AlphaServer DS20
ES40  node  AlphaServer ES40
QM-S16  network  QM-S16 switch network
QM-S128  network  24  QM-S128 switch network

See Appendix A (Compaq AlphaServer SC Interconnect Terms) for more details of the

network modules.

10.2.14 The Nodes Table

The nodes table, shown in Table 10.20, contains configuration information on each node

in a machine. The entries are created by the RMS clients rmsbuild and rcontrol.

Fields are updated by rmsd when the node is booted or RMS is restarted and by the

Machine Manager, mmanager, when the node’s status or run level changes.

Table 10.20: Nodes Table

Field          Type      Description
name           char(16)  the name of the node
type           char(8)   node type, such as ES40
maxmem         int       maximum memory available in megabytes
maxfree        int       maximum free memory available in megabytes
maxswap        int       maximum swap space available in megabytes
maxtmp         int       temporary file system space in megabytes
cpus           1–32      number of CPUs
cpus_reserved  0–32      number of CPUs reserved for OS services
elans          int       mask of Elan devices present
netid          0–255     physical network ID (if applicable)
configured     0 | 1     whether node is configured in or out
status         char(16)  current node status
runlevel       char(16)  UNIX run level
boot_time      UTC       time when node was last booted
swap_eager     0 | 1     swap allocation is lazy (0) or eager (1)
console        char(32)  command line to connect to console

The type ﬁeld takes a value from the module types table (see Section 10.2.13).

The cpus_reserved ﬁeld speciﬁes the number of CPUs that are not available for

running parallel programs. These CPUs are reserved for running system services.

The configured ﬁeld indicates whether a node is active (1) or conﬁgured out for repair

or upgrade (0).

The status ﬁeld indicates the service level being provided by a node. Valid values are

shown in Table B.4. State changes are logged in the events table (see Section 10.2.6);

entries are keyed by class=node.

The runlevel may have one of the values shown in Table B.5.

The elans ﬁeld is a mask of the Elan devices present in the node. It has one bit set for

each device. In previous releases, only device 0 was supported.

10.2.15 The Node Statistics Table

The node statistics (node_stats) table, shown in Table 10.21, contains performance

statistics collected periodically by the rmsd daemon running on each node.

To enable the collection of these statistics, the node-statistics ﬁeld in the

attributes table (see Section 10.2.3) must be set to cpu. This is the default setting.

The interval at which the nodes are sampled for CPU statistics is controlled by the

attribute cpu-stats-poll-interval in the attributes table; the default is to

sample every 2 minutes.

The node statistics (node_stats) table can grow rapidly, especially on a large machine.

Running the table administration program, rmstbladm, with the -c option removes old

entries. This should be done periodically using a cron script. See Page 5-44 for details.

The stats-lifetime attribute in the attributes table (see Section 10.2.3)

determines how old the entries must be before they are removed.
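To give a feel for how quickly the table can grow, the following Python sketch estimates the number of rows added per day. It assumes that each sample adds one row per node and that the poll interval is expressed in seconds; neither detail is confirmed here, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def node_stats_rows_per_day(nodes: int, poll_interval_seconds: int = 120) -> int:
    """Estimate the rows added to node_stats per day.

    Assumptions: one row per node per sample, interval given in seconds.
    The default of 120 seconds matches the documented 2-minute default.
    """
    if nodes < 0 or poll_interval_seconds <= 0:
        raise ValueError("nodes must be >= 0 and the interval must be > 0")
    samples_per_day = SECONDS_PER_DAY // poll_interval_seconds
    return nodes * samples_per_day
```

With the default interval, a 128-node machine would add on the order of 92,000 rows per day under these assumptions, which is why periodic cleanup with rmstbladm -c is recommended.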

Table 10.21: Node Statistics Table

Name        Type      Description
name        char(16)  name of the node
ctime       UTC       time at which sample was collected
usercpu     int       user CPU time since last sample
syscpu      int       system CPU time since last sample
freemem     int       free memory in megabytes
ubc         int       size of the unified buffer cache in megabytes
wired       int       wired memory in megabytes
freeswap    int       free swap space in megabytes
pages       int       page fault rate
interrupts  int       interrupt rate (except clock)
syscalls    int       system call rate
users       int       number of users logged in
freetmp     int       free temporary file space in megabytes

The usercpu and syscpu statistics are percentages calculated over the period since the

last sample.

The interrupts, pages and syscalls statistics are rates averaged over the interval

since the last sample.
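The two kinds of statistic can be sketched in Python as follows. This is an illustration only: it assumes that rates are expressed per second and that a percentage is the busy time divided by the sampling interval. The manual does not say whether the percentages are per CPU or for the whole node, and both function names are hypothetical.

```python
def average_rate(previous_count: int, current_count: int,
                 interval_seconds: float) -> float:
    """Average event rate (events per second) between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return (current_count - previous_count) / interval_seconds

def cpu_percent(busy_seconds: float, interval_seconds: float) -> float:
    """CPU time used during the interval, as a percentage of the interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return 100.0 * busy_seconds / interval_seconds
```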

10.2.16 The Partitions Table

The nodes in a machine are grouped into partitions according to their function. For

example, there may be an administrative partition, a login partition and a parallel

programming partition. A set of partitions spanning the machine is called a

conﬁguration. Different conﬁgurations may be appropriate to different times of the day

or week. For example, one for daytime running and another for nights and weekends.

Only one conﬁguration, the active conﬁguration, can be running at a time.


The partitions table, shown in Table 10.22, describes how nodes are allocated to

partitions in each of the conﬁgurations. It also contains scheduling parameters (see also

Section 7.3) for each partition.

The entries in the partitions table are created by rcontrol. The information is

updated by the Partition Manager, pmanager, as it starts.

Table 10.22: Partitions Table

Field             Type      Description
name              char(16)  name of the partition, such as par
configuration     char(16)  name of the configuration, such as day
nodes             text      list of nodes in the partition
configured_nodes  text      list of nodes configured into the partition
cpus              int       number of CPUs configured in
freecpus          int       number of free CPUs
active            0 | 1     whether partition is active (1) or not (0)
startTime         UTC       time partition was last started
status            char(16)  status of the partition
timelimit         int       time limit in seconds for a parallel job
type              char(16)  partition type (parallel, login, general, batch)
timeslice         int       time slice interval in seconds
mincpus           int       minimum number of CPUs that can be allocated
memlimit          int       default memory limit in megabytes

Partition names do not have to be unique but the combination of a partition and a
configuration name must be unique. For example, a partition named par may appear in
more than one configuration, with one entry for each in the partitions table.
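The uniqueness rule can be checked with a short Python sketch such as the one below. It treats each row as a dictionary holding the name and configuration fields; the helper name and the configuration name night used in the usage note are hypothetical.

```python
from collections import Counter

def duplicate_partition_keys(rows):
    """Return the (name, configuration) pairs that occur more than once."""
    counts = Counter((row["name"], row["configuration"]) for row in rows)
    return sorted(key for key, count in counts.items() if count > 1)
```

A partition par defined once in day and once in night yields an empty list; a second par entry in day would be reported as ("par", "day").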

Valid values for the status of the partition are shown in Table B.6.

The type ﬁeld controls how jobs are scheduled on the partition (see also Section 7.2). If

the partition type is parallel then it is exclusively for gang-scheduled parallel

programs. Partitions of type login support interactive logins and load-balanced

sequential program execution. Partitions of type general support login shells,

load-balanced sequential program execution and parallel programs. Partitions of type

batch are under the exclusive control of a batch system. The batch system can use them

for sequential or parallel jobs but interactive use is prohibited.
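The rules above can be summarized as a lookup table. The following Python sketch is a simplification: the workload labels (interactive, sequential, parallel) and the function name are chosen for illustration, and the requirement that batch partitions are driven only by a batch system is not modeled.

```python
# Workloads each partition type supports, as described in the text above.
PARTITION_WORKLOADS = {
    "parallel": frozenset({"parallel"}),
    "login": frozenset({"interactive", "sequential"}),
    "general": frozenset({"interactive", "sequential", "parallel"}),
    "batch": frozenset({"sequential", "parallel"}),
}

def partition_supports(partition_type: str, workload: str) -> bool:
    """Return True if the partition type supports the given kind of workload."""
    try:
        return workload in PARTITION_WORKLOADS[partition_type]
    except KeyError:
        raise ValueError("unknown partition type: %r" % partition_type) from None
```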

The freecpus ﬁeld stores the count of the number of CPUs available in the partition. It

is updated by the pmanager each time CPUs are allocated or freed.


The configured_nodes ﬁeld stores the subset of nodes that were conﬁgured in when

the partition was started.

The timeslice ﬁeld stores the interval in seconds between periodic rescheduling of

parallel jobs. Time slicing is disabled when this ﬁeld is null, the default.

The timelimit ﬁeld stores the maximum interval in seconds for which CPUs in a

partition may remain allocated. Time limits are disabled when this ﬁeld is null, the

default.

The memlimit ﬁeld stores the default memory limit in megabytes per CPU for jobs

running on this partition.
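A minimal Python sketch of how these limits might be interpreted is given below. A null field is represented by None. Treating the limit as exceeded only when the allocated time is strictly greater than timelimit is an assumption, as are the function names.

```python
from typing import Optional

def time_limit_exceeded(allocated_seconds: int, timelimit: Optional[int]) -> bool:
    """Return True if CPUs have been allocated for longer than timelimit.

    A timelimit of None (null in the database) disables the check.
    """
    if timelimit is None:
        return False
    return allocated_seconds > timelimit

def default_memory_mb(memlimit_per_cpu: int, ncpus: int) -> int:
    """Default memory for a job: the per-CPU limit times the CPUs allocated."""
    return memlimit_per_cpu * ncpus
```

For example, with memlimit set to 512 a job running on 4 CPUs would have a default limit of 2048 megabytes in total.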

10.2.17 The Projects Table

The projects table, shown in Table 10.23, lists all of the projects that have been

deﬁned. A project is a list of users. Membership of the project is speciﬁed in the

projects ﬁeld of the users table (see Section 10.2.24). All accounting records include

the project to which a user’s job is being billed (see Table 10.2).

Table 10.23: Projects Table

Field        Type      Description
name         char(16)  project name
description  text      label describing the project

10.2.18 The Resources Table

The resources table, shown in Table 10.24, contains one entry for each current resource

request. The entries in this table are maintained by the Partition Manager, pmanager

(see Section 4.4).

The resources table can grow rapidly. Running the table administration program,

rmstbladm, with the -c option removes old entries. This should be done periodically

using a cron script. See Page 5-44 for details. The data-lifetime attribute in the

attributes table (see Section 10.2.3) determines how old the entries must be before

they are removed.

Table 10.24: Resources Table

Field      Type      Description
name       char(16)  resource name
partition  char(16)  partition name
username   char(16)  name of the user
hostnames  text      list of hostnames allocated
status     char(16)  status of the resource
cpus       text      list of CPUs allocated
nodes      text      list of nodes allocated
startTime  UTC       time resources were allocated
endTime    UTC       time resources were deallocated
priority   int       current priority of the request
flags      int       scheduler flags for the resource
ncpus      int       number of CPUs allocated
batchId    int       batch system identifier for the request
memlimit   int       memory allocated per CPU in megabytes
project    char(16)  name of the project associated with the resource
pid        int       pid of the allocating process (prun or allocate)

Resource names are sequence numbers generated automatically.

The hostnames ﬁeld lists the names of the nodes allocated to this request.

Valid values for the status ﬁeld are given in Table B.7.

The cpus and nodes ﬁelds contain lists of CPU and node numbers in use by a job. Each

pair of values deﬁnes a cpus x nodes box allocated to the job. The total number of

CPUs allocated is the sum of the area of these boxes.
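The box arithmetic can be sketched in Python as shown below. The textual format of the cpus and nodes fields is not specified here, so the sketch assumes whitespace-separated items, each either a single number or an inclusive range such as 0-3. Both the format and the function names are assumptions made for illustration.

```python
def _span(item: str) -> int:
    """Size of an inclusive range such as "0-3", or 1 for a single number."""
    first, sep, last = item.partition("-")
    low = int(first)
    high = int(last) if sep else low
    if high < low:
        raise ValueError("range end before start: %r" % item)
    return high - low + 1

def total_cpus(cpus: str, nodes: str) -> int:
    """Sum the areas of the cpus x nodes boxes described by the two lists."""
    cpu_items = cpus.split()
    node_items = nodes.split()
    if len(cpu_items) != len(node_items):
        raise ValueError("cpus and nodes must contain the same number of items")
    return sum(_span(c) * _span(n) for c, n in zip(cpu_items, node_items))
```

Under this assumed format, CPUs 0-3 on nodes 0-7 form one box of 32 CPUs; adding a second box of CPUs 0-1 on a single node brings the total to 34.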

The batchId field contains the batch system identifier for this request. If the request

was made by LSF then the ﬁeld contains LSB_JOBID. If the request was made by DPCS

then this ﬁeld contains PSUB_JOBID.

10.2.19 The Servers Table

The servers table, shown in Table 10.25, contains one entry for each RMS daemon. The

table administration program, rmstbladm (see Page 5-44), creates the entries in the

table. The daemons update their entries when they start up.

Table 10.25: Servers Table

Field      Type      Description
name       char(16)  server (daemon) name
hostname   char(16)  host on which the daemon is running
port       int       IP port number to bind to for this server
pid        int       process ID of the server
rms        0 | 1     whether daemon is an RMS server (1) or not (0)
startTime  int       time at which the daemon was started
autostart  0 | 1     whether daemon starts automatically (1) or not (0)
status     char(8)   server status
args       char(32)  site-specific arguments for the server

The hostname ﬁeld contains the name of the node on which the daemon should run, or

rmshost if it should run on the rmshost node.

The rms ﬁeld speciﬁes whether the server is an RMS daemon or a conventional UNIX

daemon. This controls the method used to determine whether or not the process is

running.

The autostart ﬁeld determines whether a daemon should be restarted automatically if

it exits or is killed by a signal.

10.2.20 The Services Table

The services table, shown in Table 10.26, is an internal table used by RMS to define the

command to execute for each service, the names of the hosts that support the command

and which users have permission to use the service. It contains one entry for each of the

RMS clients that provides a conﬁguration management service (for example, rmsquery

and rcontrol). The entries are created by the table administration program,

rmstbladm. See Chapter 5 (RMS Commands) for details of these services.

Table 10.26: Services Table

Field       Type      Description
name        char(16)  name of the service
hostname    char(16)  host on which the service runs, such as rmshost
group       char(8)   group(s) with access to the service
sequential  0 | 1     commands must wait for this command to finish
cmd         int       command to execute

The hostname ﬁeld contains the name of the host on which the service should run.

The group ﬁeld holds the name of the UNIX group which is allowed to run this service.


Currently, only rms is valid.

Some services, such as rcontrol, must have exclusive access to the database, requiring

that other transactions wait until they complete. The sequential ﬁeld should be set to

1 for these services. Others such as swctrl may run for long periods of time and should

not block the execution of other transactions. sequential should be set to 0 for these

services.

Sample records from the services table are shown in Table 10.27.

Table 10.27: Entries in the Services Table

name      hostname  group  sequential  command
rcontrol  rmshost   rms                /usr/opt/rms/bin/rcontrol
sql       rmshost   rms                /usr/opt/rms/bin/rmsquery

10.2.21 The Software Products Table

The software_products table, shown in Table 10.28, contains information about the

components that make up a software product.

Table 10.28: Software Products Table

Field        Type      Description
name         char(16)  name of the product
version      char(16)  version of the product
component    char(16)  name of the component
comptype     char(16)  type of component
compversion  char(32)  version of the component
compattr     text      component attributes

The only valid value for the comptype ﬁeld is subset.

The compattr ﬁeld currently contains one value which dictates where a software

component will be installed. The possible values are shown in Table 10.29.

Table 10.29: Component Attribute Values

Value  Description
opt    Component should only be installed on rmshost
root   Component should be installed on rmshost and the cluster root node(s)


10.2.22 The Switch Boards Table

The switch_boards table, shown in Table 10.30, contains one entry for each switch board in

the Compaq AlphaServer SC Interconnect. It is created and maintained by the Switch

Network Manager, swmgr (see Section 4.5).

Table 10.30: Switch Boards Table

Field        Type      Description
name         char(8)   board name
module       char(16)  name of module containing the board
layer        0–31      layer (or rail) number
slot         0–31      slot number in the module
type         char(16)  board type, such as QM401 or QM402
status       char(8)   board status (ok, absent, unknown, error)
environment  char(32)  temperature data from thermistors on the board

10.2.23 The Transactions Table

Changes to the state of the machine are made through a request entered in the

transactions table shown in Table 10.31. This table records who made each change,

when it was made and whether or not the operation was successful.

The Transaction Log Manager, tlogmgr (see Section 4.6), actions requests in the

transactions table, running commands on the user’s behalf (in practice, the user here

is a system administrator). This mechanism provides an audit trail, and sequential

ordering of changes in state.

Table 10.31: Transaction Log Table

Field     Type      Description
name      char(16)  name of the service
status    char(16)  transaction status
ctime     UTC       creation time
mtime     UTC       last modification time
handle    int       unique identifier for the transaction
logfile   char(32)  stdout or stderr log for the transaction
username  char(16)  user issuing the command
args      text      arguments for the command

Valid values for the transaction status ﬁeld are given in Table B.8.


An example of the transaction to add a partition is shown below in Table 10.32.

The handle is a unique number, generated automatically, which is passed to both the

service and the client. The service uses the handle to label any output resulting from the

transaction; the client uses the handle to select the resulting entries.

If the service fails, the output log (conventionally in the directory /var/rms/adm/log)

may contain useful diagnostics. Client applications wait for the transaction to complete

and then cat the logﬁle.

Table 10.32: Entry in the Transactions Table

name      status    handle  logfile
rcontrol  complete  44      /var/rms/adm/log/tr44.log

username  args
rms       create partition=login nodes='n[0-3]'
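The args value in this entry can be taken apart with a few lines of Python. The sketch below is not how rcontrol itself parses its arguments. It assumes shell-style quoting, key=value pairs after the first word, and a node list of the form prefix[first-last]; the function names are hypothetical.

```python
import re
import shlex

def parse_args(args: str):
    """Split an args string into its leading word and a dict of key=value pairs."""
    tokens = shlex.split(args)          # honours the quotes around n[0-3]
    verb, pairs = tokens[0], tokens[1:]
    return verb, dict(pair.split("=", 1) for pair in pairs)

def expand_nodes(spec: str):
    """Expand a node list such as n[0-3] into individual node names."""
    match = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\]", spec)
    if match is None:
        return [spec]                   # a single node name, not a range
    prefix, first, last = match.group(1), int(match.group(2)), int(match.group(3))
    return ["%s%d" % (prefix, i) for i in range(first, last + 1)]
```

Applied to the entry above, parse_args returns the word create with partition set to login and nodes set to n[0-3], and expand_nodes turns that node list into n0, n1, n2 and n3.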

10.2.24 The Users Table

The users table, shown in Table 10.33, contains information on each user’s projects.

Table 10.33: Users Table

Field     Type      Description
name      char(16)  login name
projects  text      list of the user’s projects

The projects ﬁeld may contain a single project name or a comma-separated list of

project names. The wildcard, *, may be speciﬁed as a project name denoting that the

user is a member of all projects.

The ordering of the names in the list is signiﬁcant: the ﬁrst project speciﬁed is the user’s

default project. For purposes of accounting, access control and scheduling, the default

project is assumed unless the user explicitly speciﬁes another project. A project can be

speciﬁed explicitly by using the environment variable RMS_PROJECT or by using the -P

option to prun or allocate.
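The rules for the projects field can be sketched in Python as follows. The requested argument stands for a project given through RMS_PROJECT or the -P option. Rejecting a request for a project the user does not belong to is an assumption, as are the function names and the project names used in the usage note.

```python
from typing import List, Optional

def parse_projects(field: str) -> List[str]:
    """Split the comma-separated projects field, preserving order."""
    return [name.strip() for name in field.split(",") if name.strip()]

def is_member(field: str, project: str) -> bool:
    """A user belongs to a project if it is listed or the wildcard is present."""
    projects = parse_projects(field)
    return "*" in projects or project in projects

def effective_project(field: str, requested: Optional[str] = None) -> Optional[str]:
    """Return the requested project if given, otherwise the default (first) one."""
    if requested is not None:
        if not is_member(field, requested):
            raise PermissionError("user is not a member of %r" % requested)
        return requested
    projects = parse_projects(field)
    return projects[0] if projects else None
```

For a user whose projects field is chem,physics the default project is chem; requesting physics explicitly selects physics instead.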


Compaq AlphaServer SC Interconnect Terms

A.1 Introduction

RMS includes support for programs that use Compaq AlphaServer SC Interconnect. This

appendix provides an introduction to Compaq AlphaServer SC Interconnect, deﬁning

terms used elsewhere in this manual.

Before an application process can use Compaq AlphaServer SC Interconnect, it must be

given an Elan capability (see Section C.2), describing the nodes and communications

contexts that it is allowed to use. In general, processes present this capability to the

kernel as they start.

Having granted a request for CPUs, RMS generates an appropriate capability and pushes

it into the RMS kernel module on each of the allocated nodes. The capabilities together

with information on the processes that make up the program can then be accessed

through the librmscall system call library (see Section C.3 for details).

Compaq AlphaServer SC Interconnect is a multistage switch network, also known as a

fat tree network. It is built from 8-way crosspoint switches, known as Elites. Each node is

connected to the network by a network adapter, the Elan. The connection of a 2-stage

(16-node) switch network is shown in Figure A.1.


Figure A.1: A 2-Stage, 16-Node, Switch Network

(Diagram omitted. It labels the top switches at Level 0, the uplinks, Level 1, and the network adapters at Level 2, with planes numbered 0 to 3 and network IDs numbered 0 to 15.)

The level is the index of the stage, starting with 0 at the top. Note that in a 2-stage

switch network the Elans are at level 2. Each component has a network ID that

describes how to reach it from the top of the network. The plane is the index of switches

that have the same switch network ID. The interconnection of a 3-stage (64-node) switch

network is shown in Figure A.2.

Figure A.2: A 3-Stage, 64-Node, Switch Network

(Diagram omitted. It labels Levels 0 to 3, the top switches, the uplinks, planes numbered 0 to 15 and 0 to 3, and network IDs 0, 16, 32, 48 and 63.)

The 3-stage switch network is composed from four 2-stage networks. Each 2-stage

network connects sixteen nodes and has sixteen free uplinks. These uplinks connect the

four 2-stage networks to an additional level of switches to form a 3-stage network,

connecting up to 64 nodes.
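The sizes quoted above follow a simple pattern, sketched here in Python. The sketch assumes that each 8-way switch uses four links downwards and four upwards, which is consistent with a 2-stage network connecting 16 nodes and leaving 16 uplinks free. The switch count is derived from that assumption and is not a figure taken from this manual; it applies only to the fully expandable form of the network.

```python
def fat_tree_nodes(stages: int) -> int:
    """Nodes connected by a network of the given number of stages (4 per switch)."""
    if stages < 1:
        raise ValueError("a network needs at least one stage")
    return 4 ** stages

def fat_tree_switches(stages: int) -> int:
    """Switches needed, assuming every stage holds 4**(stages - 1) switches."""
    if stages < 1:
        raise ValueError("a network needs at least one stage")
    return stages * 4 ** (stages - 1)
```

Two stages give 16 nodes and three stages give 64, matching the figures; under the stated assumption these networks would use 8 and 48 switches respectively.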


Four such 64-node networks and an additional stage of switches can be used to construct

a 256-way network. Alternatively, the unused uplinks can be used to double the number

of nodes a switch can connect. This avoids the need to add an additional switch stage but

the resulting network cannot be expanded further. This technique is used in the

128-node network, shown in Figure A.3.

Figure A.3: A 3-Stage, 128-Node, Switch Network

There are switch network modules that connect up to 16 or 128 nodes. The 16-node

network is a rack-mountable module containing a single network board. The 128-node

network is a rack-mountable chassis containing up to 24 network boards: 8 at the front

connecting the nodes to the lower stages; and 16 at the rear providing the upper stages

of the network. A central backplane joins the stages. These switch modules may be

partially populated for networks containing fewer than 128 nodes.

The number of nodes and switches in these networks is shown in Table A.1. The number

of switches refers to the total number of Elite ASICs required to construct the network.

The number of hops refers to the maximum number of links traversed for nodes that

have to communicate through a top switch. The bidirectional nature of the links means

that trafﬁc can be localized to a subtree large enough to span both nodes.

Table A.1: Switch Network Parameters

Name     Levels  Nodes  Switches  Hops
QM-S16
QM-S128          128

The Elan performs automatic routing and broadcast communications. Using the switch


network, data can be broadcast directly to a contiguous range of processors: data is

routed up to a node in the tree from which all processors can be reached; then the data is

routed down to all switch outputs in the broadcast range on the way down. Data can be

recombined as it travels through the network to support global reduction operations and

barrier synchronization.

Multiple Elan network adapters may be installed per node, each connected to a different

switch network. This replication can increase fault tolerance and bisectional bandwidth,

assuming each Elan is attached to a separate PCI bus. Each separate Elan/Elite network

attached to a node is known as a layer (or a rail).

The switch network is described by three tables in the database. The switch_boards

table (see Section 10.2.22) gives details of each board, its status and its position in the

machine. The elans table (see Section 10.2.4) and the elites table (see Section 10.2.5)

describe the position in the switch network of each component, its attributes and its

current link state and errors.

RMS includes the control and monitoring daemon, swmgr (see Section 4.5), for managing

the switch network. swmgr probes the switch network control interface for switch boards

to determine the size of the network. It then creates or updates the entries in the elans

table and the elites table. Having done this, the swmgr uses the switch network

control interface to extract error and performance data. This interface is also used for

link continuity (boundary scan) testing.

A.2 Link States

The state of each link in the switch network is recorded in the linkstate field in the

elites table (see Section 10.2.5). Valid values for the states are shown in Section B.4.

Links are normally in the connected state (C). Unconnected links will be in the reset

state (R). Links will be in the unknown state (U) if the swmgr has not run or if the control

cable is not attached to the switch. The states acking (A) and nacking (N) are set by the

switch control software.
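Section B.4 describes the link state as an eight-character field, one character per link. A minimal Python sketch for decoding such a value is shown below. It uses only the five state letters named above; whether further letters exist is not known from this text, so any other character is reported as undocumented rather than rejected. The function name is hypothetical.

```python
# State letters taken from the description above.
LINK_STATES = {
    "C": "connected",
    "R": "reset",
    "U": "unknown",
    "A": "acking",
    "N": "nacking",
}

def decode_linkstate(field: str) -> list:
    """Translate an eight-character link state field into state names."""
    if len(field) != 8:
        raise ValueError("expected one character for each of the 8 links")
    return [LINK_STATES.get(ch, "undocumented:" + ch) for ch in field]
```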

A.3 Link Errors

The swmgr logs network errors to the link_errors table (see Section 10.2.11). The

description contains information that should be used in reporting a problem with the

switch network.


RMS Status Values

B.1 Overview

This appendix lists the various states that RMS objects can enter. State information is

stored in the status ﬁeld of the RMS table for the object in question. For example, the

current state of a partition is held in the partitions table (see Section 10.2.16), and

the current state of a node is entered in the nodes table (see Section 10.2.14).

Status changes are recorded in the events table (see Section 10.2.6). Entries in the

events table are identiﬁed by class=X and name=N where X is the class of object and N

is its name. For most tables, the name ﬁeld forms the primary key. In the case of a

partition, both the name and the configuration ﬁelds are required to deﬁne a unique

entry. In the case of an access control, both the name and partition ﬁelds are required.

Status values are shown for the following objects:

• Jobs (see Section B.3)

• Links (see Section B.4)

• Modules (see Section B.5)

• Nodes (see Section B.6)

• Partitions (see Section B.7)

• Resources (see Section B.8)

• Transactions (see Section B.9)


B.2 Generic Status Values

There are three generic status values:

ok       This state means that an object is functioning correctly as far as the
         relevant RMS daemon can tell.

error    This state means that one or more errors have been detected. A
         description of the problem will be found in the event record.

unknown  This state means that the RMS daemon responsible for an object
         either has not run or is unable to determine the state of the object.

Where an object, such as a switch board, has many component status values, the ok

state means that all component values are ok. If one or more of the components are in

error then the status will be error.
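The combination rule can be written as a short Python sketch. The text defines the ok and error cases only; returning unknown when no component is in error but not all are ok is an assumption, and the function name is hypothetical.

```python
def aggregate_status(component_statuses) -> str:
    """Combine component status values into a single object status."""
    statuses = list(component_statuses)
    if any(status == "error" for status in statuses):
        return "error"              # one or more components in error
    if all(status == "ok" for status in statuses):
        return "ok"                 # every component is ok
    return "unknown"                # assumption: anything else is unknown
```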

B.3 Job Status Values

The status of each job is stored in the status ﬁeld of the jobs table. It is updated by

the partition manager when the job is started, suspended, resumed or completed. Valid

job status strings are shown in Table B.1.

Table B.1: Job Status Values

Status     Description
running    Processes are scheduled
suspended  All processes are suspended
finished   All processes have exited
hung       One or more nodes is not responding
aborted    User aborted job
failed     Job failed
killed     Job was killed by a signal
expired    Time limit expired
syskill    Job was killed by an administrative user
unknown    Partition is blocked or down

If a job is killed because one of the nodes it was running on has crashed or was

conﬁgured out then its ﬁnal status value will be failed.


B.4 Link Status Values

Each switch (see Appendix A (Compaq AlphaServer SC Interconnect Terms)) has an

entry in the elites table. Each switch has eight links and the state of each of these

links is recorded in the linkstate ﬁeld of the elites table. The ﬁeld holds eight

characters, one for each link. Valid values for the characters are as shown in Table B.2.

SEE ALSO

rms_getprgid(3)


rms_prgids(3)

NAME

rms_prgids, rms_prginfo, rms_getprgid – Get information on a program or programs

SYNOPSIS

cc [ flag ... ] file ... -lrmscall [ library ... ]

#include <rms/rmscall.h>

int rms_prgids(int maxids, int *ids, int *nids);

int rms_prginfo(int id, int maxids, pid_t *pids, int *nids);

int rms_getprgid(int pid, int *id);

PARAMETERS

id      Program identifier.
pid     Process identifier.
maxids  Maximum number of identifiers to be returned.
ids     Array of program identifiers.
pids    Array of process identifiers.
nids    Number of program or process identifiers returned.

DESCRIPTION

rms_prgids() returns the identiﬁers of each active program. rms_prginfo() returns

the identiﬁers for each process belonging to a particular parallel program – the current

program if id is negative. rms_getprgid() returns the program identiﬁer (if any) for a

particular process – the current process if pid is negative.

RETURN VALUES

Upon successful completion, rms_prgids(), rms_prginfo() and rms_getprgid()

return 0. Otherwise, they return -1 and set errno to indicate the error.

EACCES  Caller is not permitted to perform this operation.
EINVAL  Count of number of array elements is invalid.
EFAULT  Array address is invalid.
ENOMEM  Insufficient kernel memory to perform this operation.
ESRCH   Process or program does not exist.

SEE ALSO

rms_prgcreate(3)


rms_prgsuspend(3)

NAME

rms_prgsuspend, rms_prgresume, rms_prgsignal – Suspend or resume the

processes in a program, deliver a signal to all processes in a program

SYNOPSIS

cc [ flag ... ] file ... -lrmscall [ library ... ]

#include <rms/rmscall.h>

int rms_prgsuspend(int id);

int rms_prgresume(int id);

int rms_prgsignal(int id, int signo);

PARAMETERS

id     Program identifier.
signo  Signal number.

DESCRIPTION

rms_prgsuspend() suspends all of the processes in a program. The RMS suspends a

parallel program by calling rms_prgsuspend() on each node that it is using.

rms_prgsuspend() requires administrator privileges. rms_prgresume() resumes all

of the processes in a program. The RMS resumes a parallel program by calling

rms_prgresume() on each node that it is using. rms_prgresume() requires

administrator privileges.

rms_prgsignal() sends a signal to all of the processes in a program. The RMS delivers

signals to a parallel program by calling rms_prgsignal() on each node that it is using.

The function is also used to conﬁrm that all processes belonging to a program have

exited. rms_prgsignal() can be called by the owner of the program or a process with

administrator privileges.

RETURN VALUES

Upon successful completion, rms_prgsuspend(), rms_prgresume() and

rms_prgsignal() return 0. Otherwise, they return -1 and set errno to indicate the

error.

EACCES  Caller is not permitted to perform this operation.
ESRCH   No such program identifier.
EINVAL  Invalid signal number.

SEE ALSO

rms_prgcreate(3)


rms_prgaddcap(3)

NAME

rms_prgaddcap, rms_setcap – Associate Elan capabilities with a program or process

SYNOPSIS

cc [ flag ... ] file ... -lrmscall [ library ... ]

#include <rms/rmscall.h>

int rms_prgaddcap(int id, int index, ELAN_CAPABILITY *cap);

int rms_setcap(int index, int context);

PARAMETERS

id       Program identifier.
index    Index of the capability for this program.
cap      Pointer to a capability.
context  Context number for this process.

DESCRIPTION

rms_prgaddcap() and rms_setcap() associate Elan capabilities with a program and

its processes. The function rms_prgaddcap() adds a new capability to a program. It is

called once for each rail in use by the program. Each capability deﬁnes the range of node

numbers and Elan hardware contexts available to a parallel program. Capabilities are

indexed from 0 to ncaps-1 where ncaps is the number of capabilities allocated.

rms_prgaddcap() requires administrator privileges. It is called by rmsd as it creates a

parallel program.

The function rms_setcap() assigns Elan hardware context numbers to the current

process. It is called by the RMS application loader, rmsloader, as it creates each new

application process. The contexts assigned must lie within a previously assigned

capability for the program.

RETURN VALUES

Upon successful completion, rms_prgaddcap() and rms_setcap() return 0.

Otherwise, they return -1 and set errno to indicate the error.

EACCES  Caller is not permitted to perform this operation.
ENOMEM  There was insufficient memory to perform this operation.
ESRCH   Program does not exist.
EFAULT  Capability has invalid address.
EINVAL  Invalid context number (rms_setcap() only).

SEE ALSO

rms_ncaps(3)


rms_ncaps(3)

NAME

rms_ncaps, rms_getcap – Return information on the Elan capabilities allocated to a

process in a parallel program

SYNOPSIS

cc [ flag ... ] file ... -lrmscall [ library ... ]

#include <rms/rmscall.h>

int rms_ncaps(int *ncaps);

int rms_getcap(int index, ELAN_CAPABILITY *cap);

PARAMETERS

ncaps  Number of capabilities allocated.
index  Index of a capability to be returned.
cap    Pointer to a capability.

DESCRIPTION

rms_ncaps() returns the number of Elan capabilities allocated to a program.

rms_getcap() returns a speciﬁed Elan capability. Capabilities are indexed from 0 to

ncaps-1.

RETURN VALUES

Upon successful completion, rms_ncaps() and rms_getcap() return 0. Otherwise,

they return -1 and set errno to indicate the error.

EFAULT  Invalid address.
EINVAL  Invalid capability identifier.
EEXIST  Calling process is not part of a parallel program.

SEE ALSO

rms_prgaddcap(3)


rms_prggetstats(3)

NAME

rms_prggetstats – Return resource usage information for a program

SYNOPSIS

cc [ flag ... ] file ... -lrmscall [ library ... ]

#include <rms/rmscall.h>

int rms_prggetstats(int id, prgstats_t *stats);

PARAMETERS

id     Program identifier.
stats  Pointer to a program statistics structure.

DESCRIPTION

rms_prggetstats() returns resource usage information for the processes of a parallel

program on the calling node. The RMS kernel module sums resource usage over the

processes in a program. The statistics returned by rms_prggetstats() are the sum

over all processes belonging to program id on this node, including those that have

already exited.

Setting id to -1 instructs the RMS kernel module to return values for the caller’s

program. Resource utilization statistics are available to the owner of the program and to

any process with administrator privileges.

/* program statistics */
typedef struct {
    uint64_t etime;     /* elapsed cpu time (millisecs)   */
    uint64_t atime;     /* allocated cpu time (millisecs) */
    uint64_t utime;     /* user cpu time (millisecs)      */
    uint64_t stime;     /* system cpu time (millisecs)    */
    int      ncpus;     /* number of cpus allocated       */
    int      flags;     /* program status flags           */
    int      mem;       /* max memory size in megabytes   */
    int      pageflts;  /* number of page faults          */
    uint64_t memint;    /* memory integral                */
} prgstats_t;


The elapsed time statistic etime is the time in millisecs since the program was created.

The allocated time statistic atime is the time in millisecs for which CPUs have been

allocated multiplied by the number of CPUs allocated. The utime and stime statistics

are summed over the processes that make up the program (on this node).
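As an illustration of how these fields relate, the following sketch computes the fraction of allocated CPU time that a program actually consumed. It is a minimal example under stated assumptions: demo_stats_t is a local stand-in for the three prgstats_t fields involved, and demo_cpu_fraction() is a hypothetical helper, not an RMS function. For example, 4 CPUs allocated for 2000 millisecs give an atime of 8000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in holding only the fields this sketch needs. Real code
 * would use prgstats_t from <rms/rmscall.h>, filled in by
 * rms_prggetstats(). */
typedef struct {
    uint64_t atime;   /* allocated cpu time (millisecs) */
    uint64_t utime;   /* user cpu time (millisecs) */
    uint64_t stime;   /* system cpu time (millisecs) */
} demo_stats_t;

/* Hypothetical helper (not part of RMS): fraction of the allocated CPU
 * time that was consumed as user plus system time. */
double demo_cpu_fraction(const demo_stats_t *s)
{
    assert(s != NULL);
    if (s->atime == 0)
        return 0.0;   /* nothing allocated yet, so avoid dividing by zero */
    return (double)(s->utime + s->stime) / (double)s->atime;
}
```

In a real program the three values would be copied from the structure returned by rms_prggetstats() before calling the helper.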

If one or more processes belonging to the program are still running, the flags field will

contain the value PRG_RUNNING. This will be replaced by PRG_ZOMBIE when the last

process has exited. The program description should be destroyed when this value is seen.

The Partition Manager periodically sums these statistics over the nodes used to run a

parallel program, updating its entry in the accounting statistics (acctstats) table.

RETURN VALUES

Upon successful completion, rms_prggetstats() returns 0. Otherwise, it returns -1

and sets errno to indicate the error.

EACCES

Caller is not permitted to perform this operation.

EFAULT

Invalid address for statistics array.

ESRCH

No such program.

SEE ALSO

rms_prginfo(3)


RMS Application Interface

D.1 Introduction

The RMS application interface is provided so that external scheduling modules can make

inquiries about the availability of resources, allocate and deallocate CPUs and perform

job control operations.

The application interface is provided as a dynamic library librmsapi.so. Function

prototypes are deﬁned in the header ﬁle <rms/rmsapi.h>.


rms_allocateResource(3)

NAME

rms_allocateResource, rms_deallocateResource – Allocate or deallocate a resource

SYNOPSIS

cc [ flag ... ] file ... -lrmsapi -lrms [ library ... ]

#include <rms/rmsapi.h>

int rms_allocateResource(char *partition, int cpus, int baseNode,

int nodes, uid_t uid, char *project,

char *requestFlags);

int rms_deallocateResource(int rid);

PARAMETERS

partition

Partition containing the resources.

cpus

Total number of CPUs to allocate.

baseNode

ID of the first node to allocate.

nodes

Number of nodes to allocate.

uid

User on whose behalf the resource should be allocated.

project

User's project name.

requestFlags

The request flags currently supported are as follows:

immediate=0 | 1

With a value of 1, this specifies that the request should fail if resources are not available immediately.

hwbcast=0 | 1

With a value of 1, this specifies a contiguous range of nodes and constrains the scheduler to queue the request until a contiguous range becomes available.

rails=n

In a multirail system, this specifies the number of rails required, where 1 ≤ n ≤ 32.

Multiple request flags can be given as a comma-separated list; immediate=1,hwbcast=1, for example.


rid

ID of the resource to deallocate.
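The requestFlags argument is a plain comma-separated string, so callers may find it convenient to build it from individual settings. The following is a minimal sketch of one way to do that. demo_request_flags() is a hypothetical helper, not part of librmsapi, and the convention that a rails value of 0 omits the rails flag is an assumption made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of RMS): write a requestFlags string
 * such as "immediate=1,hwbcast=1" into buf. A rails value of 0 leaves
 * the rails flag out. Returns 0 on success, or -1 if rails is outside
 * 0..32 or the buffer is too small. */
int demo_request_flags(char *buf, size_t len,
                       int immediate, int hwbcast, int rails)
{
    int n;
    size_t used;

    assert(buf != NULL);
    if (rails < 0 || rails > 32)
        return -1;

    n = snprintf(buf, len, "immediate=%d,hwbcast=%d",
                 immediate ? 1 : 0, hwbcast ? 1 : 0);
    if (n < 0 || (size_t)n >= len)
        return -1;

    if (rails > 0) {
        /* Append the optional rails flag after the two boolean flags. */
        used = strlen(buf);
        n = snprintf(buf + used, len - used, ",rails=%d", rails);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
    }
    return 0;
}
```

The resulting string can be passed as the requestFlags argument of rms_allocateResource().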

DESCRIPTION

rms_allocateResource() allocates CPUs from a named partition. If partition is

NULL, the default partition is used, otherwise the named partition must exist. You can

optionally specify the base node and the number of nodes (as with the allocate and

prun commands). Alternatively, this can be left to the scheduler by passing the value

RMS_UNASSIGNED. If the requested CPUs are not available, the request will block unless

the immediate flag has been set, in which case it will fail.

If the caller belongs to the rms group then rms_allocateResource() can be used to

allocate CPUs on behalf of another user identiﬁed by uid. In this case, the CPUs will be

available to this user only. If project is not null, the request is subject to the usage

restrictions of, and is accounted to, the speciﬁed project, which must exist. If project is

null, the user’s default project applies.

To run a program on the speciﬁed resource, the environment variable RMS_RESOURCEID

must be set to the value partition.rid (where partition is the name of the partition

and rid is the resource id returned by rms_allocateResource) before executing

prun.
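The value of RMS_RESOURCEID is the partition name and the resource id joined by a dot. The following sketch shows one way to format it. demo_resource_id() is a hypothetical helper, not part of librmsapi, and the partition name "parallel" used below is only an illustrative value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of RMS): format "partition.rid" into
 * buf. Returns 0 on success, or -1 if the partition name is missing,
 * rid is negative (rms_allocateResource() failed), or buf is too small. */
int demo_resource_id(char *buf, size_t len, const char *partition, int rid)
{
    int n;

    assert(buf != NULL);
    if (partition == NULL || strlen(partition) == 0 || rid < 0)
        return -1;

    n = snprintf(buf, len, "%s.%d", partition, rid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With partition "parallel" and a resource id of 42, the helper produces parallel.42. On a POSIX system the string could then be exported with setenv("RMS_RESOURCEID", buf, 1) before prun is executed.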

rms_deallocateResource() deallocates a resource that is no longer in use.

RETURN VALUES

Upon successful completion, rms_allocateResource() returns the ID of the resource

allocated. This value should be passed to subsequent calls. A negative integer is

returned on error. The supported error codes are as follows:

-1 Request cannot be met.

-2 Request cannot be met now and immediate was set to 1.

rms_deallocateResource() returns 0 on success and -1 on error.

SEE ALSO

rms_suspendResource(3), rms_defaultPartition(3)


rms_run(3)

NAME

rms_run – Run a program on an allocated resource

SYNOPSIS

cc [ flag ... ] file ... -lrmsapi -lrms [ library ... ]

#include <rms/rmsapi.h>

int rms_run(int rid, char *cmd, char **args, char *jobFlags);

PARAMETERS

rid

Resource id.

cmd

Command to execute.

args

Arguments for the command.

jobFlags

The job flags currently supported are as follows:

tag=0 | 1

With a value of 1, this specifies that output from each process should be tagged by the process id.

verbose=n

Set the level of verbose output from the program. Supported values are 0 (quiet), 1 (minimal output), and 2 (full output).

Multiple job flags can be given as a comma-separated list; tag=1,verbose=1, for example.

DESCRIPTION

rms_run() starts a parallel program on a previously allocated resource. Any stdio to

and from the program is forwarded while one or more processes are running.

The call returns when the program completes.

RETURN VALUES

Upon successful completion, rms_run() returns the global OR of the exit status values

of the processes in the parallel program.
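To make the meaning of a global OR concrete, the following sketch combines a set of exit status values the same way. It is an illustration only: demo_global_or() is a hypothetical helper, not an RMS function, and the sketch assumes the status values are combined with a plain bitwise OR. Under that assumption the result is 0 only if every process exited with status 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of RMS): bitwise OR of n exit status
 * values, as a model of the value returned by rms_run(). */
int demo_global_or(const int *status, size_t n)
{
    int result = 0;
    size_t i;

    assert(status != NULL || n == 0);
    for (i = 0; i < n; i++)
        result |= status[i];
    return result;
}
```

Because the values are ORed together, the individual exit codes cannot in general be recovered from the result; a caller can only rely on zero meaning that all processes succeeded.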


SEE ALSO

rms_allocateResource(3)


rms_suspendResource(3)

NAME

rms_suspendResource, rms_resumeResource, rms_killResource – Job control

operations on allocated resources

SYNOPSIS

cc [ flag ... ] file ... -lrmsapi -lrms [ library ... ]

#include <rms/rmsapi.h>

int rms_suspendResource(int rid);

int rms_resumeResource(int rid);

int rms_killResource(int rid, int signo);

PARAMETERS

rid

ID of the resource.

signo

Signal to send.

DESCRIPTION

rms_suspendResource() and rms_resumeResource() suspend and resume a

resource speciﬁed by rid. The caller must be either the owner of the resource or a

member of the rms group.

rms_killResource() sends a signal to all of the processes in all of the jobs running on

a speciﬁed resource. The caller must be either the owner of the resource or a member of

the rms group.

RETURN VALUES

Upon successful completion, rms_suspendResource(), rms_resumeResource() and

rms_killResource() return 0. On error they return a negative integer.

SEE ALSO

rms_allocateResource(3)


rms_defaultPartition(3)

NAME

rms_defaultPartition, rms_numCpus, rms_numNodes, rms_freeCpus – Provide

information on RMS partitions

SYNOPSIS

cc [ flag ... ] file ... -lrmsapi -lrms [ library ... ]

#include <rms/rmsapi.h>

char *rms_defaultPartition();

int rms_numCpus(char *partition);

int rms_numNodes(char *partition);

int rms_freeCpus(char *partition);

PARAMETERS

partition

Name of an active partition.

DESCRIPTION

rms_defaultPartition() returns the name of the default partition, if one exists.

rms_numCpus() returns the total number of CPUs in the named partition.

rms_numNodes() returns the total number of nodes in the named partition.

rms_freeCpus() returns the number of free CPUs in the named partition.

The calling process must run on a node in the Compaq AlphaServer SC system.

RETURN VALUES

rms_defaultPartition() returns NULL on error. Other functions return 0 or greater

on success or -1 on error.

SEE ALSO

rms_allocateResource(3)


Accounting Summary Script

E.1 Introduction

This appendix describes the example accounting summary script included in

/usr/opt/rms/examples/scripts/accounting_summary and referred to in

Section 9.4.3.

• Section E.2 describes the command line interface.

• Section E.3 shows a sample of output from the script.

• Section E.4 is a listing of the script.

E.2 Command Line Interface

The script has the following command line interface:

accounting_summary [ -hd [-u | -p] [-M | -H] ] [days]

The options are as follows:

-h

Display help on the options.

-d

After outputting the accounting summary, delete the accounting records of all resource requests that have completed.

-u

Sort the records by user name and then by project name.

-p

Sort the records by project name and then by user name. This is the default.

-M

Show time in minutes rather than seconds.

-H

Show time in hours rather than seconds.

days

Show statistics for the specified number of days. By default, statistics are shown for the previous day only.

The script processes the arguments passed to it on the command line and generates a

SQL query which it passes to rmsquery. The query acts on two tables in the RMS

database: the accounting statistics (acctstats) table, and the resources table. The

information returned by the query is formatted to produce output as shown in

Section E.3.
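The listing in Section E.4 derives the lower time bound of the query by subtracting the requested number of days, in seconds, from the current time reported by rmsgettime, and divides the reported times by 60 or 3600 when -M or -H is given. The following sketch restates that arithmetic. The function names are hypothetical and are not part of RMS or of the script.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECS_PER_DAY (60 * 60 * 24)

/* Hypothetical helper: earliest start time, in seconds, that the
 * summary covers when it is asked for the previous `days` days. */
int64_t demo_start_time(int64_t now, int days)
{
    assert(days >= 0);
    return now - (int64_t)DEMO_SECS_PER_DAY * days;
}

/* Hypothetical helper: convert a time in seconds to the unit selected
 * by -H (hours) or -M (minutes); with neither flag it stays in seconds. */
double demo_scale_seconds(double secs, int hours, int minutes)
{
    if (hours)
        return secs / 3600.0;
    if (minutes)
        return secs / 60.0;
    return secs;
}
```

Only records whose started field is greater than the computed start time are selected by the query.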

If the -d option is speciﬁed on the command line, after printing the accounting

summary, the script generates another SQL query to delete all accounting records that

have their running ﬁeld set to 0, denoting that the resource request has completed.

If a query fails, the script outputs an error message.

E.3 Example Output

An example of using the script, together with the output produced, follows. After

running the script, all of the accounting records for resource requests that have ﬁnished

are deleted.

# accounting_summary -d

Accounting Summary of Machine atlas at 16:01 Wed 21 Feb 2001

Usage by Project/User For Previous Day

Project    User                CPU          User           Sys Number
Name       Name               Secs          Secs          Secs Sessions

-------------------------------------------------------------------------

default

addy

596

540

29272

286

533

227

duncan

johnt

root

stephen

134

-------------------------------------------------------------------------

Total default 30751 885 201 122

-------------------------------------------------------------------------

Grand Total 30751 885 201 122

-------------------------------------------------------------------------


E.4 Listing of the Script

#!/bin/sh
#######################################################################
# accounting_summary
#######################################################################

usage() {
    echo "Usage : $sname [ -u -p -d [ -M | -H ] ] [ days ]"
}

help() {
    usage
    echo "\t-h\tThis help message"
    echo "\t-p\t'Project' is primary sort field (default)"
    echo "\t-u\t'User' is primary sort field"
    echo "\t-M\tShow time in minutes rather than seconds"
    echo "\t-H\tShow time in hours rather than seconds"
    echo "\t-d\tDelete all 'not running' accounting records after producing summary"
    exit 0
}

# main
sname="accounting_summary"

# locate the RMS commands for this operating system
OS=`uname`
if [ "$OS" = "Linux" ]; then
    RMSPATH="/usr/bin"
else
    if [ "$OS" = "OSF1" ]; then
        RMSPATH="/usr/opt/rms/bin"
    else
        RMSPATH="/opt/rms/bin"
    fi
fi

RMSGETTIME=${RMSPATH}/rmsgettime
RMSQUERY=${RMSPATH}/rmsquery
tmpfile="/tmp/accounting_summary_$$"

if [ -x /bin/gawk ]; then
    AWK="/bin/gawk"
else
    AWK="/usr/local/bin/gawk"
fi

primary="project"
delete=""
hours=""
minutes=""


# parse the options
while [ $# -gt 0 ]; do
    option=`echo $1 | sed "s/^-//"`
    if [ "$option" = "$1" ]; then
        break
    fi
    if [ "$option" = "p" ]; then
        primary="project"
    elif [ "$option" = "u" ]; then
        primary="user"
    elif [ "$option" = "d" ]; then
        delete="1"
    elif [ "$option" = "M" ]; then
        if [ "$hours" = "1" ]; then
            echo "$sname: ERROR : -M and -H are mutually exclusive"
            exit 1
        fi
        minutes="1"
    elif [ "$option" = "H" ]; then
        if [ "$minutes" = "1" ]; then
            echo "$sname: ERROR : -M and -H are mutually exclusive"
            exit 1
        fi
        hours="1"
    elif [ "$option" = "h" ]; then
        help
    else
        echo "$sname: ERROR : invalid option $1"
        usage
        exit 1
    fi
    shift
done

if [ $# -gt 0 ]; then
    days=$1
    shift
else
    days=1
fi

if [ $# -gt 0 ]; then
    usage
    exit 1
fi

# start of the reporting period, in seconds
now=`$RMSGETTIME`
secsperday=`expr 60 \* 60 \* 24`
daysecs=`expr $secsperday \* $days`
starttime=`expr $now - $daysecs`

if [ "$primary" = "project" ]; then
    primarytitle="Project"
    secondarytitle="User"
    querystr="select \
        acctstats.project,resources.username, \
        acctstats.atime,acctstats.utime,acctstats.stime \
        from resources,acctstats \
        where acctstats.started > $starttime and resources.name=acctstats.name \
        order by acctstats.project,resources.username"
else
    primarytitle="User"
    secondarytitle="Project"
    querystr="select \
        resources.username,acctstats.project, \
        acctstats.atime,acctstats.utime,acctstats.stime \
        from resources,acctstats \
        where acctstats.started > $starttime and resources.name=acctstats.name \
        order by resources.username,acctstats.project"
fi

machine=`rinfo -m`

/bin/rm -f $tmpfile
$RMSQUERY $querystr > $tmpfile
if [ $? -ne 0 ]; then
    echo "$sname : ERROR : $RMSQUERY $querystr FAILED"
    exit 1
fi

cat $tmpfile | \
$AWK 'BEGIN {
    primary = ""
    secondary = ""
    nvalues = 3
    for (i=1; i<=nvalues; i++) {
        values[i] = 0
        primvalues[i] = 0
        grandvalues[i] = 0
    }
    recs = 0
    primrecs = 0
    grandrecs = 0
    printprim = 1
}

function printsortfields() {
    if (printprim == 1) {
        printf ("%-10.10s %-8.8s ", primary, secondary)
        printprim = 0
    } else {
        printf ("\t   %-8.8s ", secondary)
    }
}

function printdashes() {
    printf ("---------------------------------------------------------------------\
----\n")
}

function printvals(vals,    i) {
    for (i=1; i<=nvalues; i++) {
        if (hours == 1 || minutes == 1) {
            printf (" %13.2f", vals[i])
        } else {
            printf (" %13.0f", vals[i])
        }
    }
}

NF > 0 {
    if ($1 != primary) {
        if (primary != "") {
            # close the previous primary group
            printsortfields()
            printvals(values)
            printf (" %6d\n", recs)
            printdashes()
            printf ("Total %-10.10s    ", primary)
            printvals(primvalues)
            printf (" %6d\n", primrecs)
            printdashes()
        } else {
            # first record: print the title and column headings
            datestr = strftime("%H:%M %a %d %b %Y")
            titlestr = sprintf ("Accounting Summary of Machine %s at %s", \
                                machine, datestr)
            print titlestr
            if (days > 1) {
                daystr = sprintf ("%d Days",days)
            } else {
                daystr = "Day"
            }
            printf ("Usage by %s/%s For Previous %s\n", primtitle, sectitle, daystr)
            printf ("\n")
            printf ("%-10.10s %-8.8s            CPU          User           Sys Number\n", \
                    primtitle, sectitle)
            if (hours == 1) {
                printf ("Name       Name              Hours         Hours         Hours Sessions\n")
            } else {
                if (minutes == 1) {
                    printf ("Name       Name               Mins          Mins          Mins Sessions\n")
                } else {
                    printf ("Name       Name               Secs          Secs          Secs Sessions\n")
                }
            }
            printdashes()
        }
        primary = $1
        secondary = $2
        for (i=1; i<=nvalues; i++) {
            values[i] = 0
            primvalues[i] = 0
        }
        recs = 0
        primrecs = 0
        printprim = 1
    } else {
        if ($2 != secondary) {
            printsortfields()
            printvals(values)
            printf (" %6d\n", recs)
            secondary = $2
            for (i=1; i<=nvalues; i++) {
                values[i] = 0
            }
            recs = 0
        }
    }
    # accumulate atime, utime and stime (fields 3 to 5) in the chosen unit
    for (i=1; i<=nvalues; i++) {
        if (hours == 1) {
            val = $(i+2) / 3600
        } else {
            if (minutes == 1) {
                val = $(i+2) / 60
            } else {
                val = $(i+2)
            }
        }
        values[i] = values[i] + val
        primvalues[i] = primvalues[i] + val
        grandvalues[i] = grandvalues[i] + val
    }
    recs++
    primrecs++
    grandrecs++
}

END {
    printsortfields()
    printvals(values)
    printf (" %6d\n", recs)
    printdashes()
    printf ("Total %-10.10s    ", primary)
    printvals(primvalues)
    printf (" %6d\n", primrecs)
    printdashes()
    printf ("Grand Total         ")
    printvals(grandvalues)
    printf (" %6d\n", grandrecs)
    printdashes()
}' primtitle="$primarytitle" sectitle="$secondarytitle" machine=$machine \
    days=$days hours=$hours minutes=$minutes

/bin/rm $tmpfile

if [ "$delete" ]; then
    echo "$sname : Deleting accounting statistics records"
    querystr="delete from acctstats where running=0"
    $RMSQUERY $querystr
    if [ $? -ne 0 ]; then
        echo "$sname : ERROR : $RMSQUERY $querystr FAILED"
        exit 1
    else
        echo "$sname : Accounting statistics records deleted"
    fi
fi

exit 0


Glossary

Abbreviations

API

Application Program Interface — speciﬁcation of interface to software

package (library).

CFS

Cluster File System — the file system for Tru64 UNIX clusters.

CGI

Common Gateway Interface — a standard method for generating HTML pages dynamically from an application so that a Web server and a Web browser can exchange information. A CGI script can be written in any language and can access various types of data, for example, a SQL database.

CPU

Central Processing Unit — the part of the computer that executes the

machine instructions that make up the various user and system

programs.

CRC

Cyclic Redundancy Check — a method of error detection.

CVS

Concurrent Versions System — a revision control utility for managing software releases and controlling the concurrent editing of files by multiple software developers.

DIMM

Dual In-Line Memory Module.

DMA

Direct Memory Access — high performance I/O technique where peripherals read/write memory directly and not through a CPU.

GNU

GNU’s Not UNIX — A UNIX-like development effort of the Free

Software Foundation, headed by Richard Stallman.


HTML

HyperText Markup Language — a generic markup language, comprising a set of tags, that enables structured documents to be delivered over the World Wide Web and viewed by a browser.

HTTP

HyperText Transfer Protocol — a communications protocol commonly used between a Web server and a Web browser together with a URL (Uniform Resource Locator).

LED

Light-Emitting Diode.

MIMD

Multiple Instruction, Multiple Data — parallel processing computer

architecture characterized as having multiple processors each

(potentially) executing a different instruction sequence on different

data.

MMU

Memory Management Unit — part of CPU that provides protection

between user processes and support for virtual memory.

MPI

Message Passing Interface — parallel processing API.

MPP

Massively Parallel Processing — processing that involves the use of a

large number of processors in a coordinated fashion.

PCI

Peripheral Component Interconnect — the Elan is connected to a

node through this interface.

PDF

Portable Document Format — the page description language used by

Adobe Acrobat, derived from PostScript, for displaying pages on the

screen.

PTE

Page Table Entry — an entry in the page table which maps the base

address of a page to physical memory.

RISC

Reduced Instruction Set Computer — a computer whose machine

instructions represent relatively simple operations that can be

executed very quickly.

RMS

Resource Management System — Quadrics software for managing

clusters of UNIX nodes.

SDRAM

Synchronous Dynamic Random Access Memory — high performance

computer memory architecture.


Shmem

A one-sided (put/get) inter-process communication interface used on high-performance parallel systems.

SMP

Symmetric MultiProcessor — a computer whose main memory is shared by more than one processor.

SNMP

Simple Network Management Protocol — a protocol used to monitor

and control devices on the Internet.

SQL

Structured Query Language — a database language.

TLB

Translation Lookaside Buffer — part of the MMU that caches the result of virtual to physical address translations to minimize translation times in subsequent accesses to the same page.

URL

Uniform Resource Locator — a standard protocol for addressing information on the World Wide Web.

UTC

Coordinated Universal Time¹ — on UNIX systems it is represented as the time elapsed in seconds since January 1st, 1970 at 00:00:00.

Terms

barrier

A synchronization point in a parallel computation that all of the

processes must reach before they are allowed to continue.

bisectional bandwidth

The worst case bandwidth across the diameter of the network.

block

A thread that blocks without relinquishing the processor until a

speciﬁed event occurs.

critical section A section of program statements that can yield incorrect results if

more than one thread tries to execute the section at the same time.

Elan memory

The SDRAM on the Elan card.

event

A parallel-processing synchronization primitive implemented by the Elan card.

¹Used to be called GMT.


Flit

A communications cycle unit of information.

HTTP cookies

Cookies provide a general mechanism that HTTP server-side

connections use to store and to retrieve information on the client side

of the connection.

main memory The memory normally associated with the main processor, that is to

say, memory on the CPU’s high speed memory bus.

main processor The main CPU (or CPUs for a multi-processor) of a node, typically an

Alpha 21264.

management network

A private network used by the RMS daemons for control and

diagnostics.

multirail system

A system that has more than one Elan card connected to each node,

each Elan card being connected to a different switch network.

multi-threaded program

A multi-threaded program is one that is constructed such that, during

its execution, multiple sequences of instructions are executed

concurrently (possibly by different CPUs). Each thread of execution

has a separate stack but otherwise they all share the same address

space.

node

A system with memory, one or more CPUs and one or more Elan cards

running an instance of the operating system.

poll

Loop and check on each loop whether a speciﬁed event has occurred.

rank

An integer value that identiﬁes a single process from a set of parallel

processes.

reduce

Combine the results of a parallel computation into a single value.

remote memory The memory (Elan card or main) of a node when accessed by another

node over the network.

resource

A set of CPUs allocated to a user to run one or more parallel jobs.

slice

A local copy of a global object.

switch network The network constructed from the Elan cards and Elite cards.

thread

An independent sequence of execution. Every host process has at

least one thread.

virtual memory A feature provided by the operating system, in conjunction with the

MMU, that provides each process with a private address space that

may be larger than the amount of physical memory accessible to the

CPU.

virtual process A (possibly multi-threaded) component of a parallel program

executing on a node.

word

A 64-bit value.


