Introduction
DQS is actually a simple system that provides a
multitude of options to accommodate the requirements of a wide
variety of sites and users. As the number of options increases,
as it does with each succeeding generation of DQS, a user might
mistakenly come to view the system as quite complex. This user
guide is intended to provide an introduction to DQS for the new
user and to explain those features most often used by the
experienced user. In particular, the concept of "resources"
is explored, with attention focused on the new DQS 3.3.2 feature,
"consumable resources".
A DQS job
Any job a user needs to execute on one or more
computers can be a "DQS job". For those whose sole contact
with computers has been through personal UNIX workstations,
the concept of running their jobs in "batch" mode
may be somewhat disconcerting. Users accustomed to submitting
their jobs to mainframe computers will be more familiar with the
attributes of DQS. Unlike the mainframe system, however, the DQS batch
environment customarily includes multiple autonomous UNIX-based
computational platforms that are heterogeneous in hardware architecture
and operating system variant.
In its most fundamental form, a DQS job is an extension
of a UNIX script used to run an application, just as one might run
it on a personal workstation. Let us use the "traditional"
example of compiling and executing a simple FORTRAN application:
f90 test.f -o test
./test
where "./test" simply produces the classical "Hello
World" output, which is sent to standard output.
If we then wish to run this same application within
a UNIX script we would create a file called "test.run"
with the following lines:
Note that we redirect stdout and stderr to the files "test.out"
and "test.errors" respectively.
This script would then be executed by the user on a machine of
their choice, most likely their own workstation.
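A minimal sketch of such a script follows. The Bourne shell interpreter line and the heredoc wrapper that writes the file are illustrative assumptions; only the compile command, the "./test" invocation and the two output file names come from the text above.

```shell
# Sketch only: write a hypothetical test.run into the current directory.
# The /bin/sh interpreter line is an assumption, not taken from the guide.
cat > test.run <<'EOF'
#!/bin/sh
# Compile the FORTRAN source, then run the resulting program.
f90 test.f -o test
# Send stdout to test.out and stderr to test.errors.
./test > test.out 2> test.errors
EOF
chmod +x test.run
```

Running "./test.run" on a workstation with an f90 compiler would then leave the program's output in test.out and any error text in test.errors.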
What then is needed to turn this script into a DQS job? Nothing,
as long as one doesn't care what machine it will be executed on.
All that is needed is to "submit" this job to the DQS
batch queuing system.
Submitting a job
The simple example becomes a "DQS job"
by submitting it to the DQS system with the "qsub"
utility:
qsub test.run
The "qsub" ancillary utility will contact the qmaster
and request that the job be "validated" for execution
within the system. This "validation" process consists of
determining whether or not the job requires something that
does not exist in the current system. Since our test script makes
no obvious requests for resources (the f90 command is not recognized
as a request for a compiler resource known by DQS), all that is needed
is for any host in this hypothetical cell to be idle and available
to execute the job.
Let us now take advantage of some basic DQS facilities.
First we would like to have an email message sent to us upon job
termination. We must instruct DQS to perform this task by inserting
a "DQS directive" into the test.run script. By default
DQS interprets any line of the script as a "DQS directive"
if the first two characters of the line are the string "#$".
This can be changed by the user (see the "qsub -C" option in
the Reference Manual).
Thus we add one line to our script:
The DQS directive "#$ -me" tells the system that
a mail message should be sent to the person submitting the job
at the end of the job. We could also have directed that we wish
to have a mail message sent at the beginning of the job and also
if the job aborts with the directive "#$ -meab". The order
of the symbols 'e', 'a' and 'b' in this list is not significant.
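A sketch of the script with the directive added is shown here. As before, the interpreter line and the heredoc wrapper are assumptions made for illustration; the directive itself is the "#$ -me" form described above.

```shell
# Sketch only: a hypothetical test.run with one DQS directive added.
cat > test.run <<'EOF'
#!/bin/sh
#$ -me
f90 test.f -o test
./test > test.out 2> test.errors
EOF
# List the lines that begin with the default directive prefix "#$".
grep '^#\$' test.run
```

Because the directive line begins with "#", the shell treats it as a comment, so the same script still runs unchanged outside DQS.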
Note that the directive could also be communicated
with DQS on the "qsub" command line. Instead of inserting
the directive in the script, we could perform the submission with:
qsub -me test.run
In cases where only a few directives are needed this
approach might be used but, as the user will see, many job submissions
will benefit from more complex sets of DQS directives, which are
better "captured" in the job script.
Querying job status
Once a user has relinquished their job to the "welcoming
arms" of a queuing system they need a means for monitoring
and controlling its destiny. A first step is to query the system
to establish the status and "DQS identity" of the job.
The "qstat" ancillary utility is used to display the state of
queues and jobs. There are three forms of this display:
default (no options)             Displays the state of user jobs in summary form
full listing (-f option)         Displays summary queue and job status
extended listing (-ext option)   Displays the full queue and job descriptions
The simplest command then to get in touch with our
job is to execute the command:
qstat
and scan through the output looking for jobs we have
submitted. Instead of being deluged with information about every
other job in the system one can execute:
qstat -u <my user name>
where <my user name> is the login name of the
user who submitted the job.
The output of this variant might look like:
---Pending Jobs------------------------------------------------------
<my user name>   my-job-name   dqs-job-number   0:0   QUEUED   03/25 20:40
This would indicate, to the dismay of <my user name>, that the
job is not RUNNING on any machine in the system. It is, however,
queued with a priority of zero (the leftmost digit of "0:0"),
and its sub-priority is zero (the rightmost digit), indicating
that there are no prior jobs for this user.
Or, more optimistically, the display might offer:
Queue Name   Queue Type   Quan   Load   State
queue1       batch        1/1    0.14   er UP
<my user name>   2183   0:1   r   RUNNING   02/12/96 19:25:56
This would hearten us in our endeavors, because
our job is (apparently) executing. The symbols on the output lines
may be a bit confusing because the first line shows the status
of the queue while the second describes "our job".
Let us examine the queue description first:
Queue Name   queue1   each queue is given a unique name by the administrator
Queue Type   batch    the default mode of all DQS queues
Quan         1/1      one resource ("1/ ") of one available (" /1") is utilized
Load         0.14     the load average measured by the queue1 CPU is 0.14
             er       all of the queue states are displayed as single-character
                      symbols. The most important of these are presented between
                      the headings "Load" and "State". The "e" shows that the
                      queue is ENABLED. The "r" shows that the queue is RUNNING.
State        UP       the normal mode of operation is shown as "UP"
The job description is a bit less cryptic. The entry
begins with <my user name>, followed by the DQS-assigned
job number (2183). The values 0:1 give
the submission priority of the job, which defaults to zero, and the
sub-priority (:1), which indicates that this is the first
job running for this user. The submission priority is assigned
by the user with the "qsub" option flag "-p", while the
sub-priority is an internal parameter computed during each scheduling
pass for all queues.
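When scanning "qstat" output in a script, the two halves of the priority field can be separated with ordinary shell parameter expansion. The helper name split_priority below is hypothetical and is not a DQS utility; the sketch assumes the field has already been extracted from the listing.

```shell
# Sketch: split a qstat priority field such as "0:1" into its two parts.
# split_priority is a hypothetical helper, not a DQS command.
split_priority() {
    priority=${1%%:*}      # text before the colon: submission priority
    sub_priority=${1#*:}   # text after the colon: sub-priority
}

split_priority "0:1"
echo "priority=$priority sub-priority=$sub_priority"
```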
The command "qstat -ext" produces a comprehensive
display of queue and job parameters as well as the status obtained
with the "-f" option. Discussion of relevant portions
of these extended displays will appear in later sections.
Modifying a job request
Often a user will find one or more of their jobs
in the pending queue awaiting assignment to an execution queue.
After reviewing their pending jobs, the user may decide to change
a job's submission parameters to affect its future scheduling.
One method for this would be to delete the job and resubmit it.
A more convenient technique is to use the "qalter" ancillary
utility to modify one or more of the parameters that the user assigned
at the time of "qsub", or that DQS defaulted when they were not
explicitly designated by the user.
In the simple example given here, the user provided
no parameters to the "qsub" command and hence the submission priority
has been set to the default value of zero. If the user wishes
to increase that priority the "qalter" utility would be invoked
with:
qalter -jid <job number> -p <new priority>
The <job number> is that which DQS assigned
to the job in the pending queue, and the <new priority>
value must be in the range -1024 to +1023.
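A wrapper script might check the priority against this range before invoking "qalter". The sketch below is an assumption about how such a check could be written; valid_priority is a hypothetical helper, and the example only prints the command rather than running it.

```shell
# Sketch: check a priority against the documented range (-1024 to 1023).
# valid_priority is a hypothetical helper, not part of DQS.
valid_priority() {
    case "$1" in
        ''|*[!0-9-]*|?*-*|-) return 1 ;;   # reject anything that is not an integer
    esac
    [ "$1" -ge -1024 ] && [ "$1" -le 1023 ]
}

new_priority=10
if valid_priority "$new_priority"; then
    # Print the command instead of running it; the job number is a placeholder.
    echo "qalter -jid <job number> -p $new_priority"
else
    echo "priority $new_priority is outside -1024..1023" >&2
fi
```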
Except for the job number, any parameter that can
be employed with the "qsub" command can be used with the
"qalter" command, including replacing the script file that
originally accompanied the "qsub" command. The
"qalter" command may not be used for jobs already in the
RUNNING state, with the exception of the return of "consumable
resources" (see below).
Holding, deleting jobs
The user has a number of tools available to work
with their jobs once the jobs are in the queuing system.
For example, they may decide to place a "hold" on one
of their jobs in the pending queue so that another job may progress
ahead of it, or to delay scheduling until some other event or job
has occurred. First, the user may choose to submit a job to the
system with a "hold" placed on the job at the time of
submission. This step involves the use of the "-h"
option in the "qsub" command. Once a job is submitted the user
can use the "qhold" ancillary utility to place a hold on the job
if it is still in the PENDING queue. The "qhold" utility uses the
same "-h" option.
The "-h" option is used for system administration
tasks as well as user access. Thus the DQS 3.3.2 Reference Manual
describes four alternatives. The user is permitted only the "u"
(or user hold) or the "n" (no hold) variants. Thus at
job submission the user might place a hold:
qsub . . -h u .. test.run
Or if the job is in the pending queue:
qhold -jid <job number> -h u
Once a "hold" has been placed on a job
in the pending queue it will not be considered eligible for scheduling
until it has either been "released" from the hold or
it is deleted from the queue entirely. A job can be released from
a user invoked "hold" with the "qrls" ancillary
utility:
qrls -jid <job number> -h u
or the user may modify the "hold" state
by using the "qalter" command:
qalter -jid <job number> .. .. -h n
Which will set the user accessible hold to "none".
A user may delete one or more of their own jobs
from the queuing system if the jobs are in either the pending
queue or the executing queue:
qdel <job number>
or:
qdel <job number>,<job number>,..
Note that the job numbers are separated by commas (,)
and NOT spaces.
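When the job numbers are held in a script as separate words, they have to be joined with commas first. The helper join_job_ids below is a hypothetical illustration, not part of DQS, and the job numbers are made up; the command is printed rather than executed.

```shell
# Sketch: build the comma-separated job list that qdel expects.
# join_job_ids is a hypothetical helper, not part of DQS.
join_job_ids() {
    # With IFS set to a comma, "$*" joins the arguments with commas.
    ( IFS=,; printf '%s\n' "$*" )
}

echo "qdel $(join_job_ids 2183 2184 2190)"
```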
Requesting resources
The simple example we have been using so far (test.run)
has made no unusual demands for system resources. It presumes
that all queues in the system have a FORTRAN compiler and that
the FORTRAN dialect in our test program is consistent with all
the compilers. Further, memory, disk space and database locality
are also not consequential in this example. These are unrealistic
assumptions in most cases. Most sites using DQS contain heterogeneous
collections of hardware and software and often subdivide these
collections into types of use (long-term jobs, short-term jobs,
etc.).
The DQS Administrator is supplied with many tools to
organize the system and define the resources available
to the user. Typical resources are CPU memory sizes, hardware
architecture and operating system versions.
Hard and soft resources
Most jobs will have one or more imperative requirements. One of
the most common is the need for a particular hardware/software
system (e.g. AIX-4.3.3). By default requested resources are considered
essential (or "hard") unless the user precedes the request
in the "qsub" command with the option "-soft".
Requirements for multiples of various resources in parallel jobs,
such as 2 or more CPUs, can be either "hard" or "soft".
Many users choose to request at least 2 CPUs to run their parallel
job and then request more processors following the "-soft" option
flag in the "qsub" command line or job script. While a non-parallel
user might expect to use the "-soft" option for a request
of the form "I need at least 32 MB of memory but would be
much happier with 64 MB", most site resource allocations will
not make effective use of such a request. The most common use
of the "-soft" option for non-parallel jobs is to state
a preference for a queue without making it a "hard"
demand.
Consumable resources
Site resources are by and large static over periods of time like
days or weeks. CPU memory sizes and CPU computing power are not
subject to moment-by-moment changes. When they are modified the
DQS site manager can adjust the resource descriptions to match
the new configurations.
There is a class of resources that varies within short periods
of time. A very common commercial practice, these days, is to
manage software licenses for Compilers, Data Base Managers, etc.
dynamically at a given site. Many sites do not purchase licenses
for all of their extant platforms. A job submitted to DQS must
not be scheduled for execution if that job needs one or more software
licenses in order to complete but those licenses are already in
use by another job.
Another common form of time-varying resource is the amount
of shared memory available to a processor in a shared-memory multi-processor
system. Shared local disk space might be another resource which
is depleted and restored as jobs start up and terminate. DQS calls
resources of this type "consumable resources".
Forming resource requests
A user specifies the resources they require in the "qsub"
command line or in the DQS script file. A most direct method is to
identify a specific queue as the place for the submitted job to
execute:
qsub -q <my queue>
That request will require <my queue> for execution. If
the user would prefer, but not insist on that queue they might
make the command line request:
qsub -soft -q <my queue>
Note that DQS scans the command line and script commands from
left to right. During that process any resource requests to the
right of a "-hard" or "-soft" option flag
will be interpreted as requiring that type of resource. Hence
one could mix hard and soft resources thus:
qsub -soft -q <my queue> .. .. -hard <some other resource>..
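The left-to-right rule can be illustrated with a small shell function. This is a sketch of the rule as described here, not DQS code: classify_requests is a hypothetical helper, and for simplicity it treats every argument other than "-hard" and "-soft" as a resource name.

```shell
# Sketch of the left-to-right rule: each request takes the mode of the most
# recent -hard or -soft flag, starting from the default of "hard".
# classify_requests is a hypothetical helper, not DQS code.
classify_requests() {
    mode=hard
    for arg in "$@"; do
        case "$arg" in
            -hard) mode=hard ;;
            -soft) mode=soft ;;
            *)     printf '%s %s\n' "$mode" "$arg" ;;
        esac
    done
}

classify_requests -soft queue1 -hard AIX433
```

Each output line pairs a mode with a resource, so the call above reports queue1 as a soft request and AIX433 as a hard one.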
The typical job request will not demand a specific queue. Instead
the user will request one or more classes of resources which have
been established by the DQS administrator. Let us presume a site
with three different hardware platform architectures, each with
several CPUs available. The site administrator
has named the resources with their operating system tags: AIX433,
IRIX65 and SOLARIS27. In addition, this example site owns one
FORTRAN license for each of the different operating systems. The
administrator has named these XLF, SGIFTN and FORTRAN.
To further complicate our example, each architecture has three
CPUs with different amounts of memory: 32 Megabytes,
64 Megabytes and 128 Megabytes.
The example we have been using (test.run) will now be submitted
in a more realistic manner:
qsub -me -l AIX433.and.(mem.gt.32).and.(XLF.eq.1) test.run
The command line now has the resource request appended to it.
Requests for resources other than specific queue names begin with
the "-l" flag and consist of a string of resource names
interspersed with logical and relational operators. Since the
string must have NO embedded blanks, parentheses may be used
to aid readability.
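Because a stray blank is easy to introduce when typing such a string, a script might assemble the request from its parts and check it before submission. The sketch below is an illustration under that assumption; build_request is a hypothetical helper, not a DQS utility, and it always joins the parts with ".and.".

```shell
# Sketch: join request terms with ".and." and refuse any embedded blank.
# build_request is a hypothetical helper, not part of DQS.
build_request() {
    request=$1
    shift
    for part in "$@"; do
        request="${request}.and.${part}"
    done
    case "$request" in
        *' '*) echo "resource request contains a blank" >&2; return 1 ;;
    esac
    printf '%s\n' "$request"
}

build_request AIX433 '(mem.gt.32)' '(XLF.eq.1)'
```

The single quotes keep the shell from interpreting the parentheses itself. Depending on the user's shell, the assembled string may need the same quoting when it is passed to "qsub -l".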
The resource request is interpreted by DQS as follows:
A command line or DQS script may contain one or more request strings
beginning with the "-l" option flag. Each one of these
strings will request at least one queue to meet the requirement.
Thus:
qsub -l AIX433 -l AIX433
Would request that two queues/CPUs be allocated to this job. This
same request can be restated more simply:
qsub -l (qty.eq.2).and.AIX433
Depending upon the topology of the DQS site and the requirements
of a given job, resource requests can contain a number of elements.
Obviously parallel jobs will require more complex resource requests
than simple single-processor jobs.
Note: Relational operators can be given in FORTRAN or "C"
syntax (.eq. ==, .ne. !=, .lt. <, .gt. >, .le. <=,
.ge. >=). Logical operators can also be given in either language
syntax (.and. &&, .or. ||, .not. !). For compatibility
with DQS 3.3.2 the comma (,) may be used in place of the logical
".and." operator.
The consumable resource "XLF" requested by the job can
be returned to the license pool by a RUNNING job by executing
the DQS command "qalter" with the "-rc" option:
qalter -rc XLF=1
This command would return one XLF license to the system.
"Potential" resources
DQS 3.3.2 performs a pre-validation of jobs before
accepting them into the queuing system. This pre-validation consists
of searching all queue definitions to see if the "hard"
resources requested for the job actually exist, even if they may
be in use by some other job at the time this job is submitted.
If any of the "hard" resources does not exist, the job
is rejected, and an error message giving the reason for the rejection
is returned to the "qsub" ancillary utility and displayed
for the user.
In some cases a user may be aware that a resource
(such as a new queue) will be added or returned to the DQS at
some point in the future. They may wish to submit their job and
place it into the pending queue to await the appearance of the
new resource. This can be accomplished by adding the "FORCE REQUEST"
flag ('-F') to the "qsub" command line or DQS script:
qsub -F -l (wild_eyed_scheme).and.mem.gt.1000000
The "-F" flag should be used with care
as no pre-validation is performed and a job may have an erroneous
resource request which will leave it "orphaned" in the
pending queue until either the job's owner or the DQS Administrator
deletes it at a later time.
Moving jobs
Once a job has been placed into the RUNNING state
and is executing in one or more queues its parameters cannot be
modified nor can it be moved to another location in the system.
Pending (non-executing) jobs can be moved from one target queue
to another by one of the following methods:
Cells and queues
What is a cell? It is the collection of computer hosts and DQS
software which make up a single entity managed by a daemon called
the "qmaster". In the following diagram are displayed
four CPUs. One of these is executing the qmaster daemon. Two processors
are executing the dqs_execd daemons. These two processors are
related to the queues shown here and would execute any job assigned
to those queues. The computer labeled "dqs host" is
not running any of the DQS daemons. It is known to the qmaster
because the site administrator has added that name to the qmaster's
host list. This action makes this host a "trusted DQS host"
as are any hosts running the daemon.
Qmove
A DQS site may have more than one "cell". The site administrator
may choose to keep each cell independent and separate from the
others. On the other hand they may organize the system so that
one or more cells will have authorized communications with others.
A user logged into a host in one cell can submit jobs to the other
cells, or they can perform the "qstat" function for the other cells.
The user can move one of their jobs in a pending queue in one
cell to the pending queue in another cell. The qmove ancillary utility is
provided for this inter-cell transfer purpose only. The usual
command would be:
qmove <job number>@CELL_C2
Which would move the numbered job from CELL_C2 to the cell in
which the qmove ancillary utility is being executed. Where a user in CELL_C3
wishes to move a job from CELL_C2 to CELL_C1 the command would
be:
qmove -cell CELL_C1 <job number>@CELL_C2
The effects of this move process can be somewhat surprising:
Queue-Complexes
Queue-complexes are arbitrary resource definitions that once defined,
can be associated with queues.
These resource definitions can be combinations of available licenses, memory,
architecture, available software, etc.
Queue-complexes are used by the scheduler at job submission time to determine
a best fit between requested and available resources.
These resource specifications are completely arbitrary, allowing for
highly configurable systems.
A sample queue complex:
mem=128
ARC=RIOS
matlab
pvm3
p5
Once defined, the complex can be associated with a queue. An arbitrary number
of complexes can be assigned to queues. At job submission time, the qmaster
matches the user's requests with the complex definitions to select the queues
which meet the user's needs.
Suspending queues and jobs
The user will note that from time to time one or
more queues may display the SUSPENDED status. When this occurs,
any job executing on that queue is also suspended, but NOT terminated.
When the queue is unsuspended the job continues from the point
where it was suspended. During the period of its suspension, all
of its files remain open and all memory and paging space allocated
to the job remain allocated.
When does a queue get suspended? The DQS administrator
and anyone designated as the queue's owner can suspend that queue
using the "qmod" command. There is one additional method,
which may appear in some site configurations. If a queue is assigned to
a host which is also serving as the personal workstation for some
user of the system, they may choose to use the "qidle" command
at that workstation. This ancillary utility is an X-windows facility
which monitors the keyboard and mouse on a workstation. If these
devices are being used, the "qidle" facility will suspend the
queues on that workstation.
One additional means by which a queue may be suspended
is to designate it as a subordinate queue to another queue. The usual
application of this facility
is when a host serves both as a parallel and single processor
resource. The single processor queue is made subordinate to the
parallel queue. When a parallel job is started the subordinate
queue and any job being executed there will be suspended.
Parallel jobs
A major feature of DQS is its support for the scheduling
and management of parallel jobs to be run on two or more of the
hosts in a system. There are three components in submitting parallel
jobs:
qsub -me -l (qty.eq.4).and.(exec.eq.mpirun).and.AIX433
This will request four AIX433 hosts to run
a parallel job. After the job is put into execution, but before
the user's job script is executed, the function "mpirun"
will be executed in the working directory of that user.
Job execution environment
The simple "test.run" example we have been
using so far will have operated with the following characteristics
For detailed instructions on changing the job's
environment please see "qsub" in the DQS reference manual.
DQS scheduling strategies
Once a job has disappeared into the maw of DQS, it
is subjected to a variety of manipulations which are intended
to utilize the entire system's resources in the most effective way
while ensuring that each user is given "fair" access
to those resources. The default operation of the scheduler is
often adapted by each site to its own requirements. The basic
process consists of:
After a job has been validated as requesting
"real" resources, it is tested against the site's queues
to determine which ones it would be eligible for. From the eligible
queues, the value of "maximum user jobs" for each
queue is extracted and the smallest one selected. At the same
time the number of jobs in the RUNNING state for this user is computed.
If the minimum queue-maximum-user-jobs is not greater than the
number of that user's RUNNING jobs, the job is rejected at "qsub"
time and an error message is returned to the user.
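The comparison described in this paragraph can be sketched in a few lines of shell. This is an illustration of the rule as stated here, not the scheduler's actual code; job_admissible is a hypothetical helper whose first argument is the user's count of RUNNING jobs and whose remaining arguments are the "maximum user jobs" values of the eligible queues.

```shell
# Sketch of the "maximum user jobs" pre-validation described above.
# job_admissible is a hypothetical helper, not DQS code.
# Usage: job_admissible <user's RUNNING jobs> <limit of each eligible queue>...
job_admissible() {
    running=$1
    shift
    min=$1
    for limit in "$@"; do
        # Track the smallest per-queue limit.
        if [ "$limit" -lt "$min" ]; then
            min=$limit
        fi
    done
    # Accept only if the smallest limit exceeds the running count.
    [ "$min" -gt "$running" ]
}

if job_admissible 1 3 2 5; then
    echo "accepted"
else
    echo "rejected"
fi
```

With one job RUNNING and queue limits of 3, 2 and 5, the smallest limit is 2, which is greater than 1, so the job is accepted. With two jobs RUNNING the same submission would be rejected.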
This last scheduling pre-validation most certainly
may confuse the reader but it is the core of the "fair play"
method developed at SCRI/CSIT and needs to be used for a while to
demonstrate its behavior and value.
Problem Solving
Even when one starts with the simple test case with
which we began this User Guide, it is possible to get into one
or more dead-ends on one's first, second, or whatever, attempts
at using the DQS. We will proceed through a number of typical
problems, which a user may encounter along the way:
The DQS error file (err_file) and accounting file
(acc_file) contain valuable information which can give the
knowledgeable user the means for analyzing and correcting their
problems with the system. Refer to the
DQS 3.3.2 Error Messages document
for further information.
Introduction
A DQS job
./test
Submitting a job
Querying job status
queue1 batch 1/1 0.14 er UP
<my user name> 2183 0:1 r RUNNING 02/12/96 19:25:56
Modifying a job request
Holding, deleting jobs
Requesting resources
Hard and soft resources
Consumable resources
Forming resource requests
"Potential" resources
Moving jobs
Cells and queues
Qmove
Queue-Complexes
mem=128
ARC=RIOS
matlab
pvm3
p5
Suspending queues and jobs
Parallel jobs
Job execution environment
DQS scheduling strategies
Problem Solving