$GDDI group (parallel runs only)
This group controls the partitioning of a large set of
processors into sub-groups of processors, each of which
might compute separate quantum chemistry tasks. If there
is more than one processor in a group, the task assigned to
that group will run in parallel within that processor
group. Note that the implementation of groups in DDI
requires that the group boundaries be on SMP nodes, not
individual processor cores.
For example, the FMO method can farm out its different
monomer or dimer computations to different processor
subgroups. This is advantageous, as the monomers are
fairly small, and therefore do not scale to very many
processors. However, the monomer, dimer, and maybe trimer
calculations are very numerous, and can be farmed out on a
large parallel system.
At present, only a few procedures in GAMESS can
utilize processor groups, namely
a) the FMO method which breaks large calculations into
many small ones,
b) VSCF, which has to evaluate the energy at many
geometries,
c) numerical derivatives which do the same calculation
at many geometries (for gradients, see NUMGRD in $CONTRL,
for hessians, see HESS=SEMINUM/FULLNUM in $FORCE),
d) replica-exchange MD (see REMD in $MD).
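As an illustration (the node count of 8 is hypothetical),
an FMO job spread over 8 nodes, with one node per group,
might add to its input

   $GDDI NGROUP=8 $END

alongside the usual $FMO input groups.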
NGROUP = the number of groups in GDDI. Default is 0 which
means standard DDI (all processes in one group).
The number of GDDI groups cannot exceed the number of
logical nodes. In most cases, a single SMP node is one
logical node, unless you split it. To split nodes for
socket DDI, make a proper node list and pass it as a third
argument to rungms (if you use a queuing system, the node
list should be obtained from it, and then the nodes can be
split). To split nodes in MPI, define DDI_LOGICAL_NODE_SIZE
(the simplest choice is to set it to 1) at an appropriate
place in rungms.
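As a hedged sketch (the exact place depends on your rungms
and MPI setup), splitting each physical node into one-core
logical nodes for an MPI build could look like this line in
the C-shell rungms script, before the MPI kickoff:

   setenv DDI_LOGICAL_NODE_SIZE 1

after which NGROUP may be as large as the total core count.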
NSUBGR = (a) If positive, the number of "subgroups" in GDDI/3.
All cores are first divided into NGROUP worlds, then
each world is divided into NSUBGR groups.
At present, only two types of runs can use GDDI/3:
1. semi-analytic FMO Hessian (RUNTYP=HESSIAN) and
2. minimum energy crossing point search with FMO.
(b) If negative, treat each compute process as a
separate group. The value of |NSUBGR| is not used.
Typically, specify NGROUP=1 NSUBGR=-1.
In the running script (rungms), declare the machine as a
"fat" node; for example, node1:cpus=40 together with
NGROUP=1 NSUBGR=-1 will execute a 40-group GDDI/2 run
on 1 node (with 1 core per group), as sketched below.
Technically, a negative NSUBGR run is a 1-group GDDI
run, masquerading as N GDDI groups.
This can be very useful on a node that does not allow
ssh connections to it, because running with NGROUP=1
does not use ssh for parallelization, and no data
server is created either.
For multiple nodes, also use NGROUP=1 NSUBGR=-1.
Default: 0
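A sketch of the negative NSUBGR usage described above: on
a single 40-core machine declared to DDI as node1:cpus=40,
the input

   $GDDI NGROUP=1 NSUBGR=-1 $END

executes 40 one-core groups without ssh and without data
servers.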
PAROUT = flag to create punch and log files for all nodes.
If set to .FALSE., these files are opened only on
group masters.
BALTYP = load balancing at the group level, otherwise
similar to the one in $SYSTEM. BALTYP in $SYSTEM
is used for intragroup load balancing and the one
in $GDDI for intergroup. It applies only to FMO
runs. (default is DLB)
NUMDLB = Do dynamic load balancing jobs in blocks of indices
of size NUMDLB.
By using values larger than 1, fewer DLB requests
will be issued, reducing the load on the master
node (a possible acceleration on fat nodes).
Default: 1
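For example (hypothetical values), a many-group run could
cut DLB traffic with

   $GDDI NGROUP=16 NUMDLB=4 $END

so that each DLB request hands out four task indices at a
time instead of one.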
MANNOD = manual node division into groups. Subgroups must
split up on node boundaries (a node contains one
or more cores). Provide an array of node counts,
whose sum must equal the number of nodes fired up
when GAMESS is launched.
Note the distinction between nodes and cores (also
called processors). If you are using six quad-core
nodes, you might enter
NGROUP=3 MANNOD(1)=2,2,2
so that eight CPUs go into each subgroup.
If MANNOD is not given (the most common case), the
NGROUP groups are chosen to have equal numbers of
nodes in them. For example, an 8 node run that
asks for NGROUP=3 will set up 3,3,2 nodes/group.
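A sketch of an uneven manual split: for the same six
quad-core nodes, larger fragments could be given more
resources with

   $GDDI NGROUP=3 MANNOD(1)=3,2,1 $END

which puts 12, 8, and 4 cores into the three groups (the
sum 3+2+1 must equal the 6 nodes launched).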
NODINP = a logical flag to turn on node-specific input
(each node will read its own input; note that you
should change rungms to copy those files), as
required in REUS or REMD restarts.
Default: .FALSE.
Note that nodes with very large core counts may be too
large for good scaling with certain kinds of subgroup runs.
Any such 'fat' nodes can be divided into "logical nodes" by
using the kickoff option :cpus= for TCP/IP-based runs, or
the environment variable DDI_LOGICAL_NODE_SIZE for MPI-
based runs. See the DDI instructions.
Note on memory usage in GDDI: Distributed memory MEMDDI is
allocated globally, MEMDDI/p words per computing process,
where p is the total number of processors. This means an
individual subgroup has access to MANNOD(i)*ncores*MEMDDI/p
words of distributed memory. Thus, if you use groups of
various sizes, each group will have different amounts of
distributed memory (which can be desirable if you have
fragments of various sizes in FMO).
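As a numerical illustration (reusing the hypothetical six
quad-core nodes from the MANNOD example, so p = 24): with
the uneven split MANNOD(1)=3,2,1, the three groups get
12*MEMDDI/24, 8*MEMDDI/24, and 4*MEMDDI/24 words, i.e. one
half, one third, and one sixth of the total distributed
memory.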