Running programs in parallel mode (from the userguide)


Post by Algerien1970 on Thu 2 Jul - 0:56

This section describes two methods for running WIEN2k on parallel computers.

One method, parallelizing k-points over processors, uses C-shell scripts, an NFS file system and passwordless login (public/private keys). This method works with all standard flavors of Unix without any special requirements. The parallelization is very efficient even in heterogeneous computing environments, e.g. on heterogeneous clusters of workstations, but also on dedicated parallel computers, and does NOT need large network bandwidth.

The other parallelization method, introduced with version WIEN2k_07.3, is based on fine grained methods, MPI and ScaLAPACK. It is especially useful for larger systems, when the required memory is no longer available on a single computer or when more processors than k-points are available. It requires a fast network (at least Gb-Ethernet, better Myrinet or Infiniband) or a shared memory machine. Although not as efficient as the simple k-point parallelization, the current MPI version has been enhanced a lot and shows very good scaling with the number of processors for most parts. In any case, the number of processors and the size of the problem (number of atoms, matrix size due to the plane wave basis) must be compatible; typically NMAT / sqrt(#processors) > 2000 should hold.

The k-point parallelization can use a dynamic load balancing scheme and is therefore usable also on heterogeneous computing environments like networks of workstations or PCs, even if interactive users contribute to the processors' work load.

If your case is large enough, but you still have to use a few k-points, a combination of both parallelization methods is possible (always use k-point parallelism if you have more than 1 k-point).



k-Point Parallelization


Parts of the code are executed in parallel, namely LAPW1, LAPWSO, LAPW2, LAPWDM, and OPTIC. These are the numerically intensive parts of most calculations.

Parallelization is achieved on the k-point level by distributing subsets of the k-mesh to different processors and subsequent summation of the results. The implemented strategy can be used both on a multiprocessor architecture and on a heterogeneous (even multiplatform) network.

To make use of the k-point parallelization, make sure that your system meets the following requirements:
NFS: All files for the calculation must be accessible under the same name and path. Therefore you should set up your NFS mounts such that the path names are the same on all machines.

Remote login:

rlogin or ssh to all machines must be possible without specifying a password. Therefore you must either edit your .rhosts file to include all machines you intend to use (not necessary on a shared memory machine), or correctly set up public/private keys for ssh. This can be done by running ``ssh-keygen -t rsa'' and copying the id_rsa.pub key into ~/.ssh/authorized_keys at the remote sites.

The command for launching a remote shell is platform dependent, and usually can be 'ssh', 'rsh' or 'remsh'. It should be specified during installation when siteconfig_lapw is executed (see chapter 11).
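As a sketch of the key setup described above (the demo_ssh path and the host name node1 are placeholders; by default ssh-keygen writes to ~/.ssh/id_rsa):

```shell
# Generate an RSA key pair without a passphrase into a demo directory
# (placeholder path; in practice the default ~/.ssh/id_rsa is used).
mkdir -p demo_ssh
ssh-keygen -t rsa -N "" -f demo_ssh/id_rsa -q

# Append the public key to authorized_keys on each remote machine
# ("node1" is a placeholder host; shown for illustration only):
# cat demo_ssh/id_rsa.pub | ssh node1 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'

# Afterwards "ssh node1" should log in without prompting for a password.
```

You can verify the setup with ``ssh node1 hostname''; if a password prompt still appears, check the permissions of ~/.ssh (700) and authorized_keys (600) on the remote machine.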

MPI parallelization

Fine grained MPI parallel versions are available for the programs lapw0, lapw1, and lapw2. This parallelization method is based on the parallelization libraries MPI, ScaLAPACK, PBLAS and FFTW_2.1.5 (for lapw0). These libraries are not included with WIEN2k. On parallel computers, however, they are usually installed; otherwise, free versions of these libraries are available.

The parallelization affects the naming scheme of the executable programs: the fine grained parallel versions of lapw0/1/2 are called lapw0_mpi, lapw1[c]_mpi, and lapw2[c]_mpi. These programs are executed via calls to the local execution environment, as in the sequential case, by the scripts x, lapw0para, lapw1para, and lapw2para. On most computers this is done by calling mpirun and should also be configured using siteconfig_lapw.

How to use WIEN2k as a parallel program

To start the calculation in parallel, a switch must be set and an input file has to be prepared by the user.

  • The switch -p switches on the parallelization in the scripts x and run_lapw.

  • In addition to this switch the file .machines has to be present in the current working directory. In this file the machine names on which the parallel processes should be launched and their relative speeds must be specified.


If the .machines file does not exist, or if the -p switch is omitted, the serial versions of the programs are executed.
Generation of all necessary files, starting of the processes and summation of the results are done by the appropriate scripts lapw1para, lapwsopara, lapwdmpara and lapw2para (when using -p), and by the parallel programs lapw0_mpi, lapw1_mpi, and lapw2_mpi (when fine grained parallelization has been selected in the .machines file).
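Putting the two requirements together, a minimal k-point parallel run could look like this (alpha and beta are placeholder host names; run_lapw is assumed to be on the PATH of an installed WIEN2k, so the actual run is left commented out):

```shell
# Write a minimal two-machine .machines file into the working directory.
cat > .machines <<'EOF'
granularity:1
1:alpha
1:beta
EOF

# Start the SCF cycle with the -p switch; omitting -p (or the
# .machines file) would run the serial versions instead.
# run_lapw -p
```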

The .machines file

The following .machines file describes a simple example. Assume five computers (alpha, ..., epsilon), where epsilon has 4 CPUs, and gamma and delta have 2 CPUs each. In addition, gamma, delta and epsilon are 3 times faster than alpha and beta:

# This is a valid .machines file 
# 
granularity:1 
1:alpha 
1:beta 
3:gamma:2 delta 
3:delta:1 epsilon:4 
residue:delta:2 
lapw0:gamma:2 delta:2 epsilon:4


To each set of processors, defined by a single line in this file, a certain number of k-points is assigned, which are computed in parallel. In each line the weight (relative speed) and computers are specified in the following form:

weight:machine_name1:number1 machine_name2:number2 ...


where weight is an integer (e.g. a three times more powerful machine should have a three times larger weight), machine_name[1,2,...] is the name of the computer, and number[1,2,...] is the number of processors to be used on that computer. If there is only one processor on a given computer, the :1 may be omitted. Empty lines are skipped; comment lines start with #.

Assuming there are 8 k-points to be distributed in the above example, they are distributed as follows. The computers alpha and beta get 1 each. Two processors of computer gamma and one processor of computer delta cooperate in a fine grained parallelization on the solution of 3 k-points, and one processor of computer delta plus four processors of computer epsilon cooperate on the solution of 3 k-points. If there were additional k-points, they would be calculated by the first processor (or set of processors) becoming available. With higher numbers of k-points, this method ensures dynamic load balancing. If a processor is busy doing other (e.g. interactive) work, the overall calculation will not stall, but most of its work will be done by other processors (or sets of processors using MPI). This is, however, not an implementation of fail safety: if a process does not terminate (e.g. due to shutdown of a computer) the calculation will never terminate. It is up to the user to handle such hardware failures by modifying the .machines file and restarting the calculation at the appropriate point.

During the run of lapw1para the file .processes is generated. This file is used by lapw2para to determine which case.vector* to read.

By default lapw1para will generate approximately 3 vector files per processor, if enough k-points are available for distribution. The factor 3 is called ``granularity'' and allows for some load balancing in heterogeneous environments. If you can be sure that load balancing is not an issue (e.g. because you use a queuing system and will get 100% of the CPUs for your jobs) it is recommended to set

granularity:1


for best performance.

On shared memory machines it is advisable to add a ``residue machine'' to calculate the surplus (residual) k-points and to rely on the operating system's load balancing scheme. Such a ``residue machine'' is specified as

residue:machine_name:number


in the .machines file.

Alternatively, it is also possible to distribute the remaining k-points one by one (and not in one chunk) over all processors. The option

extrafine:1


can be set in the .machines file.

When using ``iterative diagonalization'' or the $SCRATCH variable (set to a local directory), the k-point distribution must be ``fixed''. This means the ratio (k-points / processors) must be an integer (loosely called ``commensurate'' at other places in the UG) and granularity:1 should be set.
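For illustration, with 8 k-points on four equally fast machines (hypothetical host names node1...node4), a ``fixed'' distribution giving each machine exactly 8/4 = 2 k-points could be written as:

```shell
# .machines for a fixed distribution: 8 k-points / 4 machines = 2 each,
# with granularity:1 so exactly one subset per machine is generated.
cat > .machines <<'EOF'
granularity:1
1:node1
1:node2
1:node3
1:node4
EOF
```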

The line

lapw0:gamma:2 delta:2 epsilon:4


defines the computers used for running lapw0_mpi. In this example the 8 processors of the computers gamma, delta, and epsilon run lapw0_mpi in parallel.

If fine grained parallelization is used, each set of processors defined in the .machines file is converted to a single file .machine[1/2/...], which is used in a call to mpirun (or another parallel execution environment).

When using a queuing system (like PBS, LoadLeveler or Sun Grid Engine) one can only request the NUMBER of processors, but does not know in advance on which nodes the job will run. Thus a ``static'' .machines file is not possible. One can write a simple shell script which generates this file on the fly once the job has started and the nodes have been assigned to it.

Examples can be found at our web-site ``http://www.wien2k.at/reg_users/faq''.
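A sketch of such a generator for PBS (PBS provides the list of assigned nodes in $PBS_NODEFILE, one line per processor slot; here a fake nodefile stands in so the script is self-contained, and the 1: weights are a simple choice, not prescribed by WIEN2k):

```shell
# Stand-in for $PBS_NODEFILE: one line per assigned processor slot.
cat > nodefile.example <<'EOF'
n01
n01
n02
n02
EOF
NODEFILE=nodefile.example   # in a real job: NODEFILE=$PBS_NODEFILE

# Build .machines on the fly: one "1:host" line per slot.
echo 'granularity:1' > .machines
awk '{print "1:" $1}' "$NODEFILE" >> .machines

cat .machines
```

A real job script would additionally add a lapw0 line spanning all slots, following the format shown above.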

How the list of k-points is split

In the setup of the k-point parallel version of LAPW1 the list of k-points in case.klist (note that the k-list from case.in1 cannot be used for parallel calculations) is split into subsets according to the weights specified in the .machines file:

k_i = (weight_i / sum_j weight_j) * N_k

where k_i is the number of k-points to be calculated on processor i and N_k is the total number of k-points; k_i is always set to a value greater than or equal to one.

A loop over all processors is repeated until all k-points have been processed.
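Numerically, and assuming the split is simply proportional to the weights with a minimum of one k-point per set, the 8 k-points of the earlier example would be divided among the weight-1, 1, 3, 3 sets as follows:

```shell
# k_i = max(1, round(weight_i / sum(weights) * NK))  -- assumed formula
echo "1 1 3 3" | awk -v nk=8 '{
    for (i = 1; i <= NF; i++) sum += $i
    for (i = 1; i <= NF; i++) {
        k = int($i / sum * nk + 0.5)   # round to the nearest integer
        if (k < 1) k = 1               # at least one k-point per set
        printf "%s%d", (i > 1 ? " " : ""), k
    }
    print ""
}'
# prints: 1 1 3 3
```

This matches the distribution described for the example .machines file: alpha and beta get one k-point each, and each three-fold faster set gets three.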

Speedup in a parallel program is intrinsically limited by the serial and parallel parts of the code according to Amdahl's law:

Speedup(N) = 1 / ((1 - P) + P / N)

where N is the number of processors and P the fraction of the code executed in parallel.
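For example, with P = 0.95 and N = 8 processors Amdahl's law limits the speedup to about 5.9:

```shell
# Amdahl's law: speedup(N) = 1 / ((1 - P) + P / N)
awk 'BEGIN { P = 0.95; N = 8; printf "%.2f\n", 1 / ((1 - P) + P / N) }'
# prints: 5.93
```

Even with 95% parallel code, eight processors deliver well under an eightfold speedup, which is why minimizing the serial fraction and the waiting times matters.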

In WIEN2k usually only a small, negligible part of the time is spent in the programs lapw0, lcore and mixer in comparison to the time spent in lapw1 and lapw2. The time spent waiting until all parallel lapw1 and lapw2 processes have finished is important too. For good performance it is therefore necessary to have good load balancing by properly estimating the speed and availability of the machines used. We encourage the use of testpara_lapw or ``Utils / testpara'' from w2web to check the k-point distribution over the machines before actually running the programs in parallel.

While lapw1 and lapw2 run in parallel mode, the scripts testpara1_lapw (see 5.2.14) and testpara2_lapw (see 5.2.15) can be used to monitor the progress of the parallel execution.

Flow chart of the parallel scripts

To see how files are handled by the scripts lapw1para and lapw2para refer to figures 5.1 and 5.2. After the lapw2 calculations are completed, the densities and the information from the case.scf2_x files are summarized by sumpara.
Note: parallel lapw2 and sumpara take two command line arguments, namely the case.def file and a number-of-processors indicator.

Figure 5.1: Flow chart of lapw1para

Figure 5.2: Flow chart of lapw2para