Solution Method
Developing efficient parallel solution schemes for linear systems of
equations is considerably more complicated than developing efficient serial
solution schemes. Currently, we use a multifrontal solver that eliminates
all DOF that are local to a process (i.e., those whose corresponding shape
functions are zero on the subdomain interface). We use the Hilbert Space
Filling Curve for the elimination ordering. An example of the multifrontal
elimination ordering for a four-subdomain mesh is shown in the accompanying figure.
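As an illustrative sketch of how a space-filling-curve ordering can be generated (this is a standard textbook construction, not the group's actual code; the grid size and cell layout are assumptions), the following computes the Hilbert index of each cell of a small structured grid and sorts the cells by it to obtain an elimination order:

```python
def hilbert_index(order, x, y):
    """Map cell (x, y) of a 2**order x 2**order grid to its Hilbert-curve index."""
    d = 0
    s = 2 ** (order - 1)
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the curve pattern recurses correctly.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

# Order the cells of a 4x4 grid along the Hilbert curve; DOF local to each
# cell would be eliminated in this order.
order = 2
cells = [(x, y) for x in range(4) for y in range(4)]
elimination_order = sorted(cells, key=lambda c: hilbert_index(order, *c))
```

A useful property of this ordering is locality: consecutive cells along the curve are always grid neighbors, which keeps the fronts of the multifrontal elimination geometrically compact.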
The matrix that results from this elimination is called the interface stiffness matrix (or the Schur complement), and the elimination process is called iterative substructuring. The next step in the solution method is to choose an iterative method for the interface system. In our research, the Conjugate Gradient algorithm is used, since the interface stiffness matrix is symmetric positive definite (assuming appropriate boundary conditions). For stiffness matrices that arise from hp-adapted meshes, the condition number can become very large, degrading the convergence rate of the Conjugate Gradient algorithm (this is true for any other iterative method as well). To improve the convergence rate of the iterative scheme, preconditioning is used.
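A minimal numerical sketch of these two steps (illustrative only, not the solver described here; the matrix is a small random SPD stand-in): eliminate the interior DOF to form the Schur complement, then solve the interface system with the Conjugate Gradient algorithm and back-substitute.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-12, maxiter=500):
    """Textbook CG for a symmetric positive definite matrix A."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Hypothetical small SPD stiffness matrix, partitioned into interior (i)
# DOF (eliminated locally) and interface (g) DOF.
rng = np.random.default_rng(0)
n_i, n_g = 6, 3
n = n_i + n_g
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)          # SPD by construction
A_ii, A_ig = A[:n_i, :n_i], A[:n_i, n_i:]
A_gi, A_gg = A[n_i:, :n_i], A[n_i:, n_i:]

# Eliminate the interior DOF: S is the interface stiffness matrix
# (Schur complement) and inherits symmetric positive definiteness from A.
S = A_gg - A_gi @ np.linalg.solve(A_ii, A_ig)

# Solve the interface system with CG, then back-substitute for the interior.
f = np.ones(n)
g = f[n_i:] - A_gi @ np.linalg.solve(A_ii, f[:n_i])
u_g = conjugate_gradient(S, g)
u_i = np.linalg.solve(A_ii, f[:n_i] - A_ig @ u_g)
u = np.concatenate([u_i, u_g])
```

In the parallel setting each process forms its own contribution to S from the DOF it owns; here everything is done in one place purely to show the algebra.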
Preconditioning
Just as efficient parallel solution algorithms are more complex than
efficient serial solution algorithms, efficient parallel preconditioning
methods are also more complex than efficient serial preconditioning methods.
We use the ideas of Domain Decomposition to create our preconditioners.
Descriptions of some of the preconditioners that we examine are given below:

While the NN Block and the NN and P Block preconditioners act globally, the NN Block becomes very difficult to work with for even medium-sized problems. The Jacobi, Block Jacobi, and Subdomain Overlap preconditioners are easier to work with but have no global exchange of information. While this is not a serious problem for small numbers of subdomains, for large numbers of subdomains the convergence rate can be severely affected. To alleviate this problem, a coarse-grid component is added to the Block Jacobi and Subdomain Overlap preconditioners (these two preconditioners work well locally, while the Jacobi preconditioner does not provide even a local exchange of information).
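To make the local preconditioners concrete, here is an illustrative sketch (all sizes are assumptions: a small dense SPD matrix stands in for the interface stiffness matrix, with one diagonal block per subdomain) contrasting the Jacobi and Block Jacobi applications:

```python
import numpy as np

rng = np.random.default_rng(1)
n, nb = 8, 4                          # 8 interface DOF, blocks of size 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)           # stand-in SPD interface stiffness matrix

def jacobi_apply(r):
    """Jacobi: scale the residual by the inverse diagonal (no coupling at all)."""
    return r / np.diag(A)

# Block Jacobi: invert each subdomain's diagonal block exactly, ignoring
# coupling between subdomains (local exchange only, no global exchange).
block_inv = [np.linalg.inv(A[i:i + nb, i:i + nb]) for i in range(0, n, nb)]

def block_jacobi_apply(r):
    return np.concatenate(
        [Bi @ r[i:i + nb] for i, Bi in zip(range(0, n, nb), block_inv)]
    )
```

Both applications are embarrassingly parallel, which is why they scale well; the price is the missing global coupling discussed above.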
Coarse-Grid Preconditioner
The coarse-grid preconditioner should be an approximation to the NN
Block preconditioner, but of much smaller dimension so that it is easy to work
with. Many of the coarse-grid preconditioners that have been developed
so far have one or more of the following constraints:
While this appears to use exotic shape functions as well (it
actually does), the coarse-grid interface stiffness matrix is constructed
algebraically from the interface stiffness matrix. The following figure
shows how the coarse grid is constructed from the fine grid for a one-dimensional
problem with no substructuring.
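A sketch of one such algebraic construction (the aggregation pattern, sizes, and additive combination below are illustrative assumptions, not the specific method of the dissertation): restrict the fine interface DOF with a matrix R, form the coarse interface stiffness matrix as the Galerkin product R A Rᵀ, and add the coarse correction to a local sweep.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_sub = 8, 2                        # 8 fine interface DOF, 2 subdomains
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)            # stand-in SPD interface stiffness matrix

# Piecewise-constant restriction: each subdomain's DOF aggregate into one
# coarse DOF (the coarse "shape functions" are implicit in the rows of R).
m = n // n_sub
R = np.zeros((n_sub, n))
for s in range(n_sub):
    R[s, s * m:(s + 1) * m] = 1.0

# Coarse-grid interface stiffness matrix, built purely algebraically from A.
A_c = R @ A @ R.T

def two_level_apply(r):
    """Local (Jacobi) sweep plus coarse-grid correction, combined additively."""
    local = r / np.diag(A)
    coarse = R.T @ np.linalg.solve(A_c, R @ r)
    return local + coarse
```

The coarse solve is tiny (one unknown per subdomain here) yet couples all subdomains, which is exactly the global exchange of information the purely local preconditioners lack.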

From these results, the important property of the preconditioner is that
as the number of subdomains is increased while the same number of elements
is used in each subdomain, the iteration counts are nearly identical.
This is important because as the number of processors increases in proportion
to the number of elements, the number of iterations remains nearly constant.
This implies that a problem roughly twice as large can be solved
in the same amount of time, as long as twice as many processors are used.

In these figures we note that the coarse-grid (CG) component
of the preconditioner significantly reduces the iteration counts for the
Conjugate Gradient algorithm. Of course, what we are really interested
in is how the preconditioners perform for an hp-adapted mesh.
In the following figures, the adapted mesh is shown along with the
partitioning. Note that the partitioning results in poorly shaped subdomains
as well as disconnected subdomains.

The following figure shows the iteration counts for the different preconditioners as the mesh is locally hp adapted. The coarse-grid plus Block Jacobi and coarse-grid plus Subdomain Overlap preconditioners have nearly constant iteration counts as the mesh is refined.
For more details, see Chapters 4 and 5 from Andrew Bauer's dissertation (ps, pdf).