hp-adaptive finite element discretizations of partial differential equations have been shown to give exponential convergence rates of the error with respect to the number of degrees of freedom of the system (or another appropriate measure of work).  To compute the numerical solution as fast as possible, parallel methods must be used (see the section on AFEAPI).

Solution Method
Developing efficient parallel solution schemes for linear systems of equations is much more complicated than developing efficient serial solution schemes.  Currently, we use a multi-frontal solver that eliminates all degrees of freedom (DOF) which are local to a process (i.e., those whose corresponding shape functions are zero on the subdomain interface).  We use the Hilbert space-filling curve to determine the elimination ordering.  An example of the multi-frontal elimination ordering for a 4-subdomain mesh is:
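Eliminating the interior DOF reduces the system to one on the interface unknowns only.  As a minimal sketch (with a small randomly generated SPD matrix standing in for a real subdomain stiffness matrix, and the block sizes chosen purely for illustration), this elimination amounts to forming a Schur complement:

```python
import numpy as np

# Hypothetical 2x2 block split of a subdomain stiffness matrix:
# interior (i) DOF, which are local to a process, and interface (g) DOF.
rng = np.random.default_rng(0)
n_i, n_g = 6, 3
B = rng.standard_normal((n_i + n_g, n_i + n_g))
A = B @ B.T + (n_i + n_g) * np.eye(n_i + n_g)  # SPD stand-in matrix

A_ii = A[:n_i, :n_i]   # interior-interior block
A_ig = A[:n_i, n_i:]   # interior-interface coupling
A_gi = A[n_i:, :n_i]
A_gg = A[n_i:, n_i:]   # interface-interface block

# Eliminating the interior DOF leaves the interface stiffness matrix
# (Schur complement): S = A_gg - A_gi A_ii^{-1} A_ig.
S = A_gg - A_gi @ np.linalg.solve(A_ii, A_ig)
```

Because the interior blocks of different subdomains are decoupled, each process can perform this elimination independently, which is what makes the substructuring step parallel.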

The matrix that remains after the elimination is called the interface stiffness matrix (or the Schur complement), and the elimination process is called iterative substructuring.  The next step in the solution method is choosing an iterative method for the interface system.  In our research, the Conjugate Gradient algorithm is used, since the stiffness matrix is symmetric positive definite (assuming appropriate boundary conditions).  For stiffness matrices that arise from hp-adapted meshes, the condition number can become very large, decreasing the convergence rate of the Conjugate Gradient algorithm (this is true for any other iterative method as well).  To increase the convergence rate of the iterative scheme, preconditioning is used.
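The preconditioned Conjugate Gradient iteration itself is standard; the following minimal sketch (not the code used in this project) shows where the preconditioner enters, here with a simple Jacobi (diagonal) preconditioner and a hypothetical SPD test system:

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, max_iter=200):
    """Minimal preconditioned Conjugate Gradient for SPD A.

    M_inv applies the preconditioner: z = M^{-1} r.
    Returns (solution, iteration count).
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, k
        z = M_inv(r)                 # the only place M^{-1} is applied
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Hypothetical SPD test system with a Jacobi preconditioner.
rng = np.random.default_rng(1)
n = 50
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)
b = rng.standard_normal(n)
x, iters = pcg(A, b, lambda r: r / np.diag(A))
```

A better preconditioner reduces the iteration count without changing the iteration's structure, which is why the preconditioner can be swapped freely in the experiments below.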

Just as efficient parallel solution algorithms are more complex than efficient serial solution algorithms, efficient parallel preconditioning methods are more complex than efficient serial preconditioning methods.  We use ideas from Domain Decomposition to create our preconditioners.  A description of some of the preconditioners that we examine is:

NN Block
NN and P Block

Jacobi
Block Jacobi
Subdomain Overlap

While the NN Block and the NN and P Block preconditioners act globally, the NN Block becomes very difficult to work with for even medium-sized problems.  The Jacobi, Block Jacobi, and Subdomain Overlap preconditioners are easier to work with but have no global exchange of information.  While this matters little for small numbers of subdomains, for large numbers of subdomains the convergence rate can be severely affected.  To alleviate this problem, a coarse-grid component is added to the Block Jacobi and Subdomain Overlap preconditioners (these two work well locally, while the Jacobi preconditioner does not provide a local exchange of information).
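The Block Jacobi idea can be sketched in a few lines: keep only the diagonal blocks of the interface stiffness matrix associated with each subdomain, factor them once, and apply the block solves independently.  The block layout and test matrix below are hypothetical; in practice each process would hold and factor only its own block.

```python
import numpy as np

def block_jacobi(A, blocks):
    """Return a function r -> M^{-1} r for a Block Jacobi preconditioner.

    `blocks` is a list of index arrays, one per subdomain's interface
    DOF; coupling between different blocks is simply dropped, so the
    preconditioner involves no global exchange of information.
    """
    factors = [(idx, np.linalg.cholesky(A[np.ix_(idx, idx)])) for idx in blocks]

    def apply(r):
        z = np.zeros_like(r)
        for idx, L in factors:
            y = np.linalg.solve(L, r[idx])    # forward solve with L
            z[idx] = np.linalg.solve(L.T, y)  # backward solve with L^T
        return z

    return apply

# Hypothetical 8 interface DOF split across 2 subdomains.
rng = np.random.default_rng(3)
B = rng.standard_normal((8, 8))
A = B @ B.T + 8 * np.eye(8)                   # SPD stand-in matrix
blocks = [np.arange(0, 4), np.arange(4, 8)]
M_inv = block_jacobi(A, blocks)
```

Since each block solve touches only one subdomain's DOF, the blocks can be applied concurrently, but any coupling between subdomains is ignored, which is exactly the missing global exchange the coarse-grid component is meant to supply.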

Coarse-Grid Preconditioner
The coarse-grid preconditioner should be an approximation to the NN Block preconditioner, but of much smaller dimension so that it is easy to work with.  Many of the coarse-grid preconditioners that have been developed so far have one or more of the following constraints:

Our coarse-grid is created as follows:

While this appears to use exotic shape functions as well (it actually does), the coarse-grid interface stiffness matrix is constructed algebraically from the interface stiffness matrix.  The following figure shows how the coarse grid is constructed from the fine grid for a 1-dimensional problem with no substructuring.
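One common way such an algebraic construction works is a Galerkin triple product: a restriction operator maps fine interface DOF to coarse DOF, and the coarse matrix is formed from the fine matrix and that operator alone, with no coarse mesh ever built.  The piecewise-constant aggregation below is an illustrative assumption, not necessarily the construction used in this project:

```python
import numpy as np

# Hypothetical aggregation: 12 fine interface DOF grouped into 4 coarse
# DOF of 3 fine DOF each (piecewise-constant restriction operator R).
n_fine, n_coarse = 12, 4
R = np.zeros((n_coarse, n_fine))
for c in range(n_coarse):
    R[c, 3 * c:3 * (c + 1)] = 1.0

rng = np.random.default_rng(2)
B = rng.standard_normal((n_fine, n_fine))
S = B @ B.T + n_fine * np.eye(n_fine)  # stand-in interface stiffness matrix

# Galerkin (triple-product) coarse matrix, built purely algebraically:
S_c = R @ S @ R.T

def coarse_correction(r):
    """Coarse-grid component: restrict, solve the small system, prolong."""
    return R.T @ np.linalg.solve(S_c, R @ r)
```

The small system S_c is cheap to solve yet couples all subdomains, which is what gives the combined preconditioner its global exchange of information.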

Iteration counts for a 3-dimensional test problem under constant h refinement, for the Coarse-Grid and Block Jacobi preconditioner and different numbers of subdomains
Iteration counts for a 3-dimensional test problem under constant p enrichment, for the Coarse-Grid and Block Jacobi preconditioner and different numbers of subdomains

From these results, the important property of the preconditioner is that as the number of subdomains is increased while the number of elements per subdomain is held fixed, the iteration counts are nearly identical.  This is important because as the number of processors is increased in proportion to the number of elements, the number of iterations remains nearly constant.  This implies that a problem roughly twice as large can be solved in the same amount of time, as long as twice as many processors are used.
Iteration counts for a 2-dimensional test problem under constant h refinement, for the 64-subdomain case and different preconditioners
Iteration counts for a 3-dimensional test problem under constant h refinement, for the 64-subdomain case and different preconditioners

In these figures we note that the coarse-grid component of the preconditioner significantly reduces the iteration counts for the Conjugate Gradient algorithm.  Of course, what we are really interested in is how the preconditioners perform for an hp-adapted mesh.  The following figures show the adapted mesh along with its partitioning.  Note that the partitioning results in poorly shaped subdomains as well as disconnected subdomains.
hp-adapted mesh with a contour plot of the local polynomial order
64 subdomains of the partitioned mesh

The following figure shows the iteration counts as the mesh is locally hp-adapted, for different preconditioners.  The Block Jacobi and Subdomain Overlap preconditioners with a coarse-grid component have nearly constant iteration counts as the mesh is refined.

For more details, see Chapters 4 and 5 of Andrew Bauer's dissertation (ps, pdf).