FLOW-3D/MP is the distributed-memory version of FLOW-3D, designed to run on high-performance computing clusters so that engineers can tackle problems with very large computational domains or long simulation runtimes. It uses a hybrid MPI-OpenMP methodology to parallelize, and thereby speed up, calculations on multiple CPU cores across the compute nodes of a cluster. The simulation domain is decomposed into multiple sub-domains, which are distributed across the compute nodes so that the computational work is divided between them. The solution on the different sub-domains is synchronized by exchanging data between nodes using a Message Passing Interface (MPI) library. Within each sub-domain, OpenMP threads are spawned to further parallelize the computation. This combination of MPI and OpenMP parallelization significantly reduces solver runtimes for large simulations.
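The decomposition and synchronization described above can be illustrated with a small serial sketch. It is not FLOW-3D/MP code: the function names (`smooth`, `split`, `step_decomposed`) and the 1D three-point smoothing update are assumptions chosen for illustration, and the neighbour copy stands in for what an MPI library would do across nodes. The point is only that, once each sub-domain receives one ghost cell from each neighbour, the decomposed update reproduces the single-domain result.

```python
# Illustrative only: a serial stand-in for an MPI ghost-cell exchange,
# not the FLOW-3D/MP implementation.

def smooth(values):
    """One explicit three-point smoothing step; the two end cells are held fixed."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + values[i] + values[i + 1]) / 3.0
    return out


def split(field, parts):
    """Cut a 1D field into `parts` contiguous sub-domains of near-equal size."""
    base, extra = divmod(len(field), parts)
    chunks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        chunks.append(field[start:start + size])
        start += size
    return chunks


def step_decomposed(chunks):
    """Give each sub-domain one ghost cell per neighbour, update it, drop the ghosts."""
    updated = []
    for p, chunk in enumerate(chunks):
        # "Exchange": copy the neighbour's boundary cell (MPI would send/receive it).
        left = [chunks[p - 1][-1]] if p > 0 else []
        right = [chunks[p + 1][0]] if p < len(chunks) - 1 else []
        padded = smooth(left + chunk + right)
        # Keep only the cells this sub-domain owns.
        updated.append(padded[len(left):len(padded) - len(right)])
    return updated


field = [float(i * i) for i in range(10)]
chunks = split(field, 3)
decomposed = [v for chunk in step_decomposed(chunks) for v in chunk]
print(decomposed == smooth(field))
```

In a hybrid scheme, the loop over cells inside each sub-domain is the part that would be shared among OpenMP threads, while the ghost-cell copy is the part handled by MPI.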
Why use FLOW-3D/MP?
Current high-performance computing (HPC) hardware consists of multi-core, multi-CPU nodes (ccNUMA shared memory) connected by a fast network such as InfiniBand. Because they offer better computational performance and efficiency, lower power consumption, reduced cost and greater flexibility, multi-core clusters are now ubiquitous in scientific computing.
Multi-core clusters let users increase grid resolution to improve solution accuracy and to resolve more features in the flow. FLOW-3D/MP has been designed and optimized to exploit the best features of such clusters, providing significantly reduced runtimes while retaining solution accuracy. In addition, the distributed-memory approach of FLOW-3D/MP overcomes the memory limitations of stand-alone workstations.
What kind of performance can I expect?
The actual performance of FLOW-3D/MP varies between simulations, but the solver has shown scaling up to 512 cores for a range of applications including metal casting, water and environmental, microfluidics and aerospace. Details and performance plots for several cases are presented on the benchmarks page. For an ideal case, in which the computational domain is completely filled with fluid, FLOW-3D/MP has shown scaling to 1024 cores.
How do I use FLOW-3D/MP?
FLOW-3D/MP is typically installed and run on a compute cluster, which can be a stand-alone cluster or part of a supercomputing facility. The graphical user interface provided with FLOW-3D/MP allows the user to easily set up and run simulations. On large-scale clusters where simulations are run through a job scheduler such as PBS, Torque or SGE, users have access to a job submission utility that is highly configurable and scheduler independent.
What’s in FLOW-3D/MP v6.1?
FLOW-3D/MP v6.1 is based on FLOW-3D v11.1. Some feature highlights include the new particle model, squeeze pins model, mooring lines and active simulation control. All models are compatible with the hybrid MPI-OpenMP methodology for FLOW-3D/MP.
Computational load balance is a critical aspect of FLOW-3D/MP and greatly affects the performance of the solver. Load balancing can be categorized as static (before the simulation starts) and dynamic (during the course of the simulation).
For static load balancing, FLOW-3D/MP provides an Automatic Decomposition Tool that divides a single computational domain into multiple sub-domains (MPI domains), distributing the active cells as evenly as possible between them. This minimizes the synchronization time between the sub-domains and enhances performance. In v6.1, the decomposition step has been combined with the solver step, so the setup is no longer interrupted by a separate decomposition run.
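The idea of distributing active cells evenly can be sketched as follows. This is not the algorithm used by the Automatic Decomposition Tool, which is not described here; it is a minimal one-directional example under the assumption that the domain is cut into slabs along a single axis and that the number of active cells in each slab is known. The names `balanced_cuts` and `part_loads` and the cell counts are hypothetical.

```python
from itertools import accumulate

# Illustrative only: a simple prefix-sum heuristic, not the FLOW-3D/MP tool.


def balanced_cuts(active_per_slab, parts):
    """Pick slab indices at which to cut so each part holds roughly equal active cells."""
    prefix = list(accumulate(active_per_slab))
    total = prefix[-1]
    cuts, start = [], 0
    for k in range(1, parts):
        target = total * k / parts  # ideal cumulative load after k parts
        lo = start + 1  # each part keeps at least one slab
        hi = len(active_per_slab) - (parts - k)  # leave a slab for each later part
        best = min(range(lo, hi + 1), key=lambda c: abs(prefix[c - 1] - target))
        cuts.append(best)
        start = best
    return cuts


def part_loads(active_per_slab, cuts):
    """Active-cell count of each part for a given list of cut indices."""
    bounds = [0] + list(cuts) + [len(active_per_slab)]
    return [sum(active_per_slab[a:b]) for a, b in zip(bounds, bounds[1:])]


# Hypothetical counts: the first slabs are mostly blocked or empty.
active = [0, 0, 10, 30, 60, 100, 100, 100]
cuts = balanced_cuts(active, 4)
print(cuts, part_loads(active, cuts))
print(part_loads(active, [2, 4, 6]))  # equal-size slabs, for comparison
```

With these made-up counts, cutting by slab count alone gives loads of 0, 40, 160 and 200 active cells, whereas cutting by active cells gives 100 in every part, so no sub-domain waits on a much busier neighbour at each synchronization.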
For dynamic load balancing, the dynamic thread balancing feature adjusts the number of OpenMP threads assigned to each sub-domain during the course of the simulation. For one-fluid, free-surface simulations, performance gains of up to 20% have been achieved using this feature.
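One way to picture thread rebalancing is as a proportional allocation problem: a fixed budget of threads is shared among the sub-domains according to how much work each one currently has. The sketch below is an assumption about how such a scheme could look, not a description of the FLOW-3D/MP feature; the function name `rebalance_threads`, the work estimates and the largest-remainder rounding are all hypothetical.

```python
# Illustrative only: proportional thread allocation with largest-remainder rounding.


def rebalance_threads(work, total_threads):
    """Share `total_threads` across sub-domains in proportion to `work`.

    `work` is an estimate of each sub-domain's current load, for example
    its active-cell count or the time its last step took.
    Every sub-domain keeps at least one thread.
    """
    n = len(work)
    if total_threads < n:
        raise ValueError("need at least one thread per sub-domain")
    spare = total_threads - n
    total_work = sum(work)
    if total_work == 0:
        shares = [spare / n] * n
    else:
        shares = [spare * w / total_work for w in work]
    threads = [1 + int(s) for s in shares]
    # Hand any threads lost to rounding to the largest fractional remainders.
    leftover = total_threads - sum(threads)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        threads[i] += 1
    return threads


# Hypothetical load estimates for four sub-domains sharing 16 threads.
print(rebalance_threads([8.0, 4.0, 2.0, 2.0], 16))
```

In a free-surface simulation the fluid moves between sub-domains over time, so an allocation like this would be recomputed periodically as the load estimates change.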
Other significant improvements in v6.1 include optimization of the GMRES pressure solver, batch processing and report generation, and the raster data interface used to model flood events over complex terrain. Please refer to the FLOW-3D v11.1 page for more information on the new models and features.
* The performance metric is defined as the number of times a simulation can be run in 24 hours. Higher bars represent better performance.
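The metric in the note above is a simple conversion from wall-clock runtime. The sketch below shows that arithmetic, together with the usual speedup and parallel-efficiency ratios often used alongside it when discussing scaling. The function names are hypothetical and the runtimes are made-up values for illustration only, not measured FLOW-3D/MP results.

```python
# Illustrative only: the runtimes below are invented, not benchmark data.

SECONDS_PER_DAY = 24 * 60 * 60


def runs_per_day(runtime_seconds):
    """Number of times a simulation of this runtime fits into 24 hours."""
    return SECONDS_PER_DAY / runtime_seconds


def speedup(baseline_seconds, runtime_seconds):
    """How many times faster a run is than the baseline run."""
    return baseline_seconds / runtime_seconds


def parallel_efficiency(baseline_seconds, baseline_cores, runtime_seconds, cores):
    """Speedup divided by the increase in core count (1.0 is ideal scaling)."""
    return speedup(baseline_seconds, runtime_seconds) * baseline_cores / cores


# Hypothetical case: 7200 s on 16 cores, 2400 s on 64 cores.
print(runs_per_day(7200), runs_per_day(2400))
print(speedup(7200, 2400), parallel_efficiency(7200, 16, 2400, 64))
```

Because the metric is inversely proportional to runtime, the ratio of two bars in a plot of runs per day equals the speedup between the corresponding runs.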