In this blog, Flow Science’s IT Manager Matthew Taylor breaks down the different hardware components and suggests some ideal configurations for getting the most out of your FLOW-3D products.
While we publish general supported platforms guidelines, we are often asked for more specific information on choosing new hardware. This article delves a bit deeper into hardware selection for FLOW-3D products as of late 2019. At the end of this article, we’ll give you some ideas for different configurations, depending on the types of simulations you are running.
FLOW-3D is a computationally intensive program to run, because CFD solver performance depends primarily on the floating-point performance of the CPU. FLOW-3D POST is also highly CPU-dependent. While we can’t benchmark every available CPU, we can reasonably compare relative performance.
The best option for estimating FLOW-3D performance for a given CPU, or for comparing the performance between multiple CPU options, is Standard Performance Evaluation Corporation’s SPEC CPU2017 benchmark, specifically their SPECspeed 2017 Floating Point results, which predict CFD solver performance extremely well.
Because SPEC CPU2017 is a paid benchmark, results are not available for every CPU. Expensive configurations such as multi-socket Intel Xeon machines with large amounts of RAM are well represented.
Another option for CPU comparison is Passmark Software’s CPU benchmark. While their PerformanceTest suite is paid software, there are free trials available. Most CPUs are listed, including most lower-cost options. Although floating-point performance is only one aspect of the full benchmark, the overall score is a decent indicator of performance across a variety of workloads.
Once you’ve determined your budget and chosen some CPUs that fall within that budget, you can use the benchmarks to determine the best performance for the price.
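As a rough illustration of that comparison, the sketch below filters a list of candidate CPUs by budget and ranks them by benchmark score per dollar. The CPU names, scores, and prices are placeholders rather than real benchmark results, and `CpuOption` and `rank_by_value` are hypothetical helper names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class CpuOption:
    name: str
    fp_score: float   # floating-point benchmark score (higher is better)
    price_usd: float


def rank_by_value(options, budget_usd):
    """Return the in-budget options sorted by score per dollar, best first."""
    affordable = [o for o in options if o.price_usd <= budget_usd]
    return sorted(affordable, key=lambda o: o.fp_score / o.price_usd, reverse=True)


# Placeholder numbers for illustration only; not real scores or prices.
candidates = [
    CpuOption("cpu_a", fp_score=100.0, price_usd=500.0),
    CpuOption("cpu_b", fp_score=150.0, price_usd=1000.0),
    CpuOption("cpu_c", fp_score=220.0, price_usd=2000.0),
]

for option in rank_by_value(candidates, budget_usd=1200.0):
    value = option.fp_score / option.price_usd
    print(f"{option.name}: {value:.3f} points per dollar")
```

Score per dollar is only one way to read the numbers. If absolute runtime matters more than cost efficiency, sorting the in-budget options by raw score may be the better choice.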
Clock vs. cores
Typically, chips with higher clock speeds include fewer CPU cores. FLOW-3D is well-parallelized, but some operations are inherently single-threaded, such as disk writes. Simulations with frequent and/or large data output often benefit from a higher clock speed rather than more cores. Similarly, multi-threading across cores and sockets introduces overhead, so for very small problems, restricting the number of cores used may increase performance.
Architecture is important. More recent CPUs typically provide more functions per cycle. This means current-gen CPUs will usually outperform older CPUs at the same clock speed. They can also be more electrically efficient, leading to higher performance per watt. At Flow Science, we have machines with current generation 10- to 12-core Core i9 CPUs that outperform older, multi-socket 12-, 16-, or even 24-core Xeons.
We do not overclock CPUs. We consider hardware a multi-year investment, and overclocking increases heat, which reduces longevity. Depending on the CPU, stability may also be decreased. If you do choose to overclock, careful thermal management is highly recommended.
Simulations where all cores are at 100% utilization for long periods typically perform better with HyperThreading disabled. Because 100% utilization is common with FLOW-3D, we recommend disabling HyperThreading when configuring new hardware. The setting is accessed from within the machine’s BIOS settings pages.
For a few workloads, we have observed slightly better performance with HyperThreading enabled. It may be worth testing your common simulation types in both configurations for the best possible runtimes.
The performance increase when using multiple cores is not linear. For example, upgrading from a 12-core CPU to a 24-core CPU will not halve simulation runtimes. If you are considering more than 16 to 32 CPU cores, depending on your simulation type, you should consider the HPC versions of FLOW-3D and FLOW-3D CAST or a move to FLOW-3D CLOUD.
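One common way to reason about this non-linear scaling is Amdahl's law, a general textbook model rather than a FLOW-3D measurement. The sketch below applies it to the 12-core versus 24-core example. The parallel fraction of 0.95 is an assumed value chosen purely for illustration, and the model ignores memory bandwidth and threading overhead.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over a single core under Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed value for illustration only; not a measured FLOW-3D figure.
PARALLEL_FRACTION = 0.95

s12 = amdahl_speedup(PARALLEL_FRACTION, 12)
s24 = amdahl_speedup(PARALLEL_FRACTION, 24)

print(f"12 cores: {s12:.2f}x, 24 cores: {s24:.2f}x")
print(f"Runtime on 24 cores relative to 12 cores: {s12 / s24:.2f}")
```

Under this assumption, doubling the core count brings the runtime down to roughly 0.69 of the 12-core runtime rather than 0.50. The real ratio depends on the simulation and should be measured.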
How about AMD Ryzen or Epyc CPUs?
AMD is topping the benchmark charts with some CPUs, and their prices can be extremely competitive. We have tested FLOW-3D with a small selection of AMD CPUs, and our current feeling is that Epyc is not ideal and that Ryzen performs fairly well. Heat is still an issue that must be carefully addressed. We are aware of a Windows bug affecting AMD’s 32-core options that dramatically degrades performance; we recommend Linux for these CPUs.
At least 4GB of RAM per processor core is a good starting point for FLOW-3D. When postprocessing results using FLOW-3D POST, a significant amount of RAM is recommended. The following recommendations are from an earlier blog post, Good Hardware Means Improved FLOW-3D POST Performance:
- Extra-large (200 million+ cells): At least 128GB
- Large (60-150 million cells): 64-128GB
- Medium (30-60 million cells): 32-64GB
- Small (30 million cells and below): At least 32GB
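These guidelines can be written as a small lookup, sketched below. The function names are hypothetical. Two details are assumptions of this sketch, not part of the original recommendations: a cell count that falls exactly on a boundary is placed in the smaller tier, and counts between 150 and 200 million cells, which the list does not cover, are treated as "Large".

```python
def min_solver_ram_gb(cores: int, gb_per_core: int = 4) -> int:
    """Starting-point RAM for the solver: at least 4GB per processor core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * gb_per_core


def post_ram_guideline(cells_millions: float) -> str:
    """Map a mesh size in millions of cells to the FLOW-3D POST RAM guideline."""
    if cells_millions >= 200:
        return "at least 128GB"   # Extra-large
    if cells_millions > 60:
        return "64-128GB"         # Large (150-200 million assumed to belong here)
    if cells_millions > 30:
        return "32-64GB"          # Medium
    return "at least 32GB"        # Small


print(min_solver_ram_gb(16))      # RAM starting point for a 16-core machine, in GB
print(post_ram_guideline(45))     # guideline for a 45-million-cell result
```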
Faster RAM can improve performance, but large amounts of very fast RAM can be costly. The effect of RAM speed is hard to measure effectively, which makes price/performance hard to determine. We recommend choosing the fastest RAM that fits your budget, and when in doubt, choosing a larger amount of RAM rather than higher speed.
Skylake and newer considerations
We have determined that the way RAM is physically populated on the board is extremely important for performance on Skylake and newer architectures. For a CPU that supports six memory channels, 6 or 12 DIMMs should be populated identically. For four channel CPUs, 4 or 8 DIMMs should be populated identically.
An unbalanced configuration, where the memory channels or DIMM size/speed are mismatched, reduces performance significantly.
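The population rule above can be checked with a short script before ordering memory. This is a minimal sketch: `Dimm` and `is_balanced` are hypothetical names, and the check only looks at DIMM count and whether the modules are identical. It cannot tell which physical slot each DIMM sits in, so the motherboard manual remains the reference for slot order.

```python
from typing import NamedTuple, Sequence


class Dimm(NamedTuple):
    size_gb: int
    speed_mts: int  # rated speed in MT/s


def is_balanced(channels: int, dimms: Sequence[Dimm]) -> bool:
    """True if there are one or two DIMMs per channel and all DIMMs are identical."""
    count_ok = len(dimms) in (channels, 2 * channels)
    identical = len(set(dimms)) == 1
    return count_ok and identical


# Six-channel CPU with six identical DIMMs: balanced.
print(is_balanced(6, [Dimm(16, 2666)] * 6))

# Six-channel CPU with only four DIMMs: two channels are left empty.
print(is_balanced(6, [Dimm(16, 2666)] * 4))

# Four-channel CPU with mixed DIMM sizes: unbalanced.
print(is_balanced(4, [Dimm(16, 2666)] * 3 + [Dimm(32, 2666)]))
```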
We very strongly recommend NVIDIA’s Quadro graphics hardware. These cards are available at a range of price points and are included in higher-end, workstation-class notebook PCs. A card with at least 3GB of VRAM should be chosen.
AMD graphics cards are not recommended. Intel integrated graphics are explicitly not supported. Some laptops include both Intel integrated and NVIDIA discrete graphics. In those situations, you will need to use the NVIDIA control panel to select the discrete graphics for FLOW-3D and FLOW-3D POST.
Extremely low-end motherboard graphics on servers and HPC nodes, which do not support 3D acceleration, CANNOT be used and are not supported.
On Windows, the Quadro driver automatically enables OpenGL over a Remote Desktop connection.
On Linux, additional software is required to support OpenGL over a remote connection, and that typically requires Quadro hardware.
VNC servers do not support OpenGL over a remote connection and are not recommended or supported.
We periodically test the freely-distributed VirtualGL/TurboVNC combination, and historically it has not worked with FLOW-3D. Significant issues such as freezing when accessing OpenGL windows and strange geometry artifacts have been observed.
Since Flow Science cannot support remote visualization solutions, we recommend buying a product with support for installation, implementation, and maintenance.
We strongly recommend using solid state drives (SSDs) rather than traditional spinning hard disks. Prices have fallen a great deal over the past several years and continue to drop. One ideal option is to use a 1TB or larger SSD for short-term storage of simulation results and postprocessing, with a larger HDD RAID array or external storage option for archiving older results.
Given the above recommendations, here are some ideas for possible configurations and how they might be used:
4- to 6-core Intel Core i7 or i9 notebook with 32GB RAM, Quadro graphics with 1-2GB VRAM
- Light water and environmental simulations (fewer than 500,000 cells) with simple physics like gravity and viscosity
- Light postprocessing
6- to 10-core Intel Core i7 with 32-64GB RAM, Quadro graphics with 2-3GB VRAM
- Aerospace sloshing dynamics simulations
- Design simulations for municipal water/wastewater applications, e.g., contact tanks, settling tanks
- Metal casting simulations with low cell counts, low velocity relative to cell size
- Medium postprocessing
10- to 18-core Intel Core i9 workstation with 64-128GB RAM, 3GB+ VRAM
- Fish passage design simulations
- Metal casting simulations with thick walls, slow-filling castings
- Light to medium additive manufacturing simulations
- Heavy postprocessing
16- to 36-core dual Intel Xeon workstation or server with 128GB+ RAM, 3GB+ VRAM
- Spillway design simulations with air entrainment and particles
- Metal casting simulations with thin walls, high velocities
- Nozzle filling simulations for automotive fuel tanks
- Heavy additive manufacturing simulations
- Heavy postprocessing
36+ core High Performance Computing cluster, multiple nodes, multi-socket per node, low-latency, high-bandwidth network interconnects, significant RAM per compute node. Graphics node with 128GB+ RAM, Quadro with 3GB+ VRAM, and Penguin Scyld Cloud Workstation for remote visualization
- Extremely computationally intensive simulations, e.g., hydraulics simulations of river flows or high-pressure die casting simulations for parts that require fine mesh resolution to resolve complex geometry
- Significant runtime improvements for medium and heavy simulations
- Heavy postprocessing
Please feel free to contact us to review your vendor quotes or parts lists for new hardware. We’re happy to help you acquire the right machine(s) to maximize your success with FLOW-3D.