1. Memory: the official documentation only requires more than 8GB of RAM per machine (by comparison, Impala recommends 128GB). In practice, efficient processing still benefits from as much memory as possible. Be careful, however, if memory exceeds 200GB: the JVM has trouble managing heaps that large and requires special configuration. It is not enough for the machine to have plenty of memory; that memory must actually be allocated to Spark.
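As a sketch of how that allocation is done (the values below are illustrative, not recommendations from the source), memory is granted to Spark explicitly through executor and driver settings, for example in `spark-defaults.conf`:

```properties
# spark-defaults.conf -- illustrative values, tune for your own cluster
spark.executor.memory         32g   # JVM heap actually handed to each executor
spark.executor.memoryOverhead 4g    # off-heap headroom (JVM internals, shuffle buffers)
spark.driver.memory           8g    # heap for the driver process
```

Keeping each executor heap well below the 200GB range, and running several executors per large machine instead, is one common way to sidestep the JVM's difficulties with very large heaps.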
2. Spark's hardware requirements span CPU, memory, disk, and other resources. CPU: Spark needs capable processors to drive its distributed computing framework. CPU sizing should account for the degree of parallelism, that is, the number of partitions (the granularity) of a distributed dataset. The higher the parallelism, the finer-grained and more scattered the data, and the more concurrent computing tasks the CPU must be able to handle.
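The relationship between parallelism and granularity can be sketched in plain Python (`partition` is a hypothetical helper for illustration, not the Spark API):

```python
# Sketch: more partitions -> smaller chunks -> more tasks that can run
# concurrently, which is why high parallelism demands more CPU capacity.

def partition(data, num_partitions):
    """Split data into roughly equal chunks, one chunk per task."""
    size = len(data)
    return [data[i * size // num_partitions:(i + 1) * size // num_partitions]
            for i in range(num_partitions)]

records = list(range(1000))

coarse = partition(records, 4)    # low parallelism: 4 chunks of 250 records
fine = partition(records, 100)    # high parallelism: 100 chunks of 10 records

print(len(coarse), len(coarse[0]))  # 4 250
print(len(fine), len(fine[0]))      # 100 10
```

With 100 partitions, up to 100 tasks can be in flight at once, so the cluster needs enough total cores to make that concurrency worthwhile.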

3. Minimum configuration: the lowest hardware specification on which Spark can be installed and run at all.

4. Summary: GPUs provide significant performance improvements for Spark big data analysis through their parallel computing capabilities, and are especially well suited to large-scale, computation-intensive tasks. Tests by China Telecom showed that, with reasonable configuration, GPUs achieved 7-58x speedups while reducing hardware costs.
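For reference, GPU acceleration of Spark SQL is commonly enabled through the RAPIDS Accelerator plugin; the source does not describe China Telecom's exact setup, so the following is only a minimal sketch of the relevant settings, with illustrative values:

```properties
# spark-defaults.conf -- sketch assuming the RAPIDS Accelerator for Apache Spark
spark.plugins                      com.nvidia.spark.SQLPlugin
spark.rapids.sql.enabled           true
spark.executor.resource.gpu.amount 1     # one GPU per executor
spark.task.resource.gpu.amount     0.25  # four tasks share each GPU
```

The task-level GPU fraction controls how many tasks share a GPU concurrently, which is one of the knobs that determines how much of the quoted speedup a given workload actually sees.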


