Spark Transformations
As is well known:
1. RDDs are immutable.
2. Never modify an RDD in place.
3. Transform an RDD into another RDD.
There are two kinds of RDD transformations. The first is the narrow transformation:
Transformations such as map, flatMap, and filter are all narrow transformations: no shuffle happens, so they are fast. Their speed depends only on:
1. availability of local memory
2. CPU speed
The second is the wide transformation:
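Transformations such as groupByKey, reduceByKey, and repartition are all wide transformations. They require a shuffle, so network speed is the key factor in their performance, which makes them slower.
In the final comparison, narrow transformations are the faster of the two because they avoid the shuffle.
Excerpted from: http://www.yjs001.cn/bigdata/spark/40455850856940820301.html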
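To make the difference concrete, here is a minimal sketch in plain Python. It does not use the Spark API. It assumes an RDD can be modeled as a list of partitions, and the names narrow_map and wide_reduce_by_key are hypothetical helpers invented for this illustration. Real Spark uses its own partitioner and also combines values on the map side for reduceByKey, which this sketch omits.

```python
from collections import defaultdict
from functools import reduce
from zlib import crc32

# Hypothetical stand-in for an RDD: a list of partitions,
# each partition being a list of (key, value) records.
partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("a", 6)],
]


def narrow_map(parts, fn):
    """Narrow: each output partition is computed from exactly one input partition."""
    return [[fn(record) for record in part] for part in parts]


def wide_reduce_by_key(parts, fn, num_partitions):
    """Wide: every input partition may contribute to every output partition."""
    # "Shuffle" step: route each record to the partition that owns its key.
    # crc32 is only a deterministic stand-in for a hash partitioner.
    buckets = [defaultdict(list) for _ in range(num_partitions)]
    for part in parts:
        for key, value in part:
            target = crc32(key.encode("utf-8")) % num_partitions
            buckets[target][key].append(value)
    # Reduce step: combine the values collected for each key.
    return [
        [(key, reduce(fn, values)) for key, values in sorted(bucket.items())]
        for bucket in buckets
    ]


# Both helpers return new lists; the input partitions are never modified in place.
doubled = narrow_map(partitions, lambda kv: (kv[0], kv[1] * 2))
totals = wide_reduce_by_key(partitions, lambda x, y: x + y, num_partitions=2)
```

In narrow_map, the number and layout of partitions is unchanged and no record leaves its partition. In wide_reduce_by_key, records with the same key start out in different partitions and must be moved together before they can be combined; on a cluster that movement crosses the network, which is why wide transformations are slower.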