Hadoop MapReduce made it possible to solve complex problems on distributed systems, but it has some limitations. This course discusses the limitations of Hadoop MapReduce and how Spark overcomes them. We describe RDDs, which are the core of Spark, and in-memory computation. Understanding persistent RDDs and in-memory computation, and solving Big Data problems using Spark with Scala, form the core of this course. The discussion then moves on to Spark SQL and problem solving with Spark SQL DataFrames. Hands-on exercises run in parallel with every discussion. Handling streaming data with Spark Streaming is also an important topic and is included. The last part of the course covers Spark program optimization: optimizing Spark Core, Spark SQL, and Spark Streaming, and optimizing the utilization of the cluster. We also discuss running Spark on YARN, Standalone, and Mesos clusters.