“Apache Spark is a fast and general engine for data processing, used primarily by data engineers, data scientists and business analysts. It provides high-level APIs in Scala, Java, Python, R and SQL. It also supports a rich set of higher-level tools that make it attractive for machine learning, graph computation, stream data processing, ETL and business intelligence. In a way, it’s the Swiss Army knife of data processing.
Spark was started by Matei Zaharia at UC Berkeley in 2009 and was donated to the Apache Software Foundation in June 2013. It became an Apache Top-Level Project (TLP) in February 2014.
Apache Spark has seen rapid adoption by enterprises across a wide range of industries. Internet powerhouses such as Netflix, Yahoo and Tencent have eagerly deployed Spark at massive scale,…
