Stack Explorer

Apache Spark

big-data framework

Distributed big data processing engine

Supported languages

Concepts

RDD, DataFrame, SparkSession, transformations, actions
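Of these concepts, the split between transformations and actions is the one that most often surprises newcomers: a transformation only records a step in a plan, and nothing runs until an action asks for a result. The sketch below is a toy model of that behavior in plain Python. It is not the Spark API: the class name LazyDataset and its internals are invented for illustration, and only the method names (map, filter, collect, count) are chosen to mirror methods that exist on a real Spark RDD.

```python
from typing import Callable, Iterable, Iterator, List


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not Spark code).

    It records a plan of steps and executes it only when an action is called.
    """

    def __init__(self, source: Iterable, steps: tuple = ()):
        self._source = source
        self._steps = steps  # recorded transformations, not yet executed

    # Transformations: return a new dataset with a longer plan; no work is done.
    def map(self, fn: Callable) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def _run(self) -> Iterator:
        # Push each item through every recorded step, in order.
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                yield item

    # Actions: trigger execution of the recorded plan and return a result.
    def collect(self) -> List:
        return list(self._run())

    def count(self) -> int:
        return sum(1 for _ in self._run())


calls = []  # records when the mapping function actually runs


def square(x):
    calls.append(x)
    return x * x


numbers = LazyDataset(range(1, 6))
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)  # nothing has executed yet
result = evens_squared.collect()  # the action runs the whole plan
```

In this toy model every action re-runs the full plan from the source, which loosely mirrors how Spark recomputes an uncached dataset's lineage for each action. Real Spark additionally partitions the data and distributes the work across executors, which this sketch does not attempt to show.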

Pros and Cons

Pros

  • + Distributed processing
  • + APIs for SQL, ML, streaming
  • + Fast in-memory processing
  • + Mature ecosystem

Cons

  • - Typically needs a cluster to be worthwhile (local mode exists, mainly for development and testing)
  • - Distributed debugging is complex
  • - Noticeable startup and scheduling overhead on small datasets

Use Cases

  • Large-scale ETL
  • Distributed analytics
  • ML on big data
  • Stream processing