Apache Spark

big-data framework

Distributed big data processing engine

Supported languages

Python, Scala, Java

Concepts

RDD, DataFrame, SparkSession, transformations, actions
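
The split between transformations and actions is easier to see in code. The sketch below is a toy model in plain Python, not Spark's API: `LazyDataset` and its methods are hypothetical names chosen for illustration. It assumes only the general idea that transformations are recorded lazily and nothing is computed until an action asks for a result.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not part of Spark)."""

    def __init__(self, source: Iterable[Any], steps: Tuple[Step, ...] = ()):
        self._source = source
        self._steps = steps  # recorded transformations, not yet executed

    # Transformations: return a new dataset and compute nothing.
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def _run(self) -> Iterator[Any]:
        # Replay the recorded steps over the source, in order.
        data: Iterator[Any] = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    # Actions: trigger the actual computation.
    def collect(self) -> List[Any]:
        return list(self._run())

    def count(self) -> int:
        return sum(1 for _ in self._run())


calls: List[int] = []  # records every element that double() actually sees


def double(x: int) -> int:
    calls.append(x)
    return x * 2


numbers = LazyDataset(range(1, 6))
evens_doubled = numbers.filter(lambda x: x % 2 == 0).map(double)

pending_calls = len(calls)        # still 0: only transformations so far
result = evens_doubled.collect()  # the action runs the whole pipeline
print(pending_calls, result)
```

In PySpark the same pattern would start from a `SparkSession` and chain DataFrame or RDD transformations such as `filter` before an action such as `count()` or `collect()`. The toy class only mirrors the evaluation order and does no partitioning or distribution.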

Pros and Cons

Advantages

+ Distributed processing
+ APIs for SQL, ML, and streaming
+ Fast in-memory processing
+ Mature ecosystem

Disadvantages

- Requires a cluster to pay off (local mode exists, but mainly for development and testing)
- Distributed debugging is complex
- Overhead for small data

Use Cases

Large-scale ETL
Distributed analytics
ML on big data
Stream processing

Related Technologies

Alternatives