Apache Spark
big-data framework
Distributed big data processing engine
Concepts
- RDD
- DataFrame
- SparkSession
- Transformations
- Actions
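The split between transformations and actions is easiest to see in code. The sketch below is a pure-Python toy model, not Spark's API: `LazyDataset` and its methods are hypothetical names chosen to mirror the idea that transformations only record work and an action triggers it.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


class LazyDataset:
    """Hypothetical toy stand-in for an RDD; not part of Spark."""

    def __init__(self, source: Iterable[int], ops: Tuple = ()):
        self._source = source
        self._ops = ops  # recorded transformations, not yet executed

    # Transformations: return a new dataset with one more recorded step.
    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred: Callable[[int], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    def _run(self) -> Iterator[int]:
        # Replay every recorded step over each source element.
        for item in self._source:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                yield item

    # Actions: these are the only methods that execute the pipeline.
    def collect(self) -> List[int]:
        return list(self._run())

    def count(self) -> int:
        return sum(1 for _ in self._run())


calls: List[int] = []


def square(x: int) -> int:
    calls.append(x)  # track when the function really runs
    return x * x


numbers = LazyDataset(range(1, 6))
pipeline = numbers.map(square).filter(lambda x: x % 2 == 1)
calls_before_action = len(calls)  # nothing has run yet
result = pipeline.collect()       # the action executes the recorded steps
```

In this toy model, `square` is not called until `collect()` runs. Spark follows the same principle at cluster scale, which lets it plan the whole chain of transformations before executing anything.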
Pros and Cons
Advantages
- + Distributed processing
- + APIs for SQL, ML, streaming
- + Very fast in-memory processing
- + Mature ecosystem
Disadvantages
- - Needs a cluster to deliver its benefits at scale
- - Distributed debugging is complex
- - Overhead for small data
Use Cases
- Large-scale ETL
- Distributed analytics
- ML on big data
- Stream processing