Difference between RDDs and DataFrames in Spark

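RDDs (Resilient Distributed Datasets)                       | DataFrames
------------------------------------------------------------|------------------------------------------------------------
Low-level distributed data structure                        | High-level distributed data structure
Fault-tolerant distributed collection of arbitrary objects  | Fault-tolerant distributed collection of rows in named columns
Holds any kind of data, but its structure is opaque to Spark | Supports structured and semi-structured data
Not optimized by Catalyst; tuning is left to the developer  | Queries are optimized automatically by the Catalyst optimizer
No built-in schema                                          | Has a schema of named, typed columns
More suitable for custom, low-level transformations         | More suitable for SQL-like operations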