Question | Answer |
Create a Dataset with the following contents:
+----------+------------+------------+
| name     | birthYear  | specialty  |
+----------+------------+------------+
| stallman | 1953       | programmer |
| newton   | 1643       | physics    |
| frink    | 1965       | professor  |
+----------+------------+------------+
|
// Provides the Encoder for Dork and the $"..." syntax; spark-shell imports this automatically
import spark.implicits._

case class Dork(name: String, birthYear: Int, specialty: String)
val dorks = List(
  Dork("stallman", 1953, "programmer"),
  Dork("newton", 1643, "physics"),
  Dork("frink", 1965, "professor")
)
val dorkDs: org.apache.spark.sql.Dataset[Dork] = spark.createDataset(dorks)
The rest of the questions in this quiz will depend on the |
Display the contents of the |
dorkDs.show()
|
Print the schema of the |
dorkDs.printSchema()
Unlike RDDs, Datasets have defined schemas. A defined schema allows a Dataset to be queried and joined, much like a table in a relational database. |
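To illustrate how a defined schema makes a Dataset joinable like a relational table, here is a minimal sketch. It assumes a local SparkSession; the `Field` case class, its `building` column, and the building names are hypothetical and invented only for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SchemaJoinSketch {
  case class Dork(name: String, birthYear: Int, specialty: String)
  // Hypothetical second table, not part of the quiz
  case class Field(specialty: String, building: String)

  // Joins the two Datasets on the column name they share in their schemas
  def joinWithFields(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val dorkDs = Seq(
      Dork("stallman", 1953, "programmer"),
      Dork("newton", 1643, "physics"),
      Dork("frink", 1965, "professor")
    ).toDS()

    val fieldDs = Seq(
      Field("programmer", "Computing Lab"),
      Field("physics", "Science Hall"),
      Field("professor", "Main Hall")
    ).toDS()

    // Inner join on the shared "specialty" column
    dorkDs.join(fieldDs, Seq("specialty"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("schema-join-sketch")
      .getOrCreate()

    val joined = joinWithFields(spark)
    joined.printSchema()
    joined.show()

    spark.stop()
  }
}
```

Because both schemas name a `specialty` column, the join can be expressed by column name alone, without writing any row-matching logic by hand.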
Create a |
val youngDorkDs = dorkDs.filter(dorkDs("birthYear") > 1900)
The same filter can be written with the $ column syntax:
val youngDorkDs = dorkDs.filter($"birthYear" > 1900)
|
Create a Dataset called |
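import org.apache.spark.sql.functions.udf

// Plain Scala function that appends " is cool" to a name
def coolify(name: String): String = {
  s"$name is cool"
}
// Wrap the function as a UDF so it can be applied to a column
val coolifyUdf = udf[String, String](coolify)
val dorkDs1 = dorkDs.withColumn("cool", coolifyUdf($"name"))
dorkDs1.show()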
|
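As a point of comparison with the `coolify` UDF answer above, the same column can probably be built from Spark's built-in `concat` and `lit` functions. This is a sketch under assumptions, not the quiz's intended answer: the object name `CoolColumnSketch` and the helper `withCool` are hypothetical, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit}

object CoolColumnSketch {
  case class Dork(name: String, birthYear: Int, specialty: String)

  // Hypothetical helper: adds the "cool" column using built-in functions instead of a UDF
  def withCool(df: DataFrame): DataFrame =
    df.withColumn("cool", concat(col("name"), lit(" is cool")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cool-column-sketch")
      .getOrCreate()
    import spark.implicits._

    val dorkDs = Seq(
      Dork("stallman", 1953, "programmer"),
      Dork("newton", 1643, "physics"),
      Dork("frink", 1965, "professor")
    ).toDS()

    withCool(dorkDs.toDF()).show()

    spark.stop()
  }
}
```

Built-in functions are visible to Spark's query optimizer, whereas a UDF is opaque to it, so the built-in form is often preferred when one exists. One behavioral difference to keep in mind: `concat` yields null when `name` is null, while the string interpolation in `coolify` would not.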
Use the
+----------+
| name     |
+----------+
| stallman |
| newton   |
| frink    |
+----------+
|
dorkDs.createOrReplaceTempView("dorks")
val dorkNamesDs = spark.sql("select name from dorks")
The same result can be obtained without SQL:
val dorkNamesDs = dorkDs.select("name")
|
Sort the |
val sortedDorkDs = dorkDs.orderBy("name")
You can achieve the same result the SQL way.
dorkDs.createOrReplaceTempView("dorks")
val sortedDorkDs = spark.sql("select * from dorks order by name")
|
What type of object does |
|
What type of object does |
|
What does |
|