Apache Spark is an analytics engine that has become very popular for large-scale data processing. It lets you write applications quickly in Java, Scala, Python, R, and SQL, and it runs on Hadoop, Mesos, Kubernetes, standalone, or in the cloud.
How well do you know Spark? What follows is a self-test of 25 questions not centered around any one certification but based on the general concepts documented at the official Apache Spark website. In all cases, pick the best answer(s) to each question. The answers appear at the end of the questions. Good luck!
1. Which of the following statements is true?
A. By default, all data is shared across different Spark applications
B. Spark applications are limited in their support of external storage systems
C. Commit logs are used only when external storage systems are read from
D. Data cannot be shared across different Spark applications without writing it to an external storage system
2. Spark currently supports authentication for RPC channels using a shared secret. Authentication can be turned on by setting which configuration parameter?
A. spark.rpc
B. spark.secret
C. spark.channel
D. spark.authenticate
3. Spark uses local disks to store data that doesn’t fit in RAM, as well as to preserve intermediate output between stages. Because of this, how many disks per node are recommended?
A. 2-4
B. 4-8
C. 8-12
D. 12-16
4. Each driver program in Spark has a web UI. By default, this is on which port?
A. 2080
B. 4040
C. 6070
D. 7501
5. What type of variables can be used to cache a value in memory on all nodes?
A. broadcast
B. accumulator
C. compilation
D. assembly
6. As a cluster-computing framework, which of the following does Spark most closely serve as a substitute for?
A. Scrubber
B. HDFS
C. MapReduce
D. Hadoop
7. In Spark, a unit of work sent to one executor is known as:
A. batch
B. thread
C. job
D. task
8. The length in bits of the encryption key you want Spark to generate can vary. Which of the following is the default value?
A. 128
B. 192
C. 256
D. 264
9. Which of the following is the programming language Spark is written in?
A. Java
B. Python
C. Hibernate
D. Scala
10. Since Spark 2.0, which of the following is the main programming interface of Spark?
A. RDD
B. Dataset
C. NoSQL
D. Jakarta
11. Within Spark, an immutable distributed collection of objects is known as which of the following?
A. COD
B. DDS
C. ERD
D. RDD
12. Which of the following is a client of Spark’s execution environment?
A. Hatchling
B. Product
C. Spawn
D. Milieu
E. Context
13. Which object in your main program allows Spark applications to run as independent sets of processes on a cluster?
A. SparkCluster
B. SparkCombine
C. SparkContext
D. SparkConstant
14. Which of the following Spark metric instances represents the process in which your SparkContext is created?
A. sector
B. driver
C. partition
D. track
15. Which of the following is a fundamental data structure of Spark?
A. RDDs
B. PCBs
C. NFSs
D. QLDs
Please visit GoCertify to attempt the remaining 10 questions of this quiz.
ANSWERS
1. D
2. D
3. B
4. B
5. A
6. C
7. D
8. A
9. D
10. B
11. D
12. E
13. C
14. B
15. A
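Several of the answers above correspond to Spark configuration properties. As a rough illustration only, the following Python sketch renders the settings behind answers 2, 3, 4, and 8 as lines in the whitespace-separated key/value style used by spark-defaults.conf. The property names come from the Spark configuration documentation; the /mnt/disk... paths and the render_conf helper are hypothetical and exist only for this example. Note that enabling spark.authenticate also requires a shared secret, and how that secret is provisioned depends on the cluster manager.

```python
# Settings tied to answers 2, 3, 4, and 8.
# The directory paths below are hypothetical placeholders, not recommendations.
settings = {
    # Answer 2: turns on authentication for Spark's internal RPC channels.
    "spark.authenticate": "true",
    # Answer 4: default port of the driver's web UI.
    "spark.ui.port": "4040",
    # Answer 8: I/O encryption key size in bits (128 is the default).
    "spark.io.encryption.enabled": "true",
    "spark.io.encryption.keySizeBits": "128",
    # Answer 3: one scratch directory per local disk (four disks assumed here).
    "spark.local.dir": ",".join(f"/mnt/disk{i}/spark" for i in range(1, 5)),
}


def render_conf(settings):
    """Render settings as 'key value' lines, one property per line."""
    return "\n".join(f"{key} {value}" for key, value in settings.items())


conf_text = render_conf(settings)
print(conf_text)
```

The same properties could instead be passed on the command line with --conf or set programmatically on a SparkConf object; the file form is used here only because it needs no Spark installation to read.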