Test your knowledge of Apache Spark topics

Posted on December 13, 2018
How much do you know about using Apache Spark to attack large-scale data processing problems? Let's find out!

Spark is an analytics engine from Apache that has become very popular for large-scale data processing. It allows you to write applications quickly in Java, Scala, Python, R, and SQL, and it runs on Hadoop, Mesos, Kubernetes, standalone, or in the cloud.

How well do you know Spark? What follows is a self-test of 25 questions not centered around any one certification but based on the general concepts documented at the official Apache Spark website. In all cases, pick the best answer(s) to each question. The answers appear at the end of the questions. Good luck!

1. Which of the following statements is true?
A. By default, all data is shared across different Spark applications
B. Spark applications are limited in their support of external storage systems
C. Commit logs are used only when external storage systems are read from
D. Data cannot be shared across different Spark applications without writing it to an external storage system

2. Spark currently supports authentication for RPC channels using a shared secret. Authentication can be turned on by setting which configuration parameter?
A. spark.rpc
B. spark.secret
C. spark.channel
D. spark.authenticate

3. Spark uses local disks to store data that doesn’t fit in RAM, as well as to preserve intermediate output between stages. Because of this, how many disks per node are recommended?
A. 2-4
B. 4-8
C. 8-12
D. 12-16

4. Each driver program in Spark has a web UI. By default, this is on which port?
A. 2080
B. 4040
C. 6070
D. 7501

5. What type of variables can be used to cache a value in memory on all nodes?
A. broadcast
B. accumulator
C. compilation
D. assembly

6. As a cluster-computing framework, which of the following does Spark most closely serve as a substitute for?
A. Scrubber
B. HDFS
C. MapReduce
D. Hadoop

7. In Spark, a unit of work sent to one executor is known as:
A. batch
B. thread
C. job
D. task

8. The length in bits of the encryption key you want Spark to generate can vary. Which of the following is the default value?
A. 128
B. 192
C. 256
D. 264

9. Which of the following is the programming language Spark is written in?
A. Java
B. Python
C. Hibernate
D. Scala

10. Since Spark 2.0, which of the following is the main programming interface of Spark?
A. RDD
B. Dataset
C. NoSQL
D. Jakarta

11. Within Spark, an immutable distributed collection of objects is known as which of the following?
A. COD
B. DDS
C. ERD
D. RDD

12. Which of the following is a client of Spark’s execution environment?
A. Hatchling
B. Product
C. Spawn
D. Milieu
E. Context

13. Which object in your main program allows Spark applications to run as independent sets of processes on a cluster?
A. SparkCluster
B. SparkCombine
C. SparkContext
D. SparkConstant

14. Which of the following Spark metric instances represents the process in which your SparkContext is created?
A. sector
B. driver
C. partition
D. track

15. Which of the following is a fundamental data structure of Spark?
A. RDDs
B. PCBs
C. NFSs
D. QLDs

Please visit GoCertify to attempt the remaining 10 questions of this quiz.

ANSWERS

1. D
2. D
3. B
4. B
5. A
6. C
7. D
8. A
9. D
10. B
11. D
12. E
13. C
14. B
15. A
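
Several of the answers above correspond to Spark configuration properties, so it may help to see them side by side. The sketch below is plain Python and does not call Spark itself; it only renders a few settings in the whitespace-separated "key value" layout used by spark-defaults.conf. The property names are taken from the Spark configuration documentation as best understood here, while the helper name render_spark_defaults and the secret value are hypothetical placeholders made up for illustration.

```python
# Illustrative settings tied to quiz answers 2, 4, and 8.
# The secret below is a made-up placeholder, not a recommended value.
SPARK_SETTINGS = {
    "spark.authenticate": "true",              # answer 2: turns on RPC authentication
    "spark.authenticate.secret": "change-me",  # hypothetical shared secret
    "spark.ui.port": "4040",                   # answer 4: default driver web UI port
    "spark.io.encryption.enabled": "true",     # enables encryption of local disk I/O
    "spark.io.encryption.keySizeBits": "128",  # answer 8: default key length in bits
}


def render_spark_defaults(settings):
    """Return the settings as text, one aligned 'key  value' pair per line."""
    width = max(len(key) for key in settings)
    return "\n".join(
        f"{key.ljust(width)}  {value}" for key, value in settings.items()
    )


if __name__ == "__main__":
    print(render_spark_defaults(SPARK_SETTINGS))
```

Running the script prints five lines that could be pasted into a configuration file for experimentation. Check the documentation for your Spark version before relying on any of these names or defaults.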

About the Author

Emmett Dulaney is a professor at Anderson University and the author of several books including Linux All-in-One For Dummies and the CompTIA Network+ N10-008 Exam Cram, Seventh Edition.
