
Question # 4

Which of the following code blocks returns a copy of DataFrame transactionsDf in which column productId has been renamed to productNumber?

A.

transactionsDf.withColumnRenamed("productId", "productNumber")

B.

transactionsDf.withColumn("productId", "productNumber")

C.

transactionsDf.withColumnRenamed("productNumber", "productId")

D.

transactionsDf.withColumnRenamed(col(productId), col(productNumber))

E.

transactionsDf.withColumnRenamed(productId, productNumber)
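For reference, a minimal PySpark sketch of a rename (the two-column DataFrame here is a made-up stand-in, not the exam's transactionsDf): withColumnRenamed returns a new DataFrame and leaves the original untouched.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in DataFrame
df = spark.createDataFrame([(1, 3.5), (2, 6.0)], ["productId", "predError"])

# Returns a copy with the column renamed; df itself is unchanged
renamed = df.withColumnRenamed("productId", "productNumber")
renamed.printSchema()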

Question # 5

The code block shown below should show information about the data type that column storeId of DataFrame transactionsDf contains. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__).__3__

A.

1. select

2. "storeId"

3. print_schema()

B.

1. limit

2. 1

3. columns

C.

1. select

2. "storeId"

3. printSchema()

D.

1. limit

2. "storeId"

3. printSchema()

E.

1. select

2. storeId

3. dtypes
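For reference, a minimal sketch of inspecting a single column's data type (assuming a DataFrame df with a storeId column): printSchema is camelCase and prints the schema tree, while dtypes is an attribute that returns (name, type) pairs.

# Prints the schema of just the projected column
df.select("storeId").printSchema()

# dtypes needs no parentheses; returns e.g. [('storeId', 'bigint')]
print(df.select("storeId").dtypes)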

Question # 6

Which of the following code blocks returns a DataFrame with approximately 1,000 rows from the 10,000-row DataFrame itemsDf, without any duplicates, returning the same rows even if the code block is run twice?

A.

itemsDf.sampleBy("row", fractions={0: 0.1}, seed=82371)

B.

itemsDf.sample(fraction=0.1, seed=87238)

C.

itemsDf.sample(fraction=1000, seed=98263)

D.

itemsDf.sample(withReplacement=True, fraction=0.1, seed=23536)

E.

itemsDf.sample(fraction=0.1)
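For context, a minimal sampling sketch (assuming a 10,000-row itemsDf): sample draws rows without replacement by default, so there are no duplicates, and fixing the seed makes repeated runs over the same data return the same rows; the row count is only approximately fraction times the total.

# ~10% of rows, no replacement, reproducible thanks to the fixed seed
sampled = itemsDf.sample(fraction=0.1, seed=87238)
print(sampled.count())   # roughly 1,000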

Question # 7

Which of the following code blocks adds a column predErrorSqrt to DataFrame transactionsDf that is the square root of column predError?

A.

transactionsDf.withColumn("predErrorSqrt", sqrt(predError))

B.

transactionsDf.select(sqrt(predError))

C.

transactionsDf.withColumn("predErrorSqrt", col("predError").sqrt())

D.

transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError")))

E.

transactionsDf.select(sqrt("predError"))
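For reference, a hedged sketch of adding a derived column (names as in the question): withColumn keeps all existing columns, and col("predError") resolves the column by name, whereas a bare predError would only work if a Python variable of that name happened to exist.

from pyspark.sql.functions import sqrt, col

withSqrt = transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError")))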

Question # 8

The code block displayed below contains one or more errors. The code block should load parquet files at location filePath into a DataFrame, only loading those files that have been modified before 2029-03-20 05:44:46. Spark should enforce a schema according to the schema shown below. Find the error.

Schema:

root
 |-- itemId: integer (nullable = true)
 |-- attributes: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- supplier: string (nullable = true)

Code block:

schema = StructType([
    StructType("itemId", IntegerType(), True),
    StructType("attributes", ArrayType(StringType(), True), True),
    StructType("supplier", StringType(), True)
])

spark.read.options("modifiedBefore", "2029-03-20T05:44:46").schema(schema).load(filePath)

A.

The attributes array is specified incorrectly, Spark cannot identify the file format, and the syntax of the call to Spark's DataFrameReader is incorrect.

B.

Columns in the schema definition use the wrong object type and the syntax of the call to Spark's DataFrameReader is incorrect.

C.

The data type of the schema is incompatible with the schema() operator and the modification date threshold is specified incorrectly.

D.

Columns in the schema definition use the wrong object type, the modification date threshold is specified incorrectly, and Spark cannot identify the file format.

E.

Columns in the schema are unable to handle empty values and the modification date threshold is specified incorrectly.
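For comparison, here is one way the intended read could be written without the pitfalls the options describe (a minimal sketch, assuming filePath is defined; not presented as the official solution): schema fields are declared with StructField, a single key/value pair goes through option (options expects keyword arguments), and calling .parquet(...) tells Spark the file format.

from pyspark.sql.types import (StructType, StructField, IntegerType,
                               ArrayType, StringType)

schema = StructType([
    StructField("itemId", IntegerType(), True),
    StructField("attributes", ArrayType(StringType(), True), True),
    StructField("supplier", StringType(), True),
])

df = (spark.read
      .schema(schema)
      .option("modifiedBefore", "2029-03-20T05:44:46")  # file-source option, Spark 3.1+
      .parquet(filePath))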

Question # 9

The code block shown below should set the number of partitions that Spark uses when shuffling data for joins or aggregations to 100. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

__1__.__2__.__3__(__4__, 100)

A.

1. spark

2. conf

3. set

4. "spark.sql.shuffle.partitions"

B.

1. pyspark

2. config

3. set

4. spark.shuffle.partitions

C.

1. spark

2. conf

3. get

4. "spark.sql.shuffle.partitions"

D.

1. pyspark

2. config

3. set

4. "spark.sql.shuffle.partitions"

E.

1. spark

2. conf

3. set

4. "spark.sql.aggregate.partitions"

Question # 10

Which of the following code blocks returns DataFrame transactionsDf sorted in descending order by column predError, showing missing values last?

A.

transactionsDf.sort(asc_nulls_last("predError"))

B.

transactionsDf.orderBy("predError").desc_nulls_last()

C.

transactionsDf.sort("predError", ascending=False)

D.

transactionsDf.desc_nulls_last("predError")

E.

transactionsDf.orderBy("predError").asc_nulls_last()

Question # 11

Which of the following code blocks returns a DataFrame with an added column to DataFrame transactionsDf that shows the unix epoch timestamps in column transactionDate as strings in the format month/day/year in column transactionDateFormatted?

Excerpt of DataFrame transactionsDf:

A.

transactionsDf.withColumn("transactionDateFormatted", from_unixtime("transactionDate", format="dd/MM/yyyy"))

B.

transactionsDf.withColumnRenamed("transactionDate", "transactionDateFormatted", from_unixtime("transactionDateFormatted", format="MM/dd/yyyy"))

C.

transactionsDf.apply(from_unixtime(format="MM/dd/yyyy")).asColumn("transactionDateFormatted")

D.

transactionsDf.withColumn("transactionDateFormatted", from_unixtime("transactionDate", format="MM/dd/yyyy"))

E.

transactionsDf.withColumn("transactionDateFormatted", from_unixtime("transactionDate"))

Question # 12

Which of the following code blocks returns a new DataFrame in which column attributes of DataFrame itemsDf is renamed to feature0 and column supplier to feature1?

A.

itemsDf.withColumnRenamed(attributes, feature0).withColumnRenamed(supplier, feature1)

B.

itemsDf.withColumnRenamed("attributes", "feature0")

itemsDf.withColumnRenamed("supplier", "feature1")

C.

itemsDf.withColumnRenamed(col("attributes"), col("feature0"), col("supplier"), col("feature1"))

D.

itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1")

E.

itemsDf.withColumn("attributes", "feature0").withColumn("supplier", "feature1")
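A brief sketch of why chaining matters here: each withColumnRenamed call returns a new DataFrame rather than mutating itemsDf, so two stand-alone statements would each discard their result.

# Chain the calls so the second rename operates on the first's output
renamed = (itemsDf
           .withColumnRenamed("attributes", "feature0")
           .withColumnRenamed("supplier", "feature1"))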

Question # 13

Which of the following is not a feature of Adaptive Query Execution?

A.

Replace a sort merge join with a broadcast join, where appropriate.

B.

Coalesce partitions to accelerate data processing.

C.

Split skewed partitions into smaller partitions to avoid differences in partition processing time.

D.

Reroute a query in case of an executor failure.

E.

Collect runtime statistics during query execution.
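For context, Adaptive Query Execution is controlled through runtime configuration; a minimal sketch of the relevant Spark 3.x properties:

spark.conf.set("spark.sql.adaptive.enabled", "true")                      # master switch
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")  # partition coalescing
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")            # skewed-partition splitting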

Question # 14

Which of the following code blocks creates a new DataFrame with 3 columns, productId, highest, and lowest, that shows the biggest and smallest values of column value for each value in column productId from DataFrame transactionsDf?

Sample of DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+

A.

transactionsDf.max('value').min('value')

B.

transactionsDf.agg(max('value').alias('highest'), min('value').alias('lowest'))

C.

transactionsDf.groupby(col(productId)).agg(max(col(value)).alias("highest"), min(col(value)).alias("lowest"))

D.

transactionsDf.groupby('productId').agg(max('value').alias('highest'), min('value').alias('lowest'))

E.

transactionsDf.groupby("productId").agg({"highest": max("value"), "lowest": min("value")})

Question # 15

Which of the following statements about stages is correct?

A.

Different stages in a job may be executed in parallel.

B.

Stages consist of one or more jobs.

C.

Stages ephemerally store transactions, before they are committed through actions.

D.

Tasks in a stage may be executed by multiple machines at the same time.

E.

Stages may contain multiple actions, narrow, and wide transformations.

Question # 16

Which of the following code blocks reads JSON file imports.json into a DataFrame?

A.

spark.read().mode("json").path("/FileStore/imports.json")

B.

spark.read.format("json").path("/FileStore/imports.json")

C.

spark.read("json", "/FileStore/imports.json")

D.

spark.read.json("/FileStore/imports.json")

E.

spark.read().json("/FileStore/imports.json")
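For context, spark.read is a property (no parentheses) that returns a DataFrameReader; a sketch of the shorthand and the equivalent long form:

df = spark.read.json("/FileStore/imports.json")

# Equivalent long form: format() plus load(), not path()
df = spark.read.format("json").load("/FileStore/imports.json")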

Question # 17

The code block shown below should convert up to 5 rows in DataFrame transactionsDf that have the value 25 in column storeId into a Python list. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__).__3__(__4__)

A.

1. filter

2. "storeId"==25

3. collect

4. 5

B.

1. filter

2. col("storeId")==25

3. toLocalIterator

4. 5

C.

1. select

2. storeId==25

3. head

4. 5

D.

1. filter

2. col("storeId")==25

3. take

4. 5

E.

1. filter

2. col("storeId")==25

3. collect

4. 5
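For reference, a minimal sketch of filtering and materializing a bounded number of rows: take(n) returns up to n rows as a Python list, while collect() accepts no row-limit argument.

from pyspark.sql.functions import col

rows = transactionsDf.filter(col("storeId") == 25).take(5)
print(type(rows))   # <class 'list'> of Row objects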

Question # 18

Which of the following code blocks efficiently converts DataFrame transactionsDf from 12 into 24 partitions?

A.

transactionsDf.repartition(24, boost=True)

B.

transactionsDf.repartition()

C.

transactionsDf.repartition("itemId", 24)

D.

transactionsDf.coalesce(24)

E.

transactionsDf.repartition(24)
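For context, a sketch contrasting the two partitioning operators: coalesce can only reduce the number of partitions, so growing from 12 to 24 requires repartition and its full shuffle.

grown = transactionsDf.repartition(24)   # shuffles; can increase the partition count
shrunk = transactionsDf.coalesce(24)     # no-op here: cannot grow beyond the current 12
print(grown.rdd.getNumPartitions())      # 24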

Question # 19

Which of the following statements about storage levels is incorrect?

A.

The cache operator on DataFrames is evaluated like a transformation.

B.

In client mode, DataFrames cached with the MEMORY_ONLY_2 level will not be stored in the edge node's memory.

C.

Caching can be undone using the DataFrame.unpersist() operator.

D.

MEMORY_AND_DISK replicates cached DataFrames both on memory and disk.

E.

DISK_ONLY will not use the worker node's memory.
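For reference, a minimal sketch of working with storage levels (the _2 suffix requests two replicas):

from pyspark import StorageLevel

# Like cache(), persist() is lazy: nothing is stored until an action runs
df.persist(StorageLevel.MEMORY_ONLY_2)
df.count()        # action triggers the actual caching
df.unpersist()    # undoes the caching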

Question # 20

Which of the following code blocks returns a single row from DataFrame transactionsDf?

Full DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+

A.

transactionsDf.where(col("storeId").between(3,25))

B.

transactionsDf.filter((col("storeId")!=25) | (col("productId")==2))

C.

transactionsDf.filter(col("storeId")==25).select("predError","storeId").distinct()

D.

transactionsDf.select("productId", "storeId").where("storeId == 2 OR storeId != 25")

E.

transactionsDf.where(col("value").isNull()).select("productId", "storeId").distinct()

Question # 21

Which of the following describes a narrow transformation?

A.

A narrow transformation is an operation in which data is exchanged across partitions.

B.

A narrow transformation is a process in which data from multiple RDDs is used.

C.

A narrow transformation is a process in which 32-bit float variables are cast to smaller float variables, like 16-bit or 8-bit float variables.

D.

A narrow transformation is an operation in which data is exchanged across the cluster.

E.

A narrow transformation is an operation in which no data is exchanged across the cluster.
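To make the distinction concrete, a sketch assuming the transactionsDf used elsewhere in this set: narrow transformations compute each output partition from a single input partition, while wide transformations shuffle data across the cluster.

from pyspark.sql.functions import col

narrow = transactionsDf.filter(col("value") > 0)   # narrow: no data exchanged
wide = transactionsDf.groupBy("storeId").count()   # wide: requires a shuffle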

Question # 22

Which of the following statements about executors is correct?

A.

Executors are launched by the driver.

B.

Executors stop upon application completion by default.

C.

Each node hosts a single executor.

D.

Executors store data in memory only.

E.

An executor can serve multiple applications.

Question # 23

The code block displayed below contains an error. The code block should produce a DataFrame with color as the only column and three rows with color values of red, blue, and green, respectively. Find the error.

Code block:

spark.createDataFrame([("red",), ("blue",), ("green",)], "color")

A.

Instead of calling spark.createDataFrame, just DataFrame should be called.

B.

The commas in the tuples with the colors should be eliminated.

C.

The colors red, blue, and green should be expressed as a simple Python list, and not a list of tuples.

D.

Instead of color, a data type should be specified.

E.

The "color" expression needs to be wrapped in brackets, so it reads ["color"].

Question # 24

Which of the following code blocks displays the 10 rows with the smallest values of column value in DataFrame transactionsDf in a nicely formatted way?

A.

transactionsDf.sort(asc(value)).show(10)

B.

transactionsDf.sort(col("value")).show(10)

C.

transactionsDf.sort(col("value").desc()).head()

D.

transactionsDf.sort(col("value").asc()).print(10)

E.

transactionsDf.orderBy("value").asc().show(10)

Question # 25

Which of the following code blocks removes all rows in the 6-column DataFrame transactionsDf that have missing data in at least 3 columns?

A.

transactionsDf.dropna("any")

B.

transactionsDf.dropna(thresh=4)

C.

transactionsDf.drop.na("",2)

D.

transactionsDf.dropna(thresh=2)

E.

transactionsDf.dropna("",4)

Question # 26

Which of the following code blocks performs an inner join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively, excluding columns value and storeId from DataFrame transactionsDf and column attributes from DataFrame itemsDf?

A.

transactionsDf.drop('value', 'storeId').join(itemsDf.select('attributes'), transactionsDf.productId==itemsDf.itemId)

B.

transactionsDf.createOrReplaceTempView('transactionsDf')
itemsDf.createOrReplaceTempView('itemsDf')

spark.sql("SELECT -value, -storeId FROM transactionsDf INNER JOIN itemsDf ON productId==itemId").drop("attributes")

C.

transactionsDf.drop("value", "storeId").join(itemsDf.drop("attributes"), "transactionsDf.productId==itemsDf.itemId")

D.

transactionsDf \
    .drop(col('value'), col('storeId')) \
    .join(itemsDf.drop(col('attributes')), col('productId')==col('itemId'))

E.

transactionsDf.createOrReplaceTempView('transactionsDf')
itemsDf.createOrReplaceTempView('itemsDf')

statement = """
SELECT * FROM transactionsDf
INNER JOIN itemsDf
ON transactionsDf.productId==itemsDf.itemId
"""
spark.sql(statement).drop("value", "storeId", "attributes")
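For reference, one idiomatic DataFrame-API way to express such a join and column exclusion (a sketch, assuming both DataFrames exist):

joined = (transactionsDf
          .drop("value", "storeId")              # drop takes column-name strings
          .join(itemsDf.drop("attributes"),
                transactionsDf.productId == itemsDf.itemId,   # a Column expression, not a string
                "inner"))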

Question # 27

Which of the following is a characteristic of the cluster manager?

A.

Each cluster manager works on a single partition of data.

B.

The cluster manager receives input from the driver through the SparkContext.

C.

The cluster manager does not exist in standalone mode.

D.

The cluster manager transforms jobs into DAGs.

E.

In client mode, the cluster manager runs on the edge node.
