CCA175 Cloudera CCA Spark and Hadoop Developer Exam exact Exam Questions

CCA Spark and Hadoop Developer Exam

Last update: 5 hours ago. Total questions: 96


Question # 21

Problem Scenario 75 : You have been given a MySQL database with the following details.

user=retail_dba

password=cloudera

database=retail_db

table=retail_db.orders

table=retail_db.order_items

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish the following activities.

1. Copy the "retail_db.order_items" table to HDFS in the directory p90_order_items.

2. Compute the sum of the entire revenue in this table using pyspark.

3. Find the maximum and minimum revenue as well.

4. Calculate the average revenue.

Columns of the order_items table : (order_item_id, order_item_order_id, order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price)
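The aggregation in steps 2 to 4 can be checked in plain Python before it is ported to pyspark. The sketch below is not Spark code. The sample rows are made up, and it assumes the imported files are comma-separated with order_item_subtotal as the fifth field (index 4), as the column list above suggests.

```python
from functools import reduce

# Hypothetical sample rows (not real retail_db data), laid out as:
# order_item_id,order_item_order_id,order_item_product_id,
# order_item_quantity,order_item_subtotal,order_item_product_price
sample_lines = [
    "1,1,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
    "3,2,502,5,250.0,50.0",
    "4,2,403,1,129.99,129.99",
]


def subtotal(line):
    """Extract order_item_subtotal (assumed to be field index 4) as a float."""
    return float(line.split(",")[4])


revenues = [subtotal(line) for line in sample_lines]

# Each aggregate is a pairwise reduction, which is the shape a
# map-then-reduce over an RDD of lines would also take.
total_revenue = reduce(lambda a, b: a + b, revenues)
max_revenue = reduce(lambda a, b: a if a >= b else b, revenues)
min_revenue = reduce(lambda a, b: a if a <= b else b, revenues)
average_revenue = total_revenue / len(revenues)
```

On an RDD of the same lines, the equivalent steps would likely be a map that extracts the subtotal followed by reduce calls with the same lambdas, with the count taken separately for the average.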

Question # 22

Problem Scenario GG : You have been given the code snippet below.

val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2)

val b = a.keyBy(_.length)

val c = sc.parallelize(List("ant", "falcon", "squid"), 2)

val d = c.keyBy(_.length)

operation1

Write a correct code snippet for operation1 which will produce the desired output, shown below. Array[(Int, String)] = Array((4,lion))
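One candidate for operation1 is b.subtractByKey(d).collect, which keeps the pairs of b whose key does not appear in d. The plain Python sketch below checks that logic on ordinary lists. key_by and subtract_by_key are hypothetical helpers that imitate the RDD methods; they are not part of any Spark API.

```python
def key_by(items, f):
    """Imitate RDD.keyBy: pair each item with the key computed by f."""
    return [(f(x), x) for x in items]


def subtract_by_key(left, right):
    """Imitate subtractByKey: keep pairs of `left` whose key is absent from `right`."""
    right_keys = {k for k, _ in right}
    return [(k, v) for k, v in left if k not in right_keys]


a = ["dog", "tiger", "lion", "cat", "spider", "eagle"]
b = key_by(a, len)

c = ["ant", "falcon", "squid"]
d = key_by(c, len)

# Keys in d are 3, 6 and 5, so only the length-4 entry of b survives.
result = subtract_by_key(b, d)
```

The string lengths in c cover 3, 5 and 6, which removes every word from a except "lion".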

Question # 23

Problem Scenario 15 : You have been given the following MySQL database details as well as other info.

user=retail_dba

password=cloudera

database=retail_db

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish the following activities.

1. In the MySQL departments table, please insert the following record. Insert into departments values(9999, '"Data Science"');

2. There is a downstream system which will process dumps of this file. However, the system is designed so that it can process files only if fields are enclosed in single quotes ('), the field separator is a hyphen (-), and each line is terminated by a colon (:).

3. If the data itself contains a double quote ("), it should be escaped by a backslash (\).

4. Please import the departments table into a directory called departments_enclosedby so that the file can be processed by the downstream system.

Question # 24

Problem Scenario 29 : Please accomplish the following exercises using HDFS command line options.

1. Create a directory in HDFS named hdfs_commands.

2. Create a file in HDFS named data.txt in hdfs_commands.

3. Now copy this data.txt file to the local filesystem; while copying, please make sure file properties (e.g. file permissions) are not changed.

4. Now create a file in a local directory named data_local.txt and move this file to HDFS in the hdfs_commands directory.

5. Create a file data_hdfs.txt in the hdfs_commands directory and copy it to the local filesystem.

6. Create a file in the local filesystem named file1.txt and put it to HDFS.

Question # 25

Problem Scenario 83 : In continuation of the previous question, please accomplish the following activities.

1. Select all the records with quantity >= 5000 and name starting with 'Pen'.

2. Select all the records with quantity >= 5000, price less than 1.24 and name starting with 'Pen'.

3. Select all the records which do not have quantity >= 5000 and whose name does not start with 'Pen'.

4. Select all the products whose name is 'Pen Red' or 'Pen Black'.

5. Select all the products which have price BETWEEN 1.0 AND 2.0 AND quantity BETWEEN 1000 AND 2000.

Question # 26

Problem Scenario 68 : You have been given a file as below.

spark75/file1.txt

The file contains some text, as given below.

spark75/file1.txt

Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework.

The core of Apache Hadoop consists of a storage part known as Hadoop Distributed File System (HDFS) and a processing part called MapReduce. Hadoop splits files into large blocks and distributes them across nodes in a cluster. To process data, Hadoop transfers packaged code for nodes to process in parallel based on the data that needs to be processed.

This approach takes advantage of data locality (nodes manipulating the data they have access to) to allow the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking.

For a slightly more complicated task, let's look into splitting up sentences from our documents into word bigrams. A bigram is a pair of successive tokens in some sequence. We will look at building bigrams from the sequences of words in each sentence, and then try to find the most frequently occurring ones.

The first problem is that values in each partition of our initial RDD describe lines from the file rather than sentences. Sentences may be split over multiple lines. The glom() RDD method is used to create a single entry for each document containing the list of all lines; we can then join the lines up and resplit them into sentences using "." as the separator, using flatMap so that every object in our RDD is now a sentence.

A bigram is a pair of successive tokens in some sequence. Please build bigrams from the sequences of words in each sentence, and then try to find the most frequently occurring ones.
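The steps described above (join the lines, resplit on ".", pair successive words, count the pairs) can be sketched in plain Python as follows. This is not Spark code. The two sample lines are invented so that one sentence spans a line break, and lowercasing the words is an extra assumption rather than something the task requires.

```python
from collections import Counter

# Hypothetical input lines; the second sentence is split across two lines.
lines = [
    "Hadoop splits files into large blocks. Hadoop splits",
    "files across nodes in a cluster.",
]

# Join all lines into one string, then resplit into sentences on ".".
text = " ".join(lines)
sentences = [s.strip() for s in text.split(".") if s.strip()]


def bigrams(sentence):
    """Return the pairs of successive words in one sentence."""
    words = sentence.lower().split()
    return list(zip(words, words[1:]))


# Count every bigram across all sentences and take the most frequent ones.
counts = Counter(bg for s in sentences for bg in bigrams(s))
top = counts.most_common(2)
```

Bigrams are built per sentence, so no pair ever straddles a sentence boundary; only the line breaks are ignored.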

Question # 27

Problem Scenario 62 : You have been given the code snippet below.

val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)

val b = a.map(x => (x.length, x))

operation1

Write a correct code snippet for operation1 which will produce the desired output, shown below. Array[(Int, String)] = Array((3,xdogx), (5,xtigerx), (4,xlionx), (3,xcatx), (7,xpantherx), (5,xeaglex))
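The desired output leaves every key unchanged and wraps each value in "x", which is the shape of a mapValues call such as b.mapValues("x" + _ + "x").collect. The plain Python sketch below checks that transformation on a list of pairs. map_values is a hypothetical helper imitating the RDD method; it is not part of any Spark API.

```python
def map_values(pairs, f):
    """Imitate mapValues: apply f to each value and keep the key as-is."""
    return [(k, f(v)) for k, v in pairs]


a = ["dog", "tiger", "lion", "cat", "panther", "eagle"]

# Same pairing as the snippet above: (length of word, word).
b = [(len(x), x) for x in a]

# Wrap every value in "x" on both sides; keys are untouched.
result = map_values(b, lambda v: "x" + v + "x")
```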

Question # 28

Problem Scenario 33 : You have been given the files below.

spark5/EmployeeName.csv (id,name)

spark5/EmployeeSalary.csv (id,salary)

Data is given below:

EmployeeName.csv

E01,Lokesh

E02,Bhupesh

E03,Amit

E04,Ratan

E05,Dinesh

E06,Pavan

E07,Tejas

E08,Sheela

E09,Kumar

E10,Venkat

EmployeeSalary.csv

E01,50000

E02,50000

E03,45000

E04,45000

E05,50000

E06,45000

E07,50000

E08,10000

E09,10000

E10,10000

Now write Spark code in Scala which will load these two files from HDFS, join them, and produce the (name, salary) values.

Then save the data in multiple files grouped by salary (that is, each file will contain the names of employees with the same salary). Make sure the file name includes the salary as well.
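The join-then-group logic can be sketched in plain Python before it is written in Scala. The sketch below is not Spark code and does not write to HDFS. The file name pattern salary_<amount>.txt is a hypothetical choice, since the task only requires that the name include the salary.

```python
from collections import defaultdict

# Contents of the two files as given above.
name_lines = [
    "E01,Lokesh", "E02,Bhupesh", "E03,Amit", "E04,Ratan", "E05,Dinesh",
    "E06,Pavan", "E07,Tejas", "E08,Sheela", "E09,Kumar", "E10,Venkat",
]
salary_lines = [
    "E01,50000", "E02,50000", "E03,45000", "E04,45000", "E05,50000",
    "E06,45000", "E07,50000", "E08,10000", "E09,10000", "E10,10000",
]

# Key both datasets by employee id.
names = dict(line.split(",") for line in name_lines)
salaries = dict(line.split(",") for line in salary_lines)

# Inner join on id, producing (name, salary) pairs.
name_salary = [(names[i], int(salaries[i])) for i in names if i in salaries]

# Group the names by salary.
by_salary = defaultdict(list)
for name, salary in name_salary:
    by_salary[salary].append(name)

# One output "file" per salary; the naming scheme is an assumption.
files = {f"salary_{salary}.txt": group for salary, group in by_salary.items()}
```

In Spark, the same shape would probably come from keying both files by id, joining the pair RDDs, and then writing one output per salary group.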