Pre-Summer Sale Special Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: buysanta

Exact2Pass Menu

Google Professional Data Engineer Exam

Last Update 18 hours ago Total Questions : 400

The Google Professional Data Engineer Exam content is now fully updated, with all current exam questions added 18 hours ago. Deciding to include Professional-Data-Engineer practice exam questions in your study plan goes far beyond basic test preparation.

You'll find that our Professional-Data-Engineer exam questions frequently feature detailed scenarios and practical problem-solving exercises that directly mirror industry challenges. Engaging with these Professional-Data-Engineer sample sets allows you to effectively manage your time and pace yourself, giving you the ability to finish any Google Professional Data Engineer Exam practice test comfortably within the allotted time.

Question # 51

Your company has recently grown rapidly and now ingesting data at a significantly higher rate than it was previously. You manage the daily batch MapReduce analytics jobs in Apache Hadoop. However, the recent increase in data has meant the batch jobs are falling behind. You were asked to recommend ways the development team could increase the responsiveness of the analytics without increasing costs. What should you recommend they do?

A.

Rewrite the job in Pig.

B.

Rewrite the job in Apache Spark.

C.

Increase the size of the Hadoop cluster.

D.

Decrease the size of the Hadoop cluster but also rewrite the job in Hive.

Question # 52

You are designing the database schema for a machine learning-based food ordering service that will predict what users want to eat. Here is some of the information you need to store:

The user profile: What the user likes and doesn’t like to eat

The user account information: Name, address, preferred meal times

The order information: When orders are made, from where, to whom

The database will be used to store all the transactional data of the product. You want to optimize the data schema. Which Google Cloud Platform product should you use?

A.

BigQuery

B.

Cloud SQL

C.

Cloud Bigtable

D.

Cloud Datastore

Question # 53

You work for a large ecommerce company. You store your customers order data in Bigtable. You have a garbage collection policy set to delete the data after 30 days and the number of versions is set to 1. When the data analysts run a query to report total customer spending, the analysts sometimes see customer data that is older than 30 days. You need to ensure that the analysts do not see customer data older than 30 days while minimizing cost and overhead. What should you do?

A.

Set the expiring values of the column families to 30 days and set the number of versions to 2.

B.

Use a timestamp range filter in the query to fetch the customer's data for a specific range.

C.

Set the expiring values of the column families to 29 days and keep the number of versions to 1.

D.

Schedule a job daily to scan the data in the table and delete data older than 30 days.

Question # 54

You receive data files in CSV format monthly from a third party. You need to cleanse this data, but every third month the schema of the files changes. Your requirements for implementing these transformations include:

Executing the transformations on a schedule

Enabling non-developer analysts to modify transformations

Providing a graphical tool for designing transformations

What should you do?

A.

Use Cloud Dataprep to build and maintain the transformation recipes, and execute them on a scheduled basis

B.

Load each month’s CSV data into BigQuery, and write a SQL query to transform the data to a standard schema. Merge the transformed tables together with a SQL query

C.

Help the analysts write a Cloud Dataflow pipeline in Python to perform the transformation. The Python code should be stored in a revision control system and modified as the incoming data’s schema changes

D.

Use Apache Spark on Cloud Dataproc to infer the schema of the CSV file before creating a Dataframe. Then implement the transformations in Spark SQL before writing the data out to Cloud Storage and loading into BigQuery

Question # 55

You are using BigQuery's ML.GENERATE_TEXT function to write marketing materials for a new product launch. The problem is, the AI is generating random text that does not always relate to the product. You want the simplest way to improve the output to generate a marketing copy. What should you do?

A.

Seed the prompt with examples of product descriptions for similar products.

B.

Experiment with different values for the temperature setting in ML.GENERATE_TEXT.

C.

Fine-tune a more specialized language model on a dataset of product features.

D.

Experiment with different values for the top_p setting in ML.GENERATE_TEXT.

Question # 56

Given the record streams MJTelco is interested in ingesting per day, they are concerned about the cost of Google BigQuery increasing. MJTelco asks you to provide a design solution. They require a single large data table called tracking_table. Additionally, they want to minimize the cost of daily queries while performing fine-grained analysis of each day’s events. They also want to use streaming ingestion. What should you do?

A.

Create a table called tracking_table and include a DATE column.

B.

Create a partitioned table called tracking_table and include a TIMESTAMP column.

C.

Create sharded tables for each day following the pattern tracking_table_YYYYMMDD.

D.

Create a table called tracking_table with a TIMESTAMP column to represent the day.

Question # 57

MJTelco is building a custom interface to share data. They have these requirements:

They need to do aggregations over their petabyte-scale datasets.

They need to scan specific time range rows with a very fast response time (milliseconds).

Which combination of Google Cloud Platform products should you recommend?

A.

Cloud Datastore and Cloud Bigtable

B.

Cloud Bigtable and Cloud SQL

C.

BigQuery and Cloud Bigtable

D.

BigQuery and Cloud Storage

Question # 58

You need to compose visualizations for operations teams with the following requirements:

Which approach meets the requirements?

A.

Load the data into Google Sheets, use formulas to calculate a metric, and use filters/sorting to show only suboptimal links in a table.

B.

Load the data into Google BigQuery tables, write Google Apps Script that queries the data, calculates the metric, and shows only suboptimal rows in a table in Google Sheets.

C.

Load the data into Google Cloud Datastore tables, write a Google App Engine Application that queries all rows, applies a function to derive the metric, and then renders results in a table using the Google charts and visualization API.

D.

Load the data into Google BigQuery tables, write a Google Data Studio 360 report that connects to your data, calculates a metric, and then uses a filter expression to show only suboptimal rows in a table.

Question # 59

You are creating a model to predict housing prices. Due to budget constraints, you must run it on a single resource-constrained virtual machine. Which learning algorithm should you use?

A.

Linear regression

B.

Logistic classification

C.

Recurrent neural network

D.

Feedforward neural network

Question # 60

Your company is in a highly regulated industry. One of your requirements is to ensure individual users have access only to the minimum amount of information required to do their jobs. You want to enforce this requirement with Google BigQuery. Which three approaches can you take? (Choose three.)

A.

Disable writes to certain tables.

B.

Restrict access to tables by role.

C.

Ensure that the data is encrypted at all times.

D.

Restrict BigQuery API access to approved users.

E.

Segregate data across multiple tables or databases.

F.

Use Google Stackdriver Audit Logging to determine policy violations.

Go to page: