
Question # 4

When you run a pipeline that has a BigQuery source on your local machine, you keep getting permission denied errors. What could be the reason for that?

A.

Your gcloud does not have access to the BigQuery resources

B.

BigQuery cannot be accessed from local machines

C.

You are missing gcloud on your machine

D.

Pipelines cannot be run locally

Question # 5

Which of the following are examples of hyperparameters? (Select 2 answers.)

A.

Number of hidden layers

B.

Number of nodes in each hidden layer

C.

Biases

D.

Weights

Question # 6

Which row keys are likely to cause a disproportionate number of reads and/or writes on a particular node in a Bigtable cluster? (Select 2 answers.)

A.

A sequential numeric ID

B.

A timestamp followed by a stock symbol

C.

A non-sequential numeric ID

D.

A stock symbol followed by a timestamp

Question # 7

You are planning to use Google's Dataflow SDK to analyze customer data such as that displayed below. Your project requirement is to extract only the customer name from the data source and then write it to an output PCollection.

Tom,555 X street

Tim,553 Y street

Sam, 111 Z street

Which operation is best suited for the above data processing requirement?

A.

ParDo

B.

Sink API

C.

Source API

D.

Data extraction
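
For reference only, a minimal sketch in the Apache Beam Python SDK (the successor to the Dataflow SDK) of the element-wise ParDo style of transform, applied to the sample records above; the in-memory Create source stands in for whatever the real data source would be.

import apache_beam as beam

class ExtractName(beam.DoFn):
    # Each element is one CSV line such as "Tom,555 X street".
    def process(self, element):
        yield element.split(",")[0].strip()

with beam.Pipeline() as p:
    names = (
        p
        | "ReadRecords" >> beam.Create(["Tom,555 X street", "Tim,553 Y street", "Sam, 111 Z street"])
        | "ExtractName" >> beam.ParDo(ExtractName())
    )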

Question # 8

You have a job that you want to cancel. It is a streaming pipeline, and you want to ensure that any data that is in-flight is processed and written to the output. Which of the following commands can you use on the Dataflow monitoring console to stop the pipeline job?

A.

Cancel

B.

Drain

C.

Stop

D.

Finish

Question # 9

If you want to create a machine learning model that predicts the price of a particular stock based on its recent price history, what type of estimator should you use?

A.

Unsupervised learning

B.

Regressor

C.

Classifier

D.

Clustering estimator

Question # 10

Your business users need a way to clean and prepare data before using the data for analysis. Your business users are less technically savvy and prefer to work with graphical user interfaces to define their transformations. After the data has been transformed, the business users want to perform their analysis directly in a spreadsheet. You need to recommend a solution that they can use. What should you do?

A.

Use Dataprep to clean the data, and write the results to BigQuery. Analyze the data by using Connected Sheets.

B.

Use Dataprep to clean the data, and write the results to BigQuery. Analyze the data by using Looker Studio.

C.

Use Dataflow to clean the data, and write the results to BigQuery. Analyze the data by using Connected Sheets.

D.

Use Dataflow to clean the data, and write the results to BigQuery. Analyze the data by using Looker Studio.

Question # 11

What are two methods that can be used to denormalize tables in BigQuery?

A.

1) Split table into multiple tables; 2) Use a partitioned table

B.

1) Join tables into one table; 2) Use nested repeated fields

C.

1) Use a partitioned table; 2) Join tables into one table

D.

1) Use nested repeated fields; 2) Use a partitioned table

Question # 12

Which of these statements about exporting data from BigQuery is false?

A.

To export more than 1 GB of data, you need to put a wildcard in the destination filename.

B.

The only supported export destination is Google Cloud Storage.

C.

Data can only be exported in JSON or Avro format.

D.

The only compression option available is GZIP.

Question # 13

Which SQL keyword can be used to reduce the number of columns processed by BigQuery?

A.

BETWEEN

B.

WHERE

C.

SELECT

D.

LIMIT

Question # 14

Which of these statements about BigQuery caching is true?

A.

By default, a query's results are not cached.

B.

BigQuery caches query results for 48 hours.

C.

Query results are cached even if you specify a destination table.

D.

There is no charge for a query that retrieves its results from cache.

Question # 15

The YARN ResourceManager and the HDFS NameNode interfaces are available on a Cloud Dataproc cluster ____.

A.

application node

B.

conditional node

C.

master node

D.

worker node

Question # 16

Which methods can be used to reduce the number of rows processed by BigQuery?

A.

Splitting tables into multiple tables; putting data in partitions

B.

Splitting tables into multiple tables; putting data in partitions; using the LIMIT clause

C.

Putting data in partitions; using the LIMIT clause

D.

Splitting tables into multiple tables; using the LIMIT clause

Question # 17

Which TensorFlow function can you use to configure a categorical column if you don't know all of the possible values for that column?

A.

categorical_column_with_vocabulary_list

B.

categorical_column_with_hash_bucket

C.

categorical_column_with_unknown_values

D.

sparse_column_with_keys

Question # 18

Does Dataflow process batch data pipelines or streaming data pipelines?

A.

Only Batch Data Pipelines

B.

Both Batch and Streaming Data Pipelines

C.

Only Streaming Data Pipelines

D.

None of the above

Question # 19

If a dataset contains rows with individual people and columns for year of birth, country, and income, how many of the columns are continuous and how many are categorical?

A.

1 continuous and 2 categorical

B.

3 categorical

C.

3 continuous

D.

2 continuous and 1 categorical

Question # 20

Which of the following IAM roles does your Compute Engine account require to be able to run pipeline jobs?

A.

dataflow.worker

B.

dataflow.compute

C.

dataflow.developer

D.

dataflow.viewer

Question # 21

In order to securely transfer web traffic data from your computer's web browser to the Cloud Dataproc cluster you should use a(n) _____.

A.

VPN connection

B.

Special browser

C.

SSH tunnel

D.

FTP connection

Question # 22

You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users’ privacy?

A.

Grant the consultant the Viewer role on the project.

B.

Grant the consultant the Cloud Dataflow Developer role on the project.

C.

Create a service account and allow the consultant to log on with it.

D.

Create an anonymized sample of the data for the consultant to work with in a different project.

Question # 23

If you're running a performance test that depends upon Cloud Bigtable, all the choices except one below are recommended steps. Which is NOT a recommended step to follow?

A.

Do not use a production instance.

B.

Run your test for at least 10 minutes.

C.

Before you test, run a heavy pre-test for several minutes.

D.

Use at least 300 GB of data.

Question # 24

An external customer provides you with a daily dump of data from their database. The data flows into Google Cloud Storage (GCS) as comma-separated values (CSV) files. You want to analyze this data in Google BigQuery, but the data could have rows that are formatted incorrectly or corrupted. How should you build this pipeline?

A.

Use federated data sources, and check data in the SQL query.

B.

Enable BigQuery monitoring in Google Stackdriver and create an alert.

C.

Import the data into BigQuery using the gcloud CLI and set max_bad_records to 0.

D.

Run a Google Cloud Dataflow batch pipeline to import the data into BigQuery, and push errors to another dead-letter table for analysis.
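
As background on the dead-letter idea in option D, a hedged sketch of the side-output pattern in the Apache Beam Python SDK; the bucket path, table name, and three-column layout are illustrative assumptions, not part of the question.

import csv
import apache_beam as beam
from apache_beam.pvalue import TaggedOutput

class ParseCsvRow(beam.DoFn):
    def process(self, line):
        try:
            fields = next(csv.reader([line]))
            if len(fields) != 3:
                raise ValueError("unexpected column count")
            yield {"id": fields[0], "name": fields[1], "amount": float(fields[2])}
        except Exception:
            # Malformed rows are routed to a dead-letter output instead of failing the job.
            yield TaggedOutput("dead_letter", line)

with beam.Pipeline() as p:
    parsed, dead_letter = (
        p
        | beam.io.ReadFromText("gs://example-bucket/daily_dump.csv")  # hypothetical path
        | beam.ParDo(ParseCsvRow()).with_outputs("dead_letter", main="parsed")
    )
    parsed | "WriteGood" >> beam.io.WriteToBigQuery(
        "example-project:analytics.daily_dump",  # assumed table that already exists
        create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
    )
    dead_letter | "WriteBad" >> beam.io.WriteToText("gs://example-bucket/errors/bad_rows")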

Question # 25

Your company is using wildcard tables to query data across multiple tables with similar names. The SQL statement is currently failing with the following error:

# Syntax error: Expected end of statement but got "-" at [4:11]

SELECT age
FROM
bigquery-public-data.noaa_gsod.gsod
WHERE
age != 99
AND _TABLE_SUFFIX = '1929'
ORDER BY
age DESC

Which table name will make the SQL statement work correctly?

A.

'bigquery-public-data.noaa_gsod.gsod'

B.

bigquery-public-data.noaa_gsod.gsod*

C.

'bigquery-public-data.noaa_gsod.gsod'*

D.

`bigquery-public-data.noaa_gsod.gsod*`
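
For reference, standard SQL wildcard tables are written as a backquoted table prefix ending in * and filtered with the _TABLE_SUFFIX pseudo column. A hedged google-cloud-bigquery sketch that reuses the column names from the question as-is:

from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT age
FROM `bigquery-public-data.noaa_gsod.gsod*`
WHERE age != 99 AND _TABLE_SUFFIX = '1929'
ORDER BY age DESC
"""
for row in client.query(sql).result():
    print(row.age)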

Question # 26

Your company uses a proprietary system to send inventory data every 6 hours to a data ingestion service in the cloud. Transmitted data includes a payload of several fields and the timestamp of the transmission. If there are any concerns about a transmission, the system re-transmits the data. How should you deduplicate the data most efficiently?

A.

Assign global unique identifiers (GUID) to each data entry.

B.

Compute the hash value of each data entry, and compare it with all historical data.

C.

Store each data entry as the primary key in a separate database and apply an index.

D.

Maintain a database table to store the hash value and other metadata for each data entry.

Question # 27

You are creating a model to predict housing prices. Due to budget constraints, you must run it on a single resource-constrained virtual machine. Which learning algorithm should you use?

A.

Linear regression

B.

Logistic classification

C.

Recurrent neural network

D.

Feedforward neural network

Question # 28

You want to process payment transactions in a point-of-sale application that will run on Google Cloud Platform. Your user base could grow exponentially, but you do not want to manage infrastructure scaling.

Which Google database service should you use?

A.

Cloud SQL

B.

BigQuery

C.

Cloud Bigtable

D.

Cloud Datastore

Question # 29

You have spent a few days loading data from comma-separated values (CSV) files into the Google BigQuery table CLICK_STREAM. The column DT stores the epoch time of click events. For convenience, you chose a simple schema where every field is treated as the STRING type. Now, you want to compute web session durations of users who visit your site, and you want to change the column's data type to TIMESTAMP. You want to minimize the migration effort without making future queries computationally expensive. What should you do?

A.

Delete the table CLICK_STREAM, and then re-create it such that the column DT is of the TIMESTAMP type. Reload the data.

B.

Add a column TS of the TIMESTAMP type to the table CLICK_STREAM, and populate it with the numeric values from the column DT for each row. Reference the column TS instead of the column DT from now on.

C.

Create a view CLICK_STREAM_V, where strings from the column DT are cast into TIMESTAMP values. Reference the view CLICK_STREAM_V instead of the table CLICK_STREAM from now on.

D.

Add two columns to the table CLICK_STREAM: TS of the TIMESTAMP type and IS_NEW of the BOOLEAN type. Reload all data in append mode. For each appended row, set the value of IS_NEW to true. For future queries, reference the column TS instead of the column DT, with the WHERE clause ensuring that the value of IS_NEW must be true.

E.

Construct a query to return every row of the table CLICK_STREAM, while using the built-in function to cast strings from the column DT into TIMESTAMP values. Run the query into a destination table NEW_CLICK_STREAM, in which the column TS is the TIMESTAMP type. Reference the table NEW_CLICK_STREAM instead of the table CLICK_STREAM from now on. In the future, new data is loaded into the table NEW_CLICK_STREAM.
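
To illustrate the kind of cast described in option E, a hedged google-cloud-bigquery sketch; the project and dataset names are assumptions, and the epoch values in DT are assumed to be whole seconds.

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(
    destination="example-project.web.NEW_CLICK_STREAM",  # hypothetical destination table
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
sql = """
SELECT * EXCEPT (DT), TIMESTAMP_SECONDS(CAST(DT AS INT64)) AS TS
FROM `example-project.web.CLICK_STREAM`
"""
client.query(sql, job_config=job_config).result()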

Question # 30

You need to store and analyze social media postings in Google BigQuery at a rate of 10,000 messages per minute in near real-time. You initially designed the application to use streaming inserts for individual postings. Your application also performs data aggregations right after the streaming inserts. You discover that the queries after streaming inserts do not exhibit strong consistency, and reports from the queries might miss in-flight data. How can you adjust your application design?

A.

Re-write the application to load accumulated data every 2 minutes.

B.

Convert the streaming insert code to batch load for individual messages.

C.

Load the original message to Google Cloud SQL, and export the table every hour to BigQuery via streaming inserts.

D.

Estimate the average latency for data availability after streaming inserts, and always run queries after waiting twice as long.

Question # 31

Your company is migrating their 30-node Apache Hadoop cluster to the cloud. They want to re-use Hadoop jobs they have already created and minimize the management of the cluster as much as possible. They also want to be able to persist data beyond the life of the cluster. What should you do?

A.

Create a Google Cloud Dataflow job to process the data.

B.

Create a Google Cloud Dataproc cluster that uses persistent disks for HDFS.

C.

Create a Hadoop cluster on Google Compute Engine that uses persistent disks.

D.

Create a Cloud Dataproc cluster that uses the Google Cloud Storage connector.

E.

Create a Hadoop cluster on Google Compute Engine that uses Local SSD disks.

Question # 32

Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour.

The data scientists have written the following code to read the data for new key features in the logs.

BigQueryIO.Read
    .named("ReadLogData")
    .from("clouddataflow-readonly:samples.log_data")

You want to improve the performance of this data read. What should you do?

A.

Specify the TableReference object in the code.

B.

Use .fromQuery operation to read specific fields from the table.

C.

Use of both the Google BigQuery TableSchema and TableFieldSchema classes.

D.

Call a transform that returns TableRow objects, where each element in the PCollection represents a single row in the table.
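
For comparison, a hedged sketch in the Apache Beam Python SDK that reads through a query so that only the needed fields are scanned; the field names are assumptions.

import apache_beam as beam

with beam.Pipeline() as p:
    rows = p | "ReadLogData" >> beam.io.ReadFromBigQuery(
        query="SELECT key_feature, event_ts FROM `clouddataflow-readonly.samples.log_data`",
        use_standard_sql=True,
    )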

Question # 33

You are building a model to make clothing recommendations. You know a user’s fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available. How should you use this data to train the model?

A.

Continuously retrain the model on just the new data.

B.

Continuously retrain the model on a combination of existing data and the new data.

C.

Train on the existing data while using the new data as your test set.

D.

Train on the new data while using the existing data as your test set.

Question # 34

You are designing a basket abandonment system for an ecommerce company. The system will send a message to a user based on these rules:

    No interaction by the user on the site for 1 hour

    Has added more than $30 worth of products to the basket

    Has not completed a transaction

You use Google Cloud Dataflow to process the data and decide if a message should be sent. How should you design the pipeline?

A.

Use a fixed-time window with a duration of 60 minutes.

B.

Use a sliding time window with a duration of 60 minutes.

C.

Use a session window with a gap time duration of 60 minutes.

D.

Use a global window with a time based trigger with a delay of 60 minutes.
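
For reference, a minimal Apache Beam Python SDK sketch of a session window with a 60-minute gap, the duration named in option C; the in-memory input and zero timestamps are placeholders for real event-timestamped data.

import apache_beam as beam
from apache_beam import window

with beam.Pipeline() as p:
    basket_totals = (
        p
        | beam.Create([("user-1", 12.50), ("user-1", 19.99), ("user-2", 31.00)])
        | beam.Map(lambda kv: window.TimestampedValue(kv, 0))  # real events carry their own timestamps
        | "SessionWindow" >> beam.WindowInto(window.Sessions(gap_size=60 * 60))
        | beam.CombinePerKey(sum)  # basket value per user per session
    )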

Question # 35

You designed a database for patient records as a pilot project to cover a few hundred patients in three clinics. Your design used a single database table to represent all patients and their visits, and you used self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the project has expanded. The database must now store 100 times more patient records. You can no longer run the reports, because they either take too long or they encounter errors with insufficient compute resources. How should you adjust the database design?

A.

Add capacity (memory and disk space) to the database server by the order of 200.

B.

Shard the tables into smaller ones based on date ranges, and only generate reports with prespecified date ranges.

C.

Normalize the master patient-record table into the patient table and the visits table, and create other necessary tables to avoid self-join.

D.

Partition the table into smaller tables, with one for each clinic. Run queries against the smaller table pairs, and use unions for consolidated reports.

Question # 36

You need to compose visualization for operations teams with the following requirements:

    Telemetry must include data from all 50,000 installations for the most recent 6 weeks (sampling once every minute)

    The report must not be more than 3 hours delayed from live data.

    The actionable report should only show suboptimal links.

    Most suboptimal links should be sorted to the top.

    Suboptimal links can be grouped and filtered by regional geography.

    User response time to load the report must be <5 seconds.

You create a data source to store the last 6 weeks of data, and create visualizations that allow viewers to see multiple date ranges, distinct geographic regions, and unique installation types. You always show the latest data without any changes to your visualizations. You want to avoid creating and updating new visualizations each month. What should you do?

A.

Look through the current data and compose a series of charts and tables, one for each possible combination of criteria.

B.

Look through the current data and compose a small set of generalized charts and tables bound to criteria filters that allow value selection.

C.

Export the data to a spreadsheet, compose a series of charts and tables, one for each possible combination of criteria, and spread them across multiple tabs.

D.

Load the data into relational database tables, write a Google App Engine application that queries all rows, summarizes the data across each criteria, and then renders results using the Google Charts and visualization API.

Question # 37

You create a new report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. It is company policy to ensure employees can view only the data associated with their region, so you create and populate a table for each region. You need to enforce the regional access policy to the data.

Which two actions should you take? (Choose two.)

A.

Ensure all the tables are included in global dataset.

B.

Ensure each table is included in a dataset for a region.

C.

Adjust the settings for each table to allow a related region-based security group view access.

D.

Adjust the settings for each view to allow a related region-based security group view access.

E.

Adjust the settings for each dataset to allow a related region-based security group view access.

Question # 38

You have two projects where you run BigQuery jobs:

• One project runs production jobs that have strict completion time SLAs. These are high priority jobs that must have the required compute resources available when needed. These jobs generally never go below a 300 slot utilization, but occasionally spike up by an additional 500 slots.

• The other project is for users to run ad-hoc analytical queries. This project generally never uses more than 200 slots at a time. You want these ad-hoc queries to be billed based on how much data users scan rather than by slot capacity.

You need to ensure that both projects have the appropriate compute resources available. What should you do?

A.

Create a single Enterprise Edition reservation for both projects. Set a baseline of 300 slots. Enable autoscaling up to 700 slots.

B.

Create two reservations, one for each of the projects. For the SLA project, use an Enterprise Edition with a baseline of 300 slots and enable autoscaling up to 500 slots. For the ad-hoc project, configure on-demand billing.

C.

Create two Enterprise Edition reservations, one for each of the projects. For the SLA project, set a baseline of 300 slots and enable autoscaling up to 500 slots. For the ad-hoc project, set a reservation baseline of 0 slots and set the ignore_idle_slots flag to False.

D.

Create two Enterprise Edition reservations, one for each of the projects. For the SLA project, set a baseline of 800 slots. For the ad-hoc project, enable autoscaling up to 200 slots.

Question # 39

Your weather app queries a database every 15 minutes to get the current temperature. The frontend is powered by Google App Engine and serves millions of users. How should you design the frontend to respond to a database failure?

A.

Issue a command to restart the database servers.

B.

Retry the query with exponential backoff, up to a cap of 15 minutes.

C.

Retry the query every second until it comes back online to minimize staleness of data.

D.

Reduce the query frequency to once every hour until the database comes back online.
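
As an illustration of the retry pattern in option B, a hedged sketch of exponential backoff with jitter capped at 15 minutes; the query callable is a placeholder.

import random
import time

def query_with_backoff(run_query, max_backoff_seconds=15 * 60):
    delay_seconds = 1
    while True:
        try:
            return run_query()  # placeholder for the actual database call
        except Exception:
            # Back off with jitter, doubling the delay up to the 15-minute cap.
            time.sleep(delay_seconds * random.uniform(0.5, 1.5))
            delay_seconds = min(delay_seconds * 2, max_backoff_seconds)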

Question # 40

You want to use a database of information about tissue samples to classify future tissue samples as either normal or mutated. You are evaluating an unsupervised anomaly detection method for classifying the tissue samples. Which two characteristics support this method? (Choose two.)

A.

There are very few occurrences of mutations relative to normal samples.

B.

There are roughly equal occurrences of both normal and mutated samples in the database.

C.

You expect future mutations to have different features from the mutated samples in the database.

D.

You expect future mutations to have similar features to the mutated samples in the database.

E.

You already have labels for which samples are mutated and which are normal in the database.

Question # 41

You want to use Google Stackdriver Logging to monitor Google BigQuery usage. You need an instant notification to be sent to your monitoring tool when new data is appended to a certain table using an insert job, but you do not want to receive notifications for other tables. What should you do?

A.

Make a call to the Stackdriver API to list all logs, and apply an advanced filter.

B.

In the Stackdriver logging admin interface, enable a log sink export to BigQuery.

C.

In the Stackdriver logging admin interface, enable a log sink export to Google Cloud Pub/Sub, and subscribe to the topic from your monitoring tool.

D.

Using the Stackdriver API, create a project sink with advanced log filter to export to Pub/Sub, and subscribe to the topic from your monitoring tool.

Question # 42

MJTelco needs you to create a schema in Google Bigtable that will allow for the historical analysis of the last 2 years of records. Each record that comes in is sent every 15 minutes, and contains a unique identifier of the device and a data record. The most common query is for all the data for a given device for a given day. Which schema should you use?

A.

Rowkey: date#device_id
Column data: data_point

B.

Rowkey: date
Column data: device_id, data_point

C.

Rowkey: device_id
Column data: date, data_point

D.

Rowkey: data_point
Column data: device_id, date

E.

Rowkey: date#data_point
Column data: device_id

Question # 43

Your startup has never implemented a formal security policy. Currently, everyone in the company has access to the datasets stored in Google BigQuery. Teams have freedom to use the service as they see fit, and they have not documented their use cases. You have been asked to secure the data warehouse. You need to discover what everyone is doing. What should you do first?

A.

Use Google Stackdriver Audit Logs to review data access.

B.

Get the identity and access management (IAM) policy of each table.

C.

Use Stackdriver Monitoring to see the usage of BigQuery query slots.

D.

Use the Google Cloud Billing API to see what account the warehouse is being billed to.

Question # 44

You need to compose visualizations for operations teams with the following requirements:

Which approach meets the requirements?

A.

Load the data into Google Sheets, use formulas to calculate a metric, and use filters/sorting to show only suboptimal links in a table.

B.

Load the data into Google BigQuery tables, write Google Apps Script that queries the data, calculates the metric, and shows only suboptimal rows in a table in Google Sheets.

C.

Load the data into Google Cloud Datastore tables, write a Google App Engine Application that queries all rows, applies a function to derive the metric, and then renders results in a table using the Google charts and visualization API.

D.

Load the data into Google BigQuery tables, write a Google Data Studio 360 report that connects to your data, calculates a metric, and then uses a filter expression to show only suboptimal rows in a table.

Question # 45

Your company has recently grown rapidly and is now ingesting data at a significantly higher rate than before. You manage the daily batch MapReduce analytics jobs in Apache Hadoop. However, the recent increase in data has meant the batch jobs are falling behind. You were asked to recommend ways the development team could increase the responsiveness of the analytics without increasing costs. What should you recommend they do?

A.

Rewrite the job in Pig.

B.

Rewrite the job in Apache Spark.

C.

Increase the size of the Hadoop cluster.

D.

Decrease the size of the Hadoop cluster but also rewrite the job in Hive.

Question # 46

Your company is loading comma-separated values (CSV) files into Google BigQuery. The data is fully imported successfully; however, the imported data is not matching byte-to-byte to the source file. What is the most likely cause of this problem?

A.

The CSV data loaded in BigQuery is not flagged as CSV.

B.

The CSV data has invalid rows that were skipped on import.

C.

The CSV data loaded in BigQuery is not using BigQuery’s default encoding.

D.

The CSV data has not gone through an ETL phase before loading into BigQuery.

Question # 47

Your company produces 20,000 files every hour. Each data file is formatted as a comma separated values (CSV) file that is less than 4 KB. All files must be ingested on Google Cloud Platform before they can be processed. Your company site has a 200 ms latency to Google Cloud, and your Internet connection bandwidth is limited to 50 Mbps. You currently deploy a secure FTP (SFTP) server on a virtual machine in Google Compute Engine as the data ingestion point. A local SFTP client runs on a dedicated machine to transmit the CSV files as is. The goal is to make reports with data from the previous day available to the executives by 10:00 a.m. each day. This design is barely able to keep up with the current volume, even though the bandwidth utilization is rather low.

You are told that due to seasonality, your company expects the number of files to double for the next three months. Which two actions should you take? (Choose two.)

A.

Introduce data compression for each file to increase the rate of file transfer.

B.

Contact your internet service provider (ISP) to increase your maximum bandwidth to at least 100 Mbps.

C.

Redesign the data ingestion process to use gsutil tool to send the CSV files to a storage bucket in parallel.

D.

Assemble 1,000 files into a tape archive (TAR) file. Transmit the TAR files instead, and disassemble the CSV files in the cloud upon receiving them.

E.

Create an S3-compatible storage endpoint in your network, and use Google Cloud Storage Transfer Service to transfer on-premises data to the designated storage bucket.

Question # 48

Given the record streams MJTelco is interested in ingesting per day, they are concerned about the cost of Google BigQuery increasing. MJTelco asks you to provide a design solution. They require a single large data table called tracking_table. Additionally, they want to minimize the cost of daily queries while performing fine-grained analysis of each day’s events. They also want to use streaming ingestion. What should you do?

A.

Create a table called tracking_table and include a DATE column.

B.

Create a partitioned table called tracking_table and include a TIMESTAMP column.

C.

Create sharded tables for each day following the pattern tracking_table_YYYYMMDD.

D.

Create a table called tracking_table with a TIMESTAMP column to represent the day.
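
For background on partitioned tables, a hedged google-cloud-bigquery sketch that creates a day-partitioned table; the project name and schema are assumptions.

from google.cloud import bigquery

client = bigquery.Client()
table = bigquery.Table(
    "example-project.telemetry.tracking_table",  # hypothetical fully qualified table ID
    schema=[
        bigquery.SchemaField("event_ts", "TIMESTAMP"),
        bigquery.SchemaField("payload", "STRING"),
    ],
)
# Partition by day on the TIMESTAMP column so a single day's query scans only one partition.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="event_ts"
)
client.create_table(table)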

Question # 49

MJTelco is building a custom interface to share data. They have these requirements:

    They need to do aggregations over their petabyte-scale datasets.

    They need to scan specific time range rows with a very fast response time (milliseconds).

Which combination of Google Cloud Platform products should you recommend?

A.

Cloud Datastore and Cloud Bigtable

B.

Cloud Bigtable and Cloud SQL

C.

BigQuery and Cloud Bigtable

D.

BigQuery and Cloud Storage

Question # 50

You are designing the database schema for a machine learning-based food ordering service that will predict what users want to eat. Here is some of the information you need to store:

    The user profile: What the user likes and doesn’t like to eat

    The user account information: Name, address, preferred meal times

    The order information: When orders are made, from where, to whom

The database will be used to store all the transactional data of the product. You want to optimize the data schema. Which Google Cloud Platform product should you use?

A.

BigQuery

B.

Cloud SQL

C.

Cloud Bigtable

D.

Cloud Datastore

Question # 51

You work for a manufacturing plant that batches application log files together into a single log file once a day at 2:00 AM. You have written a Google Cloud Dataflow job to process that log file. You need to make sure the log file is processed once per day as inexpensively as possible. What should you do?

A.

Change the processing job to use Google Cloud Dataproc instead.

B.

Manually start the Cloud Dataflow job each morning when you get into the office.

C.

Create a cron job with Google App Engine Cron Service to run the Cloud Dataflow job.

D.

Configure the Cloud Dataflow job as a streaming job so that it processes the log data immediately.

Question # 52

You work for an economic consulting firm that helps companies identify economic trends as they happen. As part of your analysis, you use Google BigQuery to correlate customer data with the average prices of the 100 most common goods sold, including bread, gasoline, milk, and others. The average prices of these goods are updated every 30 minutes. You want to make sure this data stays up to date so you can combine it with other data in BigQuery as cheaply as possible. What should you do?

A.

Load the data every 30 minutes into a new partitioned table in BigQuery.

B.

Store and update the data in a regional Google Cloud Storage bucket and create a federated data source in BigQuery

C.

Store the data in Google Cloud Datastore. Use Google Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Cloud Datastore

D.

Store the data in a file in a regional Google Cloud Storage bucket. Use Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Google Cloud Storage.

Question # 53

You work for a large fast food restaurant chain with over 400,000 employees. You store employee information in Google BigQuery in a Users table consisting of a FirstName field and a LastName field. A member of IT is building an application and asks you to modify the schema and data in BigQuery so the application can query a FullName field consisting of the value of the FirstName field concatenated with a space, followed by the value of the LastName field for each employee. How can you make that data available while minimizing cost?

A.

Create a view in BigQuery that concatenates the FirstName and LastName field values to produce the FullName.

B.

Add a new column called FullName to the Users table. Run an UPDATE statement that updates the FullName column for each user with the concatenation of the FirstName and LastName values.

C.

Create a Google Cloud Dataflow job that queries BigQuery for the entire Users table, concatenates the FirstName value and LastName value for each user, and loads the proper values for FirstName, LastName, and FullName into a new table in BigQuery.

D.

Use BigQuery to export the data for the table to a CSV file. Create a Google Cloud Dataproc job to process the CSV file and output a new CSV file containing the proper values for FirstName, LastName and FullName. Run a BigQuery load job to load the new CSV file into BigQuery.
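
To illustrate the view approach in option A, a hedged google-cloud-bigquery sketch; the project, dataset, and view names are assumptions.

from google.cloud import bigquery

client = bigquery.Client()
view = bigquery.Table("example-project.hr.UsersWithFullName")  # hypothetical view name
view.view_query = """
SELECT FirstName, LastName, CONCAT(FirstName, ' ', LastName) AS FullName
FROM `example-project.hr.Users`
"""
client.create_table(view)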

Question # 54

You are choosing a NoSQL database to handle telemetry data submitted from millions of Internet-of-Things (IoT) devices. The volume of data is growing at 100 TB per year, and each data entry has about 100 attributes. The data processing pipeline does not require atomicity, consistency, isolation, and durability (ACID). However, high availability and low latency are required.

You need to analyze the data by querying against individual fields. Which three databases meet your requirements? (Choose three.)

A.

Redis

B.

HBase

C.

MySQL

D.

MongoDB

E.

Cassandra

F.

HDFS with Hive

Question # 55

You are deploying a new storage system for your mobile application, which is a media streaming service. You decide the best fit is Google Cloud Datastore. You have entities with multiple properties, some of which can take on multiple values. For example, in the entity ‘Movie’ the property ‘actors’ and the property ‘tags’ have multiple values but the property ‘date released’ does not. A typical query would ask for all movies with actor= ordered by date_released or all movies with tag=Comedy ordered by date_released. How should you avoid a combinatorial explosion in the number of indexes?

A.

Option A

B.

Option B.

C.

Option C

D.

Option D

Question # 56

Flowlogistic is rolling out their real-time inventory tracking system. The tracking devices will all send package-tracking messages, which will now go to a single Google Cloud Pub/Sub topic instead of the Apache Kafka cluster. A subscriber application will then process the messages for real-time reporting and store them in Google BigQuery for historical analysis. You want to ensure the package data can be analyzed over time.

Which approach should you take?

A.

Attach the timestamp on each message in the Cloud Pub/Sub subscriber application as they are received.

B.

Attach the timestamp and Package ID on the outbound message from each publisher device as they are sent to Cloud Pub/Sub.

C.

Use the NOW() function in BigQuery to record the event's time.

D.

Use the automatically generated timestamp from Cloud Pub/Sub to order the data.

Question # 57

Flowlogistic wants to use Google BigQuery as their primary analysis system, but they still have Apache Hadoop and Spark workloads that they cannot move to BigQuery. Flowlogistic does not know how to store the data that is common to both workloads. What should they do?

A.

Store the common data in BigQuery as partitioned tables.

B.

Store the common data in BigQuery and expose authorized views.

C.

Store the common data encoded as Avro in Google Cloud Storage.

D.

Store the common data in the HDFS storage for a Google Cloud Dataproc cluster.

Question # 58

Flowlogistic's CEO wants to gain rapid insight into their customer base so his sales team can be better informed in the field. This team is not very technical, so they've purchased a visualization tool to simplify the creation of BigQuery reports. However, they've been overwhelmed by all the data in the table, and are spending a lot of money on queries trying to find the data they need. You want to solve their problem in the most cost-effective way. What should you do?

A.

Export the data into a Google Sheet for visualization.

B.

Create an additional table with only the necessary columns.

C.

Create a view on the table to present to the visualization tool.

D.

Create identity and access management (IAM) roles on the appropriate columns, so only they appear in a query.

Question # 59

Flowlogistic’s management has determined that the current Apache Kafka servers cannot handle the data volume for their real-time inventory tracking system. You need to build a new system on Google Cloud Platform (GCP) that will feed the proprietary tracking software. The system must be able to ingest data from a variety of global sources, process and query in real-time, and store the data reliably. Which combination of GCP products should you choose?

A.

Cloud Pub/Sub, Cloud Dataflow, and Cloud Storage

B.

Cloud Pub/Sub, Cloud Dataflow, and Local SSD

C.

Cloud Pub/Sub, Cloud SQL, and Cloud Storage

D.

Cloud Load Balancing, Cloud Dataflow, and Cloud Storage

Question # 60

You need to choose a database for a new project that has the following requirements:

    Fully managed

    Able to automatically scale up

    Transactionally consistent

    Able to scale up to 6 TB

    Able to be queried using SQL

Which database do you choose?

A.

Cloud SQL

B.

Cloud Bigtable

C.

Cloud Spanner

D.

Cloud Datastore

Question # 61

You are migrating your data warehouse to Google Cloud and decommissioning your on-premises data center. Because this is a priority for your company, you know that bandwidth will be made available for the initial data load to the cloud. The files being transferred are not large in number, but each file is 90 GB. Additionally, you want your transactional systems to continually update the warehouse on Google Cloud in real time. What tools should you use to migrate the data and ensure that it continues to write to your warehouse?

A.

Storage Transfer Service for the migration, Pub/Sub and Cloud Data Fusion for the real-time updates

B.

BigQuery Data Transfer Service for the migration, Pub/Sub and Dataproc for the real-time updates

C.

gsutil for the migration; Pub/Sub and Dataflow for the real-time updates

D.

gsutil for both the migration and the real-time updates

Question # 62

You want to migrate an on-premises Hadoop system to Cloud Dataproc. Hive is the primary tool in use, and the data format is Optimized Row Columnar (ORC). All ORC files have been successfully copied to a Cloud Storage bucket. You need to replicate some data to the cluster’s local Hadoop Distributed File System (HDFS) to maximize performance. What are two ways to start using Hive in Cloud Dataproc? (Choose two.)

A.

Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to HDFS. Mount the Hive tables locally.

B.

Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to any node of the Dataproc cluster. Mount the Hive tables locally.

C.

Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to the master node of the Dataproc cluster. Then run the Hadoop utility to copy them to HDFS. Mount the Hive tables from HDFS.

D.

Leverage Cloud Storage connector for Hadoop to mount the ORC files as external Hive tables. Replicate external Hive tables to the native ones.

E.

Load the ORC files into BigQuery. Leverage BigQuery connector for Hadoop to mount the BigQuery tables as external Hive tables. Replicate external Hive tables to the native ones.

Question # 63

You launched a new gaming app almost three years ago. You have been uploading log files from the previous day to a separate Google BigQuery table with the table name format LOGS_yyyymmdd. You have been using table wildcard functions to generate daily and monthly reports for all time ranges. Recently, you discovered that some queries that cover long date ranges are exceeding the limit of 1,000 tables and failing. How can you resolve this issue?

A.

Convert all daily log tables into date-partitioned tables

B.

Convert the sharded tables into a single partitioned table

C.

Enable query caching so you can cache data from previous months

D.

Create separate views to cover each month, and query from these views

Question # 64

You currently have transactional data stored on-premises in a PostgreSQL database. To modernize your data environment, you want to run transactional workloads and support analytics needs with a single database. You need to move to Google Cloud without changing database management systems, and minimize cost and complexity. What should you do?

A.

Migrate your workloads to AlloyDB for PostgreSQL.

B.

Migrate to BigQuery to optimize analytics.

C.

Migrate and modernize your database with Cloud Spanner.

D.

Migrate your PostgreSQL database to Cloud SQL for PostgreSQL.

Question # 65

You have thousands of Apache Spark jobs running in your on-premises Apache Hadoop cluster. You want to migrate the jobs to Google Cloud. You want to use managed services to run your jobs instead of maintaining a long-lived Hadoop cluster yourself. You have a tight timeline and want to keep code changes to a minimum. What should you do?

A.

Copy your data to Compute Engine disks. Manage and run your jobs directly on those instances.

B.

Move your data to Cloud Storage. Run your jobs on Dataproc.

C.

Move your data to BigQuery. Convert your Spark scripts to a SQL-based processing approach.

D.

Rewrite your jobs in Apache Beam. Run your jobs in Dataflow.

Question # 66

You are designing a cloud-native historical data processing system to meet the following conditions:

    The data being analyzed is in CSV, Avro, and PDF formats and will be accessed by multiple analysis tools including Cloud Dataproc, BigQuery, and Compute Engine.

    A streaming data pipeline stores new data daily.

    Performance is not a factor in the solution.

    The solution design should maximize availability.

How should you design data storage for this solution?

A.

Create a Cloud Dataproc cluster with high availability. Store the data in HDFS, and perform analysis as needed.

B.

Store the data in BigQuery. Access the data using the BigQuery Connector on Cloud Dataproc and Compute Engine.

C.

Store the data in a regional Cloud Storage bucket. Access the bucket directly using Cloud Dataproc, BigQuery, and Compute Engine.

D.

Store the data in a multi-regional Cloud Storage bucket. Access the data directly using Cloud Dataproc, BigQuery, and Compute Engine.

Question # 67

Your team is building a data lake platform on Google Cloud. As a part of the data foundation design, you are planning to store all the raw data in Cloud Storage. You are expecting to ingest approximately 25 GB of data a day, and your billing department is worried about the increasing cost of storing old data. The current business requirements are:

• The old data can be deleted anytime

• You plan to use the visualization layer for current and historical reporting

• The old data should be available instantly when accessed

• There should not be any charges for data retrieval.

What should you do to optimize for cost?

A.

Create the bucket with the Autoclass storage class feature.

B.

Create an Object Lifecycle Management policy to modify the storage class for data older than 30 days to nearline, 90 days to coldline, and 365 days to archive storage class. Delete old data as needed.

C.

Create an Object Lifecycle Management policy to modify the storage class for data older than 30 days to coldline, 90 days to nearline, and 365 days to archive storage class. Delete old data as needed.

D.

Create an Object Lifecycle Management policy to modify the storage class for data older than 30 days to nearline, 45 days to coldline, and 60 days to archive storage class. Delete old data as needed.
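
For reference, a hedged google-cloud-storage sketch of an Object Lifecycle Management policy using the thresholds from option B; the bucket name is an assumption, and note that the colder classes carry retrieval charges.

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-raw-data-lake")  # hypothetical bucket name

# Downgrade the storage class as objects age.
bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
bucket.add_lifecycle_set_storage_class_rule("ARCHIVE", age=365)
bucket.patch()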

Question # 68

You currently have a single on-premises Kafka cluster in a data center in the us-east region that is responsible for ingesting messages from IoT devices globally. Because large parts of the globe have poor internet connectivity, messages sometimes batch at the edge, come in all at once, and cause a spike in load on your Kafka cluster. This is becoming difficult to manage and prohibitively expensive. What is the Google-recommended cloud native architecture for this scenario?

A.

Edge TPUs as sensor devices for storing and transmitting the messages.

B.

Cloud Dataflow connected to the Kafka cluster to scale the processing of incoming messages.

C.

An IoT gateway connected to Cloud Pub/Sub, with Cloud Dataflow to read and process the messages from Cloud Pub/Sub.

D.

A Kafka cluster virtualized on Compute Engine in us-east with Cloud Load Balancing to connect to the devices around the world.

Question # 69

You have enabled the free integration between Firebase Analytics and Google BigQuery. Firebase now automatically creates a new table daily in BigQuery in the format app_events_YYYYMMDD. You want to query all of the tables for the past 30 days in legacy SQL. What should you do?

A.

Use the TABLE_DATE_RANGE function

B.

Use the WHERE _PARTITIONTIME pseudo column

C.

Use WHERE date BETWEEN YYYY-MM-DD AND YYYY-MM-DD

D.

Use SELECT IF(date >= YYYY-MM-DD AND date <= YYYY-MM-DD)

Question # 70

You are designing a real-time system for a ride hailing app that identifies areas with high demand for rides to effectively reroute available drivers to meet the demand. The system ingests data from multiple sources to Pub/Sub, processes the data, and stores the results for visualization and analysis in real-time dashboards. The data sources include driver location updates every 5 seconds and app-based booking events from riders. The data processing involves real-time aggregation of supply and demand data for the last 30 seconds, every 2 seconds, and storing the results in a low-latency system for visualization. What should you do?

A.

Group the data by using a tumbling window in a Dataflow pipeline, and write the aggregated data to Memorystore

B.

Group the data by using a hopping window in a Dataflow pipeline, and write the aggregated data to Memorystore

C.

Group the data by using a session window in a Dataflow pipeline, and write the aggregated data to BigQuery.

D.

Group the data by using a hopping window in a Dataflow pipeline, and write the aggregated data to BigQuery.
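
For reference, a minimal Apache Beam Python SDK sketch of a hopping (sliding) window that is 30 seconds long and starts every 2 seconds, matching the aggregation described in the question; the in-memory input and zero timestamps are placeholders.

import apache_beam as beam
from apache_beam import window

with beam.Pipeline() as p:
    demand = (
        p
        | beam.Create([("zone-a", 1), ("zone-a", 1), ("zone-b", 1)])  # stand-in for booking events
        | beam.Map(lambda kv: window.TimestampedValue(kv, 0))
        | "HoppingWindow" >> beam.WindowInto(window.SlidingWindows(size=30, period=2))
        | beam.CombinePerKey(sum)  # ride requests per zone per window
    )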

Question # 71

An aerospace company uses a proprietary data format to store its flight data. You need to connect this new data source to BigQuery and stream the data into BigQuery. You want to efficiently import the data into BigQuery while consuming as few resources as possible. What should you do?

A.

Use a standard Dataflow pipeline to store the raw data in BigQuery and then transform the format later when the data is used

B.

Write a shell script that triggers a Cloud Function that performs periodic ETL batch jobs on the new data source

C.

Use Apache Hive to write a Dataproc job that streams the data into BigQuery in CSV format

D.

Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format

Question # 72

You want to archive data in Cloud Storage. Because some data is very sensitive, you want to use the “Trust No One” (TNO) approach to encrypt your data to prevent the cloud provider staff from decrypting your data. What should you do?

A.

Use gcloud kms keys create to create a symmetric key. Then use gcloud kms encrypt to encrypt each archival file with the key and unique additional authenticated data (AAD). Use gsutil cp to upload each encrypted file to the Cloud Storage bucket, and keep the AAD outside of Google Cloud.

B.

Use gcloud kms keys create to create a symmetric key. Then use gcloud kms encrypt to encrypt each archival file with the key. Use gsutil cp to upload each encrypted file to the Cloud Storage bucket. Manually destroy the key previously used for encryption, and rotate the key once.

C.

Specify customer-supplied encryption key (CSEK) in the .boto configuration file. Use gsutil cp to upload each archival file to the Cloud Storage bucket. Save the CSEK in Cloud Memorystore as permanent storage of the secret.

D.

Specify customer-supplied encryption key (CSEK) in the .boto configuration file. Use gsutil cp to upload each archival file to the Cloud Storage bucket. Save the CSEK in a different project that only the security team can access.

Question # 73

You are administering a BigQuery on-demand environment. Your business intelligence tool is submitting hundreds of queries each day that aggregate a large (50 TB) sales history fact table at the day and month levels. These queries have a slow response time and are exceeding cost expectations. You need to decrease response time, lower query costs, and minimize maintenance. What should you do?

A.

Build materialized views on top of the sales table to aggregate data at the day and month level.

B.

Build authorized views on top of the sales table to aggregate data at the day and month level.

C.

Enable BI Engine and add your sales table as a preferred table.

D.

Create a scheduled query to build sales day and sales month aggregate tables on an hourly basis.
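
To illustrate the materialized-view approach in option A, a hedged sketch run through the google-cloud-bigquery client; the project, dataset, and column names are assumptions.

from google.cloud import bigquery

client = bigquery.Client()
client.query("""
CREATE MATERIALIZED VIEW `example-project.sales.daily_sales_mv` AS
SELECT DATE(order_ts) AS order_day, SUM(amount) AS total_amount
FROM `example-project.sales.sales_history`
GROUP BY order_day
""").result()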

Question # 74

You are creating the CI/CD cycle for the code of the directed acyclic graphs (DAGs) running in Cloud Composer. Your team has two Cloud Composer instances: one instance for development and another instance for production. Your team is using a Git repository to maintain and develop the code of the DAGs. You want to deploy the DAGs automatically to Cloud Composer when a certain tag is pushed to the Git repository. What should you do?

A.

1. Use Cloud Build to build a container and the KubernetesPodOperator to deploy the code of the DAG to the Google Kubernetes Engine (GKE) cluster of the development instance for testing.

2. If the tests pass, copy the code to the Cloud Storage bucket of the production instance.

B.

1. Use Cloud Build to copy the code of the DAG to the Cloud Storage bucket of the development instance for DAG testing.

2. If the tests pass, use Cloud Build to build a container with the code of the DAG and the KubernetesPodOperator to deploy the container to the Google Kubernetes Engine (GKE) cluster of the production instance.

C.

1. Use Cloud Build to build a container with the code of the DAG and the KubernetesPodOperator to deploy the code to the Google Kubernetes Engine (GKE) cluster of the development instance for testing.

2. If the tests pass, use the KubernetesPodOperator to deploy the container to the GKE cluster of the production instance.

D.

1. Use Cloud Build to copy the code of the DAG to the Cloud Storage bucket of the development instance for DAG testing.

2. If the tests pass, use Cloud Build to copy the code to the bucket of the production instance.

Question # 75

You work on a regression problem in a natural language processing domain, and you have 100M labeled examples in your dataset. You have randomly shuffled your data and split your dataset into train and test samples (in a 90/10 ratio). After you trained the neural network and evaluated your model on a test set, you discover that the root-mean-squared error (RMSE) of your model is twice as high on the train set as on the test set. How should you improve the performance of your model?

A.

Increase the share of the test sample in the train-test split.

B.

Try to collect more data and increase the size of your dataset.

C.

Try out regularization techniques (e.g., dropout or batch normalization) to avoid overfitting.

D.

Increase the complexity of your model by, e.g., introducing an additional layer or increasing the size of vocabularies or n-grams used.

Question # 76

You are managing a Cloud Dataproc cluster. You need to make a job run faster while minimizing costs, without losing work in progress on your clusters. What should you do?

A.

Increase the cluster size with more non-preemptible workers.

B.

Increase the cluster size with preemptible worker nodes, and configure them to forcefully decommission.

C.

Increase the cluster size with preemptible worker nodes, and use Cloud Stackdriver to trigger a script to preserve work.

D.

Increase the cluster size with preemptible worker nodes, and configure them to use graceful decommissioning.
