Data-Engineer-Associate Amazon Web Services AWS Certified Data Engineer

AWS Certified Data Engineer - Associate (DEA-C01)

Last updated 23 hours ago. Total questions: 302

The AWS Certified Data Engineer - Associate (DEA-C01) content is now fully updated, with all current exam questions added 23 hours ago. Deciding to include Data-Engineer-Associate practice exam questions in your study plan goes far beyond basic test preparation.

You'll find that our Data-Engineer-Associate exam questions frequently feature detailed scenarios and practical problem-solving exercises that directly mirror industry challenges. Engaging with these Data-Engineer-Associate sample sets allows you to effectively manage your time and pace yourself, giving you the ability to finish any AWS Certified Data Engineer - Associate (DEA-C01) practice test comfortably within the allotted time.

Question # 21

A company has a production AWS account that runs company workloads. The company's security team created a security AWS account to store and analyze security logs from the production AWS account. The security logs in the production AWS account are stored in Amazon CloudWatch Logs.

The company needs to use Amazon Kinesis Data Streams to deliver the security logs to the security AWS account.

Which solution will meet these requirements?

Create a destination data stream in the production AWS account. In the security AWS account, create an IAM role that has cross-account permissions to Kinesis Data Streams in the production AWS account.

Create a destination data stream in the security AWS account. Create an IAM role and a trust policy to grant CloudWatch Logs the permission to put data into the stream. Create a subscription filter in the security AWS account.

Create a destination data stream in the production AWS account. In the production AWS account, create an IAM role that has cross-account permissions to Kinesis Data Streams in the security AWS account.

Answer:

Explanation:

Amazon Kinesis Data Streams is a service for collecting, processing, and analyzing streaming data in real time. A stream can ingest data from sources such as Amazon CloudWatch Logs, and consumers can then deliver that data to destinations such as Amazon S3 or Amazon Redshift.

To deliver the security logs from the production AWS account to the security AWS account, create the destination data stream in the security AWS account. This stream receives the log data from CloudWatch Logs in the production AWS account. To enable cross-account delivery, create an IAM role and a trust policy in the security AWS account. The IAM role defines the permissions that CloudWatch Logs needs to put data into the destination data stream. The trust policy allows the CloudWatch Logs service to assume the IAM role. Finally, create a subscription filter in the production AWS account. A subscription filter defines the pattern used to match log events and the destination that receives the matching events. In this case, the destination is the data stream in the security AWS account.

This solution meets the requirement to use Kinesis Data Streams to deliver the security logs to the security AWS account. The other options do not work. A destination data stream in the production AWS account would not deliver the data to the security AWS account. A subscription filter in the security AWS account would not capture the log events, because the log groups are in the production AWS account.

References:

Using Amazon Kinesis Data Streams with Amazon CloudWatch Logs

AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide, Chapter 3: Data Ingestion and Transformation, Section 3.3: Amazon Kinesis Data Streams
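
The pieces described in the explanation above can be sketched as plain policy documents. This is a minimal sketch under stated assumptions, not a definitive setup: the account IDs, Region, stream name, and filter name are hypothetical, and the dictionaries are only built here, not sent to AWS. In AWS's documented cross-account flow, the `destinationArn` of the subscription filter points at a CloudWatch Logs destination that wraps the stream; that destination resource is not mentioned in the question and is an assumption here.

```python
# Hypothetical identifiers, used only for illustration.
SECURITY_ACCOUNT = "999999999999"
REGION = "us-east-1"
STREAM_ARN = f"arn:aws:kinesis:{REGION}:{SECURITY_ACCOUNT}:stream/security-logs"
DESTINATION_ARN = f"arn:aws:logs:{REGION}:{SECURITY_ACCOUNT}:destination:security-logs"


def trust_policy() -> dict:
    """Trust policy that lets the CloudWatch Logs service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "logs.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def permissions_policy(stream_arn: str) -> dict:
    """Permissions policy that allows putting records into the stream."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "kinesis:PutRecord",
            "Resource": stream_arn,
        }],
    }


def subscription_filter_args(log_group: str, destination_arn: str) -> dict:
    """Keyword arguments for put_subscription_filter, run in the production account."""
    return {
        "logGroupName": log_group,
        "filterName": "to-security-account",  # hypothetical name
        "filterPattern": "",  # an empty pattern matches every log event
        "destinationArn": destination_arn,
    }


role_trust = trust_policy()
role_permissions = permissions_policy(STREAM_ARN)
filter_args = subscription_filter_args("/prod/security", DESTINATION_ARN)
```

With boto3, `filter_args` could be passed as `logs_client.put_subscription_filter(**filter_args)` from the production account, while the role and stream are created in the security account.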

Question # 22

A global company currently uses Amazon Redshift to store data and Amazon Quick Suite (previously known as Amazon QuickSight) to generate reports.

A team of business analysts has varying levels of technical expertise. Some analysts lack SQL knowledge. All the analysts need to create new reports frequently. The company wants to use natural language queries to create dashboards and reports more efficiently.

Which solution will meet these requirements with the LEAST operational effort?

Use Quick Suite dashboards that have zero-ETL access to Amazon Redshift.

Enable Amazon Q in Quick Suite. Generate Quick Suite dashboards and reports.

Integrate Tableau with Amazon Redshift to give Tableau direct access to the data.

Use Quick Suite dashboards that have federated query access to Amazon Redshift.

Question # 23

A data engineer is designing a log table for an application that requires continuous ingestion. The application must provide dependable API-based access to specific records from other applications. The application must handle more than 4,000 concurrent write operations and 6,500 read operations every second.

Create an Amazon Redshift table with the KEY distribution style. Use the Amazon Redshift Data API to perform all read and write operations.

Store the log files in an Amazon S3 Standard bucket. Register the schema in AWS Glue Data Catalog. Create an external Redshift table that points to the AWS Glue schema. Use the table to perform Amazon Redshift Spectrum read operations.

Create an Amazon Redshift table with the EVEN distribution style. Use the Amazon Redshift JDBC connector to establish a database connection. Use the database connection to perform all read and write operations.

Create an Amazon DynamoDB table that has provisioned capacity to meet the application's capacity needs. Use the DynamoDB table to perform all read and write operations by using DynamoDB APIs.
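
For the DynamoDB option, the provisioned capacity can be estimated from the request rates in the question. The sketch below assumes an item size of 1 KB, which the question does not state. It uses the standard DynamoDB unit definitions: one write capacity unit covers one standard write per second for an item up to 1 KB, and one read capacity unit covers one strongly consistent read per second (or two eventually consistent reads) for an item up to 4 KB.

```python
import math


def write_capacity_units(writes_per_second: int, item_size_kb: float) -> int:
    """WCUs needed: each write consumes one unit per 1 KB of item size, rounded up."""
    return writes_per_second * math.ceil(item_size_kb / 1)


def read_capacity_units(reads_per_second: int, item_size_kb: float,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: each read consumes one unit per 4 KB, halved if eventually consistent."""
    total = reads_per_second * math.ceil(item_size_kb / 4)
    return total if strongly_consistent else math.ceil(total / 2)


# Request rates from the question; the 1 KB item size is an assumption.
wcu = write_capacity_units(4000, item_size_kb=1)
rcu_strong = read_capacity_units(6500, item_size_kb=1)
rcu_eventual = read_capacity_units(6500, item_size_kb=1, strongly_consistent=False)
```

Larger log records raise both figures in steps of 1 KB for writes and 4 KB for reads.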

Question # 24

A data engineer is building a data pipeline on AWS by using AWS Glue extract, transform, and load (ETL) jobs. The data engineer needs to process data from Amazon RDS and MongoDB, perform transformations, and load the transformed data into Amazon Redshift for analytics. The data updates must occur every hour.

Which combination of tasks will meet these requirements with the LEAST operational overhead? (Choose two.)

Configure AWS Glue triggers to run the ETL jobs every hour.

Use AWS Glue DataBrew to clean and prepare the data for analytics.

Use AWS Lambda functions to schedule and run the ETL jobs every hour.

Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.

Use the Redshift Data API to load transformed data into Amazon Redshift.
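
For the trigger option, an hourly schedule might look like the following. This is a rough sketch: the trigger name and job name are hypothetical, and the function only builds the keyword arguments that boto3's Glue `create_trigger` call accepts. AWS Glue schedules use a six-field cron expression (minutes, hours, day of month, month, day of week, year).

```python
def hourly_trigger_args(trigger_name: str, job_name: str) -> dict:
    """Build keyword arguments for a scheduled AWS Glue trigger that fires hourly."""
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        # Minute 0 of every hour, every day.
        "Schedule": "cron(0 * * * ? *)",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


# Hypothetical names, for illustration only.
trigger_args = hourly_trigger_args("hourly-etl", "rds-mongodb-to-redshift")
```

With boto3 installed and credentials configured, this could be applied with `boto3.client("glue").create_trigger(**trigger_args)`.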

Question # 25

A company is using Amazon Redshift to build a data warehouse solution. The company is loading hundreds of files into a fact table that is in a Redshift cluster.

The company wants the data warehouse solution to achieve the greatest possible throughput. The solution must use cluster resources optimally when the company loads data into the fact table.

Which solution will meet these requirements?

Use multiple COPY commands to load the data into the Redshift cluster.

Use S3DistCp to load multiple files into Hadoop Distributed File System (HDFS). Use an HDFS connector to ingest the data into the Redshift cluster.

Use a number of INSERT statements equal to the number of Redshift cluster nodes. Load the data in parallel into each node.

Use a single COPY command to load the data into the Redshift cluster.
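
For the single COPY option, one statement can point at an S3 prefix that contains all the files, which lets Redshift divide the files among its slices and load them in parallel. The sketch below only assembles the SQL text. The table name, bucket prefix, role ARN, and the CSV format are assumptions made for illustration.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Assemble one COPY statement that loads every file under an S3 prefix."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV;"  # assumed file format
    )


# Hypothetical names, for illustration only.
statement = build_copy_statement(
    "sales_fact",
    "s3://example-bucket/sales/2024/",
    "arn:aws:iam::111111111111:role/redshift-load",
)
```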

Question # 26

A data engineer is using an AWS Glue ETL job to remove outdated customer records from a table that contains customer account information. The data engineer is using the following SQL command to remove customers that exist in a table named monthly_accounts_update from the customer accounts table:

MERGE INTO accounts t USING monthly_accounts_update s ON t.customer = s.customer WHEN MATCHED THEN DELETE

What will happen when the data engineer runs the SQL command?

All customer records that exist in both the customer accounts table and the monthly_accounts_update table will be deleted from the accounts table.

Only customer records that are present in both tables will be retained in the customer accounts table.

The table will be deleted.

No records will be deleted because the command syntax is not valid in AWS Glue.
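
The effect of a `MERGE ... WHEN MATCHED THEN DELETE` statement can be emulated on small in-memory tables. This is a plain Python illustration of the matching semantics, assuming the engine accepts the statement. It is not AWS Glue code, and the sample rows are made up.

```python
def merge_when_matched_delete(target: list, source: list, key: str = "customer") -> list:
    """Return the target rows that survive: rows whose key appears in source are deleted."""
    matched_keys = {row[key] for row in source}
    return [row for row in target if row[key] not in matched_keys]


# Made-up sample data standing in for the two tables in the question.
accounts = [{"customer": "alice"}, {"customer": "bob"}, {"customer": "carol"}]
monthly_accounts_update = [{"customer": "bob"}, {"customer": "dave"}]

remaining = merge_when_matched_delete(accounts, monthly_accounts_update)
```

Rows that exist only in the source table (here, the one for "dave") have no effect, because the statement has no `WHEN NOT MATCHED` clause.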

Question # 27

A company stores customer data in an Amazon S3 bucket. Multiple teams in the company want to use the customer data for downstream analysis. The company needs to ensure that the teams do not have access to personally identifiable information (PII) about the customers.

Which solution will meet this requirement with the LEAST operational overhead?

Use Amazon Macie to create and run a sensitive data discovery job to detect and remove PII.

Use S3 Object Lambda to access the data, and use Amazon Comprehend to detect and remove PII.

Use Amazon Kinesis Data Firehose and Amazon Comprehend to detect and remove PII.

Use an AWS Glue DataBrew job to store the PII data in a second S3 bucket. Perform analysis on the data that remains in the original S3 bucket.

Question # 28

A data engineer uses the AWS Glue Data Catalog to manage data lake metadata. The data engineer's extract, transform, and load (ETL) process creates new partitions in an Amazon S3 data lake throughout the day. The new partitions are not queryable through Amazon Athena until an AWS Glue crawler run finishes each night. The data engineer needs to make new partitions immediately available for querying.

Which solution will meet these requirements?

Modify the ETL process to use the AWS Glue CreatePartition API call after creating each new partition in Amazon S3.

Configure S3 Event Notifications to invoke an AWS Lambda function that copies new partition data to a separate cataloged S3 bucket.

Use Amazon DynamoDB Streams to track partition changes and update the AWS Glue Data Catalog.

Use the AWS Glue StartImportLabelsTaskRun API call to synchronize partitions on demand.
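
For the CreatePartition option, the ETL process would register each new partition right after writing it to Amazon S3. The sketch below builds the request for boto3's Glue `create_partition` call. The database, table, partition values, S3 location, and storage descriptor contents are hypothetical. In practice the storage descriptor is usually copied from the table definition with only `Location` changed.

```python
def create_partition_args(database: str, table: str, values: list,
                          location: str, table_storage_descriptor: dict) -> dict:
    """Build keyword arguments for Glue create_partition for one new partition."""
    # Copy the table's descriptor so the original is not mutated.
    descriptor = dict(table_storage_descriptor)
    descriptor["Location"] = location
    return {
        "DatabaseName": database,
        "TableName": table,
        "PartitionInput": {
            "Values": list(values),
            "StorageDescriptor": descriptor,
        },
    }


# Hypothetical table descriptor and partition, for illustration only.
table_sd = {"Location": "s3://example-lake/events/", "Columns": []}
partition_args = create_partition_args(
    "lake_db", "events", ["2024", "05", "17"],
    "s3://example-lake/events/year=2024/month=05/day=17/", table_sd,
)
```

With boto3, this could be sent as `boto3.client("glue").create_partition(**partition_args)`.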

Question # 29

A company needs to implement a workflow to process transactions. Each transaction goes through multiple levels of validation. Each validation level depends on the preceding validation level.

The workflow must either process or reject each transaction within 24 hours. The workflow must run for less than 24 hours total.

Which solution will meet these requirements with the LEAST operational cost?

Create a standard workflow in AWS Step Functions. Implement a Wait for Callback pattern to wait for the validation steps to finish.

Create an express workflow in AWS Step Functions. Implement a Wait for Callback pattern to wait for the validation steps to finish.

Use AWS Lambda functions to implement the workflow. Use Amazon EventBridge to invoke the validation steps.

Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to implement the workflow.

Question # 30

A data engineer must use AWS services to ingest a dataset into an Amazon S3 data lake. The data engineer profiles the dataset and discovers that the dataset contains personally identifiable information (PII). The data engineer must implement a solution to profile the dataset and obfuscate the PII.

Which solution will meet this requirement with the LEAST operational effort?

Use an Amazon Kinesis Data Firehose delivery stream to process the dataset. Create an AWS Lambda transform function to identify the PII. Use an AWS SDK to obfuscate the PII. Set the S3 data lake as the target for the delivery stream.

Use the Detect PII transform in AWS Glue Studio to identify the PII. Obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.

Use the Detect PII transform in AWS Glue Studio to identify the PII. Create a rule in AWS Glue Data Quality to obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.

Ingest the dataset into Amazon DynamoDB. Create an AWS Lambda function to identify and obfuscate the PII in the DynamoDB table and to transform the data. Use the same Lambda function to ingest the data into the S3 data lake.

Answer:

Explanation:

AWS Glue is a fully managed service that provides a serverless data integration platform for data preparation, data cataloging, and data loading. AWS Glue Studio is a graphical interface that allows you to easily author, run, and monitor AWS Glue ETL jobs. AWS Glue Data Quality is a feature that enables you to validate, cleanse, and enrich your data using predefined or custom rules. AWS Step Functions is a service that allows you to coordinate multiple AWS services into serverless workflows.

Using the Detect PII transform in AWS Glue Studio, you can automatically identify and label the PII in your dataset, such as names, addresses, phone numbers, email addresses, etc. You can then create a rule in AWS Glue Data Quality to obfuscate the PII, such as masking, hashing, or replacing the values with dummy data. You can also use other rules to validate and cleanse your data, such as checking for null values, duplicates, outliers, etc. You can then use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake. You can use AWS Glue DataBrew to visually explore and transform the data, AWS Glue crawlers to discover and catalog the data, and AWS Glue jobs to load the data into the S3 data lake.
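
The masking and hashing mentioned above can be pictured with a small standalone sketch. This is not the AWS Glue Detect PII transform or any AWS Glue API. It is a generic standard-library illustration that assumes email addresses are the only PII of interest and uses a simplified regular expression.

```python
import hashlib
import re

# Simplified email pattern, an assumption for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str, mask_char: str = "*") -> str:
    """Replace each detected email address with mask characters of the same length."""
    return EMAIL_RE.sub(lambda match: mask_char * len(match.group()), text)


def hash_value(value: str) -> str:
    """Replace a value with its SHA-256 hex digest so equal values stay joinable."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


record = "contact jane@example.com for details"
masked = mask_emails(record)
hashed = hash_value("jane@example.com")
```

Masking hides the value entirely, while hashing keeps a stable token, which can matter when downstream teams still need to join or count by customer.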

This solution will meet the requirement with the least operational effort, as it leverages the serverless and managed capabilities of AWS Glue, AWS Glue Studio, AWS Glue Data Quality, and AWS Step Functions. You do not need to write any code to identify or obfuscate the PII, as you can use the built-in transforms and rules in AWS Glue Studio and AWS Glue Data Quality. You also do not need to provision or manage any servers or clusters, as AWS Glue and AWS Step Functions scale automatically based on the demand.

The other options are not as efficient as using the Detect PII transform in AWS Glue Studio, creating a rule in AWS Glue Data Quality, and using an AWS Step Functions state machine.

Using an Amazon Kinesis Data Firehose delivery stream with an AWS Lambda transform function and an AWS SDK requires more operational effort, because you need to write and maintain the code that identifies and obfuscates the PII, and you need to manage the Lambda function and its resources.

Using the Detect PII transform in AWS Glue Studio without an AWS Glue Data Quality rule is less effective, because you need to obfuscate the PII manually after identifying it, which can be error-prone and time-consuming.

Ingesting the dataset into Amazon DynamoDB and using an AWS Lambda function to identify and obfuscate the PII, transform the data, and ingest it into the S3 data lake also requires more operational effort, because you need to write and maintain the code and manage the Lambda function and its resources. Using DynamoDB as an intermediate data store also adds cost and complexity that may not be necessary for this use case.

References:

AWS Glue

AWS Glue Studio

AWS Glue Data Quality

AWS Step Functions

AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide, Chapter 6: Data Integration and Transformation, Section 6.1: AWS Glue
