
Question # 4

Which of the following describes the appropriate use case for PCA?

A. Dimensionality reduction
B. Classification
C. Regression
D. Recommendation
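
PCA is typically used to compress several correlated variables into fewer components that keep most of the variance. The sketch below is a minimal standard-library illustration for the two-variable case only, assuming numeric data that is not constant. The function name `pca_2d_to_1d` is hypothetical; in practice a library implementation (for example scikit-learn's `PCA`) would be used instead.

```python
import math


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    Returns (scores, explained_ratio): one score per point, plus the share of
    total variance captured by that single component.
    """
    n = len(points)
    # Center the data on its mean.
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    if a + c == 0:
        raise ValueError("data has no variance")
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    # Matching eigenvector: the direction of maximum variance.
    if b != 0:
        vx, vy = lam - c, b
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # Each 2-D point is reduced to a single coordinate along that direction.
    scores = [x * vx + y * vy for x, y in centered]
    return scores, lam / (a + c)  # trace = total variance
```

For points that lie exactly on a line, such as `[(0, 0), (1, 2), (2, 4), (3, 6)]`, the explained ratio comes out as 1.0 (up to rounding): one component carries all of the variation, so the second dimension can be dropped.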

Question # 5

A data scientist would like to model a complex phenomenon using a large data set composed of categorical, discrete, and continuous variables. After completing exploratory data analysis, the data scientist is reasonably certain that no linear relationship exists between the predictors and the target. Although the phenomenon is complex, the data scientist still wants to maintain the highest possible degree of interpretability in the final model. Which of the following algorithms best meets this objective?

A. Artificial neural network
B. Decision tree
C. Multiple linear regression
D. Random forest

Question # 6

A data scientist wants to evaluate the performance of various nonlinear models. Which of the following is best suited for this task?

A. AIC
B. Chi-squared test
C. MCC
D. ANOVA

Question # 7

A data scientist trained a model for departments to share. The departments must access the model using HTTP requests. Which of the following approaches is appropriate?

A. Utilize distributed computing.
B. Deploy containers.
C. Create an endpoint.
D. Use the File Transfer Protocol.
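
One common way to let other teams call a model over HTTP is to wrap it in a small web endpoint that accepts a JSON request and returns a JSON prediction. This is a minimal standard-library sketch, assuming a hypothetical `predict` function standing in for the trained model and a hypothetical `/predict` path; a production deployment would typically add a web framework, authentication, and a hardened server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for the trained model: a fixed linear score."""
    return 0.5 * features.get("x1", 0.0) + 2.0 * features.get("x2", 0.0)


class PredictHandler(BaseHTTPRequestHandler):
    """Serves POST /predict: JSON features in, JSON prediction out."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "Unknown endpoint")
            return
        # Read exactly Content-Length bytes of the body and parse them as JSON.
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and undecodable bytes
            features = None
        if not isinstance(features, dict):
            self.send_error(400, "Body must be a JSON object")
            return
        body = json.dumps({"prediction": predict(features)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def serve(host="127.0.0.1", port=8000):
    """Start the endpoint and block until interrupted (port 8000 is arbitrary)."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Once `serve()` is running, a POST to `http://127.0.0.1:8000/predict` with the body `{"x1": 2.0, "x2": 1.0}` should return `{"prediction": 3.0}`. Any department that can send an HTTP request can use the model without installing it.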

Question # 8

Which of the following measures would a data scientist most likely use to calculate the similarity of two text strings?

A. Word cloud
B. Edit distance
C. String indexing
D. k-nearest neighbors
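
Edit (Levenshtein) distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below uses the standard dynamic-programming recurrence, keeping only one previous row. The normalized `similarity` helper, which scales the distance by the longer string's length, is a hypothetical convention; other normalizations exist.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    # prev[j] is the distance between the prefix of a processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 for identical strings, lower as edits grow."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest
```

For example, `levenshtein("kitten", "sitting")` returns 3 (two substitutions and one insertion), so `similarity` gives 1 - 3/7, about 0.57.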

Question # 9

The following graphic shows the results of an unsupervised, machine-learning clustering model:

k is the number of clusters, and n is the processing time required to run the model. Which of the following is the best value of k to optimize both accuracy and processing requirements?

A. 2
B. 10
C. 15
D. 20

Question # 10

Given a logistics problem with multiple constraints (fuel, capacity, speed), which of the following is the most likely optimization technique a data scientist would apply?

A. Constrained
B. Unconstrained
C. Non-iterative
D. Iterative
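
In a constrained optimization problem, a candidate solution counts only if it satisfies every limit at once. The toy sketch below uses entirely hypothetical deliveries, loads, fuel figures, and values: it enumerates every subset of deliveries and keeps the most valuable one that fits both a capacity limit and a fuel budget (a speed limit could be added as a third check). Exhaustive search is used only for clarity; realistic logistics problems are usually handed to linear or integer programming solvers.

```python
from itertools import combinations

# Hypothetical deliveries: (name, load in kg, fuel in litres, value of completing it).
DELIVERIES = [
    ("A", 400, 20, 50),
    ("B", 300, 35, 40),
    ("C", 500, 15, 60),
    ("D", 200, 25, 30),
]
CAPACITY_KG = 900  # hypothetical truck capacity
FUEL_L = 60        # hypothetical fuel budget


def best_plan(deliveries, capacity, fuel):
    """Return (names, value) for the most valuable subset that satisfies both constraints."""
    best, best_value = (), 0
    for size in range(1, len(deliveries) + 1):
        for combo in combinations(deliveries, size):
            load = sum(d[1] for d in combo)
            fuel_used = sum(d[2] for d in combo)
            # Feasibility check: a plan is allowed only if every constraint holds.
            if load > capacity or fuel_used > fuel:
                continue
            value = sum(d[3] for d in combo)
            if value > best_value:
                best, best_value = combo, value
    return [d[0] for d in best], best_value


print(best_plan(DELIVERIES, CAPACITY_KG, FUEL_L))  # prints (['A', 'C'], 110)
```

With these numbers the best feasible plan is deliveries A and C (900 kg, 35 L, value 110); adding any third delivery would break the capacity limit.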

Question # 11

A client has gathered weather data on which regions have high temperatures. The client would like a visualization to gain a better understanding of the data.

INSTRUCTIONS

Part 1

Review the charts provided and use the drop-down menu to select the most appropriate way to standardize the data.

Part 2

Answer the questions to determine how to create one data set.

Part 3

Select the most appropriate visualization based on the data set that represents what the client is looking for.

If at any time you would like to bring back the initial state of the simulation, please click the Reset All button.

Question # 12

An analyst wants to show how the component pieces of a company's business units contribute to the company's overall revenue. Which of the following should the analyst use to best demonstrate this breakdown?

A. Box-and-whisker chart
B. Sankey diagram
C. Scatter plot matrix
D. Residual chart

Question # 13

A data scientist is using the following confusion matrix to assess model performance:

                        Actually Fails    Actually Succeeds
Predicted to Fail       80%               20%
Predicted to Succeed    15%               85%

The model is predicting whether a delivery truck will be able to make 200 scheduled delivery stops.

Every time the model is correct, the company saves 1 hour in planning and scheduling.

Every time the model is wrong, the company loses 4 hours of delivery time.

Which of the following is the net model impact for the company?

A. 25 hours lost
B. 25 hours saved
C. 165 hours lost
D. 165 hours saved
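
The net impact can be worked out by reading each cell of the matrix as a count of stops, an assumption that fits because the four cells add up to 200, the number of scheduled stops. Correct predictions sit on the diagonal (80 + 85) and errors off it (20 + 15).

```python
# Confusion-matrix cells, read as counts out of the 200 scheduled stops
# (assumption: the four cells 80 + 20 + 15 + 85 add up to 200).
predicted_fail_actual_fail = 80
predicted_fail_actual_succeed = 20
predicted_succeed_actual_fail = 15
predicted_succeed_actual_succeed = 85

HOURS_SAVED_PER_CORRECT = 1  # planning time saved per correct prediction
HOURS_LOST_PER_WRONG = 4     # delivery time lost per wrong prediction

correct = predicted_fail_actual_fail + predicted_succeed_actual_succeed  # diagonal
wrong = predicted_fail_actual_succeed + predicted_succeed_actual_fail    # off-diagonal
net_hours = correct * HOURS_SAVED_PER_CORRECT - wrong * HOURS_LOST_PER_WRONG

print(f"correct={correct}, wrong={wrong}, net={net_hours:+d} hours")
# prints: correct=165, wrong=35, net=+25 hours
```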

Question # 14

Which of the following distribution methods or models can most effectively represent the actual arrival times of a bus that runs on an hourly schedule?

A. Binomial
B. Exponential
C. Normal
D. Poisson

Question # 15

Which of the following problem-solving approaches is a set of guidelines to handle highly variable and not fully apparent situations?

A. Schedule
B. Plan
C. Heuristic
D. Algorithm

Question # 16

A data scientist has built a model that provides the likelihood of an error occurring in a factory. The historical accuracy of the model is 90%. At a specific factory, the model is reporting a likelihood score of 0.90. Which of the following explains a confidence score of 0.90?

A. Running this model for all known factory issues, it is expected the model will identify 90 out of 100 known factory issues.
B. Running this model on 100 samples of factories, a certain model performance is expected for 90 out of the 100 samples.
C. Running this model 100 times on a factory, it is expected the model will predict 90 out of 100 factory errors.
D. Running this model 100 times within a factory, it is expected the model will predict an error 90 out of 100 times the model is run.

Question # 17

A data scientist is deploying a model that needs to be accessed by multiple departments with minimal development effort by the departments. Which of the following APIs would be best for the data scientist to use?

A. SOAP
B. RPC
C. JSON
D. REST

Question # 18

A data scientist is working with a data set that has ten predictors and wants to use only the predictors that most influence the results. Which of the following models would be the best for the data scientist to use?

A. OLS
B. Ridge
C. Weighted least squares
D. LASSO
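
LASSO adds an L1 penalty (alpha times the sum of absolute coefficients) to the least-squares loss. Unlike the L2 penalty in ridge regression, it can push coefficients exactly to zero, which is why it is commonly used to keep only the most influential predictors. The sketch below is a minimal pure-Python coordinate-descent solver, assuming centered predictors and no intercept; the function names and the small data set are hypothetical.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha, returning exactly 0.0 inside [-alpha, alpha]."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    Assumes centered columns (so no intercept is fitted) and no all-zero column.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of predictor j with the partial residual
            z = 0.0    # sum of squares of predictor j
            for i in range(n):
                # Prediction for row i using every coefficient except w[j].
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w


# Hypothetical data: three orthogonal, centered predictors; y = 3*x0 + 0.2*x1.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [3.2, -2.8, 2.8, -3.2]
weights = lasso_coordinate_descent(X, y, alpha=0.5)
print(weights)
```

With alpha = 0.5, the strong first coefficient shrinks from 3.0 to about 2.5, while the weak second predictor (true effect 0.2) and the irrelevant third one are set exactly to zero.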

Question # 19

During EDA, a data scientist wants to look for patterns, such as linearity, in the data. Which of the following plots should the data scientist use?

A. Violin
B. Box-and-whisker
C. Scatter
D. Q-Q

Question # 20

A data scientist needs to analyze a company's chemical businesses and is using the master database of the conglomerate company. Nothing in the data differentiates the data observations for the different businesses. Which of the following is the most efficient way to identify the chemical businesses' observations?

A. Ingest the data from all of the hard drives and perform exploratory data analysis to identify which business is responsible for chemical operations.
B. Perform analysis on all of the data and create a summary report on the results relevant to chemical operations.
C. Consult with the business team to identify which sites are responsible for chemical operations and ingest only the relevant data for analysis.
D. Ingest data from the hard drive containing the most data and present sample results on the chemical operations.

Question # 21

A data analyst is analyzing data and would like to build conceptual associations. Which of the following is the best way to accomplish this task?

A. n-grams
B. NER
C. TF-IDF
D. POS

Question # 22

A data scientist wants to predict a person's travel destination. The options are:

    Branson, Missouri, United States

    Mount Kilimanjaro, Tanzania

    Disneyland Paris, Paris, France

    Sydney Opera House, Sydney, Australia

Which of the following models would best fit this use case?

A. Linear discriminant analysis
B. k-means modeling
C. Latent semantic analysis
D. Principal component analysis

Question # 23

A data scientist is performing a linear regression and wants to construct a model that explains the most variation in the data. Which of the following should the data scientist maximize when evaluating the regression performance metrics?

A. Accuracy
B. R²
C. p value
D. AUC
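
For linear regression, the share of variation a model explains is usually summarized by the coefficient of determination, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean. A minimal sketch, with a hypothetical function name:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: share of the variance in y_true explained by y_pred."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total variation around the mean
    return 1 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]
print(r_squared(observed, observed))   # prints 1.0 (perfect fit)
print(r_squared(observed, [2.5] * 4))  # prints 0.0 (predicting the mean explains nothing)
```

Plain R² never decreases when predictors are added, so adjusted R² is often preferred when comparing models of different sizes.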

Question # 24

A data scientist is building a proof of concept for a commercialized machine-learning model. Which of the following is the best starting point?

A. Literature review
B. Model performance evaluation
C. Hyperparameter tuning
D. Model selection

Question # 25

A movie production company would like to find the actors appearing in its top movies using data from the tables below. The resulting data must show all movies in Table 1, enriched with actors listed in Table 2.

Which of the following query operations achieves the desired data set?

A. Perform an INNER JOIN between Table 1 using column Movie, and Table 2 using column Acted_In.
B. Perform a UNION between Table 1 using column Movie, and Table 2 using column Acted_In.
C. Perform an INTERSECT between Table 1 using column Movie, and Table 2 using column Acted_In.
D. Perform a LEFT JOIN on Table 1 using column Movie, with Table 2 using column Acted_In.
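
The difference between a LEFT JOIN and an INNER JOIN can be checked with Python's built-in `sqlite3` module. The tables referred to in the question are not reproduced here, so the table names (`table1`, `table2`), the `Actor` column, and all rows below are hypothetical; only the join columns `Movie` and `Acted_In` come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical stand-ins for Table 1 (top movies) and Table 2 (actors).
cur.execute("CREATE TABLE table1 (Movie TEXT)")
cur.execute("CREATE TABLE table2 (Actor TEXT, Acted_In TEXT)")
cur.executemany("INSERT INTO table1 VALUES (?)",
                [("Movie A",), ("Movie B",), ("Movie C",)])
cur.executemany("INSERT INTO table2 VALUES (?, ?)",
                [("Actor 1", "Movie A"), ("Actor 2", "Movie A"),
                 ("Actor 3", "Movie B"), ("Actor 4", "Movie Z")])

# LEFT JOIN keeps every row of table1; unmatched movies get NULL (None) actors.
left_rows = cur.execute(
    "SELECT t1.Movie, t2.Actor FROM table1 AS t1 "
    "LEFT JOIN table2 AS t2 ON t1.Movie = t2.Acted_In "
    "ORDER BY t1.Movie, t2.Actor"
).fetchall()

# INNER JOIN, for contrast, drops movies that have no listed actors.
inner_rows = cur.execute(
    "SELECT t1.Movie, t2.Actor FROM table1 AS t1 "
    "JOIN table2 AS t2 ON t1.Movie = t2.Acted_In "
    "ORDER BY t1.Movie, t2.Actor"
).fetchall()
conn.close()

print(left_rows)
print(inner_rows)
```

The LEFT JOIN returns all three movies, with `None` as the actor for Movie C; the INNER JOIN drops Movie C. Neither result includes Actor 4, whose film is not in `table1`.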
