Microsoft DP-100 Sample Questions

Question # 1

You need to implement a scaling strategy for the local penalty detection data. Which normalization type should you use?

A. Streaming
B. Weight
C. Batch
D. Cosine
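Option C, batch normalization, rescales each feature using the mean and variance of the current mini-batch. The minimal stdlib-only sketch below assumes the mini-batch is a list of numeric rows. `batch_normalize` is a hypothetical helper, and `gamma` and `beta` are fixed here, although a real network learns them.

```python
import math

def batch_normalize(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each column of a mini-batch to zero mean and unit variance,
    then apply a scale (gamma) and a shift (beta)."""
    n = len(batch)
    cols = list(zip(*batch))
    means = [sum(c) / n for c in cols]
    # Population variance over the mini-batch, as batch normalization uses.
    variances = [sum((x - m) ** 2 for x in c) / n for c, m in zip(cols, means)]
    # eps keeps the division stable when a column has (near) zero variance.
    return [
        [gamma * (x - m) / math.sqrt(v + eps) + beta
         for x, m, v in zip(row, means, variances)]
        for row in batch
    ]

batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
out = batch_normalize(batch)
```

After normalization, each column of `out` has a mean close to 0 and a variance close to 1, regardless of the original scale of the feature.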

Question # 2

You need to resolve the local machine learning pipeline performance issue. What should you do?

A. Increase Graphic Processing Units (GPUs).
B. Increase the learning rate.
C. Increase the training iterations.
D. Increase Central Processing Units (CPUs).

Question # 3

You need to implement a model development strategy to determine a user’s tendency to respond to an ad. Which technique should you use?

A. Use a Relative Expression Split module to partition the data based on centroid distance.
B. Use a Relative Expression Split module to partition the data based on distance travelled to the event.
C. Use a Split Rows module to partition the data based on distance travelled to the event.
D. Use a Split Rows module to partition the data based on centroid distance.

Question # 4

You need to select an environment that will meet the business and data requirements. Which environment should you use?

A. Azure HDInsight with Spark MLlib
B. Azure Cognitive Services
C. Azure Machine Learning Studio
D. Microsoft Machine Learning Server

Question # 5

You need to implement a feature engineering strategy for the crowd sentiment local models. What should you do?

A. Apply an analysis of variance (ANOVA).
B. Apply a Pearson correlation coefficient.
C. Apply a Spearman correlation coefficient.
D. Apply a linear discriminant analysis.

Question # 6

You need to select a feature extraction method. Which method should you use?

A. Spearman correlation
B. Mutual information
C. Mann-Whitney test
D. Pearson’s correlation
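Pearson and Spearman correlation differ in what they measure: Pearson captures linear association, while Spearman is the Pearson coefficient computed on ranks, so it captures any monotonic association. The stdlib-only sketch below illustrates the difference. The helper names `pearson`, `ranks`, and `spearman` are hypothetical, and ties receive the average of their rank positions.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear
p, s = pearson(x, y), spearman(x, y)
```

For this monotonic but non-linear relationship, Spearman is 1 (up to floating-point rounding), while Pearson is about 0.98 because the points do not lie on a straight line.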

Question # 7

You need to select a feature extraction method. Which method should you use?

A. Mutual information
B. Mood’s median test
C. Kendall correlation
D. Permutation Feature Importance
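Mutual information scores a feature by how much knowing its value reduces uncertainty about the label, and it is not limited to linear relationships. The sketch below is a stdlib-only estimate for discrete variables that uses empirical frequencies. `mutual_information` is a hypothetical helper, not the Azure Machine Learning module.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """I(X; Y) in bits for two discrete sequences, estimated from
    empirical joint and marginal frequencies."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

label = [0, 0, 1, 1, 0, 1, 0, 1]
copy = [0, 0, 1, 1, 0, 1, 0, 1]         # identical to the label
independent = [0, 1, 0, 1, 1, 0, 0, 1]  # balanced within each label group
mi_copy = mutual_information(copy, label)
mi_indep = mutual_information(independent, label)
```

A feature that duplicates a balanced binary label scores 1 bit. A feature whose values are spread evenly across both label groups scores 0, so it carries no information about the label.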

Question # 8

Your team is building a data engineering and data science development environment. The environment must support the following requirements:

- support Python and Scala
- compose data storage, movement, and processing services into automated data pipelines
- use the same tool for the orchestration of both data engineering and data science
- support workload isolation and interactive workloads
- enable scaling across a cluster of machines

You need to create the environment. What should you do?

A. Build the environment in Apache Hive for HDInsight and use Azure Data Factory for orchestration.
B. Build the environment in Azure Databricks and use Azure Data Factory for orchestration.
C. Build the environment in Apache Spark for HDInsight and use Azure Container Instances for orchestration.
D. Build the environment in Azure Databricks and use Azure Container Instances for orchestration.

Question # 9

You create a classification model with a dataset that contains 100 samples with Class A and 10,000 samples with Class B. The variation of Class B is very high. You need to resolve the imbalance. Which method should you use?

A. Partition and Sample
B. Cluster Centroids
C. Tomek links
D. Synthetic Minority Oversampling Technique (SMOTE)
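SMOTE creates new minority-class samples instead of duplicating existing ones. For each synthetic point, it picks a minority sample, picks one of that sample's k nearest minority neighbours, and places the new point at a random position on the line segment between them. The stdlib-only sketch below illustrates that idea under the assumption of numeric features. `smote` is a hypothetical helper. Libraries such as imbalanced-learn (`imblearn.over_sampling.SMOTE`) and the Azure Machine Learning SMOTE module provide full implementations.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by interpolating between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [2.5, 2.5]]
new_points = smote(minority, n_new=5, k=2)
```

Because each new point lies between two real minority samples, the synthetic data stays within the region the minority class already occupies.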

Question # 10

You are building a binary classification model by using a supplied training set. The training set is imbalanced between two classes. You need to resolve the data imbalance. What are three possible ways to achieve this goal? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.

A. Penalize the classification
B. Resample the data set using undersampling or oversampling.
C. Generate synthetic samples in the minority class.
D. Use accuracy as the evaluation metric of the model.
E. Normalize the training feature set.
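Two of the ideas in the options can be sketched in a few lines. Penalizing the classification can be done by weighting errors on each class by inverse class frequency. The weighting formula below is the one scikit-learn uses for `class_weight="balanced"`. Resampling can be done by randomly duplicating minority rows. Both helpers are hypothetical stdlib-only illustrations.

```python
import random
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class).
    Misclassifying a rare class then costs more during training."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, cnt in counts.items():
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - cnt):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels

labels = ["B"] * 8 + ["A"] * 2
rows = [[float(i)] for i in range(10)]
weights = balanced_class_weights(labels)
new_rows, new_labels = random_oversample(rows, labels)
```

With 8 samples of B and 2 of A, the weights are 0.625 for B and 2.5 for A. After oversampling, both classes contain 8 rows.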

Question # 11

You need to select a prebuilt development environment for a series of data science experiments. You must use the R language for the experiments. Which three environments can you use? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.

A. ML.NET Library on a local environment
B. Azure Machine Learning Studio
C. Data Science Virtual Machine (DSVM)
D. Azure Databricks
E. Azure Cognitive Services

Question # 12

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You are analyzing a numerical dataset which contains missing values in several columns. You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set. You need to analyze a full dataset to include all values.

Solution: Calculate the column median value and use the median value as the replacement for any missing value in the column.

Does the solution meet the goal?

A. Yes
B. No
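The proposed solution replaces missing values with each column's median. It fills gaps in place, so the number of rows and columns is unchanged. The stdlib-only sketch below assumes a dataset stored as a dict of column name to list of floats, with missing entries given as `None` or NaN. `impute_median` is a hypothetical helper.

```python
import math
import statistics

def is_missing(v):
    """Treat both None and NaN as missing."""
    return v is None or math.isnan(v)

def impute_median(columns):
    """Replace missing values in each column with the median of that column's
    observed values. Row and column counts are unchanged."""
    result = {}
    for name, values in columns.items():
        med = statistics.median(v for v in values if not is_missing(v))
        result[name] = [med if is_missing(v) else v for v in values]
    return result

data = {
    "age": [30.0, None, 50.0, 40.0],
    "income": [1.0, 2.0, float("nan"), 100.0],
}
clean = impute_median(data)
```

Unlike the mean, the median is not pulled upward by the outlier 100.0 in `income`, so the gap is filled with 2.0.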

Question # 13

You are solving a classification task. The dataset is imbalanced. You need to select an Azure Machine Learning Studio module to improve the classification accuracy. Which module should you use?

A. Fisher Linear Discriminant Analysis
B. Filter Based Feature Selection
C. Synthetic Minority Oversampling Technique (SMOTE)
D. Permutation Feature Importance