The Easy & Quick Way to Pass Your Certification Exam.
Our Google Professional-Data-Engineer dumps are the key to success, with more than 80,000 success stories.
Dumpsspot offers the best Professional-Data-Engineer exam dumps, with 100% valid questions and answers. Prepared with the help of our trained team of professionals, the Professional-Data-Engineer dumps PDF is of the highest quality. Our course pack is affordable and comes with a 98% to 100% passing rate for the exam. Our Professional-Data-Engineer test questions are specially designed for people who want to pass the exam in a very short time.
Most of our customers choose Dumpsspot's Professional-Data-Engineer study guide, which contains questions and answers that help them pass the exam on the first try. Many of them have passed the exam with a score of 98% to 100% just by training online.
Dumpsspot puts forward the best Professional-Data-Engineer dumps questions and answers for students who want to clear the exam on their first attempt. We provide a 100% assurance guarantee, so you will not have to worry about passing the exam; we are here to take care of that.
Your software uses a simple JSON format for all messages. These messages are published to Google Cloud Pub/Sub, then processed with Google Cloud Dataflow to create a real-time dashboard for the CFO. During testing, you notice that some messages are missing in the dashboard. You check the logs, and all messages are being published to Cloud Pub/Sub successfully. What should you do next?
A. Check the dashboard application to see if it is not displaying correctly.
B. Run a fixed dataset through the Cloud Dataflow pipeline and analyze the output.
C. Use Google Stackdriver Monitoring on Cloud Pub/Sub to find the missing messages.
D. Switch Cloud Dataflow to pull messages from Cloud Pub/Sub instead of Cloud Pub/Sub pushing messages to Cloud Dataflow.
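To make option B concrete: feeding a small, fixed dataset through the pipeline and checking the output isolates whether messages are lost inside the Dataflow logic. A minimal sketch using the Beam Python SDK's test utilities follows; the JSON field name and the aggregation are assumptions for illustration only.

import json

import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to

# A fixed, known input set stands in for the Pub/Sub stream.
sample_messages = ['{"amount": 10}', '{"amount": 5}', '{"amount": 7}']

with TestPipeline() as p:
    totals = (
        p
        | "CreateFixedInput" >> beam.Create(sample_messages)
        | "ParseJson" >> beam.Map(json.loads)
        | "ExtractAmount" >> beam.Map(lambda msg: msg["amount"])
        | "SumAmounts" >> beam.CombineGlobally(sum))

    # If this assertion fails, the loss is inside the pipeline logic,
    # not in Pub/Sub delivery or the dashboard.
    assert_that(totals, equal_to([22]))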
Your startup has never implemented a formal security policy. Currently, everyone in the company has access to the datasets stored in Google BigQuery. Teams have freedom to use the service as they see fit, and they have not documented their use cases. You have been asked to secure the data warehouse. You need to discover what everyone is doing. What should you do first?
A. Use Google Stackdriver Audit Logs to review data access.
B. Get the Identity and Access Management (IAM) policy of each table.
C. Use Stackdriver Monitoring to see the usage of BigQuery query slots.
D. Use the Google Cloud Billing API to see what account the warehouse is being billed to.
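To illustrate option A, the data-access audit logs can also be listed programmatically. The sketch below uses the google-cloud-logging Python client; the project ID and the filter string are illustrative assumptions and should be adapted to the audit-log names in your own project.

from google.cloud import logging

client = logging.Client(project="my-project")  # assumed project ID

# Illustrative filter: BigQuery resources in the data_access audit log.
log_filter = (
    'resource.type="bigquery_resource" '
    'logName:"cloudaudit.googleapis.com%2Fdata_access"'
)

for entry in client.list_entries(filter_=log_filter, page_size=50):
    # Each entry records who accessed which dataset/table and when.
    print(entry.timestamp, entry.log_name)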
You are building a new real-time data warehouse for your company and will use Google BigQuery streaming inserts. There is no guarantee that data will only be sent in once, but you do have a unique ID for each row of data and an event timestamp. You want to ensure that duplicates are not included while interactively querying data. Which query type should you use?
A. Include ORDER BY DESC on the timestamp column and LIMIT to 1.
B. Use GROUP BY on the unique ID column and timestamp column and SUM on the values.
C. Use the LAG window function with PARTITION BY unique ID along with WHERE LAG IS NOT NULL.
D. Use the ROW_NUMBER window function with PARTITION BY unique ID along with WHERE row equals 1.
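For reference, option D translates into a query along these lines. The sketch below runs it with the BigQuery Python client; the table name and the unique_id / event_timestamp column names are assumptions standing in for your own schema.

from google.cloud import bigquery

client = bigquery.Client()

# Keep only the latest row per unique ID (illustrative table/column names).
sql = """
SELECT * EXCEPT(rn)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY unique_id ORDER BY event_timestamp DESC) AS rn
  FROM `my-project.mydataset.events`
)
WHERE rn = 1
"""

for row in client.query(sql):
    print(dict(row))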
You work for a car manufacturer and have set up a data pipeline using Google Cloud Pub/Sub to capture anomalous sensor events. You are using a push subscription in Cloud Pub/Sub that calls a custom HTTPS endpoint that you have created to take action on these anomalous events as they occur. Your custom HTTPS endpoint keeps getting an inordinate amount of duplicate messages. What is the most likely cause of these duplicate messages?
A. The message body for the sensor event is too large.
B. Your custom endpoint has an out-of-date SSL certificate.
C. The Cloud Pub/Sub topic has too many messages published to it.
D. Your custom endpoint is not acknowledging messages within the acknowledgement deadline.
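To make option D concrete: a push endpoint that does heavy work before responding can miss the acknowledgement deadline, so Pub/Sub redelivers the message. A minimal Flask sketch follows; the route, the envelope handling, and the enqueue_for_processing helper are hypothetical.

from flask import Flask, request

app = Flask(__name__)


def enqueue_for_processing(message: dict) -> None:
    # Hypothetical helper: hand the message to a background worker/queue
    # so the HTTP handler can return immediately.
    pass


@app.route("/pubsub/push", methods=["POST"])
def handle_push():
    envelope = request.get_json(silent=True) or {}
    message = envelope.get("message", {})

    enqueue_for_processing(message)

    # Returning a 2xx quickly acknowledges the message; responding after
    # the ack deadline causes Pub/Sub to redeliver it (duplicates).
    return "", 204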
You are designing a basket abandonment system for an ecommerce company. The system will send a message to a user based on these rules:
- No interaction by the user on the site for 1 hour
- Has added more than $30 worth of products to the basket
- Has not completed a transaction
You use Google Cloud Dataflow to process the data and decide if a message should be sent. How should you design the pipeline?
A. Use a fixed-time window with a duration of 60 minutes.
B. Use a sliding time window with a duration of 60 minutes.
C. Use a session window with a gap time duration of 60 minutes.
D. Use a global window with a time based trigger with a delay of 60 minutes.
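To illustrate option C, Beam exposes session windows directly. The sketch below shows the relevant transform in the Beam Python SDK; the subscription path, the JSON fields, and the downstream decision logic are assumptions for illustration.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/site-events")
        | "ParseJson" >> beam.Map(json.loads)
        | "KeyByUser" >> beam.Map(lambda e: (e["user_id"], e))
        # A session window closes after 60 minutes of inactivity per user,
        # which matches the "no interaction for 1 hour" rule.
        | "SessionWindow" >> beam.WindowInto(window.Sessions(60 * 60))
        | "GroupPerUser" >> beam.GroupByKey()
        # Downstream: check basket value > $30 and no completed transaction,
        # then publish the abandonment message (omitted here).
    )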
You designed a database for patient records as a pilot project to cover a few hundred patients in three clinics. Your design used a single database table to represent all patients and their visits, and you used self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the project has expanded. The database must now store 100 times more patient records. You can no longer run the reports, because they either take too long or they encounter errors with insufficient compute resources. How should you adjust the database design?
A. Add capacity (memory and disk space) to the database server by the order of 200.
B. Shard the tables into smaller ones based on date ranges, and only generate reports with prespecified date ranges.
C. Normalize the master patient-record table into the patient table and the visits table, and create other necessary tables to avoid self-join.
D. Partition the table into smaller tables, with one for each clinic. Run queries against the smaller table pairs, and use unions for consolidated reports.
Your company has hired a new data scientist who wants to perform complicated analyses across very large datasets stored in Google Cloud Storage and in a Cassandra cluster on Google Compute Engine. The scientist primarily wants to create labelled data sets for machine learning projects, along with some visualization tasks. She reports that her laptop is not powerful enough to perform her tasks and it is slowing her down. You want to help her perform her tasks. What should you do?
A. Run a local version of Jupyter on the laptop.
B. Grant the user access to Google Cloud Shell.
C. Host a visualization tool on a VM on Google Compute Engine.
D. Deploy Google Cloud Datalab to a virtual machine (VM) on Google Compute Engine.
Your company is in a highly regulated industry. One of your requirements is to ensure individual users have access only to the minimum amount of information required to do their jobs. You want to enforce this requirement with Google BigQuery. Which three approaches can you take? (Choose three.)
A. Disable writes to certain tables.
B. Restrict access to tables by role.
C. Ensure that the data is encrypted at all times.
D. Restrict BigQuery API access to approved users.
E. Segregate data across multiple tables or databases.
F. Use Google Stackdriver Audit Logging to determine policy violations.
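To illustrate option B, restricting table access by role is typically done with dataset-level IAM access entries. A sketch with the BigQuery Python client follows; the dataset name and the analyst's email are assumptions.

from google.cloud import bigquery

client = bigquery.Client()

dataset = client.get_dataset("my-project.analytics")  # assumed dataset

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",                      # minimum access needed
        entity_type="userByEmail",
        entity_id="analyst@example.com",    # assumed user
    )
)
dataset.access_entries = entries

client.update_dataset(dataset, ["access_entries"])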
You need to store and analyze social media postings in Google BigQuery at a rate of 10,000 messages per minute in near real-time. Initially, you design the application to use streaming inserts for individual postings. Your application also performs data aggregations right after the streaming inserts. You discover that the queries after streaming inserts do not exhibit strong consistency, and reports from the queries might miss in-flight data. How can you adjust your application design?
A. Re-write the application to load accumulated data every 2 minutes.
B. Convert the streaming insert code to batch load for individual messages.
C. Load the original message to Google Cloud SQL, and export the table every hour to BigQuery via streaming inserts.
D. Estimate the average latency for data availability after streaming inserts, and always run queries after waiting twice as long.
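To make option A concrete: accumulate postings (for example in Cloud Storage) and run a load job every couple of minutes, since load jobs do not carry the streaming-buffer consistency caveat. A sketch with the BigQuery Python client follows; the bucket path, table ID, and newline-delimited JSON format are assumptions.

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# Files accumulated over the last ~2 minutes (assumed naming scheme).
uri = "gs://my-bucket/postings/batch-*.json"

load_job = client.load_table_from_uri(
    uri, "my-project.social.postings", job_config=job_config)
load_job.result()  # waits until the loaded rows are fully queryable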
Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour. The data scientists have written the following code to read the data for new key features in the logs:
BigQueryIO.Read.named("ReadLogData")
    .from("clouddataflow-readonly:samples.log_data")
You want to improve the performance of this data read. What should you do?
A. Specify the TableReference object in the code.
B. Use .fromQuery operation to read specific fields from the table.
C. Use of both the Google BigQuery TableSchema and TableFieldSchema classes.
D. Call a transform that returns TableRow objects, where each element in the PCollection represents a single row in the table.
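The snippet in the question uses the Java Dataflow SDK; to illustrate option B, the same idea in the Beam Python SDK is to read via a query that selects only the fields you need, as sketched below. The field names are assumptions; only the table reference comes from the question, and depending on the runner you may also need to supply a temp dataset or GCS location.

import apache_beam as beam

with beam.Pipeline() as p:
    log_fields = (
        p
        | "ReadLogData" >> beam.io.ReadFromBigQuery(
            # Reading via a query lets BigQuery return only the needed
            # columns instead of scanning the full table.
            query="""
                SELECT field1, field2  -- assumed field names
                FROM `clouddataflow-readonly.samples.log_data`
            """,
            use_standard_sql=True)
    )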
You want to use Google Stackdriver Logging to monitor Google BigQuery usage. You need an instant notification to be sent to your monitoring tool when new data is appended to a certain table using an insert job, but you do not want to receive notifications for other tables. What should you do?
A. Make a call to the Stackdriver API to list all logs, and apply an advanced filter.
B. In the Stackdriver Logging admin interface, enable a log sink export to BigQuery.
C. In the Stackdriver Logging admin interface, enable a log sink export to Google Cloud Pub/Sub, and subscribe to the topic from your monitoring tool.
D. Using the Stackdriver API, create a project sink with an advanced log filter to export to Pub/Sub, and subscribe to the topic from your monitoring tool.
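To illustrate option D, a project sink can be created programmatically. The sketch below uses the google-cloud-logging client to export matching entries to a Pub/Sub topic; the filter, topic, project, and table name are illustrative assumptions and should be checked against the exact audit-log fields in your project.

from google.cloud import logging

client = logging.Client(project="my-project")  # assumed project

# Illustrative advanced filter: completed BigQuery load/insert jobs
# that wrote to one specific table.
log_filter = (
    'resource.type="bigquery_resource" '
    'protoPayload.methodName="jobservice.jobcompleted" '
    'protoPayload.serviceData.jobCompletedEvent.job.'
    'jobConfiguration.load.destinationTable.tableId="my_table"'
)

sink = client.sink(
    "bq-insert-notifications",
    filter_=log_filter,
    destination="pubsub.googleapis.com/projects/my-project/topics/bq-inserts",
)
sink.create()  # your monitoring tool then subscribes to the topic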
You are deploying 10,000 new Internet of Things devices to collect temperature data in your warehouses globally. You need to process, store and analyze these very large datasets in real time. What should you do?
A. Send the data to Google Cloud Datastore and then export to BigQuery.
B. Send the data to Google Cloud Pub/Sub, stream Cloud Pub/Sub to Google Cloud Dataflow, and store the data in Google BigQuery.
C. Send the data to Cloud Storage and then spin up an Apache Hadoop cluster as needed in Google Cloud Dataproc whenever analysis is required.
D. Export logs in batch to Google Cloud Storage and then spin up a Google Cloud SQL instance, import the data from Cloud Storage, and run an analysis as needed.
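To make option B concrete: devices publish to Pub/Sub, Dataflow processes the stream, and results land in BigQuery. A compact Beam Python sketch follows; the topic, table, schema, and JSON field names are assumptions.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadTemps" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/device-temps")
        | "ParseJson" >> beam.Map(json.loads)
        # Optional per-record processing (validation, unit conversion, ...)
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "my-project:iot.temperatures",
            schema="device_id:STRING,temp_c:FLOAT,event_ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )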
Your company uses a proprietary system to send inventory data every 6 hours to a data ingestion service in the cloud. Transmitted data includes a payload of several fields and the timestamp of the transmission. If there are any concerns about a transmission, the system re-transmits the data. How should you deduplicate the data most efficiently?
A. Assign global unique identifiers (GUID) to each data entry.
B. Compute the hash value of each data entry, and compare it with all historical data.
C. Store each data entry as the primary key in a separate database and apply an index.
D. Maintain a database table to store the hash value and other metadata for each data entry.
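To illustrate option A: the GUID is attached once, when the entry is created, so a re-transmission carries the same identifier and can be dropped downstream. A small sketch of that idea follows; the field names and the in-memory de-duplication are purely illustrative (in practice the check would run in your ingestion service or warehouse).

import json
import time
import uuid


def make_entry(payload: dict) -> str:
    # The GUID is generated when the entry is created, not when it is sent,
    # so a re-transmission of the same entry reuses the same GUID.
    return json.dumps({"guid": str(uuid.uuid4()), "sent_at": time.time(), **payload})


seen_guids = set()  # stand-in for a dedup table keyed on the GUID


def ingest(raw_entry: str) -> bool:
    entry = json.loads(raw_entry)
    if entry["guid"] in seen_guids:
        return False          # duplicate transmission, ignore
    seen_guids.add(entry["guid"])
    return True               # first time this entry has been seen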
You are building a model to make clothing recommendations. You know a user’s fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available. How should you use this data to train the model?
A. Continuously retrain the model on just the new data.
B. Continuously retrain the model on a combination of existing data and the new data.
C. Train on the existing data while using the new data as your test set.
D. Train on the new data while using the existing data as your test set.
Your company is streaming real-time sensor data from their factory floor into Bigtable and they have noticed extremely poor performance. How should the row key be redesigned to improve Bigtable performance on queries that populate real-time dashboards?
A. Use a row key of the form <timestamp>.
B. Use a row key of the form <sensorid>.
C. Use a row key of the form <timestamp>#<sensorid>.
D. Use a row key of the form <sensorid>#<timestamp>.
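The point of leading the key with the sensor ID is to spread writes across tablets instead of hot-spotting on the latest timestamp. A sketch of writing with such a key using the google-cloud-bigtable client follows; the project, instance, table, column family, and key layout are assumptions.

import datetime

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("sensor-instance").table("sensor-readings")

sensor_id = "sensor-0042"
event_time = datetime.datetime.now(datetime.timezone.utc)

# Leading with the sensor ID avoids the sequential-timestamp hotspot;
# the timestamp suffix keeps a sensor's readings in time order.
row_key = f"{sensor_id}#{int(event_time.timestamp())}".encode("utf-8")

row = table.direct_row(row_key)
row.set_cell("measurements", "temp_c", "21.7", timestamp=event_time)
row.commit()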
You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users’ privacy?
A. Grant the consultant the Viewer role on the project.
B. Grant the consultant the Cloud Dataflow Developer role on the project.
C. Create a service account and allow the consultant to log on with it.
D. Create an anonymized sample of the data for the consultant to work with in a different project.
You have a Google Cloud Dataflow streaming pipeline running with a Google Cloud Pub/Sub subscription as the source. You need to make an update to the code that will make the new Cloud Dataflow pipeline incompatible with the current version. You do not want to lose any data when making this update. What should you do?
A. Update the current pipeline and use the drain flag.
B. Update the current pipeline and provide the transform mapping JSON object.
C. Create a new pipeline that has the same Cloud Pub/Sub subscription and cancel the old pipeline.
D. Create a new pipeline that has a new Cloud Pub/Sub subscription and cancel the old pipeline.