Ultimate Guide to the Professional-Data-Engineer - Latest Feb 27, 2022 Edition Available Now
2022 Updated Verified Pass Professional-Data-Engineer Exam - Real Questions and Answers
Requirements
The certification does not have any official prerequisites. However, it is advised to have at least three years of industry experience with one or more years of expertise in designing and managing different solutions with the use of Google Cloud Platform. It is also required to review the topics of the qualifying exam before sitting for it.
NEW QUESTION 96
You work for a manufacturing company that sources up to 750 different components, each from a different supplier. You've collected a labeled dataset that has on average 1000 examples for each unique component.
Your team wants to implement an app to help warehouse workers recognize incoming components based on a photo of the component. You want to implement the first working version of this app (as Proof-Of-Concept) within a few working days. What should you do?
- A. Train your own image recognition model leveraging transfer learning techniques.
- B. Use Cloud Vision AutoML, but reduce your dataset twice.
- C. Use Cloud Vision API by providing custom labels as recognition hints.
- D. Use Cloud Vision AutoML with the existing dataset.
Answer: D
NEW QUESTION 97
You are using Google BigQuery as your data warehouse. Your users report that the following simple query is running very slowly, no matter when they run the query:
SELECT country, state, city FROM [myproject:mydataset.mytable] GROUP BY country
You check the query plan for the query and see the following output in the Read section of Stage:1:
What is the most likely cause of the delay for this query?
- A. Either the state or the city columns in the [myproject:mydataset.mytable] table have too many NULL values
- B. Users are running too many concurrent queries in the system
- C. The [myproject:mydataset.mytable] table has too many partitions
- D. Most rows in the [myproject:mydataset.mytable] table have the same value in the country column, causing data skew
Answer: D
NEW QUESTION 98
You are deploying a new storage system for your mobile application, which is a media streaming service. You decide the best fit is Google Cloud Datastore. You have entities with multiple properties, some of which can take on multiple values. For example, in the entity 'Movie' the property 'actors' and the property 'tags' have multiple values but the property 'date released' does not. A typical query would ask for all movies with actor=<actorname> ordered by date_released or all movies with tag=Comedy ordered by date_released. How should you avoid a combinatorial explosion in the number of indexes?
- A. Manually configure the index in your index config as follows:

- B. Manually configure the index in your index config as follows:

- C. Set the following in your entity options: exclude_from_indexes = 'date_published'
- D. Set the following in your entity options: exclude_from_indexes = 'actors, tags'
Answer: B
NEW QUESTION 99
Which SQL keyword can be used to reduce the number of columns processed by BigQuery?
- A. SELECT
- B. LIMIT
- C. WHERE
- D. BETWEEN
Answer: A
Explanation:
SELECT allows you to query specific columns rather than the whole table.
LIMIT, BETWEEN, and WHERE clauses will not reduce the number of columns processed by BigQuery.
Reference:
https://cloud.google.com/bigquery/launch-checklist#architecture_design_and_development_checklist
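The point about SELECT can be illustrated with a small sketch. It is not BigQuery code: the column names and byte sizes below are hypothetical, and the function only models the idea that a columnar store reads the columns a query references, so naming fewer columns means fewer bytes processed.

```python
# Hypothetical per-column storage sizes (in bytes) for an illustrative table.
COLUMN_BYTES = {
    "country": 2_000_000,
    "state": 3_000_000,
    "city": 5_000_000,
    "payload": 90_000_000,
}


def bytes_processed(selected_columns):
    """Estimate bytes read when only the selected columns are scanned.

    In this simplified model the cost depends on which columns are
    referenced, not on LIMIT or on how many rows are returned.
    """
    return sum(COLUMN_BYTES[name] for name in selected_columns)


# Equivalent of SELECT * (every column is read).
select_star = bytes_processed(list(COLUMN_BYTES))
# Equivalent of SELECT country, city (two columns are read).
select_some = bytes_processed(["country", "city"])

print(select_star, select_some)
```

In this model, adding LIMIT, WHERE, or BETWEEN would not change either number, which matches the explanation above.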
NEW QUESTION 100
Flowlogistic Case Study
Company Overview
Flowlogistic is a leading logistics and supply chain provider. They help businesses throughout the world manage their resources and transport them to their final destination. The company has grown rapidly, expanding their offerings to include rail, truck, aircraft, and oceanic shipping.
Company Background
The company started as a regional trucking company, and then expanded into other logistics markets. Because they have not updated their infrastructure, managing and tracking orders and shipments has become a bottleneck. To improve operations, Flowlogistic developed proprietary technology for tracking shipments in real time at the parcel level. However, they are unable to deploy it because their technology stack, based on Apache Kafka, cannot support the processing volume. In addition, Flowlogistic wants to further analyze their orders and shipments to determine how best to deploy their resources.
Solution Concept
Flowlogistic wants to implement two concepts using the cloud:
* Use their proprietary technology in a real-time inventory-tracking system that indicates the location of their loads
* Perform analytics on all their orders and shipment logs, which contain both structured and unstructured data, to determine how best to deploy resources and which markets to expand into. They also want to use predictive analytics to learn earlier when a shipment will be delayed.
Existing Technical Environment
Flowlogistic architecture resides in a single data center:
Databases
8 physical servers in 2 clusters
- SQL Server - user data, inventory, static data
3 physical servers
- Cassandra - metadata, tracking messages
10 Kafka servers - tracking message aggregation and batch insert
Application servers - customer front end, middleware for order/customs
60 virtual machines across 20 physical servers
- Tomcat - Java services
- Nginx - static content
- Batch servers
Storage appliances
- iSCSI for virtual machine (VM) hosts
- Fibre Channel storage area network (FC SAN) - SQL server storage
- Network-attached storage (NAS) image storage, logs, backups
10 Apache Hadoop /Spark servers
- Core Data Lake
- Data analysis workloads
20 miscellaneous servers
- Jenkins, monitoring, bastion hosts
Business Requirements
Build a reliable and reproducible environment with scaled parity of production.
Aggregate data in a centralized Data Lake for analysis
Use historical data to perform predictive analytics on future shipments
Accurately track every shipment worldwide using proprietary technology
Improve business agility and speed of innovation through rapid provisioning of new resources
Analyze and optimize architecture for performance in the cloud
Migrate fully to the cloud if all other requirements are met
Technical Requirements
Handle both streaming and batch data
Migrate existing Hadoop workloads
Ensure architecture is scalable and elastic to meet the changing demands of the company.
Use managed services whenever possible
Encrypt data in flight and at rest
Connect a VPN between the production data center and cloud environment
CEO Statement
We have grown so quickly that our inability to upgrade our infrastructure is really hampering further growth and efficiency. We are efficient at moving shipments around the world, but we are inefficient at moving data around.
We need to organize our information so we can more easily understand where our customers are and what they are shipping.
CTO Statement
IT has never been a priority for us, so as our data has grown, we have not invested enough in our technology. I have a good staff to manage IT, but they are so busy managing our infrastructure that I cannot get them to do the things that really matter, such as organizing our data, building the analytics, and figuring out how to implement the CFO's tracking technology.
CFO Statement
Part of our competitive advantage is that we penalize ourselves for late shipments and deliveries. Knowing where our shipments are at all times has a direct correlation to our bottom line and profitability. Additionally, I don't want to commit capital to building out a server environment.
Flowlogistic's CEO wants to gain rapid insight into their customer base so his sales team can be better informed in the field. This team is not very technical, so they've purchased a visualization tool to simplify the creation of BigQuery reports. However, they've been overwhelmed by all the data in the table, and are spending a lot of money on queries trying to find the data they need. You want to solve their problem in the most cost-effective way. What should you do?
- A. Create identity and access management (IAM) roles on the appropriate columns, so only they appear in a query.
- B. Export the data into a Google Sheet for visualization.
- C. Create an additional table with only the necessary columns.
- D. Create a view on the table to present to the visualization tool.
Answer: D
NEW QUESTION 101
You want to analyze hundreds of thousands of social media posts daily at the lowest cost and with the fewest steps.
You have the following requirements:
* You will batch-load the posts once per day and run them through the Cloud Natural Language API.
* You will extract topics and sentiment from the posts.
* You must store the raw posts for archiving and reprocessing.
* You will create dashboards to be shared with people both inside and outside your organization.
You need to store both the data extracted from the API to perform analysis as well as the raw social media posts for historical archiving. What should you do?
- A. Store the social media posts and the data extracted from the API in Cloud SQL.
- B. Feed to social media posts into the API directly from the source, and write the extracted data from the API into BigQuery.
- C. Store the raw social media posts in Cloud Storage, and write the data extracted from the API into BigQuery.
- D. Store the social media posts and the data extracted from the API in BigQuery.
Answer: C
Explanation:
Social media posts can contain images or videos, which cannot be stored in BigQuery.
NEW QUESTION 102
Case Study 1 - Flowlogistic
Flowlogistic wants to use Google BigQuery as their primary analysis system, but they still have Apache Hadoop and Spark workloads that they cannot move to BigQuery. Flowlogistic does not know how to store the data that is common to both workloads. What should they do?
- A. Store the common data in BigQuery as partitioned tables.
- B. Store the common data in BigQuery and expose authorized views.
- C. Store the common data in the HDFS storage for a Google Cloud Dataproc cluster.
- D. Store the common data encoded as Avro in Google Cloud Storage.
Answer: B
Explanation:
Dataproc can access data from BigQuery as well.
NEW QUESTION 103
Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow.
Numerous data logs are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour.
The data scientists have written the following code to read the data for new key features in the logs.
BigQueryIO.Read
.named("ReadLogData")
.from("clouddataflow-readonly:samples.log_data")
You want to improve the performance of this data read. What should you do?
- A. Use .fromQuery operation to read specific fields from the table.
- B. Call a transform that returns TableRow objects, where each element in the PCollection represents a single row in the table.
- C. Use of both the Google BigQuery TableSchema and TableFieldSchema classes.
- D. Specify the TableReference object in the code.
Answer: B
NEW QUESTION 104
You are implementing security best practices on your data pipeline. Currently, you are manually executing jobs as the Project Owner. You want to automate these jobs by taking nightly batch files containing non-public information from Google Cloud Storage, processing them with a Spark Scala job on a Google Cloud Dataproc cluster, and depositing the results into Google BigQuery.
How should you securely run this workload?
- A. Use a service account with the ability to read the batch files and to write to BigQuery
- B. Restrict the Google Cloud Storage bucket so only you can see the files
- C. Use a user account with the Project Viewer role on the Cloud Dataproc cluster to read the batch files and write to BigQuery
- D. Grant the Project Owner role to a service account, and run the job with it
Answer: A
NEW QUESTION 105
You are migrating your data warehouse to Google Cloud and decommissioning your on-premises data center. Because this is a priority for your company, you know that bandwidth will be made available for the initial data load to the cloud. The files being transferred are not large in number, but each file is 90 GB. Additionally, you want your transactional systems to continually update the warehouse on Google Cloud in real time. What tools should you use to migrate the data and ensure that it continues to write to your warehouse?
- A. gsutil for the migration; Pub/Sub and Dataflow for the real-time updates
- B. gsutil for both the migration and the real-time updates
- C. BigQuery Data Transfer Service for the migration; Pub/Sub and Dataproc for the real-time updates
- D. Storage Transfer Service for the migration, Pub/Sub and Cloud Data Fusion for the real-time updates
Answer: A
NEW QUESTION 106
You work for a car manufacturer and have set up a data pipeline using Google Cloud Pub/Sub to capture anomalous sensor events. You are using a push subscription in Cloud Pub/Sub that calls a custom HTTPS endpoint that you have created to take action on these anomalous events as they occur. Your custom HTTPS endpoint keeps getting an inordinate amount of duplicate messages. What is the most likely cause of these duplicate messages?
- A. Your custom endpoint is not acknowledging messages within the acknowledgement deadline.
- B. Your custom endpoint has an out-of-date SSL certificate.
- C. The message body for the sensor event is too large.
- D. The Cloud Pub/Sub topic has too many messages published to it.
Answer: A
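Pub/Sub delivers messages at least once, and a push endpoint acknowledges a message by returning a success status code; a message that is not acknowledged before the deadline is sent again. The sketch below is a hypothetical in-memory handler, not the Pub/Sub client library: the class name, the `handle` method, and the choice of status code 204 are assumptions made for illustration. It shows one way an endpoint might acknowledge quickly and ignore redelivered message IDs.

```python
class PushHandler:
    """Hypothetical push-endpoint logic that tolerates redelivery."""

    def __init__(self):
        self._seen_ids = set()   # message IDs already handled
        self.processed = []      # payloads that were actually acted on

    def handle(self, message_id, data):
        """Return an HTTP-style status code for one pushed message."""
        if message_id in self._seen_ids:
            # Duplicate delivery: acknowledge again without repeating work.
            return 204
        self._seen_ids.add(message_id)
        self.processed.append(data)
        # A success code acknowledges the message so it is not redelivered.
        return 204


handler = PushHandler()
handler.handle("m-1", "sensor anomaly")
handler.handle("m-1", "sensor anomaly")  # redelivery of the same message
print(handler.processed)
```

A real endpoint would also need to finish (or hand off) its work before the acknowledgement deadline expires; the in-memory set here would not survive a restart.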
NEW QUESTION 107
Your weather app queries a database every 15 minutes to get the current temperature. The frontend is powered by Google App Engine and serves millions of users. How should you design the frontend to respond to a database failure?
- A. Issue a command to restart the database servers.
- B. Retry the query every second until it comes back online to minimize staleness of data.
- C. Reduce the query frequency to once every hour until the database comes back online.
- D. Retry the query with exponential backoff, up to a cap of 15 minutes.
Answer: D
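Exponential backoff with a cap, as described in option D, can be sketched as follows. This is a minimal illustration: the function name, the one-second base delay, and the optional jitter are assumptions, and only the 15-minute cap comes from the question.

```python
import random


def backoff_delays(max_attempts, base_seconds=1.0, cap_seconds=15 * 60,
                   jitter=False, rng=None):
    """Return the wait time before each retry attempt.

    The delay doubles after every failed attempt and never exceeds
    cap_seconds. With jitter=True each delay is drawn uniformly from
    [0, delay] so that many clients do not retry in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_attempts):
        delay = min(cap_seconds, base_seconds * (2 ** attempt))
        if jitter:
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays


print(backoff_delays(5))
print(backoff_delays(12)[-1])
```

Capping at 15 minutes matches the app's normal query interval, so a recovering database is never polled less often than it would be in healthy operation.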
NEW QUESTION 108
You have a query that filters a BigQuery table using a WHERE clause on timestamp and ID columns. By using bq query --dry_run you learn that the query triggers a full scan of the table, even though the filter on timestamp and ID selects a tiny fraction of the overall data. You want to reduce the amount of data scanned by BigQuery with minimal changes to existing SQL queries. What should you do?
- A. Use the bq query --maximum_bytes_billed flag to restrict the number of bytes billed.
- B. Recreate the table with a partitioning column and clustering column.
- C. Use the LIMIT keyword to reduce the number of rows returned.
- D. Create a separate table for each ID.
Answer: B
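The effect of a partitioning column can be modeled with a short sketch. This is not BigQuery code; the row layout, the `event_date` partition key, and the function names are hypothetical. It only shows why a filter on the partitioning column lets the engine skip every other partition instead of scanning the whole table.

```python
from collections import defaultdict


def build_partitions(rows):
    """Group rows by their (hypothetical) event_date partition key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["event_date"]].append(row)
    return partitions


def query(partitions, event_date, row_id):
    """Return matching rows and the number of rows that had to be scanned."""
    scanned = partitions.get(event_date, [])   # only one partition is read
    matches = [row for row in scanned if row["id"] == row_id]
    return matches, len(scanned)


rows = [
    {"event_date": day, "id": i}
    for day in ("2022-02-25", "2022-02-26", "2022-02-27")
    for i in range(100)
]
partitions = build_partitions(rows)
matches, rows_scanned = query(partitions, "2022-02-26", 42)
print(len(rows), rows_scanned, matches)
```

Clustering would go one step further by keeping rows with nearby ID values together inside each partition; that part is not modeled here.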
NEW QUESTION 109
Which of the following is NOT a valid use case to select HDD (hard disk drives) as the storage for Google Cloud Bigtable?
- A. You expect to store at least 10 TB of data.
- B. You need to integrate with Google BigQuery.
- C. You will not use the data to back a user-facing or latency-sensitive application.
- D. You will mostly run batch workloads with scans and writes, rather than frequently executing random reads of a small number of rows.
Answer: B
Explanation:
For example, if you plan to store extensive historical data for a large number of remote-sensing devices and then use the data to generate daily reports, the cost savings for HDD storage may justify the performance tradeoff. On the other hand, if you plan to use the data to display a real-time dashboard, it probably would not make sense to use HDD storage; reads would be much more frequent in this case, and reads are much slower with HDD storage.
Reference: https://cloud.google.com/bigtable/docs/choosing-ssd-hdd
NEW QUESTION 110
Your neural network model is taking days to train. You want to increase the training speed. What can you do?
- A. Subsample your training dataset.
- B. Subsample your test dataset.
- C. Increase the number of layers in your neural network.
- D. Increase the number of input features to your model.
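Answer: A
Explanation:
Subsampling the training dataset reduces the amount of data processed in each epoch, so training finishes sooner. Adding layers or input features increases the work per training step, and the test dataset is not used during training.
Reference: https://towardsdatascience.com/how-to-increase-the-accuracy-of-a-neural-network-9f5d1c6f407d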
NEW QUESTION 111
Which TensorFlow function can you use to configure a categorical column if you don't know all of the possible values for that column?
- A. categorical_column_with_vocabulary_list
- B. categorical_column_with_hash_bucket
- C. categorical_column_with_unknown_values
- D. sparse_column_with_keys
Answer: B
Explanation:
If you know the set of all possible feature values of a column and there are only a few of them, you can use categorical_column_with_vocabulary_list. Each key in the list will get assigned an auto-incremental ID starting from 0.
What if we don't know the set of possible values in advance? Not a problem. We can use categorical_column_with_hash_bucket instead. What will happen is that each possible value in the feature column occupation will be hashed to an integer ID as we encounter them in training.
Reference: https://www.tensorflow.org/tutorials/wide
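The hashing idea behind categorical_column_with_hash_bucket can be sketched without TensorFlow. The function below is an illustration only: it uses MD5 from the standard library, which is not the hash TensorFlow uses internally, and the bucket count of 100 is an arbitrary assumption.

```python
import hashlib


def hash_bucket(value, num_buckets):
    """Map any string to a stable integer ID in [0, num_buckets)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


# No vocabulary is needed: values never seen before still get an ID.
for occupation in ["engineer", "teacher", "some-new-occupation"]:
    print(occupation, hash_bucket(occupation, 100))
```

The trade-off is that two different values can land in the same bucket, so the bucket count is usually chosen to be comfortably larger than the expected number of distinct values.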
NEW QUESTION 112
You have a petabyte of analytics data and need to design a storage and processing platform for it. You must be able to perform data warehouse-style analytics on the data in Google Cloud and expose the dataset as files for batch analysis tools in other cloud providers. What should you do?
- A. Store and process the entire dataset in BigQuery.
- B. Store and process the entire dataset in Cloud Bigtable.
- C. Store the full dataset in BigQuery, and store a compressed copy of the data in a Cloud Storage bucket.
- D. Store the warm data as files in Cloud Storage, and store the active data in BigQuery. Keep this ratio as 80% warm and 20% active.
Answer: D
NEW QUESTION 113
Case Study: 1 - Flowlogistic
Flowlogistic is rolling out their real-time inventory tracking system. The tracking devices will all send package-tracking messages, which will now go to a single Google Cloud Pub/Sub topic instead of the Apache Kafka cluster.
A subscriber application will then process the messages for real-time reporting and store them in Google BigQuery for historical analysis. You want to ensure the package data can be analyzed over time.
Which approach should you take?
- A. Use the automatically generated timestamp from Cloud Pub/Sub to order the data.
- B. Attach the timestamp on each message in the Cloud Pub/Sub subscriber application as they are received.
- C. Use the NOW() function in BigQuery to record the event's time.
- D. Attach the timestamp and Package ID on the outbound message from each publisher device as they are sent to Cloud Pub/Sub.
Answer: D
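Option D amounts to stamping each message at the source so that ordering does not depend on when the message happens to arrive. The sketch below assumes a JSON payload with hypothetical field names (`package_id`, `location`, `event_timestamp`) and does not call the Pub/Sub client library.

```python
import json
from datetime import datetime, timezone


def build_tracking_message(package_id, location, event_time=None):
    """Serialize one tracking event with the device-side event time."""
    event_time = event_time or datetime.now(timezone.utc)
    payload = {
        "package_id": package_id,
        "location": location,
        "event_timestamp": event_time.isoformat(),
    }
    return json.dumps(payload).encode("utf-8")


def order_by_event_time(raw_messages):
    """Sort received messages by when the event happened, not when it arrived."""
    decoded = [json.loads(raw.decode("utf-8")) for raw in raw_messages]
    return sorted(decoded, key=lambda m: datetime.fromisoformat(m["event_timestamp"]))


later = build_tracking_message(
    "PKG-1", "hub-B", datetime(2022, 2, 27, 12, 5, tzinfo=timezone.utc))
earlier = build_tracking_message(
    "PKG-1", "hub-A", datetime(2022, 2, 27, 12, 0, tzinfo=timezone.utc))

# Messages arrive out of order but can still be analyzed in event order.
ordered = order_by_event_time([later, earlier])
print([m["location"] for m in ordered])
```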
NEW QUESTION 114
You decided to use Cloud Datastore to ingest vehicle telemetry data in real time. You want to build a storage system that will account for the long-term data growth, while keeping the costs low. You also want to create snapshots of the data periodically, so that you can make a point-in-time (PIT) recovery, or clone a copy of the data for Cloud Datastore in a different environment. You want to archive these snapshots for a long time. Which two methods can accomplish this? (Choose two.)
- A. Use managed export, and then import to Cloud Datastore in a separate project under a unique namespace reserved for that export.
- B. Write an application that uses Cloud Datastore client libraries to read all the entities. Format the exported data into a JSON file. Apply compression before storing the data in Cloud Source Repositories.
- C. Write an application that uses Cloud Datastore client libraries to read all the entities. Treat each entity as a BigQuery table row via BigQuery streaming insert. Assign an export timestamp for each export, and attach it as an extra column for each row. Make sure that the BigQuery table is partitioned using the export timestamp column.
- D. Use managed export, and store the data in a Cloud Storage bucket using Nearline or Coldline class.
- E. Use managed export, and then import the data into a BigQuery table created just for that export, and delete temporary export files.
Answer: A,D
Explanation:
https://cloud.google.com/datastore/docs/export-import-entities
NEW QUESTION 115
Government regulations in your industry mandate that you have to maintain an auditable record of access to certain types of data. Assuming that all expiring logs will be archived correctly, where should you store data that is subject to that mandate?
- A. In a BigQuery dataset that is viewable only by authorized personnel, with the Data Access log used to provide the auditability.
- B. In a bucket on Cloud Storage that is accessible only by an AppEngine service that collects user information and logs the access before providing a link to the bucket.
- C. Encrypted on Cloud Storage with user-supplied encryption keys. A separate decryption key will be given to each authorized user.
- D. In Cloud SQL, with separate database user names to each user. The Cloud SQL Admin activity logs will be used to provide the auditability.
Answer: A
Explanation:
BigQuery is used to analyze access logs, and its Data Access logs capture the details of the user who accessed the data.
NEW QUESTION 116
You have uploaded 5 years of log data to Cloud Storage. A user reported that some data points in the log data are outside of their expected ranges, which indicates errors. You need to address this issue and be able to run the process again in the future while keeping the original data for compliance reasons. What should you do?
- A. Create a Compute Engine instance and create a new copy of the data in Cloud Storage. Skip the rows with errors.
- B. Import the data from Cloud Storage into BigQuery. Create a new BigQuery table, and skip the rows with errors.
- C. Create a Cloud Dataflow workflow that reads the data from Cloud Storage, checks for values outside the expected range, sets the value to an appropriate default, and writes the updated records to a new dataset in Cloud Storage.
- D. Create a Cloud Dataflow workflow that reads the data from Cloud Storage, checks for values outside the expected range, sets the value to an appropriate default, and writes the updated records to the same dataset in Cloud Storage.
Answer: C
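The cleaning step described in option C can be sketched in plain Python. This is not a Dataflow pipeline: the record layout, the `temperature` field, the valid range, and the default value are all hypothetical. The key property it demonstrates is that the original records are left untouched and the corrected records go to a separate output, so the process can be rerun later.

```python
def clean_record(record, field, low, high, default):
    """Return a copy of record with an out-of-range value replaced."""
    cleaned = dict(record)          # copy, so the original stays intact
    value = cleaned.get(field)
    if value is None or not (low <= value <= high):
        cleaned[field] = default
    return cleaned


def clean_dataset(records, field, low, high, default):
    """Build a new dataset; the input list is never modified."""
    return [clean_record(r, field, low, high, default) for r in records]


original = [
    {"sensor": "s1", "temperature": 21.5},
    {"sensor": "s2", "temperature": 999.0},   # outside the expected range
]
cleaned = clean_dataset(original, "temperature", low=-50.0, high=60.0, default=0.0)
print(original[1]["temperature"], cleaned[1]["temperature"])
```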
NEW QUESTION 117
Google Cloud Bigtable indexes a single value in each row. This value is called the _______.
- A. master key
- B. unique key
- C. primary key
- D. row key
Answer: D
Explanation:
Cloud Bigtable is a sparsely populated table that can scale to billions of rows and thousands of columns, allowing you to store terabytes or even petabytes of data. A single value in each row is indexed; this value is known as the row key.
Reference: https://cloud.google.com/bigtable/docs/overview
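Because only the row key is indexed and rows are kept sorted by that key, lookups and prefix scans on the key are cheap while filters on anything else require reading rows. The toy store below illustrates that behavior; it is an in-memory sketch, not the Bigtable client, and the class name and the `device#timestamp` key format are assumptions.

```python
import bisect


class SortedRowStore:
    """Toy key-value store that indexes rows by row key only."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> column values

    def put(self, row_key, columns):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = columns

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan_prefix(self, prefix):
        """Return (key, columns) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append((key, self._rows[key]))
        return result


store = SortedRowStore()
store.put("device2#20220227", {"temp": 19})
store.put("device1#20220227", {"temp": 21})
store.put("device1#20220226", {"temp": 20})
print(store.scan_prefix("device1#"))
```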
NEW QUESTION 118
You want to automate execution of a multi-step data pipeline running on Google Cloud. The pipeline includes Cloud Dataproc and Cloud Dataflow jobs that have multiple dependencies on each other. You want to use managed services where possible, and the pipeline will run every day. Which tool should you use?
- A. Cloud Scheduler
- B. Workflow Templates on Cloud Dataproc
- C. cron
- D. Cloud Composer
Answer: D
NEW QUESTION 119
You designed a database for patient records as a pilot project to cover a few hundred patients in three clinics. Your design used a single database table to represent all patients and their visits, and you used self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the project has expanded. The database must now store 100 times more patient records. You can no longer run the reports, because they either take too long or they encounter errors with insufficient compute resources. How should you adjust the database design?
- A. Add capacity (memory and disk space) to the database server by the order of 200.
- B. Normalize the master patient-record table into the patient table and the visits table, and create other necessary tables to avoid self-join.
- C. Shard the tables into smaller ones based on date ranges, and only generate reports with prespecified date ranges.
- D. Partition the table into smaller tables, with one for each clinic. Run queries against the smaller table pairs, and use unions for consolidated reports.
Answer: B
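The normalization described in option B can be sketched with SQLite from the Python standard library. The table and column names are hypothetical, and SQLite stands in for whatever database the project uses; the point is that a patients table joined to a visits table replaces the self-join on a single wide table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (
        patient_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        clinic     TEXT NOT NULL
    );
    CREATE TABLE visits (
        visit_id   INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patients(patient_id),
        visit_date TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(1, "Ada", "North"), (2, "Lin", "South")],
)
conn.executemany(
    "INSERT INTO visits (patient_id, visit_date) VALUES (?, ?)",
    [(1, "2022-01-05"), (1, "2022-02-10"), (2, "2022-01-20")],
)

# Report: number of visits per patient, using a plain join instead of a self-join.
visit_counts = conn.execute("""
    SELECT p.name, COUNT(v.visit_id)
    FROM patients AS p
    JOIN visits AS v ON v.patient_id = p.patient_id
    GROUP BY p.patient_id
    ORDER BY p.patient_id
""").fetchall()
print(visit_counts)
```

Each patient attribute is stored once, so the visits table stays narrow as the record count grows.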
The Professional Data Engineer exam tests a candidate's ability to enable data-driven decision-making by collecting, transforming, and publishing data. If you are aiming for a career in data engineering, consider taking this test, which leads to the Professional Data Engineer certification issued by Google.