Associate-Developer-Apache-Spark-3.5 試験問題を無料オンラインアクセス
試験コード: | Associate-Developer-Apache-Spark-3.5 |
試験名称: | Databricks Certified Associate Developer for Apache Spark 3.5 - Python |
認定資格: | Databricks |
無料問題数: | 85 |
更新日: | 2025-09-04 |
In the code block below,aggDFcontains aggregations on a streaming DataFrame:
Which output mode at line 3 ensures that the entire result table is written to the console during each trigger execution?
Given the schema:
event_ts TIMESTAMP,
sensor_id STRING,
metric_value LONG,
ingest_ts TIMESTAMP,
source_file_path STRING
The goal is to deduplicate based on: event_ts, sensor_id, and metric_value.
Options:
A data engineer observes that an upstream streaming source sends duplicate records, where duplicates share the same key and have at most a 30-minute difference inevent_timestamp. The engineer adds:
dropDuplicatesWithinWatermark("event_timestamp", "30 minutes")
What is the result?
A Spark developer is building an app to monitor task performance. They need to track the maximum task processing time per worker node and consolidate it on the driver for analysis.
Which technique should be used?