Version: Next 🚧

Telemetry - Composition Dynamic Controller

This document describes the OpenTelemetry metrics emitted by unstructured-runtime and composition-dynamic-controller (CDC).

For information on how to set up the OpenTelemetry Collector, configure Prometheus, or import dashboards, please refer to the OpenTelemetry Configuration guide.

Telemetry Assets

The following assets are available for composition-dynamic-controller:

Metrics Reference

Naming note

Metric names in code use dots. When exported to Prometheus, dots become underscores and counters gain a _total suffix. Each histogram is exposed as three series: _bucket (cumulative counts per upper bound, used for quantile queries) and _sum / _count (divided by each other to obtain the average duration).
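The naming rules above can be sketched as a small helper. This is an illustration only: the function names are hypothetical, and it covers just the two rules stated in this note (dots to underscores, _total for counters), whereas a real exporter may apply further normalization.

```python
def prometheus_series(otel_name, instrument):
    """Return the Prometheus series names expected for a dotted metric name.

    `instrument` is a lowercase instrument type such as "counter",
    "histogram", "gauge", or "updowncounter" (hypothetical convention).
    """
    base = otel_name.replace(".", "_")
    if instrument == "counter":
        return [base + "_total"]
    if instrument == "histogram":
        return [base + "_bucket", base + "_sum", base + "_count"]
    # Gauges and UpDownCounters keep the normalized name unchanged.
    return [base]


def average_duration_query(otel_name, window="5m"):
    """Build a PromQL expression for the mean value of a histogram metric."""
    base = otel_name.replace(".", "_")
    return (
        f"sum(rate({base}_sum[{window}])) / "
        f"sum(rate({base}_count[{window}]))"
    )


print(prometheus_series("unstructured_runtime.reconcile.requeue.after", "counter"))
print(average_duration_query("composition.helm_install.duration_seconds"))
```

The second helper shows how the _sum and _count series combine into an average, complementing the quantile queries listed in the tables.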

Core Metrics

| Metric | Type | Unit | Description | PromQL example |
| --- | --- | --- | --- | --- |
| unstructured_runtime.startup.success | Counter | count | Controller started successfully. | sum(increase(unstructured_runtime_startup_success_total[1h])) |
| unstructured_runtime.startup.failure | Counter | count | Controller startup failed. | sum(increase(unstructured_runtime_startup_failure_total[1h])) |
| unstructured_runtime.reconcile.duration_seconds | Histogram | seconds | Total reconciliation duration. | histogram_quantile(0.95, sum by (le) (rate(unstructured_runtime_reconcile_duration_seconds_bucket[5m]))) |
| unstructured_runtime.reconcile.in_flight | Gauge | count | Number of reconciliations currently in progress. | max(unstructured_runtime_reconcile_in_flight) |
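The histogram examples on this page all use histogram_quantile over cumulative _bucket series. As a rough sketch of what that function estimates, the code below interpolates linearly inside the bucket that contains the requested rank. It is a simplified model, not Prometheus's implementation, and the bucket bounds and counts in the usage lines are made-up sample data.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs.

    Simplified model: assumes 0 < q <= 1, a lowest bucket starting at 0,
    and a final +Inf bucket holding the total observation count.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Made-up sample: 100 observations spread over four cumulative buckets.
sample = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
print(histogram_quantile(0.95, sample))
```

Because the estimate is interpolated, its accuracy depends on how closely the bucket boundaries bracket the latencies of interest.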

Queue Metrics

| Metric | Type | Unit | Description | PromQL example |
| --- | --- | --- | --- | --- |
| unstructured_runtime.reconcile.queue.depth | UpDownCounter | count | Current number of items in the queue. | max(unstructured_runtime_reconcile_queue_depth) |
| unstructured_runtime.reconcile.queue.wait.duration_seconds | Histogram | seconds | Time spent waiting in queue before processing. | histogram_quantile(0.95, sum by (le) (rate(unstructured_runtime_reconcile_queue_wait_duration_seconds_bucket[5m]))) |
| unstructured_runtime.reconcile.queue.oldest_item_age_seconds | Histogram | seconds | Age of the oldest item currently in queue. | histogram_quantile(0.95, sum by (le) (rate(unstructured_runtime_reconcile_queue_oldest_item_age_seconds_bucket[5m]))) |
| unstructured_runtime.reconcile.queue.work.duration_seconds | Histogram | seconds | Time spent processing a dequeued item. | histogram_quantile(0.95, sum by (le) (rate(unstructured_runtime_reconcile_queue_work_duration_seconds_bucket[5m]))) |
| unstructured_runtime.reconcile.queue.requeues | Counter | count | Total number of items requeued. | sum(increase(unstructured_runtime_reconcile_queue_requeues_total[1h])) |

Reconciliation Results

| Metric | Type | Unit | Description | PromQL example |
| --- | --- | --- | --- | --- |
| unstructured_runtime.reconcile.success | Counter | count | Successfully completed reconciliations. | sum(increase(unstructured_runtime_reconcile_success_total[1h])) |
| unstructured_runtime.reconcile.failure | Counter | count | Failed reconciliations. | sum(increase(unstructured_runtime_reconcile_failure_total[1h])) |
| unstructured_runtime.reconcile.requeue.after | Counter | count | Reconciliations that requested requeue after delay. | sum(increase(unstructured_runtime_reconcile_requeue_after_total[1h])) |
| unstructured_runtime.reconcile.requeue.immediate | Counter | count | Reconciliations that requested immediate requeue. | sum(increase(unstructured_runtime_reconcile_requeue_immediate_total[1h])) |
| unstructured_runtime.reconcile.requeue.error | Counter | count | Reconciliations requeued due to error. | sum(increase(unstructured_runtime_reconcile_requeue_error_total[1h])) |
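The success and failure counters can be combined into a failure ratio. The sketch below works on raw counter samples and mimics the reset handling behind increase(): a drop in value is treated as a counter restart. It assumes that success plus failure approximates the total number of completed reconciliations, which this page does not state, and it ignores the extrapolation Prometheus applies at window edges. The function names are hypothetical.

```python
def counter_increase(samples):
    """Sum the increases between consecutive samples of a monotonic counter.

    A sample lower than its predecessor is treated as a counter reset,
    so the new value itself counts as the increase since the restart.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def failure_ratio(success_samples, failure_samples):
    """Share of failed reconciliations over the sampled window (0.0 if idle)."""
    ok = counter_increase(success_samples)
    failed = counter_increase(failure_samples)
    attempts = ok + failed  # assumption: every reconciliation ends in one of the two
    return failed / attempts if attempts else 0.0


# Made-up samples: the success counter restarts between the 2nd and 3rd scrape.
print(counter_increase([10, 15, 3, 8]))
print(failure_ratio([0, 40, 90], [0, 4, 10]))
```

Under the same assumption, a comparable PromQL expression divides the increase of the _failure_total series by the sum of the increases of the _failure_total and _success_total series.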

External Operation Metrics

| Metric | Type | Unit | Description | PromQL example |
| --- | --- | --- | --- | --- |
| unstructured_runtime.external.observe.duration_seconds | Histogram | seconds | Time to observe external resources. | histogram_quantile(0.95, sum by (le) (rate(unstructured_runtime_external_observe_duration_seconds_bucket[5m]))) |
| unstructured_runtime.external.connect.duration_seconds | Histogram | seconds | Time to connect/read external references. | histogram_quantile(0.95, sum by (le) (rate(unstructured_runtime_external_connect_duration_seconds_bucket[5m]))) |
| unstructured_runtime.external.create.duration_seconds | Histogram | seconds | Time to create external resources. | histogram_quantile(0.95, sum by (le) (rate(unstructured_runtime_external_create_duration_seconds_bucket[5m]))) |
| unstructured_runtime.external.update.duration_seconds | Histogram | seconds | Time to update external resources. | histogram_quantile(0.95, sum by (le) (rate(unstructured_runtime_external_update_duration_seconds_bucket[5m]))) |
| unstructured_runtime.external.delete.duration_seconds | Histogram | seconds | Time to delete external resources. | histogram_quantile(0.95, sum by (le) (rate(unstructured_runtime_external_delete_duration_seconds_bucket[5m]))) |

Composition Specific Metrics

| Metric | Type | Unit | Description | PromQL example |
| --- | --- | --- | --- | --- |
| composition.rbac_generation.duration_seconds | Histogram | seconds | Time to generate RBAC policies. | histogram_quantile(0.95, sum by (le) (rate(composition_rbac_generation_duration_seconds_bucket[5m]))) |
| composition.rbac_apply.duration_seconds | Histogram | seconds | Time to apply RBAC policies to the cluster. | histogram_quantile(0.95, sum by (le) (rate(composition_rbac_apply_duration_seconds_bucket[5m]))) |
| composition.helm_install.duration_seconds | Histogram | seconds | Time to install Helm chart. | histogram_quantile(0.95, sum by (le) (rate(composition_helm_install_duration_seconds_bucket[5m]))) |
| composition.helm_upgrade.duration_seconds | Histogram | seconds | Time to upgrade Helm chart. | histogram_quantile(0.95, sum by (le) (rate(composition_helm_upgrade_duration_seconds_bucket[5m]))) |

Metric Design Notes

  • RBAC split: RBAC generation and RBAC apply are measured as separate histograms, so the two phases can be analyzed independently.
  • Helm operations: Install and upgrade are recorded as separate histograms to help track deployment performance.
  • Histogram buckets: Boundaries are chosen to cover operation latencies from milliseconds up to 10,000 seconds.
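To illustrate how a latency range from milliseconds to 10,000 seconds maps onto buckets, the sketch below uses one boundary per power of ten. These boundaries are purely hypothetical, since the actual bucket layout is not listed here. The helper returns the smallest upper bound (the le value) that would contain a given observation.

```python
import bisect

# Hypothetical boundaries in seconds, one per decade: 0.001, 0.01, ..., 10000.
# The real boundaries used by the controller are not documented on this page.
BOUNDS = [10.0 ** exponent for exponent in range(-3, 5)]


def bucket_for(seconds):
    """Return the smallest upper bound (le) that contains the observation."""
    index = bisect.bisect_left(BOUNDS, seconds)
    # Observations above the largest boundary land in the implicit +Inf bucket.
    return BOUNDS[index] if index < len(BOUNDS) else float("inf")


print(bucket_for(0.25))     # a 250 ms operation
print(bucket_for(20000.0))  # beyond the largest finite boundary
```

With coarse boundaries like these, quantile estimates are only as precise as the bucket that contains them, which is the trade-off for covering such a wide range with few series.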