Telemetry Service - Functional Specification
The Telemetry Service collects and analyzes metrics for model processing runs. The service receives notifications when a model process starts and ends. Using the notification data and the times at which the notifications arrive, it requests the metrics for that time window from Prometheus, analyzes and aggregates them, and saves the results in its own storage.
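As an illustration of requesting metrics for the window between the start and end notifications, the sketch below builds a URL for the Prometheus HTTP API range query endpoint (`/api/v1/query_range`, with its `query`, `start`, `end`, and `step` parameters). The function name, the base URL, the PromQL expression, and the 15-second step are assumptions made for the example; the specification does not define them.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_range_query_url(
    base_url: str,
    promql: str,
    started_at: datetime,
    finished_at: datetime,
    step_seconds: int = 15,  # assumed resolution, not taken from the spec
) -> str:
    """Build a Prometheus range-query URL covering one model process run."""
    if finished_at < started_at:
        raise ValueError("finished_at must not be earlier than started_at")
    params = {
        "query": promql,
        # Prometheus accepts Unix timestamps in seconds for start and end.
        "start": f"{started_at.timestamp():.3f}",
        "end": f"{finished_at.timestamp():.3f}",
        "step": f"{step_seconds}s",
    }
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 1, 0, 10, tzinfo=timezone.utc)
    # "up" is a placeholder expression; real queries depend on the metrics collected.
    print(build_range_query_url("http://prometheus:9090", "up", start, end))
```

The sketch only constructs the request; sending it, analyzing the samples, and aggregating them are left out.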
Individual topics can be uploaded directly to ClickHouse without additional enrichment.
The Quota Service is designed to allocate quotas per user quickly. The service receives calculation events from telemetry in real time. For users whose data is not held in memory, it makes an additional request to ClickHouse, where quotas are counted directly from the stored events. For more information, see the article "Using the Kafka table engine".
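A minimal sketch of the in-memory lookup with a ClickHouse fallback described above. The class name `QuotaTracker` and the `load_usage` callable are hypothetical: the callable stands in for whatever ClickHouse query the service actually runs, which this specification does not define.

```python
from typing import Callable, Dict


class QuotaTracker:
    """Keeps per-user usage in memory and falls back to a loader on a miss."""

    def __init__(self, load_usage: Callable[[str], float]) -> None:
        # load_usage is a hypothetical hook for the ClickHouse query.
        self._load_usage = load_usage
        self._usage: Dict[str, float] = {}

    def usage(self, user_id: str) -> float:
        """Return current usage, loading it once if the user is not in memory."""
        if user_id not in self._usage:
            self._usage[user_id] = self._load_usage(user_id)
        return self._usage[user_id]

    def record_event(self, user_id: str, amount: float) -> float:
        """Apply one real-time calculation event and return the new usage."""
        self._usage[user_id] = self.usage(user_id) + amount
        return self._usage[user_id]


if __name__ == "__main__":
    tracker = QuotaTracker(load_usage=lambda user_id: 10.0)  # stub loader
    print(tracker.record_event("user-1", 2.5))
```

The sketch ignores the case in which the triggering event is already stored in ClickHouse when the fallback query runs; a real service would need to avoid counting it twice.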
Event Naming Rules
Each event must follow the rules outlined below:
- All letters in the event name must be lowercase.
- Start the event name with the name of the system or service that generates it.
- Next, include the specific component or subsystem responsible for the event.
- Then, indicate the action being performed.
- Finally, add an optional note if additional context is needed.
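
The rules above can be checked mechanically. The sketch below assumes that the parts of an event name are joined with a dot and that each part consists of lowercase letters, digits, and underscores; the specification does not state the separator or the allowed characters, so both are assumptions, as are the names `EventName` and `parse_event_name`.

```python
import re
from typing import NamedTuple, Optional

# Assumed segment syntax: starts with a lowercase letter, then lowercase
# letters, digits, or underscores.
_SEGMENT = re.compile(r"^[a-z][a-z0-9_]*$")


class EventName(NamedTuple):
    system: str
    component: str
    action: str
    note: Optional[str]


def parse_event_name(name: str, separator: str = ".") -> EventName:
    """Split an event name into system, component, action, and optional note."""
    parts = name.split(separator)
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 parts, got {len(parts)}: {name!r}")
    for part in parts:
        if not _SEGMENT.match(part):
            raise ValueError(f"invalid segment {part!r} in {name!r}")
    note = parts[3] if len(parts) == 4 else None
    return EventName(parts[0], parts[1], parts[2], note)


if __name__ == "__main__":
    # Hypothetical event names used only to exercise the parser.
    print(parse_event_name("xumi.loader.start"))
    print(parse_event_name("xumi.loader.start.retry"))
```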
XUMI Telemetry Specification
| Component | Description | Owner |
|---|---|---|
| Kafka REST Proxy | REST API for Kafka: https://github.com/confluentinc/kafka-rest/tree/master, https://github.com/strimzi/strimzi-kafka-bridge/tree/main | devops |
| KAFKA_PROXY_URL | Environment Variable in docker container. Connection string for Kafka proxy | devops |
| NODE_ID | Environment Variable in docker container. UUID of the node | devops, ml |
| VM_TYPE | Virtual machine instance type (NV6, x4GPUlarge) | devops |
| HOST_NAME | Host name | devops |
| STORAGE_SIZE | Dynamic storage size | devops |
| STORAGE_TYPE | Dynamic storage type | devops |
| TENANT_ID | Environment Variable in docker container. UUID of the tenant | devops, ml |
| Python Event Service | service classes that send messages to the REST API for Kafka | ml |
| Events | | ml |
| Kafka Topics | xumi_load, xumi_task. Topics for messages about loading weights and starting tasks | devops |
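
To illustrate how a Python event service might post a message through the REST API for Kafka, the sketch below builds (but does not send) a produce request in the JSON embedded format that both the Confluent REST Proxy v2 API and the Strimzi Kafka Bridge accept: `POST /topics/{topic}` with content type `application/vnd.kafka.json.v2+json`. The function name and the payload field names are assumptions; only the environment variables `KAFKA_PROXY_URL`, `NODE_ID`, and `TENANT_ID` and the topic names come from the table above.

```python
import json
import os
from typing import Any, Mapping, Optional
from urllib.request import Request


def build_produce_request(
    topic: str,
    payload: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> Request:
    """Build a POST request that publishes one JSON record to a Kafka topic."""
    env = os.environ if env is None else env
    base_url = env["KAFKA_PROXY_URL"].rstrip("/")
    # Field names node_id and tenant_id are illustrative, not part of the spec.
    value = {"node_id": env["NODE_ID"], "tenant_id": env["TENANT_ID"], **payload}
    body = json.dumps({"records": [{"value": value}]}).encode("utf-8")
    return Request(
        f"{base_url}/topics/{topic}",
        data=body,
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


if __name__ == "__main__":
    demo_env = {
        "KAFKA_PROXY_URL": "http://kafka-proxy:8082",  # placeholder address
        "NODE_ID": "00000000-0000-0000-0000-000000000001",
        "TENANT_ID": "00000000-0000-0000-0000-000000000002",
    }
    request = build_produce_request("xumi_task", {"event": "example"}, demo_env)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; the sketch stops short of that so it can run without a proxy.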
PBAC Telemetry Specification
All audit events are recorded directly in the ClickHouse database. An audit event response service may be added in the future; it would read data from the corresponding Kafka topics.
| Component | Description | Owner |
|---|---|---|
| KAFKA_DIRECT_URL | Environment Variable in docker container. Connection string for Kafka | devops |
| Golang Event Service | service classes that send messages to Kafka | backend |
| .NET Event Service | service classes that send messages to Kafka | backend |
| Events | | backend |
| Kafka Topics | pbac_access, pbac_politics, pbac_service. Topics for messages about access audits, policy change audits, and service messages (update, load, policy changes) | devops |
Asset Storage Telemetry Specification
| Component | Description | Owner |
|---|---|---|
| KAFKA_DIRECT_URL | Environment Variable in docker container. Connection string for Kafka | devops |
| Golang Event Service | service classes that send messages to Kafka | backend |
| .NET Event Service | service classes that send messages to Kafka | backend |
| Events | | backend |
| Kafka Topics | asset_access, asset_transfer. Topics for messages about access audits and asset uploads and downloads in storage | devops |
Workflow Telemetry Specification
| Component | Description | Owner |
|---|---|---|
| KAFKA_DIRECT_URL | Environment Variable in docker container. Connection string for Kafka | devops |
| .NET Event Service | service classes that send messages to Kafka | backend |
| Events | | backend |
| Kafka Topics | wkf_cpu, wkf_gpu. Topics for messages about credits for CPU and GPU usage | devops |
Kubernetes Telemetry Specification
| Component | Description | Owner |
|---|---|---|
| KAFKA_DIRECT_URL | Environment Variable in docker container. Connection string for Kafka | devops |
| .NET Event Service | service classes that send messages to Kafka | backend |
| Events | | backend |
| Kafka Topics | kuber_cpu, kuber_gpu. Topics for messages about credits for CPU and GPU usage | devops |
MCP Telemetry Specification
The process starts with the Workflow service. The Workflow service initiates processing of the XUMI model and sends the corresponding event to Azure Event Hubs, passing the following data in the event body:
- The identifier of the user who initiated the model processing
- The identifier of the model processing task
- The name of the Kubernetes node on which the model processing is running
When XUMI finishes model processing, whether successfully or not, the Workflow service sends one of two events depending on the outcome: TaskCompleted or TaskFailed. The event body carries the identifier of the model processing task.
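The event flow above can be sketched as a pair of payload types plus a small tracker that matches a finishing event to its start event by task identifier. The event names TaskCompleted and TaskFailed come from this specification; the name `TaskStarted`, the field names, the `received_at` timestamp, and the `TaskWindowTracker` class are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TaskStarted:  # hypothetical name for the start event
    user_id: str
    task_id: str
    node_name: str
    received_at: datetime


@dataclass(frozen=True)
class TaskFinished:
    name: str  # "TaskCompleted" or "TaskFailed"
    task_id: str
    received_at: datetime


class TaskWindowTracker:
    """Pairs start and finish events to obtain the processing time window."""

    def __init__(self) -> None:
        self._open: Dict[str, TaskStarted] = {}

    def on_started(self, event: TaskStarted) -> None:
        self._open[event.task_id] = event

    def on_finished(
        self, event: TaskFinished
    ) -> Optional[Tuple[TaskStarted, datetime, bool]]:
        """Return (start event, end time, succeeded), or None if no start was seen."""
        if event.name not in ("TaskCompleted", "TaskFailed"):
            raise ValueError(f"unexpected event name: {event.name!r}")
        started = self._open.pop(event.task_id, None)
        if started is None:
            return None
        return started, event.received_at, event.name == "TaskCompleted"


if __name__ == "__main__":
    tracker = TaskWindowTracker()
    t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    t1 = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)
    tracker.on_started(TaskStarted("user-1", "task-1", "node-a", t0))
    print(tracker.on_finished(TaskFinished("TaskFailed", "task-1", t1)))
```

The start event supplies the user and node, and the two timestamps bound the period for which metrics would be requested.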



