Overview

This content is currently WIP. Diagrams, content, and structure are subject to change.

This section covers the observability and monitoring tools integrated with the C3 Agentic AI Platform. You’ll learn about Prometheus for metrics collection, OpenSearch for log management, and Grafana for visualization and dashboards.

What is observability and monitoring?

Observability and monitoring in the C3 Agentic AI Platform rely on tools that collect data about your applications and infrastructure. Monitoring tells you when something is wrong; observability provides the context to understand why. The platform integrates standard monitoring tools that collect, store, and display the following:

Metrics: Numerical data points collected at regular intervals
Logs: Text records of events and actions
Traces: Records of requests as they flow through distributed systems
Alerts: Notifications when predefined conditions are met

Monitoring architecture

The C3 Agentic AI Platform’s monitoring architecture follows a layered approach:

Data collection

Agents and exporters that gather metrics, logs, and traces from various sources.

Data storage

Time-series databases and log storage systems that efficiently store monitoring data.

Data processing

Systems that analyze, aggregate, and correlate monitoring data to extract insights.

Visualization and alerting

Dashboards and notification systems that present monitoring data and alert on anomalies.

This architecture provides a comprehensive view of your application’s health and performance, from infrastructure to business metrics.
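
The four layers above can be sketched as a tiny in-memory pipeline. This is only an illustration of how data moves between layers, not the platform's implementation: the `Sample` type, the function names, the metric name `cpu_usage_ratio`, and the 0.80 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    name: str
    timestamp: float
    value: float


def collect(readings):
    """Data collection: turn raw (timestamp, value) readings into samples."""
    return [Sample("cpu_usage_ratio", ts, v) for ts, v in readings]


def store(db, samples):
    """Data storage: append samples to an in-memory series keyed by name."""
    for sample in samples:
        db.setdefault(sample.name, []).append(sample)
    return db


def process(db, name, window):
    """Data processing: average the most recent `window` samples."""
    recent = db[name][-window:]
    return mean(s.value for s in recent)


def evaluate(average, threshold):
    """Visualization and alerting: decide whether to raise an alert."""
    return "ALERT" if average > threshold else "OK"


db = store({}, collect([(0, 0.40), (15, 0.85), (30, 0.95)]))
avg = process(db, "cpu_usage_ratio", window=2)
status = evaluate(avg, threshold=0.80)
print(f"average={avg:.2f} status={status}")
```

In a real deployment each function would be a separate system (an exporter, a time-series database, a query engine, an alert manager); the sketch only shows the hand-off between them.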

Core monitoring components

The C3 Agentic AI Platform integrates several industry-standard monitoring tools:

Prometheus for metrics

Prometheus collects and stores time-series metrics from your applications and infrastructure:

Resource metrics: CPU, memory, disk, and network usage
Application metrics: Request rates, error rates, and latencies
Business metrics: User activity, transaction volumes, and other domain-specific metrics
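
Prometheus scrapes metrics over HTTP in a plain-text exposition format. As a rough sketch of what an application metric looks like on the wire, the snippet below renders a counter by hand using only the standard library. The metric name `http_requests_total` and its labels are illustrative assumptions, and the helper `render_counter` is hypothetical; a real application would normally use a Prometheus client library, which also handles label escaping.

```python
def render_counter(name, help_text, samples):
    """Render a counter in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs. Label values are
    assumed not to need escaping in this simplified sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


text = render_counter(
    "http_requests_total",
    "Total HTTP requests handled.",
    [
        ({"method": "GET", "status": "200"}, 3),
        ({"method": "GET", "status": "500"}, 1),
    ],
)
print(text, end="")
```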

OpenSearch for logs

OpenSearch (an open-source fork of Elasticsearch) centralizes logs from all components of your application:

Application logs: Messages generated by your application code
System logs: Events from the operating system and infrastructure
Access logs: Records of API calls and user interactions
Audit logs: Security-relevant events and administrative actions

Grafana for visualization

Grafana provides dashboards for visualizing metrics and logs:

Predefined dashboards: Ready-to-use visualizations for common monitoring needs
Custom dashboards: Tailored views for specific applications or use cases
Alerts: Notifications based on metric thresholds or log patterns
Annotations: Contextual information about events like deployments or incidents

Monitoring layers

The C3 Agentic AI Platform’s monitoring capabilities span multiple layers:

Infrastructure monitoring

Infrastructure monitoring focuses on the health and performance of the underlying hardware and software:

Kubernetes monitoring: Pod status, resource usage, and cluster health
Node monitoring: CPU, memory, disk, and network metrics for each server
Database monitoring: Query performance, connection counts, and storage usage
Network monitoring: Latency, throughput, and error rates

Application monitoring

Application monitoring tracks the behavior and performance of your C3 AI applications:

API monitoring: Request rates, error rates, and latencies for each endpoint
Service monitoring: Health and performance of individual microservices
Dependency monitoring: Interactions with external systems and services
Error tracking: Exception rates, stack traces, and error patterns
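
To make the API monitoring signals concrete, here is a minimal sketch that derives request rate, error rate, and 95th-percentile latency from a batch of request records. The record fields (`status`, `latency_ms`), the choice to count only 5xx responses as errors, and the nearest-rank percentile method are assumptions for the example.

```python
def summarize(requests, window_seconds):
    """Summarize one window of request records for a single endpoint."""
    total = len(requests)
    if total == 0:
        return {"request_rate": 0.0, "error_rate": 0.0, "p95_latency_ms": None}
    errors = sum(1 for r in requests if r["status"] >= 500)
    latencies = sorted(r["latency_ms"] for r in requests)
    # Nearest-rank percentile: ceil(0.95 * total), using integer arithmetic.
    rank = (95 * total + 99) // 100
    return {
        "request_rate": total / window_seconds,  # requests per second
        "error_rate": errors / total,            # fraction of 5xx responses
        "p95_latency_ms": latencies[rank - 1],
    }


# Twenty requests in a 10-second window; two of them fail with a 500.
sample = [
    {"status": 500 if i in (3, 17) else 200, "latency_ms": 10 * (i + 1)}
    for i in range(20)
]
summary = summarize(sample, window_seconds=10)
print(summary)
```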

Business monitoring

Business monitoring focuses on domain-specific metrics that reflect the value your application provides:

User activity: Active users, session duration, and feature usage
Transaction metrics: Volume, value, and success rates of business transactions
Data quality: Completeness, accuracy, and timeliness of data
ML model performance: Prediction accuracy, drift, and resource usage

Log analysis

Log analysis helps you understand what’s happening in your application and diagnose issues:

Log collection

Logs are collected from various sources and centralized in OpenSearch for analysis.

Log querying

OpenSearch Dashboards (a fork of Kibana) provides an interface for querying and analyzing logs.
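
Behind the interface, log searches are expressed in the OpenSearch query DSL. The sketch below builds the JSON body for a search that returns recent error logs for one service. The field names `level`, `service`, and `@timestamp`, and the assumption that `level` and `service` are mapped as keyword fields, are illustrative; check the index mapping in your environment before relying on them.

```python
import json


def build_error_log_query(service, minutes, size=50):
    """Build a query DSL body: ERROR logs for `service` in the last N minutes."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],  # newest first
        "query": {
            "bool": {
                "filter": [
                    {"term": {"level": "ERROR"}},
                    {"term": {"service": service}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ]
            }
        },
    }


query = build_error_log_query("orders-api", minutes=15)
body = json.dumps(query, indent=2)
print(body)
```

The body can be pasted into a query console or sent to the index's search endpoint. Filter clauses are used rather than scoring clauses because log searches usually need exact matches, not relevance ranking.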

Log visualization

OpenSearch Dashboards provides visualizations for log data to help identify patterns and trends.

Alerting

Alerting notifies you when something goes wrong or requires attention:

Alert rules

Alert rules define conditions that trigger notifications based on thresholds and patterns.
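
As a rough illustration of a threshold-based rule, the sketch below fires only after the condition has held for several consecutive evaluations, which is a common way to avoid alerting on brief spikes. The `AlertRule` type, its fields, and the sample values are assumptions for the example and do not reflect the platform's rule format.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str
    threshold: float
    for_evaluations: int  # consecutive breaches required before firing


def firing_points(rule, values):
    """Return the indexes of the evaluations at which the rule is firing."""
    streak = 0
    firing = []
    for index, value in enumerate(values):
        streak = streak + 1 if value > rule.threshold else 0
        if streak >= rule.for_evaluations:
            firing.append(index)
    return firing


rule = AlertRule(name="HighErrorRate", threshold=0.05, for_evaluations=3)
error_rates = [0.02, 0.07, 0.03, 0.06, 0.08, 0.09, 0.01]
fired_at = firing_points(rule, error_rates)
print(fired_at)
```

The single spike at index 1 does not fire the rule; only the sustained breach across indexes 3 to 5 does.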

Alert notifications

Alert notifications are sent through various channels such as email and Slack.
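
The sketch below formats one alert for the two channels mentioned above without actually sending anything. It assumes a Slack incoming webhook, which accepts a JSON body with a `text` field, and uses the standard library's `EmailMessage` for the email form. The alert fields, addresses, and helper names are hypothetical.

```python
import json
from email.message import EmailMessage


def slack_payload(alert):
    """Build the JSON body for a Slack incoming webhook."""
    text = f"[{alert['severity'].upper()}] {alert['name']}: {alert['summary']}"
    return json.dumps({"text": text})


def email_message(alert, sender, recipient):
    """Build an email carrying the same alert."""
    msg = EmailMessage()
    msg["Subject"] = f"[{alert['severity'].upper()}] {alert['name']}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(alert["summary"])
    return msg


alert = {
    "name": "HighErrorRate",
    "severity": "critical",
    "summary": "5xx rate above 5% for 10 minutes",
}
payload = slack_payload(alert)
msg = email_message(alert, "alerts@example.com", "oncall@example.com")
print(payload)
print(msg["Subject"])
```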

Best practices

Here are some best practices for observability and monitoring in the C3 Agentic AI Platform:

Metric collection

Use meaningful metric names with consistent naming conventions
Add relevant labels to provide context
Choose appropriate metric types for different use cases
Balance granularity and volume to avoid overwhelming storage
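
One way to enforce naming conventions is a small lint check run in code review or CI. The sketch below is an assumption about what such a check might look like: the first pattern is the character set Prometheus accepts for metric names, while the snake_case rule and the `_total` suffix for counters are example team conventions, not platform requirements.

```python
import re

# Characters Prometheus accepts in a metric name.
PROMETHEUS_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
# Stricter, hypothetical team convention: lowercase snake_case only.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9_]*")


def lint_metric(name, metric_type):
    """Return a list of naming problems; an empty list means the name passes."""
    problems = []
    if not PROMETHEUS_NAME.fullmatch(name):
        problems.append("not a valid Prometheus metric name")
    if not SNAKE_CASE.fullmatch(name):
        problems.append("not lowercase snake_case")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counter should end in _total")
    return problems


for name, kind in [
    ("http_requests_total", "counter"),
    ("HTTPRequests", "counter"),
    ("2xx-responses", "gauge"),
]:
    print(name, lint_metric(name, kind))
```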

Log management

Use structured logging with consistent formats
Log at appropriate levels based on severity
Include context to correlate related events
Manage log volume with rotation and retention policies
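
The first three practices can be sketched with Python's standard `logging` module: a formatter that emits one JSON object per line, a level threshold, and a shared `request_id` to correlate related events. The logger name `orders`, the field names, and the `request_id` attribute are assumptions for the example; rotation and retention are left out because they are typically configured in the log pipeline rather than in application code.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each record as a single JSON object with a fixed set of keys."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Context passed via `extra=`; None when the caller omits it.
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(entry)


def build_logger(stream):
    logger = logging.getLogger("orders")
    logger.setLevel(logging.INFO)  # drop DEBUG noise
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


stream = io.StringIO()
log = build_logger(stream)
log.debug("cache miss")  # below INFO, so not emitted
log.info("order accepted", extra={"request_id": "req-123"})
log.error("payment failed", extra={"request_id": "req-123"})

records = [json.loads(line) for line in stream.getvalue().splitlines()]
for record in records:
    print(record["level"], record["request_id"], record["message"])
```

Because every line is valid JSON with the same keys, a log store can index `level` and `request_id` as fields instead of parsing free text.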

Dashboard design

Start with overview dashboards and provide drill-down capabilities
Group related metrics for easier analysis
Use consistent time ranges across panels
Add context with documentation and runbooks

Alerting strategy

Alert on symptoms that impact users, not internal causes
Set appropriate thresholds to balance sensitivity and specificity
Define clear ownership for each alert
Include actionable information for remediation

Related concepts

Kubernetes Platform

Learn about the infrastructure that powers the C3 Agentic AI Platform.

Grafana Dashboards

Explore pre-built dashboards for monitoring the C3 Agentic AI Platform.

OpenSearch Dashboard

Understand how to analyze logs and troubleshoot issues.
