This content is currently WIP. Diagrams, content, and structure are subject to change.

title: "GenAI Data Model Overview"
description: "Overview of the C3 AI GenAI application data model in version 8.8"

The C3 AI GenAI application uses a data model to represent documents, document chunks, embeddings, prompt templates, chat sessions, and LLM configurations. Together, these types let the application retrieve content relevant to a query, generate responses grounded in that content, and expose these generative AI capabilities to enterprise applications.

Core Data Types

The GenAI data model consists of the following core data types:

Document

The Document type represents text-based content that can be processed, embedded, and retrieved by the GenAI system.
type Document {
  id: String
  title: String
  content: String
  metadata: DocumentMetadata
  source: String
  author: String
  createdAt: DateTime
  updatedAt: DateTime
  version: String
  status: DocumentStatus
  embeddings: [Embedding]
  chunks: [DocumentChunk]
  tags: [String]
  accessControl: AccessControlList
}

DocumentChunk

The DocumentChunk type represents a segment of a document. Each chunk is embedded individually, which allows retrieval to return the specific passage that matches a query rather than the whole document.
type DocumentChunk {
  id: String
  document: Document
  content: String
  index: Integer
  embedding: Embedding
  metadata: ChunkMetadata
}
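
A minimal sketch of how a document's content might be split into ordered chunks, mirroring the `content` and `index` fields of DocumentChunk. It assumes fixed-size character windows with overlap; the `chunk_text` function, its `chunk_size` and `overlap` parameters, and the id scheme are hypothetical, and the application's actual chunking strategy (for example, splitting on sentences or tokens) may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DocumentChunk:
    # Simplified stand-in for DocumentChunk: the embedding and metadata fields are omitted
    id: str
    document_id: str
    content: str
    index: int


def chunk_text(document_id: str, content: str, chunk_size: int = 200, overlap: int = 50) -> list[DocumentChunk]:
    """Split content into overlapping fixed-size character windows (a hypothetical strategy)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: list[DocumentChunk] = []
    start = 0
    while start < len(content):
        index = len(chunks)
        chunks.append(DocumentChunk(
            id=f"{document_id}-{index}",  # hypothetical id scheme
            document_id=document_id,
            content=content[start:start + chunk_size],
            index=index,  # position of the chunk within the document
        ))
        if start + chunk_size >= len(content):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

With `chunk_size=4` and `overlap=1`, the text `abcdefghij` yields the chunks `abcd`, `defg`, and `ghij`. The overlap keeps text that straddles a boundary partly visible in both neighboring chunks.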

Embedding

The Embedding type stores a vector representation of a document or document chunk that captures its semantic meaning.
type Embedding {
  id: String
  vector: [Float]
  model: EmbeddingModel
  dimension: Integer
  document: Document
  documentChunk: DocumentChunk
  createdAt: DateTime
}
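
Embeddings are usually compared with a similarity measure such as cosine similarity. The sketch below assumes cosine similarity (whether the application uses it is not stated here), represents `model` as a plain string instead of an EmbeddingModel, and derives `dimension` from the vector length so the two fields cannot disagree. It rejects comparisons across models, since vectors from different embedding models live in different spaces.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Embedding:
    # Simplified stand-in for the Embedding type (id, vector, model, dimension)
    id: str
    vector: list[float]
    model: str  # assumption: a model name rather than an EmbeddingModel object
    dimension: int = field(init=False)

    def __post_init__(self) -> None:
        # Derive dimension from the vector so the two fields stay consistent
        self.dimension = len(self.vector)


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two embeddings produced by the same model."""
    if a.model != b.model or a.dimension != b.dimension:
        raise ValueError("embeddings must come from the same model and have the same dimension")
    dot = sum(x * y for x, y in zip(a.vector, b.vector))
    norm_a = math.sqrt(sum(x * x for x in a.vector))
    norm_b = math.sqrt(sum(y * y for y in b.vector))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)
```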

PromptTemplate

The PromptTemplate type represents reusable templates for generating prompts for LLMs.
type PromptTemplate {
  id: String
  name: String
  description: String
  template: String
  variables: [PromptVariable]
  version: String
  createdBy: User
  createdAt: DateTime
  updatedAt: DateTime
  tags: [String]
  category: PromptCategory
  usageCount: Integer
  isPublic: Boolean
}
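
Rendering a PromptTemplate amounts to substituting values for its declared variables. This sketch assumes a `{{variable}}` placeholder syntax and gives PromptVariable a `name` and an optional `default`; neither the syntax nor those fields are defined by the data model above, so both are hypothetical. The `render` method also increments a counter to mirror the `usageCount` field.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical placeholder syntax: {{variable_name}}, optional spaces inside the braces
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


@dataclass
class PromptVariable:
    # Hypothetical shape: the data model does not define PromptVariable's fields
    name: str
    default: Optional[str] = None


@dataclass
class PromptTemplate:
    name: str
    template: str
    variables: list[PromptVariable] = field(default_factory=list)
    usage_count: int = 0

    def render(self, values: dict[str, str]) -> str:
        """Substitute variable values into the template, falling back to defaults."""
        declared = {v.name: v for v in self.variables}

        def substitute(match: re.Match) -> str:
            var_name = match.group(1)
            if var_name not in declared:
                raise KeyError(f"undeclared variable: {var_name}")
            if var_name in values:
                return values[var_name]
            if declared[var_name].default is not None:
                return declared[var_name].default
            raise KeyError(f"missing value for variable: {var_name}")

        rendered = PLACEHOLDER.sub(substitute, self.template)
        self.usage_count += 1  # counted only when rendering succeeds
        return rendered
```

Raising on undeclared or missing variables, rather than leaving placeholders in place, keeps a half-filled prompt from silently reaching the LLM.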

ChatSession

The ChatSession type represents a conversation between a user and the GenAI system.
type ChatSession {
  id: String
  user: User
  title: String
  createdAt: DateTime
  updatedAt: DateTime
  messages: [ChatMessage]
  context: SessionContext
  status: SessionStatus
  metadata: SessionMetadata
}

ChatMessage

The ChatMessage type represents individual messages in a chat session.
type ChatMessage {
  id: String
  session: ChatSession
  role: MessageRole  // user, assistant, system
  content: String
  timestamp: DateTime
  metadata: MessageMetadata
  references: [DocumentReference]
  feedback: MessageFeedback
}
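
Before an LLM call, a session's messages are typically flattened into an ordered list of role and content pairs, the input shape used by common chat-completion APIs. The sketch keeps all system messages plus the most recent user and assistant turns; the `to_llm_messages` method, its `max_messages` cap, and the choice to always keep system messages are assumptions for illustration, and the references, metadata, and feedback fields are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class MessageRole(str, Enum):
    USER = "user"
    ASSISTANT = "assistant"
    SYSTEM = "system"


@dataclass
class ChatMessage:
    # Simplified stand-in: references, metadata, and feedback are omitted
    role: MessageRole
    content: str
    timestamp: datetime


@dataclass
class ChatSession:
    id: str
    messages: list[ChatMessage] = field(default_factory=list)

    def add_message(self, role: MessageRole, content: str, timestamp: datetime) -> ChatMessage:
        message = ChatMessage(role, content, timestamp)
        self.messages.append(message)
        return message

    def to_llm_messages(self, max_messages: int = 20) -> list[dict[str, str]]:
        """Return system messages, then the most recent other messages, as role/content dicts."""
        ordered = sorted(self.messages, key=lambda m: m.timestamp)
        system = [m for m in ordered if m.role is MessageRole.SYSTEM]
        others = [m for m in ordered if m.role is not MessageRole.SYSTEM]
        recent = others[-max_messages:] if max_messages > 0 else []
        return [{"role": m.role.value, "content": m.content} for m in system + recent]
```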

LLMModel

The LLMModel type represents a large language model configuration.
type LLMModel {
  id: String
  name: String
  provider: String
  version: String
  capabilities: [ModelCapability]
  parameters: ModelParameters
  apiEndpoint: String
  maxTokens: Integer
  costPerToken: Float
  status: ModelStatus
}
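
The `maxTokens` and `costPerToken` fields are enough to check whether a request fits a model and to bound its cost. This sketch assumes `maxTokens` limits prompt and completion tokens combined and that a single flat price applies to both (many providers price input and output tokens differently); `estimate_request_cost` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class LLMModel:
    # Simplified stand-in for the LLMModel type
    name: str
    provider: str
    max_tokens: int
    cost_per_token: float


def estimate_request_cost(model: LLMModel, prompt_tokens: int, max_completion_tokens: int) -> float:
    """Upper-bound cost of one request under a flat per-token price.

    Assumes max_tokens limits prompt and completion tokens combined.
    """
    if prompt_tokens < 0 or max_completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = prompt_tokens + max_completion_tokens
    if total > model.max_tokens:
        raise ValueError(f"request needs {total} tokens but {model.name} allows {model.max_tokens}")
    return total * model.cost_per_token
```

For a model with `max_tokens=1000` and `cost_per_token=0.00002`, a 600-token prompt with up to 200 completion tokens costs at most 800 × 0.00002 = 0.016.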

Relationships

The following diagram illustrates the key relationships in the GenAI data model:
Document
 ├── has many DocumentChunks
 │    └── has one Embedding
 ├── has many Embeddings
 └── referenced by ChatMessages

PromptTemplate
 └── used in ChatSessions

ChatSession
 └── has many ChatMessages
     └── references Documents
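
Several relationships in the diagram are stored on both sides: a Document lists its chunks while each DocumentChunk points back to its document, and a chunk's Embedding points back to that chunk. A minimal sketch of a consistency check over those links follows. It assumes links are stored as ids, that chunk indexes are 0-based and contiguous, and flattens each type to the fields involved; `check_document` is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Embedding:
    id: str
    document_id: str
    chunk_id: str | None  # None when the embedding covers the whole document


@dataclass
class DocumentChunk:
    id: str
    document_id: str
    index: int
    embedding: Embedding | None = None


@dataclass
class Document:
    id: str
    chunks: list[DocumentChunk] = field(default_factory=list)
    embeddings: list[Embedding] = field(default_factory=list)


def check_document(doc: Document) -> list[str]:
    """Return relationship violations for one document (empty list if consistent)."""
    problems = []
    for expected_index, chunk in enumerate(sorted(doc.chunks, key=lambda c: c.index)):
        if chunk.document_id != doc.id:
            problems.append(f"chunk {chunk.id} points to document {chunk.document_id}")
        if chunk.index != expected_index:  # assumes 0-based, contiguous indexes
            problems.append(f"chunk {chunk.id} has index {chunk.index}, expected {expected_index}")
        emb = chunk.embedding
        if emb is not None and (emb.chunk_id != chunk.id or emb.document_id != doc.id):
            problems.append(f"embedding {emb.id} does not point back to chunk {chunk.id}")
    for emb in doc.embeddings:
        if emb.document_id != doc.id:
            problems.append(f"embedding {emb.id} points to document {emb.document_id}")
    return problems
```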

Data Flow

Data in the GenAI application flows through the following stages:
  1. Document Ingestion: Documents are uploaded, processed, and chunked
  2. Embedding Generation: Document chunks are converted to vector embeddings
  3. Storage: Documents, their chunks, and their embeddings are persisted for later retrieval
  4. Retrieval: Relevant documents and document chunks are retrieved based on each user query
  5. Generation: LLMs generate responses using retrieved context and prompt templates
  6. Feedback: User feedback is collected to improve system performance
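
The six stages can be traced end to end in a small, self-contained sketch. Everything in it is a stand-in chosen for illustration, not the application's implementation: `toy_embed` is a bag-of-words counter in place of a real embedding model, splitting on blank lines stands in for chunking, an in-memory dict in the hypothetical `InMemoryStore` class stands in for the databases, and `build_prompt` stops at assembling the prompt rather than calling an LLM.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse bag-of-words vector (NOT a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryStore:
    # Stage 3 (storage): chunk text and its vector, keyed by chunk id
    chunks: dict[str, tuple[str, Counter]] = field(default_factory=dict)
    feedback: list[tuple[str, int]] = field(default_factory=list)

    def ingest(self, doc_id: str, content: str) -> None:
        # Stage 1 (ingestion and chunking): one chunk per non-empty paragraph, an assumption
        paragraphs = [p.strip() for p in content.split("\n\n") if p.strip()]
        for i, para in enumerate(paragraphs):
            # Stage 2 (embedding generation)
            self.chunks[f"{doc_id}-{i}"] = (para, toy_embed(para))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Stage 4 (retrieval): rank chunks by similarity to the embedded query
        q = toy_embed(query)
        ranked = sorted(self.chunks.items(), key=lambda item: cosine(q, item[1][1]), reverse=True)
        return [chunk_id for chunk_id, _ in ranked[:k]]

    def build_prompt(self, query: str, k: int = 2) -> str:
        # Stage 5 (generation): assemble retrieved context into a prompt for an LLM call
        context = "\n".join(self.chunks[cid][0] for cid in self.retrieve(query, k))
        return f"Context:\n{context}\n\nQuestion: {query}"

    def record_feedback(self, chunk_id: str, rating: int) -> None:
        # Stage 6 (feedback): keep ratings for later tuning
        self.feedback.append((chunk_id, rating))
```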

Version 8.8 Enhancements

Version 8.8 of the GenAI data model includes the following enhancements:
  • Multi-modal support for processing and generating text, images, and structured data
  • Enhanced embedding models with higher dimensionality and accuracy
  • Improved prompt management with version control and A/B testing capabilities
  • Advanced retrieval mechanisms with hybrid search (keyword + semantic)
  • Comprehensive feedback collection for continuous improvement
  • Fine-grained access controls for enterprise security requirements
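
Hybrid search has to merge a keyword ranking and a semantic ranking into one result list. One widely used way to do this is reciprocal rank fusion (RRF), in which each ranking contributes 1 / (k + rank) to every item it contains. Whether version 8.8 uses RRF, a weighted score combination, or another method is not stated here, so treat the function and its `k=60` default (the value commonly used with RRF) as assumptions.

```python
from __future__ import annotations

from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of chunk ids with reciprocal rank fusion (RRF).

    Each list adds 1 / (k + rank) to the score of every id it contains, with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

Because RRF works on ranks rather than raw scores, it needs no normalization between keyword scores and similarity scores, which sit on different scales. A chunk ranked well by both searches rises above one that only a single search ranks first.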