Chat System
Staque IO features a sophisticated conversation management system with thread-based chat, usage tracking, and cost monitoring.
Architecture
Conversation (Top-level container)
└─ Conversation Threads (Individual chat threads)
   └─ Messages (User & assistant messages with metadata)
      ├─ Message content
      ├─ Timestamp
      ├─ Token usage (input, output, total)
      ├─ Cost data
      └─ Model response time

Data Models
Conversation
A conversation is the top-level container that groups related chat threads together.
{
  id: "uuid",
  user_id: "uuid",
  title: "Customer Support Bot",
  use_case: "customer-support",
  status: "deployed",
  created_at: "2024-01-01T00:00:00Z",
  updated_at: "2024-01-10T12:00:00Z"
}

Conversation Thread
A thread contains the actual conversation history and usage statistics.
{
  id: "uuid",
  conversation_id: "uuid",
  user_id: "uuid",
  model_id: "amazon.nova-pro-v1:0",
  thread_data: {
    messages: [
      {
        role: "user",
        content: "Hello!",
        timestamp: "2024-01-10T12:00:00Z",
        metadata: { username: "john" }
      },
      {
        role: "assistant",
        content: "Hello! How can I help you?",
        timestamp: "2024-01-10T12:00:01Z",
        metadata: {
          model_id: "amazon.nova-pro-v1:0",
          model_response_time: 0.823,
          tokens_used: 42
        },
        tokens_in: 15,
        tokens_out: 27,
        tokens_total: 42,
        latency_ms: 823,
        model_id: "amazon.nova-pro-v1:0"
      }
    ],
    resource_id: "uuid",
    model_id: "amazon.nova-pro-v1:0"
  },
  tokens_input_total: 150,
  tokens_output_total: 380,
  tokens_total: 530,
  requests_count: 10,
  cost_total_usd: 0.0042,
  last_usage_at: "2024-01-10T12:00:00Z",
  created_at: "2024-01-10T10:00:00Z",
  updated_at: "2024-01-10T12:00:00Z"
}

Chat Flow
Creating a New Thread
POST /api/chat/thread
{
  "message": "Hello! How can you help me?",
  "conversationId": "conversation-uuid",
  "resourceId": "resource-uuid"
  // threadId omitted = new thread
}
Response:
{
  "success": true,
  "threadId": "thread-uuid",
  "messages": [
    { role: "user", content: "Hello! How can you help me?", ... },
    { role: "assistant", content: "Hello! I can assist with...", ... }
  ]
}
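From a client, the request above might look like the following — a minimal sketch assuming a same-origin deployment, with error handling omitted:

// Minimal client sketch: omitting threadId starts a new thread
const res = await fetch("/api/chat/thread", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    message: "Hello! How can you help me?",
    conversationId: "conversation-uuid",
    resourceId: "resource-uuid"
  })
})
const { threadId, messages } = await res.json()
// Persist threadId and send it on subsequent requests to continue this thread

Continuing a Thread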
POST /api/chat/thread
{
  "message": "Tell me more about pricing",
  "conversationId": "conversation-uuid",
  "resourceId": "resource-uuid",
  "threadId": "thread-uuid" // Include existing threadId
}
Response:
{
  "success": true,
  "threadId": "thread-uuid",
  "messages": [
    // ... previous messages ...
    { role: "user", content: "Tell me more about pricing", ... },
    { role: "assistant", content: "Our pricing is...", ... }
  ]
}

Retrieving Thread History
GET /api/chat/thread?threadId=thread-uuid&conversationId=conversation-uuid
Response:
{
  "success": true,
  "threadId": "thread-uuid",
  "messages": [
    { role: "user", content: "...", timestamp: "..." },
    { role: "assistant", content: "...", timestamp: "..." }
  ]
}
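As a sketch, the same retrieval from a client, passing both IDs as query parameters:

// Minimal client sketch: load an existing thread's history
const params = new URLSearchParams({
  threadId: "thread-uuid",
  conversationId: "conversation-uuid"
})
const res = await fetch(`/api/chat/thread?${params}`)
const { messages } = await res.json()

Model Integration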
AWS Bedrock Models
Bedrock models are invoked with conversation context and system prompts:
// For Amazon Nova models
{
  messages: [{
    role: "user",
    content: [{
      text: "System: You are a helpful assistant...\n\nConversation history:\nuser: Hello\nassistant: Hi there!\nuser: How are you?"
    }]
  }],
  inferenceConfig: {
    max_new_tokens: 1000,
    temperature: 0.1
  }
}
// For Anthropic Claude models
{
  messages: [{
    role: "user",
    content: "System: You are a helpful assistant...\n\nConversation history:\nuser: Hello\nassistant: Hi there!\nuser: How are you?"
  }],
  max_tokens: 4000,
  temperature: 0.7,
  anthropic_version: "bedrock-2023-05-31"
}
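As a rough sketch, a payload like the Nova example above could be sent with the AWS SDK for JavaScript v3 — the region and the response parsing here are assumptions that depend on your deployment:

import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime"

const client = new BedrockRuntimeClient({ region: "us-east-1" }) // assumed region

const response = await client.send(new InvokeModelCommand({
  modelId: "amazon.nova-pro-v1:0",
  contentType: "application/json",
  accept: "application/json",
  body: JSON.stringify({
    messages: [{ role: "user", content: [{ text: "System: ...\n\nConversation history:\n..." }] }],
    inferenceConfig: { max_new_tokens: 1000, temperature: 0.1 }
  })
}))

// InvokeModel returns raw bytes; decode and parse the JSON payload
const payload = JSON.parse(new TextDecoder().decode(response.body))

NVIDIA NIM Models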
NIM models use the OpenAI-compatible chat completions format:
{
  model: "mistralai/mistral-7b-instruct-v0.3",
  messages: [
    {
      role: "system",
      content: "You are a helpful assistant..."
    },
    {
      role: "user",
      content: "Hello"
    },
    {
      role: "assistant",
      content: "Hi there!"
    },
    {
      role: "user",
      content: "How are you?"
    }
  ],
  temperature: 0.1,
  max_tokens: 1000
}
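Because the format is OpenAI-compatible, a plain HTTP call works. In this sketch the base URL and the NVIDIA_API_KEY environment variable are assumptions — substitute your NIM deployment's endpoint and credentials:

const NIM_BASE_URL = process.env.NIM_BASE_URL ?? "http://localhost:8000" // assumed default

const res = await fetch(`${NIM_BASE_URL}/v1/chat/completions`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.NVIDIA_API_KEY}`
  },
  body: JSON.stringify({
    model: "mistralai/mistral-7b-instruct-v0.3",
    messages: [
      { role: "system", content: "You are a helpful assistant..." },
      { role: "user", content: "Hello" }
    ],
    temperature: 0.1,
    max_tokens: 1000
  })
})
const data = await res.json()
// OpenAI-compatible shape: the reply is at data.choices[0].message.content

System Prompts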
System prompts guide the model's behavior and can be customized per model:
Setting a System Prompt
POST /api/bedrock/system-prompt
{
  "modelId": "amazon.nova-pro-v1:0",
  "systemPrompt": "You are a customer support specialist. Be helpful, concise, and professional. Always provide actionable solutions."
}
Response:
{
  "success": true,
  "message": "System prompt updated successfully"
}

Retrieving a System Prompt
GET /api/bedrock/system-prompt?modelId=amazon.nova-pro-v1:0
Response:
{
  "success": true,
  "systemPrompt": "You are a customer support specialist..."
}

Usage Tracking
The system automatically tracks token usage and costs for each interaction:
Per-Message Tracking
- tokens_in: Input tokens consumed
- tokens_out: Output tokens generated
- tokens_total: Total tokens (in + out)
- latency_ms: Model response time in milliseconds
- cost: Calculated cost based on token pricing
Per-Thread Aggregates
- tokens_input_total: Sum of all input tokens
- tokens_output_total: Sum of all output tokens
- tokens_total: Sum of all tokens
- requests_count: Number of requests made
- cost_total_usd: Total cost in USD
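To illustrate how the per-thread aggregates relate to the per-message fields, here is a hypothetical reducer over a thread's messages. The platform computes these server-side; the field names mirror the thread model above, and counting assistant messages as requests is an assumption for this sketch:

function aggregateThread(messages) {
  return messages.reduce((acc, m) => ({
    tokens_input_total: acc.tokens_input_total + (m.tokens_in ?? 0),
    tokens_output_total: acc.tokens_output_total + (m.tokens_out ?? 0),
    tokens_total: acc.tokens_total + (m.tokens_total ?? 0),
    // Each assistant message corresponds to one model request in this sketch
    requests_count: acc.requests_count + (m.role === "assistant" ? 1 : 0)
  }), { tokens_input_total: 0, tokens_output_total: 0, tokens_total: 0, requests_count: 0 })
}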
Querying Usage
GET /api/usage/current?conversationId=conv-uuid
Response:
{
  "success": true,
  "mtd": {
    "cost_usd": 12.45,
    "tokens_in": 125000,
    "tokens_out": 187500,
    "tokens_total": 312500,
    "requests": 850
  },
  "last24h": {
    "cost_usd": 2.34,
    "tokens_in": 15000,
    "tokens_out": 22500,
    "tokens_total": 37500,
    "requests": 102
  }
}
The mtd object aggregates month-to-date usage; last24h covers the trailing 24 hours.

Cost Calculation
Costs are calculated based on model-specific token pricing:
// Example calculation
const inputTokens = 150
const outputTokens = 280
const inputPricing = 0.0008  // per 1K input tokens
const outputPricing = 0.0032 // per 1K output tokens

const cost =
  (inputTokens / 1000) * inputPricing +
  (outputTokens / 1000) * outputPricing

// cost = 0.00012 + 0.000896 = 0.001016 USD
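The same calculation as a reusable helper — a sketch, with the shape of the per-model pricing object assumed for illustration:

function calculateCost(tokensIn, tokensOut, pricing) {
  // pricing rates are per 1K tokens, matching the example above
  return (tokensIn / 1000) * pricing.inputPerK + (tokensOut / 1000) * pricing.outputPerK
}

calculateCost(150, 280, { inputPerK: 0.0008, outputPerK: 0.0032 }) // 0.001016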
Message Metadata
Each message can include additional metadata:
{
  role: "assistant",
  content: "Here's your answer...",
  timestamp: "2024-01-10T12:00:00Z",
  metadata: {
    model_id: "amazon.nova-pro-v1:0",
    model_response_time: 0.823,
    tokens_used: 42,
    username: "john"
  },
  tokens_in: 15,
  tokens_out: 27,
  tokens_total: 42,
  latency_ms: 823,
  model_id: "amazon.nova-pro-v1:0"
}

Legacy Chat System
The platform previously used a chat_messages table for storing messages. This has been replaced by the thread-based system stored in conversation_threads. A legacy endpoint is still available for backwards compatibility:
GET /api/chat/legacy?conversationId=conv-uuid
// Returns messages from the old chat_messages table
Best Practices
- Thread Management: Create new threads for distinct conversations
- Context Length: Monitor conversation length to avoid token limits
- System Prompts: Use specific, clear system prompts for better results
- Cost Monitoring: Regularly check usage statistics to manage costs
- Error Handling: Implement retry logic for transient failures
- Rate Limiting: Implement client-side throttling if needed
⚠️ Token Limits
Different models have different context window limits:
- Amazon Nova Pro: ~300K tokens
- Claude 3 Sonnet: ~200K tokens
- Most NVIDIA models: ~32K tokens
Implement conversation summarization or truncation for long conversations.
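A minimal truncation sketch, assuming each message carries tokens_total as in the thread model above (a real implementation might summarize dropped turns instead of discarding them):

function truncateHistory(messages, maxTokens) {
  const kept = []
  let used = 0
  // Walk backwards so the most recent messages are kept
  for (let i = messages.length - 1; i >= 0; i--) {
    const t = messages[i].tokens_total ?? 0
    if (used + t > maxTokens) break
    kept.unshift(messages[i])
    used += t
  }
  return kept
}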