Deployment Guide

Staque IO supports deploying AI models on three platforms: AWS Bedrock, AWS SageMaker, and NVIDIA NIM. Each platform has its own setup time, pricing model, and best-fit use cases.

Deployment Platforms

Deployment Workflow

1. Browse Available Models
   └─ GET /api/models/{platform}

2. Get AI Recommendation (Optional)
   └─ POST /api/ai/recommendations

3. Create Conversation & Deploy Resource
   └─ POST /api/conversations
       ├─ Creates conversation
       ├─ Deploys resource
       └─ Creates model configuration

4. Configure System Prompt (Optional)
   └─ POST /api/bedrock/system-prompt

5. Start Chatting
   └─ POST /api/chat/thread
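
The workflow above can be sketched as an ordered list of HTTP calls. This is a minimal illustration: only the methods and paths come from this guide. The workflow_requests helper, its include_optional flag, and the use of "bedrock" as the {platform} path value are assumptions.

```python
def workflow_requests(platform: str, include_optional: bool = True) -> list[tuple[str, str]]:
    """Return the (method, path) pairs of the deployment workflow, in order."""
    # (method, path, is_optional) for each of the five workflow steps.
    steps = [
        ("GET", f"/api/models/{platform}", False),
        ("POST", "/api/ai/recommendations", True),
        ("POST", "/api/conversations", False),
        ("POST", "/api/bedrock/system-prompt", True),
        ("POST", "/api/chat/thread", False),
    ]
    return [
        (method, path)
        for method, path, optional in steps
        if include_optional or not optional
    ]


if __name__ == "__main__":
    for method, path in workflow_requests("bedrock"):
        print(method, path)
```

Passing include_optional=False leaves only the three mandatory calls: browse, create and deploy, and chat.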

Platform Comparison

Feature         AWS Bedrock                     AWS SageMaker                    NVIDIA NIM
Setup Time      Instant                         5-15 minutes                     Instant
Pricing Model   Per token                       Per hour                         Per token
Idle Cost       $0                              Instance cost                    $0
Customization   System prompt only              Full model control               System prompt only
Scalability     Auto-scaled                     Manual scaling                   Auto-scaled
VPC Support     No                              Yes                              No
Best For        Quick start, foundation models  Custom models, VPC requirements  Multi-cloud, specific NVIDIA models
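
The comparison can also be read as a filter over requirements. The sketch below encodes three rows of the table (VPC support, idle cost, customization) and returns the platforms that satisfy a set of needs. The PLATFORMS dictionary and the candidate_platforms function are hypothetical names used for illustration, not part of Staque IO.

```python
# Traits taken from the comparison table; the key names are illustrative.
PLATFORMS = {
    "bedrock": {"vpc": False, "idle_cost": False, "full_model_control": False},
    "sagemaker": {"vpc": True, "idle_cost": True, "full_model_control": True},
    "nvidia-nim": {"vpc": False, "idle_cost": False, "full_model_control": False},
}


def candidate_platforms(need_vpc=False, need_full_control=False, allow_idle_cost=True):
    """Return the platform names whose traits satisfy the stated requirements."""
    result = []
    for name, traits in PLATFORMS.items():
        if need_vpc and not traits["vpc"]:
            continue
        if need_full_control and not traits["full_model_control"]:
            continue
        if not allow_idle_cost and traits["idle_cost"]:
            continue
        result.append(name)
    return result
```

An empty result signals conflicting requirements, for example requiring VPC support while refusing any idle cost.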

General Deployment Steps

Step 1: Choose Your Model

Browse the available models through the UI or the API. Consider factors such as:

  • Use Case: Text generation, code, embeddings, etc.
  • Performance: Response time, throughput requirements
  • Cost: Token pricing vs instance pricing
  • Compliance: Data residency, encryption requirements

Step 2: Configure Deployment

Provide the deployment configuration. The resource_type field accepts "bedrock", "sagemaker", or "nvidia-nim":

{
  "title": "Production AI Assistant",
  "use_case": "customer-support",
  "deployed_resource": {
    "resource_name": "Support Bot",
    "resource_type": "bedrock",
    "aws_resource_id": "amazon.nova-pro-v1:0",
    "region": "eu-north-1",
    "estimated_hourly_cost": 0
  }
}
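
A configuration like the one above could be assembled and checked in code before it is sent. The following is a minimal sketch: the field names come from the example, but the build_deployment_config helper and its validation rules (known resource type, non-negative cost) are assumptions rather than documented server behavior.

```python
import json

ALLOWED_RESOURCE_TYPES = {"bedrock", "sagemaker", "nvidia-nim"}


def build_deployment_config(title, use_case, resource_name, resource_type,
                            resource_id, region, estimated_hourly_cost=0):
    """Build the deployment configuration as a dict, rejecting obvious mistakes early."""
    if resource_type not in ALLOWED_RESOURCE_TYPES:
        raise ValueError(f"unknown resource_type: {resource_type!r}")
    if estimated_hourly_cost < 0:
        raise ValueError("estimated_hourly_cost must not be negative")
    return {
        "title": title,
        "use_case": use_case,
        "deployed_resource": {
            "resource_name": resource_name,
            "resource_type": resource_type,
            "aws_resource_id": resource_id,
            "region": region,
            "estimated_hourly_cost": estimated_hourly_cost,
        },
    }


if __name__ == "__main__":
    config = build_deployment_config(
        "Production AI Assistant", "customer-support", "Support Bot",
        "bedrock", "amazon.nova-pro-v1:0", "eu-north-1",
    )
    print(json.dumps(config, indent=2))
```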

Step 3: Set System Prompt

Configure the model's behavior with a custom system prompt:

POST /api/bedrock/system-prompt

{
  "modelId": "amazon.nova-pro-v1:0",
  "systemPrompt": "You are a helpful customer support assistant..."
}
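
As an illustration, the request above could be prepared with Python's standard library. This sketch only builds the request. The base URL is a placeholder, and no authentication headers are included because this guide does not describe them.

```python
import json
import urllib.request


def system_prompt_request(base_url: str, model_id: str, system_prompt: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that sets a system prompt."""
    body = json.dumps({"modelId": model_id, "systemPrompt": system_prompt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/bedrock/system-prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "http://localhost:3000" is a hypothetical base URL for a local instance.
    req = system_prompt_request(
        "http://localhost:3000",
        "amazon.nova-pro-v1:0",
        "You are a helpful customer support assistant...",
    )
    print(req.get_method(), req.full_url)
```

To send the request, pass the returned object to urllib.request.urlopen.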

Step 4: Test and Monitor

After deployment:

  • Test the model with sample queries
  • Monitor response times and error rates
  • Track token usage and costs
  • Adjust configuration as needed
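
One lightweight way to track these figures during testing is a small in-process recorder. The CallStats class below is a hypothetical helper, not part of Staque IO. It assumes the caller measures latency and reads token counts from each response.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class CallStats:
    """Accumulates latency, error, and token figures for a series of model calls."""
    latencies_ms: list = field(default_factory=list)
    errors: int = 0
    tokens: int = 0

    def record(self, latency_ms: float, ok: bool, tokens: int = 0) -> None:
        """Record one call: its latency, whether it succeeded, and tokens consumed."""
        self.latencies_ms.append(latency_ms)
        if not ok:
            self.errors += 1
        self.tokens += tokens

    @property
    def error_rate(self) -> float:
        """Fraction of recorded calls that failed (0.0 when nothing was recorded)."""
        return self.errors / len(self.latencies_ms) if self.latencies_ms else 0.0

    @property
    def mean_latency_ms(self) -> float:
        """Average latency over all recorded calls, in milliseconds."""
        return mean(self.latencies_ms) if self.latencies_ms else 0.0
```

For production use, the same figures would more likely be sent to a monitoring service than kept in memory.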

Environment Variables

Set the following environment variables. Each group is required only for the platforms named in its comment:

# AWS Configuration (Required for Bedrock & SageMaker)
STAQUE_AWS_REGION=eu-north-1
STAQUE_AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
STAQUE_AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# SageMaker Specific (Required for SageMaker deployments)
SAGEMAKER_SUBNET_IDS=subnet-12345,subnet-67890
SAGEMAKER_SECURITY_GROUP_IDS=sg-12345678
SAGEMAKER_EXECUTION_ROLE_ARN=arn:aws:iam::123456789012:role/SageMakerRole

# NVIDIA NIM (Required for NVIDIA NIM deployments)
NVIDIA_API_KEY=nvapi-xxxxxxxxxxxxx
NIM_BASE_URL=https://integrate.api.nvidia.com
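
A startup check can catch missing variables before a deployment is attempted. The sketch below groups the variable names listed above by platform. The REQUIRED_ENV mapping and the missing_env function are illustrative names, and treating empty values as missing is an assumption.

```python
import os

_AWS = ["STAQUE_AWS_REGION", "STAQUE_AWS_ACCESS_KEY_ID", "STAQUE_AWS_SECRET_ACCESS_KEY"]

REQUIRED_ENV = {
    "bedrock": _AWS,
    "sagemaker": _AWS + [
        "SAGEMAKER_SUBNET_IDS",
        "SAGEMAKER_SECURITY_GROUP_IDS",
        "SAGEMAKER_EXECUTION_ROLE_ARN",
    ],
    "nvidia-nim": ["NVIDIA_API_KEY", "NIM_BASE_URL"],
}


def missing_env(platform, env=None):
    """Return the required variable names that are unset or empty for a platform."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[platform] if not env.get(name)]


if __name__ == "__main__":
    for platform in REQUIRED_ENV:
        print(platform, "missing:", missing_env(platform))
```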

Cost Optimization

Bedrock & NIM (Token-Based)

  • No idle cost: you pay only for the tokens you use
  • Optimize prompts to reduce token usage
  • Use cheaper models for simple tasks
  • Implement caching for repeated queries
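
The caching point can be illustrated with an exact-match cache placed in front of the model call. This is a minimal sketch, not a Staque IO feature. ResponseCache is a hypothetical class, and call_model stands for whatever function actually sends the prompt.

```python
import hashlib


class ResponseCache:
    """Caches responses by (model ID, prompt) so identical queries are billed once."""

    def __init__(self, call_model):
        self._call_model = call_model  # callable(model_id, prompt) -> response text
        self._store = {}

    def ask(self, model_id: str, prompt: str) -> str:
        """Return the cached response if present; otherwise call the model and cache it."""
        key = hashlib.sha256(f"{model_id}\n{prompt}".encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._call_model(model_id, prompt)
        return self._store[key]
```

The sketch has no expiry or size limit and only matches byte-identical prompts, so it suits FAQ-style traffic better than free-form conversation.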

SageMaker (Instance-Based)

  • Right-size instances based on actual load
  • Use auto-scaling to match demand
  • Delete endpoints when not in use
  • Consider Spot instances for dev/test
  • Use smaller instances for low-traffic models
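
To compare the two pricing models, it can help to compute the traffic level at which an always-on instance costs the same as per-token billing. The function below is a simplified sketch. It assumes a single blended price per 1,000 tokens, whereas real pricing usually distinguishes input and output tokens. The figures in the usage note are made up.

```python
def breakeven_tokens_per_hour(instance_hourly_cost: float, price_per_1k_tokens: float) -> float:
    """Tokens per hour at which per-token billing equals the instance's hourly cost."""
    if price_per_1k_tokens <= 0:
        raise ValueError("price_per_1k_tokens must be positive")
    return instance_hourly_cost / price_per_1k_tokens * 1000
```

With a hypothetical instance at $1.00 per hour and a hypothetical token price of $0.002 per 1,000 tokens, the break-even point is about 500,000 tokens per hour. Below that sustained volume, token-based pricing is cheaper under these assumptions.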

Security Best Practices

  • AWS Credentials: Use IAM roles with minimal required permissions
  • API Keys: Never commit keys to version control
  • VPC: Use VPC isolation for SageMaker endpoints handling sensitive data
  • Encryption: Enable encryption at rest and in transit
  • Access Control: Implement role-based access in your application
  • Monitoring: Set up CloudWatch alarms for unusual activity

Troubleshooting

Common Issues

Error: Model not found

Solution: Verify that the model ID is correct and that the model is available in your region. Some models are offered only in specific regions.

Error: Insufficient permissions

Solution: Check the IAM permissions attached to your AWS credentials. Ensure that bedrock:InvokeModel (for Bedrock) or sagemaker:InvokeEndpoint (for SageMaker) is granted.
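
For the Bedrock case, a least-privilege policy might look like the one generated below. This is a sketch under assumptions: it scopes access to one foundation model using the standard Bedrock foundation-model ARN format. Whether Staque IO needs further actions, such as bedrock:InvokeModelWithResponseStream for streaming, is not stated in this guide. The invoke_policy helper is hypothetical.

```python
import json


def invoke_policy(region: str, model_id: str) -> dict:
    """Build an IAM policy document allowing invocation of one Bedrock foundation model."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel"],
                # Foundation-model ARNs have an empty account ID field.
                "Resource": f"arn:aws:bedrock:{region}::foundation-model/{model_id}",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(invoke_policy("eu-north-1", "amazon.nova-pro-v1:0"), indent=2))
```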

SageMaker endpoint stuck in "Creating"

Solution: Check the VPC configuration (subnets and security groups) and ensure that the execution role has the required permissions. Typical creation time is 5-10 minutes.
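
To tell a slow creation from a stuck one, the endpoint status can be polled with a timeout. The sketch below is independent of any SDK: get_status is a caller-supplied function returning the current status string, and the 15-minute default timeout is an assumption chosen to exceed the typical creation time.

```python
import time


def wait_for_endpoint(get_status, timeout_s=900, poll_s=30, sleep=time.sleep):
    """Poll until the status is no longer "Creating" and return the final status.

    Raises TimeoutError if the endpoint is still "Creating" after timeout_s seconds.
    """
    waited = 0
    while True:
        status = get_status()
        if status != "Creating":
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"endpoint still Creating after {waited} seconds")
        sleep(poll_s)
        waited += poll_s
```

With boto3, get_status could wrap the SageMaker client's describe_endpoint call and return the "EndpointStatus" field of its response. When the status is "Failed", the same response carries a "FailureReason" field that usually points to the misconfigured subnet, security group, or role.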

Next Steps