Documentation - KDCube Agentic Prototyping Platform

Introduction

KDCube APP is a self-hosted platform and SDK for building and operating agentic chat applications and copilots. It ships the full stack: a streaming runtime, tool execution, memory and context management, economics (budgets and usage accounting), and a hosting platform for multi-tenant production deployment.

What You Get Out of the Box

Streaming Chat

REST, SSE, and Socket.IO with steps/deltas/status events and role-based streaming filtering

Tool Execution

Local tools, MCP tools, and skills (built-in and custom) with easy wiring

Code Execution

Isolated Docker runtime for untrusted code generation and execution

ReAct Strategic Solver

Skills acquisition and exploitation, adaptive agent selection, planning, and tool-first/code-first flows

Web Search

Multi-backend search agent with Brave and DuckDuckGo integration

Knowledge Base

Ingestion, hybrid search, and citations with pgvector integration

Memory & Context

Retrieval, turn memories, conversation memories, user-level memories, signals, and reconciliation

Economics & Accounting

Budgets, rate limits, usage reporting, and cost tracking per tenant/project/user

Framework-Agnostic: Build your workflows in LangGraph, LangChain, CrewAI, AutoGen, or custom Python. KDCube provides the runtime infrastructure to host, scale, and govern them.

Architecture & Components

KDCube APP consists of two main layers: the SDK for building agent applications and the Platform for hosting and scaling them in production.

System Architecture

graph TD
    %% Entry / Auth
    UI[Web UI / Client] -->|HTTPS + masked cookie| NGINX[Web Proxy / Nginx]
    AUTH["ProxyLogin (Delegated Auth + 2FA)"] -->|token exchange| NGINX
    NGINX -->|real auth/id cookies| GATE[Chat API + Gateway]
    KB[Knowledge Base Service] --> GATE
    CP[Control Plane / Project Mgmt] --> GATE

    %% Transport + Gateway
    NGINX -->|SSE / Socket.IO| GATE
    NGINX -->|REST| GATE
    GATE -->|session mgmt| SESS[Session Manager]
    GATE -->|rate limit/backpressure| GW[Gateway + Throttling]

    %% Queue + Processing
    GATE -->|enqueue| Q[Redis Queues]
    Q --> PROC[Chat Processor Workers]

    %% Orchestration
    PROC --> BUNDLES[Dynamic Bundles / Workflows]
    BUNDLES -->|events| RELAY[ChatRelay + Redis Pub/Sub]
    RELAY -->|fan-out| GATE

    %% Context management
    BUNDLES --> CTX[Context Management]
    CTX -->|storage| PG["(Postgres RDS)"]
    CTX -->|artifacts| S3[(S3)]
    KB -->|storage| PG
    KB -->|artifacts| S3
    CP -->|policies + quotas| PG

    %% Runtime + providers
    BUNDLES --> RT["Runtime (LLM + Tools)"]
    RT --> DOCKER[Ephemeral Docker Exec]
    RT --> TOOLS[External Tools / APIs]
    subgraph EXTPROV[External Providers]
        OAI[OpenAI]
        ANTH[Anthropic]
        GEM[Gemini]
        BRAVE[Brave Search]
        DDG[DuckDuckGo]
    end
    RT --> OAI
    RT --> ANTH
    RT --> GEM
    RT --> BRAVE
    RT --> DDG

    %% Cache/Queues/PubSub
    BUNDLES -->|cache/queues/pubsub| REDIS["(Redis / ElastiCache)"]

    classDef aws fill:#e8f4ff,stroke:#7aa7d6,color:#0b2b4f;
    classDef ext fill:#f2f7ee,stroke:#8fbf7a,color:#1f3b1c;
    classDef infra fill:#f7f2ff,stroke:#b69ad6,color:#2b1b4f;
    class PG,REDIS,S3 aws;
    class OAI,ANTH,GEM,BRAVE,DDG,TOOLS ext;
    class AUTH infra;

SDK Components (Build)

Agent Runtime

ReAct agent patterns
Planning & orchestration
Tool-first and code-first flows
Adaptive agent selection

Streaming Channels

REST + SSE + Socket.IO
Token/step events
Role-based filtering
Redis Pub/Sub relay

Tools & Skills

Local tools integration
MCP tools support
Built-in and custom skills
Easy custom wiring

Memory & Context

Turn memories
Conversation memories
User-level memories
Signals and reconciliation

Code Execution

Isolated Docker runtime
Untrusted code support
Resource limits
Security sandboxing

Attachments & Artifacts

File upload handling
Generated files
Security checks
Storage integration (S3)

Economics & Accounting

Usage tracking
Budget enforcement
Rate limits
Cost reporting

Bundle API

LangGraph workflows
LangChain integration
Custom Python apps
Dynamic registration

Platform Components (Host & Scale)

Multi-Tenant Isolation

Per-tenant/project schemas
Storage segregation
Namespace separation
Policy enforcement

Gateway

Authentication (Cognito/SimpleIDP)
Rate limiting
Backpressure control
Circuit breakers

Knowledge Base

Document ingestion
pgvector embeddings
Hybrid search
Citation tracking

Dynamic UI Widgets

Monitoring dashboards
Control plane UI
Conversation browser
Spending reports

Horizontal Scaling

Stateless web service
Queue/processor model
Redis relay fan-out
Load balancing

Storage Layer

Postgres (RDS)
Redis (ElastiCache)
S3 (artifacts)
Neo4j (optional)

Bundles: Deployable agent apps. Multiple bundles can be registered and selected per message. One bundle executes per request; different requests can target different bundles.
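
The per-message bundle selection described above can be pictured with a small sketch. The `BundleRegistry` class, its `dispatch` method, and the fallback to a default bundle are all assumptions made for illustration; this is not the KDCube registry API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class BundleRegistry:
    """Hypothetical in-memory registry; not the KDCube registry API."""
    default_id: str
    _bundles: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, bundle_id: str, handler: Callable[[str], str]) -> None:
        self._bundles[bundle_id] = handler

    def dispatch(self, message: dict) -> str:
        # Exactly one bundle handles a given request; fall back to the default
        bundle_id = message.get("bundle_id") or self.default_id
        if bundle_id not in self._bundles:
            raise KeyError(f"unknown bundle: {bundle_id}")
        return self._bundles[bundle_id](message["content"])


registry = BundleRegistry(default_id="echo")
registry.register("echo", lambda text: f"echo: {text}")
registry.register("shout", lambda text: text.upper())

print(registry.dispatch({"content": "hello"}))                        # echo: hello
print(registry.dispatch({"content": "hello", "bundle_id": "shout"}))  # HELLO
```

Two requests in the same process reach different bundles, while each individual request runs exactly one.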

Key Features

Streaming Runtime

KDCube supports three client transports for maximum flexibility:

SSE (Server-Sent Events): Primary streaming transport, default in current UI
Socket.IO: Fully supported alternative with bidirectional communication
REST: Non-streaming endpoints for profile, admin, and monitoring operations

Streaming Flow

sequenceDiagram
    participant UI as Client UI
    participant API as Chat API
    participant RL as Redis Relay
    participant Q as Redis Queue
    participant W as Worker / Bundle
    UI->>API: open stream (SSE / Socket.IO connect)
    UI->>API: send message (SSE / Socket.IO)
    API->>Q: enqueue task (per user_type queue)
    W->>Q: dequeue + lock
    W->>RL: publish chat_* events to session channel
    RL-->>API: fan-out to connected stream
    API-->>UI: chat_start/step/delta/complete
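
The flow in the diagram can be traced with a minimal in-memory simulation. Everything here is a stand-in: plain Python containers replace the Redis queue and the Pub/Sub relay, the function names are hypothetical, and `chat_step` events and task locking are left out for brevity.

```python
from collections import defaultdict, deque

# Hypothetical in-memory stand-ins for the Redis queue and the Redis Pub/Sub relay
queues = defaultdict(deque)       # one task queue per user_type
subscribers = defaultdict(list)   # session channel -> list of delivery callbacks


def open_stream(session_id, deliver):
    """Client opens a stream: the API subscribes it to the session channel."""
    subscribers[session_id].append(deliver)


def send_message(session_id, user_type, text):
    """API enqueues the task on the queue for this user type."""
    queues[user_type].append({"session_id": session_id, "text": text})


def publish(session_id, event, payload):
    """Relay fans the event out to every stream connected to the session."""
    for deliver in subscribers[session_id]:
        deliver(event, payload)


def run_worker(user_type):
    """Worker dequeues one task and publishes chat_* events for it."""
    task = queues[user_type].popleft()
    sid = task["session_id"]
    publish(sid, "chat_start", {})
    for token in task["text"].split():
        publish(sid, "chat_delta", {"content": token})
    publish(sid, "chat_complete", {})


received = []
open_stream("s1", lambda event, payload: received.append(event))
send_message("s1", "registered", "hello streaming world")
run_worker("registered")
print(received)
```

The worker never talks to the client directly: it only publishes to the session channel, which is what lets any API instance holding the stream deliver the events.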

Multi-Tenancy & Storage

Production-grade isolation for enterprise deployments:

Postgres: Per-tenant and per-project schemas (prod/dev separated) with control_plane schema for policies and quotas
S3: Bucket per tenant/project or shared bucket with prefix segmentation for artifacts and documents
Redis: Cache, messaging (Pub/Sub), and rate-limit counters
Neo4j: Optional graph database (currently disabled)
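
As an illustration of the segmentation above, the sketch below derives a Postgres schema name, an S3 prefix, and a Redis key prefix from a tenant/project pair. The naming patterns are assumptions chosen for the example; the source does not state the actual conventions KDCube uses.

```python
import re


def _slug(value: str) -> str:
    """Lower-case and replace anything outside [a-z0-9] with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", value.lower()).strip("_")


def tenant_resources(tenant: str, project: str, env: str = "prod") -> dict:
    # All naming patterns below are assumptions made for this sketch
    t, p = _slug(tenant), _slug(project)
    return {
        "pg_schema": f"{t}_{p}_{env}",
        "s3_prefix": f"{t}/{p}/",
        "redis_prefix": f"{t}:{p}:",
    }


print(tenant_resources("Acme Corp", "support-bot", env="dev"))
```

Deriving every resource name from the same normalized pair keeps the three stores aligned, so a tenant's data can be located or removed as one unit.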

Economics & Cost Management

Enterprise-grade cost control and reporting:

Gateway rate limiting: Protect system resources with configurable limits
Economics rate limiting: Tier policies, per-user quotas, and concurrency locks
Budget enforcement: Per-tenant, per-project, and per-pipeline budget controls
Usage tracking: Token-level granularity with real-time dashboards
Cost reporting: Comprehensive spending analysis and aggregations

Security & Sandboxing

Authentication, isolation, and audit controls:

Delegated authentication: ProxyLogin with hosted 2FA support
Cookie-based auth: Secure token exchange with masked cookies
Docker isolation: Ephemeral containers for code execution
Network isolation: Tool calls proxied through supervisor
Resource limits: CPU, memory, and execution time constraints
Audit trails: Comprehensive logging for compliance

Knowledge Base & RAG

Production-ready retrieval-augmented generation:

Document ingestion: Support for multiple file formats (PDF, DOCX, TXT, etc.)
Hybrid search: Combine vector similarity with keyword search
pgvector integration: Scalable vector database on Postgres
Citation tracking: Structured citations with source tracking and in-stream rendering
Reranking: Improve retrieval quality with reranking models
Multi-tenant: Isolated knowledge bases per tenant/project
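
One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below uses RRF purely as an illustration; the source does not say which fusion method KDCube applies, and the document ids and hit lists are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; ties broken by doc id for stability
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["doc_a", "doc_b", "doc_c"]    # e.g. from a pgvector similarity query
keyword_hits = ["doc_b", "doc_d", "doc_a"]   # e.g. from a full-text query

print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

Documents that appear in both lists rise to the top, which is the practical benefit of hybrid search over either retriever alone.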

Getting Started

Get up and running with KDCube APP in minutes using one of these quickstart options:

Option 1: CLI Installer (Recommended)

The fastest way to get started with guided installation:

# Install the CLI
pip install kdcube-ai-app

# Run the interactive installer
kdcube-install

# Follow the prompts to configure:
# - Database connection (Postgres)
# - Redis connection
# - S3 storage (optional)
# - LLM API keys
# - Authentication provider

The CLI installer handles:

Database schema deployment
Environment configuration
Service initialization
Health checks and validation

Option 2: Docker Compose

Run the complete stack locally with Docker:

# Clone the repository
git clone https://github.com/elenaviter/kdcube-ai-app.git
cd kdcube-ai-app

# Copy environment template
cp .env.example .env

# Edit .env with your configuration
# - Set database credentials
# - Add LLM API keys (OpenAI, Anthropic, etc.)
# - Configure storage settings

# Start all services
docker-compose -f app/ai-app/deployment/docker/all_in_one/docker-compose.yml up -d

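# View logs (pass the same compose file used to start the stack)
docker-compose -f app/ai-app/deployment/docker/all_in_one/docker-compose.yml logs -f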

# Access the UI at http://localhost:8080

Initial Configuration

After installation, configure the platform:

Set up LLM providers: Add API keys for OpenAI, Anthropic, or Gemini
Configure authentication: Set up Cognito or use SimpleIDP for development
Create first project: Use the CLI or admin API to create a tenant/project
Upload knowledge: Ingest documents into the knowledge base
Register bundles: Deploy your first agent workflow

Development vs. Production: Use local Postgres and Redis for development. For production, use managed services (RDS, ElastiCache, S3) for better reliability and scaling.

Deployment Options

KDCube APP supports multiple deployment models to fit your infrastructure requirements:

Docker Compose (Development)

Perfect for local development and testing:

All services run in containers
Local Postgres and Redis
File system storage
SimpleIDP authentication
Ideal for: Development, testing, demos

Kubernetes (Production)

Enterprise-grade production deployment:

Horizontal scaling of workers and API
High availability and fault tolerance
Health checks and auto-healing
Rolling updates with zero downtime
Ideal for: Production, multi-tenant SaaS

AWS Deployment

Fully managed AWS infrastructure:

Compute: ECS Fargate for stateless services
Database: RDS Postgres with Multi-AZ
Cache: ElastiCache Redis
Storage: S3 for artifacts
Auth: Cognito for user management
Networking: VPC with private subnets
Ideal for: Enterprise production, regulated industries

Storage Configuration

Component	Development	Production
Database	Local Postgres	RDS Postgres (Multi-AZ)
Cache	Local Redis	ElastiCache Redis (Cluster mode)
Storage	File system	S3 (versioning enabled)
Auth	SimpleIDP	Cognito with MFA
Graph DB	Optional (disabled)	Neo4j (optional)

Scaling Considerations

Design for horizontal scaling:

Stateless API: Scale web tier independently based on traffic
Worker pools: Scale processor workers based on queue depth
Redis relay: Fan-out pattern supports distributed connections
Database: Read replicas for read-heavy workloads
Storage: S3 automatically scales with demand
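
Scaling workers on queue depth can be reduced to a small sizing rule. The sketch below is one possible rule under assumed parameters (`tasks_per_worker`, `min_workers`, `max_workers` are hypothetical knobs, not KDCube settings).

```python
import math


def desired_workers(queue_depth: int, tasks_per_worker: int = 5,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Return a worker count proportional to queue depth, clamped to bounds."""
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    needed = math.ceil(queue_depth / tasks_per_worker)
    return max(min_workers, min(max_workers, needed))


for depth in (0, 12, 500):
    print(depth, desired_workers(depth))
```

The clamp matters in both directions: the floor keeps a worker warm when the queue is empty, and the ceiling protects downstream resources such as database connections and LLM rate limits.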

Agent Definitions

KDCube includes reference implementations demonstrating platform capabilities. These agents can be forked and customized for your use cases:

Conversation Mapping

Analyze and visualize conversation flows with turn-by-turn mapping and pattern detection.

Error Tracking

Monitor, analyze, and report system errors with automatic categorization and alerting.

Identifying Focus Areas

Extract key topics and focus areas from conversations using semantic analysis.

Benchmark Builder

Create and manage evaluation benchmarks for model performance testing.

Model Benchmarking

Compare LLM performance across metrics with automated testing and reporting.

LLM Distillation

Train smaller models from larger ones using knowledge distillation techniques.

Continuous Refinement

Iteratively improve agent responses using feedback loops and learning.

Knowledge Analyst

Enterprise RAG agent with hybrid search, citations, and audit trails.

Code Execution

Execute Python code in sandboxed Docker containers with resource limits.

Web Research

Multi-backend web search with Brave and DuckDuckGo integration.

Strategic Solver

ReAct agent with planning, tool selection, and adaptive reasoning.

Document Processing

Extract, analyze, and process documents with multi-format support.

Marketing Writer

Generate marketing content with brand compliance and output validation.

Customization: All agent implementations are open source. Fork them to create specialized agents for your domain, or use them as templates for building new agents from scratch.

API & Integration

Client Transports

Connect to KDCube using multiple transport protocols:

SSE (Server-Sent Events)

// Connect to SSE stream
const eventSource = new EventSource('/sse/stream?session_id=xxx');

eventSource.addEventListener('chat_start', (e) => {
    const data = JSON.parse(e.data);
    console.log('Chat started:', data);
});

eventSource.addEventListener('chat_delta', (e) => {
    const data = JSON.parse(e.data);
    console.log('Token:', data.content);
});

eventSource.addEventListener('chat_complete', (e) => {
    const data = JSON.parse(e.data);
    console.log('Chat completed:', data);
});
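
For clients outside the browser, the same event stream can be consumed by parsing the SSE wire format directly. The sketch below is a simplified parser that handles only the `event:` and `data:` fields; the sample payloads mirror the `content` field used above, and the stream text itself is made up for the example.

```python
import json


def parse_sse(raw: str):
    """Yield (event_name, data) pairs from a block of SSE text.

    Handles the `event:` and `data:` fields; events are separated by a blank line.
    """
    event, data_lines = "message", []
    for line in raw.splitlines():
        if line == "":
            if data_lines:  # a blank line dispatches the pending event
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip())
    if data_lines:  # tolerate a final event without a trailing blank line
        yield event, json.loads("\n".join(data_lines))


stream = (
    "event: chat_start\ndata: {}\n\n"
    'event: chat_delta\ndata: {"content": "Hel"}\n\n'
    'event: chat_delta\ndata: {"content": "lo"}\n\n'
    "event: chat_complete\ndata: {}\n\n"
)
events = list(parse_sse(stream))
text = "".join(d["content"] for name, d in events if name == "chat_delta")
print([name for name, _ in events], text)
```

Concatenating the `chat_delta` payloads in arrival order reconstructs the full response text.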

Socket.IO

// Connect to Socket.IO (client from the socket.io-client package)
import { io } from 'socket.io-client';

const socket = io('http://localhost:8080', {
    auth: { token: 'your-auth-token' }
});

socket.on('connect', () => {
    console.log('Connected');
});

socket.emit('chat_message', {
    content: 'Hello!',
    bundle_id: 'my-agent'
});

socket.on('chat_start', (data) => {
    console.log('Chat started:', data);
});

socket.on('chat_delta', (data) => {
    console.log('Token:', data.content);
});

socket.on('chat_complete', (data) => {
    console.log('Chat completed:', data);
});

REST API

// Send chat message via REST
fetch('/api/chat/send', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer your-token'
    },
    body: JSON.stringify({
        content: 'Hello!',
        bundle_id: 'my-agent',
        project_id: 'my-project'
    })
})
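.then(response => {
    // fetch only rejects on network errors, so check the HTTP status explicitly
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
})
.then(data => console.log(data))
.catch(err => console.error('Request failed:', err));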

Authentication

KDCube supports multiple authentication methods:

Header-based: Authorization: Bearer <token>
Cookie-based: Secure cookies; the web proxy exchanges the client's masked cookie for the real auth/ID cookies
SSE query params: ?token=<token> for compatibility
Socket.IO auth: Auth payload during handshake
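
The methods above differ mainly in where the token travels. The helpers below are a hypothetical sketch of the three token-carrying variants from a Python client; the function names are assumptions, and the `/sse/stream` path and `session_id` parameter are taken from the SSE example in this document. The query-parameter form exists because the browser `EventSource` API cannot set custom headers.

```python
from urllib.parse import urlencode


def bearer_headers(token: str) -> dict:
    """Header-based auth for REST calls."""
    return {"Authorization": f"Bearer {token}"}


def sse_url(base: str, session_id: str, token: str) -> str:
    """Query-parameter auth for SSE clients that cannot set headers."""
    return f"{base}?{urlencode({'session_id': session_id, 'token': token})}"


def socketio_auth(token: str) -> dict:
    """Auth payload passed during the Socket.IO handshake."""
    return {"token": token}


print(bearer_headers("abc"))
print(sse_url("/sse/stream", "s1", "a b"))
print(socketio_auth("abc"))
```

Tokens placed in a URL should be URL-encoded, as `urlencode` does here, and tend to end up in access logs, which is a reason to prefer headers or cookies where the transport allows it.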

Bundle API

Register custom agent workflows as bundles:

from kdcube_ai_app.infra.plugin import Bundle

class MyAgentBundle(Bundle):
    def __init__(self):
        super().__init__(
            bundle_id="my-agent",
            name="My Custom Agent",
            version="1.0.0"
        )
    
    async def process(self, message, context):
        # Your agent logic here
        response = await self.llm.generate(message)
        
        # Emit streaming events
        await self.emit('chat_delta', {'content': response})
        
        return {'response': response}

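# Register the bundle (`registry` is assumed to be the platform's bundle
# registry instance, obtained elsewhere; it is not defined in this snippet)
registry.register(MyAgentBundle())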

Security & Governance

Multi-Tenant Isolation

KDCube provides enterprise-grade isolation:

Schema-level isolation: Each tenant/project gets a dedicated Postgres schema
Storage segmentation: S3 buckets or prefixes per tenant
Namespace separation: Redis keys namespaced by tenant/project
Policy enforcement: Control plane manages cross-tenant policies

Authentication Model

Secure, delegated authentication with hosted 2FA:

Delegated auth: ProxyLogin service handles authentication flow
Hosted 2FA: Built-in two-factor authentication UI
Cookie exchange: Secure token exchange with masked cookies
Session management: Automatic session resolution and upgrade
Provider support: Cognito (production), SimpleIDP (development)

Gateway & Rate Limiting

Protect your infrastructure with multiple layers of control:

Gateway Layer

Request rate limiting: Configurable limits per endpoint
Backpressure: Queue capacity monitoring and rejection
Circuit breakers: Automatic failure detection and recovery
Input validation: Message and attachment size limits
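
Two of the gateway controls above, request rate limiting and backpressure, can be sketched as a single admission check. The `Gateway` class below is hypothetical: it uses a per-user sliding window and a queue-depth threshold, and the limits are arbitrary example values rather than KDCube defaults.

```python
from collections import deque


class Gateway:
    """Hypothetical admission check: sliding-window rate limit plus backpressure."""

    def __init__(self, max_requests: int, window_s: float, max_queue: int):
        self.max_requests = max_requests
        self.window_s = window_s
        self.max_queue = max_queue
        self._stamps = {}  # user id -> deque of recent request timestamps

    def admit(self, user: str, now: float, queue_depth: int) -> str:
        # Backpressure: reject early when the processing queue is full
        if queue_depth >= self.max_queue:
            return "rejected: queue full"
        stamps = self._stamps.setdefault(user, deque())
        # Drop timestamps that have left the window
        while stamps and now - stamps[0] >= self.window_s:
            stamps.popleft()
        if len(stamps) >= self.max_requests:
            return "rejected: rate limit"
        stamps.append(now)
        return "accepted"


gw = Gateway(max_requests=2, window_s=60.0, max_queue=100)
print(gw.admit("u1", now=0.0, queue_depth=3))     # accepted
print(gw.admit("u1", now=1.0, queue_depth=3))     # accepted
print(gw.admit("u1", now=2.0, queue_depth=3))     # rejected: rate limit
print(gw.admit("u1", now=61.0, queue_depth=3))    # accepted
print(gw.admit("u2", now=61.0, queue_depth=100))  # rejected: queue full
```

Checking backpressure first means an overloaded system sheds load without touching per-user state at all.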

Economics Layer

Tier policies: Different limits per user tier
Per-user quotas: Individual usage caps
Concurrency locks: Prevent oversubscription
Budget enforcement: Hard caps on spending
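
The economics controls above can be combined into one admission decision per request. The sketch below is an assumption-laden illustration: the tier names, the dollar caps, and the `try_start`/`finish` functions are invented for the example and do not reflect KDCube's policy schema.

```python
from dataclasses import dataclass


@dataclass
class UserAccount:
    tier: str
    spent_usd: float = 0.0
    running: int = 0


# Hypothetical tier policy table: spending cap and max concurrent requests
TIERS = {
    "free": {"budget_usd": 1.0, "max_concurrent": 1},
    "paid": {"budget_usd": 50.0, "max_concurrent": 4},
}


def try_start(account: UserAccount, estimated_cost: float) -> bool:
    """Admit a request only if it fits the tier's budget and concurrency cap."""
    policy = TIERS[account.tier]
    if account.running >= policy["max_concurrent"]:
        return False  # concurrency lock: too many requests in flight
    if account.spent_usd + estimated_cost > policy["budget_usd"]:
        return False  # hard budget cap
    account.running += 1
    return True


def finish(account: UserAccount, actual_cost: float) -> None:
    """Release the concurrency slot and record what the request cost."""
    account.running -= 1
    account.spent_usd += actual_cost


user = UserAccount(tier="free")
print(try_start(user, 0.25))   # True
print(try_start(user, 0.25))   # False, one request is already running
finish(user, 0.5)
print(try_start(user, 0.75))   # False, 0.5 + 0.75 exceeds the 1.0 budget
print(try_start(user, 0.25))   # True
```

In a multi-process deployment the counters would need to live in shared storage with atomic updates; the in-process fields here only show the decision logic.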

Code Execution Sandbox

Isolation controls for running untrusted code:

Docker containers: Ephemeral, isolated execution environment
Network isolation: No direct external network access
Tool proxy: All tool calls routed through supervisor
Resource limits: CPU, memory, and execution time constraints
Privilege separation: Non-root execution
Data exfiltration prevention: Controlled output channels
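
Several of the controls above map onto standard `docker run` flags. The sketch below only builds the command line and is an assumption about how such a sandbox could be launched, not KDCube's actual runtime; the image name, limits, UID, and mount path are example values. Enforcing an execution-time limit and proxying tool calls through a supervisor are not shown.

```python
import shlex


def sandbox_command(image: str, script_path: str, *, cpus: float = 1.0,
                    memory: str = "256m") -> list:
    """Build a `docker run` argv for a throwaway, locked-down container."""
    return [
        "docker", "run",
        "--rm",                             # ephemeral: remove the container on exit
        "--network", "none",                # no direct network access
        "--cpus", str(cpus),                # CPU limit
        "--memory", memory,                 # memory limit
        "--pids-limit", "64",               # cap the number of processes
        "--read-only",                      # read-only root filesystem
        "--user", "1000:1000",              # non-root execution
        "--cap-drop", "ALL",                # drop all Linux capabilities
        "-v", f"{script_path}:/work/main.py:ro",  # mount the script read-only
        image, "python", "/work/main.py",
    ]


cmd = sandbox_command("python:3.12-slim", "/tmp/job/main.py")
print(shlex.join(cmd))
```

Building the argv as a list, rather than a shell string, avoids quoting bugs when the script path comes from untrusted input.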

Audit & Compliance

Comprehensive logging for regulatory compliance:

Complete execution lineage: Track every step of agent execution
Usage logging: All API calls and tool executions logged
Cost tracking: Per-tenant, per-user spending records
Access logs: Authentication and authorization events
Data provenance: Source tracking for all generated content

Advanced Topics

Dynamic Bundles

Bundles are runtime-loadable workflows with custom logic:

Hot reload: Update agents without restarting services
Custom endpoints: Expose bundle-specific APIs
Storage integration: Access knowledge base and context
Event emission: Stream results via ChatCommunicator
Framework support: LangGraph, LangChain, or custom Python

Context Management

Sophisticated context reconciliation:

Turn-ordered memories: Sequential conversation context
User preferences: Cross-conversation user settings
Artifact tracking: Files and generated content
Signal extraction: Automatic pattern detection
Context reconciliation: Merge and deduplicate context
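
Merging and deduplicating context can be illustrated with a "latest turn wins" rule. The item shape (`key`, `value`, `turn`) and the rule itself are assumptions for this sketch; the source does not describe KDCube's actual reconciliation logic.

```python
def reconcile(*sources):
    """Merge memory items from several sources, keeping one entry per key.

    Items are dicts with `key`, `value` and `turn`; the entry from the latest
    turn wins, and the result is returned in turn order.
    """
    latest = {}
    for items in sources:
        for item in items:
            current = latest.get(item["key"])
            if current is None or item["turn"] > current["turn"]:
                latest[item["key"]] = item
    return sorted(latest.values(), key=lambda item: item["turn"])


turn_memories = [
    {"key": "language", "value": "Python", "turn": 1},
    {"key": "deadline", "value": "Friday", "turn": 2},
]
user_memories = [
    {"key": "language", "value": "Rust", "turn": 3},    # newer preference
    {"key": "deadline", "value": "Friday", "turn": 2},  # duplicate
]

for item in reconcile(turn_memories, user_memories):
    print(item["turn"], item["key"], item["value"])
```

The duplicate `deadline` entry collapses to one item, and the later `language` preference replaces the earlier one.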

Horizontal Scaling

Scale to thousands of concurrent users:

Stateless API: Scale web tier horizontally
Worker pools: Independent scaling of processor workers
Queue-based: Fair scheduling across user types
Redis relay: Session-scoped pub/sub for efficient fan-out
Load balancing: Distribute requests across instances
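
Fair scheduling across user types can be approximated by visiting the per-user-type queues round-robin, so a burst from one type cannot starve the others. The sketch below assumes this policy and invents the user-type names; the source only states that tasks are queued per user type.

```python
from collections import deque
from itertools import cycle


def fair_schedule(queues: dict, limit: int) -> list:
    """Take up to `limit` tasks, visiting the per-user-type queues round-robin."""
    picked = []
    order = cycle(list(queues))
    remaining = sum(len(q) for q in queues.values())
    while remaining and len(picked) < limit:
        name = next(order)
        if queues[name]:
            picked.append(queues[name].popleft())
            remaining -= 1
    return picked


queues = {
    "anonymous": deque(["a1", "a2", "a3", "a4"]),
    "registered": deque(["r1"]),
    "privileged": deque(["p1", "p2"]),
}
print(fair_schedule(queues, limit=5))
```

Even though the anonymous queue is the longest, only two of its tasks are taken in the first five slots; a weighted variant could favor higher tiers instead of treating all queues equally.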

Observability & Monitoring

Production-grade monitoring and debugging:

Health checks: Service health and heartbeat monitoring
Metrics collection: Request rates, latencies, error rates
Usage dashboards: Real-time spending and usage visualization
Conversation browser: Search and replay conversations
Performance profiling: Identify bottlenecks and optimize

Documentation Resources

Comprehensive documentation for developers and operators:

Platform Documentation

System Architecture

Architecture Overview →

Gateway & Policy

Gateway Documentation →

Economics System

Economics Guide →

Knowledge Base

KB Documentation →

SDK & Development

SDK Index

SDK Overview →

AI Bundle SDK

Bundle Development →

Tool Subsystem

Tools & Runtime →

Streaming System

Communication Guide →

Deployment Guides

Docker Compose

All-in-One Setup →

CLI Installer

CLI Guide →

Database Setup

Schema Deployment →

Monitoring

Observability Setup →

Examples & Tutorials

Example Bundles

Bundle Examples →

Streaming Examples

Code Examples →

Skills Registry

Skills Guide →

Load Testing

Testing Guide →

Community Support: Join our GitHub discussions for questions, feature requests, and community contributions. For enterprise support including SLA guarantees and custom development, contact us at info@kdcube.tech.
