Prompt Serve — Enterprise AI Platform Data Sheet · version 2026.05
This document summarises the Prompt Serve platform characteristics for reference in the Terms of Reference (TOR), proposal, statement of work (SOW), and compliance sheets for Thai government agencies, state enterprises, and large organisations.
Product name (canonical) Prompt Serve Category Enterprise AI Platform (on-premise / private cloud) Deployment Docker-based, Linux x86_64 Architecture Microservices, container-orchestrated Licence Commercial Licence — Group of companies Languages supported Thai, English (UI + LLM responses) Standards compliance PDPA, OWASP Top 10, ISO/IEC 27001 alignment Audit log retention Minimum 90 days (configurable up to 7 years)
1. Technical Specifications 1.1 Component Matrix Component Technology Port Database Function Portal (Frontend) React 18 + Vite + TypeScript + Tailwind 3000 - UI for admins and users - manage channels, agents, workflows, and KB. Supports TH/EN and light/dark themes. Portal Service (Backend) Kotlin + Spring Boot 3.5 WebFlux 8080 PostgreSQL REST/SSE API, OAuth2 proxy, user/role management, BFF to AI Gateway. AI Gateway Go 1.25 + eino framework 4000 / 4001 SQLite 3 LLM proxy, agent runtime, workflow engine, RAG and OCR pipeline. Agent Runtime Go + eino (internal) - Agentic execution with tool calling and multi-step reasoning. PyWorker Python 3.11 + FastAPI (internal) - Document processing and OCR orchestration. Hermes Sidecar Python 3.11 + FastAPI (internal) SQLite 3 Self-learning skills, memory, and scheduler. Documentation Portal Next.js 15 + Fumadocs 3012 - User manual, API spec, and data sheet for internal teams and partners.
1.2 External Stack Dependencies Service Version Function Self-hosted Keycloak 24.x Identity provider, OIDC/SAML, user federation, SSO. Yes PostgreSQL 16.x Portal Service primary database. Yes OpenSearch 2.13+ RAG (vector + full-text), traces, usage logs, audit events. Yes MinIO Latest stable S3-compatible object storage for documents, uploads, fixtures. Yes Presidio Analyzer Latest stable PII detection (Microsoft OSS, port 5001). Yes Caddy 2.8.x TLS termination, reverse proxy, rate limiting, Let's Encrypt. Yes Infinity Embedding Latest bge-m3 embeddings (production VPS). Optional (OpenRouter fallback)
1.3 LLM Provider Support Provider Models API Compatibility OpenAI GPT-4o, GPT-4o-mini, o1, o3 Native Anthropic Claude Opus 4.x, Sonnet 4.x, Haiku 4.x Native Google Gemini 2.5 Pro, Flash, Pro Vision Native (VLM for OCR) OpenRouter 100+ models routed Universal Typhoon Typhoon-2 (SCB 10X, Thai-tuned) OpenAI-compatible Local / On-premise vLLM, Ollama, LM Studio, Infinity OpenAI-compatible BYOK (Bring Your Own Key) Any OpenAI-compatible endpoint Configurable
1.4 Performance Targets Metric Target Notes Chat completion latency (p50) < 2 seconds Streaming start; full response depends on model and length. Chat completion latency (p95) < 5 seconds Streaming start. Embedding latency (p95) < 500 ms bge-m3 on Infinity (single GPU). OCR latency (p95) < 8 seconds Per page, Gemini Flash VLM. RAG search latency (p95) < 300 ms Hybrid vector + BM25. API throughput 500+ RPS Per AI Gateway node, scalable. Concurrent users 1,000+ Per Portal Service node. Document upload 200 MB max Single file, configurable. Workflow execution 30 min max (sync), unlimited (async) Async pattern recommended for long jobs.
1.5 Storage Capacity Item Default allocation Scaling PostgreSQL 20 GB Up to 2 TB (RDS-compatible) SQLite (AI Gateway) 5 GB WAL mode, auto-vacuum OpenSearch 100 GB Multi-node cluster, scalable to TB. MinIO 500 GB Distributed mode, scalable to PB. KB Documents per bucket Unlimited Practical: 100K+ docs per bucket. Vector embeddings per index 10M+ OpenSearch k-NN scalable.
1.6 Protocols & Standards Layer Protocol / Standard Transport HTTPS (TLS 1.3), HTTP/2, HTTP/3 (QUIC) API REST/JSON (OpenAPI 3.1), Server-Sent Events (SSE), WebSocket (A2UI). Auth OAuth 2.1, OIDC, SAML 2.0, Bearer JWT (RS256). LLM OpenAI Chat Completions v1, Anthropic Messages v1, custom. Tool calling OpenAI tools format, MCP (Model Context Protocol). Storage S3 API v4 (SigV4 signing) Search OpenSearch v2 API, k-NN (HNSW), BM25 Webhooks Standard HTTP + HMAC-SHA256 signature verification.
2. Functional Capabilities 2.1 Core AI Features Feature Status LLM Proxy (OpenAI-compatible) Production Streaming responses (SSE) Production Multi-turn conversation Production Tool calling / Function calling Production Vision / Multimodal Production OCR Engine (Thai supported) Production Embeddings Production Fine-tuning support Roadmap Batch processing Production
2.2 Agent Capabilities Feature Status Multi-skill agents (composable system prompts) Production Knowledge Base bindings (RAG) Production Tool integration (MCP + built-in) Production Fallback model chain (B9) Production Async execution + status polling Production Agent versioning + promotion Production AGENTS.md context files (versioned) Production Self-learning via Hermes curator Production Activity audit log Production
2.3 Workflow Engine Feature Status Visual workflow editor (React Flow) Production DAG validation + cycle detection Production Node types: agent, condition, http, code, ocr, kb_search Production Node types: presidio_mask, email_send, webhook_send Production Human-in-the-Loop (HITL) approval Production Workflow versioning + atomic activation Production Async + sync execution modes Production Test playground (live SSE streaming) Production Document workflow trigger (KB ingestion) Production Workflow templates library Production
2.4 Knowledge Base (RAG) Feature Status Document upload (PDF, DOCX, XLSX, image) Production OCR for image-based PDFs (Gemini VLM) Production Automatic chunking (semantic split) Production Embedding generation (bge-m3) Production Vector search (HNSW k-NN) Production Full-text search (BM25 with Thai analyzer) Production Hybrid search (RRF rerank) Production Tabular KB (text-to-SQL) Production Citation extraction Production Knowledge Graph (entity extraction) Beta Multi-tenant bucket isolation Production
2.5 Channel & Integration Feature Status Channel types: WEB_APP, API, WEBHOOK Production Provider integrations: LINE, Telegram, Messenger, Slack Production Channel API key per integration Production Webhook signature verification (HMAC-SHA256) Production Per-channel rate limit + quota Production Channel + User JWT dual-auth Production AGENTS.md per-channel context Production
2.6 Evaluation Framework Feature Status Test case management (manual + bulk import) Production Verifier plugins (declarative DSL + JavaScript) Production LLM-as-Judge ensemble (3-5 judges) Production Pairwise A/B testing Production Cycle execution with SSE event stream Production Cost tracking per evaluation run Production Shadow runs (production traffic mirror) Production Calibration with human labels Production Promotion governance + audit + rollback Production LLM-suggested rubrics Production
2.7 Observability Feature Status Trace hierarchy (Trace > Span > Generation) Production Usage logs with full I/O capture Production Cost tracking (per VK, per channel, per agent) Production Cost breakdown by trace Production Resource price configuration Production AI-generated dashboards Production Datatable UI for log inspection Production User feedback collection (thumbs/score/comment) Production Audit event log (admin actions) Production
2.8 Security & Compliance Feature Status PII detection + masking (Presidio) Production PII masking audit log Production Guardrails (input/output filtering) Production Configurable masking rules per VK Production Encrypted credential vault Production RBAC (role-based access control) Production Multi-tenant resource isolation Production Audit trail (all admin + critical actions) Production Session encryption (AES-256) Production Secrets at rest encryption Production
2.9 Identity & SSO Feature Status Keycloak integration (OIDC) Production Multi-realm support Production Custom OIDC providers per realm Production SAML 2.0 federation Production LDAP/AD user federation Production Identity linking (multi-provider per user) Production JWT refresh + silent renewal Production Logout + session revocation Production
2.10 Administration Feature Status Web-based admin console Production Virtual Key management (CRUD + budget) Production Model management (CRUD + test + probe) Production Channel CRUD + key regeneration Production Agent CRUD + KB/skill bindings Production User management + role assignment Production Branding customization (logo, theme, footer) Production SSO realm config UI Production
3. Compliance & Standards 3.1 Personal Data Protection Act (PDPA) Requirement Implementation Data subject identification Per-user audit trail, identity resolve API. PII detection at ingestion Presidio integration (Thai + English). PII masking before LLM call Configurable per virtual key/channel. PII masking audit log Full retention, queryable. Right to erasure DELETE endpoints for user data, memory, history. Consent management Configurable consent flow in Portal. Data residency On-premise / private cloud deployment. Cross-border transfer None by default (Thai-hosted LLMs available).
3.2 OWASP Top 10 (2021) OWASP Mitigation A01 Broken Access Control RBAC + tenant isolation + per-resource auth checks. A02 Cryptographic Failures TLS 1.3, AES-256 at rest, RS256 JWT signing. A03 Injection Parameterized queries, input validation, sandboxed code execution. A04 Insecure Design Defense-in-depth (Caddy + Gateway + RBAC). A05 Security Misconfiguration Hardened Docker images, .env validation, secret rotation playbook. A06 Vulnerable Components Dependency scanning (security_ops skill), regular updates. A07 Identification/Auth Failures Keycloak (proven OIDC), MFA-ready, brute-force rate limit. A08 Software & Data Integrity Signed Docker images, SBOM tracking. A09 Security Logging & Monitoring OpenSearch audit log + alerts. A10 Server-Side Request Forgery URL allowlist in HTTP node, no recursion to internal services.
3.3 ISO/IEC 27001 Alignment Control Area Coverage A.5 Information security policies Documented in deployment guide. A.8 Asset management Inventory via Portal admin. A.9 Access control RBAC + Keycloak + audit log. A.10 Cryptography TLS + AES + JWT RS256. A.12 Operations security Patching, backup, change mgmt via deploy.sh. A.13 Communications security TLS termination at edge. A.14 System acquisition/development Submodule-based versioning, signed releases. A.16 Information security incident management Incident response playbook in knowledge_base/lessons_learned.md. A.18 Compliance PDPA, OWASP, sector-specific (Thai law).
3.4 Thai Government Standards Standard Compliance ETDA cybersecurity standard Audit log + access control + encryption. MDES information security regulation RBAC + audit + retention. Thai gov data sovereignty On-premise option, Thai-hosted LLMs. DGA / central gov federation OIDC/SAML federation, custom IDP support.
3.5 Audit Log Retention Event type Default retention User login/logout 90 days · max 7 years API requests 90 days · max 7 years Admin actions (CRUD) 1 year · max 7 years Workflow approvals 1 year · max 7 years PII masking events 1 year · max 7 years LLM I/O full content 30 days · max 1 year
4. Deployment & Operations 4.1 Hardware Specifications Minimum (POC / Development)
Component Spec CPU 8 cores (Intel Xeon / AMD EPYC equivalent) RAM 16 GB Disk 200 GB SSD Network 100 Mbps OS Ubuntu 22.04 LTS / RHEL 9
Recommended (Production, ~100 concurrent users)
Component Spec CPU 16-32 cores RAM 64 GB (128 GB with local embedding) Disk 1 TB NVMe SSD + 2 TB HDD (sequential storage) GPU (optional) NVIDIA A10 / L4 (for local embedding/LLM) Network 1 Gbps internal, 100 Mbps internet OS Ubuntu 22.04 LTS
Large Scale (Production, 1,000+ users)
Component Spec Architecture Multi-node Docker Swarm / Kubernetes AI Gateway nodes 3+ (load balanced) Portal Service nodes 2+ (active-active) PostgreSQL HA cluster (primary + 2 replicas) OpenSearch 3-node cluster minimum MinIO Distributed mode (4+ nodes) Cache layer Redis (optional)
4.2 Software Requirements Item Version Docker Engine 24.x or newer Docker Compose v2.20 or newer Linux Kernel 5.15 or newer OpenSSL 3.0 or newer Git 2.40 or newer (for source deployment)
4.3 Network Topology
4.4 Subdomain Layout (per partner) Subdomain Purpose Auth app.{domain} Portal (admin + user UI) Keycloak SSO hub.{domain} App launcher Optional chat.{domain} Main chat application Keycloak SSO s3.{domain} MinIO presigned URLs SigV4 gw.{domain} AI Gateway public proxy VK / Channel key doc.{domain} Documentation portal SSO (manual + API), public (datasheet) {app}.{domain} Customer-specific apps (FPO, TCC, FDA, etc.) Per-app role
4.5 Scaling Guidance Scale tier Concurrent users Setup Pilot 10-50 Single VPS, all stacks co-located Small Production 50-500 Dedicated DB host + app VPS Medium Production 500-5,000 Multi-node Docker Swarm, replicated DB Enterprise 5,000+ Kubernetes, HA stack, multi-region
4.6 Backup & Disaster Recovery Item Notes RTO RPO PostgreSQL Daily full + WAL streaming 1 hour 5 minutes SQLite (AI Gateway) Hourly snapshot to MinIO 1 hour 1 hour OpenSearch Snapshot to S3-compatible 4 hours 1 hour MinIO Replication to secondary site 2 hours Real-time Configuration Git repo + .env.uat templates 30 minutes Latest commit
4.7 Update & Patching Dependency scanning: Weekly automated scan via the security_ops skill (Trivy + pip-audit/npm-audit/gosec).Security patches: Within 7 days of CVE published (CVSS >= 7.0 triggers immediate ticket).Feature updates: Monthly release cycle aligned with upstream LLM provider model releases.Major versions: Quarterly cadence.Deployment: Zero-downtime rolling updates via deploy.sh with health checks gating each container restart.4.8 Monitoring & Alerting Health checks: /health (AI Gateway), /actuator/health (Portal Service).Metrics: Prometheus-compatible (optional).Logs: JSON-structured via Docker log driver, integrates with Loki/ELK if available.Alerts: Configurable webhook into Slack/email/PagerDuty.4.9 Supported Deployment Modes Mode Description On-premise Customer's data center, full air-gapped possible Private Cloud Customer's AWS/Azure/GCP account Hybrid Frontend in cloud + LLM on-premise (data residency) Managed (SaaS) Group of companies managed hosting
4.10 Pricing & Licensing Model Perpetual licence + annual maintenance & support.Subscription (annual / multi-year).Pay-per-use (LLM tokens + storage).Hybrid model (licence + token bundle).For more information:
This document is for reference in TOR and proposal documents only. Actual specifications in each project may be adjusted to match the procuring organisation's specific requirements.