Prompt Serve
Data SheetPublic

Prompt Serve — Enterprise AI Platform

Data Sheet · version 2026.05

This document summarises the Prompt Serve platform characteristics for reference in the Terms of Reference (TOR), proposal, statement of work (SOW), and compliance sheets for Thai government agencies, state enterprises, and large organisations.

Product name (canonical)Prompt Serve
CategoryEnterprise AI Platform (on-premise / private cloud)
DeploymentDocker-based, Linux x86_64
ArchitectureMicroservices, container-orchestrated
LicenceCommercial Licence — Group of companies
Languages supportedThai, English (UI + LLM responses)
Standards compliancePDPA, OWASP Top 10, ISO/IEC 27001 alignment
Audit log retentionMinimum 90 days (configurable up to 7 years)

1. Technical Specifications

1.1 Component Matrix

ComponentTechnologyPortDatabaseFunction
Portal (Frontend)React 18 + Vite + TypeScript + Tailwind3000-UI for admins and users - manage channels, agents, workflows, and KB. Supports TH/EN and light/dark themes.
Portal Service (Backend)Kotlin + Spring Boot 3.5 WebFlux8080PostgreSQLREST/SSE API, OAuth2 proxy, user/role management, BFF to AI Gateway.
AI GatewayGo 1.25 + eino framework4000 / 4001SQLite 3LLM proxy, agent runtime, workflow engine, RAG and OCR pipeline.
Agent RuntimeGo + eino(internal)-Agentic execution with tool calling and multi-step reasoning.
PyWorkerPython 3.11 + FastAPI(internal)-Document processing and OCR orchestration.
Hermes SidecarPython 3.11 + FastAPI(internal)SQLite 3Self-learning skills, memory, and scheduler.
Documentation PortalNext.js 15 + Fumadocs3012-User manual, API spec, and data sheet for internal teams and partners.

1.2 External Stack Dependencies

ServiceVersionFunctionSelf-hosted
Keycloak24.xIdentity provider, OIDC/SAML, user federation, SSO.Yes
PostgreSQL16.xPortal Service primary database.Yes
OpenSearch2.13+RAG (vector + full-text), traces, usage logs, audit events.Yes
MinIOLatest stableS3-compatible object storage for documents, uploads, fixtures.Yes
Presidio AnalyzerLatest stablePII detection (Microsoft OSS, port 5001).Yes
Caddy2.8.xTLS termination, reverse proxy, rate limiting, Let's Encrypt.Yes
Infinity EmbeddingLatestbge-m3 embeddings (production VPS).Optional (OpenRouter fallback)

1.3 LLM Provider Support

ProviderModelsAPI Compatibility
OpenAIGPT-4o, GPT-4o-mini, o1, o3Native
AnthropicClaude Opus 4.x, Sonnet 4.x, Haiku 4.xNative
GoogleGemini 2.5 Pro, Flash, Pro VisionNative (VLM for OCR)
OpenRouter100+ models routedUniversal
TyphoonTyphoon-2 (SCB 10X, Thai-tuned)OpenAI-compatible
Local / On-premisevLLM, Ollama, LM Studio, InfinityOpenAI-compatible
BYOK (Bring Your Own Key)Any OpenAI-compatible endpointConfigurable

1.4 Performance Targets

MetricTargetNotes
Chat completion latency (p50)< 2 secondsStreaming start; full response depends on model and length.
Chat completion latency (p95)< 5 secondsStreaming start.
Embedding latency (p95)< 500 msbge-m3 on Infinity (single GPU).
OCR latency (p95)< 8 secondsPer page, Gemini Flash VLM.
RAG search latency (p95)< 300 msHybrid vector + BM25.
API throughput500+ RPSPer AI Gateway node, scalable.
Concurrent users1,000+Per Portal Service node.
Document upload200 MB maxSingle file, configurable.
Workflow execution30 min max (sync), unlimited (async)Async pattern recommended for long jobs.

1.5 Storage Capacity

ItemDefault allocationScaling
PostgreSQL20 GBUp to 2 TB (RDS-compatible)
SQLite (AI Gateway)5 GBWAL mode, auto-vacuum
OpenSearch100 GBMulti-node cluster, scalable to TB.
MinIO500 GBDistributed mode, scalable to PB.
KB Documents per bucketUnlimitedPractical: 100K+ docs per bucket.
Vector embeddings per index10M+OpenSearch k-NN scalable.

1.6 Protocols & Standards

LayerProtocol / Standard
TransportHTTPS (TLS 1.3), HTTP/2, HTTP/3 (QUIC)
APIREST/JSON (OpenAPI 3.1), Server-Sent Events (SSE), WebSocket (A2UI).
AuthOAuth 2.1, OIDC, SAML 2.0, Bearer JWT (RS256).
LLMOpenAI Chat Completions v1, Anthropic Messages v1, custom.
Tool callingOpenAI tools format, MCP (Model Context Protocol).
StorageS3 API v4 (SigV4 signing)
SearchOpenSearch v2 API, k-NN (HNSW), BM25
WebhooksStandard HTTP + HMAC-SHA256 signature verification.

2. Functional Capabilities

2.1 Core AI Features

FeatureStatus
Production
Production
Production
Production
Production
Production
Production
Roadmap
Production

2.2 Agent Capabilities

FeatureStatus
Production
Production
Production
Production
Production
Production
Production
Production
Production

2.3 Workflow Engine

FeatureStatus
Production
Production
Production
Production
Production
Production
Production
Production
Production
Production

2.4 Knowledge Base (RAG)

FeatureStatus
Production
Production
Production
Production
Production
Production
Production
Production
Production
Beta
Production

2.5 Channel & Integration

FeatureStatus
Production
Production
Production
Production
Production
Production
Production

2.6 Evaluation Framework

FeatureStatus
Production
Production
Production
Production
Production
Production
Production
Production
Production
Production

2.7 Observability

FeatureStatus
Production
Production
Production
Production
Production
Production
Production
Production
Production

2.8 Security & Compliance

FeatureStatus
Production
Production
Production
Production
Production
Production
Production
Production
Production
Production

2.9 Identity & SSO

FeatureStatus
Production
Production
Production
Production
Production
Production
Production
Production

2.10 Administration

FeatureStatus
Production
Production
Production
Production
Production
Production
Production
Production

3. Compliance & Standards

3.1 Personal Data Protection Act (PDPA)

RequirementImplementation
Per-user audit trail, identity resolve API.
Presidio integration (Thai + English).
Configurable per virtual key/channel.
Full retention, queryable.
DELETE endpoints for user data, memory, history.
Configurable consent flow in Portal.
On-premise / private cloud deployment.
None by default (Thai-hosted LLMs available).

3.2 OWASP Top 10 (2021)

OWASPMitigation
RBAC + tenant isolation + per-resource auth checks.
TLS 1.3, AES-256 at rest, RS256 JWT signing.
Parameterized queries, input validation, sandboxed code execution.
Defense-in-depth (Caddy + Gateway + RBAC).
Hardened Docker images, .env validation, secret rotation playbook.
Dependency scanning (security_ops skill), regular updates.
Keycloak (proven OIDC), MFA-ready, brute-force rate limit.
Signed Docker images, SBOM tracking.
OpenSearch audit log + alerts.
URL allowlist in HTTP node, no recursion to internal services.

3.3 ISO/IEC 27001 Alignment

Control AreaCoverage
Documented in deployment guide.
Inventory via Portal admin.
RBAC + Keycloak + audit log.
TLS + AES + JWT RS256.
Patching, backup, change mgmt via deploy.sh.
TLS termination at edge.
Submodule-based versioning, signed releases.
Incident response playbook in knowledge_base/lessons_learned.md.
PDPA, OWASP, sector-specific (Thai law).

3.4 Thai Government Standards

StandardCompliance
Audit log + access control + encryption.
RBAC + audit + retention.
On-premise option, Thai-hosted LLMs.
OIDC/SAML federation, custom IDP support.

3.5 Audit Log Retention

Event typeDefault retention
User login/logout90 days · max 7 years
API requests90 days · max 7 years
Admin actions (CRUD)1 year · max 7 years
Workflow approvals1 year · max 7 years
PII masking events1 year · max 7 years
LLM I/O full content30 days · max 1 year

4. Deployment & Operations

4.1 Hardware Specifications

Minimum (POC / Development)

ComponentSpec
CPU8 cores (Intel Xeon / AMD EPYC equivalent)
RAM16 GB
Disk200 GB SSD
Network100 Mbps
OSUbuntu 22.04 LTS / RHEL 9

Recommended (Production, ~100 concurrent users)

ComponentSpec
CPU16-32 cores
RAM64 GB (128 GB with local embedding)
Disk1 TB NVMe SSD + 2 TB HDD (sequential storage)
GPU (optional)NVIDIA A10 / L4 (for local embedding/LLM)
Network1 Gbps internal, 100 Mbps internet
OSUbuntu 22.04 LTS

Large Scale (Production, 1,000+ users)

ComponentSpec
ArchitectureMulti-node Docker Swarm / Kubernetes
AI Gateway nodes3+ (load balanced)
Portal Service nodes2+ (active-active)
PostgreSQLHA cluster (primary + 2 replicas)
OpenSearch3-node cluster minimum
MinIODistributed mode (4+ nodes)
Cache layerRedis (optional)

4.2 Software Requirements

ItemVersion
Docker Engine24.x or newer
Docker Composev2.20 or newer
Linux Kernel5.15 or newer
OpenSSL3.0 or newer
Git2.40 or newer (for source deployment)

4.3 Network Topology

4.4 Subdomain Layout (per partner)

SubdomainPurposeAuth
app.{domain}Portal (admin + user UI)Keycloak SSO
hub.{domain}App launcherOptional
chat.{domain}Main chat applicationKeycloak SSO
s3.{domain}MinIO presigned URLsSigV4
gw.{domain}AI Gateway public proxyVK / Channel key
doc.{domain}Documentation portalSSO (manual + API), public (datasheet)
{app}.{domain}Customer-specific apps (FPO, TCC, FDA, etc.)Per-app role

4.5 Scaling Guidance

Scale tierConcurrent usersSetup
Pilot10-50Single VPS, all stacks co-located
Small Production50-500Dedicated DB host + app VPS
Medium Production500-5,000Multi-node Docker Swarm, replicated DB
Enterprise5,000+Kubernetes, HA stack, multi-region

4.6 Backup & Disaster Recovery

ItemNotesRTORPO
PostgreSQLDaily full + WAL streaming1 hour5 minutes
SQLite (AI Gateway)Hourly snapshot to MinIO1 hour1 hour
OpenSearchSnapshot to S3-compatible4 hours1 hour
MinIOReplication to secondary site2 hoursReal-time
ConfigurationGit repo + .env.uat templates30 minutesLatest commit

4.7 Update & Patching

  • Dependency scanning: Weekly automated scan via the security_ops skill (Trivy + pip-audit/npm-audit/gosec).
  • Security patches: Within 7 days of CVE published (CVSS >= 7.0 triggers immediate ticket).
  • Feature updates: Monthly release cycle aligned with upstream LLM provider model releases.
  • Major versions: Quarterly cadence.
  • Deployment: Zero-downtime rolling updates via deploy.sh with health checks gating each container restart.

4.8 Monitoring & Alerting

  • Health checks: /health (AI Gateway), /actuator/health (Portal Service).
  • Metrics: Prometheus-compatible (optional).
  • Logs: JSON-structured via Docker log driver, integrates with Loki/ELK if available.
  • Alerts: Configurable webhook into Slack/email/PagerDuty.

4.9 Supported Deployment Modes

ModeDescription
On-premiseCustomer's data center, full air-gapped possible
Private CloudCustomer's AWS/Azure/GCP account
HybridFrontend in cloud + LLM on-premise (data residency)
Managed (SaaS)Group of companies managed hosting

4.10 Pricing & Licensing Model

  • Perpetual licence + annual maintenance & support.
  • Subscription (annual / multi-year).
  • Pay-per-use (LLM tokens + storage).
  • Hybrid model (licence + token bundle).

For more information:

This document is for reference in TOR and proposal documents only. Actual specifications in each project may be adjusted to match the procuring organisation's specific requirements.