Hospital Information System (HIS) - Technical Architecture
A high-performance, distributed healthcare infrastructure comprising nine decoupled microservices. The system enforces strict data privacy through PII-sanitized event choreography and ensures authoritative consistency via synchronous gRPC/Protobuf protocols. It implements advanced distributed patterns, including the Transactional Outbox, Sagas, and two-tier hybrid caching, for high-throughput clinical workflows.
System Architecture
High-Level Technical Topology
```mermaid
flowchart TD
    subgraph "Edge & Security (Ingress)"
        LB["Load Balancer"]
        GW["API Gateway<br/>(Port: 4004)"]
        Auth["Auth Service<br/>(JWT Security)"]
    end
    subgraph "Synchronous Clinical Core (gRPC Concern)"
        PS["Patient Service<br/>(Port: 8080)"]
        DS["Doctor Service<br/>(Port: 8083)"]
        AS["Appointment Service<br/>(Port: 8084)"]
    end
    subgraph "Transactional Outbox Publishers"
        AD["Admission Service<br/>(Port: 8086)"]
        SS["Support Service<br/>(Port: 8085)"]
    end
    subgraph "Infrastructure Layer (Persistence)"
        Kafka[("Apache Kafka<br/>(Event Stream)")]
        Redis[("Redis Shared Cache<br/>(L1/L2 Store)")]
    end
    subgraph "Event-Driven Consumers"
        BI["Billing Service<br/>(Invoicing)"]
        NO["Notification Service<br/>(Dispatch)"]
    end

    %% Ingress Flow
    LB --> GW
    GW --> Auth
    GW -- "REST" --> AS
    GW -- "REST" --> AD
    GW -- "REST" --> PS
    GW -- "REST" --> DS

    %% Sync gRPC Connections
    AS -- "gRPC/9090" --> PS
    AS -- "gRPC/9005" --> DS
    AD -- "gRPC/9090" --> PS
    NO -- "gRPC Fallback" --> PS

    %% Reliability: Transactional Outbox Pattern
    AD -- "SQL Commit" --> AD_DB[(PostgreSQL)]
    SS -- "SQL Commit" --> SS_DB[(PostgreSQL)]
    AD_DB -- "Polling Relay" --> Kafka
    SS_DB -- "Polling Relay" --> Kafka

    %% Direct Events
    AS -- "Produce" --> Kafka

    %% Consumption
    Kafka --> BI
    Kafka --> NO

    %% Distributed Cache Utilization
    SS -.-> Redis
    AD -.-> Redis
    NO -.-> Redis
```
| Service Domain | Port | Communication Path | Infrastructure / Pattern Stack |
|---|---|---|---|
| api-gateway | 4004 | Edge Ingress | Reactive WebFilter / JWT Stateless Auth |
| patient-service | 8080 | gRPC Server | CQRS (Dual PostgreSQL Datasources) |
| doctor-service | 8083 | gRPC Server | Provider Master Data / gRPC Registry |
| appointment-service | 8084 | gRPC Client / Kafka | Transactional Write / Dynamic Constraint Check |
| billing-service | 8081 | Kafka Consumer | Idempotent Sink / Ledger Persistence |
| support-service | 8085 | Redis / Outbox | Polling Outbox Pattern / Redis L1 Cache |
| admission-service | 8086 | Redis / gRPC / Outbox | Transactional Outbox / Multi-tier Redis Cache |
| notification-service | 8090 | Redis / Kafka / gRPC | Hybrid Enrichment / Redis L1 Hydration |
The infrastructure utilizes a logical partitioning model. While services share a PostgreSQL cluster for development parity, they maintain strict schema-level isolation. Cross-domain data access is permitted only through gRPC for synchronous reads and Kafka for asynchronous synchronization.
Service Data Models
Each service exposes a REST API and persists its own entity graph. Below are the primary domain models per service.
- patient-service: Clinical Master Data (PMD). Implements CQRS with isolated write/read schemas and provides gRPC clinical history hydration. Key fields: `pmdRecord: PatientRecord (gRPC)`, `clinicalFlags: EnumSet<Flag>`.
- doctor-service: Provider registry. Manages clinical credentials, specialties, and real-time availability via gRPC server stubs. Key fields: `licenseType: String`, `availability: ScheduleState`.
- appointment-service: Booking orchestration. Validates multi-domain constraints via synchronous gRPC before committing to the local schema. Key fields: `patientId: UUID`, `serviceDate: LocalDateTime`.
- admission-service: Inpatient lifecycle. Uses the Transactional Outbox pattern to bridge clinical state changes to downstream financial events. Key fields: `bedId: UUID`, `status: AdmitStatus`.
- notification-service: Dispatcher. Uses Hybrid Enrichment (Redis L1 / gRPC L2) to resolve PII from sanitized Kafka event payloads. Key field: `hydrator: PatientGrpcStub`.
- support-service: Support domain. Merges Lab and Inventory operations; publishes transactional state updates via a reliable Outbox Relay. Key field: `status: ServiceStatus`.
The system is organized into three distinct operational layers to ensure scalability and fault isolation:
- Layer 1: Edge & Security (Ingress) — Reactive API Gateway enforcing stateless JWT authentication and path rewriting.
- Layer 2: Synchronous Clinical Core (gRPC) — Domain services (Appointment/Admission) performing authoritative validations via zero-allocation Protobuf stubs.
- Layer 3: Asynchronous Event Plane (Kafka) — Decoupled side effects (Billing/Notifications) and state synchronization through reliable Outbox patterns.
Request Flow — Authentication
Auth endpoints (/api/auth/**) are whitelisted at the gateway — no JWT filter runs. The gateway strips the /api prefix and forwards to auth-service on port 8089. The auth-service itself has its own Spring Security config that permits /auth/** without a session.
For all non-auth routes, the gateway's jwtAuthenticationFilter (WebFilter) extracts the Bearer token, parses claims using Keys.hmacShaKeyFor(APP_SECRET), extracts roles into SimpleGrantedAuthority objects, and writes the authentication into ReactiveSecurityContextHolder. Clock skew tolerance is set to 5 minutes. Downstream services receive the request without further auth checks.
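The verification the filter performs can be sketched framework-free with plain JDK crypto. This is a minimal illustration only: the class name, demo secret, and regex-based claim extraction are invented for the sketch, while the real filter uses jjwt (`Keys.hmacShaKeyFor`) and writes the result into `ReactiveSecurityContextHolder` as described above.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Framework-free sketch of the checks the gateway's JWT WebFilter performs. */
public class JwtCheck {
    private static final long CLOCK_SKEW_SECONDS = 300; // 5-minute tolerance

    /** Verifies the HS256 signature and the exp claim (with skew tolerance). */
    public static boolean isValid(String token, byte[] secret, long nowEpochSeconds) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) return false;
        // 1. Recompute the HMAC-SHA256 signature over "header.payload"
        String expected = sign(parts[0] + "." + parts[1], secret);
        if (!constantTimeEquals(expected, parts[2])) return false;
        // 2. Check expiry, tolerating up to 5 minutes of clock skew
        String payload = new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
        Matcher m = Pattern.compile("\"exp\"\\s*:\\s*(\\d+)").matcher(payload);
        if (!m.find()) return false;
        long exp = Long.parseLong(m.group(1));
        return nowEpochSeconds <= exp + CLOCK_SKEW_SECONDS;
    }

    static String sign(String data, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return Base64.getUrlEncoder().withoutPadding()
                    .encodeToString(mac.doFinal(data.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) { throw new IllegalStateException(e); }
    }

    private static boolean constantTimeEquals(String a, String b) {
        if (a.length() != b.length()) return false;
        int r = 0;
        for (int i = 0; i < a.length(); i++) r |= a.charAt(i) ^ b.charAt(i);
        return r == 0;
    }

    /** Helper to mint a demo token. */
    public static String mint(String payloadJson, byte[] secret) {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String header = enc.encodeToString("{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = enc.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        return header + "." + payload + "." + sign(header + "." + payload, secret);
    }

    public static void main(String[] args) {
        byte[] secret = "demo-secret-at-least-256-bits-long!!".getBytes(StandardCharsets.UTF_8);
        long now = 1_700_000_000L;
        String fresh = mint("{\"sub\":\"nurse-1\",\"exp\":" + (now + 60) + "}", secret);
        String slightlyStale = mint("{\"sub\":\"nurse-1\",\"exp\":" + (now - 240) + "}", secret);
        String stale = mint("{\"sub\":\"nurse-1\",\"exp\":" + (now - 600) + "}", secret);
        System.out.println(isValid(fresh, secret, now));         // true
        System.out.println(isValid(slightlyStale, secret, now)); // true: inside the skew window
        System.out.println(isValid(stale, secret, now));         // false: past skew tolerance
        System.out.println(isValid(fresh + "x", secret, now));   // false: tampered signature
    }
}
```

Note how a token that expired four minutes ago still passes: the skew window exists so that minor clock drift between gateway and issuer does not reject valid sessions.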
Core Logic — Hybrid Enrichment
The system uses a "Cache-First" enrichment strategy to handle PII-sanitized events from the Kafka stream. This ensures high performance without sacrificing data authority.
A consumer (e.g., Notification or Billing) receives an event containing surrogate UUIDs (e.g., patientId). No PII is carried on the wire.
The service attempts to resolve the PII from the local L1 Redis cache using the UUID as a key. P95 latency: < 5ms.
On cache miss, the service invokes a gRPC BlockingStub against the patient-service master (Port 9090). This is the authoritative truth.
The retrieved PII is written back to Redis with a TTL of 3600s to satisfy future requests for the same entity.
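The steps above reduce to a small cache-aside loop. In this sketch a `HashMap` stands in for Redis and a `Function` for the patient-service gRPC `BlockingStub`; class and field names are illustrative, not taken from the codebase.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch of cache-first enrichment: Map stands in for Redis (L1),
 *  Function for the authoritative patient-service gRPC stub (L2). */
public class HybridEnricher {
    private final Map<String, String> l1Cache = new HashMap<>(); // Redis stand-in
    private final Function<String, String> authoritativeLookup;  // gRPC stand-in
    int remoteCalls = 0;                                         // exposed for the demo

    public HybridEnricher(Function<String, String> authoritativeLookup) {
        this.authoritativeLookup = authoritativeLookup;
    }

    /** Resolve PII for a surrogate UUID: L1 cache first, gRPC master on miss. */
    public String resolvePii(String patientId) {
        String cached = l1Cache.get(patientId);
        if (cached != null) return cached;                 // L1 hit: the sub-5ms path
        remoteCalls++;
        String pii = authoritativeLookup.apply(patientId); // authoritative gRPC read
        l1Cache.put(patientId, pii);                       // write-back (TTL 3600s in Redis)
        return pii;
    }

    public static void main(String[] args) {
        HybridEnricher enricher = new HybridEnricher(id -> "Patient<" + id + ">");
        System.out.println(enricher.resolvePii("uuid-1")); // miss -> gRPC call
        System.out.println(enricher.resolvePii("uuid-1")); // hit  -> served from cache
        System.out.println(enricher.remoteCalls);          // 1
    }
}
```

The second lookup never touches the stub: repeated events for the same entity are absorbed by the L1 tier, which is what keeps the authoritative gRPC path off the hot loop.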
Core Logic — Admission & Financials
State synchronization between Clinical Admission and Financial Billing is guaranteed via the Transactional Outbox pattern, ensuring at-least-once delivery semantics for discharge events.
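A minimal in-memory sketch of the pattern, with collections standing in for the PostgreSQL tables and the Kafka topic (all names are illustrative; the real relay polls the outbox table on a schedule and publishes via a Kafka producer):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** In-memory sketch of the Transactional Outbox: the business row and the
 *  outbox row are written in one atomic step; a polling relay drains the outbox. */
public class OutboxDemo {
    record OutboxEvent(long id, String type, String payload) {}

    final List<String> admissionsTable = new ArrayList<>();    // business table stand-in
    final Deque<OutboxEvent> outboxTable = new ArrayDeque<>(); // outbox table stand-in
    final List<OutboxEvent> kafka = new ArrayList<>();         // broker stand-in
    private long seq = 0;

    /** One "SQL transaction": the state change and the event land together or not at all. */
    public synchronized void discharge(String admissionId) {
        admissionsTable.add(admissionId + ":DISCHARGED");
        outboxTable.add(new OutboxEvent(++seq, "PatientDischarged", admissionId));
    }

    /** The polling relay: reads committed outbox rows and publishes them. */
    public synchronized int relayOnce() {
        int published = 0;
        OutboxEvent e;
        while ((e = outboxTable.poll()) != null) {
            kafka.add(e);   // in the real system: producer send, then delete/mark the row
            published++;
        }
        return published;
    }

    public static void main(String[] args) {
        OutboxDemo db = new OutboxDemo();
        db.discharge("adm-42");
        db.discharge("adm-43");
        System.out.println(db.relayOnce());        // 2
        System.out.println(db.kafka.size());       // 2
        System.out.println(db.outboxTable.size()); // 0: relay drained the outbox
    }
}
```

Because the event row commits in the same transaction as the admission update, a crash between commit and publish only delays delivery (the relay retries), which is exactly the at-least-once guarantee the paragraph above describes.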
Request Flow — Appointment Journey
Appointment creation represents the primary synchronous write path, enforcing strict availability constraints via cross-service gRPC validation before persistence.
Gateway validates the JWT and routes to appointment-service. The AppointmentController maps the request to a domain entity.
The service invokes PatientQueryBlockingStub (Port 9090) to verify clinical existence and DoctorQueryBlockingStub to verify provider availability in real-time.
All gRPC calls are wrapped in Resilience4j instances. If the patient-service latency exceeds 200ms, the circuit opens to protect the calling thread pool.
The appointment state and a corresponding AppointmentScheduled event are committed atomically to the local PostgreSQL schema using JDBC transactions.
Request Flow — Automated Invoicing
The billing service is purely event-driven: it operates as a downstream consumer of clinical and financial topics, generating PDF invoice records while ensuring financial integrity through idempotent event processing and late-bound PII enrichment.
KafkaListener validates the eventId against the processed_events log to prevent duplicate billing on at-least-once Kafka delivery.
Since the event is PII-sanitized, the service invokes the PatientQueryBlockingStub to retrieve the patient's billing address and full name.
The InvoiceService generates the ledger entry and invokes an external REST adapter to produce the standardized clinical invoice PDF.
The financial record is finalized in the billing_schema and the PDF metadata is linked for audit retrieval.
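The idempotency guard from the first step can be sketched as follows, with a `HashSet` standing in for the processed_events log (class and method names are illustrative; the real check runs inside the KafkaListener against a database table):

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch of the billing consumer's idempotency guard: each eventId is
 *  checked against a processed-events log before any ledger write. */
public class IdempotentBillingSink {
    private final Set<String> processedEvents = new HashSet<>(); // processed_events stand-in
    int invoicesCreated = 0;

    /** Returns true only the first time a given eventId is handled. */
    public boolean onEvent(String eventId, String patientId) {
        if (!processedEvents.add(eventId)) {
            return false; // duplicate delivery (at-least-once semantics): skip, no double billing
        }
        // PII enrichment, ledger entry, and PDF generation would happen here
        invoicesCreated++;
        return true;
    }

    public static void main(String[] args) {
        IdempotentBillingSink sink = new IdempotentBillingSink();
        System.out.println(sink.onEvent("evt-1", "uuid-1")); // true  (first delivery)
        System.out.println(sink.onEvent("evt-1", "uuid-1")); // false (redelivery ignored)
        System.out.println(sink.invoicesCreated);            // 1
    }
}
```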
Request Flow — API Gateway Routing
The gateway is built on Spring Cloud Gateway (WebFlux / Project Reactor). All route definitions live in application.yml. There is no service discovery — routes are statically configured to Docker Compose service names.
| Inbound Path | Upstream | Filters |
|---|---|---|
| /api/patients/** | patient-service:8080 | StripPrefix=1 |
| /api/doctors/** | doctor-service:8083 | StripPrefix=1 |
| /api/appointments/** | appointment-service:8084 | StripPrefix=1 |
| /api/auth/** | auth-service:8089 | StripPrefix=1 |
| /api/support/** | support-service:8085 | StripPrefix=1 |
| /api/admission/** | admission-service:8086 | StripPrefix=1 |
| /api/notification/** | notification-service:8082 | StripPrefix=1 |
| /api-docs/** | — | Swagger UI aggregation |
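Assuming the standard Spring Cloud Gateway YAML layout, one row of the table above would be declared roughly like this in application.yml (the route id is illustrative):

```yaml
spring:
  cloud:
    gateway:
      routes:
        - id: patient-service-route
          uri: http://patient-service:8080   # static Docker Compose service name
          predicates:
            - Path=/api/patients/**
          filters:
            - StripPrefix=1                  # /api/patients/1 -> /patients/1 upstream
```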
The gateway uses @EnableWebFluxSecurity, not standard MVC security. ReactiveUserDetailsService is overridden with a no-op bean to suppress Spring Security's autoconfigured form login. Auth is handled entirely by the custom jwtAuthenticationFilter WebFilter.
Fault Tolerance — Resilience4j
Circuit breakers guard all outbound gRPC calls from appointment-service to patient-service and doctor-service. Configuration is identical for both instances:
```properties
# appointment-service / application.properties
resilience4j.circuitbreaker.instances.patientService.sliding-window-size=10
# % of failures required to open the circuit
resilience4j.circuitbreaker.instances.patientService.failure-rate-threshold=50
resilience4j.circuitbreaker.instances.patientService.wait-duration-in-open-state=10s
resilience4j.circuitbreaker.instances.patientService.permitted-number-of-calls-in-half-open-state=3
resilience4j.circuitbreaker.instances.patientService.minimum-number-of-calls=5
resilience4j.retry.instances.patientService.max-attempts=3
resilience4j.retry.instances.patientService.wait-duration=500ms
```
State transitions: CLOSED → (failure rate exceeds threshold) → OPEN → (wait 10s) → HALF_OPEN → (3 probe calls pass) → CLOSED. In OPEN state, all calls immediately hit the fallback method which returns false, causing the appointment creation to fail with CustomNotFoundException rather than timing out.
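The transition rules above can be sketched as a small state machine. This is a simplified, counts-based illustration: Resilience4j's real sliding window, timers, and metrics are richer, and the wait duration here is shortened from 10s to 10ms for the demo.

```java
/** Minimal sketch of the CLOSED -> OPEN -> HALF_OPEN -> CLOSED cycle. */
public class BreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }
    State state = State.CLOSED;
    private int calls = 0, failures = 0, probeSuccesses = 0;
    private long openedAt = 0;
    private final int minCalls = 5, probesNeeded = 3;   // mirrors the config above
    private final double failureThreshold = 0.5;        // 50% failure rate
    private final long waitMillis;                      // wait-duration-in-open-state

    BreakerSketch(long waitMillis) { this.waitMillis = waitMillis; }

    /** Gate every outbound call; OPEN short-circuits straight to the fallback. */
    boolean allows(long now) {
        if (state == State.OPEN && now - openedAt >= waitMillis) {
            state = State.HALF_OPEN;    // wait elapsed: admit a few probe calls
            probeSuccesses = 0;
        }
        return state != State.OPEN;
    }

    /** Record each call outcome and drive the state transitions. */
    void record(boolean success, long now) {
        if (state == State.HALF_OPEN) {
            if (!success) { trip(now); return; }          // failed probe: reopen
            if (++probeSuccesses >= probesNeeded) reset(); // 3 probes pass: close
            return;
        }
        calls++;
        if (!success) failures++;
        if (calls >= minCalls && (double) failures / calls >= failureThreshold) trip(now);
    }

    private void trip(long now) { state = State.OPEN; openedAt = now; }
    private void reset() { state = State.CLOSED; calls = 0; failures = 0; }

    public static void main(String[] args) {
        BreakerSketch cb = new BreakerSketch(10);         // 10ms wait for the demo
        for (int i = 0; i < 5; i++) cb.record(false, 0);  // 5 failures: breaker trips
        System.out.println(cb.state);                     // OPEN
        System.out.println(cb.allows(5));                 // false: fallback path
        System.out.println(cb.allows(15));                // true: HALF_OPEN probing
        for (int i = 0; i < 3; i++) cb.record(true, 15);  // 3 probe calls pass
        System.out.println(cb.state);                     // CLOSED
    }
}
```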
patient-service also configures a circuit breaker around its Kafka producer to prevent Kafka broker unavailability from blocking HTTP request threads.
Observability
Every service exposes a /actuator/prometheus endpoint via Micrometer. Prometheus scrapes all 6 services on a 5-second interval. Grafana reads from Prometheus and serves a pre-provisioned dashboard.
Metrics tracked per service
| Metric | Source | Use |
|---|---|---|
http_server_requests_seconds | Micrometer | Request rate, P95 latency, error rate by status code |
jvm_memory_used_bytes | JVM | Heap / non-heap usage per service |
hikaricp_connections_* | HikariCP | Active, idle, pending, max pool connections |
resilience4j_circuitbreaker_* | Resilience4j | State (CLOSED/OPEN/HALF_OPEN), failure rate, call rate |
jvm_gc_pause_seconds | JVM / G1GC | GC pause rate and duration |
tomcat_threads_* | Tomcat | Current and busy thread counts |
Load testing
Three k6 scripts cover graduated load profiles: low-stress.js (10 VUs, 30s), medium-stress.js (ramp to 50 VUs), and intense-stress.js (ramp to 200 VUs). A fourth script, appointments-stress.js, runs a full setup phase (registers a user, creates 50 patients and 50 doctors) and then hammers appointment creation at a constant arrival rate of 100 req/s for 30 seconds. Thresholds: P95 < 2000ms, error rate < 1%.
Test coverage
patient-service achieves 72% instruction coverage measured by JaCoCo. Coverage spans the controller (@WebMvcTest), service (Mockito unit tests), repository (@DataJpaTest with H2), and Kafka producer layers. DTOs, entity classes, configuration, exception handlers, and generated Protobuf classes are excluded from the report.
CI / CD
Each service has its own GitHub Actions workflow triggered on push to master when files under its directory change. This prevents unrelated service rebuilds.
```yaml
# Per-service workflow pattern
on:
  push:
    branches: [master]
    paths:
      - 'appointment-service/**'

# Kubernetes workflow uses a matrix strategy across 6 services
strategy:
  matrix:
    service:
      - { name: api-gateway, port: 4004 }
      - { name: appointment-service, port: 8084 }
      - ...
```
The kubernetes-register.yml workflow builds multi-platform images (linux/amd64,linux/arm64) using Docker Buildx, pushes to GHCR, runs Trivy vulnerability scanning, and updates image tags in Kubernetes manifests. Dockerfile uses a two-stage build: Maven builder image + slim eclipse-temurin:21-jdk runner. JVM flags: -XX:MaxRAMPercentage=75.0 -XX:+UseG1GC -XX:+ExitOnOutOfMemoryError.
Observability & Resilience
The system is designed for high observability and fault isolation. Each microservice follows a standard set of cross-cutting paradigms to ensure operational reliability.
| Concern | Implementation Strategy | Technical Detail |
|---|---|---|
| Distributed Tracing | Micrometer Tracing + OTel | 100% sampling rate; traceId and spanId injected into MDC for log correlation across service boundaries. |
| Aggregated Logging | Kafka Appender + Logstash | Services emit JSON logs to his-audit-logs topic. Logstash processes and persists to MongoDB for audit archiving. |
| Fault Tolerance | Resilience4j Circuit Breaker | Synchronous gRPC/REST calls use sliding-window failure thresholds (50%) to prevent cascading failures. |
| Message Reliability | Dead Letter Queues (DLQ) | Kafka consumers implement FixedBackOff (3 retries). Failed records are routed to {topic}.DLQ for manual intervention. |
| Health Monitoring | Liveness/Readiness Probes | Exposed via /actuator/health. Gateway performs periodic health checks before routing traffic to upstreams. |
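The retry-then-DLQ policy from the Message Reliability row can be sketched in plain Java, with a list standing in for the {topic}.DLQ (the real setup would wire this through Spring Kafka's error handling; class and method names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch of the consumer error policy: fixed-backoff retries, then the
 *  record is routed to the dead-letter queue for manual intervention. */
public class DlqPolicy {
    final List<String> dlq = new ArrayList<>(); // {topic}.DLQ stand-in
    private final int maxRetries = 3;           // mirrors FixedBackOff(interval, 3)

    /** Try the handler up to 1 + maxRetries times, then dead-letter the record. */
    public boolean process(String record, Consumer<String> handler) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                handler.accept(record);
                return true;                    // processed successfully
            } catch (RuntimeException e) {
                // the real handler sleeps the fixed back-off interval here
            }
        }
        dlq.add(record);                        // retries exhausted: route to DLQ
        return false;
    }

    public static void main(String[] args) {
        DlqPolicy policy = new DlqPolicy();
        int[] attempts = {0};
        // Transient failure: fails twice, then succeeds; never reaches the DLQ
        boolean ok = policy.process("evt-1", r -> {
            if (attempts[0]++ < 2) throw new RuntimeException("transient");
        });
        System.out.println(ok);          // true
        // Poison message: always fails, lands on the DLQ after all retries
        boolean ok2 = policy.process("evt-2", r -> { throw new RuntimeException("poison"); });
        System.out.println(ok2);         // false
        System.out.println(policy.dlq);  // [evt-2]
    }
}
```

The distinction the demo draws matters operationally: transient faults are absorbed by the retries, while poison messages are quarantined instead of blocking the partition.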
API Reference & Aggregation
The system provides a unified developer portal. The API Gateway aggregates Swagger/OpenAPI documentation from all downstream services, exposing them through a single ingress point.
Documentation is accessible via /aggregate/{service-name}/v3/api-docs. The Gateway portal (Port 4004) serves as the authority for the full clinical API surface.