What started as a patient management experiment grew into a full Hospital Information System. This is a technical overview of what it became: the architecture, the patterns, and the tradeoffs.
What Is This System?
The Hospital Information System (HIS) is a production-grade, event-driven microservices architecture designed for comprehensive hospital operations management. It's built on Spring Boot 3.4.4, Java 21+, Apache Kafka, PostgreSQL 15, and fully containerized with Docker. The codebase is over 91% Java by byte count, the rest being Dockerfiles, Makefiles, and shell scripts.
What started as a curiosity about how real-world healthcare and banking software is structured eventually grew into a fully realized distributed system. Let me walk you through the whole thing.
The Architecture at a Glance
The system follows an event-driven microservices pattern. Every service owns its own schema in a shared PostgreSQL database, communicates asynchronously through Apache Kafka, and is routed through a central API Gateway. Here is the big picture:
API Gateway (Port 4004)
Built with Spring Cloud Gateway and Reactive WebFlux. It is the single entry point for all clients. It validates JWT tokens, routes requests to the correct downstream service, and applies rate limiting via Resilience4j.
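As a rough illustration of how the routing is wired, here is a minimal Spring Cloud Gateway route definition. The route ids, hostnames, and path patterns are assumptions for the sketch, not the project's actual configuration:

```yaml
# Hypothetical application.yml fragment for the gateway.
# Each route matches a path prefix and forwards to one downstream service.
spring:
  cloud:
    gateway:
      routes:
        - id: patient-service          # illustrative id
          uri: http://patient-management:8080
          predicates:
            - Path=/api/patients/**
        - id: doctor-service           # illustrative id
          uri: http://doctor-service:8083
          predicates:
            - Path=/api/doctors/**
```

JWT validation and rate limiting sit in front of these routes as gateway filters, so a request never reaches a downstream service unless it matches a route and passes both checks.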
Auth Service (Port 8089)
Handles user registration, login, and JWT token issuance (HS256). Also manages Role-Based Access Control (RBAC). Every other service validates roles at the method level using @PreAuthorize.
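The service issues tokens through a JWT library, but it helps to see what HS256 signing actually is: an HMAC-SHA256 over the base64url-encoded header and payload. This is a minimal JDK-only sketch of those mechanics (class and claim names are illustrative, not the service's code):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;

// Illustrative sketch of HS256 JWT signing: the token is
// base64url(header) + "." + base64url(payload) + "." + base64url(hmac).
public class Hs256Sketch {

    static String b64url(byte[] in) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(in);
    }

    static String sign(String headerJson, String payloadJson, byte[] secret) {
        try {
            String signingInput = b64url(headerJson.getBytes(StandardCharsets.UTF_8))
                    + "." + b64url(payloadJson.getBytes(StandardCharsets.UTF_8));
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] sig = mac.doFinal(signingInput.getBytes(StandardCharsets.UTF_8));
            return signingInput + "." + b64url(sig);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    public static void main(String[] args) {
        String token = sign("{\"alg\":\"HS256\",\"typ\":\"JWT\"}",
                "{\"sub\":\"doctor-42\",\"role\":\"DOCTOR\"}",
                "demo-secret".getBytes(StandardCharsets.UTF_8));
        System.out.println(token); // three dot-separated base64url segments
    }
}
```

Because the signature depends on the shared secret, any service holding that secret can verify a token locally by recomputing the HMAC, which is what makes stateless per-request validation at the gateway cheap.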
Patient Management (Port 8080)
Core service for patient records, medical history, and clinical data. Consumes lab-result-completed.v1 events from Kafka to keep patient records updated without direct service-to-service calls.
Doctor Service (Port 8083)
Manages doctor profiles, shifts, leave requests, and clinical ordering. Publishes lab order events using the Outbox Pattern for guaranteed delivery, and consumes results when the Lab Service finishes processing.
Lab Service (Port 8087)
Full LIMS (Laboratory Information Management System) workflow. Consumes lab orders, processes results, and publishes completion events also through the Outbox Pattern. Dead-letter topics catch failed consumer events.
Appointment Service (Port 8084)
Manages scheduling, doctor rostering, and patient appointment availability. Emits appointment-created events picked up by the Notification Service.
Billing Service (Port 8081)
Handles invoice creation, insurance splitting, and payment tracking. Consumes lab order events to auto-generate cost entries without polling any other service's database.
Admission Service (Port 8086)
Bed management and inpatient ward logistics. Supports patient admission and discharge workflows, emitting admission events for downstream consumers.
Inventory Service (Port 8088)
Tracks medical supplies and raises stock alerts. Operates as a pure event consumer, with no synchronous API exposed to other services.
Notification Service (Port 8090)
Subscribes to all meaningful domain events (appointments, billing, admissions) and triggers the appropriate notifications. It is a pure listener; no other service calls it directly.
The Data Layer: One Database, Many Schemas
This was one of the more deliberate architectural decisions. Instead of running a separate PostgreSQL instance per service (which would be expensive and operationally painful for a solo developer), I used schema isolation within a single PostgreSQL database. Each service gets its own schema and its own dedicated database user with minimal privileges:
-- Each service gets its own isolated context
CREATE SCHEMA patient_schema;
CREATE USER patient_user WITH ENCRYPTED PASSWORD 'patient_pass_123';
GRANT ALL PRIVILEGES ON SCHEMA patient_schema TO patient_user;
-- The JDBC connection locks the service to its schema
jdbc:postgresql://postgresql:5432/patient_db?currentSchema=patient_schema
This means if the billing service's credentials are ever compromised, the attacker cannot read patient records. The isolation is enforced at the database level, not just at the application level. It's the logical equivalent of a "database per service" without the operational overhead.
The Event Bus: Apache Kafka
Kafka runs in KRaft mode (no Zookeeper) at version 3.9.0. All asynchronous communication flows through it. Here is the core topic topology:
lab-order-placed.v1
Published by Doctor Service. Consumed by Lab Service and Billing Service.
lab-result-completed.v1
Published by Lab Service. Consumed by Doctor Service and Patient Service.
appointment-created
Published by Appointment Service. Consumed by Notification Service.
billing-invoice-created
Published by Billing Service. Consumed by Notification Service.
Consumer groups are used throughout (lab-order-group, doctor-lab-result-group, billing-lab-order-group, etc.) which means any service can be horizontally scaled by simply adding more instances — Kafka handles the rebalancing automatically.
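The reason scaling is safe is that Kafka maps each record key to a fixed partition, so all events for one entity stay in order no matter how many consumer instances join the group. A toy sketch of that mapping (the real client uses murmur2 hashing; `hashCode` here is only illustrative, and the Integer.MIN_VALUE edge case is ignored):

```java
// Illustrative key-to-partition mapping. Every event keyed by the same
// entity id lands on the same partition, so adding consumer instances
// raises throughput without reordering any one entity's event stream.
public class PartitionSketch {

    static int partitionFor(String key, int partitionCount) {
        return Math.abs(key.hashCode()) % partitionCount;
    }

    public static void main(String[] args) {
        int partitions = 6;
        // The same lab order key always routes to the same partition.
        System.out.println(partitionFor("lab-order-123", partitions)
                == partitionFor("lab-order-123", partitions)); // true
    }
}
```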
Reliability: The Outbox Pattern
Publishing a Kafka event and saving to a database in the same transaction is the classic Dual Write Problem. If the database write succeeds but Kafka is down, the event is lost. If Kafka succeeds but the database crashes, the event fires for data that was never persisted.
The solution I implemented is the Transactional Outbox Pattern. Instead of publishing directly to Kafka, the service writes the event to an outbox table in the same database transaction as the business data. A separate relay process reads from the outbox and publishes to Kafka, marking rows as "sent" after successful delivery:
// In a single @Transactional block:
// 1. Save the domain entity
doctorLabOrderRepository.save(labOrder);
// 2. Save the event to the outbox (same transaction)
outboxRepository.save(OutboxEvent.builder()
        .topic("lab-order-placed.v1")
        .payload(labOrderJson)
        .status(OutboxStatus.PENDING)
        .build());
// The relay picks it up and publishes to Kafka separately
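The relay side of the pattern can be sketched with plain collections: a list standing in for the outbox table and a `BiPredicate` standing in for the Kafka producer. All names here are illustrative, not the project's actual classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// In-memory sketch of the outbox relay loop. A row is marked SENT only
// after the publish succeeds; a failed publish leaves it PENDING so the
// next relay pass retries it. Nothing is ever lost.
public class OutboxRelaySketch {
    enum Status { PENDING, SENT }

    static class OutboxEvent {
        final String topic, payload;
        Status status = Status.PENDING;
        OutboxEvent(String topic, String payload) { this.topic = topic; this.payload = payload; }
    }

    // One relay pass: attempt every PENDING row, return how many were sent.
    static int relay(List<OutboxEvent> outbox, BiPredicate<String, String> publish) {
        int sent = 0;
        for (OutboxEvent e : outbox) {
            if (e.status == Status.PENDING && publish.test(e.topic, e.payload)) {
                e.status = Status.SENT;
                sent++;
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        List<OutboxEvent> outbox = new ArrayList<>();
        outbox.add(new OutboxEvent("lab-order-placed.v1", "{\"orderId\":1}"));
        relay(outbox, (t, p) -> false); // broker "down": row stays PENDING
        relay(outbox, (t, p) -> true);  // broker back: row delivered
        System.out.println(outbox.get(0).status); // SENT
    }
}
```

The key property is that the business write and the outbox write share one database transaction, so the relay only ever sees events whose underlying data is durably committed.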
Resilience: Dead Letter Queues
Even with reliable event publishing, consumers can fail. A malformed payload, a downstream database error, or an unexpected null can crash a consumer. Without a safety net, that message is either dropped or blocks the partition indefinitely.
The system uses Dead Letter Topics (DLTs) as the safety net. When a consumer fails after exhausting retries, the failed message is forwarded to a designated dead-letter topic. A separate monitoring process can inspect, replay, or alert on those messages without impacting the live consumer pipeline.
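The retry-then-dead-letter flow can be sketched in plain Java. A real setup delegates this to the Kafka client's error handler; this sketch (all names hypothetical) just makes the control flow explicit:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative retry-then-DLT logic: try the handler a fixed number of
// times, and on exhaustion park the message on a dead-letter list instead
// of dropping it or blocking the partition.
public class DltSketch {
    static final int MAX_RETRIES = 3;

    static void consume(String message, Consumer<String> handler, List<String> deadLetters) {
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                handler.accept(message);
                return; // processed successfully
            } catch (RuntimeException e) {
                // swallow and retry until attempts are exhausted
            }
        }
        deadLetters.add(message); // retries exhausted: forward to the DLT
    }

    public static void main(String[] args) {
        List<String> dlt = new ArrayList<>();
        consume("{\"bad\":true}", m -> { throw new IllegalStateException("malformed"); }, dlt);
        consume("{\"ok\":true}", m -> { /* handled fine */ }, dlt);
        System.out.println(dlt); // only the failing message lands on the DLT
    }
}
```

The live pipeline keeps moving either way; the dead-letter topic becomes a queue of known-bad messages that an operator can inspect and replay later.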
Security Architecture
Security is layered at three levels, each enforced independently:
Layer 1: API Gateway
All external requests pass through the gateway. JWT tokens are decoded, signatures verified, and claims extracted. Invalid tokens are rejected before any service ever sees the request.
Layer 2: Method-Level Security
Each service uses @PreAuthorize annotations to enforce role-based access at the business logic level. For example, only a DOCTOR or RECEPTIONIST can admit a patient, and only the owning doctor or an ADMIN can create a lab order.
Layer 3: Database-Level Isolation
As described above, each service connects with a user that has privileges only within its own schema. Compromising the application layer of one service cannot expose another service's data.
// Layer 2 in practice
@PreAuthorize("hasRole('ADMIN') or @securityService.isDoctorOwner(authentication, #doctorId)")
public CreateLabOrderResponseDto createLabOrder(@PathVariable UUID doctorId, ...) { ... }
@PreAuthorize("hasAnyRole('DOCTOR', 'RECEPTIONIST')")
public ResponseEntity<Admission> admitPatient(...) { ... }
Observability Stack
A distributed system without observability is just chaos without a window. The HIS ships with a full observability stack:
Prometheus
Scrapes /actuator/prometheus from every service every 15 seconds. Stores 7 days of time-series metrics.
Grafana
Provisioned dashboards for real-time service monitoring, JVM metrics, and custom business metrics.
Micrometer Tracing
OpenTelemetry bridge for distributed trace propagation across services. Correlation IDs flow through every log entry.
Logstash + MongoDB
Logback Kafka Appender ships all service logs to Kafka → Logstash → MongoDB. Searchable audit trail by user, service, and timestamp.
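For the Prometheus side, the scrape setup described above corresponds to a job like this (job name and target are assumptions for illustration, not the project's actual file):

```yaml
# Hypothetical prometheus.yml fragment: pull Micrometer metrics from the
# Spring Boot actuator endpoint every 15 seconds.
scrape_configs:
  - job_name: 'patient-service'       # illustrative job name
    metrics_path: '/actuator/prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['patient-management:8080']
```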
Deployment: Docker Compose with 16 Services
The full system runs with a single make dev-up command via Docker Compose. In total there are 16 containers: the 10 Java microservices, Kafka, PostgreSQL, MongoDB, Logstash, Prometheus, and Grafana. Resource limits are defined per container to avoid a single service consuming the host:
# Typical Java service constraints
deploy:
  resources:
    limits:
      cpus: '1.0'
      memory: 512M
    reservations:
      cpus: '0.25'
      memory: 256M

# Kafka gets more headroom
deploy:
  resources:
    limits:
      cpus: '1.5'
      memory: 1G
Multi-stage Docker builds keep the runtime images lean: a Maven build stage compiles the fat jar, and the runtime stage only contains the JRE and the jar. The Makefile wraps all common operations, so there is no need to remember long Docker commands during development.
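The two-stage build looks roughly like this (base image tags and paths are assumptions for the sketch):

```dockerfile
# Stage 1: compile the fat jar with Maven (illustrative image tags)
FROM maven:3.9-eclipse-temurin-21 AS build
WORKDIR /app
COPY pom.xml .
COPY src ./src
RUN mvn -q package -DskipTests

# Stage 2: runtime image contains only the JRE and the jar
FROM eclipse-temurin:21-jre
WORKDIR /app
COPY --from=build /app/target/*.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
```

Everything from the build stage (Maven, sources, the local repository cache) is discarded; only the final jar is copied forward, which keeps the runtime image small.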
The Maven BOM Strategy
With 10 Java services, managing dependency versions across all POMs manually is a maintenance nightmare. The project uses a Bill of Materials (BOM) module at the root. Every service imports the BOM and never specifies a version for shared dependencies; the BOM owns all of them:
<!-- patient-management-bom/pom.xml -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <!-- spring-boot-dependencies is the importable Spring Boot BOM
           (the starter parent is meant for inheritance, not import) -->
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-dependencies</artifactId>
      <version>3.4.4</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
    <!-- JJWT, gRPC, Micrometer, SpringDoc, Resilience4j... -->
  </dependencies>
</dependencyManagement>
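On the consuming side, each service imports the BOM and then declares dependencies without versions. The coordinates below are hypothetical placeholders, not the project's real groupId or version:

```xml
<!-- Illustrative service pom.xml fragment -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.example</groupId>              <!-- hypothetical -->
      <artifactId>patient-management-bom</artifactId>
      <version>1.0.0</version>                    <!-- hypothetical -->
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <!-- no <version>: the imported BOM owns it -->
  </dependency>
</dependencies>
```

Bumping a library version then means editing one file and rebuilding, instead of hunting through ten POMs.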
What I'd Do Differently
Looking back at the system objectively, a few things stand out as natural next steps rather than regrets:
- Kubernetes over Docker Compose: The resource management, health checks, and auto-scaling capabilities of K8s are the right home for this system. Helm charts would replace the current Docker Compose files.
- gRPC for internal calls: The infrastructure is already in place (Protocol Buffers 4.29.1, gRPC 1.68.0) but not yet activated. Some synchronous inter-service calls would benefit from the schema enforcement and lower overhead of gRPC over REST.
- CQRS for analytics: The current read/write model shares the same schema. Separating them via event sourcing would enable real-time reporting without impacting write latency.
- HashiCorp Vault for secrets: Environment variables in Docker Compose get the job done for development, but a production system should pull secrets from a proper secrets manager with rotation policies.
- HIPAA compliance layer: Healthcare data requires encryption at rest (TDE), data retention policies, and formal access audit trails. MongoDB + Logstash covers audit logging, but the other controls need dedicated attention.
Final Thoughts
When I started this project, my primary question was: "How do real hospital systems work?" The answer I built is event-driven, schema-isolated, observable, and resilient. It is not perfect, but it is an honest reflection of the constraints faced by a solo developer building architecture that mirrors production-grade standards.
Patterns in this system such as Outbox, DLQ, RBAC, schema isolation, and BOM dependency management are more than academic exercises. They represent practical solutions to distributed systems problems that I had to understand deeply to implement from scratch. This understanding constitutes the primary outcome of the project.