Dev.Chan64's Blog

Go Home
Show Cover Slide Show Cover Slide

Designing a Robot Operating Platform that Brings Cloud Experience to On-Premises

gpt-4-turbo has translated this article into English.


Here is the English translation of the provided Markdown document, preserving all the original Markdown formatting:


This is a design case aimed at implementing cloud-level development experiences and control flows even in on-premise environments.
The key is not just transferring the technology stack, but reproducing the entire developer experience in the cloud.


Design Philosophy

I believe that the essence of design is not replicating technology, but transplanting familiar flows into new environments.
That is, developers and operators should be able to maintain the same development and operational experience regardless of the environment.

This design aimed to create a structure where “the flow itself can be maintained” even in environments where direct use of cloud technologies is not possible.


Problem Definition and Operational Constraints

This project faced the challenge of operating multiple autonomous robots within a secured closed network environment.
The original system was completely optimized for the cloud with components like AWS Lambda and WebSocket, but the actual operational environment lacked internet access, which imposed the following constraints:

Environmental Constraints


Overall Architecture Structure

The entire system was configured to reflect sensor and state messages from ROS-based robots in real-time on the control server and UI through a NATS-based cluster messaging structure.
Additionally, the serverless event processing flow (Lambda style) was implemented by porting it to a Kubeless-based FaaS structure.

Architecture Diagram

graph TD

subgraph RB1[Robot1]
  ROS[ROS Node]
  AGENT[Agent - YAML & Binary]
  SVC[ROS Update Handler]
  KUBE[Kubeless Function]
  NATS1[NATS - Robot1]
end

RB2[Robot2,...] --> NATS2

subgraph NATS[NATS Cluster]
  NATS1 <--> NATS2[NATS - Robot2,...] <--> NATS_MAIN[NATS - Control Server]
end

ROS --> AGENT --> NATS1
AGENT --> MQTT[AWS IoT MQTT]
NATS --> WS[WebSocket Agent] --> UI[Touch UI / WebView]

MQTT --> SVC --> KUBE
S3[S3 Package Download and Execution] --> SVC

Structure Summary:


Key Design Elements and Implementation

The core of this project was to implement a structure that secures real-time responsiveness and operational flexibility at cloud-level in a restricted on-premise environment.
The design was centered around the following five areas:

1. Messaging Processing Structure

2. UI Integration Structure

3. Automated Deployment Flow

4. Fault Detection and Monitoring

5. Operation and Maintenance Strategy

This design allowed us to secure a structure that consistently works for real-time control, remote deployment, and fault response even on-premise.


Technology Selection and Design Rationale

During the design process, considering the environmental constraints that prevented the use of existing cloud-based technologies, we reviewed and selected technological alternatives that could operate on-premise with similar functionalities.
Here are the rationales for major technology elements:

Item Choice or Transition Design Rationale
Kafka → NATS ✅ Chose NATS
(Excluded Kafka)
Kafka has strengths in high throughput, but issues with Python client latency and installation complexity were concerns. In contrast, NATS is lightweight and performs stably in Python environments, advantageous for real-time responses.
Lambda → Kubeless ✅ Adopted Kubeless
(On-premise)
Used Kubeless to replicate Lambda’s serverless architecture on-premise. Being Kubernetes-native allows for local execution and deployment via Helm, making it a viable alternative.
MQTT Trigger-Based Automation ✅ Partially Applied Configured on-premise robots to receive cloud MQTT event messages and operate internally via ROS and Kubeless. Not applied to entire app deployment, experimentally introduced for core functions.
Prometheus + Grafana ⚪ Experimentally Applied Configured some nodes with Prometheus collectors and attempted Grafana integration for visualization of message throughput and state metrics. Not essential for operation but considered for scalability and diagnostics.
Video Stream Messaging ❌ Excluded from Operational Deployment Initially designed to handle video frame messages via NATS, but excluded from actual operational structure due to bandwidth and real-time issues in a closed network environment.

This approach of selecting technologically adapted alternatives and redesigning structures allowed us to maintain the original cloud-based development flow seamlessly on-premise.


Design Outcomes and Insights

This design was not merely about changing technologies, but a structural experiment and tuning process to bridge the gap between the operational environment and technological structure.
The overall results can be summarized from three perspectives:

1. Balance Between Real-Time Responsiveness and Structural Stability

2. Coexistence Strategy of Automation and Manual Operation

3. Designing Compromise Points Between Design Intent and Reality Constraints

This project is not just a simple combination of robot systems, messaging structures, and cloud technologies, but a philosophical realization of structural design aimed at maintaining the original development experience even within constraints.


In Conclusion

This case was not about a technical problem but about how to design the flow.
The reason we could implement the development experience and operational structure, which seemed only possible in cloud environments, on-premise at almost the same level was because “we didn’t just replicate technology, we transplanted the experience.”

The Role of a Designer

In my view, a designer’s role is not just about bringing in new technology.
A true designer is someone who designs familiar flows to work in new environments.
That is, even if the structure changes, making sure the user’s experience does not change, that’s the power of design.

Future Possibilities

This structure is not confined to a single project.

Design is always a compromise with reality, but the way of that compromise should be such that it is imperceptible to the user. That is what we should aim for in structural perfection.


Go Home
Tags: Platformization Business Integration Project