COMING SOON
Primo Software
Primo is an advanced AI software platform designed to optimize the inference of generative AI models, ensuring seamless deployment and performance at scale. It consists of components that support model optimization, model fine-tuning, RAG-based inference, and orchestration across complex model pipelines. The initial version of Primo focuses on applications that require low-latency performance across several industry verticals, such as healthcare and life sciences, robotics, supply chain management, and public safety. The Primo platform runs on top of Esperanto’s ET-SoC-1 based platforms produced in partnership with Penguin Solutions.
Esperanto delivers the first RISC-V support for Ollama
Esperanto focuses on supporting open-source models, such as small language models and vision language models. As a demonstration of our ability to run a variety of open-source generative models, we are introducing a backend AI infrastructure consisting of ET-SoC-1 based servers from Penguin, the Primo software stack running on these servers, a family of open-source models, and example applications running on these systems. We expose this ET-SoC-1 and Primo based generative AI infrastructure through Ollama, so that people can access our generative AI system from the web to run their own applications or evaluate one of our existing demonstration applications. Esperanto will make all of the models supported on the Primo system available on Hugging Face.
The integration of the Esperanto Inference Server into Ollama marks a significant milestone in the democratization of Large Language Models (LLMs). This integration introduces the first RISC-V architecture-based hardware support to this popular open-source LLM framework, opening new horizons for innovation and accessibility in AI.
We invite developers, researchers, and enthusiasts to explore these new possibilities and contribute to the growing ecosystem of open and accessible AI technologies.

Transforming Open-Source SLMs into RISC-V Optimized Code
The Primo AI/ML Model Development SDK enables developers to efficiently create and deploy AI/ML models on Esperanto’s RISC-V based accelerator, leveraging a suite of open-source tools for optimization at various stages of the development pipeline.

The Primo AI/ML Model Development SDK consists of four main categories of tools:
Fine-tuning: including LoRA and Flash Attention
Quantization: including AWQ and other quantization-related tools
Exporting: including Jupyter and Torch Dynamo
ML Compilation: including ONNX Runtime and other ML compiler technologies
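As a loose illustration of what the quantization stage does, the sketch below shows minimal symmetric int8 weight quantization in plain Python. It is illustrative only: production tools such as AWQ use activation-aware, per-channel scaling, and the function names here are hypothetical, not part of the Primo SDK.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats to [-128, 127] with one scale.

    Illustrative sketch only; tools like AWQ pick scales using activation
    statistics rather than the simple max-abs rule used here.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]


weights = [0.9, -0.31, 0.05, -1.2]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
```

The round trip loses at most half a quantization step per weight, which is the accuracy/footprint trade-off that quantization tooling manages.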
Third-party applications deployed in cloud environments, such as AWS microservices, can interface via Ollama’s Web API with any number of Ollama Server instances attached to an Esperanto RISC-V backend.
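To make the interface concrete, here is a minimal sketch of a request against Ollama’s documented `/api/generate` endpoint using only the Python standard library. The host and model name below are placeholders, not real Esperanto endpoints.

```python
import json
import urllib.request


def build_generate_request(base_url, model, prompt):
    """Build an HTTP request for Ollama's /api/generate endpoint.

    The endpoint path and JSON fields follow Ollama's public REST API;
    base_url and model are placeholders chosen for illustration.
    """
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a token stream
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


# Against a running Ollama server, the request would be sent like this:
# req = build_generate_request("http://localhost:11434", "llama3.2", "Hello")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Because the API is plain HTTP with JSON bodies, any cloud application stack can load-balance such requests across multiple Ollama Server instances.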
Check out Esperanto’s supported applications
Open Web UI
In browser chat application
Open WebUI is an extensible, feature-rich, and user-friendly self-hosted WebUI designed to operate entirely offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs. In this sample application, our intent is to show how we integrated Open WebUI with the Esperanto Inference Server via the Ollama framework.
Each chat prompt is serviced by a single ET-SoC accelerator card.
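Since Open WebUI also speaks OpenAI-compatible APIs, and Ollama exposes such an endpoint under `/v1`, a chat client could alternatively target that interface. The sketch below builds an OpenAI-compatible chat completion request with the standard library; the host and model name are placeholders, not a real Esperanto deployment.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages):
    """Build an OpenAI-compatible chat completion request.

    Ollama serves this API under /v1/chat/completions; base_url and
    model are illustrative placeholders.
    """
    payload = json.dumps({
        "model": model,
        "messages": messages,  # list of {"role": ..., "content": ...} dicts
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


req = build_chat_request(
    "http://localhost:11434",
    "llama3.2",
    [{"role": "user", "content": "Summarize RISC-V in one sentence."}],
)
```

Using the OpenAI-compatible surface lets existing chat front ends point at the Ollama backend without code changes, typically by overriding the API base URL.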
MORE DETAILS COMING SOON
Our stack supports workloads in a variety of industries
Healthcare

Pharmaceutical
