Development of a Runtime Measurement System
Chapter Overview
Introduction
I am developing a system designed to accurately measure and analyze the runtime of various software or hardware processes. The idea for this system arose from the need for precise timing and measurement solutions in embedded systems development.
Let's examine a practical use case.
Our Use Case - The Development Process
Our developed system (System A) handles two distinct events: Event A (entering) and Event B (exiting).
These events are represented by rising and falling edges on a digital signal line. The time interval between these two events (tB-tA) represents the runtime of the process we want to measure.
Since our system must fulfill specific timing requirements, tB-tA must stay within the maximum allowed time constraint. We therefore add the green line Δtmax_allowed to the diagram.
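At its core, the timing requirement boils down to a single comparison. A minimal Python sketch (the timestamp values are made up for illustration):

```python
# Hypothetical event timestamps in microseconds (illustrative values only).
T_A = 120.0             # Event A: rising edge, process entry
T_B = 470.0             # Event B: falling edge, process exit
DT_MAX_ALLOWED = 500.0  # maximum allowed runtime (the "green line")

runtime = T_B - T_A                     # tB - tA
within_budget = runtime <= DT_MAX_ALLOWED

print(runtime, within_budget)  # 350.0 True
```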
Looking inside System A, we see that it consists of two modules: Module 1 is responsible for processing input data, while Module 2 handles output data. System A fulfills the timing requirement, so we release it as version 1.0.
Let's assume a fictitious customer requests a new feature for our system, leading to a third module being added. After a few months, Module 3 is almost complete and integrated into the system, and integration tests can finally be performed to ensure that the new module works correctly with the existing modules.
After integrating Module 3, we observe that the runtime (tB-tA) has increased and now exceeds the maximum allowed time constraint (Δtmax_allowed). This indicates that the addition of Module 3 has negatively impacted the system's performance, leading to a violation of the timing requirements.
Conclusion - The Development Process
Now we have a better understanding of the engineering perspective. But to really understand the full problem, we have to look at the development workflow we used above.
Our Use Case - The Development Workflow
Summary
This workflow clearly demonstrates the need for a more structured approach to runtime monitoring. With the current development process and workflow, we are not able to effectively monitor and manage runtime performance during our processes and workflows. One-shot measurements are not sufficient to capture the dynamic behavior of the system over time.
The summary was easy, I guess. Before we dive into the solution space, let's explore the problem more deeply.
The Problem Statement
With the use case described above, we've identified several distinct problems that need to be addressed.
Workflow related challenges
With our current development process to develop our System A, we face several challenges:
- Lack of early detection of performance issues, since we rely on pre-release runtime analysis.
- Inability to track performance trends over time during the development phase.
- One-shot manual measurements that are time-consuming and error-prone.
What we need is to:
- Detect timing violations immediately during development, not just before release.
- Prevent costly late-stage fixes.
- Track performance trends throughout development.
- Perform systematic, automated measurements during each development phase, so that we maintain control over the system's timing constraints instead of discovering issues at release time.
- Replace the one-shot measurements taken directly before release: they provide only a single data point and offer no insight into performance trends over time or under varying conditions.
Measurement accuracy related challenges
Software-based measurements are popular and easy to realize. However, they change the system under test, because measurement code has to be added to it. Their accuracy is limited and depends on the clock speed of the embedded system and on the overhead introduced by the measurement code, which can significantly alter the timing behavior of the system. To reduce this intrusiveness, it is useful to change the measurement methodology and use external hardware.
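To make the intrusiveness tangible, here is a host-side Python sketch (not embedded code; the absolute numbers only stand in for the effect): even timing an empty function reports a nonzero duration, and that residue is overhead the measurement itself adds to the system under test.

```python
import time

def measure(fn):
    # Software-based measurement: the timing code runs on the same
    # system as the workload, so it adds overhead to every call.
    t0 = time.perf_counter()
    fn()
    t1 = time.perf_counter()
    return t1 - t0

# Measuring an empty function still reports a duration:
# that residue is pure measurement overhead.
overhead = measure(lambda: None)
print(f"measurement overhead: {overhead * 1e9:.0f} ns")
```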
The Idea / The Improvement Statement
Based on the problems identified above, the solution is clear: we need an automated, non-intrusive, and easy-to-integrate runtime measurement system that can be embedded directly into the development workflow.
Instead of relying on software-based measurements that alter system behavior, we propose using an FPGA-based external measurement system that:
- Monitors the system's input and output signals with minimal modification of the system under test
- Provides high-resolution timing measurements using an internal or external oscillator
- Requires minimal setup and wiring
- Can be integrated into existing development workflows
- Generates performance reports and trend analysis
This approach eliminates the overhead of software-based measurements while providing developers with continuous visibility into system performance. By catching timing violations early and tracking performance trends throughout development, we can prevent costly late-stage fixes and maintain strict control over timing constraints.
The FPGA acts as an independent observer, passively measuring the time between Event A (entering) and Event B (exiting) without interfering with the system's actual operation. This provides accurate, reliable measurements that reflect the true system behavior.
Let's enter the solution space.
Proposed Development Workflow with Runtime Analysis (During Development)
This proposal seems obvious. You might think: "I have thought of that just by reading this in 5 minutes." However, the challenge lies in implementing and maintaining such a systematic approach throughout development, because:
- We want to avoid imposing additional effort and guidance on developers for timing analysis. It must be optional instead of obligatory; we trust developers to decide for themselves when to perform timing analysis.
- We want to avoid buying resource-intensive tracing tools and infrastructure, since there is no need to look into the internal behaviour of each module. At this validation stage we just want to measure the overall system performance tB-tA; everything else is overkill.
- We want to avoid time-consuming setup and wiring. If the system is hard to integrate, it will not be used.
High-Level System Design Diagram
```mermaid
graph TD
    I(Measurement Input Signal) --- F(FPGA)
    O(Oscillator) --- F(FPGA)
    F(FPGA) ---|UART| S(Smaller CPU)
    S ---|USB/ ETH| L(Larger CPU)
    style F fill:#FFFFED,stroke:#FCE992
    style I fill:none,stroke:#228B22
    linkStyle 0 stroke:#228B22
```
FPGA Focused System Design Diagram
```mermaid
graph TD
    I(Measurement Input Signal) --- F
    Hz(1Hz Debug Pin) --- F
    D(Debug Pin) --- F
    O(optional external Oscillator) --- F
    subgraph F[FPGA]
        direction TB
        subgraph Left[" "]
            direction TB
            RE[Rising & Falling Edge Detector] --- CC
            CC[Clock Counter]
            CC --- DSM[Calculation & Measurement Data Storage Module]
            DSM --- U
            OO(Onboard Oscillator) --- CC
            RE --- DSM
        end
        subgraph U[UART]
            direction TB
            URX[UART RX]
            UTX[UART TX]
        end
        Left ~~~ U
    end
    F ---|UART| S(Smaller CPU - intermediate storage)
    S ---|USB/ ETH| L(Larger CPU - visualization)
    style I fill:none,stroke:#228B22
    style Hz fill:none,stroke:#000000
    style D fill:none,stroke:#000000
    style Left fill:none,stroke:none
```
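A rough software model of the data path above may help: the edge detector arms the clock counter on the rising edge (Event A) and stops it on the falling edge (Event B), and the count is converted to a runtime using the oscillator period. This is an illustrative Python sketch, not the actual Verilog; the sample stream and the 12 MHz clock are assumptions.

```python
CLOCK_HZ = 12_000_000  # assumed onboard oscillator, 12 MHz

def count_cycles(samples):
    """Count clock cycles between the rising edge (Event A) and the
    falling edge (Event B) of the sampled input signal."""
    counting = False
    cycles = 0
    prev = 0
    for s in samples:
        if prev == 0 and s == 1:    # rising edge: start counting
            counting = True
        elif prev == 1 and s == 0:  # falling edge: stop counting
            counting = False
        if counting:
            cycles += 1
        prev = s
    return cycles

# Signal is high for 6 clock samples -> 6 cycles at 12 MHz = 500 ns.
samples = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
cycles = count_cycles(samples)
runtime_s = cycles / CLOCK_HZ
print(cycles, runtime_s)  # 6 5e-07
```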
Toolchain for Simulation and Verification of Verilog Design
VVP
vvp is the runtime engine that executes the default compiled form generated by Icarus Verilog.
GTKWave
Program to view waveforms, used after simulation.
Configuration file: .gtkw
```mermaid
graph LR
    d(Design Implementation in Verilog) -->|.verilog / .v| IV(iverilog compiler)
    IV --->|.vvp| VVP(vvp)
    VVP --->|.vcd| G(GTKWave)
    G ---> d
```
Toolchain for Flashing and Verification on Real Hardware
```mermaid
graph LR
    .verilog --> YS(Yosys Script)
    YS -->|.ys| Y(Yosys)
    .pcf --> N(Nextpnr)
    Y -->|.json, .blif, synth.v| N(Nextpnr)
    N -->|.asc| I(icepack)
    I -->|.bin| BF2(bin2uf2 Conversion)
    BF2 -->|.uf2| FPGA(Flash to FPGA)
    style .verilog fill:none,stroke:none
    style .pcf fill:none,stroke:none
```
Development Roadmap & Future Enhancements
- Add / verify the option to integrate an external oscillator.
- Evaluate memory storage size on the FPGA side; develop an option to use shared RAM?
- Add an option to configure the UART data rate (currently fixed at 9600 bits/second).
- Add more available measurement points (pins).
- Stabilize TX and RX of the UART communication from and to the FPGA.
- Implement firmware for the Raspberry Pi (smaller CPU) to request and receive the data and publish it via ETH.
- Implement data visualization on the larger CPU.
- Define a stable protocol.
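To see why the fixed 9600 bits/second UART rate is worth revisiting, a back-of-the-envelope calculation helps. Assuming common 8N1 framing (1 start bit, 8 data bits, 1 stop bit, i.e. 10 bits per byte; the framing is an assumption, not confirmed above), each byte occupies the link for:

```python
BITS_PER_FRAME = 10  # assumed 8N1 framing: start + 8 data + stop

def byte_time_ms(baud):
    """Time one framed byte occupies the UART link, in milliseconds."""
    return BITS_PER_FRAME / baud * 1e3

print(byte_time_ms(9600))    # current fixed rate: ~1.04 ms per byte
print(byte_time_ms(115200))  # a common faster rate: ~0.087 ms per byte
```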
System Resolution
Resolution and accuracy depend on the clock speed of the system. What resolution do we get if we use an oscillator with frequency f? The resolution equals the clock period T = 1/f:
| Frequency (f) | Period (T) |
|---|---|
| 12 MHz | 83.33 ns |
| 50 MHz | 20 ns |
| 100 MHz | 10 ns |
| 500 MHz | 2 ns |
| 1 GHz | 1 ns |
This means with a clocking oscillator of 100 MHz we can measure time events with a resolution of 10 ns.
- Resolution: the smallest time increment you can distinguish (determined by clock period)
- Accuracy: how close your measurement is to the true value (affected by clock stability, jitter, calibration, etc.)
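The table above follows directly from T = 1/f. A quick sketch to reproduce it:

```python
def resolution_ns(freq_hz):
    """Smallest distinguishable time increment: the clock period T = 1/f."""
    return 1e9 / freq_hz

for f in (12e6, 50e6, 100e6, 500e6, 1e9):
    print(f"{f / 1e6:>6.0f} MHz -> {resolution_ns(f):.2f} ns")
```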