Parallel Programming & Processing on RISC-V

In these labs, we study a series of examples in parallel programming and processing. For each example, we measure execution time and analyze the resulting speedup across different algorithms and execution modes.
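As a rough illustration of how execution time and speedup might be measured, here is a minimal C sketch. It assumes a POSIX system that provides `clock_gettime`. The names `now_seconds`, `time_kernel`, `speedup`, and `count_kernel` are hypothetical helpers introduced for this example and are not taken from the lab code. Reporting the best of several repetitions is one common convention, not necessarily the one used in the labs.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock time in seconds, read from a monotonic POSIX clock. */
double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical kernel signature: any workload taking an opaque context. */
typedef void (*kernel_fn)(void *ctx);

/* Run the kernel `reps` times and return the shortest run in seconds. */
double time_kernel(kernel_fn kernel, void *ctx, int reps) {
    assert(kernel != NULL && reps > 0);
    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        double t0 = now_seconds();
        kernel(ctx);
        double dt = now_seconds() - t0;
        if (best < 0.0 || dt < best) {
            best = dt;
        }
    }
    return best;
}

/* Speedup = baseline time divided by optimized time (> 1 means faster). */
double speedup(double t_baseline, double t_optimized) {
    assert(t_baseline > 0.0 && t_optimized > 0.0);
    return t_baseline / t_optimized;
}

/* Trivial demonstration kernel: increments an int counter passed via ctx. */
void count_kernel(void *ctx) {
    (*(int *)ctx)++;
}
```

With two measured times `t_serial` and `t_vector`, the reported figure would be `speedup(t_serial, t_vector)`.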

The introductory part outlines the content, objectives, methodology, and overall organization of the labs. Each lab session consists of a sequence of practical examples.

Lab 1 introduces the essential elements of assembly programming on RISC-V.

Labs 2 and 3 focus on vector processing (SIMD). We begin with simple kernels—vector addition, dot product, and matrix multiplication—then move to more demanding tasks such as π approximation and a basic FFT filter. Serial baselines are written in C, while vectorized versions are implemented in RISC-V assembly using the vector extension.
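To make the notion of a serial baseline concrete, the following C sketch shows what scalar versions of vector addition and the dot product might look like. The function names `vec_add` and `dot`, and the choice of `float` elements, are assumptions made for illustration; the actual lab code may differ. A vectorized RVV version would typically strip-mine the same loops, using `vsetvli` to choose how many elements each iteration processes.

```c
#include <assert.h>
#include <stddef.h>

/* Serial baseline: c[i] = a[i] + b[i] for i in [0, n). */
void vec_add(const float *a, const float *b, float *c, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Serial baseline: returns the sum over i of a[i] * b[i]. */
float dot(const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Both loops have independent iterations (apart from the reduction in `dot`), which is what makes them natural first candidates for vectorization.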

In Labs 4 and 5, we turn to image processing, starting with simple image negation and advancing to color-space conversion. Each task is implemented both serially and with vectorization to quantify potential speedups. Image I/O (decode/encode) is handled with OpenCV. We also explore dynamic image generation using OpenGL, rendering directly into video memory. As with static images, we record runtimes and evaluate speedups.
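The two image tasks can be sketched in serial C on raw 8-bit pixel buffers, leaving OpenCV out entirely. The function names `negate_u8` and `rgb_to_gray` are hypothetical. The text does not say which color-space conversion the lab uses, so RGB to grayscale with integer BT.601-style weights is an assumption chosen here for simplicity. The buffer is assumed to be interleaved in R, G, B order; images loaded with OpenCV's `imread` are stored as B, G, R by default, so the channel indices would need to be swapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Image negation: every 8-bit sample s becomes 255 - s. */
void negate_u8(const uint8_t *in, uint8_t *out, size_t n_samples) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n_samples; i++) {
        out[i] = (uint8_t)(255 - in[i]);
    }
}

/* Assumed conversion: interleaved RGB -> 8-bit gray.
 * The weights 77, 150, 29 sum to 256 and approximate
 * 0.299, 0.587, 0.114; the shift by 8 divides by 256. */
void rgb_to_gray(const uint8_t *rgb, uint8_t *gray, size_t n_pixels) {
    assert(rgb != NULL && gray != NULL);
    for (size_t i = 0; i < n_pixels; i++) {
        unsigned r = rgb[3 * i + 0];
        unsigned g = rgb[3 * i + 1];
        unsigned b = rgb[3 * i + 2];
        gray[i] = (uint8_t)((77u * r + 150u * g + 29u * b) >> 8);
    }
}
```

Both loops touch each pixel independently, so the same arithmetic can be applied to many pixels per instruction in a vectorized version.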

Lab 6 introduces OpenMP to leverage multicore (MIMD) parallelism with threads. We revisit several workloads—π computation, matrix multiplication, and Mandelbrot rendering—and compare performance across execution modes.
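A minimal sketch of the π workload under OpenMP follows. The text does not state which approximation method the lab uses; this example assumes midpoint-rule integration of 4/(1+x²) over [0, 1], a common choice for this exercise. The function name `pi_midpoint` is hypothetical. Compiled with OpenMP enabled (for example `-fopenmp` with GCC), the loop is divided among threads; without it, the pragma is ignored and the code runs serially, which gives a convenient baseline from the same source.

```c
#include <assert.h>

/* Approximate pi as the integral of 4/(1+x^2) on [0,1]
 * using the midpoint rule with n subintervals. */
double pi_midpoint(long n) {
    assert(n > 0);
    const double h = 1.0 / (double)n;
    double sum = 0.0;

    /* Each thread accumulates a private partial sum;
     * the reduction clause combines them at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++) {
        double x = ((double)i + 0.5) * h;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * h;
}
```

The `reduction(+:sum)` clause is what avoids a data race on `sum`; omitting it would make the threaded result incorrect.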

In Lab 7, we explicitly contrast scalar (SISD), vector (SIMD), and multicore (MIMD) implementations using the π example. We then demonstrate combined MIMD×SIMD approaches for π calculation and matrix multiplication. In all cases, we report speedups as a function of problem size to highlight scaling behavior and the practical benefits of each technique.
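One way to picture the combined MIMD×SIMD approach is the matrix multiplication sketch below. Note the substitution: the labs implement SIMD in RVV assembly, whereas this example uses the compiler-directed `#pragma omp simd` hint instead, so that the whole idea fits in portable C. The function name `matmul_omp_simd`, the row-major square `float` layout, and the i-k-j loop order are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* C = A * B for n-by-n row-major float matrices.
 * Outer loop: rows of C are distributed across threads (MIMD).
 * Inner loop: unit-stride update of one row, marked for SIMD. */
void matmul_omp_simd(const float *A, const float *B, float *C, int n) {
    assert(A != NULL && B != NULL && C != NULL && n > 0);

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        float *c_row = &C[(size_t)i * (size_t)n];
        for (int j = 0; j < n; j++) {
            c_row[j] = 0.0f;
        }
        for (int k = 0; k < n; k++) {
            const float a_ik = A[(size_t)i * (size_t)n + (size_t)k];
            const float *b_row = &B[(size_t)k * (size_t)n];
            /* Contiguous accesses to c_row and b_row suit vector loads. */
            #pragma omp simd
            for (int j = 0; j < n; j++) {
                c_row[j] += a_ik * b_row[j];
            }
        }
    }
}
```

Each thread writes only its own rows of `C`, so no synchronization is needed between threads. Timing this function for increasing `n` would give the kind of speedup-versus-problem-size data the lab reports.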

Finally, in Lab 8 we focus on two AI applications that combine vector processing and multicore execution to accelerate inference on the provided models.

Labs (separate pages)

Lab 0 — Setup & Warm-Up

Prepare your environment: install the toolchain, verify your simulator/hardware setup, and run a first “hello” program to confirm everything works.

Setup Toolchain First Run

Lab 1 — RISC-V Assembly Basics

Essential elements of assembly programming on RISC-V.

RISC-V ASM Toolchain Basics

Lab 2 — SIMD Vector Kernels

Vector add, dot product, matmul; C baseline + RVV assembly.

SIMD RVV Kernels

Lab 3 — SIMD: π + FFT Filter

Demanding vector workloads: π approximation and FFT filter.

SIMD π FFT

Lab 4 — Image Processing: Negation

Serial vs vectorized image negation; measure speedups.

Images OpenCV Vectorization

Lab 5 — Color Space + OpenGL

Color conversion + dynamic rendering into video memory.

OpenCV OpenGL Speedup

Lab 6 — OpenMP (MIMD)

Threads on multicore: π, matmul, Mandelbrot; compare modes.

OpenMP Threads Multicore

Lab 7 — SISD vs SIMD vs MIMD + MIMD×SIMD

Contrast modes on π; then combine MIMD×SIMD for π & matmul.

SISD SIMD MIMD×SIMD

Lab 8 — AI Acceleration (SIMD + Multicore)

Two AI applications using vector + multicore to speed inference.

AI Inference SIMD + MIMD