38th GI/ITG International Conference on Architecture of Computing Systems, Apr 2025
Kiel, Germany
Laurent Cabaret, Céline Hudelot, Régis Pierrard, Jean-Philippe Poli

Fuzzy spatial relations are increasingly used in visual reasoning tasks such as semantic annotation and object recognition. However, these tasks often rely on compute-intensive fuzzy morphological operators, which introduce significant latency when relations are evaluated. Reducing this latency requires implementations optimized for modern architectures. Previous work introduced the Reverse (R) and Parallel Reverse (PR) algorithms for Intel processors, using OpenMP and SIMD intrinsics. In this work, we extend these contributions to embedded systems by targeting ARM-based processors and NVIDIA embedded GPUs. Specifically, we propose three architecture-specific implementations: PR64N, which uses 64-bit NEON SIMD instructions; PR128N, which uses 128-bit NEON instructions; and PRGPU, a GPU-optimized version based on CUDA. We evaluate them on the NVIDIA Jetson Orin Nano Super platform, an advanced ARM-based system-on-chip designed for low-power edge AI applications. The results show that our CPU implementations achieve near-peak performance by fully exploiting the platform's memory bandwidth. The GPU implementation, in turn, efficiently offloads fuzzy spatial relation evaluation, leaving the CPU free to handle other workloads. These findings underline the suitability of our methods for visual reasoning on resource-constrained embedded systems and contribute to the broader discussion on targeting heterogeneous architectures with tailored algorithms.