c - How to update an array in vectorized assembly (AVX)?


inline void addition(double * x, const double * vx, uint32_t size){
    /*for (uint32_t i=0;i<size;++i){
        x[i] = x[i] + vx[i];
    }*/
    __asm__ __volatile__ (
    "1: \n\t"
    "vmovupd    -32(%0), %%ymm1\n\t"
    "vmovupd    (%0), %%ymm0\n\t"
    "vaddpd     -32(%1), %%ymm0, %%ymm0\n\t"
    "vaddpd     (%1), %%ymm1, %%ymm1\n\t"
    "vmovupd    %%ymm0, -32(%0)\n\t"
    "vmovupd    %%ymm1, (%0)\n\t"
    "addq   $128, %0\n\t"
    "addq   $128, %1\n\t"
    "addl   $-8, %2\n\t"
    "jne    1b"
        :
        : "r" (x),"r"(vx),"r"(size)
        : "ymm0", "ymm1"
    );
}

I am practicing assembly (AVX instructions) right now, and I wrote the above piece of code in inline assembly to replace the C code in the original function (which is commented out). Compiling succeeds, but when I try to run the program, an error happens: Bus error: 10. Any thoughts on this bug? I don't know what's wrong here. The compiler version is clang 602.0.53. Thank you!

Inline assembly is a complicated beast. If you want to practice AVX assembly, I suggest you use a separate asm file so you don't have to put up with the compiler. In exchange, you will need to observe the calling convention, though.

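You have issues with the constraints. For example, you change the input registers without telling the compiler, and that can cause all sorts of weird problems elsewhere in the compiler-generated code. You also need to specify a memory clobber, because the asm writes to the array through a pointer and the compiler cannot see that otherwise.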
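To make the constraint point concrete, here is a minimal sketch of a read-write operand. It is not part of the original answer: the function name add_eight is hypothetical, and the asm path is only compiled on x86-64 with GCC or Clang, with a plain C fallback elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: adds 8 to a 32-bit counter.
 * "+r" marks the operand as both read and written, so the compiler
 * knows the register no longer holds the old value after the asm. */
static inline uint32_t add_eight(uint32_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ ("addl $8, %0"
             : "+r" (n)      /* output list: read-write register operand */
             :               /* no input-only operands */
             : "cc");        /* the add instruction changes the flags */
#else
    n += 8;                  /* portable fallback for other targets */
#endif
    return n;
}
```

Had n been listed as a plain "r" input instead, the compiler would be free to assume the register still holds the original value after the asm statement.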

Also, learn to use a debugger so you can find the exact cause of problems and fix your own code.

Failing that, at least comment your code so we can figure out your intentions. In this case, I am particularly puzzled why you use a -32 offset, which addresses memory before the array. I think you wanted +32 there. Since you are using 2 AVX registers at 32 bytes each, you of course need to advance the pointers by 64, not 128. You also have ymm0 and ymm1 swapped in the initial load.
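The stride arithmetic can be checked in plain C. The sketch below is a scalar model of the loop with the offsets corrected; the name addition_model and the enum constants are made up for illustration. Each pass covers two 32-byte registers, that is 8 doubles, so both pointers advance by 64 bytes and the counter drops by 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    DOUBLES_PER_YMM  = 32 / sizeof(double),               /* 4 doubles per 256-bit register */
    DOUBLES_PER_PASS = 2 * DOUBLES_PER_YMM,               /* two registers per pass: 8 doubles */
    BYTES_PER_PASS   = DOUBLES_PER_PASS * sizeof(double)  /* 64 bytes */
};

/* Scalar model of the loop (hypothetical name). size must be a nonzero
 * multiple of 8, because the counter only stops at exactly zero. */
static void addition_model(double *x, const double *vx, uint32_t size)
{
    assert(size != 0 && size % DOUBLES_PER_PASS == 0);
    do {
        for (size_t i = 0; i < DOUBLES_PER_PASS; ++i)
            x[i] += vx[i];          /* byte offsets 0..63 from each pointer */
        x  += DOUBLES_PER_PASS;     /* same as addq $64 on a byte address */
        vx += DOUBLES_PER_PASS;
        size -= DOUBLES_PER_PASS;   /* same as addl $-8 */
    } while (size != 0);            /* same as jne 1b */
}
```

Advancing by 128 bytes instead would skip every second group of 8 doubles and walk past the end of the array twice as fast as the counter expects.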

This code seems to work fine for me:

#include <stdio.h>
#include <stdint.h>

/* static: a plain C99 inline definition emits no out-of-line copy,
   so the call could fail to link when it is not inlined */
static inline void addition(double * x, const double * vx, uint32_t size){
    /*for (uint32_t i=0;i<size;++i){
        x[i] = x[i] + vx[i];
    }*/
    /* size must be a nonzero multiple of 8 */
    __asm__ __volatile__ (
    "1: \n\t"
    "vmovupd    32(%0), %%ymm0\n\t"
    "vmovupd    (%0), %%ymm1\n\t"
    "vaddpd     32(%1), %%ymm0, %%ymm0\n\t"
    "vaddpd     (%1), %%ymm1, %%ymm1\n\t"
    "vmovupd    %%ymm0, 32(%0)\n\t"
    "vmovupd    %%ymm1, (%0)\n\t"
    "addq   $64, %0\n\t"
    "addq   $64, %1\n\t"
    "addl   $-8, %2\n\t"
    "jne    1b"
        : "+r" (x),"+r"(vx),"+r"(size)   /* read-write: the asm modifies all three */
        :
        : "ymm0", "ymm1", "memory"
    );
}

int main()
{
    double x[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    double vx[] = { 9, 10, 11, 12, 13, 14, 15, 16 };
    int i;
    addition(x, vx, 8);
    for(i = 0; i < 8; i++) printf("%g ", x[i]);
    putchar('\n');
    return 0;
}
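One caveat with the loop above: it only terminates correctly when size is a nonzero multiple of 8, since the counter is decremented by 8 and tested against zero. A possible way to accept any size is a wrapper that hands the bulk to the vector loop and finishes the remainder in scalar code. This is only a sketch under assumptions: both function names are hypothetical, and add_blocks_of_8 is a plain C stand-in for the asm routine so the example stays portable.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the 8-doubles-per-pass kernel (hypothetical name).
 * Precondition: size is a nonzero multiple of 8. */
static void add_blocks_of_8(double *x, const double *vx, uint32_t size)
{
    for (uint32_t i = 0; i < size; ++i)
        x[i] += vx[i];
}

/* Hypothetical wrapper: accepts any size, including 0. */
static void addition_any_size(double *x, const double *vx, uint32_t size)
{
    uint32_t bulk = size & ~(uint32_t)7;    /* largest multiple of 8 <= size */

    if (bulk != 0)
        add_blocks_of_8(x, vx, bulk);       /* never called with 0 */

    for (uint32_t i = bulk; i < size; ++i)  /* 0 to 7 leftover elements */
        x[i] += vx[i];
}
```

With size 11, for instance, the kernel handles elements 0 to 7 and the scalar loop handles elements 8 to 10.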
