    reduce memory access in linear solve loop · ace8c5c8
    Ali Shariat authored
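    By defining this temp variable we use the fact that `i` is never
    equal to `Li[j]`. The compiler does not have this information, so it
    must assume the store through `Li[j]` can change the value read at
    index `i` and reload that value on every iteration.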
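    
    A minimal sketch of the two loop shapes, for illustration only. The
    function names, the array names `x` and `Lx`, and the `begin`/`end`
    bounds are assumptions; only `i`, `j` and `Li` appear in this message,
    and the real loop may differ in its details.
    
    ```c
    /* Before (assumed shape): x[i] is read inside the loop. The compiler
       cannot rule out Li[j] == i, so it reloads x[i] from memory after
       every store to x[Li[j]]. */
    static void lsolve_column_reload(float *x, const int *Li, const float *Lx,
                                     int begin, int end, int i)
    {
        for (int j = begin; j < end; j++) {
            x[Li[j]] -= Lx[j] * x[i];
        }
    }
    
    /* After (assumed shape): x[i] is copied into a local once, so it can
       stay in a register for the whole loop. This matches the version
       above only when Li[j] != i for every j in [begin, end). */
    static void lsolve_column_temp(float *x, const int *Li, const float *Lx,
                                   int begin, int end, int i)
    {
        const float val = x[i];
        for (int j = begin; j < end; j++) {
            x[Li[j]] -= Lx[j] * val;
        }
    }
    ```
    
    Both functions give the same result as long as no `Li[j]` equals `i`;
    if that could happen, the second form would use a stale value.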
    
    The generated assembly for the loop changes from
    ```
            movsx   rdi, DWORD PTR [rdx+rax*4]
            vmovss  xmm0, DWORD PTR [rcx+rax*4]
            add     rax, 1
            lea     rdi, [r8+rdi*4]
            vmovss  xmm1, DWORD PTR [rdi]
            vfnmadd132ss    xmm0, xmm1, DWORD PTR [r9]
            vmovss  DWORD PTR [rdi], xmm0
    ```
    
    to
    
    ```
            movsx   rdi, DWORD PTR [rdx+rax*4]
            vmovss  xmm0, DWORD PTR [rcx+rax*4]
            add     rax, 1
            lea     rdi, [r8+rdi*4]
            vfnmadd213ss    xmm0, xmm1, DWORD PTR [rdi]
            vmovss  DWORD PTR [rdi], xmm0
    ```
    
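    Notice that the compiler drops the separate `vmovss xmm1, DWORD PTR [rdi]`
    load: the hoisted value stays in `xmm1`, the read from `[rdi]` is folded
    into the `vfnmadd213ss` memory operand, and the per-iteration read from
    `[r9]` is gone.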