Working with LLVM on Apple Silicon
Setup
On my MacBook Air M1 with macOS 26.3.1, clang --version returns the following:
Apple clang version 21.0.0 (clang-2100.0.123.102)
Target: arm64-apple-darwin25.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
To investigate compiler optimizations we need a different compiler version, which can be installed with Homebrew:
brew search llvm
brew install llvm@18
brew info llvm@18
I want to investigate whether compiler optimizations introduce a branch on secret data in the following function:
void ct_select(uint8_t *r, const uint8_t *a, const uint8_t *b, size_t len, int8_t selector)
{ // Select one of the two input arrays to be moved to r
// If (selector == 0) then load r with a, else if (selector == -1) load r with b
OQS_MEM_BLACK_BOX(selector);
for (size_t i = 0; i < len; i++) {
r[i] = (~selector & a[i]) | (selector & b[i]);
}
}
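To make the intended behavior concrete before looking at the assembly, here is a minimal standalone sketch of the same selection logic. It assumes a plain C file outside liboqs, so OQS_MEM_BLACK_BOX is left out, and the names ct_select_sketch and check_ct_select_sketch are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Same loop body as ct_select above, minus OQS_MEM_BLACK_BOX (a liboqs
// macro that is not available in a standalone file). The explicit cast
// only silences narrowing warnings; it does not change the result.
static void ct_select_sketch(uint8_t *r, const uint8_t *a, const uint8_t *b,
                             size_t len, int8_t selector)
{
    for (size_t i = 0; i < len; i++) {
        r[i] = (uint8_t)((~selector & a[i]) | (selector & b[i]));
    }
}

// Hypothetical self-check: selector 0 copies a, selector -1 copies b.
static void check_ct_select_sketch(void)
{
    const uint8_t a[4] = {0x11, 0x22, 0x33, 0x44};
    const uint8_t b[4] = {0xaa, 0xbb, 0xcc, 0xdd};
    uint8_t r[4];

    ct_select_sketch(r, a, b, sizeof r, 0);
    assert(memcmp(r, a, sizeof r) == 0);

    ct_select_sketch(r, a, b, sizeof r, -1);
    assert(memcmp(r, b, sizeof r) == 0);
}
```

Calling check_ct_select_sketch() from a main function exercises both selector values. The point of the mask form is that the same instructions run whichever input is chosen.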
Build the project:
# from $LIBOQS_DIR/build
cmake -GNinja \
-DCMAKE_C_COMPILER="/opt/homebrew/opt/llvm@18/bin/clang-18" \
-DOQS_MINIMAL_BUILD="KEM_frodokem_640_aes" \
-DCMAKE_C_FLAGS="-DOQS_DISABLE_MEM_BLACK_BOX" \
-DCMAKE_BUILD_TYPE="MinSizeRel" \
..
ninja
Now we have $LIBOQS_DIR/build/src/kem/frodokem/CMakeFiles/frodokem.dir/external/frodo/frodo640aes.c.o. We can use llvm-objdump to disassemble it into human-readable assembly.
/opt/homebrew/opt/llvm@18/bin/llvm-objdump -d \
src/kem/frodokem/CMakeFiles/frodokem.dir/external/frodo/frodo640aes.c.o \
> frodo640aes.c.o.S
Open frodo640aes.c.o.S in a text editor and search for ct_select. The relevant section is:
0000000000000eb8 <_oqs_kem_frodokem_640_aes_ct_select>:
eb8: b4000143 cbz x3, 0xee0 <_oqs_kem_frodokem_640_aes_ct_select+0x28>
ebc: 2a2403e8 mvn w8, w4
ec0: 38401429 ldrb w9, [x1], #0x1
ec4: 3840144a ldrb w10, [x2], #0x1
ec8: 0a080129 and w9, w9, w8
ecc: 0a04014a and w10, w10, w4
ed0: 2a090149 orr w9, w10, w9
ed4: 38001409 strb w9, [x0], #0x1
ed8: f1000463 subs x3, x3, #0x1
edc: 54ffff21 b.ne 0xec0 <_oqs_kem_frodokem_640_aes_ct_select+0x8>
ee0: d65f03c0 ret
Let’s read this line by line.
Analysis
eb8: CBZ x3, 0xee0
eb8: b4000143 cbz x3, 0xee0 <_oqs_kem_frodokem_640_aes_ct_select+0x28>
cbz <rn>, <label> is “compare and branch on zero”. rn is the register holding the operand and label is the branch destination: if rn is zero, execution jumps to label; otherwise it continues with the next instruction.
x3 is a general-purpose register. According to the Arm Procedure Call Standard for 64-bit Arm (AAPCS64), there are 31 general-purpose registers, each 64 bits wide. They are named x0 through x30 when accessed as 64-bit registers, and w0 through w30 when only the lower 32 bits are accessed. The first eight (x0 through x7) are used to pass parameters and return results in procedure calls.
Recall the function signature of ct_select (here with the pointer parameters r, a, and b renamed to dst, lhs, and rhs):
void ct_select(uint8_t *dst, const uint8_t *lhs, const uint8_t *rhs, size_t len,
int8_t selector);
x3 corresponds to size_t len, the fourth argument, so this instruction says
“if len is 0, skip to label 0xee0”, which is the ret instruction. This makes
sense: if len == 0 there is nothing to do, so the function returns immediately.
ebc: MVN w8, w4
MVN <rd>, <op2> is called “Move Not”. It takes the value of <op2>,
performs a bitwise NOT, and stores the result in <rd>. rd has to be a
register, while op2 is a register that may optionally be shifted.
From the function signature we know that w4 holds the value for int8_t selector
(since selector is only 8-bit in length, using 32-bit context is reasonable).
FrodoKEM’s implementation guarantees selector to be either 0xFF or 0x00:
OQS_STATUS crypto_kem_dec(unsigned char *ss, const unsigned char *ct,
const unsigned char *sk) { /* ... */
// Needs to avoid branching on secret data using constant-time implementation.
int8_t selector = ct_verify(Bp, BBp, PARAMS_N*PARAMS_NBAR)
| ct_verify(C, CC, PARAMS_NBAR*PARAMS_NBAR);
// If (selector == 0) then load k' to do ss = F(ct || k'), else if
// (selector == -1) load s to do ss = F(ct || s)
ct_select((uint8_t*)Fin_k, (uint8_t*)kprime, (uint8_t*)sk_s, CRYPTO_BYTES,
selector);
/* ... */
}
So w8 holds the bitwise complement of selector and serves as the complementary mask, the one applied to the LHS bytes.
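The following sketch models the mask arithmetic on 32-bit values, the way the W registers see it. It is an illustration under stated assumptions rather than anything taken from liboqs: select_byte_w and check_select_byte_w are hypothetical names, and how the upper 24 bits of w4 are filled by the caller is treated as unknown. What it shows is that the stored byte depends only on the low 8 bits of the selector register, because the loaded bytes are zero-extended and only the low byte is stored.

```c
#include <assert.h>
#include <stdint.h>

// Models one loop iteration on 32-bit W registers.
// w4 is the selector register; a and b are the bytes loaded from the two
// inputs, already zero-extended to 32 bits.
static uint8_t select_byte_w(uint32_t w4, uint8_t a, uint8_t b)
{
    uint32_t w8  = ~w4;      // mvn w8, w4
    uint32_t w9  = a & w8;   // and w9, w9, w8
    uint32_t w10 = b & w4;   // and w10, w10, w4
    w9 = w10 | w9;           // orr w9, w10, w9
    return (uint8_t)w9;      // only the low 8 bits are stored
}

// Hypothetical self-check for the two selector values used by FrodoKEM.
static void check_select_byte_w(void)
{
    // selector == 0: low byte 0x00, so the first input is kept.
    assert(select_byte_w(0x00000000u, 0x5a, 0xc3) == 0x5a);
    // selector == -1: low byte 0xff, so the second input is kept,
    // whether the upper 24 bits are all ones or all zeros.
    assert(select_byte_w(0xffffffffu, 0x5a, 0xc3) == 0xc3);
    assert(select_byte_w(0x000000ffu, 0x5a, 0xc3) == 0xc3);
}
```

This is why working in a 32-bit context is harmless here: bits 8 through 31 of the intermediate values never reach memory.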
Loop body
LDR <rt>, [<rn>], #offset stands for “Load Register”.
LDR loads a 32-bit word when rt is a w register and a 64-bit doubleword when
rt is an x register. LDRB is a variant where B stands for unsigned byte: it
loads a single byte and zero-extends it, so the remaining bits of rt are set
to 0. There is also LDRSB, where SB stands for signed byte, in which case the
remaining bits are filled with copies of the sign bit.
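The difference between the two byte loads can be sketched in C as follows. The function names are hypothetical and the sketch only models the extension step into a 32-bit destination; it is not a claim about how any compiler lowers these expressions.

```c
#include <assert.h>
#include <stdint.h>

// Models ldrb: load one byte and zero-extend it to 32 bits.
static uint32_t load_byte_zero_extend(const uint8_t *p)
{
    return (uint32_t)*p;
}

// Models ldrsb with a W destination: load one byte and sign-extend it.
// The xor/subtract trick copies bit 7 into bits 8..31 using only
// well-defined unsigned arithmetic.
static uint32_t load_byte_sign_extend(const uint8_t *p)
{
    uint32_t v = *p;
    return (v ^ 0x80u) - 0x80u;
}

// Hypothetical self-check: the two loads differ only when bit 7 is set.
static void check_byte_extension(void)
{
    const uint8_t high = 0x80;
    const uint8_t low  = 0x7f;
    assert(load_byte_zero_extend(&high) == 0x00000080u);
    assert(load_byte_sign_extend(&high) == 0xffffff80u);
    assert(load_byte_zero_extend(&low) == 0x0000007fu);
    assert(load_byte_sign_extend(&low) == 0x0000007fu);
}
```

Since ct_select works on uint8_t arrays, the zero-extending form is the one that matches the C types.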
There are many possible syntaxes for LDR. The one used here means “load the
word (or unsigned byte) from the memory address stored in register rn into
register rt, then increment rn by the literal value of offset”. This
is called “post-indexing”. The counterpart is “pre-indexing”, with a syntax of
[<rn>, #offset]!, which means the register is updated first and the value is
then loaded from the updated address.
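In C terms, the two addressing modes differ only in the order of the pointer update and the memory access. The sketch below uses hypothetical helper names and models the base register rn as a pointer passed by address so that the update is visible to the caller.

```c
#include <assert.h>
#include <stdint.h>

// Post-indexed, e.g. ldrb w9, [x1], #1:
// read at the current address, then advance the base register.
static uint8_t load_post_index(const uint8_t **rn, int offset)
{
    uint8_t value = **rn;
    *rn += offset;
    return value;
}

// Pre-indexed, e.g. ldrb w9, [x1, #1]!:
// advance the base register first, then read at the new address.
static uint8_t load_pre_index(const uint8_t **rn, int offset)
{
    *rn += offset;
    return **rn;
}

// Hypothetical self-check on a three-byte buffer.
static void check_indexing_modes(void)
{
    const uint8_t buf[3] = {10, 20, 30};

    const uint8_t *p = buf;
    assert(load_post_index(&p, 1) == 10);  // old address is read
    assert(p == buf + 1);

    p = buf;
    assert(load_pre_index(&p, 1) == 20);   // new address is read
    assert(p == buf + 1);
}
```

Both modes leave the base register at the same final address; only the loaded element differs. Post-indexing is a natural fit for walking an array from its first element, as the loop does.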
In ct_select, x1 and x2 hold the pointers to the LHS and RHS respectively. If
selector is 0x00, the LHS is selected; otherwise the RHS is selected.
We can now look at the body of the loop:
ec0: 38401429 ldrb w9, [x1], #0x1
ec4: 3840144a ldrb w10, [x2], #0x1
ec8: 0a080129 and w9, w9, w8
ecc: 0a04014a and w10, w10, w4
ed0: 2a090149 orr w9, w10, w9
ed4: 38001409 strb w9, [x0], #0x1
ed8: f1000463 subs x3, x3, #0x1
edc: 54ffff21 b.ne 0xec0 <_oqs_kem_frodokem_640_aes_ct_select+0x8>
The assembly maps almost one-to-one onto the following C code:
// w9, w10 registers are used as local variables
uint8_t w9, w10;
do {
// LDRB w9, [x1], #0x1
w9 = *lhs;
lhs += 1;
// LDRB w10, [x2], #0x1
w10 = *rhs;
rhs += 1;
// AND w9, w9, w8
w9 = w9 & (~selector);
// AND w10, w10, w4
w10 = w10 & selector;
// ORR w9, w10, w9
w9 = w10 | w9;
// STRB w9, [x0], #0x1
*dst = w9;
dst += 1;
// SUBS x3, x3, #0x1
len -= 1;
} while (
// B.NE 0xec0
len != 0
);
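Note that because the CBZ instruction at eb8 already ensures that len is
greater than 0, it is safe to execute the loop body once before checking len.
SUBS decrements len and sets the condition flags, and B.NE jumps back to ec0
as long as the result is not zero. Once len reaches 0, execution falls through
to ret, which returns to the caller. The function has no return value.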
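The relationship between the source-level for loop and the guarded do/while shape of the generated code can be checked with a small sketch. The function names are hypothetical, the parameter names follow the dst/lhs/rhs signature used in this analysis, and the second function is a hand-written model of the assembly rather than compiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Source-level shape: the condition is tested before every iteration.
static void select_for_loop(uint8_t *dst, const uint8_t *lhs,
                            const uint8_t *rhs, size_t len, int8_t selector)
{
    for (size_t i = 0; i < len; i++) {
        dst[i] = (uint8_t)((~selector & lhs[i]) | (selector & rhs[i]));
    }
}

// Shape of the generated code: one guard up front (cbz), then a loop
// whose condition is tested at the bottom (subs + b.ne).
static void select_rotated(uint8_t *dst, const uint8_t *lhs,
                           const uint8_t *rhs, size_t len, int8_t selector)
{
    if (len == 0) {
        return;                       // cbz x3, <ret>
    }
    do {
        *dst++ = (uint8_t)((*lhs++ & ~selector) | (*rhs++ & selector));
        len -= 1;                     // subs x3, x3, #0x1
    } while (len != 0);               // b.ne <loop>
}

// Hypothetical self-check: both shapes agree for len 0..4 and both
// selector values, including leaving dst untouched when len is 0.
static void check_loop_shapes(void)
{
    const uint8_t lhs[4] = {1, 2, 3, 4};
    const uint8_t rhs[4] = {5, 6, 7, 8};
    const int8_t selectors[2] = {0, -1};

    for (size_t len = 0; len <= 4; len++) {
        for (size_t s = 0; s < 2; s++) {
            uint8_t out_for[4], out_rot[4];
            memset(out_for, 0xee, sizeof out_for);
            memset(out_rot, 0xee, sizeof out_rot);
            select_for_loop(out_for, lhs, rhs, len, selectors[s]);
            select_rotated(out_rot, lhs, rhs, len, selectors[s]);
            assert(memcmp(out_for, out_rot, sizeof out_for) == 0);
        }
    }
}
```

In both shapes the loop condition involves only len, never selector or the data being copied.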
Conclusion
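The generated assembly is a very literal translation of the C code. The only
two branches, cbz at eb8 and b.ne at edc, depend on len alone, so the code does
not appear to branch on secret data (in this context, specifically selector),
at least for clang 18 with the MinSizeRel build type used here.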