

# Microarchitectural Minefields: 4K-Aliasing Covert Channel and Multi-Tenant Detection in IaaS Public Clouds

**DEAN SULLIVAN**, ORLANDO ARIAS<sup>\*</sup>, TRAVIS MEADE<sup>\*</sup>, YIER JIN

UNIVERSITY OF FLORIDA, <sup>\*</sup>UNIVERSITY OF CENTRAL FLORIDA



### TL;DR

4K-Aliasing timing channel:

- Speculatively executed younger writes falsely aliasing with older loads
- Side effect of memory ordering in the memory order buffer
- Measurable across address spaces
  - Processes
  - Virtual machines

On public IaaS clouds:

- Fast and robust covert channel
- Practical multi-tenant detection



# Timing Channel Background



### **Typical Architecture**










#### Covert Channel in the Cloud










#### Covert Channel Related Works

A lot of great work has made these covert channels:

- Fast
- Robust
- Practical



### Limitations of Prior Covert Channels

Speed bounded by time to access shared resource

Susceptible to detection



# Can we do as well, or better, with a core-private resource?



#### Not this...













# Faster? → Send more!

#### Core private? → Avoid detection!

14 February 20, 2018 – NDSS'18



#### Partial Memory Hierarchy





#### **Complete Memory Hierarchy**









# Memory Ordering Buffer

Handles in-flight memory loads and stores that execute:

- Out-of-order
- Speculatively

Enforces memory ordering rules:

- Retire loads and stores with correct values
- For example:
  - Loads can be reordered with older stores to different locations

Implements methods for dynamically extracting instruction-level parallelism (ILP):

- Memory disambiguation prediction
- Store-to-load forwarding



## 4K-Aliasing?

Intel assumes a dependency between memory loads and stores separated by 4 KB

Avoids a potential write-after-read hazard

- When a later write passes an earlier read:

      mov rax, [rbx]  ; read
      mov [rbx], rcx  ; write

- The earlier read **must NOT** load the result written by the later store



# Loads and stores separated by 4 KB will falsely alias













































#### False 4K-Aliasing



Performance of a memory copy routine drops when the source and destination buffers are separated by a multiple of 4 KB (n \* 4 KB)
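The separation condition can be written as a simple address predicate. The sketch below is a minimal illustration, assuming that false aliasing depends only on the two addresses sharing the same low 12 bits; it ignores access width and any other microarchitectural detail. The function name `falsely_aliases_4k` and the example addresses are hypothetical.

```python
PAGE_OFFSET_BITS = 12                      # 4 KB = 2**12 bytes
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1  # 0xFFF


def falsely_aliases_4k(load_addr: int, store_addr: int) -> bool:
    """Return True when two distinct addresses share the same low 12 bits,
    i.e. they are separated by n * 4 KB for some nonzero integer n."""
    same_offset = (load_addr & OFFSET_MASK) == (store_addr & OFFSET_MASK)
    return same_offset and load_addr != store_addr


# Hypothetical source buffer address, used only for illustration.
src = 0x7F0000001040
print(falsely_aliases_4k(src, src + 3 * 4096))   # buffers 3 * 4 KB apart
print(falsely_aliases_4k(src, src + 4096 + 64))  # page offsets differ
```

A copy routine whose source and destination satisfy this predicate is the case the slide describes as slow.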





















# **4K-Aliasing Timing Channel**

**Step 1**: Fill MOB with 4K addresses

**Step 2**: Load from 4K-aligned address

**Step 3**: The 4K-aliasing load has high latency



#### Single Process



#### Across Processes








### 4K Latency Across Micro-Arch Families



#### **4K Latency Across Processes**



- Considerable background noise
- Similar cycle latency to single process



# Can we eliminate background noise?



# Improving 4K-Aliasing Latency



- Cycle latency correlates linearly with the number of aliasing loads
- We can improve the measured latency by adding more loads within the measurement window



#### Improving 4K-Aliasing Threshold






# 4K-Aliasing Modulation: Separated by 256 B



- Issue a 4K-aligned load every 16<sup>th</sup> time
- 4K signal?
- Better threshold
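The "every 16th time" figure follows from the arithmetic 4096 / 256 = 16. The sketch below is a minimal illustration, assuming the loads step through memory at a fixed stride starting from a 4K-aligned base; the names `aligned_load_period` and `aligned_indices` are hypothetical.

```python
PAGE = 4096  # 4 KB


def aligned_load_period(stride_bytes: int) -> int:
    """How many strided loads occur between consecutive 4K-aligned loads."""
    if stride_bytes <= 0 or PAGE % stride_bytes != 0:
        raise ValueError("stride must be a positive divisor of 4096")
    return PAGE // stride_bytes


def aligned_indices(stride_bytes: int, count: int, base: int = 0) -> list:
    """Indices of the loads (out of `count`) that land on a 4K boundary."""
    return [i for i in range(count) if (base + i * stride_bytes) % PAGE == 0]


print(aligned_load_period(256))   # loads per 4K-aligned load at 256 B stride
print(aligned_indices(256, 48))   # which of the first 48 loads are 4K-aligned
```

The same helper applies to any other stride that divides 4 KB evenly.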



#### 4K Load vs. Error Rate





#### Communication Protocol?



#### **Detecting Sender**

1-wire communication



Automatically detect sender



#### Detecting Receiver

Use store-to-load forwarding loop

**Competition** for hyperthreading resources degrades performance



### Message Recovery

Initialization and completion messages

- Our channel is fast, so we can afford repeated tries

Break the message up into packets

- Limits the impact of retransmission
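The packet idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the framing used in the talk: the `INIT` and `DONE` bit patterns, the 16-bit payload size, and the retry limit are all hypothetical, and the `channel` callable stands in for the real 4K-aliasing transmission. The markers only detect corruption of the frame boundaries; payload errors would need an additional check.

```python
from typing import Callable, List, Optional

INIT = [1, 0, 1, 0, 1, 0, 1, 1]  # hypothetical initialization marker
DONE = [1, 1, 0, 1, 0, 1, 0, 1]  # hypothetical completion marker


def make_packets(bits: List[int], payload_len: int = 16) -> List[List[int]]:
    """Split a bit string into frames of INIT + payload + DONE."""
    return [INIT + bits[i:i + payload_len] + DONE
            for i in range(0, len(bits), payload_len)]


def parse_packet(frame: List[int]) -> Optional[List[int]]:
    """Return the payload, or None if either marker is missing or damaged."""
    n, m = len(INIT), len(DONE)
    if len(frame) < n + m or frame[:n] != INIT or frame[-m:] != DONE:
        return None
    return frame[n:len(frame) - m]


def transfer(bits: List[int],
             channel: Callable[[List[int]], List[int]],
             max_tries: int = 5) -> List[int]:
    """Send each packet, retransmitting only the packet that failed."""
    received: List[int] = []
    for packet in make_packets(bits):
        for _ in range(max_tries):
            payload = parse_packet(channel(packet))
            if payload is not None:
                received.extend(payload)
                break
        else:
            raise RuntimeError("packet lost after all retries")
    return received


message = [1, 0, 1, 1, 0, 0, 1] * 5
print(transfer(message, lambda frame: list(frame)) == message)
```

Because a failed frame is resent on its own, a single bad measurement costs one packet rather than the whole message.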



# In-House Channel Capacity

|                       | 256 B  | 512 B  | 1024 B | 2048 B |
|-----------------------|--------|--------|--------|--------|
| <b>E</b><sub>0</sub>  | 0.0075 | 0.0029 | 0.0093 | 0.0057 |
| <b>E</b><sub>1</sub>  | 0.0502 | 0.0159 | 0.0134 | 0.0267 |
| Bits per Ch.          | 0.824  | 0.927  | 0.918  | 0.886  |
| Ch. Cap (Mbps)        | 1.62   | 1.83   | 1.81   | 1.75   |
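One way to relate the error rates to the bits-per-channel row is the capacity of a binary asymmetric channel. The slides do not state which model was used, so this is an assumption: E<sub>0</sub> and E<sub>1</sub> are treated here as the probabilities of misreading a transmitted 0 and a transmitted 1. The function names are hypothetical. With the 256 B and 512 B error rates the sketch gives values close to the tabulated 0.824 and 0.927.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def asymmetric_capacity(e0: float, e1: float) -> float:
    """Capacity in bits per channel use of a binary asymmetric channel,
    where e0 = P(read 1 | sent 0) and e1 = P(read 0 | sent 1)."""
    d = 1.0 - e0 - e1
    if d <= 0.0:
        raise ValueError("error rates must satisfy e0 + e1 < 1")
    h0, h1 = binary_entropy(e0), binary_entropy(e1)
    return (e0 / d) * h1 - ((1.0 - e1) / d) * h0 \
        + math.log2(1.0 + 2.0 ** ((h0 - h1) / d))


# Error rates copied from the 256 B and 512 B columns of the table.
for label, e0, e1 in [("256 B", 0.0075, 0.0502), ("512 B", 0.0029, 0.0159)]:
    print(label, round(asymmetric_capacity(e0, e1), 3))
```

For equal error rates the formula reduces to the familiar symmetric-channel result 1 - H(e), which is a quick sanity check on the implementation.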









# Requires HW Hyperthreading?!?

EC2 does it

GCE does it

Azure does it

Lowers the total cost of ownership























# Challenges

Separating 4K-aliasing from background noise

- Establish baseline without cooperating VM
- Iteratively scale-up VM instances transmitting 4K signal
- Repeat the measurement 5 times
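The baseline-then-repeat procedure can be sketched as a simple statistical test. This is a minimal illustration, not the decision rule used in the talk: the threshold of mean plus `k` standard deviations, the majority vote over repeated measurements, and all latency values are hypothetical.

```python
from statistics import mean, stdev
from typing import List


def detects_signal(baseline: List[float],
                   trials: List[List[float]],
                   k: float = 3.0) -> bool:
    """Report a 4K signal when most repeated measurements have a mean
    latency above the baseline mean plus k baseline standard deviations."""
    threshold = mean(baseline) + k * stdev(baseline)
    hits = sum(1 for samples in trials if mean(samples) > threshold)
    return hits > len(trials) / 2


# Hypothetical cycle counts, for illustration only.
baseline = [100, 102, 98, 101, 99]           # no cooperating VM
with_sender = [[120, 121, 119]] * 5          # five repeated measurements
without_sender = [[101, 100, 102]] * 5

print(detects_signal(baseline, with_sender))
print(detects_signal(baseline, without_sender))
```

Repeating the measurement makes the decision less sensitive to a single noisy run, which is the purpose of the five repetitions listed above.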


## Challenges

#### Launch strategy

- Launch pairwise sender and receiver VMs
- Use prior colocation placement strategies [1]
- Scale up to 20 pairwise sender/receiver VMs

[1] V. Varadarajan et al. A placement vulnerability study in multi-tenant public clouds. USENIX Security Symposium, 2015.



## Challenges

#### Efficient test setup

- Sender continuously transmits while the receiver polls for 4K-aliasing for 10 s
- Decrease measurement time by launching all senders at once
- Sequentially launch receiver VMs every hour



### **Colocation Results**

As good as cross-core multi-tenant detection techniques

- With respect to launch strategy
- Number of instance pairs needed to detect multi-tenancy



#### What about Cross-Core?





M.F. Chowdury and D.M. Carmean. Maintaining processor ordering by checking load addresses of unretired load instructions against snooping store address. US Patent 6,687,809, Feb. 3, 2004.



#### Conclusion

Out-of-order execution and speculative execution are new attack vectors

4K-aliasing (ab)uses speculation on memory instructions and the microarchitecture used to maintain memory consistency

We demonstrate 4K-aliasing on public IaaS clouds

- Fast and robust covert channel
- Practical multi-tenant detection



# Questions?