rdar://30517694: Context switch does not restore full FPU state

Originator:	mark
Number:	rdar://30517694	Date Originated:	2017-02-14
Status:	Open	Resolved:
Product:	macOS + SDK	Product Version:	10.12.3 16D32
Classification:	Serious Bug	Reproducible:	Always

Area:
Something not on this list

Summary:
An x86_64 process can use certain x87 instructions. Whenever an x87 instruction executes, the FIP (FPU Instruction Pointer Offset) register within the CPU is updated to contain the address of the instruction. See Intel SDM Volume 1 (253665-061) §8.1.8.

The FIP register is 64 bits wide on a 64-bit CPU. In a 64-bit process, the addresses of x87 instructions can be 64 bits wide.

If an x87 instruction is executed in a 64-bit process and a context switch then occurs, the FIP register (and the related FDP, FPU Data Pointer Offset, register) is not restored correctly when the process next runs. Only the low 32 bits are restored.

The attached test program exhibits this behavior. It prints the contents of the FIP register (obtained via the FXSAVE instruction) three times. The first time, the FPU has been initialized but has not done anything else, and FIP is zero. It then allocates a block of memory, prints the address of this block, and writes an x87 instruction to it. It executes the newly written code, and then prints the FIP register’s value again, which should equal the address of the allocated block. Finally, it sleeps for 100ms to effect a context switch, and upon resuming, it prints the FIP register’s value one last time. The last print should show a value equal to the previous one, but it does not.

There are three versions of the FXSAVE structure documented by the Intel SDM Volume 2A (253666-061) §3.2 in the FXSAVE section. There’s the 32-bit FXSAVE format (table 3-43) which is not relevant here. Then there’s the FXSAVE64 format (table 3-46) and 64-bit FXSAVE format (table 3-47). The difference between the two is that the 64-bit FXSAVE format limits the FIP and FDP fields to 32 bits, and provides additional 16-bit fields for FCS and FDS. The FXSAVE64 format omits FCS and FDS, and contains full 64-bit FIP and FDP fields.
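The two 64-bit-mode layouts can be compared side by side. The following is a minimal sketch covering only the first 32 bytes of the 512-byte save area. The struct and member names are made up for illustration and are not taken from xnu or any system header, and the byte offsets reflect my reading of the SDM tables cited above.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative names only; not from xnu or any system header.

// FXSAVE64 layout (REX.W = 1): full 64-bit FIP and FDP, no FCS or FDS.
struct Fxsave64Header {
  uint16_t fcw;
  uint16_t fsw;
  uint8_t ftw;
  uint8_t reserved_1;
  uint16_t fop;
  uint64_t fip;  // bytes 8..15
  uint64_t fdp;  // bytes 16..23
  uint32_t mxcsr;
  uint32_t mxcsr_mask;
};

// 64-bit-mode FXSAVE layout (REX.W = 0): 32-bit FIP and FDP, plus 16-bit
// FCS and FDS fields where the upper halves would otherwise be.
struct FxsaveHeader {
  uint16_t fcw;
  uint16_t fsw;
  uint8_t ftw;
  uint8_t reserved_1;
  uint16_t fop;
  uint32_t fip;  // bytes 8..11, low 32 bits only
  uint16_t fcs;  // bytes 12..13
  uint16_t reserved_2;
  uint32_t fdp;  // bytes 16..19, low 32 bits only
  uint16_t fds;  // bytes 20..21
  uint16_t reserved_3;
  uint32_t mxcsr;
  uint32_t mxcsr_mask;
};

// Both headers occupy the same 32 bytes; only the interpretation differs.
static_assert(sizeof(Fxsave64Header) == 32, "unexpected FXSAVE64 header size");
static_assert(sizeof(FxsaveHeader) == 32, "unexpected FXSAVE header size");
static_assert(offsetof(Fxsave64Header, fip) == offsetof(FxsaveHeader, fip),
              "FIP starts at the same byte in both layouts");
```

The point of the comparison is that the FXSAVE layout has no room for bits 32 through 63 of FIP or FDP, so those bits cannot survive a save and restore done with it.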

For saving and restoring the context of a 64-bit process, the FXSAVE64 format is correct. However, the kernel uses the FXSAVE format. See 10.12.3 xnu-3789.41.3/osfmk/i386/fpu.c, which only uses the fxsave instruction, and never fxsave64.

FCS and FDS may be more relevant to 32-bit processes where 286-style segmentation is available in more than a simplified vestigial form. Even so, modern CPUs always force FCS and FDS to 0. See Intel SDM Volume 1 (253665-061) §8.1.8 under the “FXSAVE, XSAVE, and XSAVEOPT” bullet regarding FCS and FDS deprecation, and Intel SDM Volume 2A (253666-061) §3.2 in the CPUID section, table 3-8, CPUID, with EAX=07h, ECX=0, EBX bit 13: “deprecates FPU CS and DS values if 1.” On these CPUs, no additional information can be gleaned from the FXSAVE structure over the FXSAVE64 structure even for 32-bit processes that may be using segmentation.
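As a small illustration of the CPUID check mentioned above, the sketch below tests bit 13 of the EBX value returned by CPUID with EAX=07h, ECX=0. The function name is hypothetical, and obtaining the raw EBX value (for example through the compiler's CPUID intrinsic) is left out because it is platform-specific.

```cpp
#include <cstdint>

// Hypothetical helper: given the EBX value returned by CPUID with EAX=07h,
// ECX=0, reports whether the CPU deprecates the FPU CS and DS values
// (EBX bit 13, per the SDM table cited above).
constexpr bool DeprecatesFpuCsDs(uint32_t leaf7_ebx) {
  return ((leaf7_ebx >> 13) & 1u) != 0;
}
```

When this returns true, FCS and FDS are saved as 0, so the FXSAVE layout carries no information that the FXSAVE64 layout lacks.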

It would be correct to use the fxsave64 format for 64-bit processes, and it would not be incorrect to use it for 32-bit processes, although the fxsave format is valid for 32-bit processes too.

Steps to Reproduce:
$ clang++ -std=c++11 fpu_ip.cc -o fpu_ip
$ ./fpu_ip

Expected Results:
$ ./fpu_ip
fxsave.fpu_ip_64 = 0x0
block = 0x10f786000
fxsave.fpu_ip_64 = 0x10f786000
fxsave.fpu_ip_64 = 0x10f786000

Note that “block” and the last two instances of “fxsave.fpu_ip_64” are identical.

Actual Results:
$ ./fpu_ip
fxsave.fpu_ip_64 = 0x0
block = 0x1028ed000
fxsave.fpu_ip_64 = 0x1028ed000
fxsave.fpu_ip_64 = 0x28ed000

Note that the final “fxsave.fpu_ip_64” has been truncated to 32 bits.
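The truncation can be checked with a little arithmetic. This is a minimal sketch that models only the loss of the upper bits, not the kernel's actual save and restore path; the function name is illustrative.

```cpp
#include <cstdint>

// Models what survives when a 64-bit address passes through a 32-bit field,
// as FIP does when it is saved in the FXSAVE (rather than FXSAVE64) layout.
constexpr uint64_t TruncateToLow32(uint64_t value) {
  return value & 0xffffffffull;
}

// "block" and the final "fxsave.fpu_ip_64" from the Actual Results above.
static_assert(TruncateToLow32(0x1028ed000) == 0x28ed000,
              "the final FIP equals the low 32 bits of the block address");
```

The final printed value matches the low 32 bits of the block address exactly, which is consistent with FIP passing through a 32-bit field.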

Version:
10.12.3 16D32
xnu-3789.41.3

Notes:

Configuration:

Attachments:
'fpu_ip.cc' was successfully uploaded. http://pastebin.com/KZNk8xyH

Comments

2017-07-20 20:36 UTC to Apple

I still experience this on 10.13db3 17A306f with xnu-4532.0.0.0.1~23 on a MacBook Pro (15-inch, 2016) (MacBookPro13,3) with a 2.7GHz Intel Core i7-6820HQ:

% clang++ -std=c++11 fpu_ip.cc -o fpu_ip
% ./fpu_ip
fxsave.fpu_ip_64 = 0x0
block = 0x10327e000
fxsave.fpu_ip_64 = 0x10327e000
fxsave.fpu_ip_64 = 0x327e000

Again, the last two lines in fpu_ip’s output should be identical, and should match the “block” line, but the fpu_ip restored after a context switch (the final line printed) is truncated to 32 bits.

sysdiagnose_2017.07.20_16-30-20-0400-Mac_OS_X-MacBookPro13,3-17A306f.tar.gz attached 200.63 MB

By mark at July 20, 2017, 8:37 p.m.

2017-07-20 19:58 from Apple

Engineering has requested the following information regarding your bug report:

Please verify this issue with the macOS 10.13 beta and update your bug report at https://bugreport.apple.com/ with your results.

macOS 10.13 beta 3 (17A306f) https://developer.apple.com/download/

If the issue persists and it's applicable to your bug report, please attach a new sysdiagnose captured in the latest build and attach it to the bug report.

macOS sysdiagnose Instructions: https://developer.apple.com/services-account/download?path=/OS_X/OS_X_Logs/sysdiagnose_Logging_Instructions.pdf

For a complete list of logging instructions visit: https://developer.apple.com/bug-reporting/profiles-and-logs/

By mark at July 20, 2017, 8:02 p.m.
