memset_pattern* apparently twice as slow as it need be

Originator:dognotdog
Number:rdar://17259579 Date Originated:11.06.2014
Status:Open Resolved:
Product:OS X Product Version:
Classification:Performance Reproducible:Always
 
Summary:
About five years ago, we implemented a number of routines that fill a large block of memory with the pattern 0xDEADBEEF, to find out which approach is fastest. In the attached code, on x86, deadbeef_dog() is 2x as fast as the next-closest alternatives, including memset_pattern4/8/16.
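For readers without the attachment, here is a minimal sketch of the two kinds of routine being compared. It is not the code from katie.c: fill_pattern4_wide() and fill_pattern4_libc() are hypothetical names, and the 64-bit store loop is only one plausible hand-written approach, not necessarily what deadbeef_dog() does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in, NOT deadbeef_dog() from katie.c:
   fill `bytes` bytes at `dst` with a repeating 32-bit pattern,
   storing 8 bytes per iteration. */
static void fill_pattern4_wide(void *dst, uint32_t pattern, size_t bytes)
{
    unsigned char *p = dst;
    /* Two copies of the pattern side by side; in memory this is the
       pattern's byte sequence repeated twice on any endianness. */
    uint64_t wide = ((uint64_t)pattern << 32) | pattern;

    /* Bulk: a fixed-size memcpy is an alignment-safe 8-byte store. */
    while (bytes >= sizeof wide) {
        memcpy(p, &wide, sizeof wide);
        p += sizeof wide;
        bytes -= sizeof wide;
    }
    /* Tail: the remaining 0..7 bytes continue the pattern. */
    if (bytes > 0)
        memcpy(p, &wide, bytes);
}

#ifdef __APPLE__
/* The libc routine this report is about (declared in <string.h> on OS X). */
static void fill_pattern4_libc(void *dst, uint32_t pattern, size_t bytes)
{
    memset_pattern4(dst, &pattern, bytes);
}
#endif
```

Both functions should leave identical buffer contents for the same arguments, so they can be swapped freely in a timing comparison.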

This came up again recently, so we reran the tests, and the result is still the same. Hence this radar.

Steps to Reproduce:
Compile katie.c as instructed in the comments in the file, run it, and compare the reported results.

Expected Results:
The memset_pattern variants (deadbeef_applesauce(), deadbeef_pattern4()) should be at least as fast as deadbeef_dog() when setting large amounts of memory.

Actual Results:
memset_pattern is only half the speed of the deadbeef_dog() implementation.
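
A comparison like this can be timed roughly as follows. This is a sketch under stated assumptions, not the harness from katie.c: time_fill(), fill_fn, and fill_pattern4_simple() are hypothetical names, timing uses the standard C clock() (CPU time, coarse resolution), and the buffer size is assumed to be a nonzero multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Common signature for any fill routine under test (hypothetical). */
typedef void (*fill_fn)(void *dst, uint32_t pattern, size_t bytes);

/* Plain 32-bit store loop, used here only as an example candidate. */
static void fill_pattern4_simple(void *dst, uint32_t pattern, size_t bytes)
{
    uint32_t *p = dst;
    for (size_t i = 0; i < bytes / sizeof pattern; i++)
        p[i] = pattern;
}

/* Returns the best (smallest) time in seconds over `reps` fills of a
   `bytes`-sized buffer, or -1.0 if the buffer cannot be allocated. */
static double time_fill(fill_fn fn, size_t bytes, int reps)
{
    void *buf = malloc(bytes);
    if (buf == NULL)
        return -1.0;

    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        clock_t t0 = clock();
        fn(buf, 0xDEADBEEFu, bytes);
        clock_t t1 = clock();
        double s = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || s < best)
            best = s;
    }

    /* Read one word back to discourage the compiler from
       discarding the stores as dead. */
    volatile uint32_t sink = ((uint32_t *)buf)[0];
    (void)sink;

    free(buf);
    return best;
}
```

Calling, for example, time_fill(fill_pattern4_simple, (size_t)256 << 20, 5) and dividing the byte count by the result gives an approximate throughput. Taking the best of several repetitions reduces noise from page faults on the first pass.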

Version:
OS X 10.6 through 10.9 on various hardware

Notes:


Configuration:
Tested on Core 2 Duo and newer processors.

Attachments:
'katie.c' was successfully uploaded.
