III. Testing and Summary
1. A self-test program in C: test1.c
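```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>
#include "mul384.h"

int main(int argc, char *argv[])
{
    /* mul384 requires 16-byte-aligned arrays. The AMD64 ABI already gives local
     * arrays of 16 bytes or more this alignment; the attribute makes it explicit. */
    uint64_t a[6] __attribute__((aligned(16)));
    uint64_t b[6] __attribute__((aligned(16)));
    uint64_t c[12] __attribute__((aligned(16)));
    uint64_t i;

    // Initialize the operands (limb 0 is the least significant)
    a[5] = 0xEEEEEEEEEEEEEEEE;
    a[4] = 0xCCCCCCCCCCCCCCCC;
    a[3] = 0xCD8964545891EBC5;
    a[2] = 0xF8F0D7E9F7D5C30A;
    a[1] = 0x3E7EB0B141E265DC;
    a[0] = 0xC459138BCE6E7F2D;
    b[5] = 0xCD8964545891EBC5;
    b[4] = 0xF8F0D7E9F7D5C30A;
    b[3] = 0x3E7EB0B141E265DC;
    b[2] = 0xC459138BCE6E7F2D;
    b[1] = 0xEEEEEEEEEEEEEEEE;
    b[0] = 0xCCCCCCCCCCCCCCCC;

    // Integer multiplication, repeated 10^8 times for timing
    for (i = 0; i < 100000000; i++)
        mul384(c, a, b);

    // Print the 768-bit result, most significant limb first
    printf("%016" PRIx64 "%016" PRIx64 "%016" PRIx64 "%016" PRIx64 "\n", c[11], c[10], c[9], c[8]);
    printf("%016" PRIx64 "%016" PRIx64 "%016" PRIx64 "%016" PRIx64 "\n", c[7], c[6], c[5], c[4]);
    printf("%016" PRIx64 "%016" PRIx64 "%016" PRIx64 "%016" PRIx64 "\n", c[3], c[2], c[1], c[0]);
    exit(EXIT_SUCCESS);
}
```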
Compile and run:
```
gcc -Wall -O2 test1.c mul384.s -o test1
time ./test1
bfd590d741994274668a339bec91ebef2acb6a25db704d7c7fe4faabc9b8246b
687462a249cfed730f66a066ce6f3c5d3b1a0c40603abeb865aaafee147c0410
c086d7bc4c82ac9e655392a75eef149809285ef9273c26162fb8bd29c14133dc

real    0m7.524s
user    0m7.520s
sys     0m0.003s
```
2. A comparison program using the GMP library: test2.c
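```c
#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>

int main(int argc, char *argv[])
{
    mpz_t a, b, c;
    int i = 0;

    // Allocate memory up front: 384-bit operands, 768-bit product
    mpz_init2(a, 384);
    mpz_init2(b, 384);
    mpz_init2(c, 768);

    // Set the operands (same values as test1.c, most significant limb first).
    // Use mpz_set_str: the variables are already initialized by mpz_init2,
    // and calling mpz_init_set_str on them again would leak that memory.
    mpz_set_str(a, "EEEEEEEEEEEEEEEE" "CCCCCCCCCCCCCCCC" "CD8964545891EBC5"
                   "F8F0D7E9F7D5C30A" "3E7EB0B141E265DC" "C459138BCE6E7F2D", 16);
    mpz_set_str(b, "CD8964545891EBC5" "F8F0D7E9F7D5C30A" "3E7EB0B141E265DC"
                   "C459138BCE6E7F2D" "EEEEEEEEEEEEEEEE" "CCCCCCCCCCCCCCCC", 16);

    // Integer multiplication, repeated 10^8 times for timing
    for (i = 0; i < 100000000; i++)
        mpz_mul(c, a, b); // c = a * b

    // Print the result in hexadecimal
    mpz_out_str(stdout, 16, c);
    fputc('\n', stdout);

    // Free memory
    mpz_clear(a);
    mpz_clear(b);
    mpz_clear(c);
    exit(EXIT_SUCCESS);
}
```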
Compile and run (for readability, I wrapped test2's single-line output when copying it into this post):
```
gcc -Wall -O2 test2.c -lgmp -o test2
time ./test2
bfd590d741994274668a339bec91ebef2acb6a25db704d7c7fe4faabc9b8246b
687462a249cfed730f66a066ce6f3c5d3b1a0c40603abeb865aaafee147c0410
c086d7bc4c82ac9e655392a75eef149809285ef9273c26162fb8bd29c14133dc

real    0m9.518s
user    0m9.517s
sys     0m0.001s
```
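As a rough per-call estimate, the `real` times of the two runs can be divided by the 10^8 loop iterations. This is only approximate: it lumps in process start-up and loop overhead, and it reflects a single run on one machine.

$$
t_{\mathrm{asm}} \approx \frac{7.524\,\mathrm{s}}{10^{8}} \approx 75.2\,\mathrm{ns},\qquad
t_{\mathrm{GMP}} \approx \frac{9.518\,\mathrm{s}}{10^{8}} \approx 95.2\,\mathrm{ns}
$$

$$
\frac{t_{\mathrm{asm}}}{t_{\mathrm{GMP}}} = \frac{7.524}{9.518} \approx 0.79
$$

So in this test the assembly routine needs roughly 21% less time per multiplication than `mpz_mul`; equivalently, `mpz_mul` takes about 1.27 times as long.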
Operating system and platform used for the tests (cpuinfo output wrapped for display):
```
uname -a
2.6.32-504.8.1.el6.x86_64 #1 SMP Wed Jan 28 21:11:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 37
model name      : Intel(R) Core(TM) i3 CPU M 370 @ 2.40GHz
stepping        : 5
microcode       : 4
cpu MHz         : 933.000
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 4787.92
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
```
3. Summary
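Comparing the outputs and the running times, the x64 assembly routine for 384-bit multiplication can be provisionally judged a success: both programs print the same 768-bit product, and the assembly version finishes the same 10^8 multiplications faster than GMP's mpz_mul (7.5 s versus 9.5 s). Where maximum speed and compact code matter, this small assembly routine has real value. Of course, its drawbacks are as pronounced as its advantages: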
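Advantages: fast, small, self-contained, thread-safe, reentrant, and no memory accesses during the arithmetic itself (apart from loading the operands and storing the result).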
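Disadvantages: a CPU restriction (SSE4.2 support is required), a platform restriction (the operating system must follow the AMD64 ABI), a user-mode restriction (it cannot be used in kernel mode), and a 16-byte alignment requirement.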