About TestCPU

Programmed by Robert Smid
Detects about 60 processor types and displays a picture of the CPU
Detects processor features and cache sizes
Contains a processor museum with pictures and descriptions
Shows memory transfer speed using different instructions
Contains standard benchmarks

English translation of this manual by Geri Simm
Some parts (a worse translation) added by me :-)


Minimum system requirements

Operating system Windows 95/98/NT/2000
32-bit processor i386 (with FPU)
4 MB free memory
256 color desktop
Mouse recommended


Splashscreen

Close all other applications first, so that they don't influence the results. When you run the executable, a small splash screen is displayed. While this window is visible, the program detects your CPU type and frequency. Then it runs the memory benchmarks and a set of standard CPU/FPU benchmarks (Dhrystones, Whetstones, MIPS, MFLOPS).
If the splash screen stays visible for a long time, TestCPU is probably waiting for the maximum priority which the benchmarks demand, or the memory tests need more time than expected.



PAGE 1 - PROCESSOR

The first page contains basic information about the CPU.
I use three methods to find the CPU type.
First method
uses subtle differences between the listed CPUs to distinguish between them. The detection method is shown in brackets:

Intel i386SX / i386DX    (has POPAD bug)
Intel i486SX             (without FPU)
Intel i486DX             (has AC bit in flags register)
Intel Pentium            (has ID bit in flags register, signals CPUID instruction support)
Cyrix Cx486S             (without FPU)
Cyrix Cx486DX            (Cyrix CPUs don't change undefined flags after division)
NexGen Nx586             (without FPU)
NexGen Nx586FP           (NexGen CPUs don't change zero-flag after division)

Second method
uses the results of the first method to identify these CPUs by their frequency:

AMD Am386SX        (40 MHz)
AMD Am386DX        (40 MHz)
Intel i486DX2      (50 and 66 MHz)
AMD Am486DX4       (100 and 120 MHz)
Cyrix Cx486DX2     (66 and 80 MHz)
Cyrix Cx486DX4     (100 MHz)
Cyrix 6x86         (all versions, 80 to 150 MHz)
Cyrix 6x86MX       (all versions from 166 MHz)

Third method
is performed only on those CPUs which support the CPUID instruction. That means nearly all CPUs manufactured since the first Intel Pentium, introduced in 1993. Newer 486s, from 1994 on, also support this instruction. However, motherboards with Cyrix 5x86, Cyrix 6x86 and NexGen CPUs installed will usually have this instruction disabled in the BIOS; for correct detection it needs to be enabled by software. The CPUID instruction returns enough information about the CPU type to allow all new CPUs to be identified easily. Here are the CPUs recognized by this method:

Intel i486DX
Intel i486SX
Intel i486DX2
Intel i486SL
Intel i486SX2
Intel i486DX4
Intel i486DX4 OverDrive
Intel Pentium
Intel Pentium OverDrive
Intel Pentium MMX
Intel Pentium MMX OverDrive
Intel Pentium Pro
Intel Pentium II OverDrive
Intel Pentium II
Intel Pentium II Xeon
Intel Pentium II PE (mobile)
Intel Celeron
Intel Celeron A (Slot1)
Intel Celeron A (Socket370)
Intel Pentium III
Intel Pentium III Xeon
Intel Pentium III E
Intel Pentium III E Xeon

UMC U5S
UMC U5D
UMC U486SX2
UMC U486DX2

AMD Am486DX2
AMD Am486DX4
AMD Am5x86
AMD K5
AMD K6
AMD K6-2
AMD K6-III
AMD Athlon

Cyrix MediaGX
Cyrix MediaGXm
Cyrix 5x86
Cyrix 6x86
Cyrix 6x86MX
Cyrix M-II

Centaur/IDT WinChip
Centaur/IDT WinChip 2
Centaur/IDT WinChip 2A
Centaur/IDT WinChip 2B
Centaur/IDT WinChip 3

Rise mP6

NexGen Nx586
NexGen Nx586FP
NexGen Nx686


The CPUID instruction can be executed at several levels.
First level of CPUID instruction
returns a vendor-specific string:

"GenuineIntel"    this string is returned by Intel processors
"AuthenticAMD"    returned by AMD processors
"CyrixInstead"    returned by Cyrix processors
"NexGenDriven"    returned by NexGen processors
"CentaurHauls"    returned by Centaur/IDT processors
"RiseRiseRise"    returned by Rise processors
"UMC UMC UMC"     returned by UMC processors
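
As an illustration of how such a string is put together, here is a minimal Python sketch. It relies on the documented CPUID behaviour that the 12 characters come back in the EBX, EDX and ECX registers, four ASCII characters per register, lowest byte first. Python cannot execute CPUID itself, so the register values are passed in as plain numbers; the function name vendor_string is only an illustrative choice and is not taken from TestCPU.

```python
import struct

def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """Assemble the 12-character vendor string from three 32-bit register values."""
    # Registers are concatenated in the order EBX, EDX, ECX; "<III" packs each
    # 32-bit value little-endian, so the lowest byte becomes the first character.
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")

# Register values that an Intel processor returns for the first CPUID level.
print(vendor_string(0x756E6547, 0x49656E69, 0x6C65746E))
```

The detection code then only has to compare the resulting string with the known vendor strings.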


Second level of the CPUID instruction
returns information about type, family, model, stepping (revision) and other CPU features.
This application contains a small database, which adds a short description to this detected information. Here's a brief explanation of the values returned by CPUID:

TYPE has these values:
0 - means that TestCPU is using the primary (or only) CPU in the system
1 - means an OverDrive processor is installed, i.e. an upgrade by direct replacement of a CPU on old motherboards
2 - means a secondary (auxiliary) CPU in a multiprocessor system

FAMILY is almost equivalent to generation and denotes CPU "performance" class:
4 - all 486s, AMD 5x86, Cyrix 5x86
5 - Intel Pentium and Pentium MMX, AMD K5 and K6, Cyrix 6x86, all Centaur/IDT WinChip, Rise mP6
6 - Intel Pentium Pro, Celeron, Pentium II and Pentium III, AMD Athlon, Cyrix 6x86MX and M-II

MODEL is a number which specifies a model in a family:
for example in family 4:
0 - i486DX
3 - i486DX2
8 - i486DX4
for example in family 5:
2 - Pentium
4 - Pentium MMX
for example in family 6:
1 - Pentium Pro
5 - Pentium II
6 - Celeron
7 - Pentium III
NOTE: all these examples are Intel CPUs. Other manufacturers' CPUs do not use this numbering scheme.

STEPPING is a number which is incremented with each small change in the CPU design.

BRAND - a new field from Intel to distinguish some of their CPUs. Known values are:
0 - not supported
2 - Intel Pentium III
3 - Intel Pentium III Xeon
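
The type, family, model and stepping fields are packed into a single 32-bit value returned in the EAX register. The following Python sketch shows one way to unpack them, using the standard bit layout (stepping in bits 0-3, model in bits 4-7, family in bits 8-11, type in bits 12-13). The function name and the sample values are illustrative assumptions; the brand field arrives in a different register and is not decoded here.

```python
def decode_signature(eax: int) -> dict:
    """Split a CPUID signature value into its four fields."""
    return {
        "stepping": eax & 0xF,        # bits 0-3
        "model": (eax >> 4) & 0xF,    # bits 4-7
        "family": (eax >> 8) & 0xF,   # bits 8-11
        "type": (eax >> 12) & 0x3,    # bits 12-13
    }

# 0x0673 decodes to family 6, model 7, which the tables above list as Pentium III.
print(decode_signature(0x0673))
```

A lookup in a small database keyed by vendor, family and model can then supply the descriptive name.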


Third level of CPUID instruction
is supported only by Intel 6th generation CPUs (from Pentium Pro) and returns information about cache sizes as represented by these hexadecimal values:

$06 - processor has 8kB L1 cache for instructions
$08 - processor has 16kB L1 cache for instructions
$0A - processor has 8kB L1 cache for data
$0C - processor has 16kB L1 cache for data
$40 - processor has no L2 cache (Celeron detection method)
$41 - processor has 128kB L2 cache (CeleronA)
$42 - processor has 256kB L2 cache (mobile Pentium II)
$43 - processor has 512kB L2 cache (Pentium II and III)
$44 - processor has 1MB L2 cache (Xeon version)
$45 - processor has 2MB L2 cache (Xeon version)
$82 - processor has 256kB L2 cache (Pentium III E)

$4x - means 4-way cache (all)
$8x - means 8-way cache (Pentium III E)
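
A minimal Python sketch of how these descriptor values could be turned into readable text is shown here. The dictionary simply repeats the values listed above; the function name, the handling of unknown values and the skipping of zero bytes (treated as unused slots) are assumptions made for illustration, not TestCPU's actual code.

```python
# Descriptor values copied from the list above.
CACHE_DESCRIPTORS = {
    0x06: "8 kB L1 cache for instructions",
    0x08: "16 kB L1 cache for instructions",
    0x0A: "8 kB L1 cache for data",
    0x0C: "16 kB L1 cache for data",
    0x40: "no L2 cache",
    0x41: "128 kB L2 cache",
    0x42: "256 kB L2 cache",
    0x43: "512 kB L2 cache",
    0x44: "1 MB L2 cache",
    0x45: "2 MB L2 cache",
    0x82: "256 kB L2 cache",
}

def describe_caches(descriptors):
    """Translate a sequence of descriptor bytes into descriptions."""
    result = []
    for value in descriptors:
        if value == 0x00:
            continue  # assumption: a zero byte is an unused slot
        result.append(CACHE_DESCRIPTORS.get(value, "unknown descriptor $%02X" % value))
    return result

print(describe_caches([0x08, 0x0C, 0x43]))
```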


Fourth level of CPUID instruction
is supported from Intel Pentium III up, and returns a processor serial number.

AMD, Cyrix and Centaur CPUs support some further levels of CPUID, which can be used to detect special features (for example 3DNow! technology), their cache sizes, or a string (a CPU name) which is coded on-chip. Strings returned by these CPUs are:

AMD-K5(tm) Processor                    AMD K5
AMD-K6tm w/ multimedia extensions       AMD K6
AMD-K6(tm) 3D processor                 AMD K6-2
AMD-K6(tm)-2 Processor                  AMD K6-2
AMD-K6(tm) 3D+ Processor                AMD K6-III
AMD-K6(tm)-III Processor                AMD K6-III
AMD-K7(tm) Processor                    AMD K7
IDT WinChip 2                           Centaur/IDT C2
IDT WinChip 2-3D                        Centaur/IDT C2


CPU frequency is determined via two methods.
First method
measures the execution time of some CPU instructions, and compares this time to a table of values by CPU type. Each value is the number of clock cycles which the identified CPU needs for that execution. TestCPU divides the relevant value by the execution time, giving the CPU frequency.

A disadvantage of this method is that the frequency won't be measured accurately if the correct CPU type wasn't detected, or if you have a new processor which is missing from the table!

Frequency detection via first method:
number of clock cycles needed for execution (from the table) / execution time = frequency
120000 'ticks' / 0.0012 seconds = 100 MHz
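
The same arithmetic as a minimal Python sketch. The function name is an illustrative assumption; in the real program the cycle count would come from the per-CPU table and the time from a measurement.

```python
def frequency_from_cycles(cycles: int, seconds: float) -> float:
    """Return the clock frequency in Hz: known cycle count divided by measured time."""
    if seconds <= 0:
        raise ValueError("execution time must be positive")
    return cycles / seconds

# The worked example above: 120000 cycles measured in 0.0012 seconds.
mhz = frequency_from_cycles(120000, 0.0012) / 1e6
print(round(mhz, 3), "MHz")
```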

Second method
is applied only to CPUs with a Time Stamp Counter (TSC) implemented. The TSC counts processor clock cycles: it is incremented on each internal processor clock cycle and allows the most accurate timing method on PCs. Note that this counter is reset to zero after a processor reset. It is 64 bits wide, which is enough to count for about 5850 years if the processor runs at 100 MHz. The CPUID instruction is used to check whether a TSC is implemented. All new CPUs have a TSC: Intel supports it from the Pentium processor upward, AMD from the K5, and Cyrix from the 6x86MX. In theory the frequency can be measured to a precision of one clock cycle; in practice the clock rate can vary slightly because of hardware factors. Even so, TestCPU measures frequency to a precision of 0.001 MHz.

Frequency detection via second method:
1) read TSC and write it into T1 variable
2) wait exactly one second (while TSC is automatically incremented)
3) read TSC again, and write it into T2
4) frequency in Hertz is calculated from the difference T2-T1
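
The four steps can be sketched in Python as follows. Python has no direct access to the TSC, so the two counter readings are passed in as numbers; the function names, the optional interval parameter and the wrap-around handling are assumptions added for illustration. The second function reproduces the estimate of how long a 64-bit counter lasts, assuming 365-day years.

```python
def tsc_frequency_hz(t1: int, t2: int, interval_s: float = 1.0) -> float:
    """Frequency in Hz from two counter readings taken interval_s seconds apart."""
    delta = (t2 - t1) % 2**64  # modulo keeps the result correct if the counter wrapped
    return delta / interval_s

def years_until_wrap(freq_hz: float, counter_bits: int = 64) -> float:
    """How long a counter of the given width lasts at freq_hz, in 365-day years."""
    seconds = 2**counter_bits / freq_hz
    return seconds / (365 * 24 * 3600)

# Two readings taken exactly one second apart on a 100 MHz processor.
print(tsc_frequency_hz(1_000, 100_001_000) / 1e6, "MHz")
print(int(years_until_wrap(100e6)), "years")
```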

P-Rating
Some CPUs have a PR suffix beside the frequency, meaning Performance Rating. This extra label comes from the processor vendors AMD, Cyrix and IBM, who wish to suggest that their CPUs execute code faster (at a given frequency) than Intel's Pentium or Pentium II processors. They use the Winstone benchmark for comparisons. So, for example, a processor with Pentium 75 performance is labelled PR75. Here's a PR table with clock speeds:

processor:                internal / external clock speed in MHz:
NexGen Nx586-PR75         70 / 35
NexGen Nx586-PR80         75 / 37.5
NexGen Nx586-PR90         84 / 42
NexGen Nx586-PR100        93 / 46.5
NexGen Nx586-PR110        102 / 51
NexGen Nx586-PR120        111 / 55.5

AMD Am5x86-PR75           133 / 33
AMD K5-PR75               75 / 50
AMD K5-PR90               90 / 60
AMD K5-PR100              100 / 66
AMD K5-PR120              90 / 60
AMD K5-PR133              100 / 66
AMD K5-PR166              116.7 / 66
AMD K5-PR200              133 / 66

Cyrix 5x86-PR75           100 / 33
Cyrix 5x86-PR90           120 / 40
Cyrix 6x86-PR90           80 / 40
Cyrix 6x86-PR120          100 / 50
Cyrix 6x86-PR133          110 / 55
Cyrix 6x86-PR150          120 / 60
Cyrix 6x86-PR166          133 / 66
Cyrix 6x86-PR200          150 / 75
Cyrix 6x86MX-PR133        100 / 50, 110 / 55
Cyrix 6x86MX-PR150        120 / 60, 125 / 50
Cyrix 6x86MX-PR166        133 / 66, 137.5 / 55, 150 / 50
Cyrix 6x86MX-PR200        150 / 75, 165 / 55, 166 / 66
Cyrix 6x86MX-PR233        166 / 83, 187.5 / 75, 200 / 66
Cyrix 6x86MX-PR266        207.5 / 83, 225 / 75, 233 / 66
Cyrix M-II PR300          207.5 / 83, 225 / 75, 233 / 66
Cyrix M-II PR333          250 / 83
Cyrix M-II PR366          250 / 100
Cyrix M-II PR400          285 / 95
Cyrix M-II PR433          300 / 100

Rise mP6-PR166            166 / 83
Rise mP6-PR233            190 / 95
Rise mP6-PR266            200 / 100
Rise mP6-PR333            237.5 / 95
Rise mP6-PR366            250 / 100

IDT WinChip2A-PR200       200 / 66
IDT WinChip2A-PR233       233 / 66
IDT WinChip2A-PR300       250 / 100
IDT WinChip3-PR233        200 / 66



PAGE 2 - FEATURES

Processor features returned by the CPUID instruction are displayed on the second page. Some of these features are of interest to users:

Processor contains floating-point unit - Signifies the presence of a floating-point unit (FPU) directly on-chip, which all modern CPUs (from the 486DX) include. The FPU is used for real number calculations.
Time stamp counter - The TSC provides the most accurate timing method on a PC; it allows precise measurement of the processor frequency.
Multiprocessor support (chip contains APIC) - Signifies the presence of an APIC, which permits symmetrical multiprocessing. If this item is crossed out, the APIC is either disabled or not supported.
Processor serial number - Means that the serial number is enabled. This controversial feature can be disabled (then this item is crossed out); software for this purpose is available from Intel.
MMX technology - Signifies a processor instruction set extension. 57 new MMX instructions accelerate graphics and multimedia processing. It was introduced with Intel's Pentium MMX processor. Today it is supported by all processor manufacturers.
Fast save and restore FP/MMX/SSE - Signifies the ability of this processor to switch rapidly between FPU, MMX and SSE modes.
Intel Streaming SIMD Extensions (SSE) - Signifies the presence of a second instruction set extension: 70 new instructions which speed up the processing of 3D graphics, sound and internet. Supported from Intel's Pentium III processor upwards.

The next few features are supported only by processors from Intel's competitors:

Partial SSE support - Signifies an instruction set extension of Athlon (and newer) processors. It supports the SSE-MMX and SSE-MEM instructions.
Cyrix extended MMX - Signifies an instruction set extension for 6x86MX, M-II and newer processors. These processors support some new MMX instructions.
AMD 3DNow! - Signifies support for 21 instructions from AMD which speed up 3D graphics, first introduced with the K6-2 processor. This graphics instruction set is also supported by IDT WinChip processors.
AMD extended 3DNow! - AMD Athlon (and newer) processors have additional 3D instructions.



PAGE 3 - MEMORY 1

MOV test
is the first of two memory benchmarks which measure the transfer speed of memory and caches. The speeds are measured by transferring the same data block twice. During the first transfer, the data block is loaded into the cache; the second time, the data is transferred from the cache. The block size is increased repeatedly from 2 kB to 2 MB. The transfer speed drops considerably above a certain block size, indicating that the cache has reached full capacity. Cache size can be detected in this way. I use a pair of MOV instructions in this test, since MOV is one of the most commonly used instructions. However, transfers with MOV instructions are slower than transfers via the MOVSD instruction, which is used in my second memory test. Here's the part of the source code that I use for this transfer:

@repeat: mov  eax,[esi]
         mov  [edi],eax
         add  esi,4
         add  edi,4
         dec  ecx
         jnz  @repeat

ESI contains the source address, EDI holds the target address and ECX contains the number of repetitions. To transfer 4 kB, 1024 repetitions are needed, because 32 bits (4 bytes) are transferred at a time. The instruction MOV EAX,[ESI] reads data from the source address [ESI] in memory into the EAX register in the processor. The second instruction, MOV [EDI],EAX, writes the data from EAX back to the target address [EDI] in memory. The next two instructions, ADD ESI,4 and ADD EDI,4, increment the source and target pointers to point at the next 4 bytes to be transferred. The next instruction, DEC ECX, decrements the ECX register by one, so that after the first pass 1023 repetitions remain. The last instruction, JNZ @REPEAT, repeats the loop until ECX reaches zero.
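
The overall measurement procedure (warm the cache with a first transfer, time the following transfers, repeat with growing block sizes) can be sketched in Python as shown here. This is only an illustration under stated assumptions: the block size is doubled at each step, which the text does not specify; a bytearray slice copy stands in for the MOV loop; and the function names and the number of repeats are invented for the example. An interpreted language will not show cache effects as clearly as the assembler loop.

```python
import time

def block_sizes(smallest=2 * 1024, largest=2 * 1024 * 1024):
    """Block sizes from 2 kB to 2 MB; doubling each step is an assumption."""
    sizes = []
    size = smallest
    while size <= largest:
        sizes.append(size)
        size *= 2
    return sizes

def copy_speed_mb_s(block_size, repeats=20):
    """Copy a block once to warm the cache, then time repeated copies."""
    src = bytearray(block_size)
    dst = bytearray(block_size)
    dst[:] = src                      # first transfer: loads the block into the cache
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src                  # timed transfers
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return block_size * repeats / elapsed / 1e6

if __name__ == "__main__":
    for size in block_sizes():
        print("%5d kB: %10.1f MB/s" % (size // 1024, copy_speed_mb_s(size)))
```

A drop in the reported speed between two neighbouring block sizes would suggest that the larger block no longer fits into a cache level.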

Cache
memory is fast and small: it holds frequently used data so that it can be accessed quickly. Cache memory was first used on 386 motherboards. From the 486 processor on, two caches are used. There is a first level cache (L1), which is on-chip, and a second level cache (L2), which is usually on the motherboard. The second level cache is bigger, but slower than the first. In Pentium-class processors the first level cache is divided into a code cache and a data cache.

Some on-die cache sizes:

processor:                L1 cache (for instructions + for data)    L2 on-die cache:
Intel i486SX/DX/DX2       8 kB                                      -
Intel i486DX4             16 kB                                     -
Intel Pentium             16 kB (8 kB + 8 kB)                       -
Intel Pentium MMX         32 kB (16 kB + 16 kB)                     -
Intel Pentium Pro         16 kB (8 kB + 8 kB)                       256 kB to 1 MB
Intel Pentium II/III      32 kB (16 kB + 16 kB)                     512 kB (Xeon to 2 MB)
Intel Celeron             32 kB (16 kB + 16 kB)                     -
Intel Celeron A           32 kB (16 kB + 16 kB)                     128 kB
Intel Pentium III E       32 kB (16 kB + 16 kB)                     256 kB

AMD Am486DX/DX2/DX4       8 kB                                      -
AMD Am5x86                16 kB                                     -
AMD K5                    24 kB (16 kB + 8 kB)                      -
AMD K6/K6-2               64 kB (32 kB + 32 kB)                     -
AMD K6-III                64 kB (32 kB + 32 kB)                     256 kB
AMD Athlon                128 kB (64 kB + 64 kB)                    512 kB

Cyrix Cx486SLC/DLC        1 kB                                      -
Cyrix Cx486S              2 kB                                      -
Cyrix Cx486DX/DX2/DX4     8 kB                                      -
Cyrix 5x86/6x86           16 kB                                     -
Cyrix 6x86MX/M-II         64 kB                                     -

IDT WinChip/WinChip2      64 kB (32 kB + 32 kB)                     -
IDT WinChip3              128 kB (64 kB + 64 kB)                    -

Rise mP6                  16 kB (8 kB + 8 kB)                       -

NexGen Nx586              32 kB (16 kB + 16 kB)                     -
NexGen Nx686              48 kB (16 kB + 32 kB)                     -

UMC 486                   8 kB                                      -

IBM 486SLC                16 kB                                     -



PAGE 4 - MEMORY 2

MOVSD test
works similarly to the first memory benchmark. It too measures the transfer speed of memory and caches, but uses the MOVSD instruction. This instruction is faster than a pair of MOV instructions, because a modern processor will pre-fetch reads using burst cycles and write-combine data into burst cycles.

The method of transferring the data block is similar: ESI contains the source address, EDI holds the target address and ECX counts the number of repetitions. Then a REP MOVSD instruction is executed. The REP prefix means that the following instruction is executed repeatedly. The number of repetitions is given in the ECX register, whose value is decreased on every iteration until it reaches zero. Then execution of the REP MOVSD loop finishes. The MOVSD instruction moves 4 bytes of data from the source address [ESI] to the target address [EDI] and increases both pointers (ESI and EDI) to point to the next location.
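
The behaviour of REP MOVSD described above can be modelled in a few lines of Python. This is a simulation only, assuming the direction flag is clear so that both pointers move upward; memory is represented by a bytearray and the registers by ordinary variables, and the function name is an illustrative choice.

```python
def rep_movsd(memory: bytearray, esi: int, edi: int, ecx: int):
    """Simulate REP MOVSD; returns the final (esi, edi, ecx) values."""
    while ecx != 0:
        memory[edi:edi + 4] = memory[esi:esi + 4]  # MOVSD: move one 32-bit value
        esi += 4                                   # advance the source pointer
        edi += 4                                   # advance the target pointer
        ecx -= 1                                   # REP: count down the repetitions
    return esi, edi, ecx

# Copy 16 bytes (4 doublewords) from offset 0 to offset 16.
mem = bytearray(range(16)) + bytearray(16)
print(rep_movsd(mem, 0, 16, 4))
```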



PAGE 5 - CALCULATIONS

This page contains five mathematical benchmarks:
First benchmark calculates a large factorial using integer numbers, so this benchmark uses only the integer part of the processor (no FPU). The factorial of 10001 is the product of all integers from 1 to 10001:

10001! = 1 * 2 * 3 * 4 * ....... * 9998 * 9999 * 10000 * 10001

This factorial result is stored in 14812 bytes (over 14 kB) of memory - a 118496-bit number! The result still fits into the on-die L1 cache of most processors, so this benchmark does not reflect main memory performance.
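
To illustrate how such a large number can be built from integer operations alone, here is a minimal Python sketch. It keeps the number as a list of 32-bit words and multiplies it by one small factor at a time, carrying the overflow into the next word. The 32-bit word size, the word order and the function names are assumptions for illustration and are not taken from TestCPU's source; Python's built-in math.factorial is used only to check the sizes quoted above.

```python
import math

WORD_BITS = 32  # assumed word size of the stored number

def factorial_words(n: int) -> list:
    """Compute n! as a list of 32-bit words, least significant word first."""
    words = [1]
    for factor in range(2, n + 1):
        carry = 0
        for i in range(len(words)):
            product = words[i] * factor + carry
            words[i] = product & 0xFFFFFFFF   # low 32 bits stay in this word
            carry = product >> WORD_BITS      # the rest moves to the next word
        if carry:
            words.append(carry)
    return words

def storage_bytes(n: int) -> int:
    """Bytes needed to hold n! when rounded up to whole 32-bit words."""
    bits = math.factorial(n).bit_length()
    return -(-bits // WORD_BITS) * WORD_BITS // 8

print(storage_bytes(10001), "bytes")
```

Under the 32-bit word assumption, the byte count for 10001! matches the figure given above.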

Second benchmark calculates the first 30000 prime numbers. This benchmark again uses only integers, so it tests just the processor. It uses about 120 kB of memory, which is accessed only once, so caching plays no role.

Many intelligent algorithms exist for finding primes; I used the following:

A number n is prime if it is not divisible by any number bigger than 1 and smaller than n. It is enough, however, to test divisibility by numbers (or factors) from 2 to the square root of n. Even numbers greater than two cannot be primes, because by definition they are divisible by two.
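
A minimal Python sketch of this trial-division rule follows. The function names are illustrative, and the benchmark's own implementation may differ in detail (for example in how it stores the primes).

```python
def is_prime(n: int) -> bool:
    """Trial division by 2 and by odd numbers up to the square root of n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2          # two is the only even prime
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2           # even divisors are already ruled out
    return True

def first_primes(count: int) -> list:
    """Return the first `count` prime numbers."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if is_prime(candidate):
            primes.append(candidate)
        candidate += 1 if candidate == 2 else 2   # after 2, test odd numbers only
    return primes

print(first_primes(10))
```

Calling first_primes(30000) would reproduce the workload size mentioned above, storing one number per prime.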

Third benchmark calculates the determinant of a 9x9 matrix using Laplace expansion. This benchmark works with (regular) matrices of real numbers, so it uses both the processor and the FPU. Laplace expansion is based on the decomposition of the original determinant into smaller determinants, which are then decomposed in turn, and so on, until a single number is obtained. In TestCPU this is done by recursive procedure calls. Here's the decomposition of a 3 x 3 matrix along its first row, as an example of this benchmark in mathematical terms:

| 9 8 7 |
| 4 5 6 | = (-1)^(1+1) * 9 * | 5 6 | + (-1)^(1+2) * 8 * | 4 6 | + (-1)^(1+3) * 7 * | 4 5 |
| 0 1 2 |                    | 1 2 |                    | 0 2 |                    | 0 1 |
= .... and so on
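
The recursive decomposition can be sketched in Python as follows. This is a minimal illustration of expansion along the first row, assuming the matrix is given as a list of equally long rows; the function name is an illustrative choice and the code is not taken from TestCPU.

```python
def determinant(matrix):
    """Determinant by recursive expansion along the first row."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    total = 0
    for col in range(n):
        # Minor: drop the first row and the current column.
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        sign = -1 if col % 2 else 1   # (-1)^(1+j) with j counted from 1
        total += sign * matrix[0][col] * determinant(minor)
    return total

# The 3 x 3 example above: 9 * 4 - 8 * 8 + 7 * 4
print(determinant([[9, 8, 7], [4, 5, 6], [0, 1, 2]]))
```

For an n x n matrix this approach evaluates on the order of n! products, which is what makes a 9x9 matrix a usable benchmark workload.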

Fourth benchmark

more coming soon :-)