NET EXPRESS

sales@netexpresslabs.com, tech@netexpresslabs.com, Silicon Valley, California


Current CPU Prices

Prices updated 08/25/2003 - these prices may be out of date. Please contact us via e-mail for updates (sales@netexpresslabs.com).

We now have the new AMD Opteron (Hammer) 64-bit x86 CPUs shipping. 

P4 Xeon Prestonia
Dual Processor (SMP) Socket-604 CPUs with 533MHz FSB, with Hyperthreading

The P4 Xeon Prestonia CPUs feature a Pentium IV core based on a 0.13 micron process. These CPUs feature hyperthreading, which means that each physical CPU presents two logical CPUs. So when you boot up your new dual processor Prestonia system, your operating system (Linux, Windows, etc.) will see four CPUs, not two. These CPUs have extraordinary performance. The P4 Xeon Prestonia CPUs are dual processor (SMP) capable. The Prestonia based models are replacing the older Foster models and feature a larger 512K cache. These models require a special Xeon motherboard and a special power supply. They fit in a normal ATX chassis. Fans sold separately.

| Part # | Description | Current Price |
| --- | --- | --- |
| X2.0B | Intel Xeon 2.0GHz 512K Socket-604 with 533MHz FSB BX80532KE2000D | $279 |
| X2.4B | Intel Xeon 2.4GHz 512K Socket-604 with 533MHz FSB BX80532KE2400D | $299 |
| X2.6B | Intel Xeon 2.66GHz 512K Socket-604 with 533MHz FSB BX80532KE2667D | $358 |
| X2.8B | Intel Xeon 2.8GHz 512K Socket-604 with 533MHz FSB BX80532KE2800D | $432 |
| X3.066B | Intel Xeon 3.066GHz 512K Socket-604 with 533MHz FSB BX80532KE3066D | $572 |
| X3.066B-1MB | Intel Xeon 3.066GHz 1MB Socket-604 with 533MHz FSB | $784 |

P4 Xeon Prestonia
Dual Processor (SMP) Socket-603 CPUs with 400MHz FSB, with Hyperthreading

Same processors as above but with a 400MHz FSB.

| Part # | Description | Current Price |
| --- | --- | --- |
| X1.8-512 | Intel Xeon 1.8GHz 512K Socket-603 with 400MHz FSB BX80532KC1800D | $230 |
| X2.0-512 | Intel Xeon 2.0GHz 512K Socket-603 with 400MHz FSB BX80532KC2000D | $249 |
| X2.4-512 | Intel Xeon 2.4GHz 512K Socket-603 with 400MHz FSB BX80532KC2400D | $286 |

P4 "MP" Xeons for Quad CPU Motherboards, Socket-604

The P4 Xeon "MP" CPUs feature a Pentium IV core based on a 0.13 micron process and are designed for quad processing motherboards. These CPUs feature hyperthreading, which means that each physical CPU presents two logical CPUs. So when you boot up your new quad processor MP Xeon system, your operating system will see eight CPUs, not four. These CPUs have extraordinary performance. The MP models feature a 256K L2 cache and 512K or 1MB of L3 cache. These models require a special Xeon motherboard and a special power supply. They fit in a normal ATX chassis. Fans sold separately.

| Part # | Description | Current Price |
| --- | --- | --- |
| X2.0MP | Intel Xeon 2.0GHz 256K L2 cache and 1MB L3 cache 400MHz FSB BX80532KC2000E | $1328 |
| X2.0MP | Intel Xeon 2.0GHz 256K L2 cache and 2MB L3 cache 400MHz FSB BX80532KC2000F | $3932 |
| X2.5MP | Intel Xeon 2.5GHz 256K L2 cache and 1MB L3 cache 400MHz FSB BX80532KC2500E | $2125 |
| X2.8MP | Intel Xeon 2.8GHz 256K L2 cache and 2MB L3 cache 400MHz FSB BX80532KC2800F | $3985 |

Pentium IV Northwood Socket-478 CPUs with 512K Cache with 533MHz FSB

The Pentium IV Northwood CPUs feature a larger 512K cache and are based on a 0.13 micron process. They operate in single CPU Pentium IV motherboards and do not work in dual processor boards. Generally special P4 motherboards and power supplies are required. Fans sold separately. These come in two flavors, with either a 400MHz or a 533MHz Front Side Bus (FSB).

| Part # | Description | Current Price |
| --- | --- | --- |
| P478-2.4B | Intel Pentium IV 2.4GHz 512K 478 Pin 533MHz FSB BX80532PE2400D | $220 |
| P478-2.66B | Intel Pentium IV 2.66GHz 512K 478 Pin 533MHz FSB BX80532PE2667D | $258 |
| P478-2.8B | Intel Pentium IV 2.8GHz 512K 478 Pin 533MHz FSB BX80532PE2800D | $344 |
| P478-3 | Intel Pentium IV 3.06GHz 512K 478 Pin 533MHz FSB BX80532PE3066D | $479 |

Pentium IV Northwood Socket-478 CPUs with 512K Cache with 400MHz FSB

Same processors as above but with a 400MHz FSB.

| Part # | Description | Current Price |
| --- | --- | --- |
| P478-1.8-512 | Intel Pentium IV 1.8GHz 512K 478 Pin 400MHz FSB BX80532PC1800D | $167 |

Pentium IV Northwood Socket-478 CPUs with 512K Cache with *800MHz* FSB

Note this is the newest Pentium IV with an *800MHz* FSB.

| Part # | Description | Current Price |
| --- | --- | --- |
| P800-2.4 | Intel Pentium IV 2.4GHz 512K 478 Pin 800MHz FSB BX80532PG2600D | $239 |
| P800-2.6 | Intel Pentium IV 2.6GHz 512K 478 Pin 800MHz FSB BX80532PG2600D | $288 |
| P800-2.8 | Intel Pentium IV 2.8GHz 512K 478 Pin 800MHz FSB BX80532PG2800D | $366 |
| P800-3.0 | Intel Pentium IV 3.0GHz 512K 478 Pin 800MHz FSB BX80532PG3000D | $515 |
| P800-3.2 | Intel Pentium IV 3.2GHz 512K 478 Pin 800MHz FSB | $736 |

AMD Opteron (Code Named Hammer)

The AMD Opteron is a 64-bit x86 compatible CPU that is multi-processor capable. It currently supports a number of operating systems including SuSE and Red Hat Linux.

| Part # | Description | Current Price |
| --- | --- | --- |
| OSA840CC | AMD Opteron 840 1.4GHz with 1MB L2 Cache OSA840CCO5AI 4-way and 8-way | $849 |
| OSA842CC | AMD Opteron 842 1.6GHz with 1MB L2 Cache OSA842CCO5AI 4-way and 8-way | $1398 |
| OSA844CC | AMD Opteron 844 1.8GHz with 1MB L2 Cache OSA844CCO5AI 4-way and 8-way | $2249 |
| OSA240BOX | AMD Opteron 240 1.4GHz with 1MB L2 Cache 2-way | $348 |
| OSA242BOX | AMD Opteron 242 1.6GHz with 1MB L2 Cache 2-way | $561 |
| OSA244BOX | AMD Opteron 244 1.8GHz with 1MB L2 Cache 2-way | $789 |
| OSA246BOX | AMD Opteron 246 with 1MB L2 Cache 2-way | $920 |
| OSA140BOX | AMD Opteron 140 1.4GHz with 1MB L2 Cache 1-way | $319 |
| OSA142BOX | AMD Opteron 142 1.6GHz with 1MB L2 Cache 1-way | $399 |
| OSA144BOX | AMD Opteron 144 1.8GHz with 1MB L2 Cache 1-way | $554 |

AMD MP Palomino & T/Bred
(Athlon 4 or Athlon MP)
Fan sold separately

The Athlon MP is a CPU that is dual processor capable.

| Part # | Description | Current Price |
| --- | --- | --- |
| MP2000 | AMD MP-2000+ Palomino 1.67GHz Socket A 266MHz FSB AMP2000BOX | $157 |
| MP2200 | AMD MP-2200+ Palomino 1.8GHz Socket A 266MHz FSB AMSN2200BOX | $157 |
| MP2400 | AMD MP-2400+ T/Bred 2GHz Socket A 266MHz FSB AMSN2400BOX | $175 |
| MP2600 | AMD MP-2600+ T/Bred 2.13GHz Socket A 266MHz FSB AMSN2600BOX | $230 |
| MP2800 | AMD MP-2800+ 2.225GHz Socket A | $299 |

AMD XP Barton (Athlon 4 or Athlon XP)
Fan sold separately

The Athlon XP Barton is a single processor CPU. It has twice the L2 cache of the older T/Bred CPUs.

| Part # | Description | Current Price |
| --- | --- | --- |
| XP25B | AMD XP-2500+ Barton 1.83GHz Socket A 333MHz FSB 512K Cache AXDA2500BOX | $130 |
| XP28B | AMD XP-2800+ Barton 2.08GHz Socket A 333MHz FSB 512K Cache AXDA2800BOX | $242 |
| XP30B | AMD XP-3000+ Barton 2.17GHz Socket A 333MHz FSB 512K Cache AXDA3000BOX | $351 |
| XP32B | AMD XP-3200+ Barton 2.2GHz Socket A 400MHz FSB 512K Cache AXDA3000BOX | $560 |

AMD XP Thoroughbred (Athlon 4 or Athlon XP)
Fan sold separately

The Athlon XP Thoroughbred is a single processor CPU.

| Part # | Description | Current Price |
| --- | --- | --- |
| XP20 | AMD XP-2000+ T/Bred 1.67GHz Socket A 266MHz FSB 256K Cache AXDA2000BOX | $87 |
| XP21 | AMD XP-2100+ T/Bred 1.73GHz Socket A 266MHz FSB 256K Cache | $104 |
| XP22 | AMD XP-2200+ T/Bred 1.8GHz Socket A 266MHz FSB 256K Cache | $104 |
| XP24 | AMD XP-2400+ T/Bred 2GHz Socket A 266MHz FSB 256K Cache AXDA2400BOX | $111 |
| XP25 | AMD XP-2500+ T/Bred 1.83GHz Socket A 333MHz FSB 256K Cache | $130 |
| XP26 | AMD XP-2600+ T/Bred 2.133GHz Socket A 333MHz FSB 256K Cache | $137 |
| XP27 | AMD XP-2700+ T/Bred 2.16GHz Socket A 333MHz FSB 256K Cache | $183 |

AMD Duron
Fan sold separately

The Duron is a socketed Athlon with a full speed 64K on-chip cache. The Duron uses a 0.18 micron process and some use copper interconnects.

| Part # | Description | Current Price |
| --- | --- | --- |
| D1100 | AMD Duron 1.1GHz 0.18 micron, Socket A | $34 |
| D1200 | AMD Duron 1.2GHz 0.18 micron, Socket A | $44 |
| D1300 | AMD Duron 1.3GHz 0.18 micron, Socket A | $59 |

Intel P4 Celeron, Socket 478 CPUs

The latest Celerons are based on a P4 core with 128K cache and fit in a standard Socket 478 motherboard. They have a full 400MHz FSB. Fans sold separately.

| Part # | Description | Current Price |
| --- | --- | --- |
| CEL-1.7 | Intel Celeron 1.7GHz-128K MMX, Socket 478 BX80531P170G128 | $69 |
| CEL-1.8 | Intel Celeron 1.8GHz-128K MMX, Socket 478 BX80531P180G128 | $79 |
| CEL-2 | Intel Celeron 2GHz-128K MMX, Socket 478 BX80531P200G128 | $86 |
| CEL-2.1 | Intel Celeron 2.1GHz-128K MMX, Socket 478 BX80531P210G128 | $94 |
| CEL-2.2 | Intel Celeron 2.2GHz-128K MMX, Socket 478 BX80531P220G128 | $94 |
| CEL-2.4 | Intel Celeron 2.4GHz-128K MMX, Socket 478 BX80531P240G128 | $108 |
| CEL-2.6 | Intel Celeron 2.6GHz-128K MMX, Socket 478 BX80531P260G128 | $116 |

 

CPU Fan and Heat Sink

| Part # | Description | Current Price |
| --- | --- | --- |
| CPU-FAN | Intel (Retail) and most other brand CPU Heat Sink and Fan | $20 |

 

Technical Considerations

Quick Notes

We currently suggest the Pentium II processor for all new systems. The Pentium Pro CPU has been discontinued and there are severe shortages of replacement parts for Pentium Pro computers. Pentium Classic processors are discontinued. Pentium MMX processors are being phased out and will be discontinued in early 1998. Your best investment is therefore a Pentium II based system. If you cannot afford a Pentium II, you may wish to wait until the next official Intel CPU price cut (see below for details).

*Be aware that due to shortages in Pentium Pro and Pentium Classic processors many chip brokers are selling remarked processors. Used processors are also on the market. Buy with care. We guarantee all our processors are new and not remarked.

Please read => Pentium II systems require extra power and cooling. Therefore we only sell Pentium II systems with California PC Products 300W, 25A power supplies, extra 80mm cooling fans and specifically qualified chassis, such as the California PC Products 8-bay Steel Chassis. These designs prevent damage from gray-outs and overheating.

Many vendors are currently selling less expensive Pentium II systems in standard Pentium class chassis with standard power supplies. These are completely inadequate and inappropriate. You should not build a Pentium II system like a Pentium. We are getting *many* reports of system instability and permanent failures caused by overheating and inadequate power when Pentium class chassis and power supplies are used with the Pentium II. If you buy a Pentium II system, specify a qualified chassis, power supply and fans.

For more information on chassis and power supplies see our tower chassis page, rack mount chassis page and power supply page.

Intel Wholesale CPU Prices for 1997-1998

Traditionally, each quarter (every three months) Intel drops CPU prices by 2% to 48%. These price drops occur on the 28th day of January, April, July and October, and are frequently referred to by quarter as Q1, Q2, Q3 and Q4 respectively. For unknown reasons, the press generally calls the January 28th, April 28th, July 28th and October 28th price drops the February, May, August and November price drops.

In 1996, the November price drop was temporarily discontinued due to protests from Compaq that it interfered with Christmas sales. However, in March 1997, Intel reinstated the November price drop for that year.

In an effort to push Pentium II sales Intel announced that it would add even more price cut dates. The price of the 233MHz Pentium II CPU was cut early on December 28th, 1997, instead of the normal January 28th date. Moreover, additional price cuts were scheduled for April 15th, 1998 and July 7th, 1998, making for a very aggressive year. We have yet to hear from Intel if they plan to add additional price cuts later this year above and beyond the Q3 and Q4 price cuts.

In 1999 price cuts have been rescheduled. The Q1 '99 price cut will be on February 28th. The Q2 price cut will come on March 11. The specific dates of other price drops for 1999 have not been released yet.  

Following a price drop, street prices take from three to ten days to adjust because Intel brokers must amortize their existing stocks. Quick drops indicate that the brokers are holding a smaller stock of CPU's. Frequently prices actually rise just before a price drop because brokers and distributors stock fewer CPU's. Other times prices may erode a week or two before a price drop as overstocked brokers get nervous and sell off excess inventory. This leads to volatility in both the RAM and CPU markets around the time of a CPU price drop.

Below we have tabulated Intel's estimated pricing for 1997-1999. However, these plans are subject to change without notice. The price estimates reflect orders in quantities of at least 10,000 for first tier brokers. Internet prices will vary from a point or two below these prices up to twenty percent higher than these prices for most CPU's. However, due to shortages of some processors street prices may be significantly higher. In order to not encourage more demand for the Pentium Pro processors, Intel plans to keep current Pentium Pro wholesale prices close to their current values until they are effectively replaced by the Deschutes processors in early 1998.

Note that Pentium Pro and Pentium CPU's have been discontinued. 

Intel RoadMap for Pentium II, Celeron, Xeon, Tanner, Katmai, Merced for 1999
Source: CRW Sources
(Yellow = Not released yet, Aqua = discontinued)

| CPU Description | Q1 '99 Feb 28 | Q2 '99 April 11 | Q2 '99 May 16 | Q2 '99 July 18 | Q3 Sept '99 | Q4 '99 |
| --- | --- | --- | --- | --- | --- | --- |
| Celeron 300MHz, 128K, 66MHz bus, Slot 1 SEPP | 02/07/99 $60 | RIP | RIP | RIP | RIP | RIP |
| Celeron 333MHz, 128K, 66MHz bus, Slot 1 SEPP | 02/07/99 $70 | $71 April $61 July | RIP | RIP | RIP | RIP |
| Celeron 366MHz, 128K, 66MHz bus, Slot 1 PGA/SEPP | 02/07/99 $90/100 | $81 April $71 July | $73 | $73 | $73 | RIP |
| Celeron 400MHz, 128K, 66MHz bus, Slot 1 PGA/SEPP | 02/07/99 $130/140 | $103 | $103 | $93 | $83 | - |
| Celeron 433MHz, 128K, 66MHz bus, Slot 1 PGA/SEPP | 03/21/99 $169/175 | $143 | $143 | $133 | $113 | - |
| Celeron 466MHz, 128K, 66MHz bus, Slot 1 PGA/SEPP | N/R | $169 | $169 | $157 | $147 | - |
| Celeron 500MHz, 128K, 66MHz bus, Slot 1 PGA/SEPP | N/R | N/R | N/R | N/R | $185 | - |
| Pentium II 350MHz 512K (Deschutes) MMX 100MHz Bus, Slot 1 | $192 | $163 | $163 | RIP | RIP | RIP |
| Pentium II 400MHz 512K (Deschutes) MMX 100MHz Bus, Slot 1 | $284 | $234 | $193 | $183 | $163 | RIP |
| Pentium II 450MHz 512K (Deschutes) MMX 100MHz Bus, Slot 1 | $475 | $396 | $268 | $230 | $213 | RIP |
| Pentium III 450MHz 512K (Katmai) MMX2 100MHz Bus, Slot 1 | $530 | $411 | $268 | $230 | $213 | - |
| Pentium III 500MHz 512K (Katmai) MMX2 100MHz Bus, Slot 1 | $764 | $625 | $482 | $423 | $299 | - |
| Pentium III 533MHz 256K Integrated Cache (Katmai) MMX2 133MHz Bus, Slot 1 | N/R | N/R | N/R | N/R | $415 | - |
| Pentium III 550MHz 512K (Katmai) MMX2 100MHz Bus, Slot 1 | N/R | N/R | $730 | $658 | $520 | - |
| Pentium III 600MHz 256K Integrated Cache (Katmai) MMX2 133MHz Bus, Slot 1 | N/R | N/R | N/R | N/R | $761 | - |
| Xeon 450MHz 512K (Deschutes) MMX 100MHz Bus, Slot 2 | $824 | RIP | RIP | RIP | RIP | RIP |
| Xeon 450MHz 1MB (Deschutes) MMX 100MHz Bus, Slot 2 | $1980 | RIP | RIP | RIP | RIP | RIP |
| Xeon 450MHz 2MB (Deschutes) MMX 100MHz Bus, Slot 2 | $3692 | RIP | RIP | RIP | RIP | RIP |
| Xeon 500MHz 512K (Tanner) MMX2 100MHz Bus, Slot 2 | $931 | $824 | $824 | $824 | RIP | RIP |
| Xeon 500MHz 1MB (Tanner) MMX2 100MHz Bus, Slot 2 | N/R | $1980 | $1980 | $1980 | RIP | RIP |
| Xeon 500MHz 2MB (Tanner) MMX2 100MHz Bus, Slot 2 | N/R | $3692 | $3692 | $3692 | RIP | RIP |
| Xeon 550MHz 512K (Tanner) MMX2 100MHz Bus, Slot 2 | N/R | ~$1200 | $1200 | $1200 | $1200 | - |
| Xeon 550MHz 1MB (Tanner) MMX2 Slot 2 | N/R | N/R | N/R | N/R | $1980 | - |
| Xeon 550MHz 2MB (Tanner) MMX2 Slot 2 | N/R | N/R | N/R | N/R | $3692 | - |
| Xeon 600MHz 256K Integrated Cache (Coppermine) MMX2 Slot 2 and Slot M | N/R | N/R | N/R | N/R | $829 | - |
| Xeon 667MHz 256K Integrated Cache (Coppermine) MMX2 Slot 2 and Slot M | N/R | N/R | N/R | N/R | $1040 | - |
| Celeron 300MHz MMO Notebook | $187 | $106 | - | - | RIP | RIP |
| Pentium II (r) (Deschutes) 300MHz MMO Notebook | $321 | $187 | - | - | RIP | RIP |
| Pentium II (r) (Deschutes) 333MHz MMO Notebook | $465 | $316 | - | - | - | - |
| Pentium II (r) (Deschutes) 366MHz MMO Notebook | $696 | $530 | - | - | - | - |
| Pentium III (r) Geyserville 450MHz 256K Integrated Cache MMO Notebook 1.3v | N/R | N/R | $341 | $341 | $341 | - |
| Pentium III (r) Geyserville 500MHz 256K Integrated Cache MMO Notebook 1.3v | N/R | N/R | $520 | $520 | $520 | - |
| Pentium III (r) Geyserville 600MHz 256K Integrated Cache MMO Notebook 1.6v | N/R | N/R | $761 | $761 | $761 | - |
| Merced, Slot M | N/R | N/R | N/R | N/R | N/R | - |

 

Intel Wholesale Pentium II, Celeron and Xeon Prices for 1997 and 1998
(Pricing in 10,000 unit quantities)
Source: CRW Sources
(Yellow = Not released yet, Aqua = discontinued)

| CPU Description | Q2 '97 March 28 | Q3 '97 July 28 | Q4 '97 Oct 28 | Q1 '98 Jan 28 | Q2 '98 April 15 | Q2B '98 June 7/29 | Q3 '98 July 28 | Q3B '98 Sept 13 | Q4 '98 Oct 25 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Celeron 266MHz-0K, 66MHz bus, Slot 1 | - | - | - | - | $160 | $110 | $86 | RIP | RIP |
| Celeron 300MHz-0K, 66MHz bus, Slot 1 | - | - | - | - | - | $160 | $112 | $112 | RIP |
| Celeron (Mendocino) 333MHz-128K, 66MHz bus, Slot 1 | - | - | - | - | - | - | - | $210 | $155 |
| Pentium II (r) 233MHz-512K, 66MHz bus, Slot 1 | $654 | $575 | $401 | $260 Dec 28 | $195 | $195 | RIP | RIP | RIP |
| Pentium II (r) 266MHz-512K, 66MHz bus, Slot 1 | $790 | $712 | $530 | $370 | $250 | $200 | $160 | RIP | RIP |
| Pentium II (r) 300MHz-512K, 66MHz bus, Slot 1 | $1980 | $830 | $738 | $520 | $370 | $300 | $210 | $210 | RIP |
| Pentium II (r) (Deschutes) 333MHz-512K, 66MHz bus, Slot 1 | - | - | - | $710 ($570 on March 16th) | $480 | $400 | $315 | $280 | $181 |
| Pentium II (r) (Deschutes) 350MHz-512K, 100MHz Bus, Slot 1 | - | - | - | - | $610 | $510 | $420 | $370 | $202 |
| Pentium II (r) (Deschutes) 400MHz-512K, 100MHz Bus, Slot 1 | - | - | - | - | $810 | $710 | $585 | $570 | $353 |
| Pentium II (r) (Deschutes) 450MHz-512K, 100MHz Bus, Slot 1 | - | - | - | - | - | - | $655 | $655 | $562 |
| Xeon (Deschutes) 400MHz 512KB, 100MHz Bus, Slot 2 | - | - | - | - | - | $1124 | $1059 | $1059 | $825 |
| Xeon (Deschutes) 400MHz-1MB, 100MHz Bus, Slot 2 | - | - | - | - | - | $2836 | $2675 | $2675 | $1990 |
| Xeon (Deschutes) 450MHz 512KB, 100MHz Bus, Slot 2 | - | - | - | - | - | - | - | - | $825 |
| Xeon (Deschutes) 450MHz 1MB, 100MHz Bus, Slot 2 | - | - | - | - | - | - | - | 1999 | $825 |
| Xeon (Deschutes) 450MHz 2MB, 100MHz Bus, Slot 2 | - | - | - | - | - | - | - | 1999 | $1990 |
| Pentium II (r) (Deschutes) 233MHz MMO Notebook | - | - | - | $466 LX $542 BX | $390 | $390 | $233 | $205 | RIP |
| Pentium II (r) (Deschutes) 266MHz MMO Notebook | - | - | - | $696 LX $772 BX | $630 | $630 | $444 | $371 | $209 |
| Pentium II (r) (Deschutes) 300MHz MMO Notebook | - | - | - | - | - | - | $637 | $371 | $371 |

 

Intel Wholesale Pentium and Pentium Pro CPU Prices for 1997 and 1998
For Historical Purposes - RIP

(Pricing in 10,000 unit quantities)
Source: CRW Sources
(Yellow = Not released yet, Aqua = discontinued)

| CPU Description | Q4 '96 Nov Price | Q1 '97 Feb Price | Q2 '97 May Price | Q3 '97 August Price | Q4 '97 Nov Price | Q1 '98 Feb Price | Q2 '98 April 15 to June 29 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Pentium (r) 133MHz | $198 | $131 | RIP | RIP | RIP | RIP | RIP |
| Pentium (r) 133MHz For Notebooks | $238 | $174 | $162 | RIP | RIP | RIP | RIP |
| Pentium (r) Classic 150MHz | $280 | $158 | $147 | $89 | $78 | RIP | RIP |
| Pentium (r) Classic 166MHz | $396 | $289 | $205 | $102 | $80 | RIP | RIP |
| Pentium (r) 166MHz MMX | $400 | $349 | $265 | $138 | $110 | $95 | RIP |
| Pentium (r) 166MHz MMX For Notebooks | $550 | $539 | $487 | $240 | $174 | $162 | RIP |
| Pentium (r) Classic 200MHz | $550 | $488 | $252 | $119 | $95 | RIP | RIP |
| Pentium (r) MMX 200MHz | $550 | $528 | $482 | $247 | $209 | $123 | $95 |
| Pentium (r) MMX 233MHz | N/R | N/R | $583 | $367 | $290 | $193 | $140 |
| Pentium Pro (r) 180MHz-256K | $410 | $410 | $390 | $380 | RIP | RIP | RIP |
| Pentium Pro (r) 200MHz-256K | $515 | $515 | $515 | $478 | RIP | RIP | RIP |
| Pentium Pro (r) 200MHz-512K | $1014 | $1014 | $1014 | $1014 | RIP | RIP | RIP |
| Pentium Pro (r) 200MHz-1MB | N/R | $2650 | $2650 | $2650 | $2650 | RIP | RIP |

 

CPU Functional Design

This is an introductory document regarding the nature of Modern CPU's

Introduction to the Von-Neumann computer
and the Von-Neumann bottleneck

The design of modern computers is based on the Von-Neumann computer. The Von-Neumann design separates the CPU, main memory and Input/Output into three discrete components connected by two buses, the data bus and the address bus. The Input/Output component includes the console or display, the keyboard, mouse as well as mass storage devices such as disk drives and floppy drives. Data must move from slower devices such as a disk drive, into main memory. The CPU reads data either from main memory or from a memory cache.

In the 1970's, when slower scalar CPU's were state of the art, the main memory could feed data to the CPU as fast as the CPU could accept it for processing. But now CPU's have become much faster for reasons we will discuss below. As a consequence the main challenge for modern engineers is to find ways to feed data and instructions to the CPU fast enough that the CPU does not waste time in an idle state. Hence the current focus on things like faster RAM (like SDRAM and Rambus), faster data bus speeds (moving from 66MHz to 100MHz), faster and larger caches, and most importantly new ingenious technologies incorporated into the CPU and its associated code compiler as discussed below. This lag of memory access speeds behind CPU speeds is sometimes referred to as the Von-Neumann bottleneck.
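To put rough numbers on why a cache eases the bottleneck, the sketch below uses the standard textbook average access time formula (hit time plus miss rate times miss penalty). The timings and the miss rate are assumed values chosen only for illustration, not measurements of any CPU discussed here.

```python
def average_access_time_ns(hit_time_ns: float, miss_rate: float,
                           miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


# Assumed figures: 60 ns main memory, 5 ns cache, 5% of accesses miss the cache.
no_cache = average_access_time_ns(hit_time_ns=60.0, miss_rate=0.0, miss_penalty_ns=0.0)
with_cache = average_access_time_ns(hit_time_ns=5.0, miss_rate=0.05, miss_penalty_ns=60.0)

print(f"main memory only: {no_cache:.1f} ns per access")
print(f"with cache:       {with_cache:.1f} ns per access")
```

Under these assumptions the CPU waits far less per access on average, which is why larger and faster caches are one of the main answers to the bottleneck.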

What is a CPU and how is it manufactured?
(VHSIC, VLSI, CMOS and BiCMOS)

A little history will help our understanding of modern CPU's. In 1948, Bell Laboratories invented a solid state digital switch which soon replaced vacuum tubes used in very early computers. This solid state digital switch is known as a transistor. A transistor conducts variable amounts of electricity depending on the input current that is applied to it. It can act as a switch which can be switched off or on, symbolically storing a 0 or 1 respectively. This property is due to the use of a semiconductor material in the transistor, like doped Silicon. Solid state merely means that it has no moving parts. In 1959 Texas Instruments demonstrated that many transistors can be manufactured on a single surface making what is collectively called an integrated circuit (IC). Soon thousands of small components were integrated onto a single miniature wafer in a process called large scale integration (LSI). Many discrete LSI's were used in a single computer. Eventually millions of transistors were implemented in chips categorized as very large scale integrated circuits (VLSI). In 1971, Intel, then a memory manufacturer, made the 4004 CPU. It was the first microprocessor and integrated all the logic chips into a single 4-bit CPU.

CPU's can be implemented with very fast technology categorized as very high speed integrated circuits (VHSIC), which includes emitter-coupled logic (ECL) and GaAs based designs used in vector computers. However, this technology is very expensive to implement. Instead most modern CPU's, such as the Intel x86 processors and the DEC Alpha, use a VLSI design implemented in CMOS or BiCMOS. While these designs offer slower clock rates, they are inexpensive to produce, and hence larger numbers of transistors can be utilized to add more intelligence, greater parallel performance and pipelining (see below).

CPU's are manufactured in a complex process. Silicon is extracted from Silicon Dioxide and grown into giant crystals 8 inches in diameter. Thousands of processors can be made from a single crystal. These crystals are cut with a saw into hundreds of wafers 8 inches across and less than one millimeter thick. The wafers are carefully polished and chemically treated with a photosensitive material. Meanwhile a giant 3D map of the processor design is created. This map details the placement of each transistor and all devices and connecting conductors. Specific software packages are used which logically take this 3D map and convert it into discrete layers called masks. This is just like representing a building with a series of floor plans, each representing a story of the building. A mask is like a photographic negative. In a process called photolithography light is passed through the mask and projected onto the silicon just like light in an enlarger is passed through a photographic negative to create an image on a print substrate. The photosensitive material reacts to the light. Exposed material may be etched away with solvents and unexposed material may remain. The surface may be coated with new materials. The process is repeated for each of the series of masks until all layers of the processor based on the 3D map have been laid down. There are typically eight or more layers. Finally the 8 inch wafer is cut into separate processors. Each processor is about a third of an inch across. The final processor is packaged in a ceramic material or Plexiglas.
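The wafer and die sizes quoted above allow a quick back-of-the-envelope estimate of how many processors one wafer can hold. The sketch below simply divides wafer area by die area; it assumes square dies and ignores edge loss, saw lanes and defective dies, so it is only an upper bound.

```python
import math


def gross_dies_upper_bound(wafer_diameter_in: float, die_side_in: float) -> int:
    """Upper bound on dies per wafer: wafer area divided by die area.

    Assumes square dies and ignores edge loss, saw lanes and yield.
    """
    wafer_area = math.pi * (wafer_diameter_in / 2) ** 2
    die_area = die_side_in ** 2
    return math.floor(wafer_area / die_area)


# 8 inch wafer, die about a third of an inch across (figures from the text).
print(gross_dies_upper_bound(8.0, 1 / 3))
```

With hundreds of wafers cut from a single crystal, a few hundred dies per wafer is consistent with thousands of processors per crystal.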

The etching and layering process used in most CPU's is called the complementary metal oxide semiconductor (CMOS) process. All Intel CPU's use the CMOS process except the very early 8088's and 8086's, which used an n-channel metal oxide semiconductor (NMOS) process, and the Pentium classic, Pentium OverDrive and Pentium Pro CPU's, which used a bipolar CMOS (BiCMOS) process.

The detail of the channels etched into the wafers is as precise as 0.25 microns. Each generation of chips uses a smaller micron process packing more channels and hence more transistors into a smaller square area. Indeed, if each transistor is shrunk to one half the size on each side, then four transistors can fit in the same square area. Thus cutting the size in half will increase the number of transistors by a factor of four in the same square area of silicon.
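The scaling argument can be written as a one-line calculation: the number of transistors per unit area grows with the square of the linear shrink. The function name below is made up for this sketch, and the model ignores everything except geometry.

```python
def density_gain(old_feature_um: float, new_feature_um: float) -> float:
    """Factor by which transistors per unit area increase after a linear shrink."""
    return (old_feature_um / new_feature_um) ** 2


# Halving the feature size quadruples the transistor count in the same area.
print(density_gain(0.5, 0.25))

# Moving from a 0.25 micron to a 0.13 micron process.
print(round(density_gain(0.25, 0.13), 2))
```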

CPU Architecture

Bus Unit:
The Bus Unit is the place where instructions flow in and out of the microprocessor from the computer's main memory.

Instruction Cache:
The Instruction Cache is a warehouse of instructions right on the chip, so that the microprocessor doesn't have to stop and look in the computer's main memory for instructions. This quick access makes processing fast as instructions are 'fetched' to the Prefetch Unit where they are put in the proper order for processing.

Prefetch Unit:
The Prefetch Unit decides when to order data and instructions from the Instruction Cache or the computer's main memory based on commands or the task at hand. When the instructions come in, the most important task for the Prefetch Unit is to be sure all the instructions are lined up correctly to send off to the Decode Unit.

Decode Unit:
The Decode Unit does just that - it decodes or translates complex machine language instructions into a simple format understood by the Arithmetic Logic Unit (ALU) and the Registers. This makes processing more efficient.

Control Unit:
The Control Unit is one of the most important parts of the microprocessor because it is in charge of the entire process. Based on instructions from the Decode Unit, it creates control signals that tell the Arithmetic Logic Unit (ALU) and the Registers how to operate, what to operate on, and what to do with the result. The Control Unit makes sure everything happens in the right place at the right time.

Arithmetic Logic Unit (ALU):
The ALU is the last stage of processing in the chip. The ALU is the smart part of the chip that performs commands like adding, subtracting, multiplying and dividing. It also knows how to read logic commands like OR, AND, or NOT. Messages from the Control Unit tell the ALU what it should do and then it takes data from its close companion, the Registers, to complete the task. This is really where 2 finally gets added to 3 in our example.

Registers:
The Registers are a mini-storage area for data used by the Arithmetic Logic Unit (ALU) to complete the tasks the Control Unit has requested. The data can come from the data cache, main memory or the control unit and are all stored at special locations within the Registers. This makes retrieval for the ALU quick and efficient.

Data Cache:
The Data Cache works very closely with the "processing partners," the ALU and Registers, and the Decode Unit. This is where specially labeled data from the Decode Unit are stored for later use by the ALU and where final results are prepared for distribution to different parts of the computer.

Main Memory:
This is the big store house of data located within the main computer outside of the microprocessor. At times the Main Memory may send in data or instructions for the Prefetch Unit, which often get stored at an address in the Instruction Cache to be used later.
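The way these units cooperate can be sketched with a toy model. The instruction format, register names and the ToyCPU class below are invented for illustration and do not correspond to any real instruction set; the sketch only shows instructions being taken in order, decoded into fields, routed by control logic, and finally executed by an ALU working on registers.

```python
from dataclasses import dataclass, field

# Hypothetical instruction format for this sketch: (opcode, destination, operand).
PROGRAM = [
    ("LOAD", "r0", 2),             # place the constant 2 in register r0
    ("LOAD", "r1", 3),             # place the constant 3 in register r1
    ("ADD", "r2", ("r0", "r1")),   # r2 = r0 + r1
]

# The ALU: arithmetic and logic operations on two values.
ALU_OPS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
}


@dataclass
class ToyCPU:
    registers: dict = field(default_factory=dict)

    def run(self, program):
        for instruction in program:                # prefetch: take instructions in order
            opcode, dest, operand = instruction    # decode: split into simple fields
            if opcode == "LOAD":                   # control: route a value into a register
                self.registers[dest] = operand
            else:                                  # control: send two registers to the ALU
                left, right = (self.registers[name] for name in operand)
                self.registers[dest] = ALU_OPS[opcode](left, right)
        return self.registers


result = ToyCPU().run(PROGRAM)
print(result["r2"])  # 2 + 3
```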

Pipelining and instruction reordering

Pipelining is an architectural feature analogous to an assembly line. Data and instructions can be passed through a multi-staged pipe. Each stage of the pipe performs a different task. In this way several pieces of data and their instructions can travel down the pipe like cars on an assembly line. Thus many sets of operations can be run in parallel. As one operation is coming to completion with results being returned, another operation is in the middle of processing and yet another operation is just getting started. Pipelining can increase performance three fold.

Input or output registers may be added to the various stages in the pipeline so that one set of registers can start to read new data while current data can be written. The register loading slows down the execution a little but this is more than offset by the ability to overlap the processing at each stage in the pipeline.
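The assembly line analogy can be put into numbers with an idealized model. The sketch below assumes every stage takes exactly one cycle and that the pipe never stalls, which real processors do not achieve; the three stage depth is likewise an assumption picked to match the rough threefold figure above.

```python
def cycles_unpipelined(instructions: int, stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages


def cycles_pipelined(instructions: int, stages: int) -> int:
    """The first instruction fills the pipe; after that one completes per cycle."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


def speedup(instructions: int, stages: int) -> float:
    return cycles_unpipelined(instructions, stages) / cycles_pipelined(instructions, stages)


# 100 instructions through an assumed 3-stage pipe.
print(cycles_unpipelined(100, 3), cycles_pipelined(100, 3))
print(round(speedup(100, 3), 2))
```

In this model the speedup approaches the number of stages as the instruction stream gets longer, so a three stage pipe tops out just under three fold.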

Instruction reordering (or out-of-order execution as it is sometimes called) can substantially increase performance when combined with pipelining and a load-and-store architecture (see below). For example, instructions may be reordered so that commands to fetch data from main memory and store these data in registers are executed long before the subsequent instructions that require these data. In this way all the data required for several commands can be loaded into the CPU's registers ahead of time. The commands that use the data can then be executed consecutively without stalling while waiting for data to be loaded. Instruction reordering along with pipelining can increase performance five fold.
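The benefit of issuing loads early can be seen in a small simulation. The machine modeled below is hypothetical: it issues one instruction per cycle, a load takes an assumed four cycles to deliver its data, and an instruction that uses a register stalls until that register's data has arrived.

```python
LOAD_LATENCY = 4  # assumed cycles between issuing a load and its data being ready


def run_cycles(program, load_latency=LOAD_LATENCY):
    """Count cycles for a list of (kind, register) pairs, kind being 'load' or 'use'."""
    ready_at = {}  # register -> first cycle at which its data may be used
    cycle = 0      # cycle in which the next instruction may issue
    for kind, reg in program:
        if kind == "load":
            ready_at[reg] = cycle + load_latency
        else:
            # Stall until the data for this register has arrived.
            cycle = max(cycle, ready_at.get(reg, 0))
        cycle += 1
    return cycle


program_order = [("load", "a"), ("use", "a"), ("load", "b"), ("use", "b")]
reordered = [("load", "a"), ("load", "b"), ("use", "a"), ("use", "b")]

print(run_cycles(program_order))  # each use waits out a full load
print(run_cycles(reordered))      # the second load overlaps the first
```

Both orderings compute the same thing, but hoisting the second load lets its latency overlap with the first, so fewer cycles are lost to stalls.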

Scalar and Superscalar CPU's

A simple processor is called scalar when it contains a single functional unit for processing instructions. This single functional unit can be a single stage device or it may be pipelined as discussed above. A superscalar processor has more than one functional unit for executing commands, and these units can run in parallel. Superscalar processors are sometimes referred to as being implicitly parallel. This is like having two CPU's running simultaneously as in symmetric multiprocessor (SMP) systems, which are said to be explicitly parallel. Superscalar pipelined architectures implement more than one pipelined functional unit. This offers the best of both worlds.

Speculative execution and branch prediction

Branch prediction (also called jump optimization) is useful in situations where the execution of code may take one of two or more mutually exclusive paths. For example, "IF" statements can be used to choose between two alternate possibilities referred to as branches. Branch prediction uses a special algorithm to make the best guess as to which branch will be selected. Then that branch is executed before the selection is even made (This is referred to as speculative execution). This can be very useful in superscalar processors that can use one functional unit to process the main block of code while another functional unit speculatively executes the predicted subsequent branch.
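The text does not say which prediction algorithm a given CPU uses. As one common textbook example, the sketch below implements a two-bit saturating counter: the predictor only changes its guess after being wrong twice in a row. The branch history is made up to resemble a loop that runs nine times and then exits.

```python
def count_correct_predictions(outcomes, initial_state=2):
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""
    state = initial_state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move toward 3 when the branch is taken, toward 0 when it is not.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct


# A loop branch taken nine times and then falling through, executed three times.
history = ([True] * 9 + [False]) * 3
hits = count_correct_predictions(history)
print(f"{hits} of {len(history)} branches predicted correctly")
```

Only the loop exits are mispredicted in this example, so speculative execution along the predicted path would pay off most of the time.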

RISC, CISC, Registers and Load-and-Store

Intel's x86 family, which dates back to 1978, is based on a complex instruction set computer (CISC) design. CISC designs typically include well over 200 instructions and more than 10 indirect addressing modes. Instruction size is heterogeneous, ranging from 8 to 108 bits in x86 CPU's. Early model CISC based CPU's tend to take more than one cycle to accomplish an instruction. CISC designs are also characterized by support for high level languages and the ability to perform operations on data in the main memory.

In the mid-1980s an alternate design called the reduced instruction set computer (RISC) became common in processors like the HP PA-RISC chips (1986). The philosophy of RISC was to strip down the functionality of the CPU and focus on speeding up core instructions. Superfluous instructions are removed from the CPU, resulting in a smaller instruction set (frequently under 100 instructions), fewer than five addressing modes and no microprogramming. Instruction length is generally uniform, 32 bits in many systems. Moreover, all operations in RISC chips are performed on registers in a load-and-store architecture.

Registers are small storage cells in the CPU that can be used to hold data. Registers are the fastest form of storage, much faster than cache memory or RAM. However, register space is valuable because only a few registers are available. Many modern chips provide 32 floating point registers and 32 fixed point registers. Once data is loaded into the registers, operations can be executed on it rapidly.

Part of the brilliance of RISC is the decoupling of data access functions from data manipulation functions (execution of instructions). Data can be preloaded into the registers before an instruction even requests the data. This allows for greater optimization via out-of-order execution resulting in parallel execution. This can greatly increase the number of instructions per second.

However, the lack of available instructions in RISC processors can take its toll on performance, and in most cases the line between RISC and CISC processors is not clearly defined. Modern CISC processors are called RISC-like because they have implemented many optimizations used in RISC processors at their core. Conversely, most so-called RISC processors are far from the RISC ideal and incorporate many instructions, like a CISC design.

The Merced (IA-64) EPIC and LIW

In 1999 Intel and HP will jointly release a new type of 64-bit processor called the Merced (also known as the IA-64) based on a novel explicitly parallel instruction computing (EPIC) design. EPIC is an alternative to both CISC and RISC architectures. This new architecture is a synthesis and extension of both the Intel x86 family and the HP PA-RISC family, and it promises to be backward compatible with software from both. It also includes many new technologies.

The Merced is not planned to be a replacement for the 32-bit x86 family. The Merced will be a very high end and very expensive processor aimed at the high end server market. Initially the Merced family will include CPUs with tens of millions of transistors. Later models will eventually include hundreds of millions of transistors. It will probably be based on a new 0.18 micron process. It will include 128 fixed point general purpose registers and 128 floating point registers. In addition, 64 new predicate registers will be added. The function of the predicate registers will be explained below.

With the advent of inexpensive CMOS processors has come the promise of hundreds of millions of transistors per processor. The question is how to use all of that power. As we've seen so far, superscalar pipelined CPUs have increased performance more than fivefold over scalar models. However, to utilize several powerful pipelines in parallel the processor must allocate tremendous resources to the task of optimizing code on the fly. It must seek out hidden parallelism and redirect chunks of code to each pipeline through branch prediction and speculative execution, out-of-order execution and caching of data. This already ties up a tremendous amount of processor logic. To make use of the even larger number of transistors available in the Merced, one cannot merely add more pipelines. Doing so would cause the CPU to spend tremendous amounts of time optimizing the code on the fly, and it would create a processor that would far outperform even the fastest data bus. The CPU would become data starved.

To solve these issues Intel and HP have implemented a new architecture called EPIC based on long instruction words (LIW). This new technology will offload much of the task of optimizing code to the compiler. The compiler will take over many of the tasks related to finding hidden parallelism in code. The compiler will create binaries based on 128-bit EPIC bundles which include a template, predicate definition, data register definitions and three instructions. The information stored in the EPIC bundle will instruct the processor how to process the instructions for optimum performance.

A CPU is intrinsically linked to its associated compiler and software libraries. The processor, its compiler and its libraries are a cohesive unit that must be carefully balanced. Modern code is written in high level procedural languages like Java, C and C++. These languages are then compiled into machine language and combined with libraries to form a binary that is optimized for a specific processor. The compiler and libraries must be carefully designed to complement the CPU. An optimization function may be integrated into the CPU or the compiler. What Intel and HP have opted to do is move much of the task of optimization out of the processor and into the compiler. This means the CPU will have less work to do on the fly. It also means that compilation will take much longer, code will grow bigger and code will have to be recompiled to be optimized for different versions of the Merced. In more complex cases both the CPU and the compiler work together synergistically to achieve remarkable results.

EPIC bundles are created by the compiler at compile time, so the burden of optimization is no longer on the processor. Moreover, all bundles are of equal size, like the instructions in many RISC instruction sets. EPIC compilers bundle three 40-bit instructions flanked by one 8-bit template into one larger 128-bit EPIC (LIW) bundle. The flanking 8-bit template tells the CPU which instructions can be executed in parallel. Each 40-bit instruction includes the 6-bit address of one of the 64 predicate registers and three 7-bit addresses for three of the 128 floating point or fixed point registers.
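The arithmetic of the layout described above (3 x 40 + 8 = 128 bits, with 6 + 3 x 7 = 27 bits of each instruction used for register addresses) can be checked with a small packing sketch. The field widths come from the description above; the ordering of the fields, the treatment of the remaining 13 bits as an opcode and all function names are assumptions made for illustration only.

```python
TEMPLATE_BITS = 8
SLOT_BITS = 40
SLOTS_PER_BUNDLE = 3


def encode_instruction(opcode, predicate, r1, r2, r3):
    """Pack one 40-bit instruction from its fields (field order is assumed)."""
    # (value, width) pairs, lowest bits first. The 13-bit opcode is simply
    # whatever is left after the 6-bit predicate and three 7-bit registers.
    fields = [(r3, 7), (r2, 7), (r1, 7), (predicate, 6), (opcode, 13)]
    word = 0
    shift = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit in %d bits" % width)
        word |= value << shift
        shift += width
    return word


def pack_bundle(template, slots):
    """Combine an 8-bit template and three 40-bit instructions into 128 bits."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template does not fit in 8 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three instructions")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("instruction does not fit in 40 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into its template and three instructions."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & ((1 << SLOT_BITS) - 1)
        for i in range(SLOTS_PER_BUNDLE)
    ]
    return template, slots


# Hypothetical instruction: opcode 5, predicate register 3, registers 10, 11, 12.
insn = encode_instruction(opcode=5, predicate=3, r1=10, r2=11, r3=12)
bundle = pack_bundle(0x01, [insn, insn, insn])
print(bundle.bit_length() <= 128)  # prints True
```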

When a branch is encountered in the code, the compiler assigns one predicate to each instruction on one side of the branch and another predicate to each instruction on the other side of the branch. When the processor executes the code it speculatively executes both branches. When the results are returned it checks the respective predicate registers for a value of 1 or 0. The results of the branch whose predicate is 1 are kept, and the results of the alternate branch are discarded. Note that this is very different from current processors, which use branch prediction followed by speculative execution of only one of the branches. The Merced will speculatively execute all branches and then use the predicate registers to determine which results should be eliminated. If for some reason the Merced cannot speculatively execute both branches, its behavior will default back to that of a normal Pentium, with branch prediction followed by speculative execution of only one of the branches. This may sound like a lot of extra work. However, the Merced has pipelines to spare and in actual tests performance has been doubled using this method.
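The predication scheme described above can be modeled in a few lines. This is a conceptual sketch only: real hardware does this with predicate registers and parallel pipelines, while the hypothetical function below simply computes both sides of an IF and keeps the result whose predicate is 1.

```python
def predicated_select(condition, then_fn, else_fn, x):
    """Evaluate both sides of a branch, then keep the side whose predicate is 1."""
    # Both paths are executed regardless of the condition, standing in for
    # the two sets of instructions running in separate pipelines.
    then_result = then_fn(x)
    else_result = else_fn(x)

    # The comparison sets a complementary pair of predicates.
    p_then = 1 if condition(x) else 0
    p_else = 1 - p_then

    # Results guarded by a predicate of 0 are discarded.
    guarded = [(p_then, then_result), (p_else, else_result)]
    kept = [result for predicate, result in guarded if predicate == 1]
    return kept[0]


# Absolute value written as an IF with two mutually exclusive paths.
print(predicated_select(lambda v: v < 0, lambda v: -v, lambda v: v, -5))  # prints 5
```

Because exactly one predicate of the pair is 1, exactly one result survives and no jump is needed to choose between the paths.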

Moreover, the instructions are reordered by the compiler. If a branch is encountered, the first instruction of each of the two branches may be packaged into the same EPIC bundle. The next instruction from each branch is put into the next bundle and so forth. (Actually there is some overlapping, as three instructions are included in each bundle.) The processor can then open the bundles and send the instructions from each branch into separate pipelines to run in parallel.

The Merced also utilizes out-of-order execution for speculative loading. With speculative loading the compiler arranges code so that instructions to fetch data are speculatively executed long before the instructions that use the data. It then inserts a speculative check instruction that verifies the data has arrived before the code actually attempts to use it. This helps to overcome the von Neumann bottleneck.

See also:

http://patent.womplex.ibm.com/details?patent_number=5638525

http://www.intel.com/pressroom/archive/releases/sp100997.HTM

MIPS and Clock Rate

Processor performance is generally given by a clock rate. For example, a 300MHz Pentium II is faster than a 266MHz Pentium II. Each megahertz (MHz) represents one million clock cycles per second. However, we must also account for how many instructions a given processor can process per cycle. If we multiply the instructions per cycle by the clock rate in MHz we get the number of Millions of Instructions per Second (MIPS). The MIPS value is a more accurate measurement for comparing CPUs of different architectures. The term cycles per instruction (CPI) is also used; it is the inverse of the instructions per cycle.
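As a worked example of the formula above, the sketch below compares two hypothetical chips. The instructions-per-cycle figures are invented purely to show the arithmetic and are not measurements of any real processor.

```python
def mips(instructions_per_cycle, clock_mhz):
    """Millions of instructions per second = instructions per cycle x clock rate in MHz."""
    return instructions_per_cycle * clock_mhz


def cpi(instructions_per_cycle):
    """Cycles per instruction is the inverse of instructions per cycle."""
    return 1.0 / instructions_per_cycle


# Hypothetical chip A: 300 MHz, averaging 1 instruction per cycle.
# Hypothetical chip B: 200 MHz, averaging 2 instructions per cycle.
print(mips(1.0, 300))  # prints 300.0
print(mips(2.0, 200))  # prints 400.0
print(cpi(2.0))        # prints 0.5
```

Under these assumed figures the chip with the lower clock rate still delivers more MIPS, which is why clock rate alone is a poor basis for comparing different architectures.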

MMX Processors and their design enhancements

Originally dubbed the Matrix Math Extensions (MMX), this new technology was renamed Multi-Media Extensions (MMX) for marketing reasons. Pentium, Pentium II, K6 and M2 CPUs that implement MMX include 57 new instructions that assist the processor with the matrix math used in many graphics packages. This is not the first attempt to implement an MMX-like instruction set. TI and others have attempted this before but failed due to lack of industry clout.

Probably more important than the 57 new instructions are the actual design changes made with the introduction of MMX. The Pentium Pro was given a face lift as the Pentium II (a Pentium Pro with MMX). Along with MMX and much higher clock rates, the L1 cache was doubled in size from 16KB in the Pentium Pro to 32KB in the Pentium II. Segment register caches were added to boost 16-bit code performance in the Pentium II, and deeper write buffers were implemented. The transactional L2 cache is still attached to a backside bus. However, its clock speed was set to one half the internal clock rate. In total the Pentium II is much faster than its ancestor, the Pentium Pro.

The new Pentium MMX CPUs sport an internal 32 KB 4-way set associative L1 cache, which is double the size of the 2-way set associative cache in non-MMX Pentium CPUs. (See the RAM page for a discussion of cache architectures.) The Pentium MMX CPUs have also implemented some technology previously found only in the Pentium Pro and Pentium II CPUs. The new MMX will send out up to 16 requests before it must have a response back. In contrast, non-MMX Pentiums allow one request and Pentium Pros allow up to 8 requests before the processor must receive a response. In addition, many other improvements were made to the Pentium MMX, including better branch prediction, an improved instruction decoder, improved use of the second pipeline and deeper internal pipelines.

Non-MMX Pentium and Pentium Pro's are