          Multiprogramming Performance of the Pentium 4 with
                          Hyper-Threading
                                   James R. Bulpin and Ian A. Pratt

                           University of Cambridge Computer Laboratory
                         J J Thomson Avenue, Cambridge, UK, CB3 0FD.
                                       Tel: +44 1223 331859.
                                   [email protected]


Abstract

Simultaneous multithreading (SMT) is a very fine-grained form of hardware
multithreading that allows simultaneous execution of more than one thread
without the notion of an internal context switch. The fine-grained sharing
of processor resources means that threads can impact each other's
performance.

Tuck and Tullsen first published measurements of the performance of the SMT
Pentium 4 processor with Hyper-Threading [12]. Of particular interest is
their evaluation of the multiprogrammed performance of the processor by
concurrently running pairs of single-threaded benchmarks. In this paper we
present experiments and results obtained independently that confirm their
observations. We extend the measurements to consider the mutual fairness of
simultaneously executing threads (an area hinted at but not covered in
detail by Tuck and Tullsen) and compare the multiprogramming performance of
pairs of benchmarks running on the Hyper-Threaded SMT system and on a
comparable SMP system.

We show that there can be considerable bias in the performance of
simultaneously executing pairs and investigate the reasons for this. We show
that the performance gap between SMP and Hyper-Threaded SMT for
multiprogrammed workloads is often lower than might be expected, an
interesting result given the obvious economic and energy consumption
advantages of the latter.

James Bulpin is funded by a CASE award from Marconi Corporation plc. and
EPSRC.

1 Introduction

Intel Corporation's "Hyper-Threading" technology [6] introduced into the
Pentium 4 [3] line of processors is the first commercial implementation of
simultaneous multithreading (SMT). SMT is a form of hardware multithreading
building on dynamic issue superscalar processor cores [15, 14, 1, 5]. The
main advantage of SMT is its ability to better utilise processor resources
and to hide memory hierarchy latency by being able to provide more
independent work to keep the processor busy. Other architectures for
simultaneous multithreading and hardware multithreading in general are
described elsewhere [16].

Hyper-Threading currently supports two heavy weight threads (processes) per
processor, presenting the abstraction of two independent logical processors.
The physical processor contains a mixture of duplicated (per-thread)
resources such as the instruction queue; shared resources tagged by thread
number such as the DTLB and trace cache; and dynamically shared resources
such as the execution units. The resource partitioning is summarised in
table 1. The scheduling of instructions to execution units is process
independent although there are limits on how many instructions each process
can have queued to try to maintain fairness.

             Duplicated              Shared                  Tagged/Partitioned
Fetch        ITLB                    Microcode ROM           Trace cache
             Streaming buffers
Branch       Return stack buffer                             Global history array
prediction   Branch history buffer
Decode       State                   Logic                   uOp queue (partitioned)
Execute      Register rename         Instruction schedulers  Retirement
                                                             Reorder buffer
                                                             (up to 50% use per thread)
Memory                               Caches                  DTLB

Table 1: Resource division on Hyper-Threaded P4 processors.

Whilst the logical processors are functionally independent, contention for
resources will affect the progress of the processes. Compute-bound processes
will suffer contention for execution units while processes making more use
of memory will contend for use of the cache with the possible result of
increased capacity and conflict misses. With cooperating processes the
sharing of the cache may be useful but for two arbitrary processes the
contention may have a negative effect.
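
The introduction notes that there are limits on how many instructions each process can have queued, and Table 1 lists a structure that allows "up to 50% use per thread". The sketch below is a toy software model of that idea, not a description of the real hardware: the class name `PartitionedQueue`, its parameters, and the first-in-first-out drain order are all assumptions made for illustration.

```python
from collections import deque


class PartitionedQueue:
    """Toy model of a queue shared by two hardware threads.

    Assumption: each thread may occupy at most `cap_fraction` of the
    entries, mirroring the "up to 50% use per thread" note in Table 1.
    """

    def __init__(self, capacity, cap_fraction=0.5, threads=2):
        self.capacity = capacity
        self.per_thread_cap = int(capacity * cap_fraction)
        self.entries = deque()          # (thread_id, uop) in arrival order
        self.occupancy = [0] * threads  # entries currently held per thread

    def try_enqueue(self, thread_id, uop):
        """Return True if the uop was queued, False if the thread must stall."""
        if len(self.entries) >= self.capacity:
            return False  # queue is completely full
        if self.occupancy[thread_id] >= self.per_thread_cap:
            return False  # this thread has used up its share
        self.entries.append((thread_id, uop))
        self.occupancy[thread_id] += 1
        return True

    def dequeue(self):
        """Remove and return the oldest entry, whichever thread owns it."""
        thread_id, uop = self.entries.popleft()
        self.occupancy[thread_id] -= 1
        return thread_id, uop
```

With `PartitionedQueue(capacity=8)`, a thread that tries to enqueue six entries in a row has only the first four accepted, so the other thread can still make progress. Without the per-thread cap, the first thread could fill the whole queue and starve its sibling.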

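The abstract speaks of bias in the performance of simultaneously executing pairs. One way to put numbers on that is sketched below. The metric definitions are assumptions for illustration and are not taken from the text above: each benchmark's performance is its solo execution time divided by its execution time in the pair, and the function name `pair_metrics` is hypothetical.

```python
def pair_metrics(solo_a, solo_b, paired_a, paired_b):
    """Summarise one benchmark pair from four execution times (seconds).

    Assumed definitions (not taken from the text above):
      performance = solo time / paired time  (1.0 means no slowdown)
      speedup     = sum of both performances (2.0 would match two fully
                    independent processors; about 1.0 is what ideal
                    time-slicing on one processor would give)
      bias        = perf_a / perf_b          (1.0 means an even split)
    """
    for t in (solo_a, solo_b, paired_a, paired_b):
        if t <= 0:
            raise ValueError("execution times must be positive")
    perf_a = solo_a / paired_a
    perf_b = solo_b / paired_b
    return {
        "perf_a": perf_a,
        "perf_b": perf_b,
        "speedup": perf_a + perf_b,
        "bias": perf_a / perf_b,
    }
```

As a made-up example (not measured data), `pair_metrics(100, 200, 125, 400)` gives performances of 0.8 and 0.5, a combined speedup of about 1.3, and a bias of about 1.6 in favour of the first benchmark.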