Title Optimizing HPC applications with Intel® cluster tools / Alexander Supalov, Andrey Semin, Michael Klemm, Christopher Dahnken. [O'Reilly electronic resource]

Publication Info. Berkeley, CA : ApressOpen, 2014.
New York, NY : Distributed to the book trade worldwide by Springer
©2014
Description 1 online resource (xxiv, 265 pages) : illustrations
text file
PDF
Series The expert's voice in software engineering
Expert's voice in software engineering.
Note Includes index.
Summary Optimizing HPC Applications with Intel® Cluster Tools takes the reader on a tour of the fast-growing area of high performance computing and the optimization of hybrid programs. These programs typically combine distributed memory and shared memory programming models and use the Message Passing Interface (MPI) and OpenMP for multi-threading to achieve the ultimate goal of high performance at low power consumption on enterprise-class workstations and compute clusters. The book focuses on optimization for clusters consisting of the Intel® Xeon processor, but the optimization methodologies also apply to the Intel® Xeon Phi™ coprocessor and heterogeneous clusters mixing both architectures. Besides the tutorial and reference content, the authors address and refute many myths and misconceptions surrounding the topic. The text is augmented and enriched by descriptions of real-life situations.
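The hybrid programming model described in the summary can be pictured with a short sketch. The C program below is not taken from the book; the file name hybrid_hello.c and the build and run commands are illustrative assumptions. It combines MPI for distributed-memory communication with OpenMP for shared-memory threading: each MPI rank starts an OpenMP thread team and reports its rank and thread number.

    /* Minimal hybrid MPI + OpenMP sketch (illustrative only; not from the book).
       Build with an MPI compiler wrapper and OpenMP enabled, for example:
           mpicc -fopenmp hybrid_hello.c -o hybrid_hello
       (with Intel compilers, typically mpiicc -qopenmp). Run with, for example:
           mpirun -np 2 ./hybrid_hello                                        */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, nranks;

        /* Request FUNNELED support: only the master thread makes MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* Each rank spawns an OpenMP thread team; every thread reports itself. */
        #pragma omp parallel
        {
            printf("MPI rank %d of %d, OpenMP thread %d of %d\n",
                   rank, nranks, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }

How such ranks and threads are balanced and pinned to cores is the subject of the process- and thread-placement material listed in the contents below.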
Bibliography Includes bibliographical references and index.
Language English.
Contents ch. 1 No Time to Read This Book? -- Using Intel MPI Library -- Using Intel Composer XE -- Tuning Intel MPI Library -- Gather Built-in Statistics -- Optimize Process Placement -- Optimize Thread Placement -- Tuning Intel Composer XE -- Analyze Optimization and Vectorization Reports -- Use Interprocedural Optimization -- Summary -- References
ch. 2 Overview of Platform Architectures -- Performance Metrics and Targets -- Latency, Throughput, Energy, and Power -- Peak Performance as the Ultimate Limit -- Scalability and Maximum Parallel Speedup -- Bottlenecks and a Bit of Queuing Theory -- Roofline Model -- Performance Features of Computer Architectures -- Increasing Single-Threaded Performance: Where You Can and Cannot Help -- Process More Data with SIMD Parallelism -- Distributed and Shared Memory Systems -- HPC Hardware Architecture Overview -- A Multicore Workstation or a Server Compute Node -- Coprocessor for Highly Parallel Applications -- Group of Similar Nodes Form an HPC Cluster -- Other Important Components of HPC Systems -- Summary -- References
ch. 3 Top-Down Software Optimization -- The Three Levels and Their Impact on Performance -- System Level -- Application Level -- Microarchitecture Level -- Closed-Loop Methodology -- Workload, Application, and Baseline -- Iterating the Optimization Process -- Summary -- References
ch. 4 Addressing System Bottlenecks -- Classifying System-Level Bottlenecks -- Identifying Issues Related to System Condition -- Characterizing Problems Caused by System Configuration -- Understanding System-Level Performance Limits -- Checking General Compute Subsystem Performance -- Testing Memory Subsystem Performance -- Testing I/O Subsystem Performance -- Characterizing Application System-Level Issues -- Selecting Performance Characterization Tools -- Monitoring the I/O Utilization -- Analyzing Memory Bandwidth -- Summary -- References
ch. 5 Addressing Application Bottlenecks: Distributed Memory -- Algorithm for Optimizing MPI Performance -- Comprehending the Underlying MPI Performance -- Recalling Some Benchmarking Basics -- Gauging Default Intranode Communication Performance -- Gauging Default Internode Communication Performance -- Discovering Default Process Layout and Pinning Details -- Gauging Physical Core Performance -- Doing Initial Performance Analysis -- Is It Worth the Trouble? -- Getting an Overview of Scalability and Performance -- Learning Application Behavior -- Choosing Representative Workload(s) -- Balancing Process and Thread Parallelism -- Doing a Scalability Review -- Analyzing the Details of the Application Behavior -- Choosing the Optimization Objective -- Detecting Load Imbalance -- Dealing with Load Imbalance -- Classifying Load Imbalance -- Addressing Load Imbalance -- Optimizing MPI Performance -- Classifying the MPI Performance Issues -- Addressing MPI Performance Issues -- Mapping Application onto the Platform -- Tuning the Intel MPI Library -- Optimizing Application for Intel MPI -- Using Advanced Analysis Techniques -- Automatically Checking MPI Program Correctness -- Comparing Application Traces -- Instrumenting Application Code -- Correlating MPI and Hardware Events -- Summary -- References
ch. 6 Addressing Application Bottlenecks: Shared Memory -- Profiling Your Application -- Using VTune Amplifier XE for Hotspots Profiling -- Hotspots for the HPCG Benchmark -- Compiler-Assisted Loop/Function Profiling -- Sequential Code and Detecting Load Imbalances -- Thread Synchronization and Locking -- Dealing with Memory Locality and NUMA Effects -- Thread and Process Pinning -- Controlling OpenMP Thread Placement -- Thread Placement in Hybrid Applications -- Summary -- References
ch. 7 Addressing Application Bottlenecks: Microarchitecture -- Overview of a Modern Processor Pipeline -- Pipelined Execution -- Out-of-order vs. In-order Execution -- Superscalar Pipelines -- SIMD Execution -- Speculative Execution: Branch Prediction -- Memory Subsystem -- Putting It All Together: A Final Look at the Sandy Bridge Pipeline -- A Top-down Method for Categorizing the Pipeline Performance -- Intel Composer XE Usage for Microarchitecture Optimizations -- Basic Compiler Usage and Optimization -- Using Optimization and Vectorization Reports to Read the Compiler's Mind -- Optimizing for Vectorization -- Dealing with Disambiguation -- Dealing with Branches -- When Optimization Leads to Wrong Results -- Analyzing Pipeline Performance with Intel VTune Amplifier XE -- Using a Standard Library Method -- Summary -- References
ch. 8 Application Design Considerations -- Abstraction and Generalization of the Platform Architecture -- Types of Abstractions -- Levels of Abstraction and Complexities -- Raw Hardware vs. Virtualized Hardware in the Cloud -- Questions about Application Design -- Designing for Performance and Scaling -- Designing for Flexibility and Performance Portability -- Understanding Bounds and Projecting Bottlenecks -- Data Storage or Transfer vs. Recalculation -- Total Productivity Assessment -- Summary -- References.
Subject High performance computing.
Supercomputers.
Superinformatique.
Superordinateurs.
Computer science.
High performance computing
Supercomputers
Indexed Term Computer science
Added Author Supalov, Alexander, author.
Other Form: Printed edition: 9781430264965
ISBN 9781430264972 (electronic bk.)
1430264977 (electronic bk.)
1430264969 (print)
9781430264965 (print)
Standard No. 10.1007/978-1-4302-6497-2 doi
9781430264972