The CUDA Handbook

Nicholas Wilt

Publisher: Addison Wesley Publishing Company
Year of publication: 2013
ISBN: 9780321809469
Number of pages: 528
Binding: Paperback

Shipping:
Unavailable

List price: 197.00 PLN (gross)

Description: The CUDA Handbook - Nicholas Wilt

The CUDA Handbook begins where CUDA by Example (Addison-Wesley, 2011) leaves off, discussing CUDA hardware and software in greater detail and covering both CUDA 5.0 and Kepler. Every CUDA developer, from the casual to the most sophisticated, will find something here of interest and immediate usefulness. Newer CUDA developers will see how the hardware processes commands and how the driver checks progress; more experienced CUDA developers will appreciate the expert coverage of topics such as the driver API and context migration, as well as the guidance on how best to structure CPU/GPU data interchange and synchronization.

The accompanying open source code (more than 25,000 lines of it, freely available at www.cudahandbook.com) is specifically intended to be reused and repurposed by developers.

Designed to be both a comprehensive reference and a practical cookbook, the text is divided into the following three parts:

Part I, Overview, gives high-level descriptions of the hardware and software that make CUDA possible.

Part II, Details, provides thorough descriptions of every aspect of CUDA, including

* Memory
* Streams and events
* Models of execution, including the dynamic parallelism feature, new with CUDA 5.0 and SM 3.5
* The streaming multiprocessors, including descriptions of all features through SM 3.5
* Programming multiple GPUs
* Texturing

The source code accompanying Part II is presented as reusable microbenchmarks and microdemos, designed to expose specific hardware characteristics or highlight specific use cases.

Part III, Select Applications, details specific families of CUDA applications and key parallel algorithms, including

* Streaming workloads
* Reduction
* Parallel prefix sum (Scan)
* N-body
* Image Processing

These algorithms cover the full range of potential CUDA applications.

Preface
Acknowledgments
About the Author

Part I: Overview

Chapter 1: Background
1.1 Our Approach
1.2 Code
1.3 Administrative Items
1.4 Road Map

Chapter 2: Hardware Architecture
2.1 CPU Configurations
2.2 Integrated GPUs
2.3 Multiple GPUs
2.4 Address Spaces in CUDA
2.5 CPU/GPU Interactions
2.6 GPU Architecture
2.7 Further Reading

Chapter 3: Software Architecture
3.1 Software Layers
3.2 Devices and Initialization
3.3 Contexts
3.4 Modules and Functions
3.5 Kernels (Functions)
3.6 Device Memory
3.7 Streams and Events
3.8 Host Memory
3.9 CUDA Arrays and Texturing
3.10 Graphics Interoperability
3.11 The CUDA Runtime and CUDA Driver API

Chapter 4: Software Environment
4.1 nvcc - CUDA Compiler Driver
4.2 ptxas - the PTX Assembler
4.3 cuobjdump
4.4 nvidia-smi
4.5 Amazon Web Services

Part II: Details

Chapter 5: Memory
5.1 Host Memory
5.2 Global Memory
5.3 Constant Memory
5.4 Local Memory
5.5 Texture Memory
5.6 Shared Memory
5.7 Memory Copy

Chapter 6: Streams and Events
6.1 CPU/GPU Concurrency: Covering Driver Overhead
6.2 Asynchronous Memcpy
6.3 CUDA Events: CPU/GPU Synchronization
6.4 CUDA Events: Timing
6.5 Concurrent Copying and Kernel Processing
6.6 Mapped Pinned Memory
6.7 Concurrent Kernel Processing
6.8 GPU/GPU Synchronization: cudaStreamWaitEvent()
6.9 Source Code Reference

Chapter 7: Kernel Execution
7.1 Overview
7.2 Syntax
7.3 Blocks, Threads, Warps, and Lanes
7.4 Occupancy
7.5 Dynamic Parallelism

Chapter 8: Streaming Multiprocessors
8.1 Memory
8.2 Integer Support
8.3 Floating-Point Support
8.4 Conditional Code
8.5 Textures and Surfaces
8.6 Miscellaneous Instructions
8.7 Instruction Sets

Chapter 9: Multiple GPUs
9.1 Overview
9.2 Peer-to-Peer
9.3 UVA: Inferring Device from Address
9.4 Inter-GPU Synchronization
9.5 Single-Threaded Multi-GPU
9.6 Multithreaded Multi-GPU

Chapter 10: Texturing
10.1 Overview
10.2 Texture Memory
10.3 1D Texturing
10.4 Texture as a Read Path
10.5 Texturing with Unnormalized Coordinates
10.6 Texturing with Normalized Coordinates
10.7 1D Surface Read/Write
10.8 2D Texturing
10.9 2D Texturing: Copy Avoidance
10.10 3D Texturing
10.11 Layered Textures
10.12 Optimal Block Sizing and Performance
10.13 Texturing Quick References

Part III: Select Applications

Chapter 11: Streaming Workloads
11.1 Device Memory
11.2 Asynchronous Memcpy
11.3 Streams
11.4 Mapped Pinned Memory
11.5 Performance and Summary

Chapter 12: Reduction
12.1 Overview
12.2 Two-Pass Reduction
12.3 Single-Pass Reduction
12.4 Reduction with Atomics
12.5 Arbitrary Block Sizes
12.6 Reduction Using Arbitrary Data Types
12.7 Predicate Reduction
12.8 Warp Reduction with Shuffle

Chapter 13: Scan
13.1 Definition and Variations
13.2 Overview
13.3 Scan and Circuit Design
13.4 CUDA Implementations
13.5 Warp Scans
13.6 Stream Compaction
13.7 References (Parallel Scan Algorithms)
13.8 Further Reading (Parallel Prefix Sum Circuits)

Chapter 14: N-Body
14.1 Introduction
14.2 Naive Implementation
14.3 Shared Memory
14.4 Constant Memory
14.5 Warp Shuffle
14.6 Multiple GPUs and Scalability
14.7 CPU Optimizations
14.8 Conclusion
14.9 References and Further Reading

Chapter 15: Image Processing: Normalized Correlation
15.1 Overview
15.2 Naive Texture-Texture Implementation
15.3 Template in Constant Memory
15.4 Image in Shared Memory
15.5 Further Optimizations
15.6 Source Code
15.7 Performance and Further Reading
15.8 Further Reading

Appendix A: The CUDA Handbook Library
A.1 Timing
A.2 Threading
A.3 Driver API Facilities
A.4 Shmoos
A.5 Command Line Parsing
A.6 Error Handling

Glossary / TLA Decoder
Index

Details: The CUDA Handbook - Nicholas Wilt

Title: The CUDA Handbook
Author: Nicholas Wilt
Publisher: Addison Wesley Publishing Company
ISBN: 9780321809469
Year of publication: 2013
Number of pages: 528
Binding: Paperback
Weight: 0.85 kg
