Apache Hadoop YARN
Vinod Kumar Vavilapalli, Arun Murthy, Joseph Niemiec
- Publisher: Addison Wesley Publishing Company
- Year of publication: 2014
- ISBN: 9780321934505
- Number of pages: 400
- Binding: Paperback
Unavailable
Description: Apache Hadoop YARN - Vinod Kumar Vavilapalli, Arun Murthy, Joseph Niemiec

"This book is a critically needed resource for the newly released Apache Hadoop 2.0, highlighting YARN as the significant breakthrough that broadens Hadoop beyond the MapReduce paradigm." - From the Foreword by Raymie Stata, CEO of Altiscale

The Insider's Guide to Building Distributed, Big Data Applications with Apache Hadoop(TM) YARN

Apache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now in Apache Hadoop(TM) YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances.

YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment.

You'll find many examples drawn from the authors' cutting-edge experience, first as Hadoop's earliest developers and implementers at Yahoo! and now as Hortonworks developers moving the platform forward and helping customers succeed with it.

Coverage includes:
- YARN's goals, design, architecture, and components, and how it expands the Apache Hadoop ecosystem
- Exploring YARN on a single node
- Administering YARN clusters and Capacity Scheduler
- Running existing MapReduce applications
- Developing a large-scale clustered YARN application
- Discovering new open source frameworks that run under YARN

"This book is a desperately needed resource for administrators, developers, and power-users of the Hadoop YARN framework.
It does an excellent job of documenting the (often unknown) history that inevitably led up to YARN from previous versions of Hadoop, which provides a valuable canvas against which to present the remaining pragmatically-oriented text. Moving from the history of YARN, it wisely jumps right into getting the reader up and running with their own YARN setup (on a single machine or on a larger cluster) such that the rest of the text is not merely conjecturing, but real guidance for a real instance of YARN. Chapters 7 and 8 were the ones I was most looking forward to in the text from the start, as those "core" components of YARN are some of the ones which are least understood and yet concurrently most impacting on performance. They did not disappoint." - Ellis H. Wilson III, Storage Scientist

Table of contents:
- Foreword by Raymie Stata
- Foreword by Paul Dix
- Preface
- Acknowledgments
- About the Authors
- Chapter 1: Apache Hadoop YARN: A Brief History and Rationale
  - Introduction
  - Apache Hadoop
  - Phase 0: The Era of Ad Hoc Clusters
  - Phase 1: Hadoop on Demand
  - Phase 2: Dawn of the Shared Compute Clusters
  - Phase 3: Emergence of YARN
  - Conclusion
- Chapter 2: Apache Hadoop YARN Install Quick Start
  - Getting Started
  - Steps to Configure a Single-Node YARN Cluster
  - Run Sample MapReduce Examples
  - Wrap-up
- Chapter 3: Apache Hadoop YARN Core Concepts
  - Beyond MapReduce
  - Apache Hadoop MapReduce
  - Apache Hadoop YARN
  - YARN Components
  - Wrap-up
- Chapter 4: Functional Overview of YARN Components
  - Architecture Overview
  - ResourceManager
  - YARN Scheduling Components
  - Containers
  - NodeManager
  - ApplicationMaster
  - YARN Resource Model
  - Managing Application Dependencies
  - Wrap-up
- Chapter 5: Installing Apache Hadoop YARN
  - The Basics
  - System Preparation
  - Script-based Installation of Hadoop 2
  - Script-based Uninstall
  - Configuration File Processing
  - Configuration File Settings
  - Start-up Scripts
  - Installing Hadoop with Apache Ambari
  - Wrap-up
- Chapter 6: Apache Hadoop YARN Administration
  - Script-based Configuration
  - Monitoring Cluster Health: Nagios
  - Real-time Monitoring: Ganglia
  - Administration with Ambari
  - JVM Analysis
  - Basic YARN Administration
  - Wrap-up
- Chapter 7: Apache Hadoop YARN Architecture Guide
  - Overview
  - ResourceManager
  - NodeManager
  - ApplicationMaster
  - YARN Containers
  - Summary for Application-writers
  - Wrap-up
- Chapter 8: Capacity Scheduler in YARN
  - Introduction to the Capacity Scheduler
  - Capacity Scheduler Configuration
  - Queues
  - Hierarchical Queues
  - Queue Access Control
  - Capacity Management with Queues
  - User Limits
  - Reservations
  - State of the Queues
  - Limits on Applications
  - User Interface
  - Wrap-up
- Chapter 9: MapReduce with Apache Hadoop YARN
  - Running Hadoop YARN MapReduce Examples
  - MapReduce Compatibility
  - The MapReduce ApplicationMaster
  - Calculating the Capacity of a Node
  - Changes to the Shuffle Service
  - Running Existing Hadoop Version 1 Applications
  - Running MapReduce Version 1 Existing Code
  - Advanced Features
  - Wrap-up
- Chapter 10: Apache Hadoop YARN Application Example
  - The YARN Client
  - The ApplicationMaster
  - Wrap-up
- Chapter 11: Using Apache Hadoop YARN Distributed-Shell
  - Using the YARN Distributed-Shell
  - Internals of the Distributed-Shell
  - Wrap-up
- Chapter 12: Apache Hadoop YARN Frameworks
  - Distributed-Shell
  - Hadoop MapReduce
  - Apache Tez
  - Apache Giraph
  - Hoya: HBase on YARN
  - Dryad on YARN
  - Apache Spark
  - Apache Storm
  - REEF: Retainable Evaluator Execution Framework
  - Hamster: Hadoop and MPI on the Same Cluster
  - Wrap-up
- Appendix A: Supplemental Content and Code Downloads
  - Available Downloads
- Appendix B: YARN Installation Scripts
  - install-hadoop2.sh
  - uninstall-hadoop2.sh
  - hadoop-xml-conf.sh
- Appendix C: YARN Administration Scripts
  - configure-hadoop2.sh
- Appendix D: Nagios Modules
  - check_resource_manager.sh
  - check_data_node.sh
  - check_resource_manager_old_space_pct.sh
- Appendix E: Resources and Additional Information
- Appendix F: HDFS Quick Reference
  - Quick Command Reference
- Index
Details: Apache Hadoop YARN - Vinod Kumar Vavilapalli, Arun Murthy, Joseph Niemiec
Title: Apache Hadoop YARN
Author: Vinod Kumar Vavilapalli, Arun Murthy, Joseph Niemiec
Publisher: Addison Wesley Publishing Company
ISBN: 9780321934505
Year of publication: 2014
Number of pages: 400
Binding: Paperback
Weight: 0.52 kg