• Publisher: VMware Press
• Year of publication: 2015
• ISBN: 9780133811025
• Number of pages: 480
• Binding: Paperback
Shipping: Unavailable
List price: PLN 165.00 (gross)

Description: Virtualizing Hadoop - Steve Jones, Justin Murray, George Trujillo

Plan and Implement Hadoop Virtualization for Maximum Performance, Scalability, and Business Agility

Enterprises running Hadoop must absorb rapid changes in big data ecosystems, frameworks, products, and workloads. Virtualized approaches can offer important advantages in speed, flexibility, and elasticity. Now, a world-class team of enterprise virtualization and big data experts guides you through the choices, considerations, and tradeoffs surrounding Hadoop virtualization. The authors help you decide whether to virtualize Hadoop, deploy Hadoop in the cloud, or integrate conventional and virtualized approaches in a blended solution.

First, Virtualizing Hadoop reviews big data and Hadoop from the standpoint of the virtualization specialist. The authors demystify MapReduce, YARN, and HDFS and guide you through each stage of Hadoop data management. Next, they turn the tables, introducing big data experts to modern virtualization concepts and best practices. Finally, they bring Hadoop and virtualization together, guiding you through the decisions you'll face in planning, deploying, provisioning, and managing virtualized Hadoop. From security to multitenancy to day-to-day management, you'll find reliable answers for choosing your best Hadoop strategy and executing it.
Coverage includes the following:
* Reviewing the frameworks, products, distributions, use cases, and roles associated with Hadoop
* Understanding YARN resource management, HDFS storage, and I/O
* Designing data ingestion, movement, and organization for modern enterprise data platforms
* Defining SQL engine strategies to meet strict SLAs
* Considering security, data isolation, and scheduling for multitenant environments
* Deploying Hadoop as a service in the cloud
* Reviewing the essential concepts, capabilities, and terminology of virtualization
* Applying current best practices, guidelines, and key metrics for Hadoop virtualization
* Managing multiple Hadoop frameworks and products as one unified system
* Virtualizing master and worker nodes to maximize availability and performance
* Installing and configuring Linux for a Hadoop environment

Table of Contents

Foreword
Preface

Part I: Introduction to Hadoop

Chapter 1 Understanding the Big Data World
  The Data Revolution
  Traditional Data Systems
  Semi-Structured and Unstructured Data
  Causation and Correlation
  Data Challenges
  The Modern Data Architecture
  Organizational Transformations
  Industry Transformation
  Summary

Chapter 2 Hadoop Fundamental Concepts
  Types of Data in Hadoop
  Use Cases
  What Is Hadoop?
  Hadoop Distributions
  Hadoop Frameworks
  NoSQL Databases
  What Is NoSQL?
  A Hadoop Cluster
  Hadoop Software Processes
  Hadoop Hardware Profiles
  Roles in the Hadoop Environment
  Summary

Chapter 3 YARN and HDFS
  A Hadoop Cluster Is Distributed
  Hadoop Directory Layouts
  Hadoop Operating System Users
  The Hadoop Distributed File System
  YARN Logging
  The NameNode
  The DataNode
  Block Placement
  NameNode Configurations and Managing Metadata
  Rack Awareness
  Block Management
  The Balancer
  Maintaining Data Integrity in the Cluster
  Quotas and Trash
  YARN and the YARN Processing Model
  Running Applications on YARN
  Resource Schedulers
  Benchmarking
  TeraSort Benchmarking Suite
  Summary

Chapter 4 The Modern Data Platform
  Designing a Hadoop Cluster
  Enterprise Data Movement
  Summary

Chapter 5 Data Ingestion
  Extraction, Loading, and Transformation (ELT)
  Sqoop: Data Movement with SQL Sources
  Flume: Streaming Data
  Oozie: Scheduling and Workflow
  Falcon: Data Lifecycle Management
  Kafka: Real-time Data Streaming
  Summary

Chapter 6 Hadoop SQL Engines
  Where SQL Was Born
  SQL in Hadoop
  Hadoop SQL Engines
  Selecting the SQL Tool for Hadoop
  Now Getting Groovy with Hive and Pig
  Hive
  HCatalog
  Pig
  Summary

Chapter 7 Multitenancy in Hadoop
  Securing the Access
  Authentication
  Auditing
  Authorization
  Data Protection
  Isolating the Data
  Isolating the Process
  Summary

Part II: Introduction to Virtualization

Chapter 8 Virtualization Fundamentals
  Why Virtualize Hadoop?
  Introduction to Virtualization
  Summary
  References

Chapter 9 Best Practices for Virtualizing Hadoop
  Running Virtualized Hadoop with Purpose and Discipline
  The Discipline of Purpose Starts with a Clear Target
  Virtualizing Different Tiers of Hadoop
  Industry Best Practices
  Summary

Part III: Virtualizing Hadoop

Chapter 10 Virtualizing Hadoop
  How Are Hadoop Ecosystems Going to Be Managed?
  Building an Enterprise Hadoop Platform That Is Agile and Flexible
  Clarification of Terms
  The Journey from Bare-Metal to Virtualization
  Why Consider Virtualizing Hadoop?
  Benefits of Virtualizing Hadoop
  Virtualized Hadoop Can Run as Fast or Faster Than Native
  Coordination and Cross-Purpose Specialization Is the Future
  Barriers Can Be Organizational
  Virtualization Is Not an All or Nothing Option
  Rapid Provisioning and Improving Quality of Development and Test Environments
  Improve High Availability with Virtualization
  Use Virtualization to Leverage Hadoop Workloads
  Hadoop in the Cloud
  Big Data Extensions
  The Path to Virtualization
  The Software-Defined Data Center
  Virtualizing the Network
  vRealize Suite
  Summary
  References

Chapter 11 Virtualizing Hadoop Master Servers
  Virtualizing Servers in a Hadoop Cluster
  Virtualizing the Environment Around Hadoop
  Virtualizing the Master Hadoop Servers
  Virtualizing Without the SAN
  Summary

Chapter 12 Virtualizing the Hadoop Worker Nodes
  A Brief Introduction to the Worker Nodes in Hadoop
  Deployment Models for Hadoop Clusters
  The Combined Model
  The Separated Model
  Network Effects of the Data-Compute Separation
  The Shared-Storage Approach to the Data-Compute Separated Model
  Local Disks for the Application's Temporary Data
  The Shared Storage Architecture Model Using Network-Attached Storage (NAS)
  Deployment Model Summary
  Best Practices for Virtualizing Hadoop Workers
  Disk I/O
  The Hadoop Virtualization Extensions (HVE)
  Summary
  References
  Resources

Chapter 13 Deploying Hadoop as a Service in the Private Cloud
  The Cloud Context
  Stakeholders for Hadoop
  Overview of the Solution Architecture
  Summary
  References

Chapter 14 Understanding the Installation of Hadoop
  Map the Right Solutions to the Right Use Case
  Thoughts About Installing Hadoop
  Configuring Repositories
  Installing HDP 2.2
  Environment Preparation
  Setting Up the Hadoop Configuration
  Starting HDFS and YARN
  Start YARN
  Verifying MapReduce Functionality
  Installing and Configuring Hive
  Installing and Configuring MySQL Database
  Installing and Configuring Hive and HCatalog
  Summary

Chapter 15 Configuring Linux for Hadoop
  Supported Linux Platforms
  Different Deployment Models
  Linux Golden Templates
  Building a Linux Enterprise Hadoop Platform
  Selecting the Linux Distribution
  Optimal Linux Kernel Parameters and System Settings
  epoll
  Disable Swap Space
  Disable Security During Install
  IO Scheduler Tuning
  Check Transparent Huge Pages Configuration
  Limits.conf
  Partition Alignment for RDMs
  File System Considerations
  Lazy Count Parameter for XFS
  Mount Options
  I/O Scheduler
  Disk Read and Write Options
  Storage Benchmarking
  Java Version
  Set Up NTP
  Enable Jumbo Frames
  Additional Network Considerations
  Summary

Appendix A Hadoop Cluster Creation: A Prerequisite Checklist

Appendix B Big Data/Hadoop on VMware vSphere Reference Materials
  Deployment Guides
  Reference Architectures
  Customer Case Studies
  Performance
  vSphere Big Data Extensions (BDE)
  Other vSphere Features and Big Data


Details: Virtualizing Hadoop - Steve Jones, Justin Murray, George Trujillo

Title: Virtualizing Hadoop
Authors: Steve Jones, Justin Murray, George Trujillo
Publisher: VMware Press
ISBN: 9780133811025
Year of publication: 2015
Number of pages: 480
Binding: Paperback
Weight: 0.76 kg

