Big Data Analytics with Microsoft HDInsight in 24 Hours, Sams Teach Yourself
Manpreet Singh, Arshad Ali
- Publisher: Sams Publishing
- Year of publication: 2015
- ISBN: 9780672337277
- Number of pages: 592
- Binding: Paperback
Unavailable
Description: Big Data Analytics with Microsoft HDInsight in 24 Hours, Sams Teach Yourself - Manpreet Singh, Arshad Ali
In just 24 lessons of one hour or less, Sams Teach Yourself Big Data Analytics with Microsoft HDInsight in 24 Hours helps you leverage Hadoop's power on a flexible, scalable cloud platform using Microsoft's newest business intelligence, visualization, and productivity tools. This book's straightforward, step-by-step approach shows you how to provision, configure, monitor, and troubleshoot HDInsight and use Hadoop cloud services to solve real analytics problems. You'll gain more of Hadoop's benefits, with less complexity, even if you're completely new to Big Data analytics. Every lesson builds on what you've already learned, giving you a rock-solid foundation for real-world success.

- Practical, hands-on examples show you how to apply what you learn
- Quizzes and exercises help you test your knowledge and stretch your skills
- Notes and tips point out shortcuts and solutions

Learn how to...
- Master core Big Data and NoSQL concepts, value propositions, and use cases
- Work with key Hadoop features, such as HDFS2 and YARN
- Quickly install, configure, and monitor Hadoop (HDInsight) clusters in the cloud
- Automate provisioning, customize clusters, install additional Hadoop projects, and administer clusters
- Integrate, analyze, and report with Microsoft BI and Power BI
- Automate workflows for data transformation, integration, and other tasks
- Use Apache HBase on HDInsight
- Use Sqoop or SSIS to move data to or from HDInsight
- Perform R-based statistical computing on HDInsight datasets
- Accelerate analytics with Apache Spark
- Run real-time analytics on high-velocity data streams
- Write MapReduce, Hive, and Pig programs

Register your book at informit.com/register for convenient access to downloads, updates, and corrections as they become available.

Table of Contents

Introduction

Part I: Understanding Big Data, Hadoop 1.0, and 2.0

Hour 1: Introduction of Big Data, NoSQL, and Business Value Proposition
Types of Analysis; Types of Data; Big Data; Managing Big Data; NoSQL Systems; Big Data, NoSQL Systems, and the Business Value Proposition; Application of Big Data and Big Data Solutions; Summary; Q&A

Hour 2: Introduction to Hadoop, Its Architecture, Ecosystem, and Microsoft Offerings
What Is Apache Hadoop?; Architecture of Hadoop and Hadoop Ecosystems; What's New in Hadoop 2.0; Architecture of Hadoop 2.0; Tools and Technologies Needed with Big Data Analytics; Major Players and Vendors for Hadoop; Deployment Options for Microsoft Big Data Solutions; Summary; Q&A

Hour 3: Hadoop Distributed File System Versions 1.0 and 2.0
Introduction to HDFS; HDFS Architecture; Rack Awareness; WebHDFS; Accessing and Managing HDFS Data; What's New in HDFS 2.0; Summary; Q&A

Hour 4: The MapReduce Job Framework and Job Execution Pipeline
Introduction to MapReduce; MapReduce Architecture; MapReduce Job Execution Flow; Summary; Q&A

Hour 5: MapReduce - Advanced Concepts and YARN
DistributedCache; Hadoop Streaming; MapReduce Joins; Bloom Filter; Performance Improvement; Handling Failures; Counter; YARN; Uber-Tasking Optimization; Failures in YARN; Resource Manager High Availability and Automatic Failover in YARN; Summary; Q&A

Part II: Getting Started with HDInsight and Understanding Its Different Components

Hour 6: Getting Started with HDInsight, Provisioning Your HDInsight Service Cluster, and Automating HDInsight Cluster Provisioning
Introduction to Microsoft Azure; Understanding HDInsight Service; Provisioning HDInsight on the Azure Management Portal; Automating HDInsight Provisioning with PowerShell; Managing and Monitoring HDInsight Cluster and Job Execution; Summary; Q&A; Exercise

Hour 7: Exploring Typical Components of HDFS Cluster
HDFS Cluster Components; HDInsight Cluster Architecture; High Availability in HDInsight; Summary; Q&A

Hour 8: Storing Data in Microsoft Azure Storage Blob
Understanding Storage in Microsoft Azure; Benefits of Azure Storage Blob over HDFS; Azure Storage Explorer Tools; Summary; Q&A

Hour 9: Working with Microsoft Azure HDInsight Emulator
Getting Started with HDInsight Emulator; Setting Up Microsoft Azure Emulator for Storage; Summary; Q&A

Part III: Programming MapReduce and HDInsight Script Action

Hour 10: Programming MapReduce Jobs
MapReduce Hello World!; Analyzing Flight Delays with MapReduce; Serialization Frameworks for Hadoop; Hadoop Streaming; Summary; Q&A

Hour 11: Customizing the HDInsight Cluster with Script Action
Identifying the Need for Cluster Customization; Developing Script Action; Consuming Script Action; Running a Giraph Job on a Customized HDInsight Cluster; Testing Script Action with HDInsight Emulator; Summary; Q&A

Part IV: Querying and Processing Big Data in HDInsight

Hour 12: Getting Started with Apache Hive and Apache Tez in HDInsight
Introduction to Apache Hive; Getting Started with Apache Hive in HDInsight; Azure HDInsight Tools for Visual Studio; Programmatically Using the HDInsight .NET SDK; Introduction to Apache Tez; Summary; Q&A; Exercise

Hour 13: Programming with Apache Hive, Apache Tez in HDInsight, and Apache HCatalog
Programming with Hive in HDInsight; Using Tables in Hive; Serialization and Deserialization; Data Load Processes for Hive Tables; Querying Data from Hive Tables; Indexing in Hive; Apache Tez in Action; Apache HCatalog; Summary; Q&A; Exercise

Hour 14: Consuming HDInsight Data from Microsoft BI Tools over Hive ODBC Driver: Part 1
Introduction to Hive ODBC Driver; Introduction to Microsoft Power BI; Accessing Hive Data from Microsoft Excel; Summary; Q&A

Hour 15: Consuming HDInsight Data from Microsoft BI Tools over Hive ODBC Driver: Part 2
Accessing Hive Data from PowerPivot; Accessing Hive Data from SQL Server; Accessing HDInsight Data from Power Query; Summary; Q&A; Exercise

Hour 16: Integrating HDInsight with SQL Server Integration Services
The Need for Data Movement; Introduction to SSIS; Analyzing On-time Flight Departure with SSIS; Provisioning HDInsight Cluster; Summary; Q&A

Hour 17: Using Pig for Data Processing
Introduction to Pig Latin; Using Pig to Count Cancelled Flights; Using HCatalog in a Pig Latin Script; Submitting Pig Jobs with PowerShell; Summary; Q&A

Hour 18: Using Sqoop for Data Movement Between RDBMS and HDInsight
What Is Sqoop?; Using Sqoop Import and Export Commands; Using Sqoop with PowerShell; Summary; Q&A

Part V: Managing Workflow and Performing Statistical Computing

Hour 19: Using Oozie Workflows and Job Orchestration with HDInsight
Introduction to Oozie; Determining On-time Flight Departure Percentage with Oozie; Submitting an Oozie Workflow with HDInsight .NET SDK; Coordinating Workflows with Oozie; Oozie Compared to SSIS; Summary; Q&A

Hour 20: Performing Statistical Computing with R
Introduction to R; Integrating R with Hadoop; Enabling R on HDInsight; Summary; Q&A

Part VI: Performing Interactive Analytics and Machine Learning

Hour 21: Performing Big Data Analytics with Spark
Introduction to Spark; Spark Programming Model; Blending SQL Querying with Functional Programs; Summary; Q&A

Hour 22: Microsoft Azure Machine Learning
History of Traditional Machine Learning; Introduction to Azure ML; Azure ML Workspace; Processes to Build Azure ML Solutions; Getting Started with Azure ML; Creating Predictive Models with Azure ML; Publishing Azure ML Models as Web Services; Summary; Q&A; Exercise

Part VII: Performing Real-time Analytics

Hour 23: Performing Stream Analytics with Storm
Introduction to Storm; Using SCP.NET to Develop Storm Solutions; Analyzing Speed Limit Violation Incidents with Storm; Summary; Q&A

Hour 24: Introduction to Apache HBase on HDInsight
Introduction to Apache HBase; HBase Architecture; Creating HDInsight Cluster with HBase; Summary; Q&A
Details: Big Data Analytics with Microsoft HDInsight in 24 Hours, Sams Teach Yourself - Manpreet Singh, Arshad Ali
Title: Big Data Analytics with Microsoft HDInsight in 24 Hours, Sams Teach Yourself
Author: Manpreet Singh, Arshad Ali
Publisher: Sams Publishing
ISBN: 9780672337277
Year of publication: 2015
Number of pages: 592
Binding: Paperback
Weight: 0.92 kg