NoSQL for Mere Mortals
Dan Sullivan
- Publisher: Addison Wesley Publishing Company
- Year: 2015
- ISBN: 9780134023212
- Pages: 552
- Binding: Paperback
Description: NoSQL for Mere Mortals - Dan Sullivan
The Easy, Common-Sense Guide to Solving Real Problems with NoSQL

The Mere Mortals® tutorials have earned worldwide praise as the clearest, simplest way to master essential database technologies. Now there is one for today's exciting new NoSQL databases. NoSQL for Mere Mortals guides you through solving real problems with NoSQL and achieving unprecedented scalability, cost efficiency, flexibility, and availability.

Drawing on more than 20 years of cutting-edge database experience, Dan Sullivan explains the advantages, use cases, and terminology of all four main categories of NoSQL databases: key-value, document, column family, and graph. For each, he introduces pragmatic best practices for building high-value applications. Through step-by-step examples, you'll discover how to choose the right database for each task and how to use it correctly.

Coverage includes:
- Getting started: what NoSQL databases are, how they differ from relational databases, when to use them, and when not to
- Data management principles and design criteria: essential knowledge for creating any database solution, NoSQL or relational
- Key-value databases: gaining more utility from data structures
- Document databases: schemaless databases, normalization and denormalization, mutable documents, indexing, and design patterns
- Column family databases: Google's BigTable design, table design, indexing, partitioning, and Big Data
- Graph databases: graph/network modeling, design tips, query methods, and traps to avoid

Whether you're a database developer, data modeler, database user, or student, learning NoSQL can open up immense new opportunities.
As thousands of database professionals already know, For Mere Mortals is the fastest, easiest route to mastery.

Contents:

Preface
Introduction

PART I: INTRODUCTION

Chapter 1: Different Databases for Different Requirements
- Relational Database Design
- E-commerce Application
- Early Database Management Systems
- Flat File Data Management Systems
- Organization of Flat File Data Management Systems
- Random Access of Data
- Limitations of Flat File Data Management Systems
- Hierarchical Data Model Systems
- Organization of Hierarchical Data Management Systems
- Limitations of Hierarchical Data Management Systems
- Network Data Management Systems
- Organization of Network Data Management Systems
- Limitations of Network Data Management Systems
- Summary of Early Database Management Systems
- The Relational Database Revolution
- Relational Database Management Systems
- Organization of Relational Database Management Systems
- Organization of Applications Using Relational Database Management Systems
- Limitations of Relational Databases
- Motivations for Not Just/No SQL (NoSQL) Databases
- Scalability
- Cost
- Flexibility
- Availability
- Summary
- Case Study
- Review Questions
- References
- Bibliography

Chapter 2: Variety of NoSQL Databases
- Data Management with Distributed Databases
- Store Data Persistently
- Maintain Data Consistency
- Ensure Data Availability
- Consistency of Database Transactions
- Availability and Consistency in Distributed Databases
- Balancing Response Times, Consistency, and Durability
- Consistency, Availability, and Partitioning: The CAP Theorem
- ACID and BASE
- ACID: Atomicity, Consistency, Isolation, and Durability
- BASE: Basically Available, Soft State, Eventually Consistent
- Types of Eventual Consistency
- Causal Consistency
- Read-Your-Writes Consistency
- Session Consistency
- Monotonic Read Consistency
- Monotonic Write Consistency
- Four Types of NoSQL Databases
- Key-Value Pair Databases
- Keys
- Values
- Differences Between Key-Value and Relational Databases
- Document Databases
- Documents
- Querying Documents
- Differences Between Document and Relational Databases
- Column Family Databases
- Columns and Column Families
- Differences Between Column Family and Relational Databases
- Graph Databases
- Nodes and Relationships
- Differences Between Graph and Relational Databases
- Summary
- Review Questions
- References
- Bibliography

PART II: KEY-VALUE DATABASES

Chapter 3: Introduction to Key-Value Databases
- From Arrays to Key-Value Databases
- Arrays: Key-Value Stores with Training Wheels
- Associative Arrays: Taking Off the Training Wheels
- Caches: Adding Gears to the Bike
- In-Memory and On-Disk Key-Value Database: From Bikes to Motorized Vehicles
- Essential Features of Key-Value Databases
- Simplicity: Who Needs Complicated Data Models Anyway?
- Speed: There Is No Such Thing as Too Fast
- Scalability: Keeping Up with the Rush
- Scaling with Master-Slave Replication
- Scaling with Masterless Replication
- Keys: More Than Meaningless Identifiers
- How to Construct a Key
- Using Keys to Locate Values
- Hash Functions: From Keys to Locations
- Keys Help Avoid Write Problems
- Values: Storing Just About Any Data You Want
- Values Do Not Require Strong Typing
- Limitations on Searching for Values
- Summary
- Review Questions
- References
- Bibliography

Chapter 4: Key-Value Database Terminology
- Key-Value Database Data Modeling Terms
- Key
- Value
- Namespace
- Partition
- Partition Key
- Schemaless
- Key-Value Architecture Terms
- Cluster
- Ring
- Replication
- Key-Value Implementation Terms
- Hash Function
- Collision
- Compression
- Summary
- Review Questions
- References

Chapter 5: Designing for Key-Value Databases
- Key Design and Partitioning
- Keys Should Follow a Naming Convention
- Well-Designed Keys Save Code
- Dealing with Ranges of Values
- Keys Must Take into Account Implementation Limitations
- How Keys Are Used in Partitioning
- Designing Structured Values
- Structured Data Types Help Reduce Latency
- Large Values Can Lead to Inefficient Read and Write Operations
- Limitations of Key-Value Databases
- Look Up Values by Key Only
- Key-Value Databases Do Not Support Range Queries
- No Standard Query Language Comparable to SQL for Relational Databases
- Design Patterns for Key-Value Databases
- Time to Live (TTL) Keys
- Emulating Tables
- Aggregates
- Atomic Aggregates
- Enumerable Keys
- Indexes
- Summary
- Case Study: Key-Value Databases for Mobile Application Configuration
- Review Questions
- References

PART III: DOCUMENT DATABASES

Chapter 6: Introduction to Document Databases
- What Is a Document?
- Documents Are Not So Simple After All
- Documents and Key-Value Pairs
- Managing Multiple Documents in Collections
- Getting Started with Collections
- Tips on Designing Collections
- Avoid Explicit Schema Definitions
- Basic Operations on Document Databases
- Inserting Documents into a Collection
- Deleting Documents from a Collection
- Updating Documents in a Collection
- Retrieving Documents from a Collection
- Summary
- Review Questions
- References

Chapter 7: Document Database Terminology
- Document and Collection Terms
- Document
- Documents: Ordered Sets of Key-Value Pairs
- Key and Value Data Types
- Collection
- Embedded Document
- Schemaless
- Schemaless Means More Flexibility
- Schemaless Means More Responsibility
- Polymorphic Schema
- Types of Partitions
- Vertical Partitioning
- Horizontal Partitioning or Sharding
- Separating Data with Shard Keys
- Distributing Data with a Partitioning Algorithm
- Data Modeling and Query Processing
- Normalization
- Denormalization
- Query Processor
- Summary
- Review Questions
- References

Chapter 8: Designing for Document Databases
- Normalization, Denormalization, and the Search for Proper Balance
- One-to-Many Relations
- Many-to-Many Relations
- The Need for Joins
- Executing Joins: The Heavy Lifting of Relational Databases
- Executing Joins Example
- What Would a Document Database Modeler Do?
- The Joy of Denormalization
- Avoid Overusing Denormalization
- Just Say No to Joins, Sometimes
- Planning for Mutable Documents
- Avoid Moving Oversized Documents
- The Goldilocks Zone of Indexes
- Read-Heavy Applications
- Write-Heavy Applications
- Modeling Common Relations
- One-to-Many Relations in Document Databases
- Many-to-Many Relations in Document Databases
- Modeling Hierarchies in Document Databases
- Parent or Child References
- Listing All Ancestors
- Summary
- Case Study: Customer Manifests
- Embed or Not Embed?
- Choosing Indexes
- Separate Collections by Type?
- Review Questions
- References

PART IV: COLUMN FAMILY DATABASES

Chapter 9: Introduction to Column Family Databases
- In the Beginning, There Was Google BigTable
- Utilizing Dynamic Control over Columns
- Indexing by Row, Column Name, and Time Stamp
- Controlling Location of Data
- Reading and Writing Atomic Rows
- Maintaining Rows in Sorted Order
- Differences and Similarities to Key-Value and Document Databases
- Column Family Database Features
- Column Family Database Similarities to and Differences from Document Databases
- Column Family Database Versus Relational Databases
- Avoiding Multirow Transactions
- Avoiding Subqueries
- Architectures Used in Column Family Databases
- HBase Architecture: Variety of Nodes
- Cassandra Architecture: Peer-to-Peer
- Getting the Word Around: Gossip Protocol
- Thermodynamics and Distributed Database: Why We Need Anti-Entropy
- Hold This for Me: Hinted Handoff
- When to Use Column Family Databases
- Summary
- Review Questions
- References

Chapter 10: Column Family Database Terminology
- Basic Components of Column Family Databases
- Keyspace
- Row Key
- Column
- Column Families
- Structures and Processes: Implementing Column Family Databases
- Internal Structures and Configuration Parameters of Column Family Databases
- Old Friends: Clusters and Partitions
- Cluster
- Partition
- Taking a Look Under the Hood: More Column Family Database Components
- Commit Log
- Bloom Filter
- Consistency Level
- Processes and Protocols
- Replication
- Anti-Entropy
- Gossip Protocol
- Hinted Handoff
- Summary
- Review Questions
- References

Chapter 11: Designing for Column Family Databases
- Guidelines for Designing Tables
- Denormalize Instead of Join
- Make Use of Valueless Columns
- Use Both Column Names and Column Values to Store Data
- Model an Entity with a Single Row
- Avoid Hotspotting in Row Keys
- Keep an Appropriate Number of Column Value Versions
- Avoid Complex Data Structures in Column Values
- Guidelines for Indexing
- When to Use Secondary Indexes Managed by the Column Family Database System
- When to Create and Manage Secondary Indexes Using Tables
- Tools for Working with Big Data
- Extracting, Transforming, and Loading Big Data
- Analyzing Big Data
- Describing and Predicting with Statistics
- Finding Patterns with Machine Learning
- Tools for Analyzing Big Data
- Tools for Monitoring Big Data
- Summary
- Case Study: Customer Data Analysis
- Understanding User Needs
- Review Questions
- References

PART V: GRAPH DATABASES

Chapter 12: Introduction to Graph Databases
- What Is a Graph?
- Graphs and Network Modeling
- Modeling Geographic Locations
- Modeling Infectious Diseases
- Modeling Abstract and Concrete Entities
- Modeling Social Media
- Advantages of Graph Databases
- Query Faster by Avoiding Joins
- Simplified Modeling
- Multiple Relations Between Entities
- Summary
- Review Questions
- References

Chapter 13: Graph Database Terminology
- Elements of Graphs
- Vertex
- Edge
- Path
- Loop
- Operations on Graphs
- Union of Graphs
- Intersection of Graphs
- Graph Traversal
- Properties of Graphs and Nodes
- Isomorphism
- Order and Size
- Degree
- Closeness
- Betweenness
- Types of Graphs
- Undirected and Directed Graphs
- Flow Network
- Bipartite Graph
- Multigraph
- Weighted Graph
- Summary
- Review Questions
- References

Chapter 14: Designing for Graph Databases
- Getting Started with Graph Design
- Designing a Social Network Graph Database
- Queries Drive Design (Again)
- Querying a Graph
- Cypher: Declarative Querying
- Gremlin: Query by Graph Traversal
- Basic Graph Traversal
- Traversing a Graph with Depth-First and Breadth-First Searches
- Tips and Traps of Graph Database Design
- Use Indexes to Improve Retrieval Time
- Use Appropriate Types of Edges
- Watch for Cycles When Traversing Graphs
- Consider the Scalability of Your Graph Database
- Summary
- Case Study: Optimizing Transportation Routes
- Understanding User Needs
- Designing a Graph Analysis Solution
- Review Questions
- References

PART VI: CHOOSING A DATABASE FOR YOUR APPLICATION

Chapter 15: Guidelines for Selecting a Database
- Choosing a NoSQL Database
- Criteria for Selecting Key-Value Databases
- Use Cases and Criteria for Selecting Document Databases
- Use Cases and Criteria for Selecting Column Family Databases
- Use Cases and Criteria for Selecting Graph Databases
- Using NoSQL and Relational Databases Together
- Summary
- Review Questions
- References

PART VII: APPENDICES

Appendix A: Answers to Chapter Review Questions
Appendix B: List of NoSQL Databases
Glossary
Details: NoSQL for Mere Mortals - Dan Sullivan
- Weight: 0.89 kg