www.pdfstall.online: Big Data: Principles and Best Practices of Scalable Real-Time Data Systems

Wednesday, April 24, 2019

File Size: 7.61 MB

Description

When I first entered the world of Big Data, it felt like the Wild West of software development. Many were abandoning the relational database and its familiar comforts for NoSQL databases with highly restricted data models designed to scale to thousands of machines. The number of NoSQL databases, many of them with only minor differences between them, became overwhelming. A new project called Hadoop began to make waves, promising the ability to do deep analyses on huge amounts of data. Making sense of how to use these new tools was bewildering.

At the time, I was trying to handle the scaling problems we faced at the company where I worked. The architecture was intimidatingly complex: a web of sharded relational databases, queues, workers, masters, and slaves. Corruption had worked its way into the databases, and the application contained special code to handle it. Slaves were always behind. I decided to explore alternative Big Data technologies to see if there was a better design for our data architecture.

Contents:

1. A new paradigm for Big Data

PART 1: BATCH LAYER

2. Data model for Big Data

3. Data model for Big Data: Illustration

4. Data storage on the batch layer

5. Data storage on the batch layer: Illustration

6. Batch layer

7. Batch layer: Illustration

8. An example batch layer: Architecture and algorithms

9. An example batch layer: Implementation

PART 2: SERVING LAYER