By Packt Publishing | in eBooks
Big Data analytics refers to the strategies organizations use to collect, organize, and analyze large amounts of data to uncover valuable business insights that traditional systems cannot surface. This book will help you do just that. With the help of this guide, you will bridge the gap between the theoretical world of technology and the practical reality of building corporate Big Data and data science platforms. You will get hands-on exposure to Hadoop and Spark, build machine learning dashboards using R and R Shiny, create web-based apps backed by NoSQL databases such as MongoDB, and even learn how to write R code for neural networks.
Important Details
Requirements
By Packt Publishing | in Online Courses
Today, organizations struggle to work with huge volumes of data, and that data often needs to be processed and analyzed in real time to yield insights. This is where data streaming comes in. This course starts by explaining the blueprint architecture for a fully functional data streaming pipeline and installing the technologies it uses. Through live coding sessions, you will get hands-on experience architecting every tier of the pipeline and handling specific issues that arise when working with streaming data. As input, you will use a live stream of Meetup RSVPs, which will be analyzed and displayed via Google Maps.
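The pipeline's core idea — consuming an unbounded stream in small batches and aggregating each batch for display — can be sketched in plain Python. This is a conceptual stand-in for the course's actual stack, and the RSVP events and city field below are invented for illustration:

```python
from itertools import islice
from collections import Counter

def rsvp_stream():
    """Simulated unbounded stream of Meetup-style RSVP events
    (a stand-in for the live feed used in the course)."""
    cities = ["Berlin", "London", "Berlin", "Paris", "London", "Berlin"]
    i = 0
    while True:  # unbounded, like a real stream
        yield {"event_id": i, "city": cities[i % len(cities)]}
        i += 1

def micro_batches(stream, batch_size):
    """Chop the unbounded stream into fixed-size micro-batches."""
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            return
        yield batch

# Aggregate one micro-batch: RSVP counts per city, ready to plot on a map.
stream = rsvp_stream()
first_batch = next(micro_batches(stream, 6))
counts = Counter(e["city"] for e in first_batch)
print(counts)  # Counter({'Berlin': 3, 'London': 2, 'Paris': 1})
```

A real pipeline would replace the generator with a message broker consumer and push each batch's aggregates to the map front end.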
By Packt Publishing | in Online Courses
The new volume in the Apache Kafka Series! Learn Kafka Streams, the data-processing library for Apache Kafka. Join hundreds of knowledge-savvy students in learning one of the most promising data-processing libraries for Apache Kafka. The course is based on Java 8 and includes one example in Scala; because Kafka Streams is a Java library, it is not suited to other programming languages. This is the first and only available Kafka Streams course on the web. Get it now to become a Kafka expert!
By Packt Publishing | in Online Courses
Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other data systems, providing a common framework for Kafka producers and consumers. In this course, you will learn Kafka connector deployment, configuration, and management through hands-on exercises. You will also see how its distributed and standalone modes let you scale up to a large, centrally managed service supporting an entire organization, or scale down to development, testing, and small production deployments. Connectors are submitted to and managed on your Kafka Connect cluster via an easy-to-use REST interface.
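As a sketch of that REST interface, the snippet below builds the JSON body that a `POST /connectors` request expects, using the FileStreamSource connector that ships with Apache Kafka. The host, file path, and topic name are hypothetical:

```python
import json

# Kafka Connect's REST interface listens on port 8083 by default (hypothetical host).
CONNECT_URL = "http://localhost:8083/connectors"

def connector_payload(name, config):
    """Build the JSON body that POST /connectors expects."""
    return json.dumps({"name": name, "config": config})

# Example: a FileStreamSource connector that streams lines of a file into a topic.
payload = connector_payload(
    "demo-file-source",
    {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/demo.txt",   # hypothetical input file
        "topic": "demo-topic",     # hypothetical target topic
    },
)
print(payload)
# Submit it with, e.g.:
#   curl -X POST -H "Content-Type: application/json" \
#        --data "$PAYLOAD" http://localhost:8083/connectors
```

The same endpoint family (`GET /connectors`, `DELETE /connectors/{name}`, etc.) is what the course uses to manage connectors on a running cluster.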
By Packt Publishing | in eBooks
Scala has seen wide adoption over the past few years, especially in the field of data science and analytics. Spark, built on Scala, has gained a lot of recognition and is widely used in production. The first part of this book introduces you to Scala, helping you understand the object-oriented and functional programming concepts needed for Spark application development. It then moves on to Spark, covering its basic abstractions, RDDs and DataFrames. You will also learn how to develop Spark applications using the SparkR and PySpark APIs, perform interactive data analysis with Zeppelin, and do in-memory data processing with Alluxio.
By Packt Publishing | in Online Courses
Spark is the technology that lets us perform big data processing in the MapReduce paradigm very rapidly, because it performs the processing in memory without the need for extensive I/O operations. This course takes a practical approach to dealing with large amounts of online, unbounded data and drawing conclusions from it. You will implement streaming logic that handles huge, effectively infinite streams of data.
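One way to picture computing over an unbounded stream is a sliding-window aggregation: only a bounded window of recent events is ever held in memory. The snippet below is a toy analogue in plain Python, not Spark's actual API:

```python
from collections import deque

def sliding_window_counts(events, window_size):
    """After each event, yield the sum of the last `window_size` events —
    a toy analogue of windowed aggregations over unbounded streams."""
    window = deque(maxlen=window_size)  # old events fall out automatically
    for value in events:
        window.append(value)
        yield sum(window)

# The stream could be infinite; we only ever hold `window_size` items.
stream = iter([1, 0, 1, 1, 1, 0])
results = list(sliding_window_counts(stream, 3))
print(results)  # [1, 1, 2, 2, 3, 2]
```

In Spark itself, the equivalent would be a windowed aggregation distributed across executors, with the same key property: bounded state over unbounded input.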
By Packt Publishing | in eBooks
ETL is one of the essential techniques in data processing. Given that data is everywhere, ETL will always remain a vital process for handling data from different sources. This book starts with the basic concepts of data warehousing and the ETL process. You will learn how Azure Data Factory and SSIS can be used to build the key components of an ETL solution. By the end of this book, you will not only know how to build your own ETL solutions but also how to address the key challenges faced while building them.
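The book works in Azure Data Factory and SSIS; as a tool-agnostic illustration, the three ETL steps can be sketched in a few lines of Python. The CSV feed and `sales` table below are invented for the example:

```python
import csv
import io
import sqlite3

# --- Extract: read raw rows from a source (a CSV string stands in for a real feed).
raw = "id,amount\n1,10.5\n2,abc\n3,7.0\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# --- Transform: clean and type the data, dropping rows that fail validation.
clean = []
for r in rows:
    try:
        clean.append((int(r["id"]), float(r["amount"])))
    except ValueError:
        pass  # a real pipeline would route bad rows to an error output

# --- Load: write the cleaned rows into the warehouse (in-memory SQLite here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)", clean)
total = db.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 17.5 — the bad row ('abc') was filtered out in the transform step
```

Tools like Data Factory and SSIS add scheduling, connectors, and error routing around these same three steps.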
By Packt Publishing | in eBooks
SAS has been recognized by Money Magazine and Payscale as one of the top business skills to learn to advance one’s career. Through innovative data management, analytics, and business intelligence software and services, SAS helps customers solve their business problems by allowing them to make better decisions faster. This book introduces the reader to SAS and shows how it can be used to perform efficient analysis of data of any size, including Big Data. By the end of this book, you will clearly understand how to analyze Big Data efficiently using SAS.
By Packt Publishing | in eBooks
Apache Hadoop is the most popular platform for big data processing and can be combined with a host of other big data tools to build powerful analytics solutions. Big Data Analytics with Hadoop 3 shows you how to do just that, providing insights into the software and its benefits with the help of practical examples. Once you have taken a tour of Hadoop 3’s latest features, you will get an overview of HDFS, MapReduce, and YARN, and how they enable faster, more efficient big data processing. By the end of this book, you will be well versed in the analytical capabilities of the Hadoop ecosystem and able to build powerful solutions that perform big data analytics and yield insights effortlessly.
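The MapReduce model that Hadoop implements can be illustrated with a toy word count in plain Python — a conceptual sketch only, since a real Hadoop job runs these phases distributed across a cluster:

```python
from collections import defaultdict

docs = ["big data", "big analytics", "data data"]  # stand-ins for input splits

# --- Map: each document emits (word, 1) pairs.
mapped = [(word, 1) for doc in docs for word in doc.split()]

# --- Shuffle: group pairs by key, as the framework does between phases.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# --- Reduce: sum the counts for each word.
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts)  # {'big': 2, 'data': 3, 'analytics': 1}
```

In Hadoop, HDFS supplies the input splits, YARN schedules the map and reduce tasks, and the shuffle happens over the network between them.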
By Packt Publishing | in Online Courses
In this course, you will start by learning about the Hadoop Distributed File System (HDFS) and the most common Hadoop commands required to work with it. Next, you’ll be introduced to Sqoop Import, gaining insight into the lifecycle of the Sqoop command and using it to migrate data from MySQL to HDFS and from MySQL to Hive. In addition, you will get up to speed with Sqoop Export for migrating data effectively, along with Apache Flume for ingesting data. As you progress, you will delve into Apache Hive, external and managed tables, and working with different file formats, including Parquet and Avro. In the concluding section, you will focus on Spark DataFrames and Spark SQL.
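A typical Sqoop import invocation of the kind the course covers can be sketched as follows; the connection string, table name, and paths below are hypothetical, and the helper function exists only to assemble the command line:

```python
def sqoop_import_cmd(jdbc_url, table, target_dir=None, hive_import=False):
    """Assemble a Sqoop import command line (illustrative only)."""
    parts = ["sqoop", "import", "--connect", jdbc_url, "--table", table]
    if target_dir:
        parts += ["--target-dir", target_dir]  # land the data in HDFS
    if hive_import:
        parts.append("--hive-import")          # load into a Hive table instead
    return " ".join(parts)

# MySQL -> HDFS
print(sqoop_import_cmd("jdbc:mysql://dbhost/shop", "orders",
                       target_dir="/user/etl/orders"))
# MySQL -> Hive
print(sqoop_import_cmd("jdbc:mysql://dbhost/shop", "orders", hive_import=True))
```

Real invocations would also pass credentials (e.g. `--username`) and tuning options, which are part of what the course walks through.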