Big Data Analytics with Java (PDF download)

Author:
Summary: This page provides a PDF download of Big Data Analytics with Java
Publisher: 文轩网教育考试专营店
Publication date: 2019-03
PDF download price: ¥0.00

About the book

Author: Rajat Mehta (USA)
List price: 98
Publisher: 东南大学出版社 (Southeast University Press)
Publication date: March 1, 2019
Pages: 392
Binding: Paperback
ISBN: 9787564182878
Preface
Chapter 1: Big Data Analytics with Java
Why data analytics on big data?
Big data for analytics
Big data - a bigger pay package for Java developers
Basics of Hadoop - a Java sub-project
Distributed computing on Hadoop
HDFS concepts
Design and architecture of HDFS
Main components of HDFS
HDFS simple commands
Apache Spark
Concepts
Transformations
Actions
Spark Java API
Spark samples using Java 8
Loading data
Data operations - cleansing and munging
Analyzing data - count, projection, grouping, aggregation, and max/min
Actions on RDDs
Paired RDDs
Saving data
Collecting and printing results
Executing Spark programs on Hadoop
Apache Spark sub-projects
Spark machine learning modules
Mahout - a popular Java ML library
Deeplearning4j - a deep learning library
Summary
Chapter 2: First Steps in Data Analysis
Datasets
Data cleaning and munging
Basic analysis of data with Spark SQL
Building SparkConf and context
Dataframe and datasets
Load and parse data
Analyzing data - the Spark-SQL way
Spark SQL for data exploration and analytics
Market basket analysis - Apriori algorithm
Implementation of the Apriori algorithm in Apache Spark
Efficient market basket analysis using FP-Growth algorithm
Running FP-Growth on Apache Spark
Summary
Chapter 3: Data Visualization
Data visualization with Java JFreeChart
Using charts in big data analytics
Time Series chart
All India seasonal and annual average temperature series dataset
Simple single Time Series chart
Multiple Time Series on a single chart window
Bar charts
Histograms
When would you use a histogram?
How to make histograms using JFreeChart?
Line charts
Scatter plots
Box plots
Advanced visualization technique
Prefuse
IVTK Graph toolkit
Other libraries
Summary
Chapter 4: Basics of Machine Learning
What is machine learning?
Real-life examples of machine learning
Type of machine learning
A small sample case study of supervised and unsupervised learning
Steps for machine learning problems
Choosing the machine learning model
What are the feature types that can be extracted from the datasets?
How do you select the best features to train your models?
How do you run machine learning analytics on big data?
Getting and preparing data in Hadoop
Training and storing models on big data
Apache Spark machine learning API
Summary
Chapter 5: Regression on Big Data
Linear regression
What is simple linear regression?
Where is linear regression used?
Logistic regression
Which mathematical functions does logistic regression use?
Where is logistic regression used?
Predicting heart disease using logistic regression
Summary
Chapter 6: Naive Bayes and Sentiment Analysis
Conditional probability
Bayes theorem
Naive Bayes algorithm
Advantages of Naive Bayes
Disadvantages of Naive Bayes
Sentimental analysis
Concepts for sentimental analysis
Tokenization
Stop words removal
Stemming
N-grams
Term presence and Term Frequency
TF-IDF
Bag of words
Dataset
Data exploration of text data
Sentimental analysis on this dataset
SVM or Support Vector Machine
Summary
Chapter 7: Decision Trees
What is a decision tree?
Building a decision tree
Choosing the best features for splitting the datasets
Dataset
Data exploration
Cleaning and munging the data
Training and testing the model
Summary
Chapter 8: Ensembling on Big Data
Ensembling
Types of ensembling
Bagging
Boosting
Advantages and disadvantages of ensembling
Random forests
Gradient boosted trees (GBTs)
Classification problem and dataset used
Data exploration
Training and testing our random forest model
Training and testing our gradient boosted tree model
Summary
Chapter 9: Recommendation Systems
Recommendation systems and their types
Content-based recommendation systems
Dataset
Content-based recommender on MovieLens dataset
Collaborative recommendation systems
Advantages
Disadvantages
Alternating least square - collaborative filtering
Summary
Chapter 10: Clustering and Customer Segmentation on Big Data
Clustering
Types of clustering
Hierarchical clustering
K-means clustering
Bisecting k-means clustering
Customer segmentation
Dataset
Data exploration
Clustering for customer segmentation
Changing the clustering algorithm
Summary
Chapter 11: Massive Graphs on Big Data
Refresher on graphs
Representing graphs
Common terminology on graphs
Common algorithms on graphs
Plotting graphs
Massive graphs on big data
Graph analytics
GraphFrames
Building a graph using GraphFrames
Graph analytics on airports and their flights
Datasets
Graph analytics on flights data
Summary
Chapter 12: Real-Time Analytics on Big Data
Real-time analytics
Big data stack for real-time analytics
Real-time SQL queries on big data
Real-time data ingestion and storage
Real-time data processing
Real-time SQL queries using Impala
Flight delay analysis using Impala
Apache Kafka
Spark Streaming
Trending videos
Summary
Chapter 13: Deep Learning Using Big Data
Introduction to neural networks
Perceptron
Problems with perceptrons
Sigmoid neuron
Multi-layer perceptrons
Accuracy of multi-layer perceptrons
Deep learning
Advantages and use cases of deep learning
Flower species classification using Multi-Layer perceptrons
Deeplearning4j
Handwritten digit recognition using CNN
Diving into the code:
Summary
Index
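The book begins with basic statistical analysis of big data using Java, then moves on to other data analysis topics such as classification, regression, clustering, and ensembling. It also covers advanced topics such as recommendation engines, large-scale graph analytics, real-time analytics, and deep learning. It includes a range of case studies, such as sentiment analysis on a tweet dataset, recommendations on the MovieLens dataset, customer segmentation on an e-commerce dataset, and graph analysis on a real flights dataset. The book is an end-to-end guide to implementing big data analytics with Java. Java is now the de facto language of mainstream big data environments, including Hadoop. This book teaches you how to analyze big data using production-friendly Java.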