This video covers how you can use the rpart library in R to build decision trees for classification. The video provides a brief overview of decision trees and then shows a demo of using rpart to create decision tree models, visualise them, and make predictions with them.

Provides steps for carrying out linear discriminant analysis in R and using it to develop a classification model. Includes,
- Data partitioning
- Scatter Plot & Correlations
- Linear Discriminant Analysis
- Stacked Histograms of Discriminant Function Values
- Bi-Plot interpretation
- Partition plots
- Confusion Matrix & Accuracy - training & testing data
- Advantages and disadvantages
Linear discriminant analysis is an important statistical tool for analyzing big data and for working in the data science field.
R is a free software environment for statistical computing and graphics, and is widely used by both academia and industry.

Learn the basics of Machine Learning with R. Start our Machine Learning Course for free: https://www.datacamp.com/courses/introduction-to-machine-learning-with-R
First up is classification. A *classification problem* involves predicting whether a given observation belongs to one of two or more categories. The simplest case of classification is called binary classification: it has to decide between two categories, or classes. Remember how I compared machine learning to the estimation of a function? Well, based on earlier observations of how the input maps to the output, classification tries to estimate a classifier that can generate an output for an arbitrary input. We say that the classifier labels an unseen example with a class.
The possible applications of classification are very broad. For example, after a set of clinical examinations that relate vital signals to a disease, you could predict whether a new patient with an unseen set of vital signals suffers from that disease and needs further treatment. Another, totally different example is classifying a set of animal images into cats, dogs and horses, given that you have trained your model on a bunch of images for which you know which animal they depict. Can you think of a possible classification problem yourself?
What's important here is, first, that the output is qualitative and, second, that the classes to which new observations can belong are known beforehand. In the first example I mentioned, the classes are "sick" and "not sick". In the second example, the classes are "cat", "dog" and "horse". In chapter 3 we will do a deeper analysis of classification and you'll get to work with some fancy classifiers!
Moving on... A **regression problem** is a kind of machine learning problem that tries to predict a continuous or quantitative value for an input, based on previous information. The input variables are called the predictors and the output is called the response.
In some sense, regression is pretty similar to classification. You're also trying to estimate a function that maps input to output based on earlier observations, but this time you're trying to estimate an actual value, not just the class of an observation.
Do you remember the example from the last video? There we had a dataset on a group of people's heights and weights. A valid question could be: is there a linear relationship between these two? That is, will a change in height correlate linearly with a change in weight? If so, can you describe that relationship, and can you predict the height of a new person given their weight? These questions can be answered with linear regression!
Together, \beta_0 and \beta_1 are known as the model coefficients or parameters. As soon as you know the coefficients \beta_0 and \beta_1, the function is able to convert any new input to an output. This means that solving your machine learning problem actually comes down to finding good values for \beta_0 and \beta_1. These are estimated from previous input-to-output observations. I will not go into detail on how to compute these coefficients; the function `lm()` does this for you in R.
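As a rough sketch of what `lm()` estimates in the one-predictor case, assume the simple linear model weight ≈ \beta_0 + \beta_1 × height. Ordinary least squares then has a closed form. \beta_1 is the covariance of the two variables divided by the variance of the predictor, and \beta_0 makes the fitted line pass through the two means. The Python below is illustrative only and uses made-up numbers. `lm()` also handles multiple predictors, factors and standard errors, none of which this sketch covers.

```python
def fit_simple_ols(xs, ys):
    """Return (beta_0, beta_1) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    beta_1 = sxy / sxx
    # Intercept: the fitted line passes through (x_mean, y_mean)
    beta_0 = y_mean - beta_1 * x_mean
    return beta_0, beta_1

def predict(beta_0, beta_1, x):
    """Apply the fitted line to a new input."""
    return beta_0 + beta_1 * x

# Hypothetical heights (cm) and weights (kg), for illustration only
heights = [160, 165, 170, 175, 180]
weights = [55, 60, 65, 70, 75]
b0, b1 = fit_simple_ols(heights, weights)
print(b0, b1)                  # -105.0 1.0
print(predict(b0, b1, 172))    # predicted weight for a 172 cm person
```

Swapping the two arguments fits the reverse question, which is predicting height from weight.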
Now, I hear you asking: what can regression be useful for apart from some silly weight and height problems? Well, there are many different applications of regression, ranging from modeling credit scores based on past payments, to finding the trend in your YouTube subscriptions over time, or even estimating your chances of landing a job at your favorite company based on your college grades.
All these problems have two things in common. First, the response, or the thing you're trying to predict, is always quantitative. Second, you will always need knowledge of previous input-output observations in order to build your model. The fourth chapter of this course will be devoted to a more comprehensive overview of regression.
Soooo.. Classification: check. Regression: check. Last but not least, there is clustering. In clustering, you're trying to group objects that are similar, while making sure the clusters themselves are dissimilar.
You can think of it as classification, but without saying to which classes the observations have to belong or how many classes there are.
Take the animal photos, for example. In the case of classification, you had information about the actual animals that were depicted. In the case of clustering, you don't know what animals are depicted; you simply get a set of pictures. The clustering algorithm then groups similar photos into clusters.
You could say that clustering is different in the sense that you don't need any knowledge about the labels. Moreover, there is no right or wrong in clustering. Different clusterings can reveal different and useful information about your objects. This makes it quite different from both classification and regression, where there is always a notion of prior expectation or knowledge of the result.

Provides steps for applying image classification & recognition with an easy-to-follow example.
Uses TensorFlow (by Google) as the backend. Includes,
- load keras and EBImage packages
- read images
- explore images and image data
- resize and reshape images
- one hot encoding
- sequential model
- compile model
- fit model
- evaluate model
- prediction
- confusion matrix
Image classification & recognition with Keras is an important tool for analyzing big data and for working in the data science field.
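One-hot encoding turns each integer class label into a vector that contains a single 1. In the R Keras workflow this is typically done with `to_categorical()`. The Python sketch below shows the same idea with plain lists. It assumes the labels are integers 0..K-1, and the three class meanings in the comment are hypothetical.

```python
def one_hot(labels, num_classes=None):
    """Encode integer labels 0..K-1 as one-hot rows (lists of 0/1)."""
    if num_classes is None:
        num_classes = max(labels) + 1
    encoded = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside 0..{num_classes - 1}")
        row = [0] * num_classes
        row[label] = 1  # the single "hot" position marks the class
        encoded.append(row)
    return encoded

# Hypothetical labels for three image classes (e.g. 0 = plane, 1 = car, 2 = bike)
print(one_hot([0, 2, 1]))  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

A network with a softmax output layer is then trained against these rows, with one output unit per class.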

Provides steps for carrying out principal component analysis in R and using the principal components to develop a predictive model.
Includes,
- Data partitioning
- Scatter Plot & Correlations
- Principal Component Analysis
- Orthogonality of PCs
- Bi-Plot interpretation
- Prediction with Principal Components
- Multinomial Logistic regression with First Two PCs
- Confusion Matrix & Misclassification Error - training & testing data
- Advantages and disadvantages
Principal component analysis is an important statistical tool for analyzing big data and for working in the data science field.
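To make the orthogonality of PCs concrete, here is a small Python sketch of PCA for two variables. It builds the sample covariance matrix and takes its eigenvectors, which are the principal component directions; the eigenvalues are the variances along them. In R you would normally call `prcomp()`, which uses a singular value decomposition and works for any number of variables. The closed-form 2 × 2 eigen solution and the data points below are assumptions made to keep the example short.

```python
import math

def pca_2d(points):
    """PCA for 2-D data: return (variances, directions), largest variance first."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[a, b], [b, c]] of the centred data
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix = variances along the PCs
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    variances = [half_trace + radius, half_trace - radius]
    if abs(b) < 1e-12:
        # Uncorrelated variables: the PCs are simply the coordinate axes
        directions = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        directions = []
        for lam in variances:
            vx, vy = lam - c, b  # solves b*vx + (c - lam)*vy = 0
            norm = math.hypot(vx, vy)
            directions.append((vx / norm, vy / norm))
    return variances, directions

points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
variances, (pc1, pc2) = pca_2d(points)
print(variances)                          # variance along PC1, then PC2
print(pc1[0] * pc2[0] + pc1[1] * pc2[1])  # dot product near 0: the PCs are orthogonal
```

Dividing each variance by their sum gives the proportion of variance explained, which `summary()` of a `prcomp` fit reports.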

Provides steps for applying Naive Bayes Classification with R.
Naive Bayes classification is an important tool for analyzing big data and for working in the data science field.
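Naive Bayes scores each class by its prior probability multiplied by the per-feature likelihoods, assuming the features are conditionally independent given the class. The Python sketch below handles categorical features only. It uses Laplace smoothing (`alpha`) so that an unseen feature value does not zero out a class, and the weather-style data are invented. In R, packages such as e1071 (`naiveBayes()`, which has a `laplace` argument) implement this, and they also use Gaussian likelihoods for numeric predictors.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace smoothing alpha (> 0)."""
    class_counts = Counter(labels)
    counts = defaultdict(lambda: defaultdict(Counter))  # counts[class][feature][value]
    values = defaultdict(set)  # distinct values seen for each feature
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            counts[y][j][v] += 1
            values[j].add(v)
    return class_counts, counts, values, alpha, len(labels)

def predict_nb(model, row):
    """Return the class with the highest log posterior score."""
    class_counts, counts, values, alpha, n = model
    best, best_score = None, -math.inf
    for y, ny in class_counts.items():
        score = math.log(ny / n)  # log prior P(y)
        for j, v in enumerate(row):
            k = len(values[j])
            # Smoothed log P(x_j = v | y); alpha > 0 avoids log(0)
            score += math.log((counts[y][j][v] + alpha) / (ny + alpha * k))
        if score > best_score:
            best, best_score = y, score
    return best

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("sunny", "hot")))   # no
print(predict_nb(model, ("rainy", "cool")))  # yes
```

Working with log probabilities avoids numerical underflow when there are many features.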

Provides illustration of doing cluster analysis with R.
Includes,
- Illustrates the process using utilities data
- data normalization
- hierarchical clustering using dendrogram
- use of complete and average linkage
- calculation of Euclidean distance
- silhouette plot
- scree plot
- nonhierarchical k-means clustering
Cluster analysis is an important tool for analyzing big data and for working in the data science field.
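The data normalization, Euclidean distance and k-means steps listed above can be sketched in Python as follows. The sketch uses plain Lloyd's algorithm, which repeats three steps until the assignments stop changing: assign each point to its nearest centre, move each centre to the mean of its points, and check for change. Seeding with the first k points is a simplifying assumption. R's `kmeans()` defaults to the Hartigan-Wong algorithm, picks random starting centres, and can try several starts via `nstart`. The data below are made up and are not the utilities data from the video.

```python
import math

def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(r, lows, highs)]
        for r in rows
    ]

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with Euclidean distance; initial centres = first k points."""
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre
        new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centre to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties out
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

raw = [[1, 200], [2, 210], [1.5, 190], [10, 800], [11, 820], [10.5, 790]]
labels, centres = kmeans(min_max_normalize(raw), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Without normalization, the second column (hundreds) would dominate the Euclidean distance. This is why the normalization step comes first.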
In this video you will learn how to perform linear discriminant analysis in R. Compared with logistic regression, linear discriminant analysis (LDA) performs well when there is a multi-class classification problem at hand. It assumes that the predictors are normally distributed within each class, with a covariance matrix common to all classes, which yields linear decision boundaries. When the classes have different covariance matrices, giving quadratic boundaries, you can use quadratic discriminant analysis (QDA) instead.
It can be used alongside other classification algorithms such as support vector machines, random forests, and decision trees.
Provides concepts and steps for applying the kNN algorithm to classification and regression problems.

This video is a sample from Skillsoft's video course catalog. In this video, Steve Scott walks you through how to create a basic classification tree with the tree package in R.
Steve Scott has been a software developer and IT Consultant for 16 years. Steve's career has been spent serving clients across the globe, responsible for building software architecture, hiring development teams, and solving complex problems through code. Now with a toolbox of languages, platforms, tools, and APIs, Steve rounds out his coding background with ongoing formal study in Mathematics and Computer Science at Mount Allison University.
Skillsoft is a pioneer in the field of learning with a long history of innovation. Skillsoft provides cloud-based learning solutions for our customers worldwide, who range from global enterprises, government and education customers to mid-sized and small businesses. Learn more at http://www.skillsoft.com.
Provides steps for applying random forest to do classification and prediction.
Includes,
- random forest model
- why and when it is used
- benefits & steps
- number of trees, ntree
- number of variables tried at each step, mtry
- data partitioning
- prediction and confusion matrix
- accuracy and sensitivity
- randomForest & caret packages
- bootstrap samples and out of bag (oob) error
- oob error rate
- tune random forest using mtry
- no. of nodes for the trees in the forest
- variable importance
- mean decrease accuracy & gini
- variables used
- partial dependence plot
- extract single tree from the forest
- multi-dimensional scaling plot of proximity matrix
- detailed example with cardiotocographic (CTG) data
Random forest is an important tool for analyzing big data and for working in the data science field.
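Each tree in a random forest is grown on a bootstrap sample, which is n rows drawn with replacement from the n training rows. The rows a tree never sees are its out-of-bag (OOB) rows. Predicting each row using only the trees for which it was OOB gives the OOB error estimate. The short Python sketch below shows that roughly a third of the rows are left out of each bootstrap sample, because (1 - 1/n)^n approaches e^{-1} ≈ 0.368. The seed and sample size are arbitrary choices.

```python
import random

def bootstrap_split(n, rng):
    """Draw one bootstrap sample of row indices; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
    seen = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in seen]
    return in_bag, out_of_bag

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10_000
in_bag, oob = bootstrap_split(n, rng)
print(len(oob) / n)  # close to 0.368 for large n
```

Because every tree has its own OOB rows, the forest gets a built-in validation estimate without a separate hold-out set.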
Includes an example with,
- brief definition of what SVM is
- svm classification model
- svm classification plot
- interpretation
- tuning or hyperparameter optimization
- best model selection
- confusion matrix
- misclassification rate
SVM is an important machine learning tool for analyzing big data and for working in the data science field.

CART handles two kinds of problems: 1. classification and 2. regression. In classification, the target variable is categorical, and the tree predicts the class into which each instance will fall.
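For classification, a CART tree chooses each split to make the child nodes as pure as possible. rpart's default impurity measure for classification is the Gini index, 1 - \sum_k p_k^2. The Python sketch below scores every threshold on a single numeric feature and keeps the one with the lowest weighted Gini impurity. The petal-length values are invented to resemble two iris species. Real implementations also search across all features and apply stopping and pruning rules.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Threshold on one numeric feature minimizing weighted child impurity."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(ys))  # (threshold, impurity); None means "no split helps"
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best

petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9]
species = ["setosa"] * 3 + ["versicolor"] * 3
print(best_split(petal_length, species))  # (3.0, 0.0): a perfectly pure split
```

The tree-growing algorithm then applies the same search recursively to each child node.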

Click here to download the example data set fitnessAppLog.csv:
https://drive.google.com/open?id=0Bz9Gf6y-6XtTczZ2WnhIWHJpRHc

This video shows you how to fit classification decision trees using R

This video tutorial shows you how to use the lda function in R to perform a linear discriminant analysis. It also shows how to assess predictive performance and perform cross-validation of the linear discriminant analysis. This is an intermediate video: you should feel comfortable reading data in, subsetting data, and running regression or ANOVA in R.

Logistic regression is one of the most widely used classification ML techniques. This vlog introduces you to the concept and also helps you build your first model, then score and evaluate it in R.

This tutorial will deep dive into data analysis using the 'R' language. By the end of this tutorial you will have learned to perform sentiment analysis of Twitter data using R. To learn more about R, click here: http://goo.gl/uHfGbN
This tutorial covers the following topics:
• What is Sentiment Analysis?
• Sentiment Analysis use cases
• Sentiment Analysis tools
• Hands-On: Sentiment Analysis in R
The topics related to ‘R’ language are extensively covered in our ‘Mastering Data Analytics with R’ course.
Classification Trees are part of the CART family of techniques for prediction. Here we use the package rpart, with its CART algorithms, in R to learn a classification tree model on the 'iris' data set available in all R installations. In this video I also compare our results from rpart to our results from C5.0 in the previous classification tree tutorial video called "

In this video you will learn what multinomial logistic regression is and how to perform it in R. It is similar to logistic regression, but the target variable can take more than two values.
In this Edureka YouTube live session, we will show you how to use the Time Series Analysis in R to predict the future!
Below are the topics we will cover in this live session:
1. Why Time Series Analysis?
2. What is Time Series Analysis?
3. When Not to use Time Series Analysis?
4. Components of Time Series Algorithm
5. Demo on Time Series
In this video: compare various classification models (LR, LDA, QDA, KNN).

Regression Trees are part of the CART family of techniques for prediction of a numerical target feature. Here we use the package rpart, with its CART algorithms, in R to learn a regression tree model on the 'msleep' data set available in the ggplot2 package.

RandomForests are currently one of the top performing algorithms for data classification and regression. Although they can be difficult to interpret, RandomForests are widely popular because of their ability to classify large amounts of data with high accuracy.
In this video I show how to import a Landsat image into R and how to extract pixel data to train and fit a RandomForests model. I also explain how to conduct image classification and how to speed it up through parallel processing.
See this post in my blog for more info: http://amsantac.co/blog/en/2015/11/28/classification-r.html
This video shows how to implement this R-based RandomForests algorithm for image classification in QGIS: https://youtu.be/-6Hsase6xQw

Provides image or picture analysis and processing with R, and includes,
- reading and writing picture files
- intensity histogram
- combining images
- merging images into one picture
- image manipulation (brightness, contrast, gamma correction, cropping, color change, flip, flop, rotate, & resize)
- low-pass and high-pass filters

** Data Science Certification using R: https://www.edureka.co/data-science **
This Edureka video on "KNN algorithm using R" will help you learn about the KNN algorithm in depth; you'll also see how KNN is used to solve real-world problems. Below are the topics covered in this module:
(00:52) Introduction to Machine Learning
(03:45) What is KNN Algorithm?
(08:09) KNN Use Case
(09:07) KNN Algorithm step by step
(12:12) Hands - On
About the Course
Edureka's Data Science course will cover the whole data lifecycle, ranging from data acquisition and data storage using R-Hadoop concepts, to modeling with machine learning algorithms in R, to impeccable data visualization that leverages R's capabilities.
- - - - - - - - - - - - - -
Why Learn Data Science?
Data Science training certifies you with ‘in demand’ Big Data Technologies to help you grab the top paying Data Science job title with Big Data skills and expertise in R programming, Machine Learning and Hadoop framework.
After the completion of the Data Science course, you should be able to:
1. Gain insight into the 'Roles' played by a Data Scientist
2. Analyze Big Data using R, Hadoop and Machine Learning
3. Understand the Data Analysis Life Cycle
4. Work with different data formats like XML, CSV and SAS, SPSS, etc.
5. Learn tools and techniques for data transformation
6. Understand Data Mining techniques and their implementation
7. Analyze data using machine learning algorithms in R
8. Work with Hadoop Mappers and Reducers to analyze data
9. Implement various Machine Learning Algorithms in Apache Mahout
10. Gain insight into data visualization and optimization techniques
11. Explore the parallel processing feature in R
- - - - - - - - - - - - - -
Who should go for this course?
The course is designed for all those who want to learn machine learning techniques with implementation in R language, and wish to apply these techniques on Big Data. The following professionals can go for this course:
1. Developers aspiring to be a 'Data Scientist'
2. Analytics Managers who are leading a team of analysts
3. SAS/SPSS Professionals looking to gain understanding in Big Data Analytics
4. Business Analysts who want to understand Machine Learning (ML) Techniques
5. Information Architects who want to gain expertise in Predictive Analytics
6. 'R' professionals who want to capture and analyze Big Data
7. Hadoop Professionals who want to learn R and ML techniques
8. Analysts wanting to understand Data Science methodologies.
For online Data Science training, please write back to us at [email protected] or call us at IND: 9606058406 / US: 18338555775 (toll-free) for more information.

The overview of this video series provides an introduction to text analytics as a whole and what is to be expected throughout the instruction. It also includes specific coverage of:
– Overview of the spam dataset used throughout the series
– Loading the data and initial data cleaning
– Some initial data analysis, feature engineering, and data visualization
About the Series
This data science tutorial introduces the viewer to the exciting world of text analytics with R programming. As exemplified by the popularity of blogging and social media, textual data is far from dead – it is increasing exponentially! Not surprisingly, knowledge of text analytics is a critical skill for data scientists if this wealth of information is to be harvested and incorporated into data products. This data science training provides introductory coverage of the following tools and techniques:
– Tokenization, stemming, and n-grams
– The bag-of-words and vector space models
– Feature engineering for textual data (e.g. cosine similarity between documents)
– Feature extraction using singular value decomposition (SVD)
– Training classification models using textual data
– Evaluating accuracy of the trained classification models
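The bag-of-words model and cosine similarity mentioned above can be sketched in a few lines of Python. Each document becomes a vector of token counts, and the cosine similarity of two documents is the dot product of their vectors divided by the product of the vectors' lengths. The tokenizer (lower-casing plus a simple regular expression) and the example messages are assumptions for illustration. The sketch does no stemming and builds no n-grams.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-case the text, split it into simple word tokens and count them."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors stored as Counters."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)

spam_1 = bag_of_words("Win a free prize now")
spam_2 = bag_of_words("Claim your free prize now")
ham = bag_of_words("See you at lunch")
print(cosine_similarity(spam_1, spam_2))  # 3 shared tokens out of 5 each: about 0.6
print(cosine_similarity(spam_1, ham))     # no shared tokens: 0.0
```

Cosine similarity ignores document length, so a long and a short message about the same topic can still score as similar.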
Decision trees are also called Classification and Regression Trees (CART), or just trees.
R file: https://goo.gl/Kx4EsU
Data file: https://goo.gl/gAQTx4
Includes,
- Illustrates the process using cardiotocographic data
- Decision tree and interpretation with party package
- Decision tree and interpretation with rpart package
- Plot with rpart.plot
- Prediction for validation dataset based on model built using training dataset
- Calculation of misclassification error
Decision trees are an important tool for developing classification or predictive analytics models related to analyzing big data or data science.

Random Forest with R : Classification with The South African Heart Disease Dataset

This video shows basic methods for developing and pruning classification and regression trees using the R programming language. The video takes viewers through a step by step approach to classification, demonstrating the approach with actual data. We cover the following topics:
Classification: Discuss the purpose of classification and advantages and disadvantages of different methods
Example Data: Review the dataset and classification objective for our example
Tree Package: Execute classification using the popular “tree” package in R
Pruning: Demonstrate how to reduce complexity in trees by “pruning” less-significant branches

Quick overview and examples/demos of Support Vector Machines (SVM) using R.
The "Getting Started with SVM" video covers the basics of the SVM machine learning algorithm and then goes into a quick demo.

Provides sentiment analysis and steps for making word clouds with R using tweets about Apple obtained from Twitter.
Topics include:
- reading data obtained from Twitter in a csv format
- cleaning tweets for further analysis
- creating term document matrix
- making wordcloud, lettercloud, and barplots
- sentiment analysis of Apple tweets before and after the quarterly earnings report

In this video you will learn how to perform k-nearest neighbor (KNN) classification in R. You will also learn the theory behind KNN.
KNN is a classification algorithm, like logistic regression, decision trees, SVM and random forests. Unlike logistic regression, however, it is a non-parametric technique.
In this video I've talked about how you can implement the kNN (k-nearest neighbor) algorithm in R with the help of an example data set freely available from the UCI Machine Learning Repository.
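The core of KNN fits in a few lines: find the k training points closest to the new observation and take a majority vote of their classes. Nothing is fitted in advance, which is part of what makes the method non-parametric. The Python sketch below assumes Euclidean distance and uses a toy two-class data set. In R, `class::knn()` does the same job, although it breaks ties at random. In practice the features are usually normalized first, so that no single variable dominates the distance.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training indices by Euclidean distance to the query
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # most_common orders equal counts by first appearance, so ties go to the
    # class whose member is nearest to the query
    return votes.most_common(1)[0][0]

train_x = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_x, train_y, (1.5, 1.5)))  # A
print(knn_predict(train_x, train_y, (6.5, 6.2)))  # B
```

For kNN regression, the vote is replaced by the average of the k neighbours' numeric responses.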

Provides steps for applying artificial neural networks to do classification and prediction.
Includes,
- neural network model
- input, hidden, and output layers
- min-max normalization
- prediction
- confusion matrix
- misclassification error
- network repetitions
- example with binary data
Neural networks are an important tool for analyzing big data and for working in the data science field. Apple has reported using neural networks for face recognition in the iPhone X.

Text Mining with R. Import a single document into R.

September Houston R Users Group main talk http://www.meetup.com/houstonr/events/232830049/

Differentiating various species of flower 'Iris' using R

Analytics Accelerator Program, February 2016-April 2016 batch

This Time Series Analysis (Part-1) in R tutorial will help you understand what a time series is, why time series are used, the components of a time series, when not to use time series, why a time series has to be stationary, and how to make a time series stationary. At the end, you will also see a use case where we forecast car sales for the 5th year using the given data.
Link to Time Series Analysis Part-2: https://www.youtube.com/watch?v=Y5T3ZEMZZKs
You can also go through the slides here: https://goo.gl/RsAEB8
A time series is a sequence of data recorded at specific time intervals. Past values are analyzed to forecast future, time-dependent values. Compared with other forecasting algorithms, time series analysis deals with a single variable that depends on time. So, let's dive deep into this video and understand what a time series is and how to implement time series analysis using R.
The following topics are explained in this "Time Series in R Tutorial":
1. Why time series?
2. What is time series?
3. Components of a time series
4. When not to use time series?
5. Why does a time series have to be stationary?
6. How to make a time series stationary?
7. Example: Forecast car sales for the 5th year
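One common way to make a series stationary is differencing, which replaces each value with its change from an earlier value. A lag of 1 removes a linear trend. A lag equal to the season length (4 for quarterly data) removes a stable seasonal pattern. The Python sketch below mirrors what R's `diff(x, lag = ...)` computes. The sales figures are invented and are not the car sales data from the video.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag]; the result is `lag` values shorter."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

trend = [10, 12, 14, 16, 18]
print(difference(trend))                   # [2, 2, 2, 2]: trend removed

quarterly_sales = [10, 20, 15, 30, 12, 22, 17, 32]
print(difference(quarterly_sales, lag=4))  # [2, 2, 2, 2]: seasonal pattern removed
```

Forecasts made on the differenced series must be cumulatively added back to the last observed values to return to the original scale.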
Provides illustration of healthcare analytics using multinomial logistic regression and cardiotocographic data.
Includes,
- steps for preparing data for the analysis
- use of nnet package in r
- calculation of probabilities using coefficients from the model
- estimating probabilities using the model
- developing confusion matrix
- calculation of misclassification error
Logistic regression is an important tool for developing classification or predictive analytics models for analyzing big data and for working in the data science field.
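The step "calculation of probabilities using coefficients" works as follows for a baseline-category multinomial logit, the form fitted by `multinom()` from nnet. That function treats the first level of the response as the reference class. Each non-reference class k gets a linear score \eta_k = intercept + slopes · x, and the reference class has score 0. The probability of class k is then e^{\eta_k} divided by the sum of e^{\eta_j} over all classes. In the Python sketch below, the class names are assumed to be the CTG outcome categories, and the coefficients and predictor value are hypothetical.

```python
import math

def multinomial_probs(x, coefs, reference):
    """Class probabilities from baseline-category logit coefficients.

    coefs maps each non-reference class to (intercept, [slope per predictor]);
    the reference class implicitly has all coefficients equal to 0.
    """
    scores = {reference: 0.0}
    for cls, (intercept, slopes) in coefs.items():
        scores[cls] = intercept + sum(b * xi for b, xi in zip(slopes, x))
    denom = sum(math.exp(s) for s in scores.values())
    return {cls: math.exp(s) / denom for cls, s in scores.items()}

# Hypothetical coefficients for a single predictor, for illustration only
coefs = {"suspect": (-2.0, [0.5]), "pathologic": (-3.0, [0.8])}
probs = multinomial_probs([2.0], coefs, reference="normal")
print(probs)  # probabilities for normal, suspect and pathologic; they sum to 1
```

The predicted class is the one with the highest probability, and comparing these predictions with the actual classes gives the confusion matrix.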

Provides steps for handling the class imbalance problem when developing classification and prediction models.
Includes,
- What is Class Imbalance Problem?
- Data partitioning
- Data for developing prediction model
- Developing prediction model
- Predictive model evaluation
- Confusion matrix
- Accuracy, sensitivity, and specificity
- Oversampling, undersampling, and synthetic sampling using random over-sampling examples (ROSE)
Predictive models are important machine learning and statistical tools for analyzing big data and for working in the data science field.
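The confusion-matrix metrics listed above show most clearly why class imbalance matters. The Python sketch below computes accuracy, sensitivity (the true positive rate) and specificity (the true negative rate) for a binary problem. The example is invented: 5 positives among 100 cases, scored against a model that always predicts the majority class, which shows how high accuracy can hide a sensitivity of zero. Oversampling, undersampling and ROSE-style synthetic sampling aim to fix exactly this problem.

```python
def binary_metrics(actual, predicted, positive):
    """Confusion-matrix counts plus accuracy, sensitivity and specificity."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        # nan when the denominator is 0 (no actual positives / negatives)
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }

actual = [1] * 5 + [0] * 95   # 5% minority (positive) class
predicted = [0] * 100         # a model that always predicts the majority class
print(binary_metrics(actual, predicted, positive=1))
# accuracy 0.95, sensitivity 0.0, specificity 1.0
```

The misclassification error used elsewhere in these tutorials is simply 1 - accuracy.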

ROC Curve (Receiver Operating Characteristic Curve) and Random Oversampling Examples (ROSE Package) Analysis in R
To submit to Data Science Dojo's Kaggle competition, you need to create a model from the Titanic data set. We will show you how to do this using RStudio.
- Learn how to analyse the sentiment of anything being said on Twitter
- Get your own Twitter developer app key and pull tweets
- Understand what sentiment analytics and text mining are
- Create impressive word clouds
- Map sentiments on any topic and break them down into bar graphs

