Views: 23740
Prabhudev Konana

This video covers how you can use the rpart library in R to build decision trees for classification. The video provides a brief overview of decision trees and then shows a demo of using rpart to create decision tree models, visualise them, and make predictions with the decision tree model.

Views: 75606
Melvin L

Provides steps for carrying out linear discriminant analysis in R and for using it to develop a classification model. Includes,
- Data partitioning
- Scatter Plot & Correlations
- Linear Discriminant Analysis
- Stacked Histograms of Discriminant Function Values
- Bi-Plot interpretation
- Partition plots
- Confusion Matrix & Accuracy - training & testing data
- Advantages and disadvantages
Linear discriminant analysis is an important statistical tool for analyzing big data or working in the data science field.
R is a free software environment for statistical computing and graphics, and is widely used by both academia and industry. R software works on both Windows and macOS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio is a user-friendly environment for R that has become popular.

Views: 13079
Bharatendra Rai

Learn the basics of Machine Learning with R. Start our Machine Learning Course for free: https://www.datacamp.com/courses/introduction-to-machine-learning-with-R
First up is Classification. A *classification problem* involves predicting whether a given observation belongs to one of two or more categories. The simplest case of classification is called binary classification: the classifier has to decide between two categories, or classes. Remember how I compared machine learning to the estimation of a function? Well, based on earlier observations of how the input maps to the output, classification tries to estimate a classifier that can generate an output for an arbitrary input (an observation). We say that the classifier labels an unseen example with a class.
The possible applications of classification are very broad. For example, after a set of clinical examinations that relate vital signs to a disease, you could predict whether a new patient with an unseen set of vital signs suffers from that disease and needs further treatment. Another, totally different example is classifying a set of animal images into cats, dogs and horses, given that you have trained your model on a bunch of images for which you know what animal they depict. Can you think of a possible classification problem yourself?
What's important here is, first, that the output is qualitative, and second, that the classes to which new observations can belong are known beforehand. In the first example I mentioned, the classes are "sick" and "not sick". In the second example, the classes are "cat", "dog" and "horse". In chapter 3 we will do a deeper analysis of classification and you'll get to work with some fancy classifiers!
Moving on ... A **Regression problem** is a kind of Machine Learning problem that tries to predict a continuous or quantitative value for an input, based on previous information. The input variables are called the predictors and the output the response.
In some sense, regression is pretty similar to classification. You're also trying to estimate a function that maps input to output based on earlier observations, but this time you're trying to estimate an actual value, not just the class of an observation.
Remember the example from the last video? There we had a dataset of a group of people's height and weight. A valid question could be: is there a linear relationship between these two? That is, will a change in height correlate linearly with a change in weight? If so, can you describe it, and can you predict the height of a new person given their weight? These questions can be answered with linear regression!
For simple linear regression, the model is $\text{height} \approx \beta_0 + \beta_1 \cdot \text{weight}$. Together, $\beta_0$ and $\beta_1$ are known as the model coefficients or parameters. As soon as you know the coefficients $\beta_0$ and $\beta_1$, the function can convert any new input to an output. This means that solving your machine learning problem actually comes down to finding good values for $\beta_0$ and $\beta_1$. These are estimated from previous input-output observations. I will not go into detail on how to compute these coefficients; the function `lm()` does this for you in R.
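The transcript leaves the computation of the coefficients to `lm()`. To show what such a fit computes, the following minimal Python sketch implements the ordinary least-squares closed form $\hat\beta_1 = \sum_i (x_i-\bar x)(y_i-\bar y) / \sum_i (x_i-\bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The function names and the weight/height numbers are hypothetical. In R, `lm(height ~ weight)` produces the same least-squares estimates.

```python
def fit_simple_linear_regression(x, y):
    """Return (beta0, beta1) minimising the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Cross-deviation and squared deviation of the predictor.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the predictor has zero variance")
    beta1 = sxy / sxx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


def predict(beta0, beta1, x_new):
    """Apply the fitted line to a new input."""
    return beta0 + beta1 * x_new


# Hypothetical data: weight (kg) as the predictor, height (cm) as the response.
weights = [60, 70, 80, 90]
heights = [165, 172, 179, 186]
b0, b1 = fit_simple_linear_regression(weights, heights)
print(b0, b1, predict(b0, b1, 75))
```

Because these made-up points lie exactly on a line, the fit recovers a slope of about 0.7 and an intercept of about 123. Real data would leave non-zero residuals.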
Now, I hear you asking: what can regression be useful for apart from some silly weight and height problems? Well, there are many different applications of regression, ranging from modeling credit scores based on past payments, to finding the trend in your YouTube subscriptions over time, to estimating your chances of landing a job at your favorite company based on your college grades.
All these problems have two things in common. First, the response, or the thing you're trying to predict, is always quantitative. Second, you always need knowledge of previous input-output observations in order to build your model. The fourth chapter of this course will be devoted to a more comprehensive overview of regression.
Soooo.. Classification: check. Regression: check. Last but not least, there is clustering. In clustering, you're trying to group objects that are similar, while making sure the clusters themselves are dissimilar.
You can think of it as classification, but without saying to which classes the observations have to belong or how many classes there are.
Take the animal photos, for example. In the case of classification, you had information about the actual animals that were depicted. In the case of clustering, you don't know what animals are depicted; you simply get a set of pictures. The clustering algorithm then groups similar photos into clusters.
You could say that clustering is different in the sense that you don't need any knowledge about the labels. Moreover, there is no right or wrong in clustering. Different clusterings can reveal different and useful information about your objects. This makes it quite different from both classification and regression, where there is always a notion of prior expectation or knowledge of the result.
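The transcript does not commit to a particular clustering algorithm. One common choice is k-means, which the cluster-analysis video further down this list also covers and which base R provides as `kmeans()`. The Python sketch below is an illustration only. It assumes Euclidean distance and randomly chosen initial centroids (Lloyd's algorithm) and shows the two alternating steps. The points are hypothetical. Unlike in classification, no labels go in, so the cluster numbers that come out are arbitrary.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Group points into k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
    return labels, centroids


# Hypothetical 2-D points forming two obvious groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(pts, 2)
print(labels, centroids)
```

Different seeds can swap which group is called 0 and which is called 1. This illustrates the point above that clusters carry no predefined meaning.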

Views: 38894
DataCamp

Provides steps for applying Naive Bayes Classification with R.
Data: https://goo.gl/nCFX1x
R file: https://goo.gl/Feo5mT
Machine Learning videos: https://goo.gl/WHHqWP
Naive Bayes Classification is an important tool for analyzing big data or working in the data science field.

Views: 19520
Bharatendra Rai

Provides steps for applying image classification & recognition with an easy-to-follow example.
R file: https://goo.gl/fCYm19
Data: https://goo.gl/To15db
Machine Learning videos: https://goo.gl/WHHqWP
Uses TensorFlow (by Google) as the backend. Includes,
- load keras and EBImage packages
- read images
- explore images and image data
- resize and reshape images
- one hot encoding
- sequential model
- compile model
- fit model
- evaluate model
- prediction
- confusion matrix
Image Classification & Recognition with Keras is an important tool for analyzing big data or working in the data science field.

Views: 18381
Bharatendra Rai

This playlist/video has been uploaded for Marketing purposes and contains only selective videos.
For the entire video course and code, visit [http://bit.ly/2xQrLB8].
This video shows how to do discriminant analysis in R.
• Discuss iris data, correlations, and scatter plot
• Show how to do data partition
• Show how to do linear discriminant analysis
For the latest Big Data and Business Intelligence video tutorials, please visit
http://bit.ly/1HCjJik
Find us on Facebook -- http://www.facebook.com/Packtvideo
Follow us on Twitter - http://www.twitter.com/packtvideo

Views: 2777
Packt Video

Includes an example with,
- brief definition of what an SVM is
- svm classification model
- svm classification plot
- interpretation
- tuning or hyperparameter optimization
- best model selection
- confusion matrix
- misclassification rate
Machine Learning videos: https://goo.gl/WHHqWP
SVM is an important machine learning tool for analyzing big data or working in the data science field.

Views: 37711
Bharatendra Rai

This video tutorial shows you how to use the lda function in R to perform a linear discriminant analysis. It also shows how to assess the predictive performance of the linear discriminant analysis and how to cross-validate it. This is an intermediate video: you should feel comfortable reading data in, subsetting data, and running regression or ANOVA in R.

Views: 49395
Ed Boone

In this video you will learn how to perform linear discriminant analysis in R. Unlike binary logistic regression, linear discriminant analysis (LDA) handles multi-class classification problems directly. It assumes that the predictors in each class are normally distributed with a covariance matrix shared by all classes, which makes the decision boundaries between classes linear. When the classes have different covariance matrices, the boundaries become quadratic and you can use quadratic discriminant analysis (QDA) instead.
It can also be used alongside other classification algorithms such as support vector machines, random forests, and decision trees.
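The shared-covariance assumption is easiest to see with a single predictor. The minimal Python sketch below uses hypothetical function names. With one feature, LDA estimates each class mean $\mu_k$, each prior $\pi_k$, and one pooled variance $\sigma^2$. It then assigns $x$ to the class with the largest discriminant score $\delta_k(x) = x\mu_k/\sigma^2 - \mu_k^2/(2\sigma^2) + \log\pi_k$, which is linear in $x$. In R, `MASS::lda()` fits the multivariate version. The sketch is not that function's implementation.

```python
import math
from collections import defaultdict


def fit_lda_1d(xs, ys):
    """Estimate class means, class priors and one pooled variance."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    n, k = len(xs), len(groups)
    if n <= k:
        raise ValueError("need more observations than classes")
    means = {c: sum(v) / len(v) for c, v in groups.items()}
    priors = {c: len(v) / n for c, v in groups.items()}
    # Pooled within-class variance: the shared-variance assumption of LDA.
    ss = sum((x - means[c]) ** 2 for c, v in groups.items() for x in v)
    var = ss / (n - k)
    if var == 0:
        raise ValueError("zero within-class variance")
    return means, priors, var


def predict_lda_1d(model, x):
    """Return the class with the highest linear discriminant score."""
    means, priors, var = model

    def score(c):
        return x * means[c] / var - means[c] ** 2 / (2 * var) + math.log(priors[c])

    return max(means, key=score)


# Hypothetical one-feature data with two classes.
model = fit_lda_1d([1, 2, 3, 7, 8, 9], ["a", "a", "a", "b", "b", "b"])
print(predict_lda_1d(model, 4.9), predict_lda_1d(model, 5.1))
```

With equal priors and a shared variance, the boundary falls halfway between the two class means, at 5 in this example. Letting each class keep its own variance would add an $x^2$ term to the score, which is the quadratic (QDA) case.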
Analytics Study Pack : http://analyticuniversity.com/
Contact us for training/study packs [email protected]
Analytics University on Twitter : https://twitter.com/AnalyticsUniver
Analytics University on Facebook : https://www.facebook.com/AnalyticsUniversity
Logistic Regression in R: https://goo.gl/S7DkRy
Logistic Regression in SAS: https://goo.gl/S7DkRy
Logistic Regression Theory: https://goo.gl/PbGv1h
Time Series Theory : https://goo.gl/54vaDk
Time ARIMA Model in R : https://goo.gl/UcPNWx
Survival Model : https://goo.gl/nz5kgu
Data Science Career : https://goo.gl/Ca9z6r
Machine Learning : https://goo.gl/giqqmx
Data Science Case Study : https://goo.gl/KzY5Iu
Big Data & Hadoop & Spark: https://goo.gl/ZTmHOA

Views: 5011
Analytics University

Provides steps for carrying out principal component analysis in R and using the principal components to develop a predictive model.
Link to code file: https://goo.gl/SfdXYz
Includes,
- Data partitioning
- Scatter Plot & Correlations
- Principal Component Analysis
- Orthogonality of PCs
- Bi-Plot interpretation
- Prediction with Principal Components
- Multinomial Logistic regression with First Two PCs
- Confusion Matrix & Misclassification Error - training & testing data
- Advantages and disadvantages
Principal component analysis is an important statistical tool for analyzing big data or working in the data science field.

Views: 30552
Bharatendra Rai

Illustrates how to do cluster analysis with R.
R File: https://goo.gl/BTZ9j7
Machine Learning videos: https://goo.gl/WHHqWP
Includes,
- Illustrates the process using utilities data
- data normalization
- hierarchical clustering with a dendrogram
- use of complete and average linkage
- calculation of Euclidean distance
- silhouette plot
- scree plot
- nonhierarchical k-means clustering
Cluster analysis is an important tool for analyzing big data or working in the data science field.
Deep Learning: https://goo.gl/5VtSuC
Image Analysis & Classification: https://goo.gl/Md3fMi

Views: 104245
Bharatendra Rai

September Houston R Users Group main talk http://www.meetup.com/houstonr/events/232830049/

Views: 4531
Houston R Users

Link for R file: https://goo.gl/BXEf7M
Provides image or picture analysis and processing with R, and includes,
- reading and writing picture files
- intensity histogram
- combining images
- merging images into one picture
- image manipulation (brightness, contrast, gamma correction, cropping, color change, flip, flop, rotate, & resize)
- low-pass and high-pass filters

Views: 16386
Bharatendra Rai

CART handles two kinds of problems: 1. classification and 2. regression. In classification, the target variable is categorical, and the tree predicts the class into which each instance will fall.
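For classification trees, CART chooses each split by how much it reduces node impurity. The Python sketch below assumes the Gini index, $G = 1 - \sum_k p_k^2$, which is the default splitting criterion for classification in R's `rpart`. It searches a single numeric feature for the threshold with the lowest weighted impurity. A full tree repeats this search over every feature and then recurses into each child node. The function names and data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(xs, ys):
    """Return (threshold, weighted_gini) of the best split x <= t on one feature."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(ys))  # no split: impurity of the parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        if weighted < best[1]:
            best = (t, weighted)
    return best


# Hypothetical feature values and classes.
print(best_threshold([1, 2, 3, 10, 11, 12], ["no", "no", "no", "yes", "yes", "yes"]))
```

A weighted impurity of 0 means both child nodes are pure, so the tree would stop splitting there.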

Views: 1171
StepUp Analytics

Provides steps for carrying out time-series analysis with R and covers the classification stage.
Previous video - time-series clustering: https://goo.gl/UwsTxQ
R code file: https://goo.gl/orX2YM
Time-Series videos: https://goo.gl/FLztxt
Machine Learning videos: https://goo.gl/WHHqWP
Becoming Data Scientist: https://goo.gl/JWyyQc
Introductory R Videos: https://goo.gl/NZ55SJ
Deep Learning with TensorFlow: https://goo.gl/5VtSuC
Image Analysis & Classification: https://goo.gl/Md3fMi
Text mining: https://goo.gl/7FJGmd
Data Visualization: https://goo.gl/Q7Q2A8
Playlist: https://goo.gl/iwbhnE

Views: 351
Bharatendra Rai

Click here to download the example data set fitnessAppLog.csv:
https://drive.google.com/open?id=0Bz9Gf6y-6XtTczZ2WnhIWHJpRHc

Views: 10784
The Data Science Show

Provides steps for applying random forest to do classification and prediction.
R code file: https://goo.gl/AP3LeZ
Data: https://goo.gl/C9emgB
Machine Learning videos: https://goo.gl/WHHqWP
Includes,
- random forest model
- why and when it is used
- benefits & steps
- number of trees, ntree
- number of variables tried at each step, mtry
- data partitioning
- prediction and confusion matrix
- accuracy and sensitivity
- randomForest & caret packages
- bootstrap samples and out of bag (oob) error
- oob error rate
- tune random forest using mtry
- no. of nodes for the trees in the forest
- variable importance
- mean decrease in accuracy & Gini
- variables used
- partial dependence plot
- extract single tree from the forest
- multi-dimensional scaling plot of proximity matrix
- detailed example with cardiotocographic or ctg data
Random forest is an important tool for analyzing big data or working in the data science field.

Views: 58487
Bharatendra Rai

#Naive_Bayes #Bayesian_Algorithm #Machine_Learning, #Classification_Technique #R_Studio
This is an elementary-level video in which we learn to use the Bayesian algorithm for classification. Naive Bayes is often introduced with two-class problems, but it handles more classes as well, so we also apply it to the IRIS dataset, which has 3 classes. We have also used it on the Breast Cancer data file from #Kaggle. You can find the Breast Cancer dataset at the link provided below. Stay tuned for more advanced videos on the Bayesian algorithm.
https://www.dropbox.com/s/2qkskdmv7nywv7p/Breast_Cancer.csv?dl=0

Views: 583
Rajesh Dorbala

The overview of this video series provides an introduction to text analytics as a whole and what is to be expected throughout the instruction. It also includes specific coverage of:
– Overview of the spam dataset used throughout the series
– Loading the data and initial data cleaning
– Some initial data analysis, feature engineering, and data visualization
About the Series
This data science tutorial introduces the viewer to the exciting world of text analytics with R programming. As exemplified by the popularity of blogging and social media, textual data is far from dead; it is increasing exponentially! Not surprisingly, knowledge of text analytics is a critical skill for data scientists if this wealth of information is to be harvested and incorporated into data products. This data science training provides introductory coverage of the following tools and techniques:
– Tokenization, stemming, and n-grams
– The bag-of-words and vector space models
– Feature engineering for textual data (e.g. cosine similarity between documents)
– Feature extraction using singular value decomposition (SVD)
– Training classification models using textual data
– Evaluating accuracy of the trained classification models
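The bag-of-words model and cosine similarity listed above fit in a few lines. In this Python sketch, each document becomes a vector of term counts, and similarity is $\cos\theta = \frac{a \cdot b}{\lVert a\rVert\,\lVert b\rVert}$. Tokenization here is a naive lowercase-and-split on whitespace. That is a simplifying assumption; the series itself covers proper tokenization and stemming. The example messages and function names are made up.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Term-count vector of a document (naive whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical SMS-style messages.
spam = bag_of_words("WIN a free prize now")
ham = bag_of_words("see you at the meeting tomorrow")
print(cosine_similarity(spam, bag_of_words("claim your free prize")))
print(cosine_similarity(spam, ham))
```

Because counts are never negative, the score lies between 0 (no shared terms) and 1 (proportional term counts).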
Kaggle Dataset:
https://www.kaggle.com/uciml/sms-spam-collection-dataset
The data and R code used in this series are available here:
https://code.datasciencedojo.com/datasciencedojo/tutorials/tree/master/Introduction%20to%20Text%20Analytics%20with%20R
--
At Data Science Dojo, we believe data science is for everyone. Our in-person data science training has been attended by more than 3,600 employees from over 742 companies globally, including many leaders in tech like Microsoft, Apple, and Facebook.
--
Learn more about Data Science Dojo here:
https://hubs.ly/H0f5JLp0
See what our past attendees are saying here:
https://hubs.ly/H0f5JZl0
--
Like Us: https://www.facebook.com/datasciencedojo
Follow Us: https://twitter.com/DataScienceDojo
Connect with Us: https://www.linkedin.com/company/datasciencedojo
Also find us on:
Google +: https://plus.google.com/+Datasciencedojo
Instagram: https://www.instagram.com/data_science_dojo
Vimeo: https://vimeo.com/datasciencedojo

Views: 68274
Data Science Dojo

( Data Science Training - https://www.edureka.co/data-science )
This Logistic Regression Tutorial shall give you a clear understanding as to how a Logistic Regression machine learning algorithm works in R. Towards the end, in our demo we will be predicting which patients have diabetes using Logistic Regression!
In this Logistic Regression Tutorial video you will understand:
1) The 5 Questions asked in Data Science
2) What is Regression?
3) Logistic Regression - What and Why?
4) How does Logistic Regression Work?
5) Demo in R: Diabetes Use Case
6) Logistic Regression: Use Cases
Subscribe to our channel to get video updates. Hit the subscribe button above.
Check our complete Data Science playlist here: https://goo.gl/60NJJS
#LogisticRegression #Datasciencetutorial #Datasciencecourse #datascience
How It Works
1. There will be 30 hours of instructor-led interactive online classes, 40 hours of assignments and 20 hours of project work.
2. We offer 24x7 one-on-one live technical support to help you with any problems you might face or any clarifications you may require during the course.
3. You will get Lifetime Access to the recordings in the LMS.
4. At the end of the training you will have to complete the project based on which we will provide you a Verifiable Certificate!
- - - - - - - - - - - - - -
About the Course
Edureka's Data Science course covers the whole data life cycle: from data acquisition and data storage using R-Hadoop concepts, to applying models through R programming with machine learning algorithms, to producing impeccable data visualizations by leveraging R's capabilities.
- - - - - - - - - - - - - -
Why Learn Data Science?
Data Science training certifies you with ‘in demand’ Big Data Technologies to help you grab the top paying Data Science job title with Big Data skills and expertise in R programming, Machine Learning and Hadoop framework.
After the completion of the Data Science course, you should be able to:
1. Gain insight into the 'Roles' played by a Data Scientist
2. Analyse Big Data using R, Hadoop and Machine Learning
3. Understand the Data Analysis Life Cycle
4. Work with different data formats like XML, CSV and SAS, SPSS, etc.
5. Learn tools and techniques for data transformation
6. Understand Data Mining techniques and their implementation
7. Analyse data using machine learning algorithms in R
8. Work with Hadoop Mappers and Reducers to analyze data
9. Implement various Machine Learning Algorithms in Apache Mahout
10. Gain insight into data visualization and optimization techniques
11. Explore the parallel processing feature in R
- - - - - - - - - - - - - -
Who should go for this course?
The course is designed for all those who want to learn machine learning techniques with implementation in R language, and wish to apply these techniques on Big Data. The following professionals can go for this course:
1. Developers aspiring to be a 'Data Scientist'
2. Analytics Managers who are leading a team of analysts
3. SAS/SPSS Professionals looking to gain understanding in Big Data Analytics
4. Business Analysts who want to understand Machine Learning (ML) Techniques
5. Information Architects who want to gain expertise in Predictive Analytics
6. 'R' professionals who want to capture and analyze Big Data
7. Hadoop Professionals who want to learn R and ML techniques
8. Analysts wanting to understand Data Science methodologies
For more information, please write back to us at [email protected] or call us at IND: 9606058406 / US: 18338555775 (toll free).
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Customer Reviews:
Gnana Sekhar Vangara, Technology Lead at WellsFargo.com, says, "Edureka Data science course provided me a very good mixture of theoretical and practical training. The training course helped me in all areas that I was previously unclear about, especially concepts like Machine learning and Mahout. The training was very informative and practical. LMS pre-recorded sessions and assignments were very good as there is a lot of information in them that will help me in my job. The trainer was able to explain difficult to understand subjects in simple terms. Edureka is my teaching GURU now...Thanks EDUREKA and all the best. "

Views: 84926
edureka!

In this video you will learn what multinomial logistic regression is and how to perform it in R. It is similar to logistic regression, but the target variable has more than two categories.
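Multinomial logistic regression gives each class $k$ a linear score $z_k = b_k + w_k \cdot x$ and converts the scores into probabilities with the softmax function, $P(y = k \mid x) = e^{z_k} / \sum_j e^{z_j}$. The Python sketch below shows only this prediction step, using hypothetical weights rather than fitted ones. R's `nnet::multinom()` estimates the coefficients and reports them relative to a baseline class, which is equivalent to fixing that class's score at zero.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """weights: one coefficient list per class; biases: one intercept per class."""
    scores = [
        b + sum(w_j * x_j for w_j, x_j in zip(w, x)) for w, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical coefficients for 3 classes and 2 features.
W = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # class 0 acts as the baseline
B = [0.0, 0.0, 0.0]
probs = predict_proba(W, B, [2.0, 0.5])
print(probs, probs.index(max(probs)))
```

With exactly two classes and a baseline fixed at zero, the softmax reduces to the ordinary logistic (sigmoid) function.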

Views: 10335
Analytics University

RandomForests are currently one of the top-performing algorithms for data classification and regression. Although they can be difficult to interpret, RandomForests are widely popular because of their ability to classify large amounts of data with high accuracy.
In this video I show how to import a Landsat image into R and how to extract pixel data to train and fit a RandomForests model. I also explain how to conduct image classification and how to speed it up through parallel processing.
See this post on my blog for more info: http://amsantac.co/blog/en/2015/11/28/classification-r.html
This video shows how to implement this R-based RandomForests algorithm for image classification in QGIS: https://youtu.be/-6Hsase6xQw
Remember to subscribe to my channel on YouTube for more videos!

Views: 22750
Alí Santacruz

Also called Classification and Regression Trees (CART) or just trees.
R file: https://goo.gl/Kx4EsU
Data file: https://goo.gl/gAQTx4
Includes,
- Illustrates the process using cardiotocographic data
- Decision tree and interpretation with party package
- Decision tree and interpretation with rpart package
- Plot with rpart.plot
- Prediction for the validation dataset based on a model built using the training dataset
- Calculation of misclassification error
Decision trees are an important tool for developing classification or predictive analytics models related to analyzing big data or data science.

Views: 53591
Bharatendra Rai

Provides concepts and steps for applying the kNN algorithm to classification and regression problems.
R code: https://goo.gl/FqpxWK
Data file: https://goo.gl/D2Asm7
More ML videos: https://goo.gl/WHHqWP

Views: 5233
Bharatendra Rai

Regression trees are part of the CART family of techniques for predicting a numerical target feature. Here we use the package rpart, with its CART algorithms, in R to learn a regression tree model on the `msleep` data set available in the ggplot2 package.

Views: 39494
Jalayer Academy

In this video I talk about how you can implement the kNN (k-nearest neighbor) algorithm in R with the help of an example data set freely available on the UCI Machine Learning Repository.
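The core of kNN classification is a distance computation followed by a majority vote. This Python sketch assumes Euclidean distance and plain (unweighted) voting. The function name and training points are hypothetical. In R, `class::knn()` provides a ready-made implementation.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point with its label and sort by Euclidean distance.
    neighbours = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )
    votes = Counter(label for _, label in neighbours[:k])
    # most_common lists equal counts in first-seen order, so a tie goes to
    # the tied class whose member is closest to the query.
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training data with two classes.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (1.5, 1.5)), knn_predict(X, y, (8.5, 8.5)))
```

kNN has no training step: all the work happens at prediction time. Because of that, and because distances depend on feature scales, features are usually normalized first.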

Views: 39382
Data Science Tutorials

This video shows you how to fit classification decision trees using R.

Views: 107832
Abbass Al Sharif

Provides sentiment analysis and steps for making word clouds with R using tweets about Apple obtained from Twitter.
Link to R and csv files:
https://goo.gl/B5g7G3
https://goo.gl/W9jKcc
https://goo.gl/khBpF2
Topics include:
- reading data obtained from Twitter in a csv format
- cleaning tweets for further analysis
- creating term document matrix
- making wordcloud, lettercloud, and barplots
- sentiment analysis of Apple tweets before and after the quarterly earnings report

Views: 16780
Bharatendra Rai

This video shows how to do time series decomposition in R.
• Discuss an example of time series data
• Show how to do log transformation of data
• Show how to do decomposition of additive time series

Views: 4495
Packt Video

Logistic regression is one of the most widely used classification ML techniques. This vlog introduces you to the concept and also helps you build your first model, score it, and judge it in R.
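Logistic regression models $P(y = 1 \mid x) = 1/(1 + e^{-(\beta_0 + \beta_1 x)})$. The Python sketch below fits one predictor by batch gradient descent on the log-loss. Gradient descent is used here only because it is short to write; R's `glm(y ~ x, family = binomial)` fits the same model by iteratively reweighted least squares. The learning rate, iteration count, function names, and data are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, n_iter=5000):
    """Fit (b0, b1) for one feature by gradient descent on the mean log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # derivative of log-loss w.r.t. the score
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical data: larger x tends to mean class 1.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(b0, b1, [round(sigmoid(b0 + b1 * x), 3) for x in xs])
```

This toy data is perfectly separable, so the coefficients would keep growing if the loop ran forever. The fixed iteration count stops them at a finite value. A probability of at least 0.5 is then usually scored as class 1.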

Views: 1366
Keshav Singh

Find the terms here:
http://ptrckprry.com/course/ssd/data/positive-words.txt
http://ptrckprry.com/course/ssd/data/negative-words.txt

Views: 10834
Jalayer Academy

Analytics Accelerator Program, February 2016-April 2016 batch

Views: 24848
Equiskill Insights LLP

Provides steps for carrying out time-series analysis with R and covers the clustering stage.
Previous video - time-series forecasting: https://goo.gl/wmQG36
Next video - time-series classification: https://goo.gl/w3b55p

Views: 456
Bharatendra Rai

Code on Github: https://github.com/msterkel/text-analysis
Twitter API tutorial: https://analytics4all.org/2016/11/16/r-connect-to-twitter-with-r/

Views: 1350
Matthew Sterkel

This Support Vector Machine in R tutorial video will help you understand what machine learning is, what classification is, what a Support Vector Machine (SVM) is, and what an SVM kernel is. You will also see a use case in which we classify horses and mules from a given data set using the SVM algorithm. In SVM classification, each observation is plotted as a point in an n-dimensional space (where n is the number of features you have), with the value of each feature giving one coordinate. A line (in higher dimensions, a hyperplane) is then used to split the points into classes, and the SVM chooses the hyperplane that leaves the widest possible margin between the classes. SVMs are very versatile: they can perform linear or nonlinear classification (the latter through kernels), regression, and outlier detection. Now, let us get started and understand Support Vector Machine in detail.
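To make the margin idea concrete, here is a minimal Python sketch of a linear soft-margin SVM trained in the primal. It minimizes $\frac{\lambda}{2}\lVert w\rVert^2 + \frac{1}{n}\sum_i \max(0,\, 1 - y_i(w\cdot x_i + b))$ by full-batch subgradient descent. This is an illustrative substitute, not what the video uses: R's `e1071::svm()` calls libsvm, which solves the dual problem and supports kernels. The hyperparameters, function names, and toy points (a stand-in for the horse/mule features) are hypothetical.

```python
def train_linear_svm(xs, ys, lam=0.01, lr=0.01, epochs=2000):
    """Fit (w, b) for labels in {-1, +1} by full-batch subgradient descent
    on (lam / 2) * ||w||^2 + mean hinge loss."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        # Subgradient of the regulariser (the bias is left unregularised).
        gw = [lam * wj for wj in w]
        gb = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge is active
                for j in range(d):
                    gw[j] -= y * x[j] / n
                gb -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict_svm(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable 2-D data.
X = [[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print(w, b, [predict_svm(w, b, x) for x in X])
```

Only the points with margin below 1 (the support vectors and any violators) contribute to each update. That is why the final hyperplane depends on the points nearest the boundary rather than on all of the data.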
Below topics are explained in this "Support Vector Machine in R" video:
1. What is machine learning?
2. What is classification?
3. What is support vector machine?
4. Understanding support vector machine
5. Understanding SVM kernel
6. Use case: classifying horses and mules
To learn more about Data Science, subscribe to our YouTube channel: https://www.youtube.com/user/Simplilearn?sub_confirmation=1
You can also go through the Slides here: https://goo.gl/w72XBR
Watch more videos on Data Science: https://www.youtube.com/watch?v=0gf5iLTbiQM&list=PLEiEAq2VkUUIEQ7ENKU5Gv0HpRDtOphC6
#DataScienceWithR #DataScienceCourse #DataScience #DataScientist #BusinessAnalytics #MachineLearning
Become an expert in data analytics using the R programming language in this data science certification training course. You’ll master data exploration, data visualization, predictive analytics and descriptive analytics techniques with the R language. With this data science course, you’ll get hands-on practice on R CloudLab by implementing various real-life, industry-based projects in the domains of healthcare, retail, insurance, finance, airlines, music industry, and unemployment.
Why learn Data Science with R?
1. This course is an ideal package for aspiring data analysts who want to build a successful career in analytics/data science. By the end of this training, participants will have a 360-degree overview of business analytics and R, having mastered concepts like data exploration, data visualization, and predictive analytics.
2. According to marketsandmarkets.com, the advanced analytics market will be worth $29.53 billion by 2019
3. Wired.com points to a report by Glassdoor that the average salary of a data scientist is $118,709
4. Randstad reports that pay hikes in the analytics industry are 50% higher than in IT
The Data Science Certification with R has been designed to give you in-depth knowledge of the various data analytics techniques that can be performed using R. The data science course is packed with real-life projects and case studies, and includes R CloudLab for practice.
1. Mastering R language: The data science course provides an in-depth understanding of the R language, RStudio, and R packages. You will learn the various types of apply functions as well as the dplyr package, gain an understanding of data structures in R, and perform data visualizations using the various graphics available in R.
2. Mastering advanced statistical concepts: The data science training course also includes various statistical concepts such as linear and logistic regression, cluster analysis, and forecasting. You will also learn hypothesis testing.
3. As a part of the data science with R training course, you will be required to execute real-life projects using CloudLab. The compulsory projects are spread over four case studies in the domains of healthcare, retail, and the Internet. Four additional projects are also available for further practice.
The Data Science with R course is recommended for:
1. IT professionals looking for a career switch into data science and analytics
2. Software developers looking for a career switch into data science and analytics
3. Professionals working in data and business analytics
4. Graduates looking to build a career in analytics and data science
5. Anyone with a genuine interest in the data science field
6. Experienced professionals who would like to harness data science in their fields
Learn more at: https://www.simplilearn.com/big-data-and-analytics/data-scientist-certification-sas-r-excel-training?utm_campaign=Support-Vector-Machine-in-R-QkAmOb1AMrY&utm_medium=Tutorials&utm_source=youtube
For more information about Simplilearn courses, visit:
- Facebook: https://www.facebook.com/Simplilearn
- Twitter: https://twitter.com/simplilearn
- LinkedIn: https://www.linkedin.com/company/simplilearn/
- Website: https://www.simplilearn.com
Get the Android app: http://bit.ly/1WlVo4u
Get the iOS app: http://apple.co/1HIO5J0

Views: 6748
Simplilearn

Learn more about credit risk modeling in R: https://www.datacamp.com/courses/introduction-to-credit-risk-modeling-in-r
We have now removed the observation containing a bivariate outlier for age and annual income from the data set. What we did not discuss before is that there are missing inputs (NAs, which stand for "not available") for two variables: employment length and interest rate. In this video, we demonstrate some methods for handling missing data on the employment length variable. You'll then practice these methods yourself on the interest rate variable.
First, you want to know how many inputs are missing, as this will affect what you do with them. A simple way of finding out is with the function summary(). If you do this for employment length, you will see that there are 809 NAs.
There are generally three ways to treat missing inputs: delete them, replace them, or keep them. We will illustrate these methods on employment length. When deleting, you can either delete the observations where missing inputs are detected, or delete an entire variable. Typically, you would only want to delete observations if there is just a small number of missing inputs, and would only consider deleting an entire variable when many cases are missing.
Combining which() and is.na() identifies the rows with missing employment length, and these rows are deleted in the new data set loan_data_no_NA. To delete the entire employment length variable, you simply set it equal to NULL in the loan data. Here, we save the result to a copy of the data set called loan_data_delete_employ. Making a copy of your original data before deleting things is a good way to avoid losing information, but it may be costly when working with very large data sets.
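The counting and deletion steps above can be sketched in Python as an illustration (this is not the course's R code). The four-row `loan_data` is a hypothetical stand-in, with `None` playing the role of R's NA; only `emp_length` comes from the course, while `loan_amnt`, `int_rate` and all values are made up.

```python
# Hypothetical stand-in for loan_data; None plays the role of R's NA.
loan_data = [
    {"loan_amnt": 5000, "emp_length": 10, "int_rate": 10.65},
    {"loan_amnt": 2400, "emp_length": None, "int_rate": None},
    {"loan_amnt": 10000, "emp_length": 3, "int_rate": 13.49},
    {"loan_amnt": 5000, "emp_length": None, "int_rate": 7.90},
]


def count_missing(rows, column):
    """Count missing entries in one column (like the NA count from summary())."""
    return sum(1 for row in rows if row[column] is None)


def drop_missing_rows(rows, column):
    """Delete the observations whose value in `column` is missing."""
    return [row for row in rows if row[column] is not None]


def drop_variable(rows, column):
    """Delete an entire variable, building new dicts so loan_data is left intact."""
    return [{key: value for key, value in row.items() if key != column} for row in rows]


n_missing = count_missing(loan_data, "emp_length")
loan_data_no_NA = drop_missing_rows(loan_data, "emp_length")
loan_data_delete_employ = drop_variable(loan_data, "emp_length")
```

Because `drop_variable` builds new dictionaries, the original rows keep their `emp_length`, mirroring the advice to work on a copy.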
Second, when replacing missing values, common practice is to replace them with the median of the values that are actually observed. This is called median imputation.
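A minimal Python sketch of median imputation, assuming missing values are stored as `None`; the employment lengths are hypothetical.

```python
from statistics import median


def median_impute(values):
    """Replace missing (None) entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill_value = median(observed)
    return [fill_value if v is None else v for v in values]


emp_length = [10, None, 3, 6, None, 1]  # hypothetical employment lengths
emp_length_imputed = median_impute(emp_length)
```

Here the observed values are 1, 3, 6 and 10, so both gaps are filled with their median, 4.5. The median is used rather than the mean because it is not pulled around by extreme values.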
Last, you can keep the missing values, since in some cases, the fact that a value is missing is important information. Unfortunately, keeping the NAs as such is not always possible, as some methods will automatically delete rows with NAs because they cannot deal with them. So how can we keep NAs? A popular solution is coarse classification.
Using this method, you basically put a continuous variable into so-called bins. Let's start by making a new variable emp_cat, which will replace emp_length. The employment length in our data set ranges from 0 to 62 years. We will put employment length into bins of roughly 15 years, with groups 0-15, 15-30, 30-45, 45+, and a "missing" group representing the NAs. Let's see how this changes our data.
Let's look at the plot of this new factor variable. It appears that the bin '0-15' contains a very high proportion of the cases, so it might seem more reasonable to look at bins of different ranges but with similar frequencies, as shown here. You can get these results by trial and error for different bin ranges, or by using quantile functions to know exactly where the breaks should be to get more balanced bins.
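A Python sketch of coarse classification, including a "missing" bin and quantile-based breaks. It assumes bins closed on the left and open on the right ([0, 15), [15, 30), and so on). R's cut() closes intervals on the right by default, so values exactly on a boundary may land in a different bin there. The employment lengths are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def coarse_classify(values, cut_points, labels, missing_label="missing"):
    """Map each value to a bin label; None (NA) gets its own category.

    cut_points [15, 30, 45] define bins [.., 15), [15, 30), [30, 45), [45, ..).
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return [missing_label if v is None else labels[bisect_right(cut_points, v)]
            for v in values]


# Hypothetical employment lengths in years; None marks a missing input.
emp_length = [0, 1, 2, 3, 5, 8, 12, 20, 33, 62, None]

# Equal-width bins of 15 years, plus a "missing" group.
emp_cat = coarse_classify(emp_length, [15, 30, 45], ["0-15", "15-30", "30-45", "45+"])

# Quartile-based cut points give bins with roughly equal frequencies.
observed = [v for v in emp_length if v is not None]
quartile_cuts = quantiles(observed, n=4)
emp_cat_balanced = coarse_classify(emp_length, quartile_cuts, ["Q1", "Q2", "Q3", "Q4"])
```

With these values, seven of the ten observed lengths fall into the 0-15 bin, while the quartile cut points spread them 2/3/3/2. This reproduces in miniature the imbalance described above.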
Before trying all of this in R yourself, let me finish the video with a couple of remarks. First, all the methods for missing data handling can also be applied to outliers. If you think an outlier is wrong, you can treat it as NA and use any of the methods we have discussed in this chapter.
Second, you may have noticed I only talked about missingness for continuous variables in this chapter. What about factor variables? Here's the basic approach. For categorical variables, deletion works in exactly the same way as for continuous variables: you delete either observations or entire variables. When we wish to replace missing values of a factor variable, we assign them to the modal class, which is the class with the highest frequency. Keeping NAs for a categorical variable is done by including a missing category.
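The two factor-variable options (modal-class replacement and an explicit missing category) can be sketched in Python as follows. The `home_ownership` variable and its levels are hypothetical examples, not taken from the course.

```python
from collections import Counter


def mode_impute(values):
    """Replace missing entries with the modal class (most frequent observed level)."""
    counts = Counter(v for v in values if v is not None)
    modal_class = counts.most_common(1)[0][0]  # on ties, the level seen first wins
    return [modal_class if v is None else v for v in values]


def add_missing_category(values, label="missing"):
    """Keep NAs by turning them into an explicit category of their own."""
    return [label if v is None else v for v in values]


home_ownership = ["RENT", "OWN", None, "RENT", "MORTGAGE", None, "RENT"]  # hypothetical
replaced = mode_impute(home_ownership)
kept = add_missing_category(home_ownership)
```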
Now, let's try some of these methods yourself!

Views: 5197
DataCamp

1. Download the example code for cross-validation using caret for machine learning classification and regression training: https://drive.google.com/open?id=1uCUDvwJE0RYSmejg22aES6AmkXbLG--h
2. Download source data T2DRecords.csv link: https://drive.google.com/open?id=1MabU6pqYUacl2WbzwMuEfuQUw2_PAL2A
3. In the caret package, if you encounter "Error: package e1071 is required", simply run install.packages("e1071") to install the missing package.
4. Use as.factor() and levels() to transform numeric values into factors with different levels (starting at 6:20 in the video).
Related videos:
1. Use R to build ROC curve and measure a model's accuracy: https://www.youtube.com/watch?v=TZwI0XgcphM
2. Data partition with oversampling in R: https://www.youtube.com/watch?v=UFaZvynajtI
3. Cross Validation for Data with Imbalanced Classes: https://youtu.be/b1IAyZM6WAA

Views: 345
The Data Science Show

This video is a sample from Skillsoft's video course catalog. In this video, Steve Scott walks you through how to create a basic classification tree with the tree package in R.
Steve Scott has been a software developer and IT Consultant for 16 years. Steve's career has been spent serving clients across the globe, responsible for building software architecture, hiring development teams, and solving complex problems through code. Now with a toolbox of languages, platforms, tools, and APIs, Steve rounds out his coding background with ongoing formal study in Mathematics and Computer Science at Mount Allison University.
Skillsoft is a pioneer in the field of learning with a long history of innovation. Skillsoft provides cloud-based learning solutions for our customers worldwide, who range from global enterprises, government and education customers to mid-sized and small businesses. Learn more at http://www.skillsoft.com.
https://www.linkedin.com/company/skillsoft
http://www.twitter.com/skillsoft
https://www.facebook.com/skillsoft

Views: 4056
Skillsoft YouTube

LDA is surprisingly simple and anyone can understand it. Here I avoid the complex linear algebra and use illustrations to show you what it does so you will know when to use it and how to interpret the results. Sample code for R is at the StatQuest website:
https://statquest.org/2016/07/10/statquest-linear-discriminant-analysis-lda-clearly-explained/
For a complete index of all the StatQuest videos, check out:
https://statquest.org/video-index/
If you'd like to support StatQuest, please consider a StatQuest t-shirt or sweatshirt...
https://teespring.com/stores/statquest
...or buying one or two of my songs (or go large and get a whole album!)
https://joshuastarmer.bandcamp.com/

Views: 114160
StatQuest with Josh Starmer

Text Mining with R. Import a single document into R.

Views: 19547
Jalayer Academy

Quick overview and examples/demos of Support Vector Machines (SVM) using R.
This getting-started video covers the basics of the SVM machine learning algorithm and then goes through a quick demo.

Views: 58617
Melvin L

Provides steps for handling the class imbalance problem when developing classification and prediction models.
Download R file: https://goo.gl/ns7zNm
data: https://goo.gl/d5JFtq
Includes,
- What is Class Imbalance Problem?
- Data partitioning
- Data for developing prediction model
- Developing prediction model
- Predictive model evaluation
- Confusion matrix
- Accuracy, sensitivity, and specificity
- Oversampling, undersampling, synthetic sampling using random over sampling examples
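The evaluation and resampling steps in this list can be sketched in plain Python (the video works in R, and none of this is its code). The example shows why accuracy alone misleads on imbalanced data: a model that always predicts the majority class scores 90% accuracy but 0% sensitivity. `random_oversample` is simple random oversampling (duplicating minority rows). It does not implement the synthetic sampling the video also covers. All data are hypothetical.

```python
import random
from collections import Counter


def accuracy_sensitivity_specificity(actual, predicted, positive=1):
    """Compute metrics from the 2x2 confusion matrix for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # share of actual positives detected
    specificity = tn / (tn + fp)  # share of actual negatives detected
    return accuracy, sensitivity, specificity


def random_oversample(rows, labels, seed=42):
    """Duplicate random rows of each smaller class until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_rows, new_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - n):
            new_rows.append(rng.choice(members))
            new_labels.append(cls)
    return new_rows, new_labels


# Hypothetical imbalanced data: 9 negatives (0) and 1 positive (1).
actual = [0] * 9 + [1]
always_majority = [0] * 10
acc, sens, spec = accuracy_sensitivity_specificity(actual, always_majority)

rows = [[i] for i in range(10)]
balanced_rows, balanced_labels = random_oversample(rows, actual)
```

Oversampling should be applied only to the training partition, so that duplicated rows do not leak into the data used for evaluation.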
Predictive models are important machine learning and statistical tools for analyzing big data and working in the data science field.

Views: 13923
Bharatendra Rai

Classification trees are part of the CART family of techniques for prediction. Here we use the rpart package, with its CART algorithms, in R to learn a classification tree model on the 'iris' data set, which is available in every R installation. In this video I also compare our results from rpart with our results from C5.0 in the previous classification tree tutorial video called "

Views: 39857
Jalayer Academy

Learn how to do logistic regression in R. Logistic regression, like decision trees, SVM, random forest, or the probit model, is another classification modelling technique. It is a generalized linear model for a binary dependent variable: it models the log-odds of the outcome as a linear function of the predictors.
For training and study packs on Analytics/Data Science/Big Data, contact us.
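To illustrate the log-odds description above, here is a Python sketch of how a fitted logistic regression turns coefficients into a probability and then a class. The coefficient values are hypothetical, and the 0.5 threshold is a common default rather than anything the video prescribes.

```python
import math


def predict_probability(intercept, coefs, x):
    """P(y = 1 | x): the logistic (sigmoid) function of the linear log-odds."""
    log_odds = intercept + sum(c * xi for c, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-log_odds))


def classify(intercept, coefs, x, threshold=0.5):
    """Turn the predicted probability into a 0/1 class label."""
    return 1 if predict_probability(intercept, coefs, x) >= threshold else 0


# Hypothetical fitted coefficients for two predictors.
intercept, coefs = -1.0, [0.8, -0.5]
p = predict_probability(intercept, coefs, [2.0, 1.0])  # log-odds = 0.1
label = classify(intercept, coefs, [2.0, 1.0])
```

Taking log(p / (1 - p)) recovers the linear predictor. This is what "linear in the log-odds" means, and it is why the model is not fitted by ordinary least squares.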
Find all free videos & study packs available with us here:
http://analyticuniversity.com/
Analytics Study Pack:
Analytics University on Twitter : https://twitter.com/AnalyticsUniver
Analytics University on Facebook : https://www.facebook.com/AnalyticsUniversity
Logistic Regression in R: https://goo.gl/S7DkRy
Logistic Regression in SAS: https://goo.gl/S7DkRy
Logistic Regression Theory: https://goo.gl/PbGv1h
Time Series Theory : https://goo.gl/54vaDk
Time ARIMA Model in R : https://goo.gl/UcPNWx
Survival Model : https://goo.gl/nz5kgu
Data Science Career : https://goo.gl/Ca9z6r
Machine Learning : https://goo.gl/giqqmx
Data Science Case Study : https://goo.gl/KzY5Iu
Big Data & Hadoop & Spark: https://goo.gl/ZTmHOA

Views: 56938
Analytics University

Provides steps for applying artificial neural networks to do classification and prediction.
R file: https://goo.gl/VDgcXX
Data file: https://goo.gl/D2Asm7
Machine Learning videos: https://goo.gl/WHHqWP
Includes,
- neural network model
- input, hidden, and output layers
- min-max normalization
- prediction
- confusion matrix
- misclassification error
- network repetitions
- example with binary data
Neural networks are an important tool for analyzing big data and working in the data science field. Apple has reported using neural networks for face recognition in the iPhone X.
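Two of the listed steps, min-max normalization of the inputs and the misclassification error of the predictions, can be sketched in Python. The raw values and network outputs are hypothetical, and the 0.5 cut-off for turning outputs into classes is an assumption for the example.

```python
def min_max_normalize(values):
    """Rescale a numeric column to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values]


def to_class(outputs, threshold=0.5):
    """Convert network outputs (values between 0 and 1) to 0/1 predictions."""
    return [1 if o > threshold else 0 for o in outputs]


def misclassification_error(actual, predicted):
    """Fraction of cases where the predicted class differs from the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


scores = [2, 4, 6, 10]  # hypothetical raw input values
scaled = min_max_normalize(scores)

outputs = [0.91, 0.40, 0.65, 0.10]  # hypothetical network outputs
actual = [1, 1, 1, 0]
error = misclassification_error(actual, to_class(outputs))
```

Misclassification error is simply 1 minus accuracy. Normalizing first keeps inputs on comparable scales, which helps the network train.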

Views: 26097
Bharatendra Rai

Provides illustration of healthcare analytics using multinomial logistic regression and cardiotocographic data.
R file: https://goo.gl/ty2Jf2
Data: https://goo.gl/kMAh8U
Includes,
- steps for preparing data for the analysis
- use of the nnet package in R
- calculation of probabilities using coefficients from the model
- estimating probabilities using the model
- developing confusion matrix
- calculation of misclassification error
Logistic regression is an important tool for developing classification or predictive analytics models related to analyzing big data or working in data science field.
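To illustrate "calculation of probabilities using coefficients from the model", here is a Python sketch of the baseline-category (multinomial) logit formula. Each non-baseline class k gets the score exp(b0_k + b_k·x). The probability of class k is that score divided by 1 plus the sum of all scores, and the baseline class gets 1 divided by the same denominator. The class names and coefficients are hypothetical, not estimates from the cardiotocographic data.

```python
import math


def baseline_category_probabilities(coefficients, x, baseline):
    """Class probabilities for a baseline-category (multinomial) logit model.

    coefficients maps each non-baseline class to (intercept, [slopes]);
    the baseline class has all coefficients fixed at zero.
    """
    exp_scores = {
        cls: math.exp(b0 + sum(b * xi for b, xi in zip(slopes, x)))
        for cls, (b0, slopes) in coefficients.items()
    }
    denominator = 1.0 + sum(exp_scores.values())
    probs = {baseline: 1.0 / denominator}
    for cls, score in exp_scores.items():
        probs[cls] = score / denominator
    return probs


# Hypothetical coefficients for classes "2" and "3" relative to baseline class "1".
coefficients = {"2": (-2.0, [0.05]), "3": (-4.0, [0.08])}
probs = baseline_category_probabilities(coefficients, [30.0], baseline="1")
predicted_class = max(probs, key=probs.get)
```

The predicted class is the one with the highest probability. Comparing these predictions with the actual classes gives the confusion matrix and misclassification error from the list.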

Views: 48300
Bharatendra Rai

Random Forest Overview and Demo in R (for classification). See previous videos
- What:
An ensemble learning method for classification and regression
Operate by constructing a multitude of decision trees
- Why use Random Forest:
Reasonably fast and very easy to use
Handles sparse data/missing data well
Reduces the overfitting seen in single decision trees
- How:
Tree bagging - random sample with replacement
Random subset of the features.
Voting
- Demo
Using the randomForest package
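The three "How" ingredients above can be sketched in Python as separate building blocks. This is not a full random forest: no trees are grown here. The seed, the sizes and the votes are hypothetical. The default of about sqrt(p) candidate features per split is an assumption, though it is a common choice for classification.

```python
import math
import random
from collections import Counter


def bootstrap_indices(n_rows, rng):
    """Tree bagging: draw n_rows row indices at random, with replacement."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def random_feature_subset(n_features, rng, m=None):
    """Pick m distinct candidate features for one split (default: about sqrt(p))."""
    if m is None:
        m = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), m))


def majority_vote(tree_predictions):
    """Combine the class predicted by each tree into the forest's prediction."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(7)
rows_for_one_tree = bootstrap_indices(10, rng)
features_for_one_split = random_feature_subset(4, rng)
forest_prediction = majority_vote(["setosa", "versicolor", "setosa", "virginica", "setosa"])
```

Each tree is grown on its own bootstrap sample and considers a fresh random subset of features at every split. Averaging many such decorrelated trees by voting is what reduces the overfitting of a single tree.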

Views: 31821
Melvin L

In this module we introduce the k-nearest neighbors (kNN) model in R using the famous iris data set. We also introduce random number generation, splitting the data set into training and test data, and normalizing our numerical features (a form of rescaling needed by distance-based learning algorithms such as kNN).
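A Python sketch of the same workflow (the module itself uses R). It shows a seeded train/test split and min-max normalization fitted on the training data, followed by kNN classification with Euclidean distance and a majority vote among the k nearest points. The iris-like measurements are hypothetical toy values, and k = 3 is an arbitrary choice.

```python
import math
import random
from collections import Counter


def train_test_split(rows, labels, test_fraction=0.3, seed=1234):
    """Shuffle with a seeded random number generator, then split."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)
    n_test = round(len(rows) * test_fraction)
    test, train = order[:n_test], order[n_test:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def fit_min_max(rows):
    """Per-feature (min, max), learned from the training data only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, ranges):
    """Rescale every feature to [0, 1] using the training ranges."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, ranges)] for row in rows]


def knn_predict(train_x, train_y, query, k=3):
    """Majority class among the k training points closest to query."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical iris-like rows: sepal length, sepal width, petal length, petal width.
train_raw = [[5.0, 3.4, 1.5, 0.2], [4.9, 3.0, 1.4, 0.2], [5.1, 3.5, 1.4, 0.3],
             [6.4, 3.2, 4.5, 1.5], [6.9, 3.1, 4.9, 1.5], [5.5, 2.3, 4.0, 1.3]]
train_y = ["setosa"] * 3 + ["versicolor"] * 3
ranges = fit_min_max(train_raw)
train_x = apply_min_max(train_raw, ranges)
queries = apply_min_max([[5.0, 3.3, 1.4, 0.2], [6.5, 2.9, 4.6, 1.4]], ranges)
predictions = [knn_predict(train_x, train_y, q) for q in queries]
```

The normalization ranges come from the training data alone, so the test data do not influence the scaling. Without rescaling, features with larger numeric ranges would dominate the distances.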

Views: 91930
Jalayer Academy