This data is released under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license.
The code requires two things: an internet connection and a Kaggle account. Kaggle is a platform for running data science competitions; you can learn more about it and its competitions at Kaggle.com.
Kaggle provides an online data repository for the datasets posted to its site. To use the code, you’ll need to be logged into a Kaggle account.
To start:
Go to Kaggle and create a free account.
Select the “Competitions” tab, select “Create Competition”, and select “Machine Learning” as the type of competition. Kaggle users can also select “Data Science” as a competition type.
Select “Create a new competition” and select “Create”.
(If you already have a Kaggle account, you can use your existing login for this tutorial.)
In the “Options” section, select “Submission Instructions”, select “No Data Sources”, and select “No External Scripts”.
Select “Run the competition”.
The code is now ready to be run.
If you want to submit the notebook to Kaggle’s competitions platform, you can do so from the Competitions tab.
The code is available on GitHub.
How to run the notebook
Load data from Kaggle
The first step is to copy and paste this into a new code cell in a new notebook:
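%matplotlib inline
# Notebook display helpers
from IPython.display import Image, display
# Text processing and classification with NLTK
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import stopwords
# General-purpose data handling and plotting
import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Feature extraction, models, and evaluation with scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB, BernoulliNB
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split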
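If the import cell fails, the usual cause is a package that is not installed in the notebook environment. The following is a minimal sketch, not part of the original notebook, that checks for the top-level packages the imports above rely on. The helper name missing_packages and the REQUIRED list are illustrative assumptions; the check uses only the Python standard library.

```python
import importlib.util

# Top-level packages used by the import cell above.
# This list is an assumption derived from those imports.
REQUIRED = ["IPython", "nltk", "numpy", "pandas", "matplotlib", "sklearn"]


def missing_packages(names):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

Run it in a separate cell before the imports. Any name it reports needs to be installed before the notebook will run. Note that the scikit-learn package is imported as sklearn.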