Create a dataset: sales of different regions
Simple pie chart
Simple pie chart with percentage labels
3D Pie Chart
Using ggplot2
Using ggplot2 with percentage annotation
Using googleVis
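As a rough illustration of the "pie chart with percentage labels" step, here is a minimal base R sketch. The region names and sales figures are made up for the example and are not taken from the original post.

```r
# Hypothetical regional sales figures, for illustration only
sales <- data.frame(
  region = c("North", "South", "East", "West"),
  amount = c(40, 25, 20, 15)
)

# Share of each region as a rounded percentage of the total
pct <- round(100 * sales$amount / sum(sales$amount), 1)
pie_labels <- paste0(sales$region, " (", pct, "%)")

# Base R pie chart with the percentage labels attached to each slice
pie(sales$amount, labels = pie_labels, main = "Sales by Region")
```

The same `pie_labels` vector could be reused for the annotation text in a ggplot2 version.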
This analysis of global video game sales data was done as part of the discussions on public data sets at Kaggle.com. The data set can be downloaded from the Kaggle public data repository. It provides Global, North America, Europe, Japan and Other-country sales revenue in USD for different video game publishers, along with the Platform, Year of sale and Genre of the video games sold.

Click here to view the Shiny application: VG SALES
Download the GitHub source code: VG Sales Github

Analysis
The analysis can be divided into two sections:
1) Analysis of the data in general
2) Analysis of the data for each publisher

General Data Analysis
1. Top Selling Publishers
R Code:
    library(ggplot2)

    # Count the rows (game titles) per publisher and keep the 20 largest
    sales_publisher <- as.data.frame(table(vgsales$Publisher))
    colnames(sales_publisher) <- c("publisher", "numbers")
    sales_publisher <- sales_publisher[order(-sales_publisher$numbers), ]
    top_20_sales_publisher <- head(sales_publisher, n = 20)

    ggplot(top_20_sales_publisher, aes(x = reorder(publisher, numbers), y = numbers)) +
      geom_bar(stat = "identity", fill = "orange") +
      theme_minimal() +
      coord_flip() +
      geom_text(aes(label = numbers), vjust = 0.5, color = "black", size = 4.0) +
      ylab("Total Number of Sales") +
      xlab("Publisher") +
      ggtitle("Top Selling Publishers")

2. Video Game Releases per Year
The number of video games per year is found by creating a pivot table (a frequency count, one row per game) of the data by year and drawing a bar plot of the counts.
R Code:
    sales_year <- as.data.frame(table(vgsales$Year))
    colnames(sales_year) <- c("Year", "Numbers")
    # Drop the last row (presumably the non-numeric "N/A" year category)
    sales_year <- sales_year[-nrow(sales_year), ]

    ggplot(sales_year, aes(x = Year, y = Numbers)) +
      geom_bar(stat = "identity", fill = "lightgreen") +
      theme(axis.text = element_text(size = 8)) +
      geom_text(aes(label = Numbers), vjust = 0.5, color = "black", size = 4.0) +
      ylab("Total Number of Sales") +
      xlab("Year") +
      ggtitle("Video Game Sales by Year")

3. Video Game Revenue per Year
The total revenue of video game sales in a year is calculated by aggregating global video game sales by year.
R Code:
    # Sum Global_Sales within each year (the original was missing the "-" in "<-")
    sales_year_revenue <- as.data.frame(aggregate(vgsales$Global_Sales, by = list(Year = vgsales$Year), FUN = sum))
    colnames(sales_year_revenue) <- c("Year", "Sales")
    sales_year_revenue <- sales_year_revenue[-nrow(sales_year_revenue), ]

    ggplot(sales_year_revenue, aes(x = Year, y = Sales)) +
      geom_bar(stat = "identity", fill = "magenta") +
      theme(axis.text = element_text(size = 8)) +
      geom_text(aes(label = Sales), vjust = 0.5, color = "black", size = 4.0) +
      ylab("Total Sales Revenue") +
      xlab("Year") +
      ggtitle("Video Game Sales revenue by Year")

4. Top Selling Platforms
Top selling platforms are identified by creating a pivot table of gaming platforms and sorting it in descending order to find the top 20.
R Code:
    # Count the rows (game titles) per platform and keep the 20 largest
    sales_platform <- as.data.frame(table(vgsales$Platform))
    colnames(sales_platform) <- c("platform", "Numbers")
    sales_platform <- sales_platform[order(-sales_platform$Numbers), ]
    top_20_sales_platform <- head(sales_platform, n = 20)

    ggplot(top_20_sales_platform, aes(x = reorder(platform, Numbers), y = Numbers)) +
      geom_bar(stat = "identity", fill = "steelblue") +
      theme_minimal() +
      coord_flip() +
      geom_text(aes(label = Numbers), vjust = 0.5, color = "black", size = 4.0) +
      ylab("Total Number of Sales") +
      xlab("Platform") +
      ggtitle("Top Selling Video Game Platforms")

Analysis by Publisher
The data is filtered by publisher in a Shiny dashboard: the data set is subset to the publisher name selected from the drop-down menu.
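The subsetting step can be sketched in plain R as follows. This is only an illustration: the small `vgsales` stand-in and the `selected_publisher` variable are hypothetical (in the app the value would come from the drop-down input), and the explicit sort is an assumption added so that `head()` returns the best sellers.

```r
# Tiny stand-in for the vgsales data; values are made up for illustration
vgsales <- data.frame(
  Name = c("Game A", "Game B", "Game C", "Game D"),
  Publisher = c("Nintendo", "Sega", "Nintendo", "Nintendo"),
  Global_Sales = c(1.5, 3.2, 8.0, 0.4),
  stringsAsFactors = FALSE
)

# Hypothetical: in the Shiny app this would be the drop-down selection
selected_publisher <- "Nintendo"

# Keep only the selected publisher's rows, best sellers first
vgsales_publisher <- vgsales[vgsales$Publisher == selected_publisher, ]
vgsales_publisher <- vgsales_publisher[order(-vgsales_publisher$Global_Sales), ]
```

The resulting `vgsales_publisher` data frame is what the per-publisher plots work on.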
1. Top Global Selling Games
R Code:
    ggplot(head(vgsales_publisher, n = 20), aes(x = reorder(Name, Global_Sales), y = Global_Sales)) +
      geom_bar(stat = "identity", fill = "steelblue") +
      theme_minimal() +
      coord_flip() +
      geom_text(aes(label = Global_Sales), vjust = 0.5, color = "black", size = 4.0) +
      ylab("Global Sales in Millions of Dollars") +
      xlab("Video Game") +
      ggtitle("Top Global Selling Games")

2. Top Selling Platforms
Sales by platform are found by aggregating sales revenue by platform and drawing pie charts to understand the distribution. This is then repeated for the different regions.
R Code:
    library(googleVis)

    sales_platform_global <- as.data.frame(aggregate(vgsales_publisher$Global_Sales, by = list(Platform = vgsales_publisher$Platform), FUN = sum))
    colnames(sales_platform_global) <- c("platform", "total_Sales")
    # labelvar and numvar must match the renamed column names
    Pie1 <- gvisPieChart(sales_platform_global, labelvar = "platform", numvar = "total_Sales",
                         options = list(title = "Global Sales by Platform", width = 1000, height = 500))

3. Top Selling Genre
Sales by genre are found by aggregating sales revenue by genre and drawing pie charts to understand the distribution. This is then repeated for the different regions.
R Code:
    sales_genre_global <- as.data.frame(aggregate(vgsales_publisher$Global_Sales, by = list(Genre = vgsales_publisher$Genre), FUN = sum))
    colnames(sales_genre_global) <- c("genre", "total_Sales")
    # labelvar and numvar must match the renamed column names
    Pie1 <- gvisPieChart(sales_genre_global, labelvar = "genre", numvar = "total_Sales",
                         options = list(title = "Global Sales by Genre", width = 1000, height = 500))

4. Sales by Year
Sales by year are calculated by aggregating sales for every year. The result is shown as a line chart, and the step is repeated for the different regions.
R Code:
    sales_year_global <- as.data.frame(aggregate(vgsales_publisher$Global_Sales, by = list(Year = vgsales_publisher$Year), FUN = sum))
    colnames(sales_year_global) <- c("Year", "total_sales")
    # Drop the last row (presumably the non-numeric "N/A" year category)
    sales_year_global <- sales_year_global[-nrow(sales_year_global), ]
    line1 <- gvisLineChart(sales_year_global, options = list(title = "Global Sales by Year", width = 1000, height = 500))

Forecasting
Forecasting of the yearly video game sales time series for different publishers is based on the following two forecasting models:
1) ARIMA Model
2) ETS Model
The code and step-by-step procedure for building the models follow the blogs You Canalytics, Analytics Vidhya and Dataiku.

So, we have finally come to the conclusion of the world's biggest sporting event, the Rio Olympics 2016. It started with a grand opening ceremony on August 5th 2016 and came to an end with another amazing closing ceremony on August 21st 2016. Athletes from about 206 countries participated in this great event, which was followed by viewers all around the globe. One distinctive feature of the last few Olympics is that people all over the world share more opinions about athletes and countries through social media. So naturally everyone will be curious to know what most people are talking about on social media.
For the purpose of this study, I have considered tweets about the most popular athletes and about the Rio Olympics 2016 in general. It was impossible to consider all medal winners and athletes from every country, so I have tried to include only the most popular athletes. Tweets were fetched for the time period in which each athlete was competing. For most athletes, 5000-15000 tweets were extracted, depending on their popularity. Finally, sentiment analysis was performed on the tweets and visualized using Shiny and R.

Click here to view the sentiment analysis app: RIO OLYMPICS 2016 APP

The categories considered for this analysis are:
1) Rio Olympics 2016
2) Rio Olympics Opening ceremony
3) Rio Olympics Closing ceremony

USA Athletes
1) Michael Phelps - Swimming
2) Justin Gatlin - Track and Field
3) Ashton Eaton - Track and Field
4) Simone Biles - Gymnastics
5) Katie Ledecky - Swimming
6) Ryan Lochte - Swimming
7) Kayla Harrison - Judo
8) Allyson Felix - Track and Field
9) Brianna Rollins - Track and Field
10) Tori Bowie - Track and Field
11) Nia Ali - Track and Field
12) Haley Anderson - Swimming
13) Jake Dalton - Gymnastics
14) Jeff Henderson - Track and Field
15) Christian Taylor - Track and Field

JAMAICA Athletes
1) Usain Bolt - Track and Field
2) Yohan Blake - Track and Field
3) Elaine Thompson - Track and Field
4) Omar McLeod - Track and Field

UK Athletes
1) Max Whitlock - Gymnastics
2) Andy Murray - Tennis
3) Mo Farah - Track and Field
4) Adam Peaty - Swimming

CANADA Athletes
1) Andre De Grasse - Track and Field
2) Erica Wiebe - Wrestling
3) Derek Drouin - Track and Field

BRAZIL Athletes
1) Neymar - Football

ITALY Athletes
1) Fabio Basile - Judo

AFRICA Athletes
1) Wayde van Niekerk - Track and Field
2) Jemima Jelagat Sumgong - Track and Field

BAHAMAS Athletes
1) Shaunae Miller - Track and Field

INDIA Athletes
1) PV Sindhu - Badminton
2) Sakshi Malik - Wrestling
3) Dipa Karmakar - Gymnastics
4) Abhinav Bindra - Shooting

Detailed code and files can be obtained from my GitHub repository.
Cricket is a very popular sport in many countries, and the International Cricket Council (ICC) conducts World Cup tournaments for two formats of the game, 20 overs and 50 overs. These tournaments are followed by millions of viewers across the globe and normally create a buzz on social media about popular players.
For the purpose of this study, I have considered tweets about various players from different teams during the T20 World Cup, which ran from 2016-03-08 until 2016-04-03. Ten major teams participated; from each team, 1-3 popular players were considered. The tweets about each player during this time frame were extracted, and sentiment analysis and word cloud visualization were performed on each player's tweets using Shiny. For most players a maximum of 5000 tweets were extracted, and for a few popular players a maximum of 10000 tweets were extracted.

Click here to view the app: Cricket T20 Shiny APP

The players considered from the various teams are listed by team:
INDIA
ENGLAND
WEST INDIES
AUSTRALIA
BANGLADESH
PAKISTAN
SOUTH AFRICA
NEW ZEALAND
SRI LANKA
AFGHANISTAN
Extracting the tweets from Twitter
A detailed tutorial on using the twitteR package in R to extract tweets can be found here: Extract tweets in R. Details of the tweet extraction are not provided in this blog.
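Sample code to fetch tweets about the player Virat Kohli is provided below.
    library(twitteR)  # assumes the Twitter session is already authenticated

    # Fetch up to 10000 English tweets from the World Cup period
    kohli_tweets <- searchTwitter('Virat Kohli', since = '2016-03-08', until = '2016-04-03', n = 10000, lang = "en")
    # Keep only the text of each tweet
    kohli_tweets <- sapply(kohli_tweets, function(x) x$getText())
Detailed code can be obtained from my GitHub.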
Cleaning the tweets requires the following steps:
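A minimal sketch of commonly used tweet-cleaning steps is shown here. The function name `clean_tweets` and the exact set of steps are assumptions for illustration only; the author's own cleaning functions are in the GitHub repository and may differ.

```r
# Hypothetical helper: typical tweet-cleaning steps, not the author's code
clean_tweets <- function(x) {
  x <- gsub("http[^[:space:]]*", "", x)         # remove URLs
  x <- gsub("@\\w+", "", x)                     # remove @mentions
  x <- gsub("\\bRT\\b", "", x, perl = TRUE)     # remove the retweet marker
  x <- gsub("[[:punct:]]", "", x)               # remove punctuation
  x <- gsub("[[:digit:]]", "", x)               # remove digits
  x <- gsub("[[:space:]]+", " ", x)             # collapse repeated whitespace
  tolower(trimws(x))                            # trim the ends and lower-case
}

cleaned <- clean_tweets("RT @fan: Kohli scores 82! http://t.co/abc")
```

Mentions are removed before punctuation so that the `@` sign is still available to identify user names.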
Three separate functions are created for the entire cleaning of the tweets; the code can be obtained from GitHub.

Sentiment classification
Classification of sentiments can be done using the 'sentiment' package in R. First convert the tweets into a data frame.
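The conversion might look like the sketch below. The two sample tweets are made up, and the column name `tweets` is chosen to match the `kohli_tweets$tweets` references in the classification code.

```r
# Stand-in for the character vector of tweet texts (made-up examples)
kohli_tweets <- c("What an innings by Kohli", "Kohli out early today")

# One row per tweet; keep the text as character rather than factor
kohli_tweets <- data.frame(tweets = kohli_tweets, stringsAsFactors = FALSE)
```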
Sample code is given below:
    library(RCurl)
    require(sentiment)

    ### Tweets classification
    # classify emotion
    class_emo = classify_emotion(kohli_tweets$tweets, algorithm = "bayes", prior = 1.0)
    # get emotion best fit
    emotion = class_emo[, 7]
    # classify polarity
    class_pol = classify_polarity(kohli_tweets$tweets, algorithm = "bayes")
    # get polarity best fit
    polarity = class_pol[, 4]
Repeat this procedure for all the players to classify the emotions of their tweets.

Sentiment Score Classification
We can generate a sentiment score for each tweet by comparing its words with lexicons of positive and negative words: the score is the number of positive matches minus the number of negative matches.
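To make the scoring rule concrete, here is a small worked example in base R. The mini lexicons and the sample tweet are made up for illustration; the real analysis uses the full opinion lexicon files.

```r
# Made-up mini lexicons, for illustration only
pos_words <- c("great", "brilliant", "win")
neg_words <- c("poor", "lost", "bad")

tweet <- "Brilliant knock, great win despite a poor start"

# Normalise: lower-case, strip punctuation, split on whitespace
words <- strsplit(gsub("[[:punct:]]", "", tolower(tweet)), "\\s+")[[1]]

# Score = number of positive matches minus number of negative matches
score <- sum(words %in% pos_words) - sum(words %in% neg_words)
score
```

Here three words match the positive list and one matches the negative list, so the score is 2.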
    # Scan positive words
    opinion.lexicon.pos <- scan("positive-words.txt", what = 'character', comment.char = ';')
    # Scan negative words
    opinion.lexicon.neg <- scan("negative-words.txt", what = 'character', comment.char = ';')

    # Extend the lexicons with a few extra words
    pos.words = c(opinion.lexicon.pos, 'upgrade')
    neg.words = c(opinion.lexicon.neg, 'wait', 'waiting', 'wtf', 'cancellation')

    getSentimentScore = function(sentences, words.positive, words.negative, .progress = 'none')
    {
      require(plyr)
      require(stringr)
      scores = laply(sentences, function(sentence, words.positive, words.negative) {
        # First remove the digits, punctuation characters and control characters
        sentence = gsub('[[:cntrl:]]', '', gsub('[[:punct:]]', '', gsub('\\d+', '', sentence)))
        # Then convert everything to lower case
        sentence = tolower(sentence)
        # Now split each sentence on whitespace
        words = unlist(str_split(sentence, '\\s+'))
        # Get the boolean match of each word with the positive and negative opinion lexicons
        pos.matches = !is.na(match(words, words.positive))
        neg.matches = !is.na(match(words, words.negative))
        # The score is the total of positive matches minus the total of negative matches
        score = sum(pos.matches) - sum(neg.matches)
        return(score)
      }, words.positive, words.negative, .progress = .progress)
      # Return a data frame with the score of each sentence
      return(data.frame(score = scores))
    }

    score <- getSentimentScore(kohli_tweets$tweets, pos.words, neg.words)
    kohli_tweets <- cbind(kohli_tweets, data.frame(emotion, polarity, score))

Shiny Code
global.R
ui.R
server.R