This analysis of global video game sales data was done as part of the discussions on public data sets at Kaggle.com. The data set can be downloaded from the Kaggle public data repository. It provides sales revenue in USD for Global, North America, Europe, Japan and Other countries for different video game publishers. The data also contains the Platform, Year of sale and Genre of the video games sold.

Click here to view the Shiny application ----------> VG SALES
Download GitHub source code --------------> VG Sales Github

Analysis

The analysis can be divided into two sections:
1) Analysis of the data in general
2) Analysis of the data for each publisher.

General Data Analysis
The R code is:

    library(ggplot2)
    # Count the rows (one per game entry) for each publisher
    sales_publisher <- as.data.frame(table(vgsales$Publisher))
    colnames(sales_publisher) <- c("publisher", "numbers")
    # Sort in descending order and keep the top 20
    sales_publisher <- sales_publisher[order(-sales_publisher$numbers), ]
    top_20_sales_publisher <- head(sales_publisher, n = 20)
    ggplot(top_20_sales_publisher, aes(x = reorder(publisher, numbers), y = numbers)) +
      geom_bar(stat = "identity", fill = "orange") +
      theme_minimal() +
      coord_flip() +
      geom_text(aes(label = numbers), vjust = 0.5, color = "black", size = 4.0) +
      ylab("Total Number of Sales") +
      xlab("Publisher") +
      ggtitle("Top Selling Publishers")

2. Video Game Releases per year

The number of video games released each year is found by building a frequency table of the Year column and plotting it as a bar chart.

R Code:

    sales_year <- as.data.frame(table(vgsales$Year))
    colnames(sales_year) <- c("Year", "Numbers")
    # Drop the last row of the table (presumably the non-numeric "N/A" year level)
    sales_year <- sales_year[-nrow(sales_year), ]
    ggplot(sales_year, aes(x = Year, y = Numbers)) +
      geom_bar(stat = "identity", fill = "lightgreen") +
      theme(axis.text = element_text(size = 8)) +
      geom_text(aes(label = Numbers), vjust = 0.5, color = "black", size = 4.0) +
      ylab("Total Number of Sales") +
      xlab("Year") +
      ggtitle("Video Game Sales by Year")

3. Video Game Revenue per year

The total revenue from video game sales in a year is calculated by aggregating global video game sales by year.

R Code:

    # Sum Global_Sales within each year
    sales_year_revenue <- as.data.frame(aggregate(vgsales$Global_Sales, by = list(Year = vgsales$Year), FUN = sum))
    colnames(sales_year_revenue) <- c("Year", "Sales")
    # Drop the last row, as in the previous section
    sales_year_revenue <- sales_year_revenue[-nrow(sales_year_revenue), ]
    ggplot(sales_year_revenue, aes(x = Year, y = Sales)) +
      geom_bar(stat = "identity", fill = "magenta") +
      theme(axis.text = element_text(size = 8)) +
      geom_text(aes(label = Sales), vjust = 0.5, color = "black", size = 4.0) +
      ylab("Total Sales Revenue") +
      xlab("Year") +
      ggtitle("Video Game Sales revenue by Year")

4. Top Selling Platforms

Top-selling platforms are identified by building a frequency table of gaming platforms and sorting it in descending order to find the top 20.
R Code:

    # Count the rows for each platform, sort in descending order and keep the top 20
    sales_platform <- as.data.frame(table(vgsales$Platform))
    colnames(sales_platform) <- c("platform", "Numbers")
    sales_platform <- sales_platform[order(-sales_platform$Numbers), ]
    top_20_sales_platform <- head(sales_platform, n = 20)
    ggplot(top_20_sales_platform, aes(x = reorder(platform, Numbers), y = Numbers)) +
      geom_bar(stat = "identity", fill = "steelblue") +
      theme_minimal() +
      coord_flip() +
      geom_text(aes(label = Numbers), vjust = 0.5, color = "black", size = 4.0) +
      ylab("Total Number of Sales") +
      xlab("Platform") +
      ggtitle("Top Selling Video Game Platforms")

Analysis by Publisher

The data is filtered by publisher in the Shiny dashboard: it is subset to the publisher name selected from the drop-down menu.
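The subsetting step described above can be sketched as follows. This is a minimal illustration, not the dashboard's actual code: the toy `vgsales` data frame, the `selected_publisher` variable (a hypothetical stand-in for the value of the Shiny drop-down) and the descending sort by `Global_Sales` are all assumptions.

```r
# Toy stand-in for the Kaggle data; names and numbers are made up,
# and only the columns used here are included.
vgsales <- data.frame(
  Name = c("Game A", "Game B", "Game C", "Game D"),
  Publisher = c("Publisher X", "Publisher Y", "Publisher X", "Publisher X"),
  Global_Sales = c(10.5, 7.2, 30.1, 4.4),
  stringsAsFactors = FALSE
)

# Hypothetical value of the drop-down menu (in Shiny it would come from `input`).
selected_publisher <- "Publisher X"

# Keep only the rows of the selected publisher.
vgsales_publisher <- vgsales[vgsales$Publisher == selected_publisher, ]

# Sort by global sales, highest first, so that head(..., n = 20)
# returns the top-selling games of that publisher.
vgsales_publisher <- vgsales_publisher[order(-vgsales_publisher$Global_Sales), ]

print(vgsales_publisher)
```

Sorting before calling `head()` matters here: without it, `head(vgsales_publisher, n = 20)` would simply return the first 20 rows in file order rather than the best sellers.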
R Code:

    # Bar chart of the first 20 games of the selected publisher
    ggplot(head(vgsales_publisher, n = 20), aes(x = reorder(Name, Global_Sales), y = Global_Sales)) +
      geom_bar(stat = "identity", fill = "steelblue") +
      theme_minimal() +
      coord_flip() +
      geom_text(aes(label = Global_Sales), vjust = 0.5, color = "black", size = 4.0) +
      ylab("Global Sales in Millions of Dollars") +
      xlab("Video Game") +
      ggtitle("Top Global Selling Games")

2. Top Selling Platforms

Sales by platform are found by aggregating sales revenue by platform and drawing pie charts to understand the distribution. This is then repeated for the different countries.

R Code:

    library(googleVis)
    # Sum Global_Sales within each platform
    sales_platform_global <- as.data.frame(aggregate(vgsales_publisher$Global_Sales, by = list(Platform = vgsales_publisher$Platform), FUN = sum))
    colnames(sales_platform_global) <- c("platform", "total_Sales")
    # labelvar and numvar must match the renamed columns
    Pie1 <- gvisPieChart(sales_platform_global, labelvar = "platform", numvar = "total_Sales", options = list(title = "Global Sales by Platform", width = 1000, height = 500))

3. Top Selling Genre

Sales by genre are found by aggregating sales revenue by genre and drawing pie charts to understand the distribution. This is then repeated for the different countries.

R Code:

    # Sum Global_Sales within each genre
    sales_genre_global <- as.data.frame(aggregate(vgsales_publisher$Global_Sales, by = list(Genre = vgsales_publisher$Genre), FUN = sum))
    colnames(sales_genre_global) <- c("genre", "total_Sales")
    # labelvar and numvar must match the renamed columns
    Pie1 <- gvisPieChart(sales_genre_global, labelvar = "genre", numvar = "total_Sales", options = list(title = "Global Sales by Genre", width = 1000, height = 500))

4. Sales By Year

Sales by year are calculated by aggregating sales for every year. The result is shown as a line chart, and the step is repeated for the different countries.
R Code:

    # Sum Global_Sales within each year for the selected publisher
    sales_year_global <- as.data.frame(aggregate(vgsales_publisher$Global_Sales, by = list(Year = vgsales_publisher$Year), FUN = sum))
    colnames(sales_year_global) <- c("Year", "total_sales")
    # Drop the last row, as in the general analysis
    sales_year_global <- sales_year_global[-nrow(sales_year_global), ]
    line1 <- gvisLineChart(sales_year_global, options = list(title = "Global Sales by Year", width = 1000, height = 500))

Forecasting

Forecasting of the yearly video game sales time series of the different publishers is based on the following two forecasting models:
1) ARIMA Model
2) ETS Model

The code and the step-by-step procedure for building the models follow the blogs You Canalytics, Analytics Vidhya and Dataiku.
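To make the two model types concrete, here is a minimal sketch on a made-up yearly series. It is not the code used in the application. The `forecast` package's `auto.arima()` and `ets()` are the usual tools for these models; to stay runnable without extra packages, this sketch uses base R instead: `arima()` with a hand-picked order, and `HoltWinters()` restricted to simple exponential smoothing as a stand-in for an ETS model. The numbers, the start year and the model order are assumptions for illustration only.

```r
# Made-up yearly sales totals for one publisher (illustration only).
sales <- c(12.1, 15.3, 14.8, 18.2, 21.5, 19.9, 24.3, 26.0, 25.1, 28.7, 30.2, 29.5)
sales_ts <- ts(sales, start = 2000, frequency = 1)

# ARIMA(0,1,1): the first differences are modelled with one moving-average term.
# The order is an arbitrary choice for this sketch, not a tuned model.
arima_fit <- arima(sales_ts, order = c(0, 1, 1))
arima_fc <- predict(arima_fit, n.ahead = 3)$pred

# Simple exponential smoothing (no trend, no seasonal component),
# used here as a base-R stand-in for an ETS model.
ses_fit <- HoltWinters(sales_ts, beta = FALSE, gamma = FALSE)
ses_fc <- predict(ses_fit, n.ahead = 3)

# Three-year-ahead forecasts from each model.
print(arima_fc)
print(ses_fc)
```

Both models as specified here produce flat forecasts, since neither carries a trend term. On real publisher data, a drift or trend component would usually be worth comparing.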