
Smartphone Specs

Date

September 12th, 2023

This project analyzes the specifications of the different smartphones on the market to pick the one that best matches what a customer might be looking for.

The Problem

There are a lot of smartphones available for people to buy. The number of different features, brands, and prices can feel overwhelming for people looking to buy a new phone. The purpose of this project is to highlight the phones that most people will want to buy based on the features and ratings of these different phones. The following questions are a good representation of the questions consumers have when getting a new phone:

  • Do more expensive phones tend to have higher ratings?

  • Which brands have the cheapest 5G phone?

  • Which phones have the best cameras for the best price?

The Data

The data I will be using for this project was found on Kaggle. This dataset contains many different features for a wide range of smartphone models. It has 22 columns and 980 rows. The columns include, but are not limited to, a phone's model, brand, price, whether it has 5G capabilities, and its camera and screen resolutions. It also has information on a phone's average user rating, processor speed, number of cores, charging time, amount of RAM, and storage capacity.

Preprocessing the Data

When preprocessing the data, I had a lot of work to do. The first step was loading the CSV file into a pandas DataFrame so I could easily manipulate the data. The next step was to remove the columns that were not relevant to the questions being asked. I removed 15 of the 22 columns, which left me with just 7: the brand name, model, price, average user rating, whether or not the phone has 5G, and the front and rear facing camera quality. Next, I multiplied the price column by 0.01 because all the prices were stored as whole numbers without decimal points. Lastly, I checked for null values and found 101 empty cells in the average rating column. The only question that uses this column is the first one, so I only dropped these rows when visualizing the data for it. This way I can still use the non-null data in the other columns for the other questions.
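The steps above can be sketched roughly as follows. This is only an illustration, not the exact code from the project: the column names (`brand_name`, `avg_rating`, `has_5g`, and so on), the helper `preprocess`, and the three sample rows are all assumptions standing in for the real Kaggle file.

```python
import io
import pandas as pd

# Tiny made-up stand-in for the Kaggle CSV; column names and values are assumptions.
RAW_CSV = """brand_name,model,price,avg_rating,has_5g,rear_camera_mp,front_camera_mp,num_cores
BrandA,A1,54999,8.9,True,50,16,8
BrandB,B1,19989,,False,48,8,8
BrandC,C1,16499,7.5,True,200,60,8
"""

# The 7 columns kept for the three questions (hypothetical names).
KEEP = ["brand_name", "model", "price", "avg_rating",
        "has_5g", "rear_camera_mp", "front_camera_mp"]


def preprocess(csv_source):
    """Load the CSV, keep the relevant columns, and rescale the price."""
    df = pd.read_csv(csv_source)
    df = df[KEEP].copy()              # drop every column not in KEEP
    df["price"] = df["price"] * 0.01  # whole-number prices -> two decimal places
    return df


phones = preprocess(io.StringIO(RAW_CSV))

# Count nulls per column without dropping anything yet.
null_counts = phones.isnull().sum()

# Drop unrated phones only in the copy used for the rating question.
rated = phones.dropna(subset=["avg_rating"])

print(null_counts)
print(len(phones), "rows total,", len(rated), "rows with a rating")
```

Keeping `phones` intact and building `rated` separately is what lets the 5G and camera questions still use the rows that have no rating.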

Data Visualization

On the left is a scatter plot displaying the average rating compared with the price of each of the smartphones in the dataset. We want to know if the price of a phone has an impact on its quality, i.e. its average rating. We can see from the plot that there does seem to be a correlation between the price and the rating, but it's not very pronounced. I can say that you can't find a single phone that is priced above $1,000 and has an average rating lower than 7.5 out of 10. This shows that as you go higher in price you will typically find a higher rated phone, but the relationship is not perfectly linear.
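One way to put a number on how strong that relationship is would be a Pearson correlation coefficient, which pandas computes with `Series.corr`. The sketch below uses made-up rows and assumed column names (`price`, `avg_rating`), so the printed value says nothing about the real dataset.

```python
import pandas as pd

# Made-up rows for illustration only; column names are assumptions.
phones = pd.DataFrame({
    "model": ["A1", "B1", "C1", "D1", "E1", "F1"],
    "price": [100.0, 200.0, 400.0, 800.0, 1200.0, 300.0],
    "avg_rating": [6.5, 7.8, 7.2, 8.4, 8.6, None],  # F1 has no rating
})

# Unrated phones are dropped for this question only.
rated = phones.dropna(subset=["avg_rating"])

# Pearson r: +1 is a perfect upward line, 0 is no linear relationship.
r = rated["price"].corr(rated["avg_rating"])
print(f"Pearson r between price and rating: {r:.2f}")
```

A value well above 0 but below 1 would match what the scatter plot suggests: pricier phones tend to rate higher, but not in a straight line. The same `rated` frame could be handed to `rated.plot.scatter(x="price", y="avg_rating")` to draw the plot, assuming matplotlib is installed.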

The bar graph on the right shows the brands with the cheapest 5G phones, ordered from cheapest at the top. The brand with the cheapest 5G phone is Blackview, with Vivo being the second cheapest. The black lines on the bars are error bars. They show the other prices of phones under that same brand.
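The ranking behind a chart like this can be sketched with a filter and a group-by. The brands, prices, and column names below (`brand_name`, `has_5g`) are made up for illustration and are not the project's actual data or code.

```python
import pandas as pd

# Made-up rows for illustration only; column names are assumptions.
phones = pd.DataFrame({
    "brand_name": ["BrandA", "BrandA", "BrandB", "BrandB", "BrandC"],
    "model": ["A1", "A2", "B1", "B2", "C1"],
    "price": [300.0, 150.0, 120.0, 90.0, 200.0],
    "has_5g": [True, True, True, False, True],
})

# Only 5G phones are relevant to this question.
five_g = phones[phones["has_5g"]]

# For each brand, find the row index of its cheapest 5G phone,
# then list those phones from cheapest to most expensive.
cheapest_idx = five_g.groupby("brand_name")["price"].idxmin()
cheapest_5g = five_g.loc[cheapest_idx].sort_values("price").reset_index(drop=True)

# Lowest and highest 5G price per brand, to see the spread within a brand.
price_range = five_g.groupby("brand_name")["price"].agg(["min", "max"])

print(cheapest_5g[["brand_name", "model", "price"]])
print(price_range)
```

Note that BrandB's cheaper B2 is excluded because it has no 5G, which is why the filter has to happen before the group-by.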

The bar graph on the left shows the smartphones with the best cameras available on the market. All 5 of these have a 200 megapixel camera on the back as well as a 60 megapixel camera on the front. With that established, we just compare the different models and their price points, with the cheapest at the top. With this diagram we can see that Motorola has the best cameras at the best price with the Moto X30 Pro as well as the Edge 30 Ultra.
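A selection like this can be sketched as two filters followed by a sort. The sketch assumes "best camera" simply means the highest megapixel count, first on the rear camera and then on the front camera; the rows and column names (`rear_camera_mp`, `front_camera_mp`) are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; column names are assumptions.
phones = pd.DataFrame({
    "brand_name": ["BrandA", "BrandB", "BrandC", "BrandD"],
    "model": ["A1", "B1", "C1", "D1"],
    "price": [700.0, 550.0, 900.0, 400.0],
    "rear_camera_mp": [200, 200, 200, 108],
    "front_camera_mp": [60, 60, 32, 60],
})

# Step 1: keep phones whose rear camera matches the highest resolution.
best_rear = phones["rear_camera_mp"].max()
top_rear = phones[phones["rear_camera_mp"] == best_rear]

# Step 2: among those, keep phones with the highest front camera resolution.
best_front = top_rear["front_camera_mp"].max()
top_cameras = top_rear[top_rear["front_camera_mp"] == best_front]

# Step 3: order by price so the cheapest top-camera phone comes first.
best_cameras = top_cameras.sort_values("price").head(5).reset_index(drop=True)

print(best_cameras[["brand_name", "model", "price"]])
```

Megapixels are only one measure of camera quality, so this ranking is as narrow as the definition it starts from.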

Storytelling

The questions that I originally asked were answered for the most part. For the relationship between price and quality, the data was based entirely on 1 to 10 ratings from an unknown source that were averaged together for each smartphone. The problem there is that we don't have any idea how reliable those ratings are, or whether they even came from people who actually used the phone they rated.

Another problem I noticed was that in the second data visualization, there are a lot of unfamiliar brands on the list. This is mostly because they aren't available in the US, which kind of defeats the purpose of this question. In fact, a lot of the phones included in this data are not recognizable to most Americans, which isn't very helpful. Most people stick to the brand they already know, as that usually creates fewer problems when it comes to transferring data between the old phone and the new one.

Impact

These data visualizations could be harmful if someone starts looking at one of the brands that has a cheap 5G phone, only to find out that the specific phone isn't available here in the US. If this dataset had a column listing the countries where each phone is or is not available, that could help me improve these data visualizations greatly.
