4.1. Frequency Analysis
Once the master table was created, the actual analysis could be conducted. For this we used “WordCloud,” a word cloud utility available in R. A word cloud is a visualization that displays how frequently words appear in a given sample of text, and it works in a simple way: the more frequently a specific word appears in the text, the bigger and bolder it appears in the cloud. The results for our three cases are discussed in the following subsections.
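The counting step behind a word cloud can be sketched as follows. This is a minimal Python illustration rather than the R utility used in the study; the stop-word list, the linear font scaling, and the sample reviews are assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the study's actual filtering is not specified.
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "of", "to", "in", "it"}

def term_frequencies(reviews):
    """Count how often each non-stop-word term occurs across all reviews."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

def font_sizes(counts, min_size=10, max_size=60):
    """Scale counts linearly to font sizes so frequent terms render larger."""
    top = max(counts.values())
    return {
        term: min_size + (max_size - min_size) * n / top
        for term, n in counts.items()
    }

# Made-up reviews, for illustration only.
sample_reviews = [
    "The seat is comfortable and the interior is stylish",
    "Great seat position, great interior",
    "Seat comfort and style",
]

freqs = term_frequencies(sample_reviews)
sizes = font_sizes(freqs)
print(freqs.most_common(3))
```

The most frequent term receives the largest font size, which is the property the word clouds in this section rely on.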
4.1.1. Best Features
Figure 7 shows the best features of the three cars. The most frequently occurring words are represented in this word cloud, with the most frequent and important words located in the center and the least frequent words located on the edges. Hence, the closer a word is to the edge, the less frequent it is. In the case of Hyundai, “seat” is the most frequent word, followed by “interior,” then “style,” and then the rest. In the case of Honda, the most frequent words are, in order, “mpg,” “gas,” “seat,” “comfort,” “mileage,” “dash,” “control,” “display,” “smooth,” “steering,” “wheel,” “econ,” “fun,” and so on. In contrast with Hyundai, where the main advantage was design and style, Honda consumers mostly emphasize characteristics related to value, technology, and movement on the road. In the case of Ford, the most frequent words are “handles,” “seat,” “interior,” “system,” “sync,” “style,” “comfort,” “gas,” “transmission,” “exterior,” “mileage,” and so on.
Figure 7. Best features (Hyundai, Honda, and Ford) using word cloud.
In the case of Hyundai, after filtration only 14 words remain. As shown in the barplot in Figure 8, there are many words relating to appearance. These are “interior” (24), “style” (19), “exterior” (13), “look” (13), and “design” (10). The interpretation of this result is that consumers mostly liked the car design. The most frequent word is “seat,” which occurs 43 times. Because this word has such high frequency, association analysis was conducted to determine its significance. We performed correlation analysis on the most frequent word, as shown in Figure 9. For example, in the case of Hyundai, the term “seat” has a high correlation with words such as “position,” “front,” and “back.” Hence, we can assume that this word refers to the convenience and comfort consumers felt when they sat in a Hyundai Elantra. Consistent with this interpretation, words such as “back,” “comfort,” and “rear” might also refer to comfort, which was one of the best features for consumers. Similarly, the occurrence of words such as “mpg” and “gas” means that consumers were satisfied with Hyundai’s fuel consumption. For the word “control,” the most closely associated word was “steering” with a correlation of 0.78. This means consumers were likely to be satisfied with their control over the movement of a vehicle.
Figure 8. Best features (Hyundai, Honda, and Ford) (barplot).
Figure 9. Correlation graph for the term “seat” (Hyundai).
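The association analysis for a term such as “seat” can be illustrated with a small sketch. The study does not state how the correlations were computed, so this example assumes a Pearson correlation between per-review term counts; the function names, the threshold, and the sample reviews are hypothetical.

```python
import math
import re

def term_vector(reviews, term):
    """Per-review occurrence counts of `term`."""
    return [re.findall(r"[a-z]+", r.lower()).count(term) for r in reviews]

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / math.sqrt(var_x * var_y)

def associations(reviews, target, candidates, min_corr=0.5):
    """Candidate terms whose correlation with `target` is at least `min_corr`."""
    target_vec = term_vector(reviews, target)
    scores = {c: pearson(target_vec, term_vector(reviews, c)) for c in candidates}
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return {term: score for term, score in ranked if score >= min_corr}

# Made-up reviews, for illustration only.
sample_reviews = [
    "front seat position is great",
    "seat position adjusts easily",
    "good mpg on the highway",
    "steering feels light",
]

assoc = associations(sample_reviews, "seat", ["position", "mpg", "steering"])
print(assoc)
```

Terms that tend to appear in the same reviews as the target score close to 1, while terms that appear in different reviews score near or below zero and are filtered out.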
In the case of Honda, the first two words are “mpg” and “gas,” which suggests that consumers regard this car as having very low fuel consumption. Additionally, some consumers seem to find the seats very comfortable. The rest of the words are related to the dashboard and technological features, such as “dash,” “control,” “display,” “steering,” “system,” “bluetooth,” and “econ.” The word “steering” refers more to technology than to holding the road because the correlated words were “wheel,” “control,” “electronic assist,” and “dash.” The word “econ” was correlated with words such as “mode” and “feature.” This is explained by the fact that the Honda Civic has an econ button as a special function, which has become one of its most favored features.
In the case of Ford, the most frequent word is “handles,” which occurred 48 times. Because this word is difficult to interpret on its own, an association analysis was conducted, as shown in Figure 10. We can assume that the word “handles” does not refer to the means by which a thing is held, carried, or controlled, but to how easy the car is to handle on the road. Correlated words such as “turn,” “directions,” “balance,” and “quiet” help to interpret the meaning of this word more precisely. Another frequent word was “sync,” which is correlated with words such as “voice,” “system,” “phone,” “ipod,” “control,” and “navigation.”
Figure 10. Correlation graph for the term “handles” (Ford).
4.1.2. Worst Features
The same analysis was then conducted using reviews that contained the worst features of the three brands. For the Hyundai Elantra, the most frequent word is “mpg,” which occurred 28 times, as shown in Figure 11. Although “mpg” also occurred in the results for best features, the same term can appear in worst features as well. Here, we can assume that many consumers were not satisfied with fuel consumption and that these consumers outnumber those who were satisfied.
Figure 11. Hyundai’s worst features—barplot.
Moreover, if we consider the correlation analysis for “mpg” in Figure 12, we can see that there are highly correlated words such as “show,” “computer,” “onboard,” and “display.” Therefore, we assume there might be some problem related to displaying the mpg on the onboard computer. This hypothesis was checked manually, and it was found that many consumers were complaining about an incorrect mpg display.
Figure 12. Correlation for the term “mpg” (Hyundai).
Consistent with this finding, the correlation for the word “gas” yielded a similar result as shown in Figure 13, so we can assume that the words “estimates” and “misleading” are referring to the same problem.
Figure 13. Correlation for the term “gas” (Hyundai).
The barplot for worst features shows that consumers were unsatisfied with the spare tire (“spare” and “tire” were the most highly correlated pair, with a correlation of 0.89), noise on the road, fog lights, the trunk, mpg efficiency, the mpg display, and the seats. As mentioned previously, “mpg,” “gas,” “fuel,” and “seat” also occurred in the results for best features. This overlap could be accounted for by differing individual preferences. Furthermore, based on the number of occurrences in both groups, mpg and mileage are more likely to be considered poor rather than superior features: the sum of occurrences in best features is 36, whereas in worst features it is 61.
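The comparison of 36 best-feature occurrences against 61 worst-feature occurrences can be expressed as a share. The helper below is a hypothetical illustration in Python; only the two totals come from the text.

```python
def worst_share(best_count, worst_count):
    """Fraction of all mentions that fall in the worst-features reviews."""
    total = best_count + worst_count
    if total == 0:
        raise ValueError("no mentions to compare")
    return worst_count / total

# Totals reported above for the fuel-related terms (Hyundai).
share = worst_share(best_count=36, worst_count=61)
print(f"{share:.1%} of fuel-related mentions are in worst features")
```

A share above one half indicates that the term group is mentioned more often as a drawback than as a strength.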
In the case of Honda, we can see that one of the most frequent words in terms of worst features is “interior,” as shown in Figure 14.
Figure 14. Honda’s worst features using word cloud.
The correlation graph shows that this word is highly correlated with “cheap,” as shown in Figure 15. Therefore, we can assume that some customers did not like the quality and appearance of the interior, viewing it as a drawback rather than an advantage. Additionally, “fabric” is correlated with “interior.” Although the correlation is quite low, we can still assume that reviewers were unsatisfied with the material of the interior. Furthermore, some consumers were not satisfied with the back or front seats. Moreover, fog lights, mirrors, and road noise were among the worst features of the Honda Civic, as shown in Figure 14.
Figure 15. Correlation for the term “interior” (Honda).
In the case of Ford, the worst features were “transmission,” “seat,” “back,” “control,” “fix,” “issue,” “shift,” “rear,” “wheel,” “system,” and so on. The most frequent word, “transmission,” occurred 57 times, considerably more often than the other terms. To understand the transmission flaw in the Ford Focus, an association graph was built, as shown in Figure 16.
Figure 16. Correlation for the term “transmission” (Ford).
Among the terms highly correlated with “transmission” were “severe,” “grinding,” “crunching,” and “bucking.” Therefore, we can assume there is a problem with the transmission, as it is perceived as making strange sounds and being inconvenient to use. In addition, the terms “issue,” “fix,” “manual,” “problem,” and “shift” also correlated with “transmission,” which means that it is probably the most significant problem with the Ford Focus. Ford also seems to have problems in terms of technology. For instance, the terms “control” and “wheel” were highly correlated with “steering,” “device,” “equipment,” “aux,” “cruise,” “dashboard,” and other words, which can be interpreted as Ford exhibiting a deficiency in equipment. There are also consumers who are certainly not satisfied with the seats and space in a cab, both front and back (rear).
4.1.3. Implications and Discussion
Based on the results, we can assume that the biggest strength of the Hyundai Elantra is its design. This is supported by the fact that the most frequent words are related to the car's appearance. These include “interior,” “style,” “exterior,” “look,” and “design.” The worst features for Hyundai appear to be gas consumption and some problems with technology, such as the mpg display on the onboard computer, as well as problems with the spare tire. In contrast to the Hyundai, the biggest strengths of the Honda are low gas consumption, the dashboard, the controls on the steering wheel, and the “econ” mode, which improves fuel efficiency. The worst feature for Honda appears to be its interior, which reviewers described as cheap. Furthermore, they were unsatisfied with the material it was made of. In the case of Ford, the biggest strength appeared to be the handling of the car. This is consistent with the high frequency of the word “handles.” The other best features for Ford were the interior and exterior, quietness during the ride, and the “Ford Sync” system, which allows users to control automotive functions using their voice. The biggest disadvantage for Ford was found to be the transmission. This was supported by the high frequency of the word “transmission” and of other frequent words used in a negative context, such as “issue,” “fix,” “manual,” “problem,” and “shift.” Another important disadvantage relates to technology, specifically a problem with the controls on the steering wheel. In addition, in all three cases one of the most frequent words was “seat,” which appeared in both the “best features” and “worst features” categories, suggesting that reviewers are divided in their opinions. Hence, it can be concluded that it is difficult for any of the three car brands to find favor in the eyes of all consumers.
4.2. Analysis of Car Features Using the Association Rule
According to Edmunds.com, consumers rate eight different features, each of which refers to certain terms and involves some form of standard conception. Without such a standard, each individual might have a different conception of each feature than other individuals.
Performance involves terms such as acceleration, braking, road holding, and shifting.
Comfort relates to front seats, rear seats, getting in/out, and noise/vibration.
Value involves fuel economy, maintenance cost, purchase cost, and resale value.
Interior implies cargo/storage, instrumentation, interior design, and the logic of controls.
Reliability relates to repair frequency, dealership support, engine, and transmission.
Safety consists of headlights, outward visibility, parking aids, and rain/snow traction.
Technology stands for entertainment, navigation, Bluetooth, and USB ports.
Exterior stands for exterior design.
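The eight features listed above can be held in a simple lookup table. The sketch below is a hypothetical Python illustration that matches a term against the aspect words by exact word match; the study itself related terms to features through association analysis, so this is only a simplified stand-in.

```python
import re

# Aspects per feature, as listed above.
FEATURE_ASPECTS = {
    "performance": ["acceleration", "braking", "road holding", "shifting"],
    "comfort": ["front seats", "rear seats", "getting in/out", "noise/vibration"],
    "value": ["fuel economy", "maintenance cost", "purchase cost", "resale value"],
    "interior": ["cargo/storage", "instrumentation", "interior design", "logic of controls"],
    "reliability": ["repair frequency", "dealership support", "engine", "transmission"],
    "safety": ["headlights", "outward visibility", "parking aids", "rain/snow traction"],
    "technology": ["entertainment", "navigation", "bluetooth", "usb ports"],
    "exterior": ["exterior design"],
}

def features_for_term(term):
    """Return every feature whose aspect list contains `term` as a whole word."""
    term = term.lower()
    hits = []
    for feature, aspects in FEATURE_ASPECTS.items():
        # Split aspects such as "noise/vibration" into individual words.
        words = {w for aspect in aspects for w in re.split(r"[ /]", aspect)}
        if term in words:
            hits.append(feature)
    return hits

print(features_for_term("transmission"))
print(features_for_term("design"))
```

A term such as “design” maps to more than one feature, which is one reason an exact-match lookup is not sufficient on its own and an association step is needed.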
The question that arises is: how are the most frequent terms for each car related to the eight different features, and what is their frequency? To answer this question, both groups of reviews, those containing best features and those containing worst features, were combined and analyzed using text mining tools. The aim was to determine the frequency of every word that occurs in the reviews. The process was conducted for all three cars: the Hyundai Elantra 2012, Honda Civic 2012, and Ford Focus 2012. The 24 most frequent terms were chosen as a sample and, using the same association approach as in previous research, the relationships between the terms and the eight different features were found. The results are presented in Table 2.
Table 2. Relationship between terms and eight features.
Although the frequency of the most frequent terms was found, the total number of reviews differed for each vehicle: 116 for the Hyundai Elantra, 156 for the Honda Civic, and 267 for the Ford Focus. Thus, we have to adjust the numbers to a common denominator to make the comparison among the three brands clearer. To do this, the following formula was applied:
$$F = \frac{\text{number of occurrences of the term}}{\text{total number of reviews for the brand}}$$
where F is the approximate number of occurrences of a given term in one review.
Thus, all words according to a specific feature were summed, and their frequency before and after adjustment was determined.
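The adjustment and the per-feature summation can be sketched as follows, assuming that F is a term's raw count divided by the number of reviews for that brand. The review totals come from the text; the term counts and the term-to-feature mapping are hypothetical and are not taken from Table 2.

```python
# Review totals reported in the text.
REVIEW_COUNTS = {"Hyundai Elantra": 116, "Honda Civic": 156, "Ford Focus": 267}

def adjusted_frequency(count, brand):
    """Approximate occurrences of a term per review (assumed: count / reviews)."""
    return count / REVIEW_COUNTS[brand]

def feature_totals(term_counts, term_to_feature, brand):
    """Sum raw and adjusted frequencies of all terms belonging to each feature."""
    totals = {}
    for term, count in term_counts.items():
        feature = term_to_feature[term]
        raw, adjusted = totals.get(feature, (0, 0.0))
        totals[feature] = (raw + count, adjusted + adjusted_frequency(count, brand))
    return totals

# Hypothetical counts and mapping, for illustration only.
example_counts = {"transmission": 80, "shift": 40, "seat": 90}
example_map = {"transmission": "reliability", "shift": "reliability", "seat": "comfort"}

totals = feature_totals(example_counts, example_map, "Ford Focus")
for feature, (raw, adjusted) in totals.items():
    print(f"{feature}: raw={raw}, adjusted={adjusted:.2f}")
```

Dividing by the number of reviews makes the per-feature totals comparable across brands with different sample sizes.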
4.2.1. Analysis Results for Eight Features
As shown in Table 3, the highest frequency was for terms related to comfort and interior features in Hyundai, comfort and technological features in Honda, and comfort in Ford (Criteria: F ≥ 1). Therefore, reviewers were mostly interested in these features and discussed these most heatedly. These can now be scrutinized more closely for each case.
Table 3. Frequency before and after adjustment.
(1) Hyundai case
If we compare the frequency of terms for interior with consumers' average rating score for interior, we see that terms related to the interior are likely to appear more often in the satisfied group and in best features, because the score for interior is quite high. Therefore, Hyundai has won the favor of consumers with respect to its interior. The comfort score is 3.97, which is neither high nor low. This suggests there might be some factors reviewers were not satisfied with. Thus, a more precise analysis is needed. Another feature worth considering is the exterior. The frequency of terms related to the exterior is significantly higher for Hyundai than for Honda and Ford. It can therefore be assumed that the exterior is also the strongest feature for Hyundai. Its score is the highest among the scores for all features, and the likelihood that terms related to the exterior would mostly occur in the satisfied group and in best features is very high.
(2) Honda case
Comparing the frequencies and scores for comfort and technology, it is very likely that terms related to these features will occur mostly in the satisfied group and in best features, because the scores for both features are fairly high. Furthermore, the frequency and score for technology are the highest among all three automobile brands, which can be interpreted as Honda being a technological leader.
(3) Ford case
Although the frequency of terms related to comfort is high, the score is quite low. Although it is not the lowest score compared to other features, the result suggests that such terms would appear in both satisfied and unsatisfied groups and in both the best and worst features groups. Furthermore, reliability is also worth mentioning, because the frequency of terms related to this feature is much higher than for Hyundai and Honda. With a very low score for reliability, we can assume that terms will mostly appear in worst features for both satisfied and unsatisfied groups.
4.2.2. Comparison of Two Groups’ Reviews
In this section, we compare the reviews of both groups and find terms whose influence is greater than others. We also compare the differences between satisfied and unsatisfied groups of reviewers. What, therefore, are the frequency and ratio of terms for eight different features between satisfied and unsatisfied groups and what are the implications of this?
As shown in Table 4, in the case of Hyundai consumers rarely mentioned words related to performance in comparison to Honda and Ford consumers.
Table 4. Performance (best/worst features of each group).
The satisfied group mentioned the words “control,” “system,” and “speed” more often than the unsatisfied group, although it is difficult to say they were definitely satisfied with these factors because these terms appeared in both best features and worst features. The frequency of these terms in the unsatisfied group is very low. Along with a fairly low score for performance, we can assume that reviewers were unsatisfied due to factors other than “control,” “system,” and “speed.” In the case of Honda, consumers were generally satisfied with the performance because the most frequent words for this feature mostly appeared in best features, and, looking at the performance score of 4.29 in Table 3, we can assume they were significant to a certain degree. The result for Ford is ambiguous, but we can say with confidence that reviewers like how the car handles as the term “handle” appeared much more frequently than other terms and, in 95% of cases, appeared in best features for both satisfied and unsatisfied groups.
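A statement such as "in 95% of cases, appeared in best features" corresponds to a simple share computed per group. The sketch below is a hypothetical Python illustration; the counts are made up and are not taken from Table 4.

```python
def best_share(counts):
    """Share of a term's mentions that fall in best features, per group.

    `counts` maps (group, section) to a number of occurrences, where group is
    "satisfied" or "unsatisfied" and section is "best" or "worst".
    """
    shares = {}
    for group in ("satisfied", "unsatisfied"):
        best = counts.get((group, "best"), 0)
        worst = counts.get((group, "worst"), 0)
        total = best + worst
        shares[group] = best / total if total else None  # None: term never used
    return shares

# Made-up counts for one term, for illustration only.
example_counts = {
    ("satisfied", "best"): 19,
    ("satisfied", "worst"): 1,
    ("unsatisfied", "best"): 9,
    ("unsatisfied", "worst"): 1,
}

shares = best_share(example_counts)
print(shares)
```

A share close to 1 in both groups indicates that even unsatisfied reviewers mention the term as a strength.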
As shown in Table 5, Hyundai drivers felt comfortable in the car, and most were very satisfied with the seats. However, there seem to be some problems with noise, as the average score for comfort is low. Looking at the results for Honda, consumers who were satisfied with their Honda Civic purchase felt very comfortable in the cabin and were pleased with the space provided, but it is likely that both satisfied and unsatisfied groups were unsatisfied with the back seats. In the case of Ford, there were people who found it comfortable and people who did not. The unsatisfied group did not discuss the comfort feature as much as the satisfied group, and it seems that individuals from the satisfied group liked neither the front nor the back seats. The most positive aspect of Ford mentioned by reviewers is the handling of the car.
Table 5. Comfort (best/worst features of each group).
As shown in Table 6, most Hyundai owners were not satisfied with the fuel consumption of this car. Nevertheless, some of the satisfied group felt that Hyundai’s mpg was not bad. This difference might reflect individual expectations when assessing mpg. In the case of Honda, all terms related to fuel consumption consistently appeared in best features for both satisfied and unsatisfied groups. The word “minimal” occurred several times in worst features, which suggests that Honda’s mpg is very high and probably the best among the three brands. Ford owners did not mention fuel consumption as much as owners of the other cars, but it is likely that Ford does not have any problems with fuel consumption and that its mpg is possibly even better than Hyundai’s. Hence, there are other factors that resulted in the low score for value.
Table 6. Value (best/worst features of each group).
As shown in Table 7, the interior was most often discussed in the Hyundai case, and it is clear that reviewers from both satisfied and unsatisfied groups were greatly satisfied with this attribute. However, there is also a problem with the spare tire, which often appeared in worst features for both satisfied and unsatisfied groups. Opinions about the Honda interior were divided, but the common element for both groups is that they liked the dashboard display. Additionally, the satisfied group often mentioned “room,” which suggests they were satisfied with this feature. The interior results for Ford owners were good rather than bad, but it seems this was not the most important feature for reviewers.
Table 7. Interior (best/worst features of each group).
As shown in Table 8, in all three cases, customers were very satisfied with the exterior, especially Hyundai and Ford users. For instance, the occurrence of terms related to the exterior was very high for the Hyundai Elantra, and the term “exterior” never appeared in worst features in either of the two groups. In the case of Ford, the exterior was the only feature with which consumers were satisfied. It is also the only feature with a high score in the Ford sample. However, looking at the frequency of the term, we can assume that it was a less prominent topic of discussion than it was for Hyundai. For the Honda Civic, the term “exterior” did not appear at all, which suggests it was not the main factor in determining whether customers purchased this car.
Table 8. Exterior (best/worst features of each group).
Reliability and Safety
As shown in Table 9, among the 24 most frequent terms for Hyundai and Honda, only one word, “engine,” was related to reliability. There might be other words such as “engine” that could influence a decision to rate reliability, but these did not appear among the most frequent words. Therefore, it is hard to determine the extent to which the term “engine” affected the reliability score, but many people from the satisfied group for Honda were satisfied with its engine and mentioned it a few times. In the Hyundai group, the term “engine” occurred almost equally in best features and worst features. In the satisfied group it occurred more often in best features while in the unsatisfied group it appeared more frequently in worst features, which makes sense. Thus, we can assume there was an approximately equal number of people who were satisfied and unsatisfied with the engine.
Table 9. Reliability and safety (best/worst features of each group).
Reliability was actively discussed in the Ford group. It is clear that Ford has serious problems in this field, mostly to do with transmission. The frequency of the term “transmission” was 0.29, the highest among all terms in the reliability group and more than three times higher than the frequency of the term “engine” in the Hyundai and Honda groups. Furthermore, the frequency of terms “manual,” “automatic,” “issue,” and “shift” was also high. Based on frequency analysis, where a link was found between these terms and “transmission,” we can say that both satisfied and unsatisfied groups criticized transmission, and the occurrence of these terms in total was 0.73, which is very high for just one specific part of a car. The low score of 3.19 in Table 3 is consistent with the results for frequency, so Ford must solve this problem in order to secure clients’ trust. In the case of safety, it is difficult to interpret the results as there is only one word, “light,” that, after association analysis, was correlated with several features such as “safety” and “technology.” Hence, it would be a mistake to judge the significance of the relationship between safety scores for all three car brands and the term “light” as well as its frequency in the satisfied and unsatisfied groups.
Among the three brands, technology was discussed most frequently by Honda owners, quite frequently by Ford owners, and least often in the Hyundai group, as shown in Table 10. Along with comfort, technology was the hottest topic for discussion in the Honda group. We can say with confidence that Honda owners from both the satisfied and unsatisfied groups greatly enjoyed using the steering wheel, Bluetooth, econ function, and inward system. Opinions about the dashboard varied, as there were reviewers in both satisfied and unsatisfied groups who liked or did not like this feature. Hence, Honda consumers were very satisfied with the technological side, and the technological level is probably the highest among the three brands. In the case of Ford, satisfied and unsatisfied groups mentioned terms related to technology in both best features and worst features, and the ratio was quite similar. Indeed, one of the most frequent terms was “sync,” which refers to Ford’s special feature. Looking at the results, it seems that, regardless of group, some reviewers enjoyed using this system and some did not. Therefore, the technological side of the Ford Focus is worth paying attention to, but it is unclear whether it is an advantage or a disadvantage for the company. Given the very low score for technology in Table 3, we can presume that reviewers who mentioned these words in worst features evaluated it very negatively, while reviewers who mentioned these words in best features did not evaluate it highly. If we look at the results for Hyundai, we can say that consumers liked the Bluetooth system, but we cannot say the same about other terms. Therefore, we suppose there are other factors that resulted in the low score for technology.
Table 10. Technology (best/worst features of each group).
4.2.3. Implications and Discussion
Based on the results of this research, the following propositions can be stated. Firstly, among the three cars, the Hyundai Elantra has the best marks in relation to “interior” and “exterior,” but, in terms of the interior, there is a problem with the spare tire that needs to be solved. A few people were also unsatisfied with gas consumption, so Hyundai engineers would do well to improve fuel economy. Furthermore, Hyundai has problems in terms of technology, one of which is an incorrect mpg display. Moreover, despite consumers’ satisfaction with comfort, there was a problem with road noise. This seems to be the main reason for the relatively low score for comfort. Hence, Hyundai should reconsider the value and technological aspects of the Elantra to make it more competitive on the market. Secondly, among the three cars, the Honda Civic received positive feedback for all features and was found to be the best in terms of value and technology. It has the best fuel economy and the best technological equipment compared to Hyundai and Ford. Despite satisfaction with all features, Honda engineers should pay attention to the interior, because many consumers criticized it for its cheapness. In addition, Honda should consider the issue of comfort, because the back seats were also found to be a weak point. Thirdly, the Ford Focus was evaluated very poorly on all features with the exception of the exterior, where it received a high score. However, according to the results, this was not the most discussed topic among reviewers. Among the 24 most frequent terms, negative terms were in the majority. These were related to reliability, where reviewers severely criticized the transmission and found this to be the biggest problem in the Ford automobile. Apart from problems with the transmission, most consumers were quite unsatisfied with both the back and front seats.
The only feature reviewers were truly pleased with, according to the results, was the handling of the car. In previous research, reliability was found to be one of the most significant factors for buyers. Therefore, looking at the poor evaluation of this car, it appears that reviewers were greatly disappointed with its reliability and that, for this reason, the scores for other features were slightly biased. Hence, Ford marketers and engineers must completely reconsider and reassess their car from all sides, starting with reliability.
In addition, the results for several features, such as safety, were unclear and ambiguous, thereby making interpretation difficult. This can be explained by the lack of terms chosen for the analysis. To fill such blind spots, a more extended analysis is needed.