2. Research Background
2.1. Big Data Analytics and Business Value
2.2. Text Mining and Association Rule
2.3. Consumer Car Purchasing Behavior
3. Research Methodology
3.1. Text Mining Approach to Car Reviews
3.1.1. Scraping Data from Websites
If ((car = 1) and (overall_rating >= 4.30)) GD = 1
If ((car = 1) and (overall_rating < 4.30)) GD = 0
If ((car = 2) and (overall_rating >= 4.60)) GD = 1
If ((car = 2) and (overall_rating < 4.60)) GD = 0
If ((car = 3) and (overall_rating >= 3.70)) GD = 1
If ((car = 3) and (overall_rating < 3.70)) GD = 0
where Car #1 is the Hyundai Elantra, Car #2 is the Honda Civic, and Car #3 is the Ford Focus. GD is the group diversity variable: GD = 1 denotes the satisfied group and GD = 0 denotes the unsatisfied group. The result of the data split is presented in Table 1.
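The splitting rules above can be sketched in Python as a minimal illustration. It assumes that each car's cutoff assigns GD = 1 at or above the threshold and GD = 0 below it. The names THRESHOLDS, CAR_NAMES, and group_diversity, as well as the sample records, are hypothetical and are not taken from the study's data.

```python
# Rating cutoffs per car code, taken from the rules above.
THRESHOLDS = {1: 4.30, 2: 4.60, 3: 3.70}
CAR_NAMES = {1: "Hyundai Elantra", 2: "Honda Civic", 3: "Ford Focus"}


def group_diversity(car: int, overall_rating: float) -> int:
    """Return GD: 1 for the satisfied group, 0 for the unsatisfied group.

    Assumes a rating equal to the cutoff counts as satisfied.
    """
    if car not in THRESHOLDS:
        raise ValueError(f"unknown car code: {car}")
    return 1 if overall_rating >= THRESHOLDS[car] else 0


# Hypothetical review records (illustrative values only).
sample_reviews = [
    {"car": 1, "overall_rating": 4.50},
    {"car": 2, "overall_rating": 4.20},
    {"car": 3, "overall_rating": 3.70},
]

for review in sample_reviews:
    review["GD"] = group_diversity(review["car"], review["overall_rating"])
    print(CAR_NAMES[review["car"]], review["overall_rating"], review["GD"])
```

Keeping the cutoffs in one lookup table rather than in a chain of conditions makes it easy to add a fourth car or to adjust a threshold without touching the labeling logic.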
3.1.2. Input Data
3.1.3. Data Manipulation
3.1.4. Data Cleansing
3.1.5. Data Mastering
4. Results of the Study
4.1. Frequency Analysis
4.1.1. Best Features
4.1.2. Worst Features
4.1.3. Implications and Discussion
4.2. Analysis of Car Features Using the Association Rule
Performance involves terms such as acceleration, braking, road holding, and shifting.
Comfort relates to front seats, rear seats, getting in/out, and noise/vibration.
Value involves fuel economy, maintenance cost, purchase cost, and resale value.
Interior covers cargo/storage, instrumentation, interior design, and the logic of controls.
Reliability relates to repair frequency, dealership support, engine, and transmission.
Safety consists of headlights, outward visibility, parking aids, and rain/snow traction.
Technology covers entertainment, navigation, Bluetooth, and USB ports.
Exterior refers to exterior design.
where F is the approximate number of occurrences of a given term in one review.
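The feature categories and the per-review term occurrence described above can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: F is read here as the mean number of times a term appears per review, which is only one plausible interpretation, and the names CATEGORY_TERMS, count_term, mean_occurrence, and categories_in, along with the sample reviews, are hypothetical.

```python
import re

# Category lexicon copied from the list above; compound entries such as
# "getting in/out" are kept as single phrases.
CATEGORY_TERMS = {
    "Performance": ["acceleration", "braking", "road holding", "shifting"],
    "Comfort": ["front seats", "rear seats", "getting in/out", "noise/vibration"],
    "Value": ["fuel economy", "maintenance cost", "purchase cost", "resale value"],
    "Interior": ["cargo/storage", "instrumentation", "interior design",
                 "logic of controls"],
    "Reliability": ["repair frequency", "dealership support", "engine",
                    "transmission"],
    "Safety": ["headlights", "outward visibility", "parking aids",
               "rain/snow traction"],
    "Technology": ["entertainment", "navigation", "bluetooth", "usb ports"],
    "Exterior": ["exterior design"],
}


def count_term(text, term):
    """Count case-insensitive occurrences of a term, matched on word boundaries."""
    pattern = r"(?<!\w)" + re.escape(term.lower()) + r"(?!\w)"
    return len(re.findall(pattern, text.lower()))


def mean_occurrence(reviews, term):
    """Mean number of occurrences of a term per review (assumed reading of F)."""
    if not reviews:
        return 0.0
    return sum(count_term(review, term) for review in reviews) / len(reviews)


def categories_in(text):
    """Return the categories that have at least one lexicon term in the text."""
    return {
        category
        for category, terms in CATEGORY_TERMS.items()
        if any(count_term(text, term) > 0 for term in terms)
    }


# Hypothetical reviews (illustrative text only).
sample_texts = [
    "Great acceleration and braking, but the engine is noisy.",
    "Fuel economy is fine. Engine feels weak; engine noise is high.",
]

print(mean_occurrence(sample_texts, "engine"))  # 1.5
print(sorted(categories_in(sample_texts[0])))   # ['Performance', 'Reliability']
```

Exact phrase matching is a deliberate simplification: it misses inflected forms such as "accelerates" or "seat", so a fuller pipeline would likely normalize or stem the text before counting.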