Select frequent words (those whose count is greater than or equal to 50,000).
Display the frequent words in descending order of count.
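The filter-and-sort logic of these two steps can be sketched in plain Python (not Pig Latin) to check expected results on a small sample. This is only an illustration under assumptions: the function name frequent_words, the min_count parameter, and the sample list are hypothetical, and the input is assumed to be a flat list of word tokens.

```python
from collections import Counter

def frequent_words(words, min_count=50_000):
    """Return (word, count) pairs with count >= min_count, highest count first."""
    counts = Counter(words)
    # Keep only words that meet the threshold.
    frequent = [(word, count) for word, count in counts.items() if count >= min_count]
    # Sort by count in descending order.
    return sorted(frequent, key=lambda pair: pair[1], reverse=True)

# Tiny hypothetical sample; a lower threshold is used so the filter has an effect.
sample = ["the", "the", "the", "of", "of", "pig"]
result = frequent_words(sample, min_count=2)
print(result)
```

On the real dataset the threshold would stay at 50,000; the smaller value here is only so that the sample produces output.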
Get groups of words by their length (Hint: use the built-in function SIZE) and count each group.
For example, (2,1096049) means that there are 1096049 occurrences of words that have two characters.
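To make the (length, count) output format concrete, here is a minimal Python sketch of the grouping step. It assumes the input is a list of word tokens, so that each token contributes one occurrence to its length group; the name count_by_length and the sample words are hypothetical. In Pig the same idea would use SIZE to compute the length, followed by a group and a count.

```python
from collections import Counter

def count_by_length(words):
    """Return (length, occurrences) pairs, sorted by word length."""
    # len(word) plays the role of Pig's SIZE on a chararray.
    length_counts = Counter(len(word) for word in words)
    return sorted(length_counts.items())

# Hypothetical sample: five two-letter tokens and one three-letter token.
sample_words = ["to", "be", "or", "not", "to", "be"]
length_counts = count_by_length(sample_words)
print(length_counts)
```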
Problem 3 is based on the dataset nyc_taxi_data_2014.csv.gz.
Find the effect of passenger_count on trip_distance, fare_amount, and tip_rate.
a) Create a new data set records2 that has passenger_count, trip_distance, fare_amount,
b) Filter records2 by passenger_count (0 < passenger_count < 10) and name the data set as
c) Group records3 by passenger_count.
d) Display the average trip_distance, average fare_amount, and average tip_rate for each
passenger_count group.
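Steps b) through d) amount to a filter, a group, and a per-group average. The following Python sketch shows that flow on a few made-up rows, as a way to reason about the expected output before writing the Pig script. It assumes each record already carries a tip_rate value (how tip_rate is derived is not shown here), and the function name averages_by_passenger_count and the sample rows are hypothetical, not taken from the dataset.

```python
from collections import defaultdict

def averages_by_passenger_count(records):
    """Map passenger_count -> (avg trip_distance, avg fare_amount, avg tip_rate)."""
    groups = defaultdict(list)
    for rec in records:
        # Step b): keep only rows with 0 < passenger_count < 10.
        if 0 < rec["passenger_count"] < 10:
            # Step c): group by passenger_count.
            groups[rec["passenger_count"]].append(rec)

    # Step d): average each field within each group.
    result = {}
    for passenger_count, rows in sorted(groups.items()):
        n = len(rows)
        result[passenger_count] = (
            sum(r["trip_distance"] for r in rows) / n,
            sum(r["fare_amount"] for r in rows) / n,
            sum(r["tip_rate"] for r in rows) / n,
        )
    return result

# Made-up rows for illustration only.
sample_records = [
    {"passenger_count": 1, "trip_distance": 1.0, "fare_amount": 10.0, "tip_rate": 0.25},
    {"passenger_count": 1, "trip_distance": 3.0, "fare_amount": 20.0, "tip_rate": 0.75},
    {"passenger_count": 2, "trip_distance": 4.0, "fare_amount": 16.0, "tip_rate": 0.5},
    {"passenger_count": 0, "trip_distance": 9.0, "fare_amount": 99.0, "tip_rate": 0.0},
]
averages = averages_by_passenger_count(sample_records)
print(averages)
```

The row with passenger_count 0 is dropped by the filter, so it does not appear in any group.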