Working with this much data is definitely a challenge, and we hope that this data expo will inspire you to learn more about handling large volumes of data. So that you don't get overwhelmed, this page describes some simple command-line tools for sorting, filtering and tabulating it.
All of these tools are available on a default install of Linux or Mac OS X. To use them on Windows, you will need to install Cygwin or a similar Unix-like environment.
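The commands on this page pick columns by position, so it is worth checking the numbering against the file's header line first. This is a minimal sketch that assumes the first line of the file is a comma-separated header; `list_columns` is a hypothetical helper name, not a standard command.

```shell
# Hypothetical helper: print each header field with its 1-based column
# number, assuming the first line of the file is a comma-separated header.
list_columns() {
  head -n 1 "$1" | tr ',' '\n' | cat -n
}
```

Running `list_columns 2008.csv` should print one numbered line per column, and the numbers ought to match the ones used below (for example, 10 for flightnum).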
Sort numerically by the 10th column (flightnum):
sort -t, -k10,10n 2008.csv
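For browsing, a variation that keeps the header line at the top and sorts by carrier first, then by flight number within each carrier, may be handier, since the same flight number can be used by different carriers. This sketch assumes a single header line and no quoted fields containing commas; `sort_by_flight` is a hypothetical name.

```shell
# Hypothetical helper: print the header line unchanged, then the remaining
# rows sorted by carrier (column 9, as text) and, within each carrier, by
# flight number (column 10, as a number). Assumes no quoted commas in fields.
sort_by_flight() {
  head -n 1 "$1"
  tail -n +2 "$1" | sort -t, -k9,9 -k10,10n
}
```

Usage: `sort_by_flight 2008.csv > 2008-sorted.csv` (the output file name is arbitrary).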
Remove the header row:
awk -F, 'NR != 1' 2008.csv
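If you combine several yearly files with the same layout, each one starts with its own header. awk's `NR` keeps counting across all input files, whereas `FNR` restarts at 1 for every file, so testing `FNR` drops every header. A minimal sketch; `strip_headers` is a hypothetical name.

```shell
# Hypothetical helper: concatenate the given CSV files and drop the first
# line (the header) of each one. FNR is the line number within the current
# file, so it restarts at 1 for every input.
strip_headers() {
  awk 'FNR != 1' "$@"
}
```

For example, assuming a 2007.csv with the same columns is present, `strip_headers 2007.csv 2008.csv > combined.csv` writes the data rows of both years without any header lines.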
Show flights from Des Moines to Chicago O'Hare:
awk -F, '$17 == "DSM" && $18 == "ORD"' 2008.csv
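To count matching flights rather than print them, awk can keep a running total, and its `-v` option passes the airport codes in as variables so the same command works for any route. This sketch assumes the origin and destination stay in columns 17 and 18, as in the command above; `count_route` is a hypothetical name.

```shell
# Hypothetical helper: count rows whose origin (column 17) and destination
# (column 18) match the airport codes given as the 2nd and 3rd arguments.
# "n + 0" prints 0 rather than an empty line when nothing matches.
count_route() {
  awk -F, -v origin="$2" -v dest="$3" \
    '$17 == origin && $18 == dest { n++ } END { print n + 0 }' "$1"
}
```

Usage: `count_route 2008.csv DSM ORD` prints a single number.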
Select only columns 9 (carrier) and 10 (flight number):
cut -f9,10 -d, 2008.csv
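Column selection combines naturally with filtering. For instance, to list each distinct carrier and flight number serving one route, you can filter on origin and destination, keep columns 9 and 10, and remove duplicates with `sort -u`. This is a sketch with a hypothetical function name, assuming the same column positions as the commands on this page.

```shell
# Hypothetical helper: list the distinct carrier,flightnum pairs flying from
# the origin ($2) to the destination ($3) in the given file ($1).
route_flights() {
  awk -F, -v origin="$2" -v dest="$3" '$17 == origin && $18 == dest' "$1" |
    cut -d, -f9,10 |
    sort -u
}
```

Usage: `route_flights 2008.csv DSM ORD`.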
Count the number of flights for each carrier and flight number pair, skipping the header row, and save the counts to 2008-flights.csv:
tail -n +2 2008.csv | cut -f9,10 -d, | sort | uniq -c > 2008-flights.csv
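`uniq -c` puts the count first on each line, padded with leading blanks and followed by the carrier,flightnum pair, so the result is not strictly CSV. Sorting it numerically in reverse shows the most frequent flights first. A sketch; the name `top_flights` and its default of 10 lines are assumptions.

```shell
# Hypothetical helper: print the n lines with the largest counts from a file
# of "uniq -c" output (count, whitespace, carrier,flightnum). n defaults to 10.
# sort -n skips the leading blanks that uniq -c adds before each count.
top_flights() {
  sort -rn "$1" | head -n "${2:-10}"
}
```

For example, `top_flights 2008-flights.csv 5` shows the five most frequent carrier and flight number pairs.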