How to Parse a CSV File with Ruby
Ruby ships with a CSV library, which makes parsing CSV files straightforward.
There are three steps involved in this process: accessing the CSV, parsing it, and then iterating through it.
Accessing a CSV File With Ruby
In order to parse our CSV file, our Ruby code needs to be able to access it so it can later map through and parse each row.
Accessing a CSV in a Ruby Script
If you are running a Ruby script in the Rails console, you can upload your CSV file onto your Rails server and parse it directly from there.
For example, if we have a project called Testsuite, we could navigate to that directory and then:
- Create a file (such as with touch example.csv) or copy one into the Testsuite directory
- Open up the console by running rails console
- Verify that you can access the CSV by running File.read("example.csv") and checking the result
Now we can focus on parsing this CSV file; you can skip to parsing CSVs.
Accessing CSV in a Ruby on Rails Application
If you want to build a feature around programmatically parsing CSVs in a Rails app, we won't be able to manually upload them as we did in the above example.
Instead, we need to download the CSV to our server or pull it from an external source like an AWS S3 bucket.
If We Have the URL of the CSV
We can make a request to the given endpoint and store the response as a variable.
require "net/http"
# Net::HTTP.get returns the response body as a String
table = Net::HTTP.get(URI.parse('https://example.com/data/table.csv'))
file = StringIO.new(table)
And now we can call file.read
and parse the CSV within Ruby!
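As a rough sketch of the whole round trip, assuming the endpoint returns the CSV text as the response body: the helper names parse_csv_body and fetch_csv are hypothetical, and the sample body is made-up data shaped like the example used later in this article. The fetch helper is only defined here, not called, so the demonstration runs offline.

```ruby
require "net/http"
require "uri"
require "csv"

# Hypothetical helper: turn a response body (a String) into a CSV::Table.
def parse_csv_body(body)
  CSV.parse(body, headers: true)
end

# Hypothetical helper: download a CSV and parse it.
# Raises if the server does not answer with a 2xx status.
def fetch_csv(url)
  response = Net::HTTP.get_response(URI.parse(url))
  unless response.is_a?(Net::HTTPSuccess)
    raise "Request failed with status #{response.code}"
  end
  parse_csv_body(response.body)
end

# Offline demonstration with an assumed sample body.
sample_body = "id,name,score\n1,Testsuite,100\n"
table = parse_csv_body(sample_body)
puts table.first["name"]
```

Checking the status before parsing avoids feeding an HTML error page into the CSV parser.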
If the CSV is Stored in AWS S3
If we store CSVs in S3, we could use the aws-sdk
gem to connect to AWS and pull it like so:
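require "aws-sdk-s3"
require "csv"
bucket = Aws::S3::Resource.new.bucket("bucket_name")
file = bucket.object("path_to_file").get.body.string
# col_sep: "\t" is for tab-separated files; omit it for comma-separated data
data = CSV.parse(file, col_sep: "\t", headers: true).map(&:to_h)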
The resulting data
variable would be a list of rows in a hash format. Each row would be a hash where the column header is the key and the value of that column is the value for that key.
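To make that shape concrete without an S3 bucket, here is a small sketch that runs the same CSV.parse call on an inline string. The sample body is assumed data, tab-separated to match the col_sep option above.

```ruby
require "csv"

# Inline tab-separated sample standing in for the S3 object body (assumed data).
body = "id\tname\tscore\n1\tTestsuite\t100\n2\tOther\t85\n"

# Same call as above: each CSV::Row becomes a plain Hash keyed by header.
data = CSV.parse(body, col_sep: "\t", headers: true).map(&:to_h)

data.each do |row|
  puts "#{row['name']}: #{row['score']}"
end
```

Note that the values come back as strings ("100", not 100) unless converters are passed to CSV.parse.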
This approach combines both downloading the CSV and parsing it into a list that we can iterate over. You can skip to iterating a CSV here.
Parsing a CSV File With Ruby
Once we can access a CSV, parsing it is pretty straightforward.
Ruby allows us to read a file's contents as a string by calling File.read, and we can parse that string by calling CSV.parse on it. (Run require "csv" first if the library isn't already loaded.)
Combine both calls and we can parse a CSV like this:
file = CSV.parse(File.read("filename.csv"), headers: true)
Note: it's optional, but I like to include the headers: true argument, as it makes the rows a little easier to work with and any code that deals with this CSV easier to maintain.
The headers: true argument treats the first line of the file as column headers and returns each row as a CSV::Row, which we can read by header name much like a Ruby hash.
You can see the first row and what this format will look like by calling:
file.first
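And the result will be a CSV::Row pairing each column header with the value in the given row. (In this case file.first gives us the first row.) Converted to a plain hash with to_h, it looks like this:
{
"id" => "1",
"name" => "Testsuite",
"score" => "100"
}
Note that every value is a string unless converters are passed to CSV.parse.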
The specific keys and values of the hash will match the columns of your specific CSV file, where the keys are the column headers and the values are the row's values.
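To see the shape of a parsed row without a file on disk, here is a small sketch using an inline string. The sample data and variable names are assumptions for illustration; converters: :numeric is one of the CSV library's built-in converters.

```ruby
require "csv"

# Assumed sample data with the same columns as the example row.
csv_text = "id,name,score\n1,Testsuite,100\n"

# Without converters, every value is a String.
plain = CSV.parse(csv_text, headers: true)
puts plain.first.to_h.inspect

# With converters: :numeric, numeric-looking fields become Integer or Float.
typed = CSV.parse(csv_text, headers: true, converters: :numeric)
puts typed.first["score"] + 1
```

Using a converter at parse time saves sprinkling to_i calls through the code that consumes the rows.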
How to Iterate Over Our CSV
So far we have accessed our CSV and parsed it into an iterable format.
Now we can do something with it.
Since CSV.parse
results in a list of rows, we can iterate over our file like so:
file.each do |row|
# Do something with this row
end
And since each row behaves like a Ruby hash, we can access each column of the row with the standard Ruby hash syntax.
total_score = 0
file.each do |row|
  puts "#{row['name']} is the best!"
  # CSV values are strings, so convert before adding
  total_score += row['score'].to_i
end
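For larger files, one alternative worth knowing is CSV.foreach, which reads the file row by row instead of loading it all into memory first. The sketch below is only an illustration: it writes its own temporary sample file, and the file contents are assumed data.

```ruby
require "csv"
require "tempfile"

# Write a small sample file so the sketch is self-contained (assumed data).
sample = Tempfile.new(["scores", ".csv"])
sample.write("id,name,score\n1,Testsuite,100\n2,Other,85\n")
sample.close

total_score = 0
# CSV.foreach yields one CSV::Row at a time.
CSV.foreach(sample.path, headers: true) do |row|
  total_score += row["score"].to_i
end

puts total_score
sample.unlink
```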
And that's it!
Once you have a good understanding of how to parse and map through a CSV file's rows, you can do anything you want with a CSV.
For example, instead of just reading CSVs we can easily allow CSV downloads from our application by exporting data as a CSV.
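As a minimal sketch of that export direction, CSV.generate builds a CSV string from rows we append to it. The records below are hypothetical stand-ins for application data, and how the string is served as a download is left out.

```ruby
require "csv"

# Hypothetical records standing in for application data.
records = [
  { "id" => 1, "name" => "Testsuite", "score" => 100 },
  { "id" => 2, "name" => "Other", "score" => 85 }
]

headers = records.first.keys

csv_text = CSV.generate do |csv|
  csv << headers
  # values_at keeps each row's columns in header order.
  records.each { |record| csv << record.values_at(*headers) }
end

puts csv_text
```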
I hope this was helpful and gave you a full understanding of CSV parsing in Ruby!