
Simple cache when scraping with Ruby

Miha Rekar
Sep 13, 2014
<p>I've been scraping a bunch of websites lately and got bored with using <code>File.write</code> to store cached copies of the pages. Because I'm still developing the script, I don't want it to hit the real website on every run. A simple way to fix that is the <a href="https://github.com/vcr/vcr">vcr gem</a>. While it is made primarily for testing, you can also use it for this kind of task.</p>
<p>First you need a configuration file that is loaded before your actual script. I keep mine in <code>config/vcr.rb</code>:</p>
<pre><code>require 'vcr'

VCR.configure do |c|
  # Recorded responses ("cassettes") are stored in this directory.
  c.cassette_library_dir = 'cassettes'
  c.hook_into :webmock
  # Requests made outside of a cassette still go to the real site.
  c.allow_http_connections_when_no_cassette = true
end
</code></pre>
<p>Then I have a <code>Shared</code> module with a <code>cache</code> method, which I <code>include</code> in any class that needs this functionality:</p>
<pre><code>module Shared
  # Runs the block inside the cassette called `name`.
  def cache(name)
    VCR.use_cassette(name) do
      yield
    end
  end
end
</code></pre>
<p>Now you can use it to have the website you're scraping cached automatically. The first run records the response into the cassette, and later runs replay it instead of hitting the site:</p>
<pre><code>require 'json'
require 'open-uri'

def github_for(user)
  cache "gh-#{user}" do
    # URI.open replaces the bare open(url) form, which Ruby 3 no longer supports.
    response = URI.open("https://api.github.com/users/#{user}").read
    JSON[response]
  end
end
</code></pre>
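<p>If you want to see the record-once, replay-later idea without installing any gems, here is a minimal stdlib-only sketch. It is not how vcr works internally: it caches the block's return value as JSON rather than recording HTTP interactions. The names <code>FileCache</code>, <code>CACHE_DIR</code>, <code>Scraper</code> and <code>fake_fetch</code> are made up for illustration, and <code>fake_fetch</code> stands in for a real request so the example runs offline.</p>

```ruby
require 'json'
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for the VCR-backed Shared#cache: it stores the
# block's return value as JSON on disk instead of recording HTTP traffic.
module FileCache
  CACHE_DIR = File.join(Dir.tmpdir, 'scrape-cache-sketch') # assumed location

  def cache(name)
    path = File.join(CACHE_DIR, "#{name}.json")
    # Replay: a cached copy exists, so the block (the "request") is skipped.
    return JSON.parse(File.read(path)) if File.exist?(path)

    # Record: run the block once and persist its result for later calls.
    result = yield
    FileUtils.mkdir_p(CACHE_DIR)
    File.write(path, JSON.generate(result))
    result
  end
end

class Scraper
  include FileCache

  attr_reader :fetches

  def initialize
    @fetches = 0
  end

  def github_for(user)
    cache("gh-#{user}") { fake_fetch(user) }
  end

  private

  # Pretend HTTP call; counts how often the "site" is actually hit.
  def fake_fetch(user)
    @fetches += 1
    { 'login' => user }
  end
end

FileUtils.rm_rf(FileCache::CACHE_DIR) # start from an empty cache
scraper = Scraper.new
first = scraper.github_for('octocat')
second = scraper.github_for('octocat')
puts "fetches: #{scraper.fetches}"
```

<p>The second call never reaches <code>fake_fetch</code>, because the result is read back from disk. With vcr the same happens one level lower, at the HTTP request, so the scraping code itself stays untouched.</p>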