Sunday, June 26, 2011

A Real Life GitHub Success Story

GitHub has been a real treasure for developers, and I've used it both at work and for personal projects. Until now, though, I hadn't used it to its full effect: contributing to someone else's project.

For MetalDetectr, I wanted to let a user filter the list to see only releases from artists in their last.fm library. A quick search led me to this gem, but it didn't have all the features I needed.

So I forked it.

Github made this really easy to do. Soon I had the repository in my account, cloned it locally, checked out a new branch, and I was working.

The code was clean, which made it much easier to get what I wanted. There was a /method_categories folder containing the methods that make API calls to get or create information for artists, tracks, and users. I wanted to read in a user's library of artists, so I simply modeled my new file on the existing ones.

class Lastfm
  module MethodCategory
    class Library < Base
      regular_method :get_artists, [:user], [[:limit, nil], [:page, nil]] do |response|
        response.xml['artists']['artist']
      end
    end
  end
end
This created a GET request for a last.fm user's library, with an optional limit on the number of fetched results and an optional page number to fetch. Along with the API key, these fields are outlined in the last.fm API docs.
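
For context, here is a rough sketch of how the result of get_artists might feed the filter I wanted for MetalDetectr. It is not the actual MetalDetectr code: the releases_in_library helper, the constant names, and the sample data are all made up for illustration, and I'm assuming the artists come back as an array of hashes with a 'name' key, as in the spec for this method.

```ruby
require 'set'

# Hypothetical helper (not part of the lastfm gem or MetalDetectr): keeps
# only the releases whose band appears in the user's last.fm library.
# Names are compared case-insensitively.
def releases_in_library(releases, library_artists)
  known = library_artists.map { |artist| artist['name'].downcase }.to_set
  releases.select { |release| known.include?(release[:band].downcase) }
end

# Made-up sample data. LIBRARY_ARTISTS stands in for what
# lastfm.library.get_artists(user) is assumed to return.
LIBRARY_ARTISTS = [{ 'name' => 'Dark Castle' }, { 'name' => 'Example Artist' }]
RELEASES = [
  { :band => 'Dark Castle', :name => 'Example Album' },
  { :band => 'Unknown Band', :name => 'Another Album' }
]

releases_in_library(RELEASES, LIBRARY_ARTISTS).each do |release|
  puts "#{release[:band]} - #{release[:name]}"
end
```

Building a set of names first keeps the lookup cheap even when the library holds a few thousand artists.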

Testing worked similarly. A spec file contained the other method tests, so adding the following, plus a fixture of the xml response, was super easy.

  describe '#library' do
    it 'should return an instance of Lastfm::Library' do
      @lastfm.library.should be_an_instance_of(Lastfm::MethodCategory::Library)
    end

    describe '#get_artists' do
      it 'should get the artists\' info' do
        @lastfm.should_receive(:request).with('library.getArtists', {
          :user => 'test',
          :limit => nil,
          :page => nil
        }).and_return(make_response('library_get_artists'))
        artists = @lastfm.library.get_artists('test')
        artists[1]['name'].should eql('Dark Castle')
        artists.size.should == 2
      end
    end
  end
After adding these methods, I pushed the branch to my GitHub repository and sent a pull request to the original repository. Again, GitHub makes this trivially easy. Before it was accepted, I had this line in the MetalDetectr Gemfile:
gem 'lastfm', :git => 'git://github.com/dbolson/ruby-lastfm.git', :branch => 'library_get_artists'
With the pull request accepted and my code merged into the master branch, it looked like this:
gem 'lastfm'

That's all it took to contribute to open source software.

Wednesday, June 15, 2011

Metal Archives' JSON Results Parsing

Some further explanation of how to get Metal Archives' JSON data, mentioned in a recent post, is necessary. Through reading the markup and some trial and error, I found the URL that returns the data I needed. Here it is:

http://www.metal-archives.com/search/ajax-advanced/searching/albums \
/?&releaseYearFrom=2011&releaseMonthFrom=1&releaseYearTo=2011 \
&releaseMonthTo=12&_=1&sEcho=0&iColumns=4&sColumns=&iDisplayStart=1& \
iDisplayLength=100&sNames=%2C%2C%2C

This returns a result set that looks like this:

{ 
 "error": "",
 "iTotalRecords": 3637,
 "iTotalDisplayRecords": 3637,
 "sEcho": 0,
 "aaData": [
  [ 
    "<a href=\"http://www.metal-archives.com/bands/037/3540277845\" title=\"037 (ES)\">037</a>",
    "<a href=\"http://www.metal-archives.com/albums/037/Los_Fuertes_Sobreviven/307703\">Los Fuertes Sobreviven</a>",
    "Full-length", 
    "May 24th, 2011 <!-- 2011-05-24 -->"  
 ],
  [ 
    "<a href=\"http://www.metal-archives.com/bands/037/3540277845\" title=\"037 (ES)\">037</a>",
    "<a href=\"http://www.metal-archives.com/albums/037/Tantas_Vidas/306172\">Tantas Vidas</a>",
    "Single", 
    "May 6th, 2011 <!-- 2011-05-06 -->"  
 ]

You'll notice the iTotalRecords field, which conveniently provides the total number of releases available. You'll also notice the iDisplayStart parameter in the URL, which lets us step through the results 100 at a time. By looping (iTotalRecords / 100 + 1) times and setting iDisplayStart to i * 100 on iteration i, we can get a result set for all the records very quickly.
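
The paging arithmetic can be sketched as below. This is only an illustration, not the code from the agent: the display_starts helper and PAGE_SIZE constant are names I made up, and I'm assuming offsets start at 0. It uses ceiling division, which gives the same page count as (iTotalRecords / 100 + 1) except when the total is an exact multiple of 100, where it skips the final empty request.

```ruby
PAGE_SIZE = 100 # matches iDisplayLength in the URL

# Hypothetical helper: returns the iDisplayStart offset for every page
# needed to cover total_records results.
def display_starts(total_records, page_size = PAGE_SIZE)
  page_count = (total_records + page_size - 1) / page_size # ceiling division
  (0...page_count).map { |i| i * page_size }
end

offsets = display_starts(3637)
puts offsets.size             # 37
puts offsets.first(3).inspect # [0, 100, 200]
puts offsets.last             # 3600
```

Each offset would then replace the iDisplayStart value in the URL for one request.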

Now that we have the results, we just need a little regular expression magic to pull out all the information.

BAND_NAME_AND_COUNTRY_REGEXP = /(.+)\s{1}\(([a-zA-Z]{2})\)/
ALBUM_URL_AND_NAME_REGEXP = /"(.+)">(.+)<\/a>/
RELEASE_DATE_REGEXP = /<!--\s(.{10})\s-->/
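
To show what these expressions capture, here is a small sketch run against the first row of the sample JSON. The parse_row helper and the TITLE_ATTRIBUTE_REGEXP pattern are my own additions for the example, not the agent's methods; I'm assuming the band expression is applied to the link's title text ("037 (ES)") rather than to the whole anchor tag.

```ruby
# Regular expressions copied from the post.
BAND_NAME_AND_COUNTRY_REGEXP = /(.+)\s{1}\(([a-zA-Z]{2})\)/
ALBUM_URL_AND_NAME_REGEXP = /"(.+)">(.+)<\/a>/
RELEASE_DATE_REGEXP = /<!--\s(.{10})\s-->/

# Assumption: a helper pattern (not from the post) that isolates the link's
# title attribute before the band expression is applied.
TITLE_ATTRIBUTE_REGEXP = /title="([^"]+)"/

# Hypothetical helper: turns one aaData row into a hash of plain strings.
def parse_row(row)
  title = row[0][TITLE_ATTRIBUTE_REGEXP, 1]
  band, country = title.match(BAND_NAME_AND_COUNTRY_REGEXP).captures
  url, name = row[1].match(ALBUM_URL_AND_NAME_REGEXP).captures
  {
    :band => band,
    :country => country,
    :url => url,
    :name => name,
    :format => row[2],
    :date => row[3][RELEASE_DATE_REGEXP, 1]
  }
end

# First row of the sample response, after JSON unescaping.
SAMPLE_ROW = [
  '<a href="http://www.metal-archives.com/bands/037/3540277845" title="037 (ES)">037</a>',
  '<a href="http://www.metal-archives.com/albums/037/Los_Fuertes_Sobreviven/307703">Los Fuertes Sobreviven</a>',
  'Full-length',
  'May 24th, 2011 <!-- 2011-05-24 -->'
].freeze

release = parse_row(SAMPLE_ROW)
puts release[:band]    # 037
puts release[:country] # ES
puts release[:name]    # Los Fuertes Sobreviven
puts release[:date]    # 2011-05-24
```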

There was a strange situation where an album didn't have a band page but instead displayed a message that the band didn't exist, so there is one last regular expression to guard against this slightly different data format:

NO_BAND_REGEXP = /span.+<\/span/

The data are much easier to gather and never time out now, so I was able to get rid of all the intermediate saving steps, such as saving after gathering the paginated links and saving the last release searched when the site timed out. I'll probably have to add some of that back in to get the record label of each release: you'll notice it's absent from the JSON, but it is available on the release's page.

The code to save the albums now looks like this:

agent = ::MetalArchives::Agent.new
agent.paginated_albums.each do |album_page|
  album_page.each do |album|
    if album[0].match(::MetalArchives::Agent::NO_BAND_REGEXP).nil?
      Release.create(
        :name => agent.album_name(album),
        :band => agent.band_name(album),
        :format => agent.release_type(album),
        :url => agent.album_url(album),
        :country => agent.country(album),
        :us_date => agent.release_date(album)
      )
    end
    CompletedStep.find_or_create_by_step(CompletedStep::ReleasesCollected)
  end
end

Quick and simple.