Ruby

Several barcode generator gems are available; I decided on 'barby'.
First, I added barby to the Gemfile and used Bundler to download and install the gem.
Then I added the necessary 'require' lines to the controllers and views that use barby. Code 128 is the barcode type I needed, and the PNG outputter (which relies on the chunky_png gem) is needed to write image files.

require 'barby'
require 'barby/barcode/code_128'
require 'barby/outputter/png_outputter'

Finally, I generated the barcode as a PNG and saved it to a file with the following lines of code:

@barcode = Barby::Code128.new("1234")
outputter = Barby::PngOutputter.new(@barcode)
outputter.xdim = 2
File.open('barcode.png','wb') {|f| f.write outputter.to_png }
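
As a sketch of the first step, the Gemfile entries might look like the following. The chunky_png entry is included because Barby's PNG outputter depends on it. The source line and the absence of version constraints are assumptions about a standard Bundler setup, not something the original project is known to use.

```ruby
# Gemfile (sketch; versions intentionally left unpinned)
source 'https://rubygems.org'

gem 'barby'       # barcode generation
gem 'chunky_png'  # needed by Barby::PngOutputter
```

Running bundle install afterwards downloads and installs both gems.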

Information Extraction Process for News Websites in Ruby

To extract and save the information required by the challenge, we follow three high-level steps, each covered in detail below:

  1. Analyze the website's source code and identify the tags that need to be extracted
  2. In our Ruby code, use an XML/HTML parser that can extract information based on suitable selectors (either XPath or CSS selectors)
  3. Save the extracted information in a data store

Below, we apply these steps to the news website: http://www.nachrichtenleicht.de/uebersicht/nachrichten/

1. Analyze source code, identify tags
Looking at the source code of our example page, we can quickly see that all news items share the same class, namely: .entry-teaser

From there, we can extract the child elements of each top-level element with the class '.entry-teaser'.
The title is in the h2 element with the class '.entry-title', and the short description is in the p elements beneath it.

2. Extract information based on the tags identified in step 1

For this we use the Nokogiri[1] library for Ruby, an HTML, XML, SAX, and Reader parser. Among Nokogiri’s many features is the ability to search documents via XPath or CSS3 selectors. In this particular example, we will use CSS3 selectors:

require 'nokogiri'
require 'open-uri'

url = 'http://www.nachrichtenleicht.de/uebersicht/nachrichten/'
image_url_prefix = 'http://www.nachrichtenleicht.de'

# URI.open is required on Ruby 3.0 and later, where Kernel#open no longer accepts URLs
doc = Nokogiri::HTML(URI.open(url))

# Each news item is wrapped in an element with the class 'entry-teaser'
doc.css(".entry-teaser").each do |item|
  link = item.at_css('.image').at_css('a')
  href = link[:href]
  image_url = image_url_prefix + link.at_css('img')[:src]
  title = item.at_css('h2.entry-title').text
  short_description = item.at_css('p a').text.strip
end
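
The prefix concatenation above assumes that every src attribute is a root-relative path. A minimal sketch of a more forgiving alternative uses URI.join from the standard library, which also handles absolute and page-relative values. The helper name absolute_url and the constant BASE_URL are hypothetical and not part of the original code.

```ruby
require 'uri'

# Page the links were found on (same URL as in the example above).
BASE_URL = 'http://www.nachrichtenleicht.de/uebersicht/nachrichten/'

# Hypothetical helper: resolve a src/href value against the page URL.
# Returns nil for missing or blank values instead of raising.
def absolute_url(path, base = BASE_URL)
  return nil if path.nil? || path.strip.empty?

  URI.join(base, path.strip).to_s
end
```

Inside the loop, image_url could then be computed as absolute_url(link.at_css('img')[:src]).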

3. Save the extracted information in a database

For this we create a new model class[2], News, with the attributes title, image_url, short_description, external_url, and source.
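
A minimal sketch of what the schema and model behind these attributes could look like is shown here. The migration version, the column types, and the explicit table name are assumptions; the original post does not show its schema.

```ruby
require 'active_record'

# Hypothetical migration; the version in brackets depends on the installed Active Record.
class CreateNews < ActiveRecord::Migration[7.0]
  def change
    create_table :news do |t|
      t.string :title
      t.string :image_url
      t.text   :short_description
      t.string :external_url
      t.string :source
      t.timestamps
    end
  end
end

class News < ActiveRecord::Base
  # Set explicitly so the mapping does not depend on inflection rules.
  self.table_name = 'news'
end
```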

We then extend the loop body from the previous code example to save each news item to the database:

news = News.new
news.title = title
news.image_url = image_url
news.short_description = short_description
news.external_url = href
news.source = url
news.save!

[1] http://nokogiri.org/
[2] This model class uses the Ruby Gem for Active Record: http://rubygems.org/gems/activerecord
