Sunday 11 August 2013

RabbitMQ implementation in Rails using the amqp gem


RabbitMQ is a message broker. It accepts messages from producers and delivers them to consumers. In between, it can route, buffer, and persist the messages according to rules we give it.
We need Ruby's amqp gem to use it from Rails.

For installation of RabbitMQ and the amqp gem, check out the GitHub link of the amqp gem.

After installation, create a Ruby file in config/initializers and paste in this code snippet:

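require "amqp"

# Called when the TCP connection to the broker cannot be established.
error_handler = Proc.new do |settings|
  puts "!!! Failed to connect..."
  EM.stop
end

puts "Creating a subscriber"
# Run the EventMachine reactor in its own thread so it does not block Rails.
Thread.new {
  EventMachine.run do
    $connection = AMQP.connect(:host => '127.0.0.1', :on_tcp_connection_failure => error_handler)
    puts "Initialize Subscriber..."
    $channel = AMQP::Channel.new($connection)
    $queue = $channel.queue("amqpgem.examples.helloworld", :auto_delete => true)
    # The default exchange ("") routes each message to the queue named by its routing key.
    $exchange = $channel.direct("")
    $queue.subscribe do |payload|
      puts "Received a message: #{payload}..."
      # Stand-in for real work; note that sleep blocks the reactor thread while it runs.
      sleep(10)
      puts "After sleep"
    end
  end
}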
class MessageQueue

  def push(message)
    puts "Got a message for pushing: " + message
    EventMachine.next_tick do
      #puts "Published to the routing key "+$exchange.to_yaml
      $exchange.publish(message, routing_key: "amqpgem.examples.helloworld")
    end
  end
end

MESSAGE_QUEUE = MessageQueue.new

Now, to push a message into the queue, call this in the controller:

MESSAGE_QUEUE.push("Pushing my first message")


Now, whenever you call the controller, the console will print "After sleep" around 10 seconds later.

Here is a little explanation of the code:

Whenever we start our server, our Ruby file gets loaded, which in turn creates a thread and starts EventMachine in it.

EventMachine is an event-driven I/O and lightweight concurrency library for Ruby. It provides event-driven I/O using the Reactor pattern, much like JBoss Netty, Apache MINA, and Node.js.

After EventMachine has started, we open a connection to the AMQP broker (RabbitMQ). Then we create a channel on the connection and a queue on that channel. Finally, we start our subscriber on the queue. It is responsible for taking each message from the queue and processing it as instructed.

Now, whenever we call the controller, it calls MESSAGE_QUEUE.push and passes a message to the push method, which publishes it to the default exchange with the queue's name as the routing key, so the message lands in the queue. From there, the subscriber picks it up and does the rest of the work.
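
To make the producer, queue, and subscriber flow concrete, here is a minimal sketch that uses only Ruby's standard library. It is an in-process analogy, not RabbitMQ or the amqp gem: a Queue object stands in for the broker's queue, a thread stands in for the subscriber, and the relay_through_queue helper and the :done stop marker are names made up for this illustration.

```ruby
require "thread"

# Hypothetical helper: pushes messages through an in-process Queue and
# returns what the consumer thread received. Not RabbitMQ, only an analogy.
def relay_through_queue(messages)
  mailbox  = Queue.new   # stands in for the broker's queue
  received = []

  # Consumer: plays the role of the subscriber block.
  consumer = Thread.new do
    # Queue#pop blocks until a message is available.
    while (payload = mailbox.pop) != :done
      received << "Received a message: #{payload}"
    end
  end

  # Producer: plays the role of MESSAGE_QUEUE.push in the controller.
  messages.each { |message| mailbox << message }
  mailbox << :done       # made-up marker telling the consumer to stop

  consumer.join
  received
end

puts relay_through_queue(["Pushing my first message"])
```

The producer returns as soon as the message is queued, and the consumer handles it on its own thread, which is the same division of work the initializer sets up between the controller and the subscriber.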

For a detailed explanation, check out the GitHub link of the amqp gem.

Thursday 25 July 2013

Parse a formatted PDF in Rails

I had to parse a formatted PDF that used different styles, e.g. bold text. I tried the pdf-reader gem, but it wasn't parsing properly: bold text was repeated, and it was really hard to figure out what the original text was.

Then I tried iText, which is implemented in Java. I used it through Ruby Java Bridge, but none of its strategies could parse my formatted PDF properly.

At last I found Docsplit, and it could parse my PDF properly.

I will simply show you the steps:

These are the gems you need to include. You might also need sudo permission in order to install some of them:

gem "pdftk"
gem "docsplit"
gem "glib2"
gem "gdk_pixbuf2"
gem "poppler"

Some of the gems don't install through bundle install alone, and you might need to install the corresponding packages using apt-get. Just google it if you have any issue installing a gem, or leave a comment below.

After successfully installing all the gems:

You only need the following code. It will parse your PDF and save all of its text to a text file with the same name as the PDF file.

# Directory where you want to save the text file.
file_dest = File.join(Rails.public_path.to_s, 'pdfparser', 'text')
Docsplit.extract_text(pdf_path, :output => file_dest)
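
To read the result back, you need the path of the generated text file. Here is a minimal sketch for building it, assuming Docsplit names the output after the PDF's base name with a .txt extension inside the :output directory. The extracted_text_path helper and the sample paths are hypothetical names used only for illustration.

```ruby
# Hypothetical helper: builds the path of the text file that is assumed
# to be written as <output_dir>/<PDF base name>.txt.
def extracted_text_path(pdf_path, output_dir)
  base_name = File.basename(pdf_path, ".*")  # "report.pdf" -> "report"
  File.join(output_dir, base_name + ".txt")
end

puts extracted_text_path("/tmp/report.pdf", "public/pdfparser/text")
```

If the file is not where this helper points, check the output directory, since the naming here is an assumption and not taken from the Docsplit documentation.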

There are many other options; for example, you can parse a specific page of the PDF or even extract images from it. Please refer to its documentation.

After this, if you want to find the text between two strings in the file, here's what you need to do:

text_main = File.read(extracted_text_url)
# Use Regexp.escape on both strings in case they contain special regex characters.
text = text_main.scan(/#{Regexp.escape(str_from_text)}(.*?)#{Regexp.escape(str_to_text)}/m)
# Take the first match (if any) and strip trailing whitespace.
text = text[0].try(:first).try(:rstrip).try(:to_s)

The text variable will contain the text you want, or nil if nothing matched.
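
The same idea can be tried outside Rails. Here is a minimal sketch in plain Ruby that does not need ActiveSupport's try. The text_between helper and the sample string are made up for illustration, and unlike the snippet above it strips whitespace on both sides of the match.

```ruby
# Hypothetical helper: returns the text between two literal markers,
# or nil when the markers are not found in that order.
def text_between(source, from_marker, to_marker)
  # Escape both markers so characters like "(" or "." are matched literally.
  # The /m flag lets "." match newlines, so the span may cover several lines.
  pattern = /#{Regexp.escape(from_marker)}(.*?)#{Regexp.escape(to_marker)}/m
  match = source.match(pattern)
  match && match[1].strip
end

sample = "Invoice No. (A1)\nTotal: 42.00 USD\nThank you"
puts text_between(sample, "Total:", "USD")
```

The non-greedy (.*?) stops at the first occurrence of the end marker, so only the shortest span after the start marker is returned.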