Author Archives: Mathew

Apache Hadoop MapReduce (Pseudo-Distributed Mode) - Part 2

In this article we will run the example from Part 1 in pseudo-distributed, single-server mode. Most of the configuration details are clearly laid out on the Hadoop site under Setting Up Single Server Pseudo-Distributed mode. For additional clarity I will note them here and also run our previous job from Part 1 against the new cluster. I assume that you already have Hadoop downloaded and set up from the previous article.

Note: Updated to Hadoop 2.4.1 and republished from the original September 28th, 2011 blog post.

Faceted search with ElasticSearch

I have been playing with ElasticSearch for a while now, both at work and personally. In recent discussions I came across a use case for faceted search and figured it would be a good topic for a blog post. Let's explore by example how to implement faceted searches using both the older facet module and the newer aggregations module.

Career roles for senior techies…

Highly experienced and passionate technologists face a challenging task in figuring out their career strategies. For some, this process of self-discovery and adjustment works itself out quickly, but for others it's a harder journey. A passionate technologist often feels the need to be exposed to new, cutting-edge technologies and at the same time expects appropriate career growth and recognition.

Sparkjava & JDBI

Once you spend a lot of time with a set of tools, there is a tendency to solve every problem with just those tools. This tunnel vision is dangerous for a techie, since you can be completely blindsided when something new comes up and you are found lacking in new skills. It also inhibits the ability to learn new things and take in new ideas. Having spent a lot of time in the Java Spring tunnel, it was a welcome break for me to try out SparkJava & JDBI recently, devoid of any Spring, JEE or IoC.

Using Jest as a REST-based Java client with ElasticSearch

If you have used ElasticSearch (ES) you will be familiar with the two ways you can access the index: the RESTful HTTP APIs and the Java API, which uses a binary protocol. What is missing is a pure RESTful HTTP Java client API. The open source Jest library attempts to fill that gap. Updated July 2016 to use ElasticSearch 2.3.4 and Jest 2.0.0.

Asynchronous Indexing into ElasticSearch using Spring Integration & ActiveMQ

Here is a slightly modified version of the architecture from my previous post, Getting Started With ElasticSearch. If you find yourself indexing content constantly (hundreds or even thousands of items per minute), you might want to consider an asynchronous architecture for indexing.

Getting started with ElasticSearch

You have surely heard the taglines “Data is gold” or “Data is oil”! If not, you have heard them now. The notion is that with the right type and volume of data, you can pull out very valuable insights to support your business and IT goals. This data might come from your own applications, log files, social media, blogs, online news media, etc. Data is everywhere. And when you have that data, you want to search through it for meaningful information. That is where search engines come to the rescue. I will cover one such search engine: ElasticSearch.
