Category Archives: Big Data

Digital Transformation

You have probably heard leadership talk about Digital Transformation in meetings, town halls, and blogs. Industry pundits talk and write about it, often using it to rank which companies are competing at the top of their industry. Often, though, there is confusion among rank-and-file employees about this whole “Digital” thing. Is it the latest buzzword that executives are in love with (probably true to some extent)? Is there anything behind it, or is it just a lot of hot air? Who are all those high-priced management consultants who show up to tell us about Digital Transformation without explaining what it is?

Continue reading →

Running ElasticSearch in Production

(Updated version. Originally published on Nov 3, 2013.) Here are some things to keep in mind as you design your ElasticSearch cluster. Many of these come from real-life experience and, IMO, are basic common-sense items you should consider. In addition to the settings noted here, there may be other settings that are relevant to your use case.

Continue reading →

Log Analysis with ELK

While I had the privilege of using ElasticSearch to implement media analytics in a past role, it is only recently that I have started looking at it for log analysis.

Continue reading →

Apache Hadoop MapReduce (Pseudo-Distributed Mode) – Part 2

In this article we will run the example from Part-1 in pseudo-distributed, single-server mode. Most of the configuration details are clearly laid out on the Hadoop site at Setting Up Single Server Pseudo-Distributed mode. For the sake of additional clarity I will note them here and also run our previous job from Part-1 against the new cluster. I assume that you already have Hadoop downloaded and set up from the previous article.

Note: Updated to Hadoop 2.4.1 and re-published from original Sept 28th, 2011 blog.

Continue reading →

Apache Hadoop MapReduce (Local Mode) – Part 1

Hadoop is a framework that allows you to process large sets of unstructured or semi-structured data. The unstructured/semi-structured nature of the data and its sheer size (terabytes or petabytes) make the current RDBMS offerings fall short. Enter Apache Hadoop.

Note: Updated to Hadoop 2.4.1 and re-published from original Sept 28th, 2011 blog.

Continue reading →

Asynchronous Indexing into ElasticSearch using Spring Integration & ActiveMQ

Here is a slightly modified architecture based on my previous post, Getting Started With ElasticSearch. If you find yourself indexing content constantly (hundreds or even thousands of documents per minute), you might want to consider an asynchronous architecture for indexing.

Continue reading →

Getting started with ElasticSearch

You have surely heard the taglines “Data is gold” or “Data is oil”! If not, then you have heard them now. The notion is that with the right type and volume of data, you can pull out very valuable insights to help support your business/IT goals. This data might come from your own applications, log files, social media, blogs, online news media, etc. Data is everywhere. And when you have that data, you want to search through it for meaningful information. That is where search engines come to the rescue. I will cover one such search engine – ElasticSearch.

Continue reading →

Spring Integration with JMS, ActiveMQ and MongoDB

Building on some of my previous posts about the 2012 presidential political contributions, here I will use Spring Integration, ActiveMQ, JMS and MongoDB to load the CSV data into MongoDB.

Continue reading →

Notes from #MongoDC2012 conference

Notes from attending today’s (6/26/2012) MongoDB conference – MongoDC.

Continue reading →

MongoDB and Spring Data

This post will give the reader a decent start on writing a Spring-based application that writes to MongoDB, retrieves data via queries, and finally runs a simple MapReduce query, all using Spring Data MongoDB support.

Continue reading →

{"Mat's Random Thoughts"}

Mathew's Tech Notes..
