Big Data & ELK Stack (Elasticsearch, Logstash, Kibana)


Some years ago we took on a new challenge in Big Data platforms: a new project was coming, with a huge volume of data to analyze and process for a BI solution. We started looking at and evaluating tools for this project, and we found the ELK Stack (Elasticsearch, Logstash, Kibana). It is a great platform with high horizontal scalability, and many companies use it as the backbone for Big Data analytics. Here is a summary of ELK.


What is Elasticsearch Tool?

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

We found an interesting definition on the QBOX site.

“Elasticsearch is a juggernaut solution for your data extraction problems. A single developer can use it to find the high-value needles underneath all of your data haystacks, so you can put your team of data scientists to work on another project. Consider these benefits:

Real-time data and real-time analytics. The ELK stack gives you the power of real-time data insights, with the ability to perform super-fast data extractions from virtually all structured or unstructured data sources. Real-time extraction, and real-time analytics. Elasticsearch is the engine that gives you both the power and the speed.

Scalable, high-availability, multi-tenant. With Elasticsearch, you can start small and expand it along with your business growth-when you are ready. It is built to scale horizontally out of the box. As you need more capacity, simply add another node and let the cluster reorganize itself to accommodate and exploit the extra hardware. Elasticsearch clusters are resilient, since they automatically detect and remove node failures. You can set up multiple indices and query each of them independently or in combination.

Full text search. Under the cover, Elasticsearch uses Lucene to provide the most powerful full-text search capabilities available in any open-source product. The search features come with multi-language support, an extensive query language, geolocation support, and context-sensitive suggestions, and autocompletion.

Document orientation. You can store complex, real-world entities in Elasticsearch as structured JSON documents. All fields have a default index, and you can use all the indices in a single query to get precise results in the blink of an eye.”
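To make the "structured JSON documents" and "multiple indices" points concrete, here is a minimal sketch that only builds the request bodies and sends nothing over the network. It assumes the standard `_bulk` and `_search` REST endpoints; the index names (`orders`, `invoices`), the sample documents, and the helper functions are hypothetical and are not taken from the quoted text.

```python
import json


def bulk_payload(index, docs):
    """Build the newline-delimited JSON body that the _bulk endpoint expects.

    Each document becomes two lines: an action line and the document itself.
    """
    lines = []
    for doc_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


def search_request(indices, field, text):
    """Return (path, body) for a full-text match query across several indices."""
    path = "/" + ",".join(indices) + "/_search"
    body = {"query": {"match": {field: text}}}
    return path, body


# Hypothetical sample data: nested, real-world style entities stored as JSON.
orders = {
    "1": {"customer": "Acme", "total": 120.5, "items": [{"sku": "A-1", "qty": 2}]},
    "2": {"customer": "Globex", "total": 80.0, "items": [{"sku": "B-7", "qty": 1}]},
}

payload = bulk_payload("orders", orders)
path, body = search_request(["orders", "invoices"], "customer", "acme")

print("POST /_bulk")
print(payload, end="")
print("POST", path)
print(json.dumps(body))
```

Sending these bodies to a real cluster would need an HTTP client or the official Elasticsearch client, which is left out here on purpose.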

Here is a simple view of how Elasticsearch (ES) stores data from different data sources.



What is Logstash Tool?

Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite “stash.” In short: centralize, transform, and stash your data.

“Logstash is a tool for log data intake, processing, and output. This includes virtually any type of log that you manage: system logs, webserver logs, error logs, and app logs. As administrators, we know how much time can be spent normalizing data from disparate data sources. We know, for example, how widely Apache logs differ from NGINX logs.

Rather than normalizing with time-sucking ETL (Extract, Transform, and Load), we recommend that you switch over to the fast track. Instead, you could spend much less time training Logstash to normalize the data, getting Elasticsearch to process the data, and then visualize it with Kibana. With Logstash, it’s super easy to take all those logs and store them in a central location. The only prerequisite is a Java runtime, and it takes just two commands to get Logstash up and running.

Using Elasticsearch as a backend datastore and Kibana as a frontend dashboard (see below), Logstash will serve as the workhorse for storage, querying and analysis of your logs. Since it has an arsenal of ready-made inputs, filters, codecs, and outputs, you can grab hold of a very powerful feature-set with a very little effort on your part.

Think of Logstash as a pipeline for event processing: it takes precious little time to choose the inputs, configure the filters, and extract the relevant, high-value data from your logs. Take a few more steps, make it available to Elasticsearch and—BAM!—you get super-fast queries against your mountains of data.”
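The input, filter, output idea can be sketched in a few lines of plain Python. This is only an analogy for how an event flows through a pipeline: it is not Logstash's configuration language, and the function names, the `_parse_failure` tag, and the sample log lines are assumptions made for illustration.

```python
import re
from datetime import datetime

# Pattern for the Apache/NGINX "common log format" (simplified).
CLF = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def read_input(lines):
    """Input stage: wrap each raw line in an event dictionary."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def parse_filter(events):
    """Filter stage: extract structured fields, or tag the event on failure."""
    for event in events:
        match = CLF.match(event["message"])
        if match is None:
            event["tags"] = ["_parse_failure"]  # hypothetical tag name
        else:
            event.update(match.groupdict())
            event["status"] = int(event["status"])
            event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
            parsed = datetime.strptime(event["ts"], "%d/%b/%Y:%H:%M:%S %z")
            event["ts"] = parsed.isoformat()
        yield event


def collect_output(events):
    """Output stage: a real pipeline would ship events to Elasticsearch here."""
    return list(events)


SAMPLE = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    "not a log line",
]

events = collect_output(parse_filter(read_input(SAMPLE)))
for event in events:
    print(event)
```

Because each stage is a generator, events stream through one at a time, which loosely mirrors the event-by-event processing the quote describes.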


What is Kibana Tool?

“Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack, gives you the freedom to select the way you give shape to your data. And you don’t always have to know what you’re looking for. With its interactive visualizations, start with one question and see where it leads you.”


“Kibana is your log-data dashboard. Get a better grip on your large data stores with point-and-click pie charts, bar graphs, trendlines, maps and scatter plots. You can visualize trends and patterns for data that would otherwise be extremely tedious to read and interpret. Eventually, each business line can make practical use of your data collection as you help them customize their dashboards. Save it, share it, and link your data visualizations for quick and smart communication.”
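Charts like the pie charts and bar graphs mentioned above are typically backed by Elasticsearch aggregations. As a rough sketch, the snippet below builds a `terms` aggregation request body and then computes the same kind of buckets locally on a handful of documents, so the shape of the result is visible without a running cluster. The `status` field, the sample documents, and both helper functions are hypothetical.

```python
import json
from collections import Counter


def terms_aggregation(field, size=5):
    """Request body for a terms aggregation; size=0 skips the search hits."""
    return {
        "size": 0,
        "aggs": {"by_" + field: {"terms": {"field": field, "size": size}}},
    }


def local_terms(docs, field, size=5):
    """Local stand-in that mimics the bucket shape of a terms aggregation."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]


# Hypothetical documents, e.g. parsed web server log events.
docs = [
    {"status": 200},
    {"status": 404},
    {"status": 200},
    {"status": 500},
    {"status": 200},
]

body = terms_aggregation("status")
buckets = local_terms(docs, "status")

print(json.dumps(body))
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"])
```

Each bucket would correspond to one slice of a pie chart or one bar in a bar graph.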


The full stack works as follows: Logstash extracts and processes the data, Elasticsearch stores and indexes it, and finally Kibana explores and visualizes it.


