Need for Data Analysis
Data analysis helps to understand:
How the application is performing
Whether the application is performing as expected
What issues exist in the application
How to debug issues
Many of the logs generated by an application are unstructured
It helps to have a process that reads multiple logs, structures them, and stores them in one location for analysis
What is ELK Stack?
The ELK Stack is a collection of three open source tools (Elasticsearch, Logstash, and Kibana) that together form a log management platform for searching, analyzing, and visualizing logs.
The stack originally included only Elasticsearch, Logstash, and Kibana. But in 2015, Elastic added another open source technology: Beats. Rather than changing the acronym, Elastic now refers to the augmented stack as the Elastic Stack.
Elasticsearch
Elasticsearch is a modern, open source full-text search and analytics engine. Elasticsearch can be used for searching a full array of data types—from text, numbers, and geospatial data to other types of structured and unstructured data.
Built on the Apache Lucene library, Elasticsearch has a distributed architecture, offers simple REST APIs, and stores data as schema-free JSON documents. It is easy to use and scalable, enabling you to rapidly search fast-growing volumes of data.
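To make the REST and JSON description above concrete, here is a minimal Python sketch that prepares (but does not send) a request storing one JSON document. It assumes a local, unsecured node on Elasticsearch's default HTTP port 9200; the index name `app-logs` and the document fields are hypothetical. `PUT /<index>/_doc/<id>` is the document index endpoint in recent Elasticsearch versions.

```python
import json
import urllib.request

# Assumption: a local, unsecured node listening on the default HTTP port.
ES_URL = "http://localhost:9200"


def build_index_request(index: str, doc_id: str, document: dict) -> urllib.request.Request:
    """Prepare (but do not send) a PUT /<index>/_doc/<id> request for one JSON document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_doc/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical index name and document, for illustration only.
req = build_index_request("app-logs", "1", {"level": "ERROR", "message": "Payment service timed out"})
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/app-logs/_doc/1
# To actually send it against a running node: urllib.request.urlopen(req)
```

Only the standard library is used, so the request shape can be inspected without a running cluster.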
Installation
Elasticsearch installation details can be found at the link below:
Features
Open source search server written in Java
Used to index any kind of heterogeneous data
Has a REST API web interface with JSON output
Full-Text Search
Near Real Time (NRT) search
Sharded, replicated, searchable JSON document store
Schema-free, REST- and JSON-based distributed document store
Multi-language & Geolocation support
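Full-text search in Lucene, and therefore in Elasticsearch, rests on an inverted index that maps each term to the documents containing it. The toy Python sketch below illustrates only that idea: its tokenizer is a crude stand-in for Elasticsearch analyzers, and it requires every query term to match (AND), whereas a default `match` query combines terms with OR and ranks results by relevance.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map every term to the set of document ids that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # AND semantics: keep only documents that contain every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "1": "Payment service timed out",
    "2": "User login succeeded",
    "3": "Payment accepted",
}
idx = build_inverted_index(docs)
print(sorted(search(idx, "payment")))        # ['1', '3']
print(sorted(search(idx, "Payment timed")))  # ['1']
```

Looking up a term is a single dictionary access, which is why an inverted index stays fast as the number of documents grows.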
Advantages
Stores schema-less data and can also create a schema (mapping) for your data
Lets you manipulate your data record by record, or many records at once with the multi-document APIs
Lets you filter and query your data for insights
Based on Apache Lucene and provides a RESTful API
Provides horizontal scalability, reliability, and multitenancy, with near-real-time indexing for fast search
Scales both vertically and horizontally
Logstash
Logstash is an open source, server-side data processing pipeline that dynamically ingests data, transforms it, and ships it to whatever destination (or “stash”) the user defines. It can simultaneously ingest unstructured data streaming in from numerous sources, including websites, application servers, and data stores.
Logstash filters and parses the data it collects, transforming it into a common format. It then sends that data wherever you want it to go. Many organizations send the transformed data to Elasticsearch, where logs can be indexed and searched. Once data is available in Elasticsearch, it can also be visualized with Kibana.
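The parsing step described above is what Logstash's grok filter does with named patterns. The following Python sketch imitates that step with a plain regular expression; the log format, field names, and the `_parse_failure` tag are hypothetical (real grok tags unparsed events with `_grokparsefailure`).

```python
import re

# Hypothetical log format: "<date> <time> <LEVEL> [<service>] <message>"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) \[(?P<service>[^\]]+)\] (?P<message>.*)$"
)


def parse_line(line: str) -> dict:
    """Turn one unstructured log line into a structured event (grok-like, simplified)."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        # Keep the raw text and tag the failure, similar in spirit to grok's _grokparsefailure tag.
        return {"message": line.strip(), "tags": ["_parse_failure"]}
    fields = match.groupdict()
    return {
        "@timestamp": f"{fields['date']}T{fields['time']}",
        "level": fields["level"],
        "service": fields["service"],
        "message": fields["message"],
    }


print(parse_line("2024-05-01 10:15:32 ERROR [payments] Payment service timed out"))
```

Keeping unparsed lines (tagged rather than dropped) means malformed input stays searchable later.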
Installation
Logstash installation details are available at the link below:
Features
Events are passed through each phase using internal queues
Accepts many different inputs for your logs
Filtering/parsing for your logs
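The input, filter, and output phases listed above can be pictured as stages connected by queues. This Python sketch is a simplified model, not Logstash's implementation: it runs the stages one after another instead of with concurrent workers, uses a `None` sentinel to mark the end of the stream, and its "drop DEBUG events" filter is a hypothetical rule.

```python
from queue import Queue


def input_stage(lines: list[str], out_q: Queue) -> None:
    # Input: wrap each raw line as an event.
    for line in lines:
        out_q.put({"message": line})
    out_q.put(None)  # sentinel: no more events


def filter_stage(in_q: Queue, out_q: Queue) -> None:
    # Filter: drop DEBUG noise and add a derived "level" field.
    while (event := in_q.get()) is not None:
        if event["message"].startswith("DEBUG"):
            continue
        event["level"] = event["message"].split(" ", 1)[0]
        out_q.put(event)
    out_q.put(None)


def output_stage(in_q: Queue) -> list[dict]:
    # Output: collect events (a real pipeline would ship them to Elasticsearch).
    shipped = []
    while (event := in_q.get()) is not None:
        shipped.append(event)
    return shipped


raw = ["INFO service started", "DEBUG cache miss", "ERROR payment timed out"]
to_filter, to_output = Queue(), Queue()
input_stage(raw, to_filter)
filter_stage(to_filter, to_output)
shipped = output_stage(to_output)
print(shipped)
```

The queues decouple the stages, so in a concurrent setup a slow output does not immediately stall the input.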
Advantages
Centralizes data processing
Analyzes a large variety of structured and unstructured data and events
Offers plugins to connect with various types of input sources and platforms
Kibana
Kibana is an open source data analysis and visualization tool that turns the data stored in Elasticsearch into easily consumable charts, graphs, histograms, and other visual representations. Through a browser-based interface, you can use preconfigured dashboards to explore large data volumes.
Kibana provides a useful way to share insights across your organization. Non-technical users can easily see trends and assess KPIs, all through rich, customizable graphics. It also lets you create dashboards with several visualizations that provide a quick summary of the analyzed data.
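Kibana charts such as a "count per log level" panel are typically backed by Elasticsearch aggregations. The Python sketch below builds a terms aggregation request body and, for illustration, computes roughly equivalent buckets locally; the `level` field is hypothetical and assumed to be mapped as a keyword.

```python
import json
from collections import Counter


def level_breakdown_body(field: str = "level") -> dict:
    """Search body for a terms aggregation, like one behind a pie or bar chart."""
    return {
        "size": 0,  # return no individual hits, only aggregation buckets
        "aggs": {"by_level": {"terms": {"field": field}}},
    }


def local_terms_buckets(docs: list[dict], field: str) -> list[dict]:
    """Approximate the buckets such an aggregation returns, computed locally.

    Buckets are ordered by descending count. Unlike Elasticsearch, this ignores the
    default limit of 10 buckets and leaves ties in first-seen order.
    """
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


events = [{"level": "ERROR"}, {"level": "INFO"}, {"level": "ERROR"}]
print(json.dumps(level_breakdown_body()))
print(local_terms_buckets(events, "level"))
```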
Installation
Kibana installation details are available at:
Features
Powerful front-end dashboard capable of visualizing indexed information from the Elasticsearch cluster
Enables real-time search of indexed information
Search, view, and interact with data stored in Elasticsearch
Execute queries on data & visualize results in charts, tables, and maps
Configurable dashboard to slice and dice Logstash logs in Elasticsearch
Capable of providing historical data in the form of graphs, charts, etc.
Real-time dashboards that are easily configurable
Advantages
Easy visualization
Fully integrated with Elasticsearch
Visualization tool
Offers real-time analysis, charting, summarization, and debugging capabilities
Provides an intuitive and user-friendly interface
Allows sharing snapshots of the logs you have searched
Permits saving dashboards and managing multiple dashboards
How the ELK Stack works
As I mentioned earlier, the different components of the ELK Stack provide a simple yet powerful solution for log management and analytics.
The various components in the ELK Stack were designed to interact and play nicely with each other without too much extra configuration. However, how you end up designing the stack depends greatly on your environment and use case.
Logs: Server logs that need to be analyzed are identified and given to Logstash as input.
Logstash: Collects logs and event data from various inputs, parses and transforms the data, and passes it on to Elasticsearch.
Elasticsearch: Stores, indexes, and searches the transformed data from Logstash.
Kibana: Explores, visualizes, and shares the data stored in Elasticsearch.
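The flow above (logs into Logstash, then Elasticsearch, then Kibana) can be mimicked end to end in a few lines of Python. Everything here is a hypothetical in-memory stand-in: `logstash_like`, `InMemoryIndex`, and `kibana_like_summary` only mirror the role of each component and are not real APIs.

```python
from collections import Counter


def logstash_like(raw_lines):
    # Collect + parse: "LEVEL message" lines become structured events (hypothetical format).
    for line in raw_lines:
        level, _, message = line.partition(" ")
        yield {"level": level, "message": message}


class InMemoryIndex:
    """Stand-in for an Elasticsearch index: stores and searches documents by id."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}

    def index(self, doc: dict) -> str:
        doc_id = str(len(self.docs) + 1)
        self.docs[doc_id] = doc
        return doc_id

    def search(self, field: str, value: str) -> list[dict]:
        return [d for d in self.docs.values() if d.get(field) == value]


def kibana_like_summary(store: InMemoryIndex) -> Counter:
    # Visualize: count documents per level, as a dashboard panel might.
    return Counter(d["level"] for d in store.docs.values())


store = InMemoryIndex()
for event in logstash_like(["ERROR db timeout", "INFO request ok", "ERROR disk full"]):
    store.index(event)
print(kibana_like_summary(store))
print(store.search("level", "ERROR"))
```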
Why is ELK popular?
Organizations are adopting the ELK Stack because Elasticsearch has become a leading choice over other search engines. Compared with other solutions, Elasticsearch can offer superior scalability, provide more powerful near-real-time search and analytics capabilities, and better support dynamic, changing data. Its native JSON-based Query DSL (domain-specific language) can also handle highly complex searches.
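As an example of the JSON-based Query DSL mentioned above, this Python sketch builds a `bool` query body that could be sent to `GET /<index>/_search`. The field names `message`, `level`, and `@timestamp` are hypothetical, and the `term` clause assumes `level` is mapped as a keyword.

```python
import json


def recent_errors_query(text: str, minutes: int = 60) -> dict:
    """Query DSL body: full-text match on `message`, restricted to recent ERROR events."""
    return {
        "query": {
            "bool": {
                # Scored clause: contributes to relevance ranking.
                "must": [{"match": {"message": text}}],
                # Filter clauses: yes/no conditions that do not affect the score.
                "filter": [
                    {"term": {"level": "ERROR"}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ],
            }
        }
    }


print(json.dumps(recent_errors_query("payment timeout"), indent=2))
```

Placing exact-match conditions under `filter` rather than `must` keeps them out of scoring and makes them eligible for caching.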
The ELK Stack also provides greater hosting flexibility than other stacks. You can deploy the ELK Stack on your preferred cloud provider, including AWS, Google Cloud, and Microsoft Azure. You also have the option to install components on servers running a range of operating systems—such as versions of Windows Server, CentOS, Ubuntu, and Debian. And you can run the stack in Kubernetes or Docker environments.
The fact that the ELK Stack comprises open source technologies has also contributed to its popularity. Unlike proprietary solutions, such as Splunk, the ELK Stack lets you avoid costly licensing fees while also joining a thriving open source community that is continuously innovating.
Basic Elasticsearch Concepts
Elasticsearch is a feature-rich and complex system. There are some basic concepts and terms that all Elasticsearch users should learn and become familiar with.
Below are some concepts to start with.
Index
Elasticsearch indices are logical partitions of documents and can be compared to a database in the world of relational databases.
You can define as many indices in Elasticsearch as you want, but a large number of indices can affect performance. Each index holds its own set of documents.
Indices are identified by lowercase names that are used when performing various actions (such as searching and deleting) against the documents inside each index.
Configuring and managing Elasticsearch indices will likely take up a good chunk of your ELK maintenance hours.
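Lowercase is only one of Elasticsearch's documented index-naming rules. The Python sketch below checks the main ones: lowercase only; none of `\ / * ? " < > |`, space, comma, or `#`; no leading `-`, `_`, or `+`; not `.` or `..`; no colon (disallowed since 7.0); at most 255 bytes. It is a convenience check written for this post, not an Elasticsearch API, and the server remains the final authority.

```python
# Characters Elasticsearch rejects in index names (plus the colon, handled below).
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')


def is_valid_index_name(name: str) -> bool:
    """Approximate Elasticsearch's index-name rules on the client side."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower():  # lowercase only
        return False
    if name[0] in "-_+":  # cannot start with -, _ or +
        return False
    if any(ch in FORBIDDEN_CHARS for ch in name):
        return False
    if ":" in name:  # colons are not allowed from 7.0 onward
        return False
    return len(name.encode("utf-8")) <= 255  # limit is in bytes, not characters


for candidate in ["app-logs-2024.05.01", "App-Logs", "_internal", "logs#1"]:
    print(candidate, is_valid_index_name(candidate))
```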
Documents
Documents are JSON objects that are stored within an Elasticsearch index and are considered the base unit of storage. In the world of relational databases, documents can be compared to a row in a table.
Data in documents is defined with fields comprised of keys and values. A key is the name of the field, and a value can be an item of many different types such as a string, a number, a boolean, another object, or an array of values.
Documents also contain reserved fields that constitute the document metadata, such as _index and _id (and _type in older versions).
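The metadata fields are easy to see in a search response, where each hit carries them next to the original document in `_source`. The Python sketch below uses a hypothetical hit (index name, id, score, and fields are made up) and separates the underscore-prefixed fields from the document body.

```python
import json

# A hypothetical search hit, shaped like one entry of hits.hits in a _search response.
raw_hit = """{
  "_index": "app-logs",
  "_id": "1",
  "_score": 1.0,
  "_source": {"level": "ERROR", "message": "Payment service timed out", "response_ms": 5012}
}"""


def split_hit(hit: dict) -> tuple[dict, dict]:
    """Separate underscore-prefixed fields (metadata such as _index and _id, plus the
    hit's _score) from the original document stored in _source."""
    metadata = {k: v for k, v in hit.items() if k.startswith("_") and k != "_source"}
    return metadata, hit["_source"]


meta, source = split_hit(json.loads(raw_hit))
print(meta)
print(source)
```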
Types
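Elasticsearch types were used within an index to subdivide similar kinds of data, with each type representing a unique class of documents. Types consist of a name and a mapping (see below) and are recorded in the _type field, which can then be used for filtering when querying a specific type. Note that mapping types have since been deprecated: starting with Elasticsearch 6.0 an index can contain only one type, and Elasticsearch 8.0 removed support for types altogether.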
Mapping
Like a schema in the world of relational databases, a mapping defines the different types that reside within an index. It defines the fields for documents of a specific type: the data type (such as text, keyword, or integer) and how the fields should be indexed and stored in Elasticsearch.
A mapping can be defined explicitly, or generated automatically (dynamic mapping) when a document is indexed. Templates, which include settings and mappings, can also be applied automatically to new indices whose names match a pattern.
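Both routes can be sketched in Python. `explicit_mapping` returns a body you might send with `PUT /<index>` (the field names are hypothetical). `infer_dynamic_type` is a simplified model of the default dynamic-mapping rules for JSON values: integers become `long`, decimals `float`, booleans `boolean`, objects nested `properties`, and strings `text` with a `keyword` sub-field. It ignores date and numeric detection, arrays, and nulls.

```python
def explicit_mapping() -> dict:
    # Body for PUT /<index>; field names are hypothetical.
    return {
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "level": {"type": "keyword"},
                "message": {"type": "text"},
                "response_ms": {"type": "integer"},
            }
        }
    }


def infer_dynamic_type(value):
    """Simplified model of default dynamic mapping for one JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):
        return {"properties": {k: infer_dynamic_type(v) for k, v in value.items()}}
    if isinstance(value, str):
        # Default for strings: full-text field plus an exact-match keyword sub-field.
        return {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    raise TypeError(f"not handled in this sketch: {value!r}")


print(infer_dynamic_type({"level": "ERROR", "response_ms": 5012, "retry": False}))
```

An explicit mapping is usually preferable for fields like `level`, where dynamic mapping would add a `text` field you never search.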
Shards
Index size is a common cause of Elasticsearch crashes. Since there is no limit to how many documents you can store on each index, an index may take up an amount of disk space that exceeds the limits of the hosting server. As soon as an index approaches this limit, indexing will begin to fail.
One way to counter this problem is to split indices horizontally into pieces called shards. This allows you to distribute operations across shards and nodes to improve performance. You can control the number of shards per index and host these “index-like” shards on any node in your Elasticsearch cluster.
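Elasticsearch decides which primary shard holds a document from a hash of its routing value (the `_id` by default) modulo the number of primary shards. The Python sketch below shows this simplified formula. Elasticsearch itself uses a Murmur3 hash with some additional routing arithmetic; `zlib.crc32` is only a dependency-free stand-in, and the document ids are hypothetical.

```python
import zlib
from collections import Counter


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Simplified routing: shard = hash(routing) % number_of_primary_shards.

    crc32 stands in for the Murmur3 hash that Elasticsearch actually uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Spread 3,000 hypothetical document ids over 3 primary shards.
spread = Counter(pick_shard(f"doc-{i}", 3) for i in range(3000))
print(sorted(spread.items()))
```

Because the result depends on the primary shard count, changing that count would send existing ids to different shards. This is why the number of primary shards is fixed when an index is created; changing it requires reindexing or the split/shrink APIs.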
Replicas
To allow you to easily recover from system failures such as unexpected downtime or network issues, Elasticsearch allows users to make copies of shards called replicas. Because replicas were designed to ensure high availability, they are not allocated on the same node as the shard they are copied from. Similar to shards, the number of replicas can be defined when creating the index but also altered at a later stage.
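The rule that a replica never shares a node with its primary can be illustrated with a toy placement function. This Python sketch is a hypothetical round-robin layout, not Elasticsearch's allocator, which also weighs disk usage, shard balance, and awareness settings. When there are too few nodes, real Elasticsearch leaves the extra replicas unassigned (yellow cluster health); the sketch raises an error instead.

```python
def allocate(num_primaries: int, num_replicas: int, nodes: list[str]) -> dict:
    """Round-robin placement that never puts two copies of a shard on the same node."""
    if num_replicas + 1 > len(nodes):
        # Real Elasticsearch would leave these replicas unassigned instead.
        raise ValueError("not enough nodes for every copy of a shard")
    layout = {}
    for shard in range(num_primaries):
        # Copies of one shard go to consecutive, distinct nodes starting at an offset.
        copies = [nodes[(shard + i) % len(nodes)] for i in range(num_replicas + 1)]
        layout[shard] = {"primary": copies[0], "replicas": copies[1:]}
    return layout


layout = allocate(3, 1, ["node-a", "node-b", "node-c"])
for shard, placement in layout.items():
    print(shard, placement)
```

Losing any single node in this layout still leaves at least one copy of every shard, which is the high-availability property replicas are meant to provide.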
URI Search
The easiest way to search your Elasticsearch cluster is through URI search. You can pass a simple query to Elasticsearch using the q query parameter.
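A URI search is just a `GET` to `/<index>/_search` with the query in the `q` parameter, written in Lucene query string syntax. This Python sketch builds such a URL; the local address, the `app-logs` index, and the `level` field are assumptions, and `size` (the number of hits to return) is optional.

```python
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # assumption: local node on the default port


def uri_search_url(index: str, query: str, size: int = 10) -> str:
    # q takes Lucene query string syntax, e.g. field:value; urlencode escapes it safely.
    params = urlencode({"q": query, "size": size})
    return f"{ES_URL}/{index}/_search?{params}"


print(uri_search_url("app-logs", "level:ERROR"))
# http://localhost:9200/app-logs/_search?q=level%3AERROR&size=10
```

The same URL can be pasted into a browser or passed to `curl` against a running node.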
Conclusion
The ELK Stack is a very powerful tool and can be very useful for organizations that need data analysis. This blog gives a basic idea of the ELK Stack. Please explore the world of the ELK Stack and make use of it as per your needs.
Thank you. Enjoy exploring the world of the ELK Stack.