Splunk provides real-time answers that help organizations meet customer expectations and business goals. It connects to machine data, gains insights from it, and surfaces both the opportunities and the risks for the business. Splunk scales to meet modern data needs, embracing complexity to deliver the desired answers, and it leverages artificial intelligence powered by machine learning for actionable and predictive insights.
Splunk delivers business insights, operational visibility, proactive monitoring of the environment, and the ability to search and investigate issues.
The main components of Splunk are:
The Splunk agent, known as the forwarder, is installed or deployed on application servers, where it collects data from the source and forwards it to the Indexer.
The Indexer then stores this data locally, based on the license capacity, on a host machine or in the cloud.
Once this is set up, the Search Head comes into the picture; it is used for searching, analyzing, visualizing, and performing various other functions on the data stored in the Indexer.
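As a minimal sketch of how these components connect (the indexer host name and output group below are hypothetical), the forwarder's outputs.conf points at the Indexer, and the Indexer's inputs.conf opens the receiving port:
# outputs.conf on the forwarder (hypothetical host and group name)
[tcpout]
defaultGroup = primary_indexers
[tcpout:primary_indexers]
server = indexer1.example.com:9997
# inputs.conf on the indexer: listen for forwarder traffic on 9997
[splunktcp://9997]
disabled = 0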
There are other tools in the market besides Splunk for analyzing machine data, performing business intelligence, and providing security. However, no tool other than Splunk can perform all of these operations.
The basic differences from a few other well-known tools are shown below.
Features | Splunk | Sumo Logic | ELK (Kibana) |
---|---|---|---|
Searching | Possible | Possible | Only possible with integrations |
Analysis | Possible | Possible | Only possible with integrations |
Visualization dashboard | Dashboards can be created | Dashboards can be created | Only possible with integrations |
SaaS setup | Supported | Supported | Supported |
On-premise setup | Supported | Not possible | Supported |
Input of any data type | Any data type can be ingested | Needs plugins | Needs plugins |
Customer support | Available | Available, but proficiency is lacking | Available, but proficiency is lacking |
Documentation & community | Available | Unavailable | Available |
The benefits of sending data through forwarders into Splunk include buffering of data when the indexer is unreachable, optional compression and SSL encryption of data in transit, and automatic load balancing across indexers.
In any system, alerts are configured for erroneous situations such as high CPU utilization or high memory consumption. Splunk alerts work the same way: a notification is triggered to the configured email address whenever such a condition occurs.
For example, an alert could email the operations team when CPU load stays above a threshold, as sketched below.
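A minimal sketch of such an alert's base search (the index, sourcetype, field name, and threshold here are all hypothetical); saved as an alert, it would notify the configured email address whenever the count crosses the threshold:
index=os sourcetype=cpu_metrics cpu_load_percent>90 | stats count | where count > 20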
The clustering technique involves two terms: Search Factor and Replication Factor.
The Search Factor determines the number of searchable copies of the data that the indexer cluster maintains.
The Replication Factor, in the case of an indexer cluster, is the number of copies of data the cluster maintains; in the case of a search head cluster, it is the minimum number of copies of each search artifact that the cluster maintains.
With respect to clusters, a search head cluster has only a Replication Factor, whereas an indexer cluster has both a Search Factor and a Replication Factor.
Moreover, the Replication Factor should not be less than the Search Factor.
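As a minimal sketch (assuming the settings are applied on the indexer cluster master node; the values are illustrative), both factors are configured in server.conf:
# server.conf on the cluster master
[clustering]
mode = master
replication_factor = 3
search_factor = 2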
Many commands are used to transform and report on search results. A few of them are compared below, with example searches after the table.
Stats | Chart | Timechart |
---|---|---|
A reporting command that presents results in table form. | Displays the search results as a bar, line, or area chart. | Displays bar and line charts plotted over time. |
Multiple fields can be used to create a table. | Takes only two fields, one on the X-axis and one on the Y-axis. | Takes only one field, since the X-axis is fixed to the time field. |
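A minimal sketch of the three commands over the same hypothetical web data (the index and field names are assumptions):
index=web | stats count by status
index=web | chart count over host by status
index=web | timechart span=1h count by status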
Lookup commands are used when you want to bring in fields from an external source (such as a CSV file or a Python-based script) to add values to your events. They help narrow search results, as they let you reference fields in an external CSV file that match fields in your event data.
An inputlookup basically takes an input, as the name suggests: for example, it could read the product price and product name from a lookup file and match them against an internal field such as a product ID or item ID. An outputlookup, by contrast, writes search results out to a lookup file. In short, inputlookup is used to enrich the data, and outputlookup is used to build lookup tables. Sketches of both are below.
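Minimal sketches of the commands (the lookup file, index, and field names are hypothetical):
| inputlookup products.csv
index=sales | lookup products.csv product_id OUTPUT product_name price
index=sales | stats count by product_id | outputlookup product_counts.csv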
Different types of data inputs in Splunk include monitoring files and directories, network inputs over TCP and UDP ports, Windows inputs (such as event logs and the registry), and scripted inputs.
There are five default fields that are indexed along with every event in Splunk: host, source, sourcetype, index, and timestamp.
Fields can be extracted in several ways, for example with the Field Extractor utility in Splunk Web, with the rex command within a search, or with extraction rules configured in props.conf.
The differences between search-time and index-time field extractions are shown below.
Search time field extractions | Index time field extractions |
---|---|
Fields are extracted while the search is performed. | Fields are extracted when the data arrives at the indexer. |
No disk space is consumed, as the extracted fields are not part of the metadata. | Disk space is consumed, as the extracted fields are part of the metadata. |
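As a minimal sketch, a search-time extraction can be defined in props.conf (the sourcetype, regex, and field name below are hypothetical):
# props.conf: search-time extraction of a hypothetical user_name field
[my_sourcetype]
EXTRACT-username = user=(?<user_name>\w+)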
A summary index is used to store analytical data, reports, and summaries. Its pros and cons are below.
Pros | Cons |
---|---|
The summary index retains the analytics and reports even after the underlying data has aged out. | Needle-in-a-haystack types of searches are not possible. |
| Deep-dive analysis is not possible. |
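A minimal sketch of populating a summary index with the collect command (both index names are hypothetical); in practice, this search would run on a schedule:
index=web | stats count by status | collect index=summary_web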
The time zone property provides the desired output for a given time zone. Splunk picks up the default time zone from your browser settings; the browser, in turn, picks up the current time zone from the machine in use. Splunk applies that time zone when the data is input, and it matters most when searching and correlating data coming from different sources.
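For example, when events' timestamps carry no time zone information, one can be assigned per sourcetype in props.conf (the sourcetype below is hypothetical):
# props.conf: assign a time zone to a hypothetical sourcetype
[my_sourcetype]
TZ = America/New_York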
A Splunk App is a collection of reports, dashboards, alerts, field extractions, and lookups, whereas a Splunk Add-on is the same but without the visual components of a report or a dashboard.
Colors need to be assigned to charts when reports are created; if no colors are assigned, Splunk picks them by default. Colors can be assigned either from the chart's Format menu in Splunk Web or by editing the dashboard's Simple XML (for example, via the charting.seriesColors option).
The sourcetype is one of the default fields that Splunk assigns to all incoming data. It tells Splunk what kind of data is being sent so that it can format the data intelligently during indexing. The sourcetype also helps categorize the data, which makes searching easier.
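As a minimal sketch (assuming web access logs indexed with the common access_combined sourcetype; the rest of the search is illustrative), the field scopes a search to one kind of data:
sourcetype=access_combined status=404 | stats count by host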
Whenever the licensed daily indexing volume is exceeded, a 'license violation' error is thrown. The license warning that is thrown up persists for 14 days. On a commercial license, 5 warnings within a rolling 30-day window are permitted before the indexer's search results and reports stop triggering; on the free version, only 3 warnings are allowed.
Common port numbers on which Splunk services run by default are:
Service | Port Number |
---|---|
Management / REST API | 8089 |
Search head / Indexer (Splunk Web) | 8000 |
Search head (app server, KV store) | 8065, 8191 |
Indexer cluster peer node / Search head cluster member (replication) | 9887 |
Indexer (receiving data from forwarders) | 9997 |
Indexer / Forwarder (syslog) | 514 |
Splunk uses the concept of buckets: all data coming into the indexer is stored in directories known as buckets. There are five different bucket types in Splunk, corresponding to different data ages.
Over time, data rolls from one bucket stage to the next.
When data is first indexed, it goes into a hot bucket. Hot buckets are searchable and are actively being written to.
When certain conditions occur (for example, the hot bucket reaches a certain size or Splunk gets restarted), the hot bucket becomes a warm bucket ("rolls to warm"), and a new hot bucket is created in its place.
Warm buckets are searchable but are not actively written to.
Once further conditions are met (for example, the index reaches some maximum number of warm buckets), the indexer begins to roll the warm buckets to cold based on their age. It always selects the oldest warm bucket to roll to cold, and buckets continue to roll to cold as they age in this manner.
Cold buckets are searchable.
Frozen buckets are not searchable.
Thawed buckets (frozen data restored from archive) are searchable.
The bucket aging policy can be modified by editing the attributes in indexes.conf.
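As a minimal sketch of such an edit (the index name and values below are hypothetical), a few indexes.conf attributes that drive bucket rolling: maxHotBuckets caps concurrent hot buckets, maxWarmDBCount caps warm buckets before they roll to cold, and frozenTimePeriodInSecs (here 180 days) controls when cold data freezes:
[my_index]
maxHotBuckets = 3
maxWarmDBCount = 300
frozenTimePeriodInSecs = 15552000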
There are two types of Splunk forwarders, as below: the universal forwarder, a lightweight agent that forwards raw data with minimal processing, and the heavy forwarder, a full Splunk instance that can parse, filter, and even index data before forwarding it.
$SPLUNK_HOME/etc/system/default
Splunk Free lacks these features: authentication and user management, distributed search, scheduled searches and alerting, deployment management, and forwarding data to non-Splunk destinations.
A 24-hour timer is started by the license slave, after which all searches are blocked and end users cannot search the data on that slave until it can reach the license master again.
DB Connect is the generic SQL database plugin for Splunk, which makes it easy to integrate database information with Splunk queries and reports.
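A minimal sketch of querying a database from a search (assuming DB Connect is installed; the connection name and SQL below are hypothetical):
| dbxquery connection="my_database" query="SELECT id, name FROM customers"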
Regular expressions for extracting an IP address:
rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"
OR
rex field=_raw "(?<ip_address>([0-9]{1,3}[\.]){3}[0-9]{1,3})"
The license usage day runs from midnight to midnight on the clock of the license master.
splunk start splunkweb   # starts the Splunk Web service
splunk start splunkd     # starts the Splunk daemon
ps aux | grep splunk     # checks whether Splunk processes are running
$SPLUNK_HOME/bin/splunk enable boot-start    # launches Splunk at machine boot
$SPLUNK_HOME/bin/splunk disable boot-start   # stops Splunk from launching at boot
Set the value OFFENSIVE=Less in splunk-launch.conf.
Delete the following file on the Splunk server:
$SPLUNK_HOME/var/log/splunk/searches.log
The fishbucket is a directory or index at the default location:
/opt/splunk/var/lib/splunk
It can be accessed by searching for "index=_thefishbucket" in the Splunk GUI.
This can be achieved using regular expressions: the necessary events are matched by a keep rule, and the remaining events are sent to the nullQueue, as sketched below.
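A minimal sketch of this keep-and-discard pattern (the sourcetype and regexes below are hypothetical): all events of the sourcetype are first routed to the nullQueue, and the events worth keeping are pulled back into the index queue:
# props.conf: apply the discard rule first, then the keep rule
[my_sourcetype]
TRANSFORMS-filter = setnull, setparsing
# transforms.conf: send everything to the nullQueue...
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
# ...then pull ERROR events back into the index queue
[setparsing]
REGEX = ERROR
DEST_KEY = queue
FORMAT = indexQueue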
By watching data from Splunk's metrics log in real time:
index="_internal" source="*metrics.log" group="per_sourcetype_thruput" series="<your_sourcetype_here>" | eval MB=kb/1024 | chart sum(MB)
Or, to watch everything happening, split by sourcetype:
index="_internal" source="*metrics.log" group="per_sourcetype_thruput" | eval MB=kb/1024 | chart sum(MB) avg(eps)
All searches that are running or have completed are saved in the dispatch directory:
$SPLUNK_HOME/var/run/splunk/dispatch
Using the defaults (which can be overridden in limits.conf), these directories are deleted 10 minutes after the search completes, unless the user saves the search results, in which case the results are deleted after 7 days.
This algorithm is used to search data very fast; it is essentially a technique for batch-based, large-scale parallelization.
It is inspired by functional programming's map() and reduce() functions.
At the indexer, Splunk keeps track of indexed events in a directory called the fishbucket (default location: /opt/splunk/var/lib/splunk). It contains seek pointers and CRCs for the files being indexed, so splunkd can tell whether it has already read them.
Search head pooling is an older way of scaling Splunk search; it has been superseded by search head clustering in recent Splunk versions. A search head cluster is managed by a captain, and the captain controls its members. Both features serve the same purpose: if any search head goes down at any given point in time, another search head takes over so that search does not go down. With respect to reliability and efficiency, however, search head clustering has the upper hand over search head pooling.