Splunk Interview Questions

Prepare better with the best interview questions and answers, and walk away with top interview tips. These interview questions and answers will boost your core interview skills and help you perform better. Be smarter with every interview.


Beginner

Splunk provides the real-time answers needed to meet customer expectations as well as business goals. It connects to machine data, gains insights from it, and surfaces both the opportunities and the risks for the business. Splunk scales to meet modern data needs: it embraces complexity and still delivers the desired answers, and it can leverage artificial intelligence powered by machine learning for actionable and predictive insights.

Splunk provides business insights, operational visibility, proactive monitoring of the environment, and the ability to search and investigate issues:

  • Business decisions – Splunk learns trends and patterns and derives operational intelligence from machine data, which in turn supports better business decisions.
  • E2E visibility – Using machine data, Splunk gains end-to-end visibility across operations, which can then be broken down across the infrastructure.
  • Explore & examine – Using machine data, Splunk finds problems, correlates events across multiple data sources, and implicitly detects patterns across massive data sets.
  • Proactive server monitoring – Splunk uses machine data to monitor systems, which helps identify issues, problems and even attacks.

The main components of Splunk are:

  1. Indexer: Indexes the machine data received from forwarders (for example, application server logs)
  2. Forwarder: An agent installed on the application servers that collects logs and forwards them to the indexer
  3. Search head: Provides the GUI for searching, once the indexer and forwarders are in place
  4. Deployment Server (Management Console Host): Manages the Splunk components (indexer, forwarder and search head) in a distributed environment

The Splunk agent, known as a forwarder, is installed on the application servers; it collects data from the source and forwards it to the indexer.

The indexer then stores this data locally on a host machine or in the cloud, subject to the license capacity.

Once this is set up, the search head comes into the picture: it is used for searching, analyzing, visualizing and performing various other functions on the data stored in the indexer.
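As a minimal sketch of this flow, a universal forwarder is typically pointed at an indexer through outputs.conf; the host name below is an illustrative assumption, and 9997 is the conventional receiving port on the indexer:

[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = indexer01.example.com:9997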

There are other tools in the market apart from Splunk for analyzing machine data, running business intelligence over business operations and providing security. However, no other single tool covers all of these operations.

The basic differences between Splunk and a few other well-known tools are listed below.

| Features | Splunk | Sumo Logic | ELK (Kibana) |
| --- | --- | --- | --- |
| Searching | Searching is possible | Searching is possible | Only possible with integrations |
| Analysis | Analysis is possible | Analysis is possible | Only possible with integrations |
| Visualization dashboards | Dashboards can be created | Dashboards can be created | Only possible with integrations |
| SaaS setup | SaaS setup is possible | SaaS setup is possible | SaaS setup is possible |
| On-premise setup | On-premise setup is possible | On-premise setup is not possible | On-premise setup is possible |
| Input of any data type | Any data type can be ingested | Needs plugins | Needs plugins |
| Customer support | Available | Available, but proficiency is lacking | Available, but proficiency is lacking |
| Documentation & community | Available | Unavailable | Available |

The benefits of sending data to Splunk through forwarders are listed below (a configuration sketch illustrating load balancing follows the list):

  1. Bandwidth throttling.
  2. Collection of all syslog data from the system log server.
  3. If any issue is encountered on the Splunk side, the logs captured on the application server are not lost; they are saved in flat files on the servers.
  4. The data transferred from the forwarder to an indexer can be encrypted over an SSL connection.
  5. Data pushed to the Splunk indexers is load balanced by default; if one indexer node is down, the data can be routed to another node.
  6. The forwarder caches the data locally before sending it to the indexer, and this cache acts as a temporary backup, so data is not lost at any given point in time.
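A minimal sketch of how points 4 to 6 can look in the forwarder's outputs.conf (the host names are illustrative assumptions; listing several indexers enables the built-in auto load balancing, and useACK makes the forwarder keep data until the indexer acknowledges it):

[tcpout:lb_indexers]
server = indexer01.example.com:9997, indexer02.example.com:9997
useACK = true

SSL settings for encrypting this connection would also go in outputs.conf, per the options documented for your Splunk version.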

In any system, alerts are configured for erroneous situations such as high CPU utilization or high memory consumption. Splunk alerts work the same way: a notification is triggered to the configured mail ID whenever such a condition occurs.

A few examples of Splunk alerts are given below (a configuration sketch follows the list):

  1. A notification should be sent to the admin and other configured mail IDs if the application servers are unhealthy.
  2. A notification should be sent if a user enters wrong credentials multiple times.
  3. A weekly report of a dashboard.
  4. A notification should be sent to the admin and other configured mail IDs if a high number of failures is observed.
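As a minimal sketch, an alert such as example 4 could be defined as a scheduled saved search in savedsearches.conf (the search, schedule, threshold and recipient below are illustrative assumptions):

[High login failure alert]
search = index=app_logs "login failed"
dispatch.earliest_time = -15m
dispatch.latest_time = now
cron_schedule = */15 * * * *
enableSched = 1
alert_type = number of events
alert_comparator = greater than
alert_threshold = 50
action.email = 1
action.email.to = admin@example.com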

Clustering involves two related settings, known as the search factor and the replication factor.

The search factor determines the number of searchable copies of the data that the cluster maintains.

The replication factor, in the case of an indexer cluster, is the number of copies of the data the cluster maintains; in the case of a search head cluster, it is the minimum number of copies of each search artifact the cluster maintains.

A search head cluster has only a search factor, whereas an indexer cluster has both a search factor and a replication factor.

Moreover, the replication factor should never be less than the search factor.
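As a minimal sketch, these values are set on the indexer cluster manager node in server.conf (mode is named master in older releases and manager in newer ones; the numbers are illustrative):

[clustering]
mode = master
replication_factor = 3
search_factor = 2

Here the cluster keeps 3 copies of each bucket, of which 2 are searchable, satisfying the rule that the replication factor is never less than the search factor.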

Many commands are used to filter results. A few of them are listed below (a sample search combining them follows the list):

  • Rex – In simpler words, it applies a regular expression that lets the user extract an exact field from the generated events; the rex command is used to get this information.
  • Where – The where command uses an eval expression to filter the search results; it is used to dig deeper into the results of a search.
  • Sort – If the user wants the results sorted by specified fields, the sort command is used; it can sort in ascending or descending order, and the number of results to sort can also be limited with this command.
  • Search – The search command retrieves events from the indexes; events can be matched using keywords, key/value pairs, quoted phrases and wildcards.
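A minimal example combining these commands (the index name and the field pattern in the rex are illustrative assumptions):

index=app_logs "login failed" | rex field=_raw "user=(?<user>\w+)" | stats count by user | where count > 10 | sort -count

This searches the events, extracts a user field with rex, filters with where and finally sorts the result in descending order of count.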
| Stats | Chart | Timechart |
| --- | --- | --- |
| It is a reporting command. | Displays the search result as a bar, line or area graph. | Bar and line graphs can be viewed. |
| Multiple fields can be used to create the table. | Takes only 2 fields, one on each of the X and Y axes. | Takes only 1 field, since the X-axis is fixed as the time field. |
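A quick sketch of the difference (the index and field names are illustrative assumptions):

index=web_logs | stats count by host, status

produces a table split by any number of fields;

index=web_logs | chart count by host, status

takes two fields, one per axis (the status values become the columns); and

index=web_logs | timechart span=1h count by host

takes a single field, since the X-axis is always _time.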

Lookup commands are used when you want to bring in fields from an external source (such as a CSV file or a Python-based script) to enrich the values of an event. They help narrow the search results by referencing fields in an external CSV file that match fields in your event data.

An inputlookup basically takes an input, as the name suggests. For example, it would take the product price and product name as input and then match them with an internal field such as a product ID or an item ID. An outputlookup, on the other hand, is used to write output to an existing lookup. In short, inputlookup is used to enrich the data and outputlookup is used to build up that lookup information.
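A minimal sketch (the lookup file, index and field names are illustrative assumptions):

| inputlookup products.csv | where price > 100

reads the lookup file itself;

index=sales | lookup products.csv product_id OUTPUT product_name

enriches events with fields from the lookup; and

index=sales | stats count by product_id | outputlookup product_counts.csv

writes search results out to a lookup file.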

The different types of data inputs in Splunk are listed below (a configuration sketch follows the list):

  1. Using files and directories as inputs.
  2. Configuring network ports, so that data is pushed into Splunk automatically.
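A minimal inputs.conf sketch covering both input types (the path, port, index and sourcetype names are illustrative assumptions):

[monitor:///var/log/myapp]
sourcetype = myapp_logs
index = app_logs

[udp://514]
sourcetype = syslog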

There are 5 default fields that are indexed along with every event in Splunk:

  1. host
  2. source
  3. sourcetype
  4. index
  5. timestamp

Fields can be extracted from any of the following places:

  1. Event lists
  2. The sidebar
  3. The Settings menu in the GUI
  4. Regular expressions defined in the props.conf configuration file (a sketch follows the list)
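A minimal props.conf sketch for option 4, extracting a user field at search time for a given sourcetype (the sourcetype and pattern are illustrative assumptions):

[myapp_logs]
EXTRACT-user = user=(?<user>\w+)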

The differences between search time and index time field extractions are shown below.

| Search time field extractions | Index time field extractions |
| --- | --- |
| Fields are extracted while the search is performed. | Fields are extracted when the data arrives at the indexer. |
| No disk space is consumed, as the extracted fields are not part of the metadata. | Disk space is consumed, as the extracted fields become part of the metadata. |

A summary index is used to store analytical data, reports and summaries. Its pros and cons are listed below (a sketch of populating one follows):

| Pros | Cons |
| --- | --- |
| The summary index retains the analytics and reports even after the original data has aged out. | Needle-in-a-haystack style searches are not possible. |
| | Deep-dive analysis is not possible. |
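As a minimal sketch, a scheduled search can write its results into a summary index with the collect command (the index names and search are illustrative assumptions):

index=web_logs sourcetype=access_combined | stats count by status | collect index=my_summary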

The time zone property provides the desired output for a given time zone. Splunk picks up the default time zone from your browser settings, and the browser in turn picks up the current time zone from the machine currently in use. Splunk records that time zone when the data is input, and it matters most when we are searching and correlating data coming from different sources.

A Splunk App is a collection of reports, dashboards, alerts, field extractions and lookups, whereas Splunk Add-ons are similar but lack the visual components of a report or a dashboard.

Colors can be assigned to charts when reports are created; if no colors are assigned, default colors are picked.

The steps to assign colors are (a Simple XML sketch follows):

  • Edit the panels built on top of the dashboard
  • Modify the panel settings from the UI
  • Select and choose the colors

                                 OR

  • Write commands to choose the colors from a palette by entering hexadecimal values or by writing code.
  • Provide different gradients and set the values for a radial gauge or water gauge
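A minimal Simple XML sketch of the second approach, assigning colors by hexadecimal value to a dashboard chart panel (the search, field names and colors are illustrative assumptions):

<chart>
  <search>
    <query>index=app_logs | timechart count by status</query>
  </search>
  <option name="charting.fieldColors">{"success": 0x00FF00, "error": 0xFF0000}</option>
</chart>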

The sourcetype is one of the default fields that Splunk assigns to all incoming data. It tells Splunk what kind of data is being sent, so that the data can be formatted intelligently during indexing. The sourcetype also helps categorize the data, making searches easier.

Advanced

Whenever the data limit is exceeded, a 'license violation' error is thrown. The license warning persists for 14 days. With a commercial license, 5 warnings within a 30-day rolling window are allowed before the indexer's search results and reports stop triggering; in the free version, only 3 warnings are allowed.

The common port numbers on which Splunk services run by default are:

| Service | Port Number |
| --- | --- |
| Management / REST API | 8089 |
| Search head / Indexer | 8000 |
| Search head | 8065, 8191 |
| Indexer cluster peer node / Search head cluster member | 9887 |
| Indexer | 9997 |
| Indexer / Forwarder | 514 |

The bucket concept is used in Splunk: all the data coming into the indexer is stored in directories known as buckets. There are 5 different bucket types in Splunk, used for different ages of data:

  1. Hot bucket
  2. Warm bucket
  3. Cold bucket
  4. Frozen bucket
  5. Thawed bucket

Over a period of time, data rolls from one bucket stage to the next:

  • HOT Bucket: Contains newly indexed data and is open for writing. One or more hot buckets can exist for each index.

When data is first indexed, it goes into a hot bucket. Hot buckets are searchable and are actively being written to.

  • WARM Bucket: Data is rolled from the hot bucket to the warm bucket. There can be many warm buckets, and data is not actively written to them.

When certain conditions occur (for example, the hot bucket reaches a certain size or Splunk gets restarted), the hot bucket becomes a warm bucket (“rolls to warm”), and a new hot bucket is created in its place.

Warm buckets are searchable but are not actively written to.

  • COLD Bucket: Data is rolled from warm to cold. There can be many cold buckets.

Once further conditions are met (for example, the index reaches its maximum number of warm buckets), the indexer begins to roll the warm buckets to cold based on their age, always selecting the oldest warm bucket to roll to cold. Buckets continue to roll to cold as they age in this manner.

Cold buckets are searchable.

  • FROZEN Bucket: Data is rolled out of cold. The indexer deletes frozen data by default, but it can be configured to archive the data instead. Archived data can later be thawed.

Frozen buckets are not searchable.

  • THAWED Bucket: Data restored from an archive. If frozen data has been archived, it can later be returned to the index by thawing it.

Thawed buckets are searchable.

The bucket aging policy can be modified by editing the attributes in indexes.conf.
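A minimal indexes.conf sketch of some of the attributes involved (the index name, paths and values are illustrative assumptions):

[myapp_index]
homePath = $SPLUNK_DB/myapp_index/db
coldPath = $SPLUNK_DB/myapp_index/colddb
thawedPath = $SPLUNK_DB/myapp_index/thaweddb
maxHotBuckets = 3
maxWarmDBCount = 300
frozenTimePeriodInSecs = 7776000
coldToFrozenDir = /archive/myapp_index

Here homePath holds the hot and warm buckets, coldPath the cold buckets and thawedPath the thawed buckets; data older than frozenTimePeriodInSecs (about 90 days here) rolls to frozen, and coldToFrozenDir archives it instead of deleting it.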

There are two types of Splunk forwarder, as below:

  1. Universal forwarder (UF) – a Splunk agent installed on the application servers to read the data; it cannot parse or index the data.
  2. Heavy weight forwarder (HWF) – a full instance of Splunk with advanced functionality.

The key Splunk configuration files are:
  • props.conf
  • indexes.conf
  • inputs.conf
  • transforms.conf
  • server.conf

Splunk Free lacks these features:

  1. Authentication and scheduled searches/alerting
  2. Distributed search
  3. Forwarding over TCP/HTTP (to non-Splunk systems)
  4. Deployment management

A 24-hour timer is started on the license slave, after which all searches are blocked and end users cannot search the data on that slave until it can reach the license master again.

It is a generic SQL database plugin for Splunk that easily integrates database information with Splunk queries and reports.

A regular expression for extracting an IP address:

rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"

OR

rex field=_raw "(?<ip_address>([0-9]{1,3}[\.]){3}[0-9]{1,3})"

Midnight to midnight on the clock of the license master

$SPLUNK_HOME/bin/splunk disable boot-start

Set the value OFFENSIVE=Less in splunk-launch.conf

Delete the following file on the Splunk server:

$splunk_home/var/log/splunk/searches.log

The fishbucket is a directory or index at the default location, i.e.

/opt/splunk/var/lib/splunk

It can be accessed by searching for "index=_thefishbucket" in the Splunk GUI.

This can be achieved by using regular expressions: the necessary events are matched with a regex, and the remaining events are sent to the nullQueue (a configuration sketch follows).
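A minimal sketch of this, routing unwanted events to the nullQueue via props.conf and transforms.conf (the sourcetype and pattern are illustrative assumptions):

props.conf:

[myapp_logs]
TRANSFORMS-null = drop_debug

transforms.conf:

[drop_debug]
REGEX = DEBUG
DEST_KEY = queue
FORMAT = nullQueue

With this in place, events matching DEBUG for that sourcetype are discarded before indexing, while everything else is indexed as usual.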

By watching data from Splunk's metrics log in real time:

index="_internal" source="*metrics.log" group="per_sourcetype_thruput" series="<your_sourcetype_here>" | eval MB=kb/1024 | chart sum(MB)

or, to watch everything happening, split by sourcetype:

index="_internal" source="*metrics.log" group="per_sourcetype_thruput" | eval MB=kb/1024 | chart sum(MB) avg(eps)

All searches that are running or have completed are saved in the dispatch directory:

$SPLUNK_HOME/var/run/splunk/dispatch

Using the defaults (which can be overridden in limits.conf), these directories are deleted 10 minutes after the search completes, unless the user saves the search results, in which case the results are deleted after 7 days.

This algorithm is used to search the data very quickly; it is essentially used for batch-based, large-scale parallelization.

It is inspired by functional programming's map() and reduce() functions.

At the indexer, Splunk keeps track of indexed events in a directory called the fishbucket (default location: /opt/splunk/var/lib/splunk). It contains seek pointers and CRCs for the files being indexed, so splunkd can tell whether it has already read them.

Search head pooling is an older way of scaling Splunk search that has been superseded by search head clustering in more recent Splunk versions. A search head cluster is managed by a captain, and the captain controls its members. Both provide the same search functionality: at any given point in time, if one search head goes down, another search head takes over so that search does not go down. However, with respect to reliability and efficiency, search head clustering has the upper hand over search head pooling.
