NOTE: This version of the ITSI health check dashboard has been deprecated. This dashboard describes the operational status and configuration of an ITSI instance. Because it accesses sensitive indexes (_internal) and REST endpoints, reports will be incomplete if not run by an Admin user.
(INFO|ER\w+|WAR\w+|FAT\w+|DEBUG|CRI\w+))\s+\["
| fillnull log_level value="UNKNOWN"
| eval log_level=case(log_level="WAR","WARNING",1=1,log_level)
| rex mode=sed field=component "s/(.*)-\d+/\1/"
| bucket _time span=5m
| stats count by _time log_level component host
Splunk Server Information
| rest splunk_server=local /services/server/info
| stats values(version) as splunk_version, values(server_roles) as server_roles, values(os_name) as os, values(numberOfCores) as cpu_cores, values(numberOfVirtualCores) as virtual_cpu_cores, values(physicalMemoryMB) as physical_mem_MB by splunk_server
| rename splunk_server as host
| join type=left host [| rest splunk_server=local /services/apps/local/itsi
| stats values(version) as itsi_version by splunk_server
| rename splunk_server as host]
| join type=left host [ search index=_introspection sourcetype=splunk_disk_objects component=Indexes data.name="*itsi*"
| stats dc(data.name) as index_count, values(data.name) as indexes by host]
| join type=left host [ search index=_internal splunk_server=local sourcetype=splunkd "Linux transparent hugepage support" latest=now()
| head 1 | rex field=event_message "enabled= (?<enabled>\S+)"
| eval THP_kernel_settings=if(enabled="always", "not ok", "ok") ]
| table host splunk_version itsi_version os cpu_cores virtual_cpu_cores physical_mem_MB THP_kernel_settings server_roles index_count indexes
The maximum number of objects in each collection is 500,000. You may notice performance degradation as a collection approaches its limit.
KV Store Collections
| rest splunk_server=local /services/server/introspection/kvstore/collectionstats
| mvexpand data
| spath input=data
| rex field=ns "(?<App>.*)\.(?<Collection>.*)"
| eval dbsize=size/1024/1024
| eval indexsize=totalIndexSize/1024/1024
| stats first(count) AS "Number of Objects" first(nindexes) AS Accelerations first(indexsize) AS "Acceleration Size (MB)" first(dbsize) AS "Collection Size (MB)" by App,Collection
| sort - "Number of Objects"
Concurrent Searches
source="*/metrics.log" sourcetype=splunkd index=_internal active_hist_searches group=search_concurrency "system total"
| stats max(active_hist_searches) as max_historical_searches, avg(active_hist_searches) as avg_historical_searches, max(active_realtime_searches) as max_realtime_searches, avg(active_realtime_searches) as avg_realtime_searches by splunk_server
| rename splunk_server as host
| eval avg_historical_searches=round(avg_historical_searches,0)
| eval avg_realtime_searches=round(avg_realtime_searches,0)
| join type=left host [ search source="*/metrics.log" sourcetype=splunkd index=_internal group=searchscheduler
| stats max(skipped) as max_skipped, max(max_running) as max_running, max(total_runtime) as max_total_runtime, avg(total_runtime) as avg_total_runtime by splunk_server
| rename splunk_server as host
| eval max_total_runtime=round(max_total_runtime,0)
| eval avg_total_runtime=round(avg_total_runtime,0)]
Interesting Indexes
| tstats count as entries latest(_time) as most_recent where index=itsi* OR index=_internal by index, splunk_server
| stats sum(entries) as entries, max(most_recent) as most_recent, values(splunk_server) as indexers by index
| eval most_recent=strftime(most_recent,"%F %T")
Interesting Searches (If the real-time searches are not running, this could indicate a Java problem)
| rest splunk_server=local /services/search/jobs/
| search label=itsi*
| fields label dispatchState isFailed isRealTimeSearch runDuration
| rename label as search_name
KPI Performance
runtime_headroom_pct is 100 - (runtime / scheduled interval * 100). For a search scheduled to run every 60 seconds with a runtime of 45 seconds, runtime_headroom_pct = 25. 100 is good, 0 is bad. Your avg_result_count or max_result_count should not exceed the max_action_results setting for the scheduler in limits.conf (default: 50k).
limit = (number of KPIs * number of entities associated with KPIs) + (number of services * 2). Exceeding the limit may lead to inconsistent results for KPI aggregation. Increasing the limit can impact system performance because more memory must be allocated to support the larger search results.
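The limit formula above is plain arithmetic, so it can be checked before changing limits.conf. The following is a minimal Python sketch, not part of the dashboard: the function name and the example counts are hypothetical, and it assumes "number of entities associated with KPIs" is a single count multiplied by the number of KPIs.

```python
DEFAULT_MAX_ACTION_RESULTS = 50_000  # default cited in the text above


def required_result_limit(num_kpis: int, num_entities: int, num_services: int) -> int:
    """limit = (KPIs * entities associated with KPIs) + (services * 2)."""
    return num_kpis * num_entities + num_services * 2


# Hypothetical deployment: 100 KPIs, 200 entities, 10 services.
needed = required_result_limit(100, 200, 10)
print(needed, needed <= DEFAULT_MAX_ACTION_RESULTS)
```

If the computed value exceeds the configured limit, the text above suggests weighing the risk of inconsistent KPI aggregation against the extra memory that a higher limit requires.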
index=_internal sourcetype=scheduler savedsearch_name="Indicator*"
| stats dc(sid) as run_count, count(eval(status="delegated_remote_error" OR status="skipped")) as failed_count, count(eval(suppressed!="0")) as suppressed_count,
avg(run_time) as avg_runtime, max(run_time) as max_runtime, earliest(_time) as first, latest(_time) as last,
max(result_count) as max_result_count, avg(result_count) as avg_result_count
by savedsearch_name
| eval KPI_search_type=if(savedsearch_name like "%Shared%", "base", "ad hoc")
| eval runtime_headroom_pct=round((100-(max_runtime/((last-first)/(run_count-1))*100)),1)
| eval avg_runtime=round(avg_runtime, 2)
| eval max_runtime=round(max_runtime, 2)
| eval avg_result_count=round(avg_result_count, 2)
| eval max_result_count=round(max_result_count, 2)
| table savedsearch_name KPI_search_type failed_count suppressed_count runtime_headroom_pct avg_runtime
max_runtime avg_result_count max_result_count run_count
| sort +runtime_headroom_pct
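The runtime_headroom_pct eval in the search above does not read the schedule directly. It estimates the interval from the first and last run times and the run count. The following is a minimal Python sketch of the same arithmetic, not part of the dashboard: the function name is hypothetical, and returning None when the interval cannot be estimated is an assumption of this sketch.

```python
from typing import Optional


def runtime_headroom_pct(max_runtime: float, first: float, last: float,
                         run_count: int) -> Optional[float]:
    """100 - (max_runtime / estimated interval * 100), rounded to 1 decimal."""
    if run_count < 2 or last <= first:
        return None  # the interval cannot be estimated from these runs
    estimated_interval = (last - first) / (run_count - 1)
    return round(100 - (max_runtime / estimated_interval) * 100, 1)


# 11 runs spread over 600 seconds gives a 60-second interval; 45-second runtime.
print(runtime_headroom_pct(45, 0, 600, 11))  # 25.0
```

Because the interval is inferred from observed runs, skipped runs within the time range stretch the estimate and make the headroom look better than it is.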
Savedsearch Error Messages
index=_internal sourcetype=scheduler savedsearch_name="Indicator*"
| join sid
[ search index=_internal sourcetype=splunk_search_messages app="itsi" log_level=ERROR]
| stats count(savedsearch_name) as "count" avg(run_time) as "Avg Runtime(sec)" values(message_key) as "Message Key" values(message) as "Error Message" by savedsearch_name
| eval "Avg Runtime(sec)"=round('Avg Runtime(sec)', 3)
| rename savedsearch_name AS "Savedsearch Name"
Not Executed Searches (In last 1 hour)
index=_internal source=*splunkd.log "search not executed" user="splunk-system-user" | timechart count span=1h
Refresh Queue Statistics
The refresh queue ensures data integrity and eventual consistency of your ITSI configuration. It runs as a single instance.
Refresh Queue Runtimes
index=_internal sourcetype=itsi_internal_log source=*itsi_consumer* "Job Successful" | stats avg(transaction_time) as "Average Job Time", avg(queue_time) as "Average Queue Time", max(transaction_time) as "Maximum Job Time", max(queue_time) as "Maximum Queue Time"
If more than one entity is using the same alias field value, KPI base searches might have incorrect statistical aggregation results. To remedy duplicate entity alias values, click Configure > Entities and edit the entity definitions for the entities with duplicate aliases. Keep the alias value for one of the entities and edit the others to remove the duplicate alias value.
Check for Duplicate Entity Aliases